Replication losing data when diskchunk_flush_write_timeout enabled #3061
Comments
I cannot reproduce the issue. I switched to the branch
The last line of the output looks like
I'm not sure what to expect there or what I should investigate, as I see no daemon logs on the dev box. Then I set it and got errors
|
Can you provide the FULL log that you see? As far as I can see, something went wrong even when we created the cluster, so nothing else worked afterwards. To understand it, I need to see the full log. I tried to use the latest test kit and run the test; for me, the cluster is created with no issues, but it's still reproducible. |
Here is a full log with
|
Looks like it used a very old daemon that had problems with cluster creation and was unable to create a cluster, returning a null error. Try to update the docker image and run it again; that should help:
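(For reference, a minimal sketch of what refreshing the image could look like; the image name and tag are assumptions here, so use whatever the test kit actually references.)

```bash
# Pull a fresh image so the test does not run against a stale daemon build
docker pull manticoresearch/manticore:latest
# Confirm the image is actually newer than the one used in the failing run
docker image inspect --format '{{.Created}}' manticoresearch/manticore:latest
```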
I'm using the February version of the daemon; in the log I can see some old version of the daemon from January. |
Here's the full log with --logreplication enabled from 2 nodes when the issue happens. Please take a look at it and let me know if you see anything or not. |
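(For context, a minimal sketch of how the replication log can be enabled when starting the daemons by hand; the config paths are placeholders, and in the CLT setup the flag is passed by the test kit.)

```bash
# Start each node with verbose replication logging enabled
# (the config paths below are placeholders for this sketch)
searchd --config /etc/manticoresearch/node1.conf --logreplication
searchd --config /etc/manticoresearch/node2.conf --logreplication
```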
It could be better to enable
but it is not clear why Buddy failed waiting for the command: which command did Buddy fail to wait for, and how much time passed from the start of the command until the timeout happened in Buddy? |
We are looking at a case where data has disappeared from a clustered table (in some cases only one row is gone), which is why it's rather hard to reproduce. What we are looking for here is the magical disappearance of the row with key = 'master'. If we look at the logs, we see that there are inserts and updates for key = 'master', yet after we get the waiting timeout on the Buddy side after creating the sharded table, we can see that this row is no longer in the table. This is the issue with diskchunk enabled. Here are the full logs with all the info that will hopefully help to understand why this row is getting removed from the replicated table. |
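(To make the check concrete, a hedged sketch of the verification run against each node; the table name, SQL ports and exact query are illustrative rather than the ones used by the sharding internals.)

```bash
# Query both nodes for the bookkeeping row: before the Buddy timeout it exists on
# both of them, afterwards it is gone from both
mysql -h0 -P9306 -e "SELECT * FROM tbl WHERE key = 'master';"   # node 1
mysql -h0 -P9307 -e "SELECT * FROM tbl WHERE key = 'master';"   # node 2
```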
I'll try to reproduce. |
@donhardman I couldn't reproduce the issue via github here https://github.com/manticoresoftware/manticoresearch/actions/runs/13647861205 after 10 attempts. The workflow used is located here https://github.com/manticoresoftware/manticoresearch/blob/08d0023a23c5f3b8d21cffcc9f80cecfebdd3501/.github/workflows/mre.yml |
It was not reproducible due to us setting |
I've reproduced the issue without CLT in https://github.com/manticoresoftware/manticoresearch/blob/test/test-drop-sharded-table/.github/workflows/mre.yml with extra logging. @tomatolog please investigate the failure here https://github.com/manticoresoftware/manticoresearch/actions/runs/13660719391/job/38190978513 The failure is:
Below that you'll find:
|
I also confirm adding:
fixes the issue. |
I could also reproduce it via Docker on dev2 after tens of attempts, and I could reproduce it w/o Docker on GitHub (i.e. in a clean runner) too: https://github.com/manticoresoftware/manticoresearch/actions/runs/13661200658 . |
I was able to reproduce the issue on perf3. It seems that the key to triggering it is faster disk chunk flushing, like when saving to an SSD. Here’s how you can reproduce it on perf3:
|
Maybe a sketch of the fixes made by AI would be helpful for understanding possible solutions in a clustered environment. We can look into it to see whether it fits or not (but we probably also need to fix some code): #3166 |
The issue can also be reproduced with:
Notice, |
The issue seems to be that all nodes have the same server_id and UUID seed; if I set server_id to a unique number, there is no such error anymore.
node2
That causes auto-id to generate the same sequences on both nodes. node2 issued
while the original document was just replicated from node1; that is why that document got replaced on both nodes. |
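(A hedged sketch of that collision with illustrative cluster/table/column names: both daemons derive their auto-id sequence from the same server_id-based UUID seed, so node 2's own write lands on the id node 1 has already used.)

```bash
# Node 1 inserts its row without an explicit id; the id is auto-generated from
# the server_id-based UUID seed
mysql -h0 -P9306 -e "INSERT INTO c:tbl (key, value) VALUES ('master', 'x');"

# Node 2 has the same server_id (same seed), so its next auto-id is the same number;
# per the analysis above, its write ends up replacing the row replicated from node 1
mysql -h0 -P9307 -e "INSERT INTO c:tbl (key, value) VALUES ('shard_0', 'y');"
```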
We need to make sure that the nodes in the shards have unique server_id values prior to using these nodes. |
Or we could fail the cluster join if server_id is explicitly set to the same value, or if server_id is auto-initialized from the MAC address and is the same as on all the other nodes in the cluster. |
Good idea! |
Not quite sure why INSERT INTO does not fail in this case as it usually does. If I issue it at both nodes of the cluster,
I always get one node succeeding and the other node returning an error.
I.e. Galera properly checks for conflicts and does not allow a document that conflicts with the running transaction. Maybe it is a Galera bug that was fixed in more recent versions. |
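(For comparison, a sketch of the conflicting-write experiment with an explicit id and illustrative names: fired at both nodes at roughly the same time, one insert should be applied and the other rejected, since Galera certifies the two transactions against each other.)

```bash
# The same explicit-id insert issued on both nodes nearly simultaneously
mysql -h0 -P9306 -e "INSERT INTO c:tbl (id, key) VALUES (42, 'master');" &
mysql -h0 -P9307 -e "INSERT INTO c:tbl (id, key) VALUES (42, 'master');" &
wait
# Expected: one node succeeds, the other returns a conflict/duplicate-id error
```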
The issue that caused this case is #3186. |
Fixed at a3e3c4b: the default server_id now uses the MAC address along with the PID file path, so that multiple daemons started on the same node get different server_id values. Also added a check of server_id in the JOIN CLUSTER statement to make sure all nodes in the cluster have a unique server_id. You could also set
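(For reference, a sketch of pinning server_id by hand in each node's config; the values are arbitrary, they only have to differ between the nodes.)

```
# node1.conf — only the relevant directive is shown
searchd {
    server_id = 1
}

# node2.conf
searchd {
    server_id = 2
}
```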
I can't reproduce the issue anymore neither in https://github.com/manticoresoftware/manticoresearch/actions/runs/13661200658 nor with |
Closing as done since it's not reproducible anymore. |
Reopening to complete the checklist (tests, changelog). |
The checklist is complete. Closing. A test will be implemented as part of #3186. |
Bug Description:
There is an issue with the sharding logic. After investigation, we found that when creating a cluster with 2 nodes and configuring 3 shards on it, in most cases we encounter a "Waiting timeout exceed" error by default. Further investigation revealed that the issue is not related to buddy allocation but rather to the "diskchunk_flush_write_timeout" setting. When we set "diskchunk_flush_write_timeout = -1" in the configuration, everything works perfectly without issues. However, when we leave it unset or set it to "diskchunk_flush_write_timeout = 1", the problem persists. After a deep analysis of the logs, we discovered that when concurrent insert and update operations occur on the same table across different nodes, this setting frequently causes some keys to be lost, which prevents sharding from working properly.
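(For clarity, this is the kind of fragment toggled in the searchd section of the test config: -1 disables the timed disk chunk flushing and makes the problem disappear, while leaving it unset or setting it to 1 reproduces it.)

```
searchd {
    # workaround: disable timed flushing of disk chunks
    diskchunk_flush_write_timeout = -1
}
```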
We should use CLT to reproduce it, because without it I was unable to reproduce the problem due to hitting another issue: #3048
We should get the test from branch
test/test-drop-sharded-table
. Here's how to run it:
We should not see "waiting timeout exceeded". When we update the config with diskchunk_flush_write_timeout disabled, everything works fine.
We can modify the config
test/clt-tests/base/searchd-with-flexible-ports.conf
to set or unset it. Currently diskchunk_flush_write_timeout is disabled!
Manticore Search Version:
Latest dev version
Operating System Version:
Ubuntu
Have you tried the latest development version?
None
Internal Checklist:
To be completed by the assignee. Check off tasks that have been completed or are not applicable.