Bug #34050 | I/O thread disconnects/killed when replicating under load with partition tables | ||
---|---|---|---|
Submitted: | 25 Jan 2008 4:40 | Modified: | 25 Jul 2008 19:31 |
Reporter: | Omer Barnir (OCA) | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 5.1.23 | OS: | Any |
Assigned to: | Mats Kindahl | CPU Architecture: | Any |
[25 Jan 2008 4:40]
Omer Barnir
[30 Jan 2008 0:56]
Omer Barnir
How to repeat
=============
1) Download the attached tar.gz file and extract it in the mysql-test directory.
2) Start the server with:
   perl ./mysql-test-run.pl --suite=rpl --do-test=rpl_alter --mysqld=--innodb --start-and-exit
   (Note: 'rpl_alter' is used so that both master and slave will be started. It is not related to the test itself.)
3) Using the client, log into the slave and issue a 'start slave' command.
   >> Verify using 'show slave status' that replication is running.
4) Start the 'stress test' using the following command:
   perl ./mysql-test-run.pl --extern --stress --stress-init-file=rpl_init.txt --stress-test-file=rpl_sys_test.txt --stress-threads=100 --stress-test-duration=600 --user=root --socket=<path_to_mysql-test_dir>/var/tmp/master.sock
   This will start a 10 minute stress test with 100 concurrent connections. On the screen you will see messages like:
   test_loop[0:0 0:4708]: TID 10 test: 'rpl_row_sys_pinsdel2' Errors: No Errors. Test Passed OK
5) Once the test is completed, check the slave.err file in the var/log directory. You will notice the I/O thread disconnecting and reconnecting a few times and then being killed permanently.

If the same test is run when the tables are not partitioned (see t/rpl_setup.test for more details), the disconnects are not observed. The replication 'mode' does not seem to affect this test case.
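The check in step 3 can be scripted when the repro is run repeatedly. Below is a minimal sketch, assuming the output of `SHOW SLAVE STATUS\G` (vertical format) has already been captured as text by some other means. The function names `parse_slave_status` and `replication_running` and the abbreviated sample are hypothetical and not part of the attached test files; `Slave_IO_Running` and `Slave_SQL_Running` are the status columns being inspected.

```python
def parse_slave_status(text):
    """Parse the vertical (\\G) output of SHOW SLAVE STATUS into a dict."""
    fields = {}
    for line in text.splitlines():
        # Skip the "*** 1. row ***" separator and any line without a field.
        if ":" not in line or line.lstrip().startswith("*"):
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def replication_running(text):
    """Return True only if both the I/O and SQL threads report 'Yes'."""
    fields = parse_slave_status(text)
    return (fields.get("Slave_IO_Running") == "Yes"
            and fields.get("Slave_SQL_Running") == "Yes")


# Hypothetical, abbreviated sample output, for illustration only.
SAMPLE = """\
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 127.0.0.1
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
"""

if __name__ == "__main__":
    print(replication_running(SAMPLE))
```

Requiring both threads to report 'Yes' is a deliberate choice here: the bug concerns the I/O thread, but a stopped SQL thread would also invalidate the stress run.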
[30 Jan 2008 0:57]
Omer Barnir
test files for load test case
Attachment: files.tar.gz (application/x-gzip, text), 3.57 KiB.
[1 Feb 2008 0:32]
Omer Barnir
To clarify: The problem was reported with a test run against 5.1.23 but is also observed when running this test against 5.1.22
[25 Jul 2008 14:25]
MySQL Verification Team
Failed to repeat the bug by calling the SP in 100 threads; got no disconnections. See the attached file for all the information.
Attachment: bug34050_not_repeated_infos.txt (text/plain), 49.52 KiB.
[25 Jul 2008 14:51]
MySQL Verification Team
When I rerun the test with --slave_net_timeout=2, the I/O thread does fail a few times due to the high load on the box:

[Note] Slave I/O thread: connected to master 'root@127.0.0.1:3306',replication started in log 'xp64-bin.000002' at position 6123850
[ERROR] Slave I/O: error reconnecting to master 'root@127.0.0.1:3306' - retry-time: 60 retries: 86400, Error_code: 1159
[Note] Slave: connected to master 'root@127.0.0.1:3306',replication resumed in log 'xp64-bin.000002' at position 6295775
[ERROR] Slave I/O: error reconnecting to master 'root@127.0.0.1:3306' - retry-time: 60 retries: 86400, Error_code: 1159
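To compare runs (for example, partitioned versus non-partitioned tables, or different timeout settings), it can help to count these events in the slave error log rather than read it by eye. The following is a rough sketch only: the regular expressions are modelled on the four log lines quoted in this comment and may not match the wording used by other server versions, and the function name `summarize_io_thread` is hypothetical.

```python
import re

# Patterns modelled on the log lines quoted in this report (an assumption:
# other server versions may word these messages differently).
RECONNECT_ERROR = re.compile(
    r"\[ERROR\] Slave I/O: error reconnecting to master .*Error_code: (\d+)")
RESUMED = re.compile(
    r"\[Note\] Slave: connected to master .*replication resumed in log "
    r"'([^']+)' at position (\d+)")


def summarize_io_thread(log_text):
    """Collect reconnect error codes and (binlog, position) resume points."""
    errors = []
    resumes = []
    for line in log_text.splitlines():
        m = RECONNECT_ERROR.search(line)
        if m:
            errors.append(int(m.group(1)))
            continue
        m = RESUMED.search(line)
        if m:
            resumes.append((m.group(1), int(m.group(2))))
    return {"reconnect_errors": errors, "resumes": resumes}


# The log excerpt from this comment, one message per line.
SAMPLE_LOG = """\
[Note] Slave I/O thread: connected to master 'root@127.0.0.1:3306',replication started in log 'xp64-bin.000002' at position 6123850
[ERROR] Slave I/O: error reconnecting to master 'root@127.0.0.1:3306' - retry-time: 60 retries: 86400, Error_code: 1159
[Note] Slave: connected to master 'root@127.0.0.1:3306',replication resumed in log 'xp64-bin.000002' at position 6295775
[ERROR] Slave I/O: error reconnecting to master 'root@127.0.0.1:3306' - retry-time: 60 retries: 86400, Error_code: 1159
"""

if __name__ == "__main__":
    summary = summarize_io_thread(SAMPLE_LOG)
    print(len(summary["reconnect_errors"]), "reconnect errors")
    print(summary["resumes"])
```

The initial "replication started" line is intentionally not counted as a resume, so the summary reflects only reconnect activity after the first connection.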
[25 Jul 2008 19:31]
Omer Barnir
I do believe this issue is related more to the load on the machine than directly to the partitioned tables. It did show in a configuration where the master and slave were running on the same box under load. Increasing the connect values (the opposite of what Shane did) decreased the overall number of disconnects/reconnects, and the problem did not show. It has also not been observed in tests run lately. Based on the above, I am setting the bug to 'Can't repeat' (at least until I run into it again).