Bug #34624 | Falcon: Slave contains one more record than master after a replication test | ||
---|---|---|---|
Submitted: | 17 Feb 2008 10:23 | Modified: | 15 May 2009 13:39 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S3 (Non-critical) |
Version: | 6.0.4 | OS: | Any |
Assigned to: | Kevin Lewis | CPU Architecture: | Any |
Tags: | F_ISOLATION, replication |
[17 Feb 2008 10:23]
Philip Stoev
[17 Feb 2008 12:02]
Philip Stoev
Test case for bug 34624
Attachment: bug34624.zip (application/x-zip-compressed, text), 2.05 KiB.
[17 Feb 2008 12:11]
Philip Stoev
Please run the test case as follows:

1. Place the archive in mysql-test and unzip it. The .txt files must go into mysql-test and the .test files must go into mysql-test/t.
2. Start the master and the slave:
   $ perl mysql-test-run.pl --start-and-exit --skip-ndb rpl_alter
   The rpl_alter test has no relation to this test or this bug; it is only used to force mysql-test-run.pl to start both a master and a slave.
3. Run the test:
   $ perl ./mysql-test-run.pl --stress --stress-init-file=bug34624_init.txt --stress-test-file=bug34624_run.txt --stress-threads=10 --stress-test-duration=1200 --extern --socket=var/tmp/master.sock --user=root
4. Let it run for 5 minutes. Grep the error* files in mysql-test/var/stress for the word "slave"; each match marks a case where the slave has the wrong number of records. You will also find bug34624_slave.reject files in the subdirectories of mysql-test/var/stress.

The files from the test are as follows:
- bug34624_init.test creates the required tables and the stored procedure used to insert 100 rows into table viewer_tbl2.
- bug34624_master.test deletes all records from viewer_tbl2 and calls the stored procedure to insert new rows.
- bug34624_slave.test checks whether the number of records in viewer_tbl2 is evenly divisible by 10. If it is not, there is a problem with the slave and this test fails, which is reflected in the mysql-test/var/stress/error*.log files.
[18 Feb 2008 10:14]
Susanne Ebrecht
Philip, one short question: RBR or SBR?
[18 Feb 2008 13:20]
Philip Stoev
This test uses mixed replication mode, as specified at the top of bug34624_init.test.
[23 Jun 2008 10:18]
Zhenxing He
I ran the test against our bzr tree of mysql-6.0. I cannot reproduce the bug exactly as the report describes, but it shows up in a slightly different way: the slave has one row fewer than the master instead of one row more. So I think some fixes to the server may have altered the behavior of this bug.

After tracing and analyzing this problem, I found that this is not a replication bug but a Falcon storage engine bug. I think there is a locking issue in Falcon. Here is a description of the problem.

An INSERT of a row into a Falcon table is handled by the following function:

bool Table::insert(Record *record, Record *prior, int recordNumber);

When inserting a record, the process is:

1. sync.lock(Shared);
2. recordBitmap->setSafe(recordNumber);
3. sync.unlock();
4. sync.lock(Exclusive);
5. records->store();

A SELECT with no WHERE clause reads the records with rr_sequential, which calls the following function to get each row in the table:

Record* Table::fetchNext(int32 start);

The process of this function is:

1. sync.lock(Shared);
2. recordBitmap->nextSet(recordNumber);
3. records->fetch();
4. if step 3 fails, recordBitmap->clear(recordNumber);

So the following interleaving of an inserting thread t1 and a selecting thread t2 is possible:

t1. sync.lock(Shared);
t1. recordBitmap->setSafe(recordNumber);
t1. sync.unlock();
t2. sync.lock(Shared);
t2. recordBitmap->nextSet(recordNumber);
t2. records->fetch(); // this will fail
t2. recordBitmap->clear(recordNumber);

t2 then concludes that the record with number recordNumber does not exist, and because the bitmap bit has been cleared, every later SELECT using rr_sequential will also treat this record as nonexistent. SELECTs that read records with rr_quick or other access methods are not affected.
[24 Jun 2008 17:55]
Kevin Lewis
Chris, once again a replication bug that happens only in the Falcon-related replication code has been assigned to us to isolate. Thanks for doing this.
[31 Mar 2009 3:01]
Kevin Lewis
Philip, it seems likely that this bug is also fixed by the CycleManager, like the other F_ISOLATION bugs. Can you check?
[31 Mar 2009 15:47]
Kevin Lewis
6.0.11 testing is about 3-4 weeks from now. If it is still a problem, we should fix it with the other isolation bugs for this release.
[31 Mar 2009 16:37]
Philip Stoev
This issue is no longer reproducible with the original test case. Regardless of any transactional fixes, I think that Zhenxing He's comments should still be considered.
[31 Mar 2009 17:14]
Kevin Lewis
This seems to be another bug that is fixed by the recent addition of the CycleManager. See also Bug#41391, Bug#41478, Bug#41742, Bug#41850, Bug#42459, Bug#41661, Bug#42185, Bug#43146, Bug#43298, Bug#43299. Zhenxing He made an excellent observation back on [23 Jun 2008 12:18] in this bug. The problem he observed was addressed by the patch for Bug#41741 on [25 Feb 17:54]: http://lists.mysql.com/commits/67593
[15 May 2009 13:39]
MC Brown
A note has been added to the 6.0.11 changelog: The Falcon CycleManager has been updated, which addresses a number of issues when examining records in various transaction states and their visibility/isolation in relation to other threads.