Bug #33184 | Falcon regression tests fail on Mac OS X | ||
---|---|---|---|
Submitted: | 12 Dec 2007 19:49 | Modified: | 22 Feb 2008 20:53 |
Reporter: | Giuseppe Maxia | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S3 (Non-critical) |
Version: | 6.0.5 | OS: | MacOS (Intel) |
Assigned to: | CPU Architecture: | Any | |
Tags: | F_TEST, falcon, regression, suite |
[12 Dec 2007 19:49]
Giuseppe Maxia
[12 Dec 2007 21:33]
Hakan Küçükyılmaz
Verified failures with latest mysql-6.0 from bk. Change set is 1.2754, 2007-12-11 17:44:21+01:00, df@pippilotta.erinye.com +1 -0. Failing tests are: falcon_bug_22167 falcon_bug_26802 falcon_bug_28046 falcon_bug_28049 falcon_bug_28158 falcon_bug_28165 falcon_bugs I used BUILD/compile-mac for building, which looks like: #! /bin/sh path=`dirname $0` . "$path/SETUP.sh" extra_flags=" $fast_cflags $max_cflags -g" extra_configs=" $max_configs" . "$path/FINISH.sh" # based on BUILD/compile-pentium-max
[12 Dec 2007 22:43]
Hakan Küçükyılmaz
Running the Falcon test suite with latest mysql-6.0-falcon-team tree shows that following tests are failing with BUILD/compile-pentium-max build script: falcon_bug_22167 falcon_bug_26802 falcon_bug_28046 falcon_bug_28049 falcon_bug_28158 falcon_bug_28165 falcon_bugs Most tests are failing when trying to drop a table: DROP TABLE t1' failed: 2013: Lost connection to MySQL server during query
[12 Dec 2007 23:14]
Hakan Küçükyılmaz
Running the Falcon test suite with a debug build shows no tests failing: falcon.rowid_order_falcon [ pass ] 177 ------------------------------------------------------- Stopping All Servers All 145 tests were successful. The servers were restarted 5 times Spent 810.606 of 851 seconds executing testcases
[16 Feb 2008 19:37]
Giuseppe Maxia
Tested again on MySQL 6.0.5 (myql-6.0 tree, ChangeSet@1.2497.7.21) Still failing tests on a server compiled without debugging options: falcon.falcon_bug_22167 falcon.falcon_bug_26802 falcon.falcon_bug_28046 falcon.falcon_bug_28049 falcon.falcon_bug_28158 falcon.falcon_bug_28165 falcon.falcon_bugs
[19 Feb 2008 15:11]
Hakan Küçükyılmaz
Verified with latest mysql-6.0-falcon-team tree on Mac OS X/Intel. I build with BUILD/compile-pentium-max and ran the test suite like ./mysql-test-run.pl --force --suite=falcon Failing tests are: falcon.falcon_bug_22167 falcon.falcon_bug_26802 falcon.falcon_bug_28046 falcon.falcon_bug_28049 falcon.falcon_bug_28158 falcon.falcon_bug_28165 falcon.falcon_bugs I see crashes around DROP TABLE.
[19 Feb 2008 15:14]
Hakan Küçükyılmaz
The assertion we get is: (gdb) f 4 #4 0x002a1df1 in SyncObject::~SyncObject (this=0x70408bc) at SyncObject.cpp:125 125 ASSERT(lockState == 0); (gdb) p lockState $1 = -1341785548 (gdb) bt #0 0x9003d66c in kill () #1 0x9010e8cf in raise () #2 0x002e6eb1 in Error::error (string=0x62b358 "assertion failed at line %d in file %s\n") at Error.cpp:92 #3 0x002e6f38 in Error::assertionFailed (fileName=0x623d64 "SyncObject.cpp", li ne=125) at Error.cpp:76 #4 0x002a1df1 in SyncObject::~SyncObject (this=0x70408bc) at SyncObject.cpp:125 #5 0x002a7021 in Table::~Table (this=0x7040760) at Table.cpp:150 #6 0x002d47c8 in Database::dropTable (this=0x6505648, table=0x7040760, transact ion=0x0) at Database.cpp:1420 #7 0x0035c7ed in Nfs::Statement::executeDDL (this=0x700a740) at Statement.cpp:1 135 #8 0x0035cd68 in Nfs::Statement::execute (this=0x700a740, sqlString=0xb005fc80 "drop table TEST.\"T1\"", isQuery=false) at Statement.cpp:2696 #9 0x0035cdfe in Nfs::Statement::execute (this=0x700a740, sqlString=0xb005fc80 "drop table TEST.\"T1\"") at Statement.cpp:161 #10 0x0029736c in StorageDatabase::deleteTable (this=0x7008108, storageConnectio n=0x703f460, tableShare=0x703ac60) at StorageDatabase.cpp:488 #11 0x0029cf95 in StorageTableShare::deleteTable (this=0x703ac60, storageConnect ion=0x703f460) at StorageTableShare.cpp:138 #12 0x0029bd55 in StorageTable::deleteTable (this=0x6666978) at StorageTable.cpp :81 #13 0x0028b16a in StorageInterface::delete_table (this=0x7871298, tableName=0xb0 060c60 "./test/t1") at ha_falcon.cpp:949 #14 0x001b064d in ha_delete_table (thd=0x781c400, table_type=0x6402380, path=0xb 0060c60 "./test/t1", db=0x7871290 "test", alias=0x7871078 "t1", generate_warning =true) at handler.cc:1484 #15 0x001c32e0 in mysql_rm_table_part2 (thd=0x781c400, tables=0x78710a0, if_exis ts=false, drop_temporary=false, drop_view=false, dont_log_query=false) at sql_ta ble.cc:1662 #16 0x001c36d9 in mysql_rm_table (thd=0x781c400, tables=0x78710a0, if_exists=0 ' \0', drop_temporary=0 '\0') at sql_table.cc:1457 #17 0x000c593f in mysql_execute_command (thd=0x781c400) at sql_parse.cc:2876 #18 0x000cc49a in mysql_parse (thd=0x781c400, inBuf=0x7871010 "DROP TABLE t1", l ength=13, found_semicolon=0xb0061e9c) at sql_parse.cc:5410 #19 0x000ccb20 in dispatch_command (command=COM_QUERY, thd=0x781c400, packet=0x6 1f7001 "DROP TABLE t1", packet_length=13) at sql_parse.cc:921 #20 0x000cde6e in do_command (thd=0x781c400) at sql_parse.cc:697 #21 0x000bc28b in handle_one_connection (arg=0x781c400) at sql_connect.cc:1146 #22 0x90024227 in _pthread_body ()
[20 Feb 2008 18:37]
Kevin Lewis
Bug#33184 occurs regularly on the MAC OS with non-debug code (with the current build). So it is probably timing dependent. During a DropTable, Table::~Table is called with a shared lock still active on Table::syncUpdate. This is the Sync that is sent into Table::checkUniqueIndexes() with the expectation that it would be unlocked before returning. I think this is a risky approach because the syncObject is not released in the same routine that it was locked. I was looking for a pathway that could lead to one of these sync objects being left unlocked (unsuccessfully) when I realized that none of them are ever locked exclusively. What good is a syncObject that is only locked shared? I think we should just delete it. This SyncObject does not do anything now, so if it is deleted, this bug should go away. But it was originally added to ensure that the search for a duplicate value is done atomically. This was needed to avoid the theoretical problem where two transactions insert the same unique index value at the same time. Currently, updates and inserts operate in this order; 1)check for duplicates. 2)insert indexes 3)insert record If two concurrent transactions with the same key value do step 1 before each has reached step 2, then they will not see each other's duplicate key value. The result would be duplicate values even in a unique index. The syncUpdate lock was originally designed to prevent this, by holding the lock until steps 2 and 3 were done. But when Jim implemented it, the exclusive lock caused a major performance degradation. So it was backed up to a shared lock. This may be the source of the duplicate primary keys we have seen periodically. (see Bug#33498). Or there may be some other synchronization mechanism preventing it under most circumstances. Here is a possible solution suggested by Jim from an email on Tue 10/2/2007; >I don't think it is necessary to make record insertion and index update >atomic. I'm also quite sure that making record insertion, index update, >and unique checking atomic is difficult and expensive because any lock >must be released before waiting for a pending transaction, necessitating >a restart of the full check operation. Note that this, in fact, makes >the insert and check operations non-atomic. > >But think about it a little more. If we do the check after the >insertion and index update (either order), it works and it works without >locks. The analysis is simple: Consider two parallel transactions >trying to store duplicate values. Transaction A does its insert and >index update then checks for duplicates. If transaction B hasn't >finished either his insert or index update, transaction A won't see the >update. But for A not to see B, A must have completed its insert and >index update, so A will always be visible to B when B executes it unique >checks. QED. No locks, no atomicity, and it works. I suggest that this bug be fixed by deleting syncUpdate and the possible duplicate unique key issue be handled by another bug. Possibly 33498, even though that one seems hard to reproduce.
[20 Feb 2008 19:03]
Hakan Küçükyılmaz
After removing the SyncObject in Table.cpp around line 3042 I don't see any regression anymore. //Sync sync(&syncUpdate, "Table::update(4)"); if (indexes) { //do // sync.lock(ATOMIC_UPDATE); //while (!checkUniqueIndexes(transaction, record, &sync)); while (!checkUniqueIndexes(transaction, record, NULL)); TEST RESULT TIME (ms) ------------------------------------------------------- falcon.falcon_bug_33184 [ pass ] 84 falcon.falcon_bugs [ pass ] 17448 falcon.falcon_bug_28165 [ pass ] 88 falcon.falcon_bug_28158 [ pass ] 81 falcon.falcon_bug_28049 [ pass ] 512 falcon.falcon_bug_28046 [ pass ] 45 falcon.falcon_bug_26802 [ pass ] 40 falcon.falcon_bug_22167 [ pass ] 40 ------------------------------------------------------- Stopping All Servers All 8 tests were successful. The servers were restarted 1 times Spent 18.338 of 21 seconds executing testcases The whole falcon test suite passes now on Mac OS X/Intel
[22 Feb 2008 20:53]
Hakan Küçükyılmaz
This regression turned out to be compiler bug in the exception handling of Apple's gcc. hakan@au0012:~$ gcc --version i686-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5367) Copyright (C) 2005 Free Software Foundation, Inc. After installing gcc 4.2.2 from fink, the Falcon test suite runs fine. falcon.falcon_ts [ pass ] 152 falcon.index_merge_falcon [ disabled ] Bug#32696 index_merge_falcon.test fails falcon.ps_8falcon [ disabled ] Bug#32683 Test ps_8falcon fails in 6.0 falcon.read_many_rows_falcon [ disabled ] Bug#23783 2006-10-30 ML Get pushbuild green falcon.rowid_order_falcon [ pass ] 227 falcon.rpl_bug_33688 [ pass ] 218 ------------------------------------------------------- Stopping All Servers All 165 tests were successful. The servers were restarted 10 times Spent 359.264 of 411 seconds executing testcases