Bug #62037 | assert fails in row_search_for_mysql on TRX_ISO_READ_UNCOMMITTED | ||
---|---|---|---|
Submitted: | 30 Jul 2011 14:14 | Modified: | 13 Jul 2012 18:02 |
Reporter: | Mark Callaghan | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | MySQL Server: InnoDB Plugin storage engine | Severity: | S1 (Critical) |
Version: | 5.1.52, 5.1.58, 5.5.14 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | innodb, regression |
[30 Jul 2011 14:14]
Mark Callaghan
[30 Jul 2011 14:58]
Mark Callaghan
There are many asserts for TRX_ISO_READ_UNCOMMITTED that are fairly new including this assert 3351.14.276 marko.makela@oracle.com 20101019 | /* Returning a row to MySQL */ | | if (!row_sel_store_mysql_rec(buf, prebuilt, result_rec, | result_rec != rec, | offsets)) { 3351.14.170 marko.makela@oracle.com 20100630 | /* Only fresh inserts may contain | incomplete externally stored | columns. Pretend that such records do | not exist. Such records may only be | accessed at the READ UNCOMMITTED 3351.14.166 marko.makela@oracle.com 20100629 | isolation level or when rolling back a 3351.14.170 marko.makela@oracle.com 20100630 | recovered transaction. Rollback | happens at a lower level, not here. */ 3351.14.166 marko.makela@oracle.com 20100629 | ut_a(trx->isolation_level | == TRX_ISO_READ_UNCOMMITTED); | goto next_rec; revno: 3351.14.166 committer: Marko Mäkelä <marko.makela@oracle.com> branch nick: 5.1-innodb timestamp: Tue 2010-06-29 15:55:18 +0300 message: Bug#54358: READ UNCOMMITTED access failure of off-page DYNAMIC or COMPRESSED columns When the server crashes after a record stub has been inserted and before all its off-page columns have been written, the record will contain incomplete off-page columns after crash recovery. Such records may only be accessed at the READ UNCOMMITTED isolation level or when rolling back a recovered transaction in recv_recovery_rollback_active(). Skip these records at the READ UNCOMMITTED isolation level. TODO: Add assertions for checking the above assumptions hold when an incomplete BLOB is encountered. btr_rec_copy_externally_stored_field(): Return NULL if the field is incomplete. row_prebuilt_t::templ_contains_blob: Clarify what "BLOB" means in this context. Hint: MySQL BLOBs are not the same as InnoDB BLOBs. row_sel_store_mysql_rec(): Return FALSE if not all columns could be retrieved. Previously this function always returned TRUE. Assert that the record is not delete-marked. row_sel_push_cache_row_for_mysql(): Return FALSE if not all columns could be retrieved. row_search_for_mysql(): Skip records containing incomplete off-page columns. Assert that the transaction isolation level is READ UNCOMMITTED. rb://380 approved by Jimmy Yang
[30 Jul 2011 15:19]
Mark Callaghan
And the root cause for the assert is that this function returns NULL /*******************************************************************//** Copies an externally stored field of a record to mem heap. @return the field copied to heap, or NULL if the field is incomplete */ UNIV_INTERN byte* btr_rec_copy_externally_stored_field( /*=================================*/ const rec_t* rec, /*!< in: record in a clustered index; must be protected by a lock or a page latch */ const ulint* offsets,/*!< in: array returned by rec_get_offsets() */ ulint zip_size,/*!< in: nonzero=compressed BLOB page size, zero for uncompressed BLOBs */ ulint no, /*!< in: field number */ ulint* len, /*!< out: length of the field */ mem_heap_t* heap) /*!< in: mem heap */ { ulint local_len; const byte* data; ut_a(rec_offs_nth_extern(offsets, no)); /* An externally stored field can contain some initial data from the field, and in the last 20 bytes it has the space id, page number, and offset where the rest of the field data is stored, and the data length in addition to the data stored locally. We may need to store some data locally to get the local record length above the 128 byte limit so that field offsets are stored in two bytes, and the extern bit is available in those two bytes. */ data = rec_get_nth_field(rec, offsets, no, &local_len); ut_a(local_len >= BTR_EXTERN_FIELD_REF_SIZE); if (UNIV_UNLIKELY (!memcmp(data + local_len - BTR_EXTERN_FIELD_REF_SIZE, field_ref_zero, BTR_EXTERN_FIELD_REF_SIZE))) { /* The externally stored field was not written yet. This record should only be seen by recv_recovery_rollback_active() or any TRX_ISO_READ_UNCOMMITTED transactions. */ return(NULL); }
[30 Jul 2011 16:19]
Valeriy Kravchuk
It would be still nice to find out exact SQL failing or, at least, information about the table accessed while this assertion happens. I had tried to run concurrent INSERTs, ROLLBACK WORK and SELECT (both all rows and by index on c2) for the table like this: mysql> show create table tinno\G *************************** 1. row *************************** Table: tinno Create Table: CREATE TABLE `tinno` ( `id` int(11) NOT NULL AUTO_INCREMENT, `c2` int(11) DEFAULT NULL, `c3` blob, PRIMARY KEY (`id`), KEY `c2` (`c2`) ) ENGINE=InnoDB AUTO_INCREMENT=145390 DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC 1 row in set (0.06 sec) mysql> show table status like 'tinno'\G *************************** 1. row *************************** Name: tinno Engine: InnoDB Version: 10 Row_format: Dynamic Rows: 4098 Avg_row_length: 32883 Data_length: 134758400 Max_data_length: 0 Index_length: 98304 Data_free: 131072000 Auto_increment: 145390 Create_time: 2011-07-30 18:51:24 Update_time: NULL Check_time: NULL Collation: latin1_swedish_ci Checksum: NULL Create_options: row_format=DYNAMIC Comment: 1 row in set (0.07 sec) with innodb_file_per_table and Barracuda format set, but no assertions happened yet.
[30 Jul 2011 16:20]
Mark Callaghan
I think mysqldump SELECT statements were repeatedly crashing a server and am trying to reproduce that now
[30 Jul 2011 17:07]
Mark Callaghan
While row_search_for_mysql has good error messages for other errors -- messages that include the table_name, the error message in this case is extremely limited. So this is also a feature request to improve those error messges by including the table/database name or space ID. Otherwise, we must guess at the reproduction case. And please, any time new code adds an assertion to InnoDB, please think about what will be in the mysqld error log when that assertion fails. In many cases the log is insufficient unless more detail is included prior to the assert failure. Assume that the stack dump done by mysqld on asserts and segfaults doesn't always dump the SQL statement.
[31 Jul 2011 9:41]
MySQL Verification Team
On a clean datadir using 5.1.58 + plugin + compact tables + innodb file per table, I got this with 3 concurrent threads doing blob stuff: Version: '5.1.58-community' socket: '' port: 3306 MySQL Community Server (GPL) 110731 11:36:49 InnoDB: Assertion failure in thread 2036 in file .\row\row0sel.c line 4473 InnoDB: Failing assertion: trx->isolation_level == TRX_ISO_READ_UNCOMMITTED InnoDB: We intentionally generate a memory trap. ha_innodb_plugin.dll!row_search_for_mysql()[row0sel.c:4473] ha_innodb_plugin.dll!ha_innodb::index_read()[ha_innodb.cc:5479] ha_innodb_plugin.dll!ha_innodb::index_first()[ha_innodb.cc:5783] mysqld.exe!handler::read_range_first()[handler.cc:4261] mysqld.exe!handler::read_multi_range_first()[handler.cc:4139] mysqld.exe!QUICK_RANGE_SELECT::get_next()[opt_range.cc:8594] mysqld.exe!rr_quick()[records.cc:331] mysqld.exe!join_init_read_record()[sql_select.cc:12189] mysqld.exe!sub_select()[sql_select.cc:11459] mysqld.exe!do_select()[sql_select.cc:11217] mysqld.exe!JOIN::exec()[sql_select.cc:2343] mysqld.exe!mysql_select()[sql_select.cc:2555] mysqld.exe!handle_select()[sql_select.cc:269] mysqld.exe!execute_sqlcom_select()[sql_parse.cc:5171] mysqld.exe!mysql_execute_command()[sql_parse.cc:2307] mysqld.exe!mysql_parse()[sql_parse.cc:6092] mysqld.exe!dispatch_command()[sql_parse.cc:1277] mysqld.exe!do_command()[sql_parse.cc:893] mysqld.exe!handle_one_connection()[sql_connect.cc:1427] mysqld.exe!pthread_start()[my_winthread.c:85] mysqld.exe!_callthreadstart()[thread.c:293] Query (02517400): =select * from `blobtest_1` where `id`<=4495 limit 1 Repeating the query and dumping tables doesn't crash.
[31 Jul 2011 11:49]
MySQL Verification Team
I got this repeatable in seconds by running mysqld like this: ./bin/mysqld_safe --no-defaults --skip-gr --skip-na --ignore-builtin-innodb --plugin-load=ha_innodb_plugin.so --innodb-flush-log-at-trx-commit=0 --core-file then running two copies of the attached blob tester php app concurrently.
[31 Jul 2011 12:46]
Valeriy Kravchuk
Same test case crashed 64-bit release binaries of 5.5.14 on Mac OS X: 110731 15:41:46 [Note] /usr/local/mysql-5.5.14-osx10.5-x86_64/bin/mysqld: ready for connections. Version: '5.5.14' socket: '/tmp/mysql.sock' port: 3306 MySQL Community Server (GPL) 110731 15:42:41 InnoDB: Assertion failure in thread 4492226560 in file row0sel.c line 4471 InnoDB: Failing assertion: trx->isolation_level == TRX_ISO_READ_UNCOMMITTED InnoDB: We intentionally generate a memory trap.
[31 Jul 2011 14:28]
MySQL Verification Team
From my testcase results: 5.1.47: no assertion 5.1.48: no assertion 5.1.49: no assertion 5.1.50: hits assertion 5.1.58: hits assertion 5.5.14: hits assertion 5.5.15: no assertion. not sure why. On a side note, the SQL query was not printed in old versions due to bug #51817 .
[1 Aug 2011 5:53]
MySQL Verification Team
Could this be a duplicate of the following ? BUG 12612184 - RACE CONDITION AFTER BTR_CUR_PESSIMISTIC_UPDATE() http://lists.mysql.com/commits/138935 http://lists.mysql.com/commits/139001
[1 Aug 2011 6:40]
Marko Mäkelä
This looks like a duplicate of Oracle Bug#12612184, which I worked on in June. Since the beginning of InnoDB, there has been a race condition in updates (quoting what I posted in the bug in June): ----start quote---- When a record is updated so that its size grows, non-updated columns can be selected for external (off-page) storage. The bug is that the initially inserted updated record contains an all-zero BLOB pointer to the field that was not updated. Only after the BLOB is allocated, the pointer will be valid. This is the window of opportunity (quoting from row_upd_clust_rec() in MySQL 3.23.53): err = btr_cur_pessimistic_update(BTR_NO_LOCKING_FLAG, btr_cur, &big_rec, node->update, node->cmpl_info, thr, mtr); mtr_commit(mtr); if (err == DB_SUCCESS && big_rec) { mtr_start(mtr); ut_a(btr_pcur_restore_position(BTR_MODIFY_TREE, pcur, mtr)); Between the release of the page latch in mtr_commit(mtr) and the re-latching of the page in btr_pcur_restore_position(), other threads can see the invalid BLOB pointer. I checked that the invalid BLOB pointer would be zeroed in dtuple_convert_big_rec() even in 3.23.53; old versions of InnoDB had the bad habit of not initializing all fields. The problem is amplified by the ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPRESSED that were introduced in innodb_file_format=barracuda in InnoDB Plugin, but the bug does exist in all InnoDB versions. ----end quote---- Apparently, the 5.1.58 release does not contain the fix, even though the files in the source archive are dated 2011-07-01. I pushed the fix to the mysql-5.1 tree on 2011-06-16. There is a known bug in my fix: If the server crashes between btr_cur_mtr_commit_and_start() and the final mtr_commit(mtr) after all BLOBs have been written, at crash recovery any information about the non-updated BLOBs (which were moved to off-page storage due to the record size changing in the update) will be permanently lost. I outlined a fix for that in Bug#12612184 on 2011-06-28, and I will get back to it after I have completed more urgent tasks that have been piling up while I was on vacation.
[1 Aug 2011 7:23]
Marko Mäkelä
The recovery bug is being tracked as Oracle Bug #12704861 CORRUPTION AFTER A CRASH DURING BLOB UPDATE.
[1 Aug 2011 15:34]
Mark Callaghan
12704861 describes a race condition between concurrent INSERT (or UPDATE) and SELECT. I can reproduce this with one SELECT statement. Is my table now corrupt with a zero-length blob reference?
[2 Aug 2011 7:02]
Marko Mäkelä
Yes Mark, your table could be corrupted so that a BLOB pointer is full of zero bytes (sort of a null pointer). This could be a transient or a permanent condition. The condition should be transient when another transaction is trying to access an off-page column just after an UPDATE transaction has done the first mtr_commit() and is about to x-latch the clustered index page so that it can write the off-page columns. If you get a crash and crash recovery at an unfortunate moment, you will have permanent corruption. I am not sure if there is an easy way to detect the corruption. The following is from memory and may be wrong. "CHECK TABLE t" would only count the records, not attempt to dereference all off-page columns. The statements "CHECKSUM TABLE t" or "SELECT * FROM t" would read all columns. Alas, the InnoDB BLOB code (or code for retrieving off-page columns a.k.a. externally stored fields) as it was originally written by Heikki Tuuri is lacking some error checks or assertions. It could silently ignore null BLOB pointers, pretending that the column contained just what it was able to retrieve. The BLOB storage should always be an all-or-nothing deal. Either the clustered index leaf page record points to the full BLOB suffix, or it is a null pointer. If you see off-page columns that are 0 or 768 bytes long, alarm bells should ring. Before InnoDB Plugin, InnoDB would store 768 bytes locally in the clustered index page. In the Barracuda format (ROW_FORMAT=DYNAMIC or ROW_FORMAT=COMPRESSED), the clustered index record does not store any prefix.
[2 Aug 2011 15:01]
Mark Callaghan
Can Shane confirm this doesn't reproduce with 5.1.59 or 5.1 trunk?
[2 Aug 2011 16:50]
Valeriy Kravchuk
I will run on QuadXeon under Linux tomorrow, but with recent pull from mysql-5.1 tree in bzr I am not able to get this assertion on Mac after running for more than 1 hour: macbook-pro:5.1 openxs$ tail -f var/macbook-pro.err InnoDB: Setting log file ./ib_logfile1 size to 5 MB InnoDB: Database physically writes the file full: wait... InnoDB: Doublewrite buffer not found: creating new InnoDB: Doublewrite buffer created InnoDB: Creating foreign key constraint system tables InnoDB: Foreign key constraint system tables created 110802 18:41:46 InnoDB Plugin 1.0.17 started; log sequence number 0 110802 18:41:46 [Note] Event Scheduler: Loaded 0 events 110802 18:41:46 [Note] /Users/openxs/dbs/5.1/libexec/mysqld: ready for connections. Version: '5.1.59' socket: '/tmp/mysql.sock' port: 3306 Source distribution ^C macbook-pro:5.1 openxs$ date вт 2 сер 2011 19:49:33 EEST This is the same Mac where I reproduced assertions on both 5.5.14 and 5.1.58 (official 64-bit binaries).
[3 May 2012 6:58]
zhai weixiang
this assertion was reproduced on Percona Server 5.5.18, and ROW_FORMAT=COMPACT. i don't know if this bug affects all types of ROW_FORMAT. below is part of error log 120503 9:20:30 InnoDB: Assertion failure in thread 140210779649792 in file row0sel.c line 4612 InnoDB: Failing assertion: trx->isolation_level == TRX_ISO_READ_UNCOMMITTED InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to http://bugs.mysql.com. InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html InnoDB: about forcing recovery. 120503 9:20:30 - mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=104857600 read_buffer_size=1048576 max_used_connections=48 max_threads=2100 thread_count=47 connection_count=47 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2488475 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x3f64eec0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7f855db6ee78 thread_stack 0x40000 /u01/mysql/bin/mysqld(my_print_stacktrace+0x39)[0x7b1b69] /u01/mysql/bin/mysqld(handle_segfault+0x43c)[0x4fa39c] /lib64/libpthread.so.0[0x36d0a0f520] /lib64/libc.so.6(gsignal+0x35)[0x36d0632a45] /lib64/libc.so.6(abort+0x175)[0x36d0634225] /u01/mysql/bin/mysqld[0x8479c1] /u01/mysql/bin/mysqld[0x81f4fe] /u01/mysql/bin/mysqld[0x5a27d5] /u01/mysql/bin/mysqld(_Z10sub_selectP4JOINP13st_join_tableb+0x79)[0x59a689] /u01/mysql/bin/mysqld[0x59f56d] /u01/mysql/bin/mysqld(_ZN4JOIN4execEv+0xc5e)[0x5b566e] /u01/mysql/bin/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x119) [0x5b1249] /u01/mysql/bin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x1c5)[0x5b6e55] /u01/mysql/bin/mysqld[0x574028] /u01/mysql/bin/mysqld(_Z21mysql_execute_commandP3THD+0x1102)[0x577f32] /u01/mysql/bin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x3a4)[0x57b674] /u01/mysql/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1448)[0x57cac8] /u01/mysql/bin/mysqld(_Z24do_handle_one_connectionP3THD+0x10d)[0x61a70d] /u01/mysql/bin/mysqld(handle_one_connection+0x4c)[0x61a85c] /lib64/libpthread.so.0[0x36d0a077e1] /lib64/libc.so.6(clone+0x6d)[0x36d06e68ed] Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (0x7f8448004bb0): SELECT * FROM `delayed_jobs` ORDER BY `id` LIMIT 0, 1000 Connection ID (thread ID): 28339 Status: NOT_KILLED
[13 Jul 2012 18:02]
Mark Callaghan
Can this get a status update?
[13 Jul 2012 18:26]
Valeriy Kravchuk
This bug was declared a duplicate of internal Bug #13721257, that is fixed in 5.5.22+ (see http://dev.mysql.com/doc/refman/5.5/en/news-5-5-22.html) and 5.1.63+ (Plugin only).
[26 Apr 2013 5:59]
liu hickey
/home/xiyu.lh/work/5.5 $ bzr log | grep 13721257 No 13721257 info in 5.5 launchpad ?
[9 Aug 2013 10:14]
zhai weixiang
it's not easy to find the related patch, because there isn't any comment in the bzr log.... May be this one ? http://bazaar.launchpad.net/~mysql/mysql-server/5.5/revision/3723 http://bazaar.launchpad.net/~mysql/mysql-server/5.5/revision/3730 I don't know if it's safe to patch the diff directly to MySQL5.5.18(we are using now..). and it's impossible for us to upgrade to a new version such as MySQL5.5.22(that means long time tests...)