Bug #54286 | Server crash at lock timeout with MRR | ||
---|---|---|---|
Submitted: | 7 Jun 2010 11:32 | Modified: | 23 Nov 2010 3:30 |
Reporter: | Øystein Grøvlen | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Optimizer | Severity: | S3 (Non-critical) |
Version: | mysql-next-mr-opt-backporting | OS: | Any |
Assigned to: | Olav Sandstå | CPU Architecture: | Any |
Tags: | mrr, optimizer_switch |
[7 Jun 2010 11:32]
Øystein Grøvlen
[7 Jun 2010 14:45]
MySQL Verification Team
Thank you for the bug report. Verified as described: InnoDB: Creating foreign key constraint system tables InnoDB: Foreign key constraint system tables created 100607 17:41:47 InnoDB Plugin 1.0.6 started; log sequence number 0 100607 17:41:47 [Note] Event Scheduler: Loaded 0 events 100607 17:41:47 [Note] /home/miguel/bzr/5.5opt/sql/mysqld: ready for connections. Version: '5.6.99-m4-debug-log' socket: '/home/miguel/bzr/5.5opt/mysql-test/var/tmp/mysqld.1.sock' port: 13000 Source distribution mysqld: protocol.cc:540: void Protocol::end_statement(): Assertion `0' failed. 100607 17:42:38 - mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=1048576 read_buffer_size=131072 max_used_connections=3 max_threads=151 thread_count=3 connection_count=3 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 60702 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0x2e64cf8 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7ffc46b13e08 thread_stack 0x40000 /home/miguel/bzr/5.5opt/sql/mysqld(my_print_stacktrace+0x35)[0xbab68f] /home/miguel/bzr/5.5opt/sql/mysqld(handle_segfault+0x2a7)[0x6c07b2] /lib64/libpthread.so.0[0x35a240f440] /lib64/libc.so.6(gsignal+0x35)[0x35a20329c5] /lib64/libc.so.6(abort+0x175)[0x35a20341a5] /lib64/libc.so.6(__assert_fail+0xf5)[0x35a202b945] /home/miguel/bzr/5.5opt/sql/mysqld(_ZN8Protocol13end_statementEv+0x265)[0x6b27b5] /home/miguel/bzr/5.5opt/sql/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1cb9)[0x6d1430] /home/miguel/bzr/5.5opt/sql/mysqld(_Z10do_commandP3THD+0x27d)[0x6cf571] /home/miguel/bzr/5.5opt/sql/mysqld(_Z24do_handle_one_connectionP3THD+0x14f)[0x6cd69e] /home/miguel/bzr/5.5opt/sql/mysqld(handle_one_connection+0x33)[0x6cd548] /home/miguel/bzr/5.5opt/sql/mysqld(_Z16pfs_spawn_threadPv+0xce)[0xb7e0d9] /lib64/libpthread.so.0[0x35a2407761] /lib64/libc.so.6(clone+0x6d)[0x35a20e150d] Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x7ffc34004b98 is an invalid pointer thd->thread_id=3 thd->killed=NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. Writing a core file - saving '/home/miguel/bzr/5.5opt/mysql-test/var/log/main.bug54286/' to '/home/miguel/bzr/5.5opt/mysql-test/var/log/main.bug54286/' - found 'core.10700' (0/5) Trying 'dbx' to get a backtrace Trying 'gdb' to get a backtrace Core generated by '/home/miguel/bzr/5.5opt/sql/mysqld' Output from gdb follows. The first stack trace is from the failing thread. The following stack traces are from all threads (so the failing one is duplicated). -------------------------- Missing separate debuginfo for Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/9e/ef3c8dabad1ba6746e5fef74878ba17e416e38 [New Thread 10756] [New Thread 10703] [New Thread 10704] [New Thread 10706] [New Thread 10712] [New Thread 10707] [New Thread 10700] [New Thread 10755] [New Thread 10757] [New Thread 10711] [New Thread 10710] [New Thread 10708] [New Thread 10709] [New Thread 10705] Core was generated by `/home/miguel/bzr/5.5opt/sql/mysqld --defaults-group-suffix=.1 --defaults-file=/'. Program terminated with signal 6, Aborted. #0 0x00000035a240c61c in pthread_kill () from /lib64/libpthread.so.0 #0 0x00000035a240c61c in pthread_kill () from /lib64/libpthread.so.0 #1 0x0000000000bab713 in my_write_core (sig=6) at stacktrace.c:326 #2 0x00000000006c09b2 in handle_segfault (sig=6) at mysqld.cc:2786 #3 <signal handler called> #4 0x00000035a20329c5 in raise () from /lib64/libc.so.6 #5 0x00000035a20341a5 in abort () from /lib64/libc.so.6 #6 0x00000035a202b945 in __assert_fail () from /lib64/libc.so.6 #7 0x00000000006b27b5 in Protocol::end_statement (this=0x2e651a8) at protocol.cc:540 #8 0x00000000006d1430 in dispatch_command (command=COM_QUERY, thd=0x2e64cf8, packet=0x2e67a19 "", packet_length=39) at sql_parse.cc:1446 #9 0x00000000006cf571 in do_command (thd=0x2e64cf8) at sql_parse.cc:771 #10 0x00000000006cd69e in do_handle_one_connection (thd_arg=0x2e64cf8) at sql_connect.cc:1188 #11 0x00000000006cd548 in handle_one_connection (arg=0x2e64cf8) at sql_connect.cc:1127 #12 0x0000000000b7e0d9 in pfs_spawn_thread (arg=0x2e42f08) at pfs.cc:1011 #13 0x00000035a2407761 in start_thread () from /lib64/libpthread.so.0 #14 0x00000035a20e150d in clone () from /lib64/libc.so.6 Thread 14 (Thread 10705): #0 0x00000035a240b3bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x0000000000bb8ea7 in safe_cond_wait (cond=0x2d46b88, mp=0x2d46b10, file=0xd8d78d "os/os0sync.c", line=399) at thr_mutex.c:240 #2 0x00000000009ad23b in os_event_wait_low (event=0x2d46b10, reset_sig_count=0) at os/os0sync.c:399 #3 0x00000000009abf14 in os_aio_simulated_handle (global_segment=2, message1=0x7ffc4b361e60, message2=0x7ffc4b361e58, type=0x7ffc4b361e50) at os/os0file.c:4197 <cut>
[16 Jun 2010 14:08]
Olav Sandstå
The crash occurs in the assert in the following code in Protocol::end_statement() (in protocol.cc): switch (thd->stmt_da->status()) { case Diagnostics_area::DA_ERROR: ....... case Diagnostics_area::DA_EMPTY: default: DBUG_ASSERT(0); error= send_ok(thd->server_status, 0, 0, 0, NULL); break; When this happens the content of the thd->stmt_da looks like: (dbx) print *thd->stmt_da *thd->stmt_da = { is_sent = false can_overwrite_status = false m_message = "" m_sql_errno = 0 m_sqlstate = "" m_server_status = 0 m_affected_rows = 0 m_last_insert_id = 0 m_statement_warn_count = 0 m_status = DA_EMPTY }
[21 Jun 2010 14:43]
Olav Sandstå
This error happens when we try to read the first record from the database. The following code in sql_select.cc does this: int join_init_read_record(JOIN_TAB *tab) { if (tab->select && tab->select->quick && tab->select->quick->reset()) return 1; init_read_record(&tab->read_record, tab->join->thd, tab->table, tab->select,1,1, FALSE); return (*tab->read_record.read_record)(&tab->read_record); } The first line of the above code will both with and without MRR enabled initialize the reading of data and result in a call to: 1. QUICK_RANGE_SELECT::reset() ha_innobase::multi_range_read_init() DsMrr_impl::dsmrr_init() while the last code line will read the first record and call the following: 2. rr_quick() QUICK_RANGE_SELECT::get_next() ha_innobase::multi_range_read_next() DsMrr_impl::dsmrr__next() handler::multi_range_read_next() The difference between the two scenarios with MRR and without MRR is as follows: Without MRR: ============ 1. Initialization: DsMrr_impl::dsmrr_init() will do a minimal initialization without reading anything from InnoDB. 2. The actual record will be read when executing the last line of join_init_read_record. handler::multi_range_read_next() will return error code 146 (due to the lock time out). This error code is correctly handled in rr_quick() and get propagated back to the client. With MRR: ========= 1. Initialization: DsMrr_imp::dsmrr_init() is doing a lot more during initialization. One of the main steps is to try to fill the row id buffer. This will include accessing the database by calling handler::multi_range_read_next(). Also in this case handler::multi_range_read_next() will return 146 (lock time out) error code. Thus initialization/setup fails. The problem is that this failure is not propagated correctly back to the client and results in the assert. The main difference between these two scenarios is that the lock timeout occurs during initialization/setup when MRR is used and that the lack of correct handling of error codes from the database. At least the following are issues with the existing code: 1. DsMrr_impl::dsmrr_init() does not return the correct error code when the timeout occurs. Instead it just returns 1 to indicate failure/error. 2. In the code path from join_init_read_record() there seems to be no code for handling such errors (opposed to the code handling done in rr_quick() which calls rr_handle_error() to update diagnostic area).
[22 Jun 2010 14:18]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/111784 3209 Olav Sandstaa 2010-06-22 Fix for Bug#54286 Server crash at lock timeout with MRR This error happens when we try to read the first record from the database and a lock timeout occurs. This lock timeout error message does not get correctly propagated. The assert happens when the reply is going to be sent to the client and there is no error code set in the diagnostic area. The reason this happens when MRR is used is that when using MRR more work is done during initialization of MRR. One of the things the MRR code does is to try to fill the row id buffer by reading from the database index. This result in a call to ha_innobase::multi_range_read() which returns the lock timeout error code (error 146). This error code is not propagated up to the server code. When MRR is not in use there will be no calls to the database during initialization. The lock timeout error will only occur when the first call to read a record is done. With this code path the error code is correctly propagated to the server code and written to the diagnostic area before the reply message get sent to the client. This patch fixes this problem by the following two changes: 1. Fix DsMrr_impl::dsmrr_init() so that it returns the same error code as reported by the storage engine instead of just returning one when an error occurs. 2. Extend join_init_read_record() (in sql_select.cc) so that it handles errors that occurs during initialization (in this case from QUICK_RANGE_SELECT::reset()) and ensures the error code gets written to the diagnostic area. @ mysql-test/include/mrr_innodb_tests.inc Added test case for Bug#54286 "Server crash at lock timeout with MRR". @ mysql-test/r/innodb_mrr.result Test case for Bug#54286 "Server crash at lock timeout with MRR". @ mysql-test/r/innodb_mrr_all.result Test case for Bug#54286 "Server crash at lock timeout with MRR". @ mysql-test/r/innodb_mrr_icp.result Test case for Bug#54286 "Server crash at lock timeout with MRR". @ mysql-test/r/innodb_mrr_none.result Test case for Bug#54286 "Server crash at lock timeout with MRR". @ sql/handler.cc Extend DsMrr_impl::dsmrr_init() so that in the case where the storage engine returns an error code this error code is returned to the caller instead of just returning 1. @ sql/sql_select.cc Extend join_init_read_record() to handle the error code that is returned from tab->select->quick->reset() by ensuring this get written to the diagnostic area by calling handler::print_error(). This fix will ensure that the error code get correctly returned to the client.
[24 Jun 2010 9:23]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/112038 3198 Olav Sandstaa 2010-06-24 Fix for Bug#54286 Server crash at lock timeout with MRR This error happens when we try to read the first record from the database and a lock timeout occurs. This lock timeout error message does not get correctly propagated. The assert happens when the reply is going to be sent to the client and there is no error code set in the diagnostic area. The reason this happens when MRR is used is that when using MRR more work is done during initialization of MRR. One of the things the MRR code does is to try to fill the row id buffer by reading from the database index. This result in a call to ha_innobase::multi_range_read() which returns the lock timeout error code (error 146). This error code is not propagated up to the server code. When MRR is not in use there will be no calls to the database during initialization. The lock timeout error will only occur when the first call to read a record is done. With this code path the error code is correctly propagated to the server code and written to the diagnostic area before the reply message get sent to the client. This patch fixes this problem by the following two changes: 1. Fix DsMrr_impl::dsmrr_init() so that it returns the same error code as reported by the storage engine instead of just returning one when an error occurs. 2. Extend join_init_read_record() (in sql_select.cc) so that it handles errors that occurs during initialization (in this case from QUICK_RANGE_SELECT::reset()) and ensures the error code gets written to the diagnostic area. @ mysql-test/include/mrr_innodb_tests.inc Added test case for Bug#54286 "Server crash at lock timeout with MRR". @ mysql-test/r/innodb_mrr.result Test case for Bug#54286 "Server crash at lock timeout with MRR". @ mysql-test/r/innodb_mrr_all.result Test case for Bug#54286 "Server crash at lock timeout with MRR". @ mysql-test/r/innodb_mrr_icp.result Test case for Bug#54286 "Server crash at lock timeout with MRR". @ mysql-test/r/innodb_mrr_none.result Test case for Bug#54286 "Server crash at lock timeout with MRR". @ sql/handler.cc Extend DsMrr_impl::dsmrr_init() so that in the case where the storage engine returns an error code this error code is returned to the caller instead of just returning 1. @ sql/sql_select.cc Extend join_init_read_record() to handle the error code that is returned from tab->select->quick->reset() by ensuring this get written to the diagnostic area by calling report_error(). This fix will ensure that the error code get correctly returned to the client.
[24 Jun 2010 11:39]
Olav Sandstå
Patch pushed to mysql-next-mr-opt-backporting with revision id: olav@sun.com-20100624092332-amenrmodq0ypprr6 .
[16 Aug 2010 6:32]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100816062819-bluwgdq8q4xysmlg) (version source revid:alik@sun.com-20100816062612-enatdwnv809iw3s9) (pib:20)
[13 Nov 2010 16:04]
Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (merge vers: 5.6.99-m4) (pib:21)
[23 Nov 2010 3:30]
Paul DuBois
Bug does not appear in any released 5.6.x version. No 5.6.1 changelog entry needed.