Bug #24884 | mysqld got signal 11 on a slave host | ||
---|---|---|---|
Submitted: | 7 Dec 2006 13:24 | Modified: | 17 Nov 2008 21:46 |
Reporter: | Cyril SCETBON | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S1 (Critical) |
Version: | 4.1.18 | OS: | Linux (Debian GNU/Linux 3.1) |
Assigned to: | Heikki Tuuri | CPU Architecture: | Any |
[7 Dec 2006 13:24]
Cyril SCETBON
[7 Dec 2006 13:24]
Cyril SCETBON
The error file
Attachment: magic-sql2.err.log (application/octet-stream, text), 41.78 KiB.
[7 Dec 2006 13:25]
Cyril SCETBON
1st stack
Attachment: mysql_06_12_06_12_15.trace (application/octet-stream, text), 1.31 KiB.
[7 Dec 2006 13:25]
Cyril SCETBON
2nd stack
Attachment: mysql_07_12_06_05_40.trace (application/octet-stream, text), 1.36 KiB.
[7 Dec 2006 13:26]
Cyril SCETBON
3rd stack
Attachment: mysql_07_12_06_05_45.trace (application/octet-stream, text), 1.36 KiB.
[7 Dec 2006 13:26]
Cyril SCETBON
4th stack
Attachment: mysql_07_12_06_05_50.trace (application/octet-stream, text), 1.36 KiB.
[7 Dec 2006 13:27]
Valeriy Kravchuk
Thank you for a problem report. Please, try to repeat with a newr version, 4.1.22, on botyh master and slaves. In case of similar problem, please, send the appropriate part of error log and your my.cnf file content from failing slave.
[7 Dec 2006 13:28]
Cyril SCETBON
All the stacks are the same : 0x81214d8 handle_segfault + 356 0xb7f38420 _end + -1348008912 0xa (?) 0x821ac04 row_search_for_mysql + 6792 0x81a1bf4 _ZN11ha_innobase13general_fetchEPcjj + 64 0x8199ced _ZN7handler15read_range_nextEv + 37 0x81904e0 _ZN12QUICK_SELECT8get_nextEv + 308 0x81957da _Z8rr_quickP14st_read_record + 26 0x8166b0b _Z21join_init_read_recordP13st_join_table + 71 0x8165ca5 _Z10sub_selectP4JOINP13st_join_tableb + 97 0x815faa9 _Z9do_selectP4JOINP4ListI4ItemEP8st_tableP9Procedure + 265 0x81578d1 _ZN4JOIN4execEv + 3553 0x8103409 _ZN30subselect_single_select_engine4execEv + 101 0x80fff84 _ZN14Item_subselect4execEv + 32 0x8100ba7 _ZN21Item_exists_subselect3valEv + 19 0x80e276d _ZN13Item_func_not7val_intEv + 21 0x80e7855 _ZN13Item_cond_and7val_intEv + 65 0x8165d41 _Z10sub_selectP4JOINP13st_join_tableb + 253 0x815faa9 _Z9do_selectP4JOINP4ListI4ItemEP8st_tableP9Procedure + 265 0x81578d1 _ZN4JOIN4execEv + 3553 0x8158448 _Z12mysql_selectP3THDPPP4ItemP13st_table_listjR4ListIS1_ES2_jP8st_orderSB_S2_SB_mP13select_resultP18st_select_lex_unitP13st_sel + 380 0x8154cc1 _Z13handle_selectP3THDP6st_lexP13select_result + 173 0x8133a45 _Z21mysql_execute_commandP3THD + 849 0x8138c9e _Z11mysql_parseP3THDPcj + 166 0x81325e9 _Z16dispatch_command19enum_server_commandP3THDPcj + 893 0x813222e _Z10do_commandP3THD + 122 0x8131a25 handle_one_connection + 829 0xb7f21b63 _end + -1348101261 0xb7e5a18a _end + -1348918886 except the lines containing the _end symbol
[7 Dec 2006 13:30]
Cyril SCETBON
I can't change the version at the moment cause it is in production. I'll change it only if I'm only sure it's the only way to solve the issue.
[7 Dec 2006 13:41]
Valeriy Kravchuk
Sorry, I missed your files uploaded when answering... Anyway, please, send my.cnf content from both slaves, and describe their hardware (how much RAM do they have) and software (uname -a results, glibc versions).
[7 Dec 2006 13:48]
Cyril SCETBON
The my.cnf of the slave concerned by the issue
Attachment: my.cnf (application/octet-stream, text), 2.51 KiB.
[7 Dec 2006 14:21]
Cyril SCETBON
uname -a configuration of the slave concerned Linux magic-sql2.echo-net.net 2.6.18.1 #1 SMP Thu Nov 2 09:15:36 CET 2006 i686 GNU/ cat /proc/cpuinfo : processor : 0 vendor_id : GenuineIntel cpu family : 15 model : 4 model name : Intel(R) Xeon(TM) CPU 3.00GHz stepping : 1 cpu MHz : 3000.356 cache size : 1024 KB physical id : 0 siblings : 2 core id : 0 cpu cores : 1 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc pni monitor ds_cpl cid cx16 xtpr bogomips : 6003.63 processor : 1 vendor_id : GenuineIntel cpu family : 15 model : 4 model name : Intel(R) Xeon(TM) CPU 3.00GHz stepping : 1 cpu MHz : 3000.356 cache size : 1024 KB physical id : 0 siblings : 2 core id : 0 cpu cores : 1 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc pni monitor ds_cpl cid cx16 xtpr bogomips : 6000.52 processor : 2 vendor_id : GenuineIntel cpu family : 15 model : 4 model name : Intel(R) Xeon(TM) CPU 3.00GHz stepping : 1 cpu MHz : 3000.356 cache size : 1024 KB physical id : 3 siblings : 2 core id : 0 cpu cores : 1 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc pni monitor ds_cpl cid cx16 xtpr bogomips : 6000.46 processor : 3 vendor_id : GenuineIntel cpu family : 15 model : 4 model name : Intel(R) Xeon(TM) CPU 3.00GHz stepping : 1 cpu MHz : 3000.356 cache size : 1024 KB physical id : 3 siblings : 2 core id : 0 cpu cores : 1 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc pni monitor ds_cpl cid cx16 xtpr bogomips : 6000.45 cat /proc/meminfo : MemTotal: 4086360 kB MemFree: 1096580 kB Buffers: 198104 kB Cached: 2552752 kB SwapCached: 0 kB Active: 2139048 kB Inactive: 697912 kB HighTotal: 3211208 kB HighFree: 855180 kB LowTotal: 875152 kB LowFree: 241400 kB SwapTotal: 526312 kB SwapFree: 526308 kB Dirty: 524 kB Writeback: 0 kB AnonPages: 85688 kB Mapped: 10124 kB Slab: 138796 kB PageTables: 1224 kB NFS_Unstable: 0 kB Bounce: 0 kB CommitLimit: 2569492 kB Committed_AS: 239828 kB VmallocTotal: 116728 kB VmallocUsed: 3312 kB VmallocChunk: 113404 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 Hugepagesize: 2048 kB
[7 Dec 2006 15:33]
Heikki Tuuri
Hi! It may be that the table is corrupt. What does CHECK TABLE ... say? Have you upgraded to 4.1.xx recently? Regards, Heikki
[7 Dec 2006 15:52]
Cyril SCETBON
We didn't upgrade the database for a year. I checked all the tables referenced with a "select ... from XX" in the error log and none is corrupted.
[7 Dec 2006 15:52]
Cyril SCETBON
We didn't upgrade the database for a year. I checked all the tables referenced with a "select ... from XX" in the error log and none is corrupted.
[7 Dec 2006 15:55]
Heikki Tuuri
But secondary indexes may be corrupt. You can check also them with CHECK TABLE ... . Warning: for a big table (bigger than the buffer pool), CHECK TABLE will run very long, hours or days.
[7 Dec 2006 16:30]
Cyril SCETBON
mysql> check table APPUSER extended; +----------------+-------+----------+----------+ | Table | Op | Msg_type | Msg_text | +----------------+-------+----------+----------+ | magic4.APPUSER | check | status | OK | +----------------+-------+----------+----------+ 1 row in set (0.01 sec) mysql> check table USER_NOTIFICATION extended; +--------------------------+-------+----------+----------+ | Table | Op | Msg_type | Msg_text | +--------------------------+-------+----------+----------+ | magic4.USER_NOTIFICATION | check | status | OK | +--------------------------+-------+----------+----------+ 1 row in set (0.46 sec) mysql> check table NOTIFICATION extended; +---------------------+-------+----------+----------+ | Table | Op | Msg_type | Msg_text | +---------------------+-------+----------+----------+ | magic4.NOTIFICATION | check | status | OK | +---------------------+-------+----------+----------+ 1 row in set (0.64 sec)
[8 Dec 2006 8:52]
Cyril SCETBON
Another suggestion ?
[8 Dec 2006 13:34]
Cyril SCETBON
This problem is causing shutdown of our production slave. Did you find anything else ?
[8 Dec 2006 13:43]
Heikki Tuuri
Hi! Can you repeat this on 4.1.22? Please give the SELECT query that is crashing, and print EXPLAIN SELECT ... for it. Regards, Heikki
[8 Dec 2006 14:05]
Cyril SCETBON
If I was able I would do it. But I don't know which one is causing the segmentation fault. You've got all the error files I found.
[8 Dec 2006 15:09]
Valeriy Kravchuk
Please, check with 4.1.22, as Heikki suggested. One of the crashing queries is: select ID_USER, LOGIN, OTFA_HASH, US_CREATED_BY from APPUSER where OTFA_HASH is not null and US_STATUS <> 'K' and not exists (select UN.ID_USER from USER_NOTIFICATION UN, NOTIFICATION N where UN.ID_USER = APPUSER.ID_USER and UN.ID_NOTIFICATION = N.ID_NOTIFICATION and N.ID_MAIL in (3, 5)) Can you send EXPLAIN results for it, for example?
[8 Dec 2006 15:18]
Cyril SCETBON
mysql> explain select ID_USER, LOGIN, OTFA_HASH, US_CREATED_BY from APPUSER where OTFA_HASH is not null and US_STATUS <> 'K' and not exists (select UN.ID_USER from USER_NOTIFICATION UN, NOTIFICATION N where UN.ID_USER = APPUSER.ID_USER and UN.ID_NOTIFICATION = N.ID_NOTIFICATION and N.ID_MAIL in (3, 5))\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: APPUSER type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 253 Extra: Using where *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: N type: range possible_keys: PRIMARY,IK1_NOTIFICATION key: IK1_NOTIFICATION key_len: 2 ref: NULL rows: 91 Extra: Using where; Using index *************************** 3. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: UN type: ref possible_keys: PRIMARY,IK1_USER_NOTIFICATION,IK2_USER_NOTIFICATION key: IK1_USER_NOTIFICATION key_len: 4 ref: magic4.N.ID_NOTIFICATION rows: 1 Extra: Using where; Using index
[8 Dec 2006 15:33]
MySQL Verification Team
the stack trace reminds me of that READ-COMMITTED bug, but that was >= 5.0. Can it be related to bug #19834 ?
[8 Dec 2006 15:36]
Cyril SCETBON
As you saw we got this in my.cnf : transaction-isolation = READ-COMMITTED
[8 Dec 2006 15:48]
Heikki Tuuri
I think the READ COMMITTED bug was in 5.1. It was about releasing a row lock. You are not using innodb_locks_unsafe_for_binlog?
[8 Dec 2006 15:59]
Cyril SCETBON
No it's OFF. You don't thing that we have to stop using READ COMMITTED transaction isolation level ?
[11 Dec 2006 10:58]
Cyril SCETBON
Hi, You didn't answer to my last question. What's your opinion ? Anything else ?
[12 Dec 2006 14:52]
Heikki Tuuri
http://bugs.mysql.com/bug.php?id=24996 may be a duplicate of this.
[12 Dec 2006 15:33]
Heikki Tuuri
The seg fault may happen in the function below. row0sel.c in 4.1.18 /****************************************************************** Stores the row id to the prebuilt struct. */ static void row_sel_store_row_id_to_prebuilt( /*=============================*/ row_prebuilt_t* prebuilt, /* in: prebuilt */ rec_t* index_rec, /* in: record */ dict_index_t* index) /* in: index of the record */ { byte* data; ulint len; data = rec_get_nth_field(index_rec, dict_index_get_sys_col_pos(index, DATA_ROW_ID), &len); if (len != DATA_ROW_ID_LEN) { fprintf(stderr, "InnoDB: Error: Row id field is wrong length %lu in ", (ulong) len); dict_index_name_print(stderr, prebuilt->trx, index); fprintf(stderr, "\n" "InnoDB: Field number %lu, record:\n", (ulong) dict_index_get_sys_col_pos(index, DATA_ROW_ID))\ ; rec_print(stderr, index_rec); putc('\n', stderr); ut_error; } ut_memcpy(prebuilt->row_id, data, len); } 0x08245770 <row_search_for_mysql+6640>: call 0x82134e0 <dict_index_name_print> 0x08245775 <row_search_for_mysql+6645>: mov 0xfffffb6c(%ebp),%edi 0x0824577b <row_search_for_mysql+6651>: mov 0x18(%edi),%edx 0x0824577e <row_search_for_mysql+6654>: mov 0x30(%edx),%ecx 0x08245781 <row_search_for_mysql+6657>: mov 0x34(%edx),%edx 0x08245784 <row_search_for_mysql+6660>: sub $0x4,%ecx 0x08245787 <row_search_for_mysql+6663>: lea (%ecx,%ecx,4),%eax 0x0824578a <row_search_for_mysql+6666>: testb $0x1,0xc(%edi) 0x0824578e <row_search_for_mysql+6670>: lea (%ecx,%eax,2),%eax 0x08245791 <row_search_for_mysql+6673>: lea (%edx,%eax,4),%eax 0x08245794 <row_search_for_mysql+6676>: je 0x8245880 <row_search_for_mysql+6 912> 0x0824579a <row_search_for_mysql+6682>: mov 0x8(%eax),%eax 0x0824579d <row_search_for_mysql+6685>: mov %eax,0x8(%esp) 0x082457a1 <row_search_for_mysql+6689>: mov 0x844674c,%edx 0x082457a7 <row_search_for_mysql+6695>: mov $0x83ef500,%eax 0x082457ac <row_search_for_mysql+6700>: mov %eax,0x4(%esp) 0x082457b0 <row_search_for_mysql+6704>: mov %edx,(%esp) 0x082457b3 <row_search_for_mysql+6707>: call 0x80c2a38 <__cxa_pure_virtual+240 > 0x082457b8 <row_search_for_mysql+6712>: mov %esi,0x4(%esp) 0x082457bc <row_search_for_mysql+6716>: mov 0x844674c,%ecx 0x082457c2 <row_search_for_mysql+6722>: mov %ecx,(%esp) 0x082457c5 <row_search_for_mysql+6725>: call 0x8309290 <rec_print> 0x082457ca <row_search_for_mysql+6730>: movl $0xa,(%esp) 0x082457d1 <row_search_for_mysql+6737>: mov 0x844674c,%esi 0x082457d7 <row_search_for_mysql+6743>: mov %esi,0x4(%esp) 0x082457db <row_search_for_mysql+6747>: call 0x80c32c8 <exit+48> 0x082457e0 <row_search_for_mysql+6752>: mov 0x844674c,%edi 0x082457e6 <row_search_for_mysql+6758>: mov %edi,(%esp) 0x082457e9 <row_search_for_mysql+6761>: call 0x831c950 <ut_print_timestamp> 0x082457ee <row_search_for_mysql+6766>: call 0x831e680 <os_thread_get_curr_id> 0x082457f3 <row_search_for_mysql+6771>: mov %eax,(%esp) 0x082457f6 <row_search_for_mysql+6774>: call 0x831e670 <os_thread_pf> 0x082457fb <row_search_for_mysql+6779>: mov %eax,0x8(%esp) 0x082457ff <row_search_for_mysql+6783>: mov 0x84238dc,%esi 0x08245805 <row_search_for_mysql+6789>: mov $0x868,%edx 0x0824580a <row_search_for_mysql+6794>: mov %edx,0x10(%esp) 0x0824580e <row_search_for_mysql+6798>: mov 0x844674c,%edi 0x08245814 <row_search_for_mysql+6804>: mov $0x83ef01d,%ecx 0x08245819 <row_search_for_mysql+6809>: mov %ecx,0xc(%esp) 0x0824581d <row_search_for_mysql+6813>: mov %esi,0x4(%esp) 0x08245821 <row_search_for_mysql+6817>: mov %edi,(%esp)
[12 Dec 2006 15:46]
Cyril SCETBON
Any way to fix it ?
[12 Dec 2006 16:43]
Heikki Tuuri
Can you compile a debug build of mysqld, run it inside gdb and repeat the crash with the query below? select ID_USER, LOGIN, OTFA_HASH, US_CREATED_BY from APPUSER where OTFA_HASH is not null and US_STATUS <> 'K' and not exists (select UN.ID_USER from USER_NOTIFICATION UN, NOTIFICATION N where UN.ID_USER = APPUSER.ID_USER and UN.ID_NOTIFICATION = N.ID_NOTIFICATION and N.ID_MAIL in (3, 5)) When mysqld crashes in gdb, do (gdb) bt full so that we see where it crashes. Regards, Heikki
[12 Dec 2006 16:54]
Cyril SCETBON
Sorry, but I can do it. It often happens, but we don't know how to reproduce this BUG. Do you still suspect the option transaction-isolation ?
[12 Dec 2006 17:00]
Heikki Tuuri
You could test if the bug disappears if you change the isolation level. But you CAN catch this bug inside gdb. Just start inside gdb the debug build of mysqld compiled from source. It works like a normal mysqld until it gets a seg fault. Then do that: (gdb) bt full
[12 Dec 2006 17:06]
Cyril SCETBON
I'll try it. For the debug build version of mysql I don't think that the managers will accept it :-( But maybe ..
[19 Dec 2006 11:00]
Cyril SCETBON
The BUG seems to be still present when not using READ COMMITTED transaction isolation level :-( However I've noticed that Justin Albstmeijer (http://bugs.mysql.com/bug.php?id=24996) do not hit the BUG anymore (I suppose). I've read all the changes between release 4.1.18 and 4.1.22 but I don't see anything related to my BUG. Did I miss something ?
[21 Dec 2006 10:47]
Cyril SCETBON
Anything ?
[21 Dec 2006 14:14]
Heikki Tuuri
Hi! I am not aware of any change between 4.1.18 and 4.1.22 that would fix this bug. It is pure luck if some change has masked or fixes this bug. But please test with 4.1.22. Regards, Heikki
[4 Jan 2007 16:22]
Cyril SCETBON
One of the slaves has been upgraded to 4.1.22 and we are waiting for news about it.
[5 Feb 2007 0:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[17 Nov 2008 21:46]
Cyril SCETBON
No more feedback