Bug #15268 | Crash on unknown condition | ||
---|---|---|---|
Submitted: | 27 Nov 2005 12:54 | Modified: | 21 Dec 2005 8:42 |
Reporter: | Thomas Auge | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server | Severity: | S1 (Critical) |
Version: | 5.017-BK, 5.0.x | OS: | Linux (Linux) |
Assigned to: | Evgeny Potemkin | CPU Architecture: | Any |
[27 Nov 2005 12:54]
Thomas Auge
[27 Nov 2005 13:38]
Valeriy Kravchuk
Thank you for a bug report. I have a question to you, as your data came from 4.x.x. Are you absolutely sure that you imported the data from dumps on the 5.0.x servers that give you that crashes? And not simply copied them? In any case, please, send the resolved stack traces for both crashes. Any ideas on what particular tables and SQL statements give you that problems are also welcomed. How often these crashes happen? Can you start your server with general query log to catch all queries, or it is a highly loaded production environment?
[27 Nov 2005 14:09]
Thomas Auge
I usually just copy the tables. Since I figured that might be a reason for the crash, I dumped the DB, dropped it, and inserted the dump again. The server even crashed while the dump was loading. So I don't think it's related to the upgrades. I never dumped the system tables - those were copied and updated with the corresponding scripts. I ran myisamcheck on them, just to be sure, but no problems surfaced. At an average load of 100 queries per second, the server crashes about once an hour on average on a very random pattern. Would a complete query log really be any help with that amount of queries? It'd be kinda ... huge. I could run it a server which will become the new production environment in a few days, so it could handle it. There is enough activity from background tasks to cause the crashes there, too. The crashes occur on many different queries. One sticks out though: SELECT COUNT(*) FROM ships WHERE design = '15706' AND owner = '2540' AND position = 'colony' AND fleet = -1 AND x = '4162' Funny enough, it's not a variant of that query, which is called with all kinds of parameters, but exactly THAT query several times. The ships table looks like this: Field Type Null Default ID bigint(20) No owner int(10) No 0 design bigint(20) No 0 fleet int(11) No -1 speed int(10) No 0 armor int(10) No 0 range double(12,2) No 0.00 fightercap smallint(5) No 0 cargocap int(10) No 0 colonistscap mediumint(8) No 0 colonists mediumint(8) No 0 position enum('local', 'global', 'world', 'colony', 'ship') No colony x int(11) No -1 y int(11) No -1 z int(11) No -1 system mediumint(9) No -1 structure double No 0 Keyname Type Cardinality Field PRIMARY PRIMARY 4468940 ID fleet INDEX 14700 fleet position x design INDEX 18856 design fleet speed INDEX 16863 fleet speed position INDEX 8594 position x owner INDEX 17663 owner position x design owner_2 INDEX 11917 owner fleet Type Usage Data 336,043 KB Index 438,774 KB Overhead 77 Bytes Effective 774,817 KB Total 774,817 KB Statements Value Format fixed Rows 4,468,940 Row length ø 77 Row size ø 178 Bytes Next Autoindex 13,911,069 Stacktrace from a 5.0.11beta crash: ---- 0x809b352 handle_segfault + 430 0x82cece8 pthread_sighandler + 184 0x8156d20 get_free_block__11Query_cacheUlcUl + 60 0x8156c85 allocate_block__11Query_cacheUlcUlc + 81 0x8156662 allocate_data_chain__11Query_cachePP17Query_cache_blockUlP17Query_cache_blockc + 178 0x815651a write_result_data__11Query_cachePP17Query_cache_blockUlPcP17Query_cache_blockQ217Query_cache_block10block_type + 46 0x8156406 append_result_data__11Query_cachePP17Query_cache_blockUlPcP17Query_cache_block + 82 0x8154c47 query_cache_insert__FP6st_netPCcUl + 99 0x8094ee2 net_real_write + 62 0x8094b3b net_flush + 35 0x8095e6a send_eof__FP3THD + 54 0x8092521 send_eof__11select_send + 89 0x80e7848 do_select__FP4JOINPt4List1Z4ItemP8st_tableP9Procedure + 596 0x80d9461 exec__4JOIN + 4805 0x80d9fa5 mysql_select__FP3THDPPP4ItemP13st_table_listUiRt4List1Z4ItemP4ItemUiP8st_orderT7T5T7UlP13select_resultP18st_select_lex_unitP13s + 797 0x80d6531 handle_select__FP3THDP6st_lexP13select_resultUl + 193 0x80ac249 mysql_execute_command__FP3THD + 1429 0x80b261c mysql_parse__FP3THDPcUi + 248 0x80aa9c1 dispatch_command__F19enum_server_commandP3THDPcUi + 1617 0x80aa363 do_command__FP3THD + 203 0x80a9914 handle_one_connection + 748 0x82cc49c pthread_start_thread + 220 0x82f5eaa thread_start + 4 ---- Stacktrace from a 5.0.16 crash: ----- 0x8150650 handle_segfault + 356 0x85a7c8 (?) 0x81db87d _ZN17TRP_ROR_INTERSECT10make_quickEP13st_qsel_parambP11st_mem_root + 205 0x81db87d _ZN17TRP_ROR_INTERSECT10make_quickEP13st_qsel_parambP11st_mem_root + 205 0x81db2f8 _ZN10SQL_SELECT17test_quick_selectEP3THD6BitmapILj64EEyyb + 1748 0x81b3dec _Z12mysql_updateP3THDP13st_table_listR4ListI4ItemES6_PS4_jP8st_ordery15enum_duplicatesb + 1156 0x8164dcf _Z21mysql_execute_commandP3THD + 3763 0x816ae7a _Z11mysql_parseP3THDPcj + 306 0x8162aa2 _Z16dispatch_command19enum_server_commandP3THDPcj + 1178 0x81625cd _Z10do_commandP3THD + 129 0x8161aad handle_one_connection + 569 0x854341 (?) 0x7ac6fe (?) -----
[27 Nov 2005 14:56]
Valeriy Kravchuk
Thank you for the additional information. I am not sure that general query log is needed right now. Let me analyze what we do know already. Can you try to switch off the query cache temporary? Please, try, and inform how it influence the crashes. You my.cnf content may be of some use too. I had to ask about it earlier.
[27 Nov 2005 15:13]
Thomas Auge
my.cnf: ----------- [client] port = 3306 socket = /tmp/mysql.sock [mysqld] port = 3306 socket = /tmp/mysql.sock skip-locking key_buffer = 512M max_allowed_packet = 1M table_cache = 1024 sort_buffer_size = 2M read_buffer_size = 512k read_rnd_buffer_size = 1M myisam_sort_buffer_size = 128M max_heap_table_size = 128M tmp_table_size = 16M delayed_queue_size = 50000 delayed_insert_limit = 500 thread_cache = 64 query_cache_type = 1 query_cache_size = 64M thread_concurrency = 4 max_connections = 512 log-slow-queries long-query-time = 1 log-long-format #skip-networking myisam-recover = force [mysqldump] quick max_allowed_packet = 16M [mysql] no-auto-rehash [isamchk] key_buffer = 128M sort_buffer_size = 128M read_buffer = 2M write_buffer = 2M [myisamchk] key_buffer = 128M sort_buffer_size = 128M read_buffer = 2M write_buffer = 2M [mysqlhotcopy] interactive-timeout ---------------------- This is the configuration for a dual opteron with 2GB of RAM. I know the max connections are a bit out of whack. That's a left over from fighting random connection failures on 4.1, which I never figured out - the app now expects it. Max connections in practice are below 100. Average connections are around 30. I will turn off the query cache on this box. I will have to fix a few tables before I can start up the app again, but should have results in a few hours.
[27 Nov 2005 15:25]
Thomas Auge
Well, that crash came quick: ------------- 0x80c9fe2 handle_segfault + 570 0x8242b60 __pthread_sighandler + 144 0x80da001 _Z9end_transP3THD25enum_mysql_completiontype + 565 0x8176fff _Z21ha_enable_transactionP3THDb + 39 0x80bb332 _ZN3THD16init_for_queriesEv + 18 0x80e9ccf handle_one_connection + 1295 0x823d499 pthread_start_thread + 201 0x827198a __clone + 106 ------------ 051127 10:16:35 mysqld started 051127 10:16:35 [Note] /usr/local/mysql/libexec/mysqld: ready for connections. Version: '5.0.16-log' socket: '/tmp/mysql.sock' port: 3306 Source distribution mysqld got signal 11; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=536870912 read_buffer_size=520192 max_used_connections=5 max_connections=512 threads_connected=6 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 1832956 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd=0x8539af8 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... Cannot determine thread, fp=0xbedff4f8, backtrace may not be correct. Stack range sanity check OK, backtrace follows: 0x80c9fe2 0x8242b60 0x80da001 0x8176fff 0x80bb332 0x80e9ccf 0x823d499 0x827198a New value of fp=(nil) failed sanity check, terminating stack trace! Please read http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved stack trace is much more helpful in diagnosing the problem, so please do resolve it Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at (nil) is invalid pointer thd->thread_id=29 The manual page at http://www.mysql.com/doc/en/Crashing.html contains information that should help you find out what is causing the crash. Number of processes running now: 0 ----- This installation is a custom compile: configured by ./configure, generated by GNU Autoconf 2.59, with options \"'--prefix=/usr/local/mysql' '--with-mysqld-ldflags=-all-static' '--exec-prefix=/usr/local/mysql' '--without-innodb' '--with-client-ldflags=-all-static' '--disable-shared' '--without-debug' '--enable-assembler' CFLAGS='-O3 -DDBUG_OFF ' CXXFLAGS='-O3 -DDBUG_OFF -fno-implicit-templates -fno-exceptions -fno-rtti'\"
[28 Nov 2005 17:49]
Valeriy Kravchuk
Are you sure the last stack trace rtesolved properly? Look: 0x80c9fe2 handle_segfault + 570 0x8242b60 __pthread_sighandler + 144 0x80da001 _Z9end_transP3THD25enum_mysql_completiontype + 565 0x8176fff _Z21ha_enable_transactionP3THDb + 39 0x80bb332 _ZN3THD16init_for_queriesEv + 18 0x80e9ccf handle_one_connection + 1295 0x823d499 pthread_start_thread + 201 0x827198a __clone + 106 It is about enable_transaction, but you stated that you compiled --without-innodb... So, what transactions are used? Please, check. Previous crashes had something to do with query cache, really...
[28 Nov 2005 18:41]
Thomas Auge
Let's say I am 99% sure. I compiled and installed the server. Then recompiled, because I had a typo in one of the configuration options. I am rather sure I did another "make install" then. Since the binary was already "strip"ped I did another make install to get a binary with symbols, which *should* have installed the same binary - or at least a binary with the same addresses. I'll get you another set of test data, just to make sure. I do not use any inno DB tables though. So if this transaction call can only happen if inno db is actually in use, then there is either something very wrong or the stacktrace is screwy.
[28 Nov 2005 18:47]
Thomas Auge
There has been another crash overnight. I created a new symbols file from the currently running mysqld, resolved the stack and it is EXACTLY the same as the last one. The server does not mention innoDB on startup, as it usually does, so I am pretty sure innoDB is not compiled in. Do you want me to reproduce the crash with one of the official binaries. Just FYI: mysql was compiled with gcc version 3.4.4 20050721 (Red Hat 3.4.4-2)
[28 Nov 2005 18:48]
Thomas Auge
If it's any help I can also give you access to the server. Just to make sure I am not doing anything stupid. :-)
[29 Nov 2005 15:27]
Valeriy Kravchuk
Thank you for the additional information. Yes, it will be good to try to repeat the same crash with MySQL official binaries suitable for your Linux. Please, do it and inform about the results, while I am thinking about your stack trace.
[29 Nov 2005 22:16]
Thomas Auge
This is the output from the latest crash. (5.0.16 dynamic glibc 2.3) 7c8 (nil) 0x81a107f _Z22get_quick_record_countP3THDP10SQL_SELECTP8st_tablePK6BitmapILj64EEy + 59 0x8198546 _Z20make_join_statisticsP4JOINP13st_table_listP4ItemP16st_dynamic_array + 2550 0x818f75a _ZN4JOIN8optimizeEv + 750 0x8192946 _Z12mysql_selectP3THDPPP4ItemP13st_table_listjR4ListIS1_ES2_jP8st_orderSB_S2_SB_mP13select_resultP18st_select_lex_unitP13st_sel + 130 0x818ebd2 _Z13handle_selectP3THDP6st_lexP13select_resultm + 234 0x8164175 _Z21mysql_execute_commandP3THD + 601 0x816ae7a _Z11mysql_parseP3THDPcj + 306 0x8162aa2 _Z16dispatch_command19enum_server_commandP3THDPcj + 1178 0x81625cd _Z10do_commandP3THD + 129 0x8161aad handle_one_connection + 569 0x2ba341 (?) 0x38e6fe (?) thd->query at 0x8ae7f48 = SELECT COUNT(*) FROM ships WHERE design = '90' AND owner = '247' AND position = 'colony' AND fleet = -1 AND x = '16' thd->thread_id=133 This is looking pretty random ...
[29 Nov 2005 23:01]
Nathan Tanner
Try this simple test case: create table testtable1(TestField1 int not null, TestField2 char, TestField3 char not null default 'n'); insert into testtable1 values (1, null, 'y'); create table testtable2(TestField4 char not null); insert into testtable2 values ('b'); select testtable1.* from testtable1 left join testtable2 on testtable1.testfield2=testtable2.testfield4 where testfield3='y'; On the last query you will see: ERROR 2013 (HY000): Lost connection to MySQL server during query And the error log will have similar data to what has already been provided. It appears it is when you LEFT JOIN on a column that has a NULL value, but I am not sure of the extent of the bug (may require other conditions).
[29 Nov 2005 23:05]
Nathan Tanner
Further to my comment above, you will notice if you remove the WHERE clause, like so: mysql> select testtable1.* from testtable1 left join testtable2 on testtable1.testfield2=testtable2.testfield4; +------------+------------+------------+ | TestField1 | TestField2 | TestField3 | +------------+------------+------------+ | 1 | NULL | y | +------------+------------+------------+ 1 row in set (0.00 sec) There is no crash.
[30 Nov 2005 8:34]
Valeriy Kravchuk
Thank you for a test case. Yes, it is a definite crash. Verified just as described in you last comment, on 5.0.17-BK (ChangeSet@1.2013.1.1, 2005-11-26 20:23:20+01:00) on Fedora Core 1: [openxs@Fedora 5.0]$ bin/mysql -uroot test Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 to server version: 5.0.17 Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql> create table testtable1(TestField1 int not null, TestField2 char, TestField3 -> char not null default 'n'); Query OK, 0 rows affected (0,01 sec) mysql> insert into testtable1 values (1, null, 'y'); Query OK, 1 row affected (0,01 sec) mysql> create table testtable2(TestField4 char not null); Query OK, 0 rows affected (0,00 sec) mysql> insert into testtable2 values ('b'); Query OK, 1 row affected (0,00 sec) mysql> select testtable1.* from testtable1 left join testtable2 on -> testtable1.testfield2=testtable2.testfield4 where testfield3='y'; ERROR 2013 (HY000): Lost connection to MySQL server during query mysql> Number of processes running now: 0 051130 11:13:44 mysqld restarted mysql> select testtable1.* from testtable1 left join testtable2 on testtable1.te stfield2=testtable2.testfield4; ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... Connection id: 1 Current database: test +------------+------------+------------+ | TestField1 | TestField2 | TestField3 | +------------+------------+------------+ | 1 | NULL | y | +------------+------------+------------+ 1 row in set (0,37 sec) In the error log I've got: mysql> exit Bye [openxs@Fedora 5.0]$ tail -50 var/Fedora.err key_buffer_size=8388600 read_buffer_size=131072 max_used_connections=1 max_connections=100 threads_connected=1 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 225791 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd=0x9556938 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... Cannot determine thread, fp=0xb8d8a4cc, backtrace may not be correct. Stack range sanity check OK, backtrace follows: 0x814e979 0x64ef18 0x2 0x811064d 0x819ff9c 0x819665d 0x8192544 0x81953ad 0x8191ab2 0x8161b03 0x816942a 0x8160656 0x8160211 0x815f6ae 0x64879c 0x49527a New value of fp=(nil) failed sanity check, terminating stack trace! Please read http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved stack trace is much more helpful in diagnosing the problem, so please do resolve it Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x9583f88 = select testtable1.* from testtable1 left join testtable2 on testtable1.testfield2=testtable2.testfield4 where testfield3='y' thd->thread_id=1 The manual page at http://www.mysql.com/doc/en/Crashing.html contains information that should help you find out what is causing the crash. Number of processes running now: 0 051130 11:13:44 mysqld restarted 051130 11:13:44 InnoDB: Started; log sequence number 0 48993045 051130 11:13:44 [Note] /home/openxs/dbs/5.0/libexec/mysqld: ready for connections. Version: '5.0.17' socket: '/tmp/mysql.sock' port: 3306 Source distribution Stack trace is resolved as follows: [openxs@Fedora 5.0]$ bin/resolve_stack_dump -s /tmp/mysqld.sym -n myslq.stack 0x814e979 handle_segfault + 561 0x64ef18 (?) 0x2 (?) 0x811064d _ZN10Item_equal7val_intEv + 149 0x819ff9c _Z21join_read_const_tableP13st_join_tableP11st_position + 240 0x819665d _Z20make_join_statisticsP4JOINP13st_table_listP4ItemP16st_dynamic_arra y + 4277 0x8192544 _ZN4JOIN8optimizeEv + 684 0x81953ad _Z12mysql_selectP3THDPPP4ItemP13st_table_listjR4ListIS1_ES2_jP8st_orde rSB_S2_SB_mP13select_resultP18st_select_lex_unitP13st_sel + 113 0x8191ab2 _Z13handle_selectP3THDP6st_lexP13select_resultm + 230 0x8161b03 _Z21mysql_execute_commandP3THD + 551 0x816942a _Z11mysql_parseP3THDPcj + 290 0x8160656 _Z16dispatch_command19enum_server_commandP3THDPcj + 1030 0x8160211 _Z10do_commandP3THD + 125 0x815f6ae handle_one_connection + 462 0x64879c (?) 0x49527a (?)
[8 Dec 2005 18:59]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/33179
[9 Dec 2005 16:09]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/33191
[9 Dec 2005 20:02]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/33196
[14 Dec 2005 21:36]
Evgeny Potemkin
cmp_item_sort_string::cmp() wasn't checking values_res variable for null. Later called function was dereferenced it and crashed server. Fixed in 5.0.18, 1.1960.85.1
[14 Dec 2005 21:51]
Evgeny Potemkin
Fixed in 5.1.4
[21 Dec 2005 8:42]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release. If necessary, you can access the source repository and build the latest available version, including the bugfix, yourself. More information about accessing the source trees is available at http://www.mysql.com/doc/en/Installing_source_tree.html Additional info: Documented in 5.0.18 and 5.1.4 changelogs. Closed.