Bug #7186 | Server crashes in query_cache_abort() | ||
---|---|---|---|
Submitted: | 10 Dec 2004 22:45 | Modified: | 5 Apr 2005 3:00 |
Reporter: | Marko Mäkelä | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server | Severity: | S1 (Critical) |
Version: | 5.0.3-bk | OS: | Linux (GNU/Linux) |
Assigned to: | Oleksandr Byelkin | CPU Architecture: | Any |
[10 Dec 2004 22:45]
Marko Mäkelä
[10 Dec 2004 23:08]
Marko Mäkelä
Actually, it is not necessary to have any client attempting to connect to mysqld when the server is restarted. A crash recovery of "ibtest7a" is enough to trigger the crash. It does not matter if the table used by "ibtest7a" ("alex1") is in the old or new InnoDB format.
[10 Dec 2004 23:51]
Marko Mäkelä
Here's a more descriptive stack trace, after recompiling mysqld.cc and sql_cache.cc with -g and without -O3 (CXXFLAGS was not overridden to include -g in my ./configure line): #0 0x08248928 in Query_cache_block::headers_len (this=0x37) at sql_cache.cc:407 #1 0x08248919 in Query_cache_block::data (this=0x37) at sql_cache.cc:413 #2 0x08248999 in Query_cache_block::query (this=0x37) at sql_cache.cc:422 #3 0x082447c1 in query_cache_abort (net=0x89b1ecc) at sql_cache.cc:641 #4 0x0816dbfb in my_message_sql (error=1146, str=0xbfffdf20 "Table 'mysql.time_zone_leap_second' doesn't exist", MyFlags=0) at mysqld.cc:2205 #5 0x08405fca in my_error (nr=1146, MyFlags=55) at my_error.c:69 #6 0x081a8837 in openfrm () at sql_cache.cc:384 #7 0x081a5e42 in open_unireg_entry () at sql_cache.cc:384 #8 0x081a7228 in open_table () at sql_cache.cc:384 #9 0x081a79c5 in open_tables () at sql_cache.cc:384 #10 0x08266cd3 in my_tz_init () #11 0x0816ed8d in main (argc=1, argv=0xbffffc14) at mysqld.cc:3043 The 0x37 pointer comes from net->query_cache_query in query_cache_abort(). Here's a complete printout of *net in that stack frame: (gdb) p *net $2 = {vio = 0x0, buff = 0x61f11e07 <Address 0x61f11e07 out of bounds>, buff_end = 0x32383661 <Address 0x32383661 out of bounds>, write_pos = 0x3636 <Address 0x3636 out of bounds>, read_pos = 0x0, fd = 0, max_packet = 1425670144, max_packet_size = 495381, pkt_nr = 520568840, compress_pkt_nr = 912351503, write_timeout = 909521464, read_timeout = 0, retry_count = 0, fcntl = 16777216, compress = 2 '\002', remain_in_buf = 1208484871, length = 1630347015, buf_length = 842544737, where_b = 13878, return_status = 0x0, reading_or_writing = 0 '\0', save_char = 0 '\0', no_send_ok = 0 '\0', no_send_eof = 0 '\0', last_error = "\000\001\023¤\025\217\a\000\bP\a Ðaa68266", '\0' <repeats 11 times>, "\001#\216\024\016\006\000\bX\a\037haA6827\200\000\0223\bøX\230\000\000\000\000ÅV\025\017\a\000\b`\a\037\206aa68281\200\000\0223\bøX\230\000\000\000\000Û\000\025\217\a\000\bh\a\037¤aa68296", '\0' <repeats 12 times>, "à\205\025\217\a\000\bp\a\000taa68296", '\0' <repeats 12 times>, "ðR\025\217\a\000\bx\a!*aa67967", '\0' <repeats 11 times>, "\001E5\025\217\a\000\b\200\a!Haa68038", '\0' <repeats 11 times>..., sqlstate = "\030\025\217\a\000\b", last_errno = 1633784354, error = 54 '6', query_cache_query = 0x37 <Address 0x37 out of bounds>, report_error = 1 '\001', return_errno = 0 '\0'} On a subsequent run, upon entering query_cache_abort(), net->query_cache_query == 0x36. Here's the complete printout: (gdb) p *net $3 = {vio = 0x0, buff = 0xf21e0738 <Address 0xf21e0738 out of bounds>, buff_end = 0x38326161 <Address 0x38326161 out of bounds>, write_pos = 0x313634 <Address 0x313634 out of bounds>, read_pos = 0x0, fd = 0, max_packet = 3892314112, max_packet_size = 126817609, pkt_nr = 121636864, compress_pkt_nr = 1633751071, write_timeout = 909391922, read_timeout = 49, retry_count = 0, fcntl = 0, compress = 0 '\0', remain_in_buf = 134481807, length = 773785416, buf_length = 942825825, where_b = 3225140, return_status = 0x0, reading_or_writing = 0 '\0', save_char = 0 '\0', no_send_ok = 0 '\0', no_send_eof = 0 '\0', last_error = "\000\000\000ÿ¹\025\217\a\000\bP\a\037Laa28461", '\0' <repeats 11 times>, "\001\a\211\025\217\a\000\bX\a!Þaa28461", '\0' <repeats 11 times>, "\001\016å\024\016\006\000\b`\a\037\207aA2847\200\000\0223\bøX\230\000\000\000\000½\220\025\017\a\000\bh\a\037¥aa28476\200\000\0223\bøX\230\000\000\000\000Ö^\024\016\006\000\bp\a\000taA2848\200\000\0223\bøX\230\000\000\000\000½\221\025\217\a\000\bx\a\"°aa28334", '\0' <repeats 11 times>, "\001\0230\025\217\a\000\b\200\a\"Îaa27781", '\0' <repeats 11 times>..., sqlstate = "þ\025\217\a\004\b", last_errno = 1633780259, error = 50 '2', query_cache_query = 0x36 <Address 0x36 out of bounds>, report_error = 1 '\001', return_errno = 0 '\0'} On this occasion, thd == 0x89b17f0 in my_message_sql().
[11 Dec 2004 0:04]
Heikki Tuuri
Marko, I think in the past days there have been also other bug reports involving timezones. I was not able to repeat this crash in hundin with ./BUILD/compile-pentium-debug My /mysql system tables are from an about 3 months old 'make test' run. Maybe the crash depends on the system tables. Regards, Heikki
[11 Dec 2004 0:18]
Heikki Tuuri
Hi! I am assigning this nug report to Sanja. Maybe he can cast light on why my_message_sql() calls query_cache_abort(). Maybe a query result should not be stored in the query cache if there is an error in a query execution? But in this case this is a general error that is printed to the .err log, and there is really no query involved, I presume. Regards, Heikki mysqld.cc: /* thd->lex->current_select == 0 if lex structure is not inited (not query command (COM_QUERY)) */ if (thd->lex->current_select && thd->lex->current_select->no_error && !thd->is_fatal_error) { DBUG_PRINT("error", ("Error converted to warning: current_select: no_erro\ r %d fatal_error: %d", (thd->lex->current_select ? thd->lex->current_select->no_error : 0), (int) thd->is_fatal_error)); push_warning(thd, MYSQL_ERROR::WARN_LEVEL_ERROR, error, str); } else { NET *net= &thd->net; net->report_error= 1; #ifndef EMBEDDED_LIBRARY /* TODO query cache in embedded library*/ query_cache_abort(net); #endif if (!net->last_error[0]) // Return only first message { strmake(net->last_error, str, sizeof(net->last_error)-1); net->last_errno= error ? error : ER_UNKNOWN_ERROR; } } } if (!thd || MyFlags & ME_NOREFRESH) sql_print_error("%s: %s",my_progname,str); /* purecov: inspected */ DBUG_RETURN(0); }
[1 Feb 2005 13:55]
Marko Mäkelä
Any news regarding this? My temporary fix is to remove the frm_error() call from openfrm() in table.cc. (I could perhaps also update the mysql.* system tables, but I want to use the same set of system tables when testing 4.0, 4.1 and 5.0.)
[3 Feb 2005 19:11]
Oleksandr Byelkin
I'll say something as soon as I fetch this bug for processing, I can't say something aster fast look on it.
[7 Feb 2005 20:19]
MySQL Verification Team
Marko, Where can I get "ibtest7a" for testing?
[9 Feb 2005 11:26]
Marko Mäkelä
Victoria, This bug can manifest itself also without any SQL being executed. It was just a coincidence that it first occurred after InnoDB crash recovery. To reproduce the bug, start MySQL 5.0.3-bk on system tables created with 4.0 or 4.1. Some 5.0-specific tables will be missing then. As you can see from the stack trace, mysqld is trying to complain "Table 'mysql.time_zone_leap_second' doesn't exist" when it crashes. I don't know why mysqld is not always crashing at start-up when the new system tables are missing, but I'm fully sure that all recent start-up crashes of 5.0.3-bk I have seen have been caused by this bug. Perhaps it is accessing uninitialized memory, or maybe it fails to protect a critical section with a mutex.
[22 Mar 2005 14:25]
MySQL Verification Team
Marko, I wasn't able to repeat the crash either with 4.0 system tables or 4.1 system tables. Just I got the messages for missed tables. Could you please try again with latest BK source. Thanks
[22 Mar 2005 14:53]
Marko Mäkelä
Miguel, it doesn't crash every time. There have been days with no crash, but it did crash again this week, with fresh BK sources. As a workaround, I disabled the frm_error() call in openfrm(). I'm not sure, but it could be that mysqld has to be compiled with the certain options, or the data segments have to be placed in a certain way in memory in order for the crash to be possible. If mysqld crashes on startup due to this bug, the crash seems to be repeatable.
[5 Apr 2005 3:00]
MySQL Verification Team
Marko, Sorry I tried again and I wasn't able to repeat the behavior reported. Could you please test again on your side.