Bug #42784 | archive tables cause 100% cpu usage and hang in show table status | ||
---|---|---|---|
Submitted: | 12 Feb 2009 10:05 | Modified: | 6 Dec 2011 0:51 |
Reporter: | Shane Bester (Platinum Quality Contributor) | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Archive storage engine | Severity: | S3 (Non-critical) |
Version: | 5.0.88-debug, 5.1.31-debug, 5.1.42-debug | OS: | Any (Windows, Mac OS X) |
Assigned to: | Sergey Vojtovich | CPU Architecture: | Any |
[12 Feb 2009 10:05]
Shane Bester
[12 Feb 2009 10:10]
MySQL Verification Team
the bug is here: pthread_mutex_lock(&my_file_info[Filedes].mutex); Filedes is -1 so we have pthread function waiting (or crashing?) due to invalid parameter...
[17 Sep 2009 19:42]
Sveta Smirnova
Thank you for the report. I tried with generic test, but could not repeat lock, although CPU usage was about 90% (not 100% anyway). Please try to generate test case or at least provide more information about statements. RQG test case which I used: $cat bug42784.zz $tables = { rows => [10000], engines => ['Archive'], }; $fields = { types => ['int', 'varchar'], indexes => [undef], null => ['not null'], sign => ['unsigned'], }; $data = { numbers => ['digit'], strings => ['english'], }; $cat bug42784.yy query: select | hselect | flush | status ; select: SELECT * from _table where _field=digit; hselect: SELECT HIGH_PRIORITY COUNT(*) FROM _table; flush: FLUSH TABLES ; status: SHOW TABLE STATUS ;
[18 Sep 2009 5:36]
MySQL Verification Team
sveta, add some inserts to the testcase. btw, in 5.1.38 archive tables are totally broken anyway, so you should be getting crashes/hangs quickly
[1 Nov 2009 10:52]
MySQL Verification Team
the testcase in bug #47012 (bug47012.c) can repeat this hangup too.
[14 Nov 2009 16:47]
Valeriy Kravchuk
Verified just as described with recent 5.1.42-debug on Mac OS X.
[14 Nov 2009 16:51]
Valeriy Kravchuk
5.0.88 is also affected. I see the following in the error log: 091114 18:48:48 [Note] /Users/openxs/dbs/5.0/libexec/mysqld: ready for connections. Version: '5.0.88-debug' socket: '/tmp/mysql.sock' port: 3306 Source distribution mysqld(11040,0xb0441000) malloc: *** mmap(size=2779099136) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug 091114 18:48:49 [ERROR] /Users/openxs/dbs/5.0/libexec/mysqld: Out of memory at line 51, 'sql_string.cc' 091114 18:48:49 [ERROR] /Users/openxs/dbs/5.0/libexec/mysqld: needed 2779096488 byte (2713962k), memory in use: 9172385 bytes (8958k) mysqld(11040,0xb04d7000) malloc: *** mmap(size=3876610048) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug 091114 18:48:49 [ERROR] /Users/openxs/dbs/5.0/libexec/mysqld: Out of memory at line 51, 'sql_string.cc' 091114 18:48:49 [ERROR] /Users/openxs/dbs/5.0/libexec/mysqld: needed 3876608928 byte (3785751k), memory in use: 2074027253 bytes (2025418k) mysqld(11040,0xb04d7000) malloc: *** mmap(size=2064842752) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug 091114 18:48:49 [ERROR] /Users/openxs/dbs/5.0/libexec/mysqld: Out of memory at line 51, 'sql_string.cc' 091114 18:48:49 [ERROR] /Users/openxs/dbs/5.0/libexec/mysqld: needed 2064839056 byte (2016445k), memory in use: 2074256967 bytes (2025642k) Warning: Found locks from different threads in write: after waking up waiters Warning: Found locks from different threads in write: after waking up waiters ...
[14 Nov 2009 16:56]
Valeriy Kravchuk
Non-debug binaries just show 100+% CPU usage.
[9 Dec 2009 11:13]
Satya B
I was able to reproduce the Out of memory error and also other crashes when running the testcase bug47102.c but I couldn't get the stack trace in the description section. Can you please tell me how to get the stack trace(in description)? mysqld-debug.exe!my_pread(int Filedes=-1 Line 60 mysqld-debug.exe!azflush Line 627 mysqld-debug.exe!ha_archive::info(unsigned int flag=84) Line 1487 mysqld-debug.exe!get_schema_tables_record mysqld-debug.exe!get_all_tables mysqld-debug.exe!get_schema_tables_result mysqld-debug.exe!JOIN::exec mysqld-debug.exe!mysql_select mysqld-debug.exe!handle_select mysqld-debug.exe!execute_sqlcom_select mysqld-debug.exe!mysql_execute_command mysqld-debug.exe!mysql_parse mysqld-debug.exe!dispatch_command mysqld-debug.exe!do_command mysqld-debug.exe!handle_one_connection mysqld-debug.exe!pthread_start mysqld-debug.exe!_callthreadstart mysqld-debug.exe!_threadstart
[9 Dec 2009 16:01]
MySQL Verification Team
Satya, just opening a archive table in 5.1.41 is causing corruptions of memory. what happens after that (assuming crash doesn't happen) is more or less random. I'll upload you a single .sql file. Use a debug build on windows, or valgrind on linux. Step 1 to fixing archive tables will be to get this stack trace, and fix it: mysqld-debug.exe!_free_dbg_nolock()[dbgheap.c:1367] mysqld-debug.exe!_free_dbg()[dbgheap.c:1220] mysqld-debug.exe!free()[dbgheap.c:1179] mysqld-debug.exe!my_no_flags_free()[my_malloc.c:60] mysqld-debug.exe!ha_archive::free_share()[ha_archive.cc:413] mysqld-debug.exe!ha_archive::close()[ha_archive.cc:561] mysqld-debug.exe!closefrm()[table.cc:1971] mysqld-debug.exe!intern_close_table()[sql_base.cc:783] mysqld-debug.exe!free_cache_entry()[sql_base.cc:805] mysqld-debug.exe!my_hash_delete()[hash.c:547] mysqld-debug.exe!close_open_tables()[sql_base.cc:1204] mysqld-debug.exe!close_thread_tables()[sql_base.cc:1353] mysqld-debug.exe!close_tables_for_reopen()[sql_base.cc:5456] mysqld-debug.exe!get_all_tables()[sql_show.cc:3490] mysqld-debug.exe!get_schema_tables_result()[sql_show.cc:6135] mysqld-debug.exe!JOIN::exec()[sql_select.cc:1766] mysqld-debug.exe!mysql_select()[sql_select.cc:2437] mysqld-debug.exe!handle_select()[sql_select.cc:269] mysqld-debug.exe!execute_sqlcom_select()[sql_parse.cc:5051] mysqld-debug.exe!mysql_execute_command()[sql_parse.cc:2246] mysqld-debug.exe!mysql_parse()[sql_parse.cc:5974] mysqld-debug.exe!dispatch_command()[sql_parse.cc:1233] mysqld-debug.exe!do_command()[sql_parse.cc:872] mysqld-debug.exe!handle_one_connection()[sql_connect.cc:1127] mysqld-debug.exe!pthread_start()[my_winthread.c:85] mysqld-debug.exe!_callthreadstart()[thread.c:295] mysqld-debug.exe!_threadstart()[thread.c:277] kernel32.dll!BaseThreadStart() ariables. invalid and cause the dump to abort... 000444B1E8=select * from information_schema.tables
[9 Dec 2009 16:07]
MySQL Verification Team
load tables into db. run "select * from information_schema.tables" on a debug build. observe memory corruptions
Attachment: tables.zip (application/zip, text), 102.67 KiB.
[9 Dec 2009 16:10]
MySQL Verification Team
unzip the tables.sql and import using mysql -uroot --force test <tables.sql some tables will error out. this is okay. i think there's bug when many tables need to be opened and closed. so you could try a tiny table_open_cache, or hundreds of archive tables. valgrind will detect system calls such as pread getting -1 fd for example.
[10 Dec 2009 11:15]
Satya B
Shane, I have tried your query(after loading tables using tables.sql) in following combinations on ubuntu 9.04. 5.0-bugteam, 5.1-bugteam, 5.1 (main) ( valgrind and non valgrind builds) and I tried to run the server with valgrind and without valgrind and I couldn't see any errors or crash under valgrind I don't have windows setup ready and couldn't try on it :( I am using these scripts ./BUILD/compile-pentium-valgrind-max-no-ndb and ./BUILD/compile-pentium-debug-max-no-ndb. BTW, I have set table_open_cache to 4 in all scenarios. Can you please try on any linux machine and let me know the mysql branch used to reproduce?
[10 Dec 2009 15:00]
MySQL Verification Team
more problems/details about original bug47012.c testcase.
Attachment: bug47012_5.1.43-debug_multiple_problems.txt (text/plain), 11.56 KiB.
[11 Dec 2009 9:53]
Satya B
Shane, I was able to reproduce the bug finally :). Thanks for all the help. I was not running the bug47102 testcase when running the "show table status" query. So when I ran bug47102 testcase along with the 'show table status' query I was able to reproduce. Problems found. 1. crashes at ha_archive::check , ha_archive::real_write_row, ha_archive::optimize 2. Some of the tables are corrupted after starting the server from crash. 3. Incorrect key file for table 'table0003'; try to repair it) (HY000) (same 2 ?) 4. Out of memory errors. 5. this bug, fd being negative (Assertion `fd >= 0' failed.)
[11 Dec 2009 9:54]
Satya B
s/bug47102/bug47012/ in the above comment
[4 Jan 2010 5:17]
Satya B
The problem seems to be in ha_archive::optimize() where the locks are not taken when we re-open the file. The draft patch below solves the fd issue but the deadlock problem still remains. The locking is not yet proper in the below draft patch. For 5.1 ======= === modified file 'storage/archive/ha_archive.cc' --- storage/archive/ha_archive.cc 2009-11-11 08:03:29 +0000 +++ storage/archive/ha_archive.cc 2010-01-04 04:57:47 +0000 @@ -1293,6 +1293,7 @@ init_archive_reader(); + pthread_mutex_lock(&share->mutex); // now we close both our writer and our reader for the rename if (share->archive_write_open) { @@ -1305,7 +1306,10 @@ MY_REPLACE_EXT | MY_UNPACK_FILENAME); if (!(azopen(&writer, writer_filename, O_CREAT|O_RDWR|O_BINARY))) + { + pthread_mutex_unlock(&share->mutex); DBUG_RETURN(HA_ERR_CRASHED_ON_USAGE); + } /* An extended rebuild is a lot more effort. We open up each row and re-record it. @@ -1377,11 +1381,13 @@ rc = my_rename(writer_filename,share->data_file_name,MYF(0)); + pthread_mutex_unlock(&share->mutex); DBUG_RETURN(rc); error: DBUG_PRINT("ha_archive", ("Failed to recover, error was %d", rc)); azclose(&writer); + pthread_mutex_unlock(&share->mutex); DBUG_RETURN(rc); }
[5 Jan 2010 8:56]
Satya B
Note:The above suggested patch might be a risky one. Looking for reviewers for this bug.
[21 Sep 2011 12:17]
MySQL Verification Team
here is a stress test. still causes valgrind errors on trunk and 2GB mallocs on 5.5.17
Attachment: bug47012.c (text/plain), 146.64 KiB.
[6 Dec 2011 0:51]
Paul DuBois
Noted in 5.1.61, 5.5.20, 5.6.4 changelogs. Concurrent access to ARCHIVE tables could cause corruption.