Bug #66363 | Large transaction writing rbr with sync_binlog hangs entire server on fsync | ||
---|---|---|---|
Submitted: | 13 Aug 2012 21:33 | Modified: | 13 Mar 2013 22:27 |
Reporter: | Trey Raymond | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 5.1.58, 5.1.67 | OS: | Linux (rhel 6.1) |
Assigned to: | CPU Architecture: | Any |
[13 Aug 2012 21:33]
Trey Raymond
[16 Aug 2012 19:12]
Trey Raymond
From further testing...it doesn't seem to be dependent on sync_binlog, the issue happens even when it's set to 0. I ran a bunch of tests insert...select'ing 40mil rows from one table into another identical table. Every time, the server behaved fine during the insert, then it finished inserting and briefly went into 'freeing items' state, and then the host became unresponsive. IO use shot up (expected, with a large flush) and a single core was pretty heavily into iowait during the hang, but that's no reason that the whole system should become unresponsive. It must be acquiring some sort of global mutex.
[16 Aug 2012 19:31]
MySQL Verification Team
I guess global mutex is LOCK_log while copying from tmpdir->binlog.
[13 Mar 2013 22:27]
Sveta Smirnova
Thank you for the report. Verified as described. Delay in my case. $ cat result.log while true; do mysql -uroot -S /tmp/mysql_ssmirnova5.sock -e "select now()" >> bug66363_1.log; sleep 1; done 2131 now() 2132 2013-03-13 19:10:54 2133 now() 2134 2013-03-13 19:12:24 9958 2013-03-13 19:45:47 9959 now() 9960 2013-03-13 19:45:47 9961 now() 9962 2013-03-13 19:45:57 9963 now() 9964 2013-03-13 19:45:57 9965 now() while true; do mysql -uroot -S /tmp/mysql_ssmirnova5.sock -e "select now()" test >> bug66363_2.log; sleep 1; done 3872 2013-03-13 19:45:47 3873 now() 3874 2013-03-13 19:45:57 3875 now() To repeat: 1. Start server with options: ./libexec/mysqld --defaults-file=support-files/my-small.cnf --basedir=. --datadir=./data --innodb-file-per-table --log-bin --binlog-format=row --socket=/tmp/mysql_ssmirnova5.sock --port=33051 --log-error --innodb_buffer_pool_size=8G & 2. Use table from bug #43968: CREATE TABLE `test_log` ( `SEQ` bigint(20) unsigned NOT NULL AUTO_INCREMENT, `TEST_ID` int(11) NOT NULL, `TEST_NAME` varchar(20) COLLATE utf8_bin NOT NULL, PRIMARY KEY (`SEQ`,`TEST_ID`), KEY `IX_TEST_ID` (`TEST_ID`) ) ENGINE=InnoDB AUTO_INCREMENT=32171868 DEFAULT CHARSET=utf8 COLLATE=utf8_bin /*!50100 PARTITION BY HASH (TEST_ID) PARTITIONS 50 */; Run test case from bug #43968 until table is approximately 4.5G 3. Run in parallel session two jobs which SELECT NOW() and log results: while true; do mysql -uroot -S /tmp/mysql_ssmirnova5.sock -e "select now()" >> bug66363_1.log; sleep 1; done & while true; do mysql -uroot -S /tmp/mysql_ssmirnova5.sock -e "select now()" test >> bug66363_2.log; sleep 1; done 4. Run huge DELETE: mysql> begin; Query OK, 0 rows affected (0.00 sec) mysql> delete from test_log; commit; Query OK, 32171867 rows affected (32 min 20.19 sec) mysql> commit; Query OK, 0 rows affected (16.78 sec) 5. Stop jobs which log SELECT NOW() output into files. 6. Create LEX file: $ cat verify.l %option noyywrap warn %x IDATE IHOUR IMINUTE %{ int oldtime = 0; int time = 0; %} %% "2013-03-13 " { BEGIN IDATE; } <IDATE>[0-9][0-9] { time += atoi(yytext)*3600; } <IDATE>: { BEGIN IHOUR; } <IHOUR>[0-9][0-9] { time += atoi(yytext)*60; } <IHOUR>: { BEGIN IMINUTE; } <IMINUTE>[0-9][0-9] { time += atoi(yytext); if ( 0 != oldtime && 1 < (time - oldtime)) { fprintf(yyout, "Difference found: difference: - %8d sec, time - %8d, oldtime - %8d\n", time - oldtime, time, oldtime); } oldtime = time; time = 0; BEGIN INITIAL; } "now()" { } \n {} . {} %% main(int argc, char **argv) { yylex(); printf("Parsing finished\n"); } 7. Compile it: $ flex verify.l $ gcc lex.yy.c -o verify 8. Run it on log files: $ ./verify <bug66363_1.log Difference found: difference: - 2 sec, time - 68045, oldtime - 68043 ... Difference found: difference: - 90 sec, time - 69144, oldtime - 69054 ... Difference found: difference: - 2 sec, time - 71145, oldtime - 71143 Difference found: difference: - 10 sec, time - 71157, oldtime - 71147 Difference found: difference: - 2 sec, time - 71245, oldtime - 71243 ... Parsing finished $ ./verify <bug66363_2.log Difference found: difference: - 2 sec, time - 69203, oldtime - 69201 ... Difference found: difference: - 10 sec, time - 71157, oldtime - 71147 ... Difference found: difference: - 2 sec, time - 71403, oldtime - 71401 Parsing finished 9. Observe the issue.
[13 Mar 2013 22:31]
Sveta Smirnova
Huger test crashed debug server for me: mysqld: sql_class.cc:503: void Diagnostics_area::set_ok_status(THD*, ha_rows, ulonglong, const char*): Assertion `! is_set()' failed. 22:52:44 UTC - mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=16384 read_buffer_size=262144 max_used_connections=4 max_threads=151 thread_count=2 connection_count=2 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 49930 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x2aacd00008e8 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 45bb5108 thread_stack 0x20000 ./libexec/mysqld(my_print_stacktrace+0x35)[0xb35717] ./libexec/mysqld(handle_fatal_signal+0x3b8)[0x813c1c] /lib64/libpthread.so.0[0x3e2f40e7c0] /lib64/libc.so.6(gsignal+0x35)[0x3e2ec30265] /lib64/libc.so.6(abort+0x110)[0x3e2ec31d10] /lib64/libc.so.6(__assert_fail+0xf6)[0x3e2ec296e6] ./libexec/mysqld(_ZN16Diagnostics_area13set_ok_statusEP3THDyyPKc+0x45)[0x67e07d] ./libexec/mysqld(_Z5my_okP3THDyyPKc+0x42)[0x5a77e8] ./libexec/mysqld(_Z21mysql_execute_commandP3THD+0x67d9)[0x6b52f0] ./libexec/mysqld(_Z11mysql_parseP3THDPcjPPKc+0x2ef)[0x6ba94b] ./libexec/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0xe45)[0x6acc1d] ./libexec/mysqld(_Z10do_commandP3THD+0x27e)[0x6abae1] ./libexec/mysqld(handle_one_connection+0x14c)[0x6a9c27] /lib64/libpthread.so.0[0x3e2f4064a7] /lib64/libc.so.6(clone+0x6d)[0x3e2ecd3c2d] Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (1de8f1c8): commit Connection ID (thread ID): 74668 Status: NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. For bigger test I created a table like: CREATE TABLE `t0` ( `f1` blob ) ENGINE=InnoDB DEFAULT CHARSET=latin1 Then filled it, so size of *ibd file is about 450MB Then I created 9 copies of this table: CREATE TABLE t1 LIKE t0; INSERT INTO t1 SELECT * FROM t0;, then run following transaction: $cat bug66363.sql begin; delete from t1; delete from t2; delete from t3; delete from t4; delete from t5; delete from t6; delete from t7; delete from t8; delete from t9; commit; Server crashed when commit was executed.