Bug #59351 rqg_info_schema failure in PB2 actual test duration (603 seconds) is more than 2
Submitted: 7 Jan 2011 14:02 Modified: 9 Jan 2015 10:23
Reporter: SaiKumar V Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Tests Severity:S3 (Non-critical)
Version:mysql-trunk OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Tags: pushbuild, rqg_pb2, test failure

[7 Jan 2011 14:02] SaiKumar V
Description:
rqg_info_schema test is failing in Pushbuild trunk. Please see http://pb2.norway.sun.com/web.py?template=show_pushes&nocache=yes&branch=mysql-trunk

# 2011-01-06T14:45:45 Actual test duration (603 seconds) is more than 2 times the desired duration (300 seconds) # 2011-01-06T14:45:46 Killing child process with pid 13439... # 2011-01-06T14:45:46 Killing child process with pid 13443... # 2011-01-06T14:45:46 Kill GenTest::ErrorFilter(13427) # 2011-01-06T14:45:46 Server deadlock reported, initiating analysis... # 2011-01-06T14:45:46 Killing mysqld with pid 13397 with SIGHUP in order to force debug output. # 2011-01-06T14:45:48 Killing mysqld with pid 13397 with SIGSEGV in order to capture core. # 2011-01-06T14:46:10 The last 100 lines from /export/home/pb2/test/sb_1-2744546-1294320921.49/mysql-5.6.2-m5-linux-i686-test/vardirs/data//../mysql.err :
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 337977 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
user                                 1       0     1  Not in use
mysql          user                                 1       0     1  Not in use
mysql          user                                 1       0     1  Not in use
mysql          help_topic                           1       0     1  Not in use
mysql          help_topic                           1       0     1  Not in use
mysql          func                                 1       0     1  Not in use
mysql          func                                 1       0     1  Not in use
performance_sc rwlock_instances                     1       0     1  Not in use
test           w                                    1       0     1  Not in use
mysql          time_zone_transition                 1       0     1  Not in use
mysql          time_zone_transition                 1       0     1  Not in use
mysql          time_zone_transition                 1       0     1  Not in use
test           E                                    1       0     1  Not in use
test           E                                    1       0     1  Not in use
test           E                                    1       0     1  Not in use
mysql          slave_relay_log_info                 1       0     1  Not in use
mysql          slave_relay_log_info                 1       0     1  Not in use
test           AA                                   1       0     1  Not in use
test           AA                                   1       0     1  Not in use
mysql          time_zone_transition_type            1       0     1  Not in use
mysql          time_zone_transition_type            1       0     1  Not in use
performance_sc performance_timers                   1       0     1  Not in use
mysql          time_zone                            1       0     1  Not in use
mysql          time_zone                            1       0     1  Not in use
mysql          time_zone                            1       0     1  Not in use
test           BB                                   1       0     1  Not in use
test           BB                                   1       0     1  Not in use
performance_sc setup_instruments                    1       0     1  Not in use
performance_sc file_summary_by_instance             1       0     1  Not in use
mysql          slave_master_info                    1       0     1  Not in use
mysql          slave_master_info                    1       0     1  Not in use

--

Using host libthread_db library "/lib/libthread_db.so.1".
Core was generated by `/export/home/pb2/test/sb_1-2744546-1294320921.49/mysql-5.6.2-m5-linux-i686-test'.
Program terminated with signal 11, Segmentation fault.
#0  0x00b64402 in __kernel_vsyscall ()
#0  0x00b64402 in __kernel_vsyscall ()
#1  0x0089f067 in pthread_kill () from /lib/libpthread.so.0
#2  0x085181fe in my_write_core (sig=11)
    at /export/home/pb2/build/sb_0-2744438-1294315679.8/mysql-5.6.2-m5/mysys/stacktrace.c:423
#3  0x08166335 in handle_segfault (sig=11)
    at /export/home/pb2/build/sb_0-2744438-1294315679.8/mysql-5.6.2-m5/sql/mysqld.cc:2511
#4  <signal handler called>
#5  0x00b64402 in __kernel_vsyscall ()
#6  0x007e8033 in poll () from /lib/libc.so.6
#7  0x08166997 in handle_connections_sockets ()
    at /export/home/pb2/build/sb_0-2744438-1294315679.8/mysql-5.6.2-m5/sql/mysqld.cc:5346
#8  0x0816b54d in mysqld_main (argc=25, argv=0x971d4c0)
    at /export/home/pb2/build/sb_0-2744438-1294315679.8/mysql-5.6.2-m5/sql/mysqld.cc:4816
#9  0x0815e742 in main (argc=)
    at /export/home/pb2/build/sb_0-2744438-1294315679.8/mysql-5.6.2-m5/sql/main.cc:24
#5  0x00b64402 in __kernel_vsyscall ()
 in /export/home/pb2/build/sb_0-2744438-1294315679.8/mysql-5.6.2-m5/sql/main.cc

The reason being its timing out (i.e test duration exceeds 2 times set duration) for some reason, resulting in RQG declaring a deadlock situation. 

Analysis by JohnEmb below,

According to the PB2 history, it looks like it started failing like this in daily-trunk-bugfixing on December 17.
On December 16 it failed with a different symptom (crash).
Prior to December 16 it did not fail regularly, but it has been quite unstable with various crashes before that.

In mysql-trunk, it started occurring after the merge by alik at 2010-12-22 22:31:37. This was a large merge from mysql-trunk-bugfixing.

How to repeat:
mysqld  Ver 5.6.2-m5-debug

Branch and build, or otherwise obtain the code from mysql-trunk. Referred to
below as environment variable $CODE.

Branch or otherwise obtain the Random Query Generator:
bzr branch lp:randgen

Run the test:
cd randgen

runall-new.pl \ 
--mysqld=--loose-skip-safemalloc \ 
--grammar=conf/runtime/information_schema.yy \ 
--threads=3 \ 
--duration=300 \ 
--mysqld=--log-output=file \ 
--reporters=Deadlock,ErrorLog,Backtrace \ 
--basedir=$CODE
--vardir=$PWD/vardirs \ 
--mysqld=--loose-table-lock-wait-timeout=1 \ 
--mysqld=--loose-lock-wait-timeout=1 \ 
--mysqld=--loose-innodb-lock-wait-timeout=1 \ 
--queries=100000 \ 
--testname=rqg_info_schema
[10 Jan 2011 10:30] Jon Olav Hauglid
I've looked at the stack traces and it looks like the test failure is due to
an information schema query taking too long to execute and not due to locking
issues.

According to http://forge.mysql.com/wiki/RandomQueryGeneratorReporters
RQG, can report deadlock if:

"The test itself took more than twice the desired running time, as specified via the --duration command-line option. This protection is in place to detect situations where a single query has hanged and prevents the test from completing. A false alarm may be triggered if the grammar generates queries that take much longer to execute as compared to the --duration desired."

And that's what I think happened here.

It looks like the long-running query is related to
INFORMATION_SCHEMA.`INNODB_BUFFER_PAGE`. It could be that the test started
failing due to changes in the usage of this table, either InnoDB or I_S related.

A possible workaround is to increase --duration or reduce --queries for the
test.

Reassigning this bug to Saikumar for further investigation (as discussed on IRC).
[3 Feb 2011 12:12] Sandeep Doddaballapur
An update ,

   The test has issues with spefic queries related to information schema.
   The duration issue was resolved by increasing the duration time to 3600 . At 3600 we dont see any error of kind actual test duration exceeding the required duration .
    Next issue which popped up was the "failed: 126 Incorrect key file for table" error for the same information schema query . I digged through the problem and it more looks like a space issue in TMPDIR . I changed the TMPDIR to point at disk location with ample amount of space but still the query is causing deadlock.

    The culprit query is :
mysql> SELECT * FROM (INFORMATION_SCHEMA . `SESSION_VARIABLES` AS table1 INNER JOIN INFORMATION_SCHEMA . `INNODB_BUFFER_PAGE` AS table2 ON ( table2 . `FREE_PAGE_CLOCK` = table1 . `VARIABLE_VALUE` ) ) WHERE NOT ( ( table1 . `VARIABLE_VALUE` <> table1 . `VARIABLE_NAME` AND table2 . `POOL_ID` > table2 . `TABLE_NAME`) AND table2 . `COMPRESSED_SIZE` = 2)   ORDER BY table1 . `VARIABLE_VALUE`;
104755 rows in set, 65535 warnings (1 min 58.40 sec) 

Note : resulting rowset and time is when it is executed manually.

Following is the crash thread in the test :
==========================================
Thread 2 (Thread 30732):
#0  0x000000382860e438 in pread64 () from /lib64/libpthread.so.0
#1  0x000000000090b2a1 in my_pread (Filedes=145, Buffer=0x2aaabc0c2210 "1\376\vSYNC_BINLOGET_SYSTEMRSISTENTD_INSTANCES_SIZE_SIZE", Count=10120, offset=11043925640, MyFlags=4)
    at /users/sandeepd/work/mysql-trunk/mysys/my_pread.c:75
#2  0x0000000000b15c10 in inline_mysql_file_pread (src_file=0xd5dfc0 "/users/sandeepd/work/mysql-trunk/storage/myisam/mi_dynrec.c", src_line=201, file=145, 
    buffer=0x2aaabc0c2210 "1\376\vSYNC_BINLOGET_SYSTEMRSISTENTD_INSTANCES_SIZE_SIZE", count=10120, offset=11043925640, flags=4)
    at /users/sandeepd/work/mysql-trunk/include/mysql/psi/mysql_file.h:1167
#3  0x0000000000b15cfc in mi_nommap_pread (info=0x2aaabc003bc0, Buffer=0x2aaabc0c2210 "1\376\vSYNC_BINLOGET_SYSTEMRSISTENTD_INSTANCES_SIZE_SIZE", Count=10120, 
    offset=11043925640, MyFlags=4) at /users/sandeepd/work/mysql-trunk/storage/myisam/mi_dynrec.c:201
#4  0x0000000000b39a57 in _mi_read_static_record (info=0x2aaabc003bc0, pos=11043925640, record=0x2aaabc0c2210 "1\376\vSYNC_BINLOGET_SYSTEMRSISTENTD_INSTANCES_SIZE_SIZE")
    at /users/sandeepd/work/mysql-trunk/storage/myisam/mi_statrec.c:178
#5  0x0000000000b39e5e in _mi_read_rnd_static_record (info=0x2aaabc003bc0, buf=0x2aaabc0c2210 "1\376\vSYNC_BINLOGET_SYSTEMRSISTENTD_INSTANCES_SIZE_SIZE", filepos=11043925640, 
    skip_deleted_blocks=0 '\000') at /users/sandeepd/work/mysql-trunk/storage/myisam/mi_statrec.c:263
#6  0x0000000000b33244 in mi_rrnd (info=0x2aaabc003bc0, buf=0x2aaabc0c2210 "1\376\vSYNC_BINLOGET_SYSTEMRSISTENTD_INSTANCES_SIZE_SIZE", filepos=11043925640)
    at /users/sandeepd/work/mysql-trunk/storage/myisam/mi_rrnd.c:58
#7  0x0000000000af6d3c in ha_myisam::rnd_pos (this=0x894d1f0, buf=0x2aaabc0c2210 "1\376\vSYNC_BINLOGET_SYSTEMRSISTENTD_INSTANCES_SIZE_SIZE", pos=0x8a18500 "")
    at /users/sandeepd/work/mysql-trunk/storage/myisam/ha_myisam.cc:1720
#8  0x0000000000745e63 in handler::ha_rnd_pos (this=0x894d1f0, buf=0x2aaabc0c2210 "1\376\vSYNC_BINLOGET_SYSTEMRSISTENTD_INSTANCES_SIZE_SIZE", pos=0x8a18500 "")
    at /users/sandeepd/work/mysql-trunk/sql/handler.cc:2237
#9  0x000000000082904c in rr_from_cache (info=0x89f85f0) at /users/sandeepd/work/mysql-trunk/sql/records.cc:665
#10 0x000000000060e5b9 in sub_select (join=0x89fc9c0, join_tab=0x89f8568, end_of_records=false) at /users/sandeepd/work/mysql-trunk/sql/sql_select.cc:17464
#11 0x000000000061c5e2 in do_select (join=0x89fc9c0, fields=0x89fcca8, table=0x0, procedure=0x0) at /users/sandeepd/work/mysql-trunk/sql/sql_select.cc:16997
#12 0x0000000000639b8f in JOIN::exec (this=0x89fc9c0) at /users/sandeepd/work/mysql-trunk/sql/sql_select.cc:3361
#13 0x0000000000633cb5 in mysql_select (thd=0x8889930, rref_pointer_array=0x888be70, tables=0x889f878, wild_num=1, fields=..., conds=0x8933588, og_num=1, order=0x8933790, 
    group=0x0, having=0x0, proc_param=0x0, select_options=2684619520, result=0x89fc9a0, unit=0x888b650, select_lex=0x888bc88)
    at /users/sandeepd/work/mysql-trunk/sql/sql_select.cc:3584
#14 0x0000000000639f51 in handle_select (thd=0x8889930, lex=0x888b5a0, result=0x89fc9a0, setup_tables_done_option=0) at /users/sandeepd/work/mysql-trunk/sql/sql_select.cc:323
#15 0x00000000005da8b6 in execute_sqlcom_select (thd=0x8889930, all_tables=0x889f878) at /users/sandeepd/work/mysql-trunk/sql/sql_parse.cc:4513
#16 0x00000000005db81f in mysql_execute_command (thd=0x8889930) at /users/sandeepd/work/mysql-trunk/sql/sql_parse.cc:2096
#17 0x00000000005e276f in mysql_parse (thd=0x8889930, 
    rawbuf=0x889ed00 "SELECT * FROM (INFORMATION_SCHEMA . `SESSION_VARIABLES` AS table1 INNER JOIN INFORMATION_SCHEMA . `INNODB_BUFFER_PAGE` AS table2 ON ( table2 . `FREE_PAGE_CLOCK` = table1 . `VARIABLE_VALUE` ) ) WHERE NOT ( ( table1 . `VARIABLE_VALUE` <> table1 . `VARIABLE_NAME` AND table2 . `POOL_ID` > table2 . `TABLE_NAME`) AND table2 . `COMPRESSED_SIZE` = 2)   ORDER BY table1 . `VARIABLE_VALUE`", length=381, parser_state=0x4a76a870) at /users/sandeepd/work/mysql-trunk/sql/sql_parse.cc:5550
#18 0x00000000005e3db8 in dispatch_command (command=COM_QUERY, thd=0x8889930, 
    packet=0x88a6ee1 "RGE_PAGE_SIZE\001\060\001\060\004\065\062\071\066\001\060\001\060\aUNKNOWN\001\060\001\060\002NO\001\060\001\060\001\060\373\373\001\060\001\060\001\060\bNOT_USED\aIO_NONE\002NO\001\060K", packet_length=382) at /users/sandeepd/work/mysql-trunk/sql/sql_parse.cc:1078
#19 0x00000000005e5301 in do_command (thd=0x8889930) at /users/sandeepd/work/mysql-trunk/sql/sql_parse.cc:815
#20 0x00000000006b631e in do_handle_one_connection (thd_arg=0x8889930) at /users/sandeepd/work/mysql-trunk/sql/sql_connect.cc:748
#21 0x00000000006b6417 in handle_one_connection (arg=0x8889930) at /users/sandeepd/work/mysql-trunk/sql/sql_connect.cc:684
#22 0x000000382860673d in start_thread () from /lib64/libpthread.so.0
#23 0x0000003827ad3d1d in clone () from /lib64/libc.so.6
==========================================

Still digging through the issue.
[7 Feb 2011 10:14] Sandeep Doddaballapur
The test faces issues with tmpdir space if the duration is increased to 3600

Tried running the test with tmpdir at a location with over 44G space but the query went out of space while executing. 

Following is the command line 
========================
setsid perl runall-new.pl --mtr-build-thread=950 --mysqld=--loose-skip-safemalloc --mysqld=--tmpdir=/data0/sd242371/tmpfil --grammar=conf/runtime/information_schema.yy --threads=3 --duration=3600 --mysqld=--log-output=file --reporters=Deadlock,ErrorLog,Backtrace --basedir=/data0/sd242371/work/mysql-trunk/ --mysqld=--loose-table-lock-wait-timeout=1 --mysqld=--loose-lock-wait-timeout=1 --mysqld=--loose-innodb-lock-wait-timeout=1 --queries=100000 --testname=rqg_info_schema > & runall_infoschema_run10.out &
========================

Following is the df -h output on the machine during the test run 
Before test run
=============================
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda3              65G   18G   44G  29% /
/dev/sda1              99M   24M   70M  26% /boot
tmpfs                 1.9G     0  1.9G   0% /dev/shm
=============================

During test run
=============================
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda3              65G   61G     0 100% /
/dev/sda1              99M   24M   70M  26% /boot
tmpfs                 1.9G     0  1.9G   0% /dev/shm
=============================