Bug #87129 Unstable test main.basedir
Submitted: 20 Jul 2017 2:59 Modified: 22 Sep 2017 3:44
Reporter: Laurynas Biveinis (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:Tests: Server Severity:S3 (Non-critical)
Version:8.0.2, 8.0.3 OS:Any
Assigned to: CPU Architecture:Any
Tags: mtr

[20 Jul 2017 2:59] Laurynas Biveinis
Description:
Test main.basedir regularly fails in one of two ways.

How to repeat:
./mtr basedir
[20 Jul 2017 3:00] Laurynas Biveinis
1):

main.basedir                             [ fail ]
        Test ended at 2017-07-20 05:51:12

CURRENT_TEST: main.basedir
system failed on 'mysqld --no-defaults --explicit_defaults_for_timestamp --socket=/Users/laurynas/percona/obj-mysql-8.0.2-pp-asan-debug/mysql-test/var/tmp/mysqld.1.sock --port=13000 --log-syslog=0 --skip-ssl --secure-file-priv="" --datadir=/Users/laurynas/percona/obj-mysql-8.0.2-pp-asan-debug/mysql-test/var/tmp/basedir_test --log-error=/Users/laurynas/percona/obj-mysql-8.0.2-pp-asan-debug/mysql-test/var/tmp/server.log --init-file=/Users/laurynas/percona/obj-mysql-8.0.2-pp-asan-debug/mysql-test/var/tmp/shutdown.sql':  at /Users/laurynas/percona/obj-mysql-8.0.2-pp-asan-debug/mysql-test/var/tmpThjrgW line 11.
mysqltest: At line 85: command "perl" failed with error 2. my_errno=175

Then, var/tmp/server.log contains:

2017-07-20T02:51:09.927451Z 0 [Note] InnoDB: Completed initialization of buffer pool
2017-07-20T02:51:09.947042Z 0 [ERROR] [FATAL] InnoDB: Table flags are 0 in the data dictionary but the flags in file ./ibdata1 are 0x4800!
2017-07-20 05:51:09 0x7fffd0c053c0  InnoDB: Assertion failure in thread 140736695653312 in file ut0ut.cc line 916
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
02:51:09 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=0
max_threads=151
thread_count=0
connection_count=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 68215 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
0   mysqld                              0x000000010d3f5a1a my_print_stacktrace + 58
1   mysqld                              0x000000010d377c17 handle_fatal_signal + 700
2   libsystem_platform.dylib            0x00007fffc7efab3a _sigtramp + 26
3   ???                                 0x00007fff52df1a20 0x0 + 140734583740960
4   libsystem_c.dylib                   0x00007fffc7d7f420 abort + 129
5   mysqld                              0x000000010d5cfe45 _Z23ut_dbg_assertion_failedPKcS0_m + 161
6   mysqld                              0x000000010d5d2848 _ZN2ib5fatalD2Ev + 82
7   mysqld                              0x000000010d5d287b _ZN2ib5fatalD1Ev + 9
8   mysqld                              0x000000010d4864fc _ZL18fil_node_open_fileP10fil_node_t + 2437
9   mysqld                              0x000000010d48f5b8 _ZL23fil_node_prepare_for_ioP10fil_node_tP12fil_system_tP11fil_space_t + 191
10  mysqld                              0x000000010d48fc73 _Z6fil_ioRK9IORequestbRK9page_id_tRK11page_size_tmmPvS8_ + 816
11  mysqld                              0x000000010d459544 _ZL17buf_read_page_lowP7dberr_tbmmRK9page_id_tRK11page_size_tb + 384
12  mysqld                              0x000000010d4596a6 _Z13buf_read_pageRK9page_id_tRK11page_size_t + 59
13  mysqld                              0x000000010d445e3b _Z16buf_page_get_genRK9page_id_tRK11page_size_tmP11buf_block_tmPKcmP5mtr_tb + 1885
14  mysqld                              0x000000010d5c10bd _Z31trx_rseg_get_n_undo_tablespacesPm + 251
15  mysqld                              0x000000010d5a5879 _Z34innobase_start_or_create_for_mysqlv + 6734
16  mysqld                              0x000000010d4ddcf1 _ZL13innobase_initPv + 3560
17  mysqld                              0x000000010ce6d1de _Z24ha_initialize_handlertonP13st_plugin_int + 79
18  mysqld                              0x000000010d2cbfc1 _ZL17plugin_initializeP13st_plugin_int + 79
19  mysqld                              0x000000010d2caffd _Z11plugin_initPiPPci + 1556
20  mysqld                              0x000000010d36f82c _Z11mysqld_mainiPPc + 2902
21  libdyld.dylib                       0x00007fffc7ceb235 start + 1
22  ???                                 0x000000000000000b 0x0 + 11
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
[20 Jul 2017 3:03] Laurynas Biveinis
2):

main.basedir                             [ fail ]
        Test ended at 2017-07-20 05:54:58

MTR's internal check of the test case 'main.basedir' failed.
This means that the test case does not preserve the state that existed
before the test case was executed.  Most likely the test case did not
do a proper clean-up. It could also be caused by the previous test run
by this thread, if the server wasn't restarted.
This is the diff of the states of the servers before and after the
test case was executed:
mysqltest: Results saved in '/home/laurynas/obj-8.0-asan-debug-openssl/mysql-test/var/tmp/check-mysqld_1.result'.
mysqltest: Logging to '/home/laurynas/obj-8.0-asan-debug-openssl/mysql-test/var/tmp/check-mysqld_1.log'.
mysqltest: Connecting to server localhost:13000 (socket /home/laurynas/obj-8.0-asan-debug-openssl/mysql-test/var/tmp/mysqld.1.sock) as 'root', connection 'default', attempt 0 ...
mysqltest: Connect attempt 0/500 failed: 2002: Can't connect to local MySQL server through socket '/home/laurynas/obj-8.0-asan-debug-openssl/mysql-test/var/tmp/mysqld.1.sock' (2)
...
mysqltest: Could not open connection 'default' after 500 attempts: 2002 Can't connect to local MySQL server through socket '/home/laurynas/obj-8.0-asan-debug-openssl/mysql-test/var/tmp/mysqld.1.sock' (2)
not ok
safe_process[2282]: Child process: 2283, exit: 1

Leaving the temp files around does not seem to indicate the cause of this issue, except that the regular server error log ends at server shutdown and does not contain the final server startup sequence for the testcase.
[20 Jul 2017 7:39] MySQL Verification Team
Hello Laurynas,

Thank you for the report.
I could not repeat the issue at my end, anything else required(cmake, os
etc details) to trigger the issue? Please let me know.

--
[umshastr@hod03]/export/umesh/server/binaries/Trunk/mysql-8.0.2-dmr-linux-glibc2.12-x86_64/mysql-test: ./mtr --debug basedir
Logging: ./mtr  --debug basedir
MySQL Version 8.0.2
Too long tmpdir path '/export/umesh/server/binaries/Trunk/mysql-8.0.2-dmr-linux-glibc2.12-x86_64/mysql-test/var/tmp'  creating a shorter one...
 - using tmpdir: '/tmp/4eypWjhVw8'

Checking supported features...
 - SSL connections supported
 - binaries are debug compiled
Collecting tests...
Checking leftover processes...
Removing old var directory...
Creating var directory '/export/umesh/server/binaries/Trunk/mysql-8.0.2-dmr-linux-glibc2.12-x86_64/mysql-test/var'...
Installing system database...
Using parallel: 1

==============================================================================

TEST                                      RESULT   TIME (ms) or COMMENT
--------------------------------------------------------------------------

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
main.basedir                             [ pass ]  28254
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 28.254 of 72 seconds executing testcases

Completed: All 1 tests were successful.

Thanks,
Umesh
[21 Jul 2017 2:58] Laurynas Biveinis
I am seeing it quite often on a variety of platforms / build configurations (various versions of Ubuntu, Debian, Mac, debug builds w/ and w/o ASan), but no fully deterministic procedure. A side load and high MTR --repeat seems to help.
[2 Aug 2017 10:53] MySQL Verification Team
I couldn't repeat with most recent source server with debug/release binaries on Centos 7 and MAC Sierra 10.12.6:

MySQL Version 8.0.3
Checking supported features...
 - SSL connections supported
Collecting tests...
Removing old var directory...
Creating var directory '/home/miguel/build/2017AUG01R/mysql-trunk/mysql-test/var'...
Installing system database...
Using parallel: 1

==============================================================================

TEST                                      RESULT   TIME (ms) or COMMENT
--------------------------------------------------------------------------

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 13000..13009
main.basedir                             [ pass ]  23526
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 23.526 of 34 seconds executing testcases

Completed: All 1 tests were successful.
[3 Aug 2017 9:46] MySQL Verification Team
Finally observed with the cmake provided in earlier bug.

Thanks,
Umesh
[3 Aug 2017 9:46] MySQL Verification Team
test results

Attachment: 87129.rtf (text/rtf), 92.42 KiB.

[3 Aug 2017 9:50] MySQL Verification Team
test results

Attachment: 87129.ol7.results (application/octet-stream, text), 100.15 KiB.

[22 Sep 2017 3:44] Laurynas Biveinis
Happens regularly with 8.0.3 too.
[11 Dec 2017 9:18] Ashish Padiyar
Unable to reproduce the failure symptoms even after running the test on binary build with cmake options provided by Umesh. Also tried with large --repeat and parallel values.