Bug #20326 test federated crashes mysqld with 5.1.12-beta
Submitted: 7 Jun 2006 22:20 Modified: 18 Jul 2006 22:05
Reporter: Marc Alff Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:5.1.12-beta OS:GNU/Linux AMD64
Assigned to: Bugs System CPU Architecture:Any

[7 Jun 2006 22:20] Marc Alff
Description:
The test "federated" kills mysqld with 5.1.12-beta

How to repeat:
Built the code from bk://mysql.bkbits.net/mysql-5.1-new on 2006-06-07,
make test : fails with the test named "federated"

mysqldev@weblab ~/mysql-5.1/sql $ ./mysqld --version
./mysqld  Ver 5.1.12-beta-debug for unknown-linux-gnu on x86_64 (Source distribution)

mysqldev@weblab ~/mysql-5.1/mysql-test $ ./mysql-test-run.pl federated -ddd :

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

federated                      [ fail ]

Errors are (from /home/mysqldev/mysql-5.1/mysql-test/var/log/mysqltest-time) :
mysqltest: At line 471: query 'SELECT * FROM federated.t1 WHERE (id = 5 AND col2 = 'Two two')
OR (col2 = 444 AND col3 = 4444444)' failed: 2013: Lost connection to MySQL server during query
(the last lines may be the most important ones)
Result from queries before failure can be found in r/federated.log

The call stack is :

Program received signal SIGSEGV, Segmentation fault.
cli_fetch_lengths (to=0x1c72160, column=0x8f8f8f8f8f8f8f8f, field_count=2408550335) at client.c:1136
Current language:  auto; currently c
(gdb) where
#0  cli_fetch_lengths (to=0x1c72160, column=0x8f8f8f8f8f8f8f8f, field_count=2408550335) at client.c:1136
#1  0x0000000000750c3e in mysql_fetch_lengths (res=0x1c720d8) at client.c:2808
#2  0x00000000006d698d in ha_federated::convert_row_to_internal_format (this=0x1c62c68, record=0x1c63068 "\005", row=0x8f8f8f8f8f8f8f8f) at ha_federated.cc:773
#3  0x00000000006d9a15 in ha_federated::rnd_next (this=0x1c62c68, buf=0x1c63068 "\005") at ha_federated.cc:2327
#4  0x00000000006d9b47 in ha_federated::rnd_pos (this=0x1c62c68, buf=0x1c63068 "\005", pos=0x1c616d8 "8MÈ\001") at ha_federated.cc:2374
#5  0x00000000006ae56c in rr_from_pointers (info=0x1c62a78) at records.cc:443
#6  0x00000000006a5862 in QUICK_INDEX_MERGE_SELECT::get_next (this=0x1c629d0) at opt_range.cc:7444
#7  0x00000000006ae34a in rr_quick (info=0x1c6dd98) at records.cc:301
#8  0x000000000064e427 in join_init_read_record (tab=0x1c6dd20) at sql_select.cc:10390
#9  0x000000000064d203 in sub_select (join=0x1c6c078, join_tab=0x1c6dd20, end_of_records=191) at sql_select.cc:9756
#10 0x000000000064cda3 in do_select (join=0x1c6c078, fields=0x1c287f8, table=0x0, procedure=0x0) at sql_select.cc:9520
#11 0x000000000063e0ad in JOIN::exec (this=0x1c6c078) at sql_select.cc:1760
#12 0x000000000063e677 in mysql_select (thd=0x1c28238, rref_pointer_array=0x1c28930, tables=0x1c45830, wild_num=1, fields=@0x1c287f8, conds=0x1c46928, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147764736, result=0x1c46a80, unit=0x1c28340, select_lex=0x1c286f8) at sql_select.cc:1922
#13 0x000000000063a391 in handle_select (thd=0x1c28238, lex=0x1c282a0, result=0x1c46a80, setup_tables_done_option=0) at sql_select.cc:238
#14 0x00000000005fce14 in mysql_execute_command (thd=0x1c28238) at sql_parse.cc:2546
#15 0x000000000060441a in mysql_parse (thd=0x1c28238, inBuf=0x1c45638 "SELECT * FROM federated.t1 WHERE (id = 5 AND col2 = 'Two two')\nOR (col2 = 444 AND col3 = 4444444)", length=29524640) at sql_parse.cc:5939
#16 0x00000000005fb1c7 in dispatch_command (command=COM_QUERY, thd=0x1c28238, packet=0x1c45699 "", packet_length=98) at sql_parse.cc:1764
#17 0x00000000005fa9a8 in do_command (thd=0x1c28238) at sql_parse.cc:1550
#18 0x00000000005f9c0e in handle_one_connection (arg=0x1c72160) at sql_parse.cc:1190
#19 0x00002aaaaade114a in start_thread () from /lib/tls/libpthread.so.0
#20 0x00002aaaab34b943 in clone () from /lib/tls/libc.so.6
#21 0x0000000000000000 in ?? ()

See the full DDD session as attachment

Suggested fix:
None available.
[7 Jun 2006 22:22] Marc Alff
DDD session

Attachment: BUG-MySQL-20326 (application/octet-stream, text), 11.32 KiB.

[8 Jun 2006 12:23] Valeriy Kravchuk
Test passed OK with older 5.1-BK build (ChangeSet@1.2179.1.1, 2006-06-05 15:07:27-07:00).
[9 Jun 2006 7:54] Marc Alff
Tried to narrow down the issue and found the following :

5.1.11-beta + BUILD/compile-amd64-max :
./mysql-test-run.pl --force : all the tests pass

5.1.11-beta + BUILD/compile-amd64-debug-max :
./mysql-test-run.pl --force : test case federated fails, all the other tests pass

5.1.12-beta (BK) + BUILD/compile-amd64-max :
./mysql-test-run.pl --force :
- test federated pass
- tests system_mysql_db_fix, view fails
- other tests pass

5.1.12-beta (BK) + BUILD/compile-amd64-debug-max :
./mysql-test-run.pl --force :
- test case federated fails
- tests system_mysql_db_fix, view fails
- other tests pass

Somehow, compiling with debug causes the test case federated to fail,
both with 5.1.11-beta and 5.1.12-beta
[30 Jun 2006 13:09] Marc Alff
Running the test case 'federated' under valgrind shows that free'd memory is used, so this is a confirmed bug in the code.

See the attached master.err file, look for "Invalid read" errors.

Reproduced with 5.1-BK as of 2006-06-28
[30 Jun 2006 13:10] Marc Alff
valgrind error log

Attachment: master.err (application/octet-stream, text), 129.46 KiB.

[30 Jun 2006 21:23] Antony Curtis
I am getting the same crash in this test using current 5.1 bk tree
This is on Gentoo Linux amd64
[1 Jul 2006 10:55] Valeriy Kravchuk
Still can't repeat with latest 5.1.12-BK (ChangeSet@1.2228.1.1, 2006-07-01 01:37:20+04:00) built --with-debug=full (test passes). 

But this may be also bug repeatable on gentoo/AMD64 only. So, with valgrind error log uploaded, I think, we have enough information for developers to take a look at this report.
[2 Jul 2006 19:01] Patrick Galbraith
Looking into this issue. I also need to find a gentoo box in our build farm.
[18 Jul 2006 18:41] Patrick Galbraith
I cannot reproduce this. I will try with valgrind. See too if you can reproduce this.

patg@govinda:~/mysql-build/mysql-5.1-engines/mysql-test> uname -a
Linux govinda 2.6.13-15.8-default #1 Tue Feb 7 11:07:24 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux
patg@govinda:~/mysql-build/mysql-5.1-engines/mysql-test> ../sql/mysqld --version
../sql/mysqld  Ver 5.1.12-beta-debug for suse-linux on x86_64 (Source distribution)

patg@govinda:~/mysql-build/mysql-5.1-engines/mysql-test> ./mysql-test-run.pl federated -ddd
Logging: ./mysql-test-run.pl federated -ddd
Using "../libtool" when running valgrind or debugger
Using ndbcluster if necessary, mysqld supports it
Setting mysqld to support SSL connections
Using MTR_BUILD_THREAD      = 0
Using MASTER_MYPORT         = 9306
Using MASTER_MYPORT1        = 9307
Using SLAVE_MYPORT          = 9308
Using SLAVE_MYPORT1         = 9309
Using SLAVE_MYPORT2         = 9310
Using NDBCLUSTER_PORT       = 9310
Using NDBCLUSTER_PORT_SLAVE = 9311
Using IM_PORT               = 9312
Using IM_MYSQLD1_PORT       = 9313
Using IM_MYSQLD2_PORT       = 9314
Killing Possible Leftover Processes
Removing Stale Files
Installing Master Database
Installing Master Database
Installing Slave Database
Installing Slave Database
Installing Slave Database
=======================================================
Finding  Tests in the 'main' suite
Starting Tests in the 'main' suite

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

federated                      [ pass ]          44563
-------------------------------------------------------
All 1 tests were successful.
[18 Jul 2006 22:05] Marc Alff
Reproduced both the crash and the valgrind error,
with bk://mysql.bkbits.net/mysql-5.1 as of 2006-07-17.

uname -a :
Linux weblab 2.6.15-gentoo-r7 #10 PREEMPT Wed Mar 29 15:48:01 MST 2006 x86_64 AMD Athlon(tm) 64 Processor 3200+ GNU/Linux

./mysqld --help :
./mysqld  Ver 5.1.12-beta-debug for unknown-linux-gnu on x86_64 (Source distribution)

I disagree with changing the bug status to "can't repeat" based on the "pass" result alone.

Since the issue reported by valgrind shows that invalid memory is used,
the test execution can "PASS" or crash in a very unreproductible way,
depending on debug/release builds, use of ddd, platform, etc
(or anything that modifies the malloc/free patterns, really).

In my opinion, a test that shows valgrind errors should be considered failed,
regardless of the "PASS"/"FAIL" report given, and in this case the bug *is* reproduced.

Now, if the federated test does pass *and* is valgrind clean,
the bug report can be flaged as c'ant repeat (but it's not the case).

By the same token, if a code change is needed to fix valgrind errors
(assuming they can be reproduced), a "PASS" result under a clean run is
sufficient verification of the fix, even if the crash could not be observed
due to difficulties reprocucing it (understandable due to the nature of the bug).

If needed, I can also test code changes with my machine, to test fixes under the environment that cause the initial crash, for additional verification.
(see email in private attachments)