Bug #9755 MySQL server dumps core on SMP box with bdb-deadlock test
Submitted: 8 Apr 2005 13:01
Modified: 27 Jul 2005 15:03
Reporter: Sivakumar K
Status: Can't repeat
Category: MySQL Server
Severity: S2 (Serious)
Version: 4.0.24
OS: Linux (RHEL 3)
Assigned to: Matthew Lord
CPU Architecture: Any

[8 Apr 2005 13:01] Sivakumar K
Description:
The MySQL server dumps core when the bdb-deadlock test case is run. This happens with MySQL versions 4.0.23a and 4.0.24 on Red Hat Enterprise Linux 3 on a Hyper-Threaded server.
cat /proc/version gives: Linux version 2.4.21-15.ELsmp (bhcompile@bugs.build.redhat.com) (gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-34)) #1 SMP Thu Apr 22 00:18:24 EDT 2004

The backtrace of the core from gdb is:
#0  0xb75b9e8e in pthread_kill () from /lib/tls/libpthread.so.0
#1  0x081962af in write_core ()
#2  0x081021c3 in handle_segfault ()
#3  <signal handler called>
#4  0x081ed8ac in __dd_abort ()
#5  0x081ecb1a in lock_detect ()
#6  0x081eaa3c in __lock_get_internal ()
#7  0x081ea0e2 in lock_get ()
#8  0x081d9b27 in __db_lget ()
#9  0x081f78bf in __bam_c_first ()
#10 0x081f6642 in __bam_c_get ()
#11 0x081d4787 in __db_c_get ()
#12 0x08168711 in ha_berkeley::rnd_next ()
#13 0x0815e13a in rr_sequential ()
#14 0x08136443 in join_init_read_record ()
#15 0x0813593b in sub_select ()
#16 0x08130c1d in do_select ()
#17 0x0812b173 in mysql_select ()
#18 0x08129cd1 in handle_select ()
#19 0x0810f6c8 in mysql_execute_command ()
#20 0x08113bfd in mysql_parse ()
#21 0x0810e707 in dispatch_command ()
#22 0x0810e288 in do_command ()
#23 0x0810dbdd in handle_one_connection ()
#24 0xb75b6dec in start_thread () from /lib/tls/libpthread.so.0
#25 0xb7426e8a in clone () from /lib/tls/libc.so.6

How to repeat:
After untarring the MySQL source, run the following commands:

./configure --prefix=/root/siva/MySQL --with-extra-charsets=complex --enable-thread-safe-client --enable-local-infile --enable-assembler --disable-shared --with-client-ldflags=-all-static --with-berkeley-db --with-mysqld-user=mysql --with-mysqld-ldflags=-all-static

make

cd mysql-test

./mysql-test-run --local bdb-deadlock

Running the last command (./mysql-test-run ...) 8-10 times produces a test failure and a server core dump in 2-3 of the runs (a failure rate of about 20-25%).
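Because the failure is intermittent, it can help to automate the repeated runs and count how many fail. The following is a minimal sketch, not part of the original report: the helper name count_failures is hypothetical, and it assumes that mysql-test-run exits with a non-zero status when the test fails.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and print how many runs
# exited with a non-zero status.
# Usage: count_failures N command [args...]
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's own output; only the exit status is counted.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}

# Example, run from the mysql-test directory (assumes a non-zero exit
# status on test failure):
#   count_failures 10 ./mysql-test-run --local bdb-deadlock
```

With the failure rate reported above, 10 runs would be expected to print a count of roughly 2 or 3. The helper only counts exit statuses, so any core file still has to be located and inspected separately.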
[8 Apr 2005 13:05] Sivakumar K
gdb backtrace when MySQL is built with --with-debug.
It gives the file names and line numbers where the crash occurs.
#0  0x0831aef1 in kill ()
#1  0x083018a5 in pthread_kill ()
#2  0x0813275b in write_core (sig=11) at stacktrace.c:220
#3  0x0808c7a4 in handle_segfault (sig=11) at mysqld.cc:1814
#4  0x08303ce0 in __pthread_sighandler ()
#5  <signal handler called>
#6  0x0818c6ad in __dd_abort (dbenv=0x84a95e8, info=0x8b38478) at ../../bdb/lock/lock_deadlock.c:572
#7  0x0818b903 in lock_detect (dbenv=0x84a95e8, flags=0, atype=1, abortp=0xbe3fed58) at ../../bdb/lock/lock_deadlock.c:208
#8  0x08189742 in __lock_get_internal (lt=0x84b5bb8, locker=2147483664, flags=0, obj=0x8b37458, lock_mode=DB_LOCK_READ,
    lock=0x8b36e7c) at ../../bdb/lock/lock.c:586
#9  0x08188de8 in lock_get (dbenv=0x84a95e8, locker=2147483664, flags=0, obj=0x8b37458, lock_mode=DB_LOCK_READ,
    lock=0x8b36e7c) at ../../bdb/lock/lock.c:366
#10 0x081781cf in __db_lget (dbc=0x8b37408, flags=0, pgno=3, mode=DB_LOCK_READ, lkflags=0, lockp=0x8b36e7c)
    at ../../bdb/db/db_meta.c:301
#11 0x081980b4 in __bam_c_first (dbc=0x8b37408) at ../../bdb/btree/bt_cursor.c:1457
#12 0x08196e37 in __bam_c_get (dbc=0x8b37408, key=0x8b36a74, data=0xbe3fefb4, flags=19, pgnop=0xbe3fef50)
    at ../../bdb/btree/bt_cursor.c:882
#13 0x08172afe in __db_c_get (dbc_arg=0x8b36d88, key=0x8b36a74, data=0xbe3fefb4, flags=19) at ../../bdb/db/db_cam.c:620
#14 0x080fc555 in ha_berkeley::rnd_next (this=0x8b369d0, buf=0x8b36b00 "&#65533;") at ha_berkeley.cc:1576
#15 0x080efda1 in rr_sequential (info=0x8b204ec) at records.cc:181
#16 0x080c27e8 in join_init_read_record (tab=0x8b204c8) at sql_select.cc:5213
#17 0x080c1b9e in sub_select (join=0xbe3ff154, join_tab=0x8b204c8, end_of_records=144) at sql_select.cc:4785
#18 0x080c1964 in do_select (join=0xbe3ff154, fields=0x8b204c8, table=0x0, procedure=0x0) at sql_select.cc:4696
#19 0x080b98b8 in mysql_select (thd=0x8af83c0, tables=0x0, fields=@0x8af856c, conds=0x8b20278, order=0x0, group=0x0,
    having=0x0, proc_param=0x0, select_options=18387968, result=0x8b202e8) at sql_select.cc:1036
#20 0x080b7369 in handle_select (thd=0x8af83c0, lex=0x0, result=0x8b202e8) at sql_select.cc:183
#21 0x0809c0ce in mysql_execute_command () at sql_parse.cc:2077
#22 0x0809e3b9 in mysql_parse (thd=0x8af83c0, inBuf=0x8af84f0 "\001", length=29) at sql_parse.cc:3055
#23 0x08099c2b in dispatch_command (command=COM_QUERY, thd=0x8af83c0, packet=0x8b180e1 "", packet_length=29)
    at sql_parse.cc:1089
#24 0x08099686 in do_command (thd=0x8af83c0) at sql_parse.cc:959
#25 0x08098da2 in handle_one_connection (arg=0x0) at sql_parse.cc:743
#26 0x082ff615 in pthread_start_thread ()
#27 0x0833cffa in clone ()
[8 Apr 2005 15:11] Sivakumar K
SUGGESTED FIX:

file: bdb/lock/lock_deadlock.c
The cause of this issue is that lockp is NULL, so dereferencing lockp->obj causes a SIGSEGV.
Can the fix below (bail out if lockp is NULL) be used to solve this issue?

The diff -Nru output for the file is:

--- lock_deadlock.c.old 2005-04-08 20:36:51.000000000 +0530
+++ lock_deadlock.c     2005-04-08 20:37:32.000000000 +0530
@@ -563,6 +563,9 @@
                        __lock_freelocker(lt, region, lockerp, ndx);
                        goto out;
                }
+                else
+                    goto out;
+
        } else if (R_OFFSET(&lt->reginfo, lockp) != info->last_lock ||
            lockp->status != DB_LSTAT_WAITING) {
                ret = DB_ALREADY_ABORTED;
[12 Apr 2005 14:05] Sivakumar K
I made the change, and the bdb-deadlock test now passes on an SMP machine without crashing the MySQL server.
[7 May 2005 8:13] Jim Winstead
We upgraded the bundled bdb from 4.0 to 4.1, and this bug does not appear to exist in that version.
[28 Jun 2005 19:47] MySQL Verification Team
I was unable to reproduce the reported behavior with the current source on Slackware 10.1:
Linux version 2.4.29 (root@midas) (gcc version 3.3.4) #6 Thu Jan 20 16:30:37 PST 2005
[27 Jul 2005 15:03] Matthew Lord
I could not repeat the failure on RHAS 3 using our official 4.0.25-max binaries (mysql-max-4.0.25-pc-linux-gnu-i686).

I took these steps:
cd mysql-test
../bin/mysqld --no-defaults --skip-grant-tables     --basedir=.. --datadir=mysql-test/var/master-data --skip-innodb  --skip-warnings &
./mysql-test-run t/bdb-deadlock.test (REPEAT)

I got no failures in about 20 attempts.