Bug #14928 Random server crashes on query that has DISTINCT & IN syntax
Submitted: 14 Nov 2005 19:57 Modified: 16 Nov 2005 17:02
Reporter: Nathan Goff Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S1 (Critical)
Version:5.0.15 OS:Linux (Red Hat EWS 3u4)
Assigned to: Assigned Account CPU Architecture:Any

[14 Nov 2005 19:57] Nathan Goff
Description:
We are experiencing random server crashes on a particular type of query that includes DISTINCT and IN syntax.  We are at a loss for what is wrong.  We just upgraded to 5.0.15 from 4.1.11.  We did not see this problem in 4.1.11 and we have 8 database servers that are exhibiting the same type of random crash.

The offending query: 

SELECT DISTINCT UnitName FROM Build WHERE UnitName LIKE 'Lib%'  AND BuildId IN (SELECT DISTINCT BuildID FROM Sod) ORDER BY UnitName LIMIT 0,10

Stack trace:

0x814e7f8 handle_segfault + 356
0xb75abd28 _end + -1358466344
(nil)
0x82a67e9 btr_search_build_page_hash_index + 509
0x82a4d98 btr_search_info_update_slow + 264
0x829a1a4 btr_cur_search_to_nth_level + 2068
0x827f8db row_sel_try_search_shortcut_for_mysql + 63
0x827c36d row_search_for_mysql + 7581
0x81f302b _ZN11ha_innobase10index_readEPcPKcj16ha_rkey_function + 303
0x812bbd0 _ZN31subselect_uniquesubquery_engine4execEv + 136
0x8128127 _ZN14Item_subselect4execEv + 19
0x812905b _ZN17Item_in_subselect8val_boolEv + 19
0x80f0307 _ZN4Item15val_bool_resultEv + 15
0x8108c83 _ZN17Item_in_optimizer7val_intEv + 71
0x80e303d _ZN4Item8val_boolEv + 41
0x810d80d _ZN13Item_cond_and7val_intEv + 65
0x8198f59 _Z20evaluate_join_recordP4JOINP13st_join_tableiPc + 173
0x8192fa2 _Z10sub_selectP4JOINP13st_join_tableb + 174
0x8198ca9 _Z9do_selectP4JOINP4ListI4ItemEP8st_tableP9Procedure + 281
0x818f8b2 _ZN4JOIN4execEv + 4018
0x81903cd _Z12mysql_selectP3THDPPP4ItemP13st_table_listjR4ListIS1_ES2_jP8st_orderSB_S2_SB_mP13select_resultP18st_select_lex_unitP13st_sel + 305
0x818c6ba _Z13handle_selectP3THDP6st_lexP13select_resultm + 234
0x81621ad _Z21mysql_execute_commandP3THD + 669
0x8168d82 _Z11mysql_parseP3THDPcj + 306
0x8160aa6 _Z16dispatch_command19enum_server_commandP3THDPcj + 1178
0x81605d1 _Z10do_commandP3THD + 129
0x815fab1 handle_one_connection + 569
0xb75a5dec _end + -1358490724
0xb74dfe8a _end + -1359301574

Pertinent log file info:

mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=268435456
read_buffer_size=1044480
max_used_connections=3
max_connections=100
threads_connected=2
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 466543 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=0xa55d3a40
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0x60cd3c, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x814e7f8
0x436e58
(nil)
0x82a67e9
0x82a4d98
0x829a1a4
0x827f8db
0x827c36d
0x81f302b
0x812bbd0
0x8128127
0x812905b
0x80f0307
0x8108c83
0x80e303d
0x810d80d
0x8198f59
0x8192fa2
0x8198ca9
0x818f8b2
0x81903cd
0x818c6ba
0x81621ad
0x8168d82
0x8160aa6
0x81605d1
0x815fab1
0x430dec
0xc7aa2a
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do 
resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0xabcd5f8 = SELECT DISTINCT UnitName FROM Build WHERE UnitName LIKE 'Lib%'  AND BuildId IN (SELECT DISTINCT BuildID FROM Sod) ORDER BY UnitName LIMIT 0,10
thd->thread_id=1058
The manual page at http://www.mysql.com/doc/en/Crashing.html contains
information that should help you find out what is causing the crash.

Number of processes running now: 0
051114 13:25:52  mysqld restarted

How to repeat:
It is difficult to repeat and appears to be random.  The same query runs fine most of the time, but crashes the server other times.  This happens on multiple servers at completely disconnected physical locations, cities, states and countries.  We are pretty confident it isn't a hardware issue given the distribution of the problem.
[15 Nov 2005 9:49] Valeriy Kravchuk
Thank you for a problem report. Would you, please, send the SHOW CREATE TABLE results for the tables used in the problematic query (Build and Sod). How did you perfromed the upgrade (descibe all steps, please)?

Your my.cnf file content and hardware description may be of some use too.
[15 Nov 2005 16:42] Heikki Tuuri
Hi!

The stack trace has similarity to the ones at:

http://bugs.mysql.com/bug.php?id=14747

Also in that bug it was a 'complex' SQL query that was causing the crash.

But in the stack traces there are crashes in three different places. That suggests a memory corruption bug.

Can you run a debug version of mysqld inside gdb, so that we would get a detailed stack trace?

Build mysql like this:

CFLAGS="-O3 -g" CXXFLAGS="-O3 -g" ./configure

make

Regards,

Heikki
[15 Nov 2005 17:26] Heikki Tuuri
In 5.0.15, btr_search_build_page_hash_index+510 seems to be about the place marked with #### below:

        }

        /* Check that the values for hash index build are sensible */

        if (n_fields + n_bytes == 0) {

                return;
        }

####  if (dict_index_get_n_unique_in_tree(index) < n_fields
                      || (dict_index_get_n_unique_in_tree(index) == n_fields
                          && n_bytes > 0)) {
                return;
        }
[15 Nov 2005 17:29] Heikki Tuuri
Note that also in the other bug report DISTINCT was involved. In theory, this might be massive memory corruption that happens to crash mysqld inside the btr0sea.c routines.

But we should get a gdb trace to be sure where it exactly crashes.
[15 Nov 2005 23:06] Nathan Goff
To get a gdb trace would we just run the pre-compiled debug version of 5.0.15?

I will work on getting the other requested info.
[15 Nov 2005 23:31] Heikki Tuuri
Nathan,

I guess that a pre-compiled mysqld debug binary can be used in gdb.

But what I usually do, I build mysqld from the source like this:

CFLAGS="-O3 -g" CXXFLAGS="-O3 -g" ./configure

make

The mysqld binary is built to directory /sql in the source tree.

Then you have the flexibility to add patches to the source if we come up with something.

If you are able to crash it under gdb, please do:

(gdb) bt full

(gdb) info threads

and if some of the threads are in an 'interesting' function, do:

(gdb) thread <number of the thread>

(gdb) bt full

etc.

Post the WHOLE output here, including all mysqld output.

Regards,

Heikki
[15 Nov 2005 23:47] Nathan Goff
SHOW CREATE TABLE Sod;

| Sod   |CREATE TABLE `Sod` (
  `BuildId` int(11) NOT NULL default '0',
  `PlatformId` varchar(32) NOT NULL default '',
  `MessageId` int(11) NOT NULL default '0',
  `FromSite` varchar(32) NOT NULL default '',
  `UserId` varchar(32) NOT NULL default '',
  `PublishFromLocation` varchar(255) NOT NULL default '',
  `Comments` longtext NOT NULL,
  `PublishDate` int(11) NOT NULL default '0',
  `AutoChangeLog` set('y','n') NOT NULL default 'n',
  `IgnoreSodFiles` tinyint(4) NOT NULL default '0',
  `ArchiveSite` varchar(32) NOT NULL default '',
  PRIMARY KEY  (`BuildId`),
  KEY `PlatformId` (`PlatformId`),
  KEY `FromSite` (`FromSite`),
  KEY `UserId` (`UserId`),
  KEY `MessageId` (`MessageId`),
  CONSTRAINT `Sod_ibfk_4` FOREIGN KEY (`BuildId`) REFERENCES `Build` (`BuildId`),
  CONSTRAINT `Sod_ibfk_5` FOREIGN KEY (`PlatformId`) REFERENCES `Platform` (`PlatformId`),
  CONSTRAINT `Sod_ibfk_6` FOREIGN KEY (`FromSite`) REFERENCES `Site` (`SiteName`),
  CONSTRAINT `Sod_ibfk_7` FOREIGN KEY (`UserId`) REFERENCES `DomainUser` (`UserId`),
  CONSTRAINT `Sod_ibfk_8` FOREIGN KEY (`MessageId`) REFERENCES `Message` (`MessageId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |

SHOW CREATE TABLE Build;

| Build |CREATE TABLE `Build` (
  `BuildId` int(11) NOT NULL auto_increment,
  `UnitName` varchar(32) NOT NULL default '',
  `UserId` varchar(32) NOT NULL default '',
  `CompileTypeName` varchar(32) NOT NULL default '',
  `BuildStart` datetime default NULL,
  `BuildEnd` datetime default NULL,
  `BuildComments` text,
  `ConfigSpec` longtext,
  `SrcViewType` set('static','dynamic') default 'static',
  `ReleaseName` varchar(32) default NULL,
  `ProcessId` int(11) default NULL,
  `BuildLog` longtext,
  `SodPublish` set('y','n') default 'n',
  `BuildStored` longtext,
  `WinUpdateFrom` varchar(128) NOT NULL default '',
  `WinUpdateTo` varchar(128) NOT NULL default '',
  `UnixUpdateFrom` varchar(128) NOT NULL default '',
  `UnixUpdateTo` varchar(128) NOT NULL default '',
  `IsUpdate` set('y','n') NOT NULL default 'n',
  `RestartQueue` set('y','n') NOT NULL default 'n',
  `NotifyUsers` set('y','n') NOT NULL default 'y',
  `CreateInstall` set('y','n') NOT NULL default 'n',
  `SiteName` varchar(32) NOT NULL default '',
  `EmailReportType` set('small','medium','large','xlarge') default NULL,
  `MainwinVer` varchar(64) NOT NULL default '',
  `ReleasedBuild` tinyint(4) NOT NULL default '0',
  PRIMARY KEY  (`BuildId`),
  KEY `CompileTypeName_index` (`CompileTypeName`),
  KEY `UserId_index` (`UserId`),
  KEY `UnitName_index` (`UnitName`),
  KEY `BuildId_UnitName` (`BuildId`,`UnitName`),
  KEY `BuildId` (`BuildId`,`UserId`),
  CONSTRAINT `0_973` FOREIGN KEY (`CompileTypeName`) REFERENCES `CompileType` (`CompileTypeName`),
  CONSTRAINT `0_974` FOREIGN KEY (`UnitName`) REFERENCES `Unit` (`UnitName`),
  CONSTRAINT `0_975` FOREIGN KEY (`UserId`) REFERENCES `DomainUser` (`UserId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
[15 Nov 2005 23:53] Nathan Goff
my.cnf

# Example MySQL config file for large systems.
#
# This is for a large system with memory = 512M where the system runs mainly
# MySQL.
#
# You can copy this file to
# /etc/my.cnf to set global options,
# mysql-data-dir/my.cnf to set server-specific options (in this
# installation this directory is /usr/local/mysql/data) or
# ~/.my.cnf to set user-specific options.
#
# In this file, you can use all long options that a program supports.
# If you want to know which options a program supports, run the program
# with the "--help" option.

# The following options will be passed to all MySQL clients
[client]
#password	= your_password
port		= 3306
socket		= /tmp/mysql.sock

# Here follows entries for some specific programs

# The MySQL server
[mysqld]
user = gbsadmin 
datadir=/gbs/mysql_data
lower_case_table_names = 0
replicate-do-db = gbs
port		= 3306
socket		= /tmp/mysql.sock
skip-locking
key_buffer = 256M
max_allowed_packet = 32M
table_cache = 256
sort_buffer_size = 1M
read_buffer_size = 1M
read_rnd_buffer_size = 4M
myisam_sort_buffer_size = 64M
thread_cache = 8
query_cache_size= 16M
# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 8

# Don't listen on a TCP/IP port at all. This can be a security enhancement,
# if all processes that need to connect to mysqld run on the same host.
# All interaction with mysqld must be made via Unix sockets or named pipes.
# Note that using this option without enabling named pipes on Windows
# (via the "enable-named-pipe" option) will render mysqld useless!
# 
#skip-networking

# Replication Master Server (default)
# binary logging is required for replication
log-bin

# required unique id between 1 and 2^32 - 1
# defaults to 1 if master-host is not set
# but will not function as a master if omitted
server-id = 4

# Replication Slave (comment out master section to use this)
#
# To configure this host as a replication slave, you can choose between
# two methods :
#
# 1) Use the CHANGE MASTER TO command (fully described in our manual) -
#    the syntax is:
#
#    CHANGE MASTER TO MASTER_HOST=<host>, MASTER_PORT=<port>,
#    MASTER_USER=<user>, MASTER_PASSWORD=<password> ;
#
#    where you replace <host>, <user>, <password> by quoted strings and
#    <port> by the master's port number (3306 by default).
#
#    Example:
#
#    CHANGE MASTER TO MASTER_HOST='125.564.12.1', MASTER_PORT=3306,
#    MASTER_USER='joe', MASTER_PASSWORD='secret';
#
# OR
#
# 2) Set the variables below. However, in case you choose this method, then
#    start replication for the first time (even unsuccessfully, for example
#    if you mistyped the password in master-password and the slave fails to
#    connect), the slave will create a master.info file, and any later
#    change in this file to the variables' values below will be ignored and
#    overridden by the content of the master.info file, unless you shutdown
#    the slave server, delete master.info and restart the slaver server.
#    For that reason, you may want to leave the lines below untouched
#    (commented) and instead use CHANGE MASTER TO (see above)
#
# required unique id between 2 and 2^32 - 1
# (and different from the master)
# defaults to 2 if master-host is set
# but will not function as a slave if omitted
#server-id       = 2
#
# The replication master for this slave - required
#master-host     =   <hostname>
#
# The username the slave will use for authentication when connecting
# to the master - required
#master-user     =   <username>
#
# The password the slave will authenticate with when connecting to
# the master - required
#master-password =   <password>
#
# The port the master is listening on.
# optional - defaults to 3306
#master-port     =  <port>
#
# binary logging - not required for slaves, but recommended
#log-bin

# Point the following paths to different dedicated disks
#tmpdir		= /tmp/		
#log-update 	= /path-to-dedicated-directory/hostname

# Uncomment the following if you are using BDB tables
#bdb_cache_size = 64M
#bdb_max_lock = 100000

# Uncomment the following if you are using InnoDB tables
innodb_data_home_dir=/gbs/mysql_data
innodb_data_file_path=ibdata1:2000M;ibdata2:10M:autoextend
innodb_log_group_home_dir=/gbs/mysql_data
innodb_log_arch_dir=/gbs/mysql_data
# You can set .._buffer_pool_size up to 50 - 80 %
# of RAM but beware of setting memory usage too high
#innodb_buffer_pool_size = 256M
#innodb_additional_mem_pool_size = 20M
# Set .._log_file_size to 25 % of buffer pool size
innodb_log_file_size=250M
innodb_log_files_in_group=2
#innodb_log_buffer_size = 8M
#innodb_flush_log_at_trx_commit = 1
#innodb_lock_wait_timeout = 50

[mysqldump]
quick
max_allowed_packet = 32M

[mysql]
no-auto-rehash
# Remove the next comment character if you are not familiar with SQL
#safe-updates

[isamchk]
key_buffer = 128M
sort_buffer_size = 128M
read_buffer = 2M
write_buffer = 2M

[myisamchk]
key_buffer = 128M
sort_buffer_size = 128M
read_buffer = 2M
write_buffer = 2M

[mysqlhotcopy]
interactive-timeout
[16 Nov 2005 0:00] Nathan Goff
Hardware:

Dell PowerEdge 750
3.4 Ghz, 1MB Cache, 800Mhz FSB
2GB DDR, 400Mhz
160GB SATA, 7.2K RPM
[16 Nov 2005 0:06] Nathan Goff
Upgrade process:

We were running 4.1.11.  Our datadir has always been outside the mysql tree.  Here are the steps we took at all 8 of our production sites that run MySQL and replicate from a master database at one site.

1.  Copy mysql-standard-5.0.15-linux-i686-glibc23.tar.gz to /usr/local
2.  Shutdown current 4.1 server: service mysql stop
3.  Unzip/untar 5.0 and create mysql link to new version in /usr/local
4.  Update my.cnf from running as root to running as gbsadmin
5.  chown/chgrp /usr/local/mysql and datadir location to gbsadmin/sddbuild instead of root
6.  Start 5.0 server: service mysql start
7.  Run mysql_fix_privilege_tables
8.  Restart server: service mysql restart
9.  Reset replication to new 5.0 master
[16 Nov 2005 0:34] Nathan Goff
Good news!  I don't think this is random.  I am now able to reproduce it, but not with a known set of queries.  It is reproducable with a series of clicks on our PHP gui.  I haven't figured out the exact order that causes it, but I can click away and it will crash it.  We use XML HTTP requests to dynamically populate dropdowns, so there are rapid queries going to the database.  That is where we are seeing the crash.

I need some help running mysql in gdb.  I can get mysql compiled debug no problem, but how exactly do I bring it up in gbd.  We have used MySQL for years and a lot of other open source software, but we have never had to debug something like this.  It normally just works :)  Any help you can provide on detailed command line commands to run to get mysqld running in gdb would be very helpful.
[16 Nov 2005 0:35] Nathan Goff
Also...I did read the manual on gdb and mysqld, but it isn't very clear and has a lot of options.  I am not sure what you want from it, so I don't know which way to run it.
[16 Nov 2005 1:08] Nathan Goff
Disregard my previous messages about gdb.  I figured it out.  The help page wasn't as unclear as I thought once I understood gbd a little.  I got it to crash in gdb and ran all the things you wanted me to run.  I have no idea what you mean by "interesting" threads, but I have the debugger open still so if you see anything you want more info on send me the gdb commands and I can run them.

I have attached a file with the gdb output in it.
[16 Nov 2005 1:09] Nathan Goff
crashed mysqld in gdb, here is the output

Attachment: gdb_crash (application/octet-stream, text), 16.21 KiB.

[16 Nov 2005 7:42] Heikki Tuuri
Nathan,

thank you. We are now studying the gdb trace.

Regards,

Heikki
[16 Nov 2005 8:18] Heikki Tuuri
Nathan,

if gdb gives the crash on the right line, then 'index' probably is a garbage pointer, but not NULL.

(Note that in the gdb trace some variables and function parameters are shown wrong because the compiler has optimized the code.)

btr_search_drop_page_hash_index():

"
        if (block->index == NULL) {

                mem_analyze_corruption((byte*)block);

                ut_a(block->index != NULL);
        }

        while (!page_rec_is_supremum(rec)) {
                /* FIXME: in a mixed tree, not all records may have enough
                ordering fields: */
                offsets = rec_get_offsets(rec, block->index,
                                offsets, n_fields + (n_bytes > 0), &heap);
"

rec_get_offsets():
"
        ulint           line)   /* in: line number where called */
{
        ulint   n;
        ulint   size;

        ut_ad(rec);
        ut_ad(index);
        ut_ad(heap);

        if (UNIV_LIKELY(index->table->comp)) {  #### crashes on this line
"

Please do:

(gdb) print index

(gdb) print *index

(gdb) print index->table

(gdb) print *index->table

(gdb) frame 1

(gdb) print *block

Since the crash is so easy to repeat, I believe we will know in 3 days what the bug is. The above looks like either block or index is corrupt. That suggests memory corruption caused by a MySQL or InnoDB bug.

Regards,

Heikki
[16 Nov 2005 11:34] Heikki Tuuri
Vadim's latest research shows that:

http://bugs.mysql.com/bug.php?id=14747

is almost certainly the same bug as this. block->index suddenly becomes NULL in the middle of dropping a page hash index and causes a crash.

Regards,

Heikki
[16 Nov 2005 11:46] Heikki Tuuri
Nathan,

this could be the other interesting thread:

14 Thread -1608717392 (LWP 29689)  0x08301edd in rec_copy_prefix_to_buf (rec=0x8a27c48 "", index=0x8a27c48, 
    n_fields=1783, buf=0x0, buf_size=0xa01cd7f0) at rem0rec.c:1129

The bug looks like some other thread is suddenly causing block->index to become a NULL pointer. The other thread doing active work is the suspect. It is interesting to know what the other thread is doing.

Regards,

Heikki
[16 Nov 2005 14:22] Heikki Tuuri
Nathan,

this is the same bug as:
http://bugs.mysql.com/bug.php?id=14747

A patch may be available in 30 minutes in that other bug report.

Thank you for helping us to find this elusive bug.

--Heikki
[16 Nov 2005 17:02] Nathan Goff
No problem...thanks for helping out!  

Do you still want me to run those other gdb commands?
[16 Nov 2005 17:28] Marko Mäkelä
Nathan,
thanks, no need to run any more gdb commands. We're pretty sure that the bug you found is a duplicate of Bug #14747, which was just fixed. In case the patch attached to Bug #14747 does not work for you, please reopen this bug.