Bug #25460 High concurrency MyISAM access causes severe mysqld crash.
Submitted: 8 Jan 2007 3:10 Modified: 15 Mar 2007 18:08
Reporter: Jason Williams Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: General Severity:S1 (Critical)
Version:5.0.27,5.1.11,5.1.14,5.1.15 OS:Solaris (Solaris 10)
Assigned to: Ingo Strüwing CPU Architecture:Any
Tags: bfsm_2007_02_01, bfsm_2007_02_15, regression

[8 Jan 2007 3:10] Jason Williams
Description:
Inexplicably the mysqld daemon will terminate with a signal 11 (or a signal 6 if using --with-debug) and restart. After the restart, the high volume MyISAM tables will be marked crashed and have to be repaired. No error message is logged with the crash normally. Moving the tables to InnoDB rectifies the problem. Unfortunately, some tables must remain MyISAM.

Recompiled 5.1.15 with --with-debug and received the following more detailed error message:

Assertion failed: fixed == 1, file item.h, line 1601
mysqld got signal 6;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=262144000
read_buffer_size=2093056
max_used_connections=105
max_connections=350
threads_connected=103
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 1329787 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

How to repeat:
Create a MyISAM table with multiple programs writing to it...particularly high UPDATE & INSERT traffic. The exact time/parameters required to cause the crash are uncertain as the time required is rarely the same interval. Sometimes 2 weeks sometimes 3 hours. Once it starts happening it begins happening more and more frequently.

Suggested fix:
Moving to InnoDB tables seems to resolve the issue.
[8 Jan 2007 7:10] Jason Williams
At the time of crash memory utilization is quite high with only 500MB free of 8GB. Could moving to InnoDB correct this issue because InnoDB pre-allocates all of its buffers on start up?
[8 Jan 2007 8:07] MySQL Verification Team
please upload output of SHOW CREATE TABLE <table> and complete error log (with resolved stack trace if you have it)
[8 Jan 2007 17:00] Jason Williams
Will upload trace (and core if possible) today. We had compiled with omit-framepointer, am recompiling and going to try and repro the crash through memory pressure. The SHOW CREATE for the crashed tables are below:

mysql> SHOW CREATE TABLE dspam_token_data\G;
*************************** 1. row ***************************
       Table: dspam_token_data
Create Table: CREATE TABLE `dspam_token_data` (
  `uid` bigint(20) unsigned NOT NULL,
  `token` bigint(20) unsigned NOT NULL,
  `spam_hits` int(11) NOT NULL,
  `innocent_hits` int(11) NOT NULL,
  `last_hit` date NOT NULL,
  UNIQUE KEY `id_token_data_01` (`uid`,`token`),
  KEY `spam_hits` (`spam_hits`),
  KEY `innocent_hits` (`innocent_hits`),
  KEY `last_hit` (`last_hit`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 PACK_KEYS=1
1 row in set (0.00 sec)

ERROR:
No query specified

mysql> SHOW CREATE TABLE dspam_virtual_uids\G;
*************************** 1. row ***************************
       Table: dspam_virtual_uids
Create Table: CREATE TABLE `dspam_virtual_uids` (
  `uid` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `username` varchar(128) DEFAULT NULL,
  PRIMARY KEY (`uid`),
  UNIQUE KEY `id_virtual_uids_01` (`username`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

ERROR:
No query specified

mysql> SHOW CREATE TABLE dspam_stats\G;
*************************** 1. row ***************************
       Table: dspam_stats
Create Table: CREATE TABLE `dspam_stats` (
  `uid` bigint(20) unsigned NOT NULL DEFAULT '0',
  `spam_learned` int(11) NOT NULL,
  `innocent_learned` int(11) NOT NULL,
  `spam_misclassified` int(11) NOT NULL,
  `innocent_misclassified` int(11) NOT NULL,
  `spam_corpusfed` int(11) NOT NULL,
  `innocent_corpusfed` int(11) NOT NULL,
  `spam_classified` int(11) NOT NULL,
  `innocent_classified` int(11) NOT NULL,
  PRIMARY KEY (`uid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

ERROR:
No query specified

mysql> SHOW CREATE TABLE dspam_signature_data\G;
*************************** 1. row ***************************
       Table: dspam_signature_data
Create Table: CREATE TABLE `dspam_signature_data` (
  `uid` bigint(20) unsigned NOT NULL,
  `signature` char(32) NOT NULL,
  `data` blob NOT NULL,
  `length` smallint(6) NOT NULL,
  `created_on` date NOT NULL,
  UNIQUE KEY `id_signature_data_01` (`uid`,`signature`),
  KEY `id_signature_data_02` (`created_on`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 MAX_ROWS=2500000 AVG_ROW_LENGTH=8096
1 row in set (0.00 sec)

ERROR:
No query specified
[8 Jan 2007 17:06] Jason Williams
This is the error log of the primary Solaris 10 SPARC (T2000) server exhibiting the error. There's multi. instances in the file.

Attachment: iss.node1-sqlcluster1.err (application/octet-stream, text), 24.80 KiB.

[8 Jan 2007 17:06] Jason Williams
Error log of Solaris 10 X64 server we moved the DB to, and had the same error.

Attachment: iss.err (application/octet-stream, text), 12.88 KiB.

[16 Jan 2007 10:07] Sveta Smirnova
Thank you for the report.

Is it possible in your situation to create core file? (--core-file option of mysqld, http://dev.mysql.com/doc/refman/5.0/en/server-options.html)
[16 Jan 2007 12:43] Jason Williams
Have not been able to replicate crash since. However, since moving to InnoDB tables on the crashed tables, we have gotten the MySQL server frozen up twice. InnoDB did not report any deadlocks, however all 103 connected threads were stuck in UPDATE or UPDATING state. Only way to release the situation was to "kill -9" the MySQL server. This was 5.1.14-beta on Solaris 10 on AMD64.

Last night moved tables back to MyISAM. One table now reports:

070116  3:15:33 [ERROR] Got error 127 when reading table './dspam/dspam_signature_data'
070116  3:15:33 [ERROR] Got error 127 when reading table './dspam/dspam_signature_data'
070116  3:15:43 [ERROR] Got error 127 when reading table './dspam/dspam_signature_data'
070116  3:15:43 [ERROR] Got error 127 when reading table './dspam/dspam_signature_data'
070116  3:15:51 [ERROR] Got error 127 when reading table './dspam/dspam_signature_data'
070116  3:15:51 [ERROR] Got error 127 when reading table './dspam/dspam_signature_data'
070116  3:15:57 [ERROR] Got error 127 when reading table './dspam/dspam_signature_data'
070116  3:15:57 [ERROR] Got error 127 when reading table './dspam/dspam_signature_data'
070116  3:58:29 [ERROR] Got error 127 when reading table './dspam/dspam_signature_data'
070116  3:58:29 [ERROR] Got error 127 when reading table './dspam/dspam_signature_data'
070116  3:58:29 [ERROR] Got error 127 when reading table './dspam/dspam_signature_data'
070116  3:58:29 [ERROR] Got error 127 when reading table './dspam/dspam_signature_data'
070116  3:58:29 [ERROR] Got error 127 when reading table './dspam/dspam_signature_data'
070116  3:58:29 [ERROR] Got error 127 when reading table './dspam/dspam_signature_data'
070116  3:58:29 [ERROR] Got error 127 when reading table './dspam/dspam_signature_data'
070116  3:58:29 [ERROR] Got error 127 when reading table './dspam/dspam_signature_data'
070116  3:58:29 [ERROR] Got error 127 when reading table './dspam/dspam_signature_data'
070116  3:58:29 [ERROR] Got error 127 when reading table './dspam/dspam_signature_data'
070116  5:40:36 [ERROR] Got error 127 when reading table './dspam/dspam_signature_data'
070116  5:40:49 [ERROR] Got error 127 when reading table './dspam/dspam_signature_data'

This started 1 hour after moving the table to MyISAM. SHOW CREATE TABLE below. Any help is greatly appreciated. Would like to correct this before MySQL crashes.

dspam_signature_data | CREATE TABLE `dspam_signature_data` (
  `uid` bigint(20) unsigned NOT NULL,
  `signature` char(32) NOT NULL,
  `data` blob NOT NULL,
  `length` smallint(6) NOT NULL,
  `created_on` date NOT NULL,
  UNIQUE KEY `id_signature_data_01` (`uid`,`signature`),
  KEY `id_signature_data_02` (`created_on`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 MAX_ROWS=2500000 AVG_ROW_LENGTH=8096
[16 Jan 2007 12:58] Jason Williams
Renamed dspam_signature_data to dspam_signature_data_bad and ran myisamchk on it. Following came back:

-bash-3.00# /opt/mysql-5.1.14-beta/bin/myisamchk -m -T /node1_iss_mysql/dspam/dspam_signature_data_bad.MYI
Checking MyISAM file: /node1_iss_mysql/dspam/dspam_signature_data_bad.MYI
Data records:  177388   Deleted blocks:       0
- check file-size
- check record delete-chain
- check key delete-chain
- check index reference
- check data record references index: 1
- check data record references index: 2
- check record links
-bash-3.00# /opt/mysql-5.1.14-beta/bin/myisamchk -e -T /node1_iss_mysql/dspam/dspam_signature_data_bad.MYI
Checking MyISAM file: /node1_iss_mysql/dspam/dspam_signature_data_bad.MYI
Data records:  177388   Deleted blocks:       0
- check file-size
- check record delete-chain
- check key delete-chain
- check index reference
- check data record references index: 1
- check data record references index: 2
- check records and index references
-bash-3.00# /opt/mysql-5.1.14-beta/bin/myisamchk -e -f  /node1_iss_mysql/dspam/dspam_signature_data_bad.MYI
Checking MyISAM file: /node1_iss_mysql/dspam/dspam_signature_data_bad.MYI
Data records:  177388   Deleted blocks:       0
- check file-size
- check record delete-chain
- check key delete-chain
- check index reference
- check data record references index: 1
- check data record references index: 2
- check records and index references
[16 Jan 2007 13:07] Jason Williams
Trying to convert the binlog from the error 127 timeframe generates the following while creating the .sql file:

bash-3.00# /opt/mysql-5.1.14-beta/bin/mysqlbinlog /node1_iss_mysql_logs/mysql-bin.000016 > /2R0116_0339-1258_mysql-bin-000016.mysql
/opt/mysql-5.1.14-beta/bin/mysqlbinlog: Can't create/write to file 'ÿÿÿÿÿÿÿÿ`ï/á9
ÿÿÿÿÿÿÿÿ`}S6aWDq' (Errcode: 2)
/opt/mysql-5.1.14-beta/bin/mysqlbinlog: Can't create/write to file 'ÿÿÿÿÿÿÿÿ`ï/á9
ÿÿÿÿÿÿÿÿ`}T6aWDq' (Errcode: 2)
/opt/mysql-5.1.14-beta/bin/mysqlbinlog: Can't create/write to file 'ÿÿÿÿÿÿÿÿ`ï/á9
ÿÿÿÿÿÿÿÿ`}U6aWDq' (Errcode: 2)
/opt/mysql-5.1.14-beta/bin/mysqlbinlog: Can't create/write to file 'ÿÿÿÿÿÿÿÿ`ï/á9
ÿÿÿÿÿÿÿÿ`}V6aWDq' (Errcode: 2)
/opt/mysql-5.1.14-beta/bin/mysqlbinlog: Can't create/write to file 'ÿÿÿÿÿÿÿÿ`ï/á9
ÿÿÿÿÿÿÿÿ`}W6aWDq' (Errcode: 2)
/opt/mysql-5.1.14-beta/bin/mysqlbinlog: Can't create/write to file 'ÿÿÿÿÿÿÿÿ`ï/á9
ÿÿÿÿÿÿÿÿ`}X6aWDq' (Errcode: 2)
/opt/mysql-5.1.14-beta/bin/mysqlbinlog: Can't create/write to file 'ÿÿÿÿÿÿÿÿ`ï/á9
ÿÿÿÿÿÿÿÿ`}Y6aWDq' (Errcode: 2)
/opt/mysql-5.1.14-beta/bin/mysqlbinlog: Can't create/write to file 'ÿÿÿÿÿÿÿÿ`ï/á9
ÿÿÿÿÿÿÿÿ`}Z6aWDq' (Errcode: 2)
/opt/mysql-5.1.14-beta/bin/mysqlbinlog: Can't create/write to file 'ÿÿÿÿÿÿÿÿ`ï/á9
ÿÿÿÿÿÿÿÿ`}06aWDq' (Errcode: 2)
/opt/mysql-5.1.14-beta/bin/mysqlbinlog: Can't create/write to file 'ÿÿÿÿÿÿÿÿ`ï/á9
ÿÿÿÿÿÿÿÿ`}16aWDq' (Errcode: 2)
/opt/mysql-5.1.14-beta/bin/mysqlbinlog: Can't create/write to file 'ÿÿÿÿÿÿÿÿ`ï/á9
ÿÿÿÿÿÿÿÿ`}26aWDq' (Errcode: 2)
/opt/mysql-5.1.14-beta/bin/mysqlbinlog: Can't create/write to file 'ÿÿÿÿÿÿÿÿ`ï/á9
ÿÿÿÿÿÿÿÿ`}36aWDq' (Errcode: 2)
/opt/mysql-5.1.14-beta/bin/mysqlbinlog: Can't create/write to file 'ÿÿÿÿÿÿÿÿ`ï/á9
ÿÿÿÿÿÿÿÿ`}46aWDq' (Errcode: 2)
/opt/mysql-5.1.14-beta/bin/mysqlbinlog: Can't create/write to file 'ÿÿÿÿÿÿÿÿ`ï/á9
ÿÿÿÿÿÿÿÿ`}56aWDq' (Errcode: 2)
/opt/mysql-5.1.14-beta/bin/mysqlbinlog: Can't create/write to file 'ÿÿÿÿÿÿÿÿ`ï/á9
ÿÿÿÿÿÿÿÿ`}66aWDq' (Errcode: 2)

<snip> 

Errors continue for another few 100 lines.
[16 Jan 2007 13:20] Jason Williams
There are no reads against this server for the dspam_signature_data table, only writes. Cannot find an insert/update/delete query against dspam_signature_table at the same timestamp as a 127 error in the binary log. Uploading raw binary log due to mysqlbinlog problem.
[16 Jan 2007 13:37] Jason Williams
Is there a way to turn on core-files and log options without restarting mysqld?

Also, are there any caveats/things to watch out for with core-files on Solaris?
[16 Jan 2007 16:31] Sveta Smirnova
>Is there a way to turn on core-files and log options without restarting mysqld?
No, you must restart the server.

>Also, are there any caveats/things to watch out for with core-files on Solaris?
Yes, on Solaris, you do not get a core file if you are also using the --user option. It is why I asked you about possibility in your situation to get core file
[17 Jan 2007 2:34] Jason Williams
Hi Sveta,

Thank you for following up. It appears that the mysql.server script starts the process with --user=mysql. Is the problem the --user flag, or the fact that it sets the user to mysql? For example, would setting it to root allow cores to be created?

Also, more 127 errors on the new table (dspam_signature_data) that was created. About 7 of them. Rotated the new table out and created another new one in its place. How can we track down what's causing this? Thank you in advance.
[17 Jan 2007 3:28] Jason Williams
Hi Sveta,

The MySQL instance fully crashed about 1 hour after my last comment. Crashed all the MyISAM tables. I've renamed them and created new ones in their place...populated the new tables with INSERT INTO x SELECT * FROM x_bad;

Unfortunately, I hadn't restarted with core-file yet. I have now. The process command line is below. I su'd to the mysql user and then ran the command line since mysqld would not let me run it as root.  It looks like its running without --user:

mysql    17286  0.1  1.21849220195804 pts/7    S 03:23:54  0:01 /opt/mysql-5.1.14-beta/bin/mysqld --basedir=/opt/mysql-5.1.14-beta --datadir=/node1_iss_mysql --pid-file=/node1_iss_mysql/iss.pid --log-error=/node1_iss_mysql/iss.err --socket=/tmp/mysql.sock --port=3306 --open-files-limit=8192
[17 Jan 2007 3:44] Jason Williams
Sveta,

Will the core file be located in /var/tmp?
[17 Jan 2007 18:42] Sveta Smirnova
>For example, would setting it to root allow cores to be created?

yes

>Will the core file be located in /var/tmp?

Location of core files varies from OS.

On Solaris you can find its location with help of the coreadm comand:

    $ coreadm | grep global
    global core file pattern: /var/cores/core.%f.%p
    global core file content: default
    global core dumps: enabled
    global setid core dumps: disabled
    global core dump logging: disabled
[18 Jan 2007 17:16] Dmitry Kochin
We experience similiar bug. May be, the same one.
We have attached the core dump (It is saved to /var/core on Solaris 10).
http://haddan.ru/download/core/core_db2.haddan.2007-01-13.rar

And we have 4 more dumps if you need.
[18 Jan 2007 17:19] Dmitry Kochin
I have forgotten to reference our submitted bug.
Here it is: http://bugs.mysql.com/bug.php?id=25294
[18 Jan 2007 18:18] Jason Williams
Hi Dmitry,

That sounds very similar to our problem. With the exception that in the first few crashes for you MySQL actually tried to backtrace. This is very frustrating. We've duplicated this problem on multiple Solaris 10 platforms (S10 Update 3 [AMD64 & SPARC], as well as Solaris Express Community Release Build 54 [AMD64]). I think we will finally get MySQL 5.1.14 to core dump properly, just waiting for it to do so. It has stayed up since we turned core-dump on and used coreadm to put the corefile  in /tmp/ (verified with gcore). 

Have you had any luck with an analysis of the coredump by MySQL?
[18 Jan 2007 21:30] Jason Williams
Hi Sveta,

We have not crashed yet, but 127s are starting to happen again. I expect a crash shortly. I have been going through the general query log and have found an interesting pattern. At the same time stamp as each of the 127s against dspam_signature_data, we have a pattern of 4 rapid connect/disconnects from a client followed  by a read query on the dspam_signature_data table. I have included 3 instances below from the general query log. In each instance I have included the entry right before and after the rapid connect/disconnect sequence.

"2007-01-18 19:13:50","dspam[dspam] @  [10.1.58.42]","1765","5819","Query","update dspam_stats set spam_learned = spam_learned - 0, innocent_learned = innocent_learned - 0, spam_misclassified = spam_misclassified - 0, innocent_misclassified = innocent_misclassified - 0, spam_corpusfed = spam_corpusfed - 0, innocent_corpusfed = innocent_corpusfed - 0, spam_classified = spam_classified - 0, innocent_classified = innocent_classified + 1 where uid = 182"
"2007-01-18 19:13:52","dspam[dspam] @  [10.1.58.43]","9219","5819","Connect","dspam@10.1.58.43 on dspam"
"2007-01-18 19:13:52","dspam[dspam] @  [10.1.58.43]","9219","5819","Quit",""
"2007-01-18 19:13:52","dspam[dspam] @  [10.1.58.43]","9220","5819","Connect","dspam@10.1.58.43 on dspam"
"2007-01-18 19:13:52","dspam[dspam] @  [10.1.58.43]","9220","5819","Quit",""
"2007-01-18 19:13:52","dspam[dspam] @  [10.1.58.43]","9221","5819","Connect","dspam@10.1.58.43 on dspam"
"2007-01-18 19:13:52","dspam[dspam] @  [10.1.58.43]","9221","5819","Quit",""
"2007-01-18 19:13:52","dspam[dspam] @  [10.1.58.43]","9222","5819","Connect","dspam@10.1.58.43 on dspam"
"2007-01-18 19:13:52","dspam[dspam] @  [10.1.58.43]","9222","5819","Query","select data, length from dspam_signature_data where uid = 390 and signature = \"45afbe74203618814418829\""

"2007-01-18 19:34:53","dspam[dspam] @  [10.1.58.43]","8776","5819","Query","update dspam_stats set spam_learned = spam_learned - 0, innocent_learned = innocent_learned - 0, spam_misclassified = spam_misclassified - 0, innocent_misclassified = innocent_misclassified - 0, spam_corpusfed = spam_corpusfed - 0, innocent_corpusfed = innocent_corpusfed - 0, spam_classified = spam_classified - 0, innocent_classified = innocent_classified + 1 where uid = 2464"
"2007-01-18 19:34:56","dspam[dspam] @  [10.1.58.42]","9251","5819","Connect","dspam@10.1.58.42 on dspam"
"2007-01-18 19:34:56","dspam[dspam] @  [10.1.58.42]","9251","5819","Quit",""
"2007-01-18 19:34:56","dspam[dspam] @  [10.1.58.42]","9252","5819","Connect","dspam@10.1.58.42 on dspam"
"2007-01-18 19:34:56","dspam[dspam] @  [10.1.58.42]","9252","5819","Quit",""
"2007-01-18 19:34:56","dspam[dspam] @  [10.1.58.42]","9253","5819","Connect","dspam@10.1.58.42 on dspam"
"2007-01-18 19:34:56","dspam[dspam] @  [10.1.58.42]","9253","5819","Quit",""
"2007-01-18 19:34:56","dspam[dspam] @  [10.1.58.42]","9254","5819","Connect","dspam@10.1.58.42 on dspam"
"2007-01-18 19:34:56","dspam[dspam] @  [10.1.58.42]","9254","5819","Query","select data, length from dspam_signature_data where uid = 109 and signature = \"45afc0cb114471702517514\""

"2007-01-18 19:55:41","dspam[dspam] @  [10.1.58.42]","1778","5819","Query","update dspam_stats set spam_learned = spam_learned - 0, innocent_learned = innocent_learned - 0, spam_misclassified = spam_misclassified - 0, innocent_misclassified = innocent_misclassified - 0, spam_corpusfed = spam_corpusfed - 0, innocent_corpusfed = innocent_corpusfed - 0, spam_classified = spam_classified - 0, innocent_classified = innocent_classified + 1 where uid = 1373"
"2007-01-18 19:55:42","dspam[dspam] @  [10.1.58.42]","9358","5819","Connect","dspam@10.1.58.42 on dspam"
"2007-01-18 19:55:42","dspam[dspam] @  [10.1.58.42]","9358","5819","Quit",""
"2007-01-18 19:55:42","dspam[dspam] @  [10.1.58.42]","9359","5819","Connect","dspam@10.1.58.42 on dspam"
"2007-01-18 19:55:42","dspam[dspam] @  [10.1.58.42]","9359","5819","Quit",""
"2007-01-18 19:55:42","dspam[dspam] @  [10.1.58.42]","9360","5819","Connect","dspam@10.1.58.42 on dspam"
"2007-01-18 19:55:42","dspam[dspam] @  [10.1.58.42]","9360","5819","Quit",""
"2007-01-18 19:55:42","dspam[dspam] @  [10.1.58.42]","9361","5819","Connect","dspam@10.1.58.42 on dspam"
"2007-01-18 19:55:42","dspam[dspam] @  [10.1.58.42]","9361","5819","Query","select data, length from dspam_signature_data where uid = 672 and signature = \"45afb4e0114478195311136\""
[18 Jan 2007 22:22] MySQL Verification Team
Jason, what is your thread_cache variable set to?  Does setting it to a very low number make error 127 harder to reproduce?
[18 Jan 2007 22:26] Jason Williams
Hi Shane,

The thread_cache is set to 16 right now. There are two MySQL instances on the box (2x Dual-Core Opteron 2.6GHz) with the same thread_cache setting. I don't know if setting the thread_cache lower will reduce the 127s. I'm hesitant to restart the instance right now as I'm hoping for a crash and core dump soon. Once it crashes I'll lower that and see if it helps. What is a good rule of thumb in your opinion?
[19 Jan 2007 1:36] Jason Williams
Sveta,

The MySQL instance in question finally crashed and cored. I have uploaded a BZIP'd TAR containing the core and the running mysqld binary to the MySQL FTP server. Its name is: 2R01181807_crash-mysqld+corefile.tar.bz2 

Also, I have uploaded a BZIP'd copy of the general log from the time of the crash to the MySQL FTP server. Its name is: 2R01181807_crash_generallog.CSV.bz2

I will upload the error log here presently.
[19 Jan 2007 1:41] Jason Williams
Error log for MySQL instance. Taken right after crash.

Attachment: 2R01181807_crash_iss.err (application/octet-stream, text), 864 bytes.

[19 Jan 2007 1:46] Jason Williams
Backtrace output from GDB against core file and crashed mysqld binary.

Attachment: 2R01181807_crash_gdb-backtrace.txt (text/plain), 10.51 KiB.

[19 Jan 2007 14:23] Jason Williams
Hi Sveta,

With all of my comments, I wanted to make sure you saw the one about the core file upload to the MySQL FTP server?
[19 Jan 2007 20:42] Sveta Smirnova
Thank you for the files, Jason.

I received all.
[23 Jan 2007 9:31] MySQL Verification Team
Hi Jason, please set myisam_use_mmap=off in my.cnf if it's currently on, and check if crash/corruptions occur then (in 5.1.x)
[23 Jan 2007 12:11] Mikhail Petrov
Hi, Jason!
Sorry, I have no idea how to connect with you except this bugzilla.
I've read about your InnoDB freezing. Please, if you'll observe such situation again, send me output of 10-15 consecutive executions of 'show engine innodb status' and, if you can, `vmstat 1` to  madmike at madmike dot ru.
Very thanks.
[23 Jan 2007 18:51] Jason Williams
Hi Shane,

I checked our my.cnf and it does indeed have myisam_use_mmap=1. I set that to myisam_use_mmap=0 and restarted the instance. I was reading that the use_mmap feature utilizes memcpy to cache MyISAM data...and correlated that with the backtrace where our crash was on a memcpy call. Is the function where that memcpy crash occurred a part of the use_mmap code?

Also, reducing the thread_cache to 8 from 16 eliminated the 127s for awhile, but they were back today. It was the longest period without 127s. Could there be an interaction between the thread caching and the myisam_use_mmap code?
[23 Jan 2007 18:53] Jason Williams
Hi Mikhail,

Thank you for the offer! I will do that! If we can get MyISAM to work stably again for these tables that would be my preference as the performance and slave lag is much better. Its interesting that the nature of our concurrency is such that InnoDB gets very behind on these tables, whereas full table locks seem to do a nice job of queuing the queries.
[23 Jan 2007 21:22] MySQL Verification Team
Jason, please use very low thread_cache, or disable it AND disable mmap.  Indeed, mmap should probably not be used on 32bit systems with large tables, due to the low maximum limit that can be mapped (2GB?). Not sure about 64-bit limits.
[23 Jan 2007 23:37] Jason Williams
Hi Shane,

use_mmap is disabled, and we're running with the reduced thread_cache (8) we implemented last week at your suggestion. I'm keeping an eye on the logs. Going through the backtrace it looks indeed like the traceback was in the my_mmap_pread function, as opposed to the my_nommap_pread function. I'm really hopeful this fixes it!  What's the general consensus on the mmap feature for MyISAM?
[23 Jan 2007 23:38] Jason Williams
Shane,

BTW, this is a 4-core (2x Dual-Core 2.6GHz) Opteron box running Solaris Express Build 54, so it is a 64-bit environment.
[25 Jan 2007 18:21] Sveta Smirnova
Backtrace produced by dbx:

  [1] 0xfec68567(0xd8, 0xb, 0xfecb3600), at 0xfec68567 
=>[2] write_core(sig = 0), line 245 in "stacktrace.c"
  [3] handle_segfault(sig = 0), line 2162 in "mysqld.cc"
  [4] 0xfec6791f(0xb, 0x0, 0xfdc60768, 0x81a2d8a, 0x5f7f19b0, 0xfd88a400), at 0xfec6791f 
  [5] libc_init(0xb, 0x0, 0xfdc60768, 0x65b194c, 0x5f8fff88, 0xfdc6078c), at 0xfec5cf9b 
  [6] libc_init(0xb, 0x0, 0xfdc60768, 0xf, 0x0, 0x807007), at 0xfec5d142 
  [7] 0xffffffff(0x5f8fff88, 0x5f7f19b0, 0x26c8, 0x65b194c, 0x0, 0x4, 0x10, 0x0), at 0xffffffff 
  [8] _mi_read_dynamic_record(info = 0x5f8fff88, filepos = 0, buf = 0x6040cc30 "\x85^B"), line 1417 in "mi_dynrec.c"
  [9] mi_rkey(info = (nil), buf = 0x6040cc30 "\x85^B", inx = 0, key = 0x5f996078 "\x85^B", key_len = 0, search_flag = HA_READ_KEY_EXACT), line 151 in "mi_rkey.c"
  [10] ha_myisam::index_read_idx(this = (nil), buf = 0x6040cc30 "\x85^B", index = 0, key = 0x5f996078 "\x85^B", key_len = 104U, find_flag = HA_READ_KEY_EXACT), line 1223 in "ha_myisam.cc"
  [11] join_read_const(tab = (nil)), line 10913 in "sql_select.cc"
  [12] join_read_const_table(tab = 0x5fabab18, pos = (nil)), line 10815 in "sql_select.cc"
  [13] make_join_statistics(join = 0x5fab9728, tables = (nil), conds = 0x5fab9608, keyuse_array = 0x5faba828), line 2421 in "sql_select.cc"
  [14] JOIN::optimize(this = 0x5fab9728), line 759 in "sql_select.cc"
  [15] mysql_select(thd = 0x5f971ed0, rref_pointer_array = 0x5f9722dc, tables = 0x5fab90a0, wild_num = 0, fields = CLASS, conds = 0x5fab9608, og_num = 0, order = (nil), group = (nil), having = (nil), proc_param = (nil), select_options = 2147764736ULL, result = 0x5fab9718, unit = 0x5f971f78, select_lex = (nil)), line 2049 in "sql_select.cc"
  [16] handle_select(thd = 0x5f971ed0, lex = (nil), result = 0x5fab9718, setup_tables_done_option = 0), line 245 in "sql_select.cc"
  [17] execute_sqlcom_select(thd = 0x5f971ed0, all_tables = 0x5fab90a0), line 5285 in "sql_parse.cc"
  [18] mysql_execute_command(thd = 0x5f971ed0), line 2702 in "sql_parse.cc"
  [19] mysql_parse(thd = (nil), inBuf = (nil), length = 0), line 6103 in "sql_parse.cc"
  [20] dispatch_command(command = COM_SLEEP, thd = 0x5f971ed0, packet = 0x5fa55fb9 "select data, length from dspam_signature_data where uid = 645 and signature = "45addddb114474244920162"", packet_length = 104U), line 1862 in "sql_parse.cc"
  [21] do_command(thd = (nil)), line 1631 in "sql_parse.cc"
  [22] handle_one_connection(arg = 0xfdc61fcc), line 1237 in "sql_parse.cc"
  [23] 0xfec675e4(0xfd88a400, 0x0, 0x0, ), at 0xfec675e4 
  [24] 0xfec67840(), at 0xfec67840
[25 Jan 2007 19:29] Sveta Smirnova
Thank you for the report and all information.

Verified as described.
[25 Jan 2007 19:34] Sergei Golubchik
Looks like _mi_read_dynamic_record() - at least - dereferences info->s->file_read without a mutex, even though it can be concurrently set by another thread. This could be a reason for a crash, though unlikely. Anyway, this should be fixed.
[25 Jan 2007 20:43] Jason Williams
Hi Sveta & Sergei,

I'm very happy y'all were able to verify the behavior! I'm anxious to see what the actual issue is.
[30 Jan 2007 22:26] Jason Williams
Have now been running 7 days without any 127s or crashes. Appears to be conclusively linked to enabling myisam_mmap.
[8 Feb 2007 20:25] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/19584

ChangeSet@1.2423, 2007-02-08 21:24:12+01:00, istruewing@synthia.local +5 -0
  Bug#25460 - High concurrency MyISAM access causes severe mysqld crash.
  
  Under high load it was possible that memory mapping was started on a table
  while other threads were working with the table.
  
  I fixed the start of memory mapping so that it is done at the first table
  open or when the requesting thread is using the table exclusively only.
[13 Feb 2007 15:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/19767

ChangeSet@1.2423, 2007-02-13 16:33:32+01:00, istruewing@chilla.local +5 -0
  Bug#25460 - High concurrency MyISAM access causes severe mysqld crash.
  
  Under high load it was possible that memory mapping was started on a table
  while other threads were working with the table.
  
  I fixed the start of memory mapping so that it is done at the first table
  open or when the requesting thread is using the table exclusively only.
[14 Feb 2007 11:08] Ingo Strüwing
Queued to 5.1-engines. Note that this fixes one of the mentioned problems only. Work for repeating more problems is in progress.
[20 Feb 2007 18:05] Ingo Strüwing
Pushed to 5.1.16.
Back "in progress" because there are more problems suspected.
[22 Feb 2007 8:12] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/20343

ChangeSet@1.2432, 2007-02-22 09:12:07+01:00, istruewing@chilla.local +1 -0
  Bug#25460 - High concurrency MyISAM access causes severe mysqld crash.
  Decreased code duplication by calling memory mapping
  function through mi_extra().
[26 Feb 2007 16:49] Ingo Strüwing
I made three attempts now with long-running, high-concurrency tests on multiple machines. I was not able to repeat any problem with the tests I have available. Hence I don't know what to fix. Please reopen if you have more ideas how to repeat the problem.

Regards
Ingo
[13 Mar 2007 16:42] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/21819

ChangeSet@1.2439, 2007-03-13 16:43:45+01:00, istruewing@chilla.local +1 -0
  Bug#25460 - High concurrency MyISAM access causes severe mysqld crash.
  
  The previous two patches for this bug worked together so that
  no permanent table was memory mapped. The first patch tried to
  avoid mapping while a table is in use. It allowed mapping only
  if there was exactly one lock on the table, assuming that the
  calling thread owned it. During mi_open(), a different call to
  memory mapping was coded, which did not have this limitation.
  
  The second patch tried to remove the code duplication and just
  called mi_extra() from mi_open() an thus inherited the limitation.
  But on open, a thread does not have a lock on the table...
  
  A possible solution would be to check for zero or one lock.
  But since I learned that it is safe to memory map a file while
  normal file I/O is done on it, I removed the restriction altogether
  and allow to memory map while a table is in use.
  
  No test case. I do not see a chance to verify with the test suite
  which kind of I/O is used on a table.
[13 Mar 2007 16:44] Ingo Strüwing
Bug database automatic failed once more again. Sigh.

  http://lists.mysql.com/commits/21819

ChangeSet@1.2439, 2007-03-13 16:43:45+01:00, istruewing@chilla.local +1 -0
  Bug#25460 - High concurrency MyISAM access causes severe mysqld crash.
  
  The previous two patches for this bug worked together so that
  no permanent table was memory mapped. The first patch tried to
  avoid mapping while a table is in use. It allowed mapping only
  if there was exactly one lock on the table, assuming that the
  calling thread owned it. During mi_open(), a different call to
  memory mapping was coded, which did not have this limitation.
  
  The second patch tried to remove the code duplication and just
  called mi_extra() from mi_open() an thus inherited the limitation.
  But on open, a thread does not have a lock on the table...
  
  A possible solution would be to check for zero or one lock.
  But since I learned that it is safe to memory map a file while
  normal file I/O is done on it, I removed the restriction altogether
  and allow to memory map while a table is in use.
  
  No test case. I do not see a chance to verify with the test suite
  which kind of I/O is used on a table.
[13 Mar 2007 16:49] Michael Widenius
ok to push
[15 Mar 2007 8:01] Ingo Strüwing
Queued to 5.1-engines. Nothing to document. No behavioral change. Just internal optimization. Improved crash resistence is just theoretical and not expected to show up in any real application.
[15 Mar 2007 18:06] Ingo Strüwing
Pushed to 5.1.17.

Note that this does not fix any of the users problems. I was not able to repeat any. The patch fixes only a theoretic flaw. It is not expected to show any difference in real applications.

Hence I do not close the bug, but set it to "Can't repeat".
[15 Mar 2007 18:08] Jason Williams
Hi Ingo,

Thank you very much for getting this in place. We'll work to get 5.1.17 or newer installed within the next 30 days and see if the problem recurs. Again thank you.