Bug #14730 mysqld terminates on signal 11
Submitted: 8 Nov 2005 4:04 Modified: 14 Dec 2005 7:03
Reporter: Rob Dey Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:4.1.15 OS:Linux (RedHat Linux AS 4.0)
Assigned to: CPU Architecture:Any

[8 Nov 2005 4:04] Rob Dey
Description:
We are running a high transaction rate system which inserts a high number of rows in a number of tables. We also have a transaction that deletes the inserted rows periodically. We beleive that mysql is terminating whilst running the delete queries.
I would expect that there would be about 15000 eligible rows in each table for deletion when the delete statements are issued.

The system runs ok for up to about 45minutes before mysql terminates.

We are running on a machine with 3115132kb of memory.

The delete queries being used are as follows:

DELETE pi, mt FROM JBPM_PROCESSINSTANCE_SIMPLE pi, JBPM_TOKEN_SIMPLE mt WHERE pi.endAt IS NOT NULL AND mt.processInstanceId = pi.id

DELETE nc FROM NOTIFICATION_CASCADE nc WHERE nc.FINISH_AT IS NOT NULL

DELETE ev FROM EVENT ev WHERE NOT EXISTS (SELECT 1 FROM NOTIFICATION_CASCADE nc WHERE ev.ID=nc.EVENT_ID)

The following is the contents of the my.cnf file that it being used:

[mysqld]
skip-external-locking
#log
log-queries-not-using-indexes
log-slow-queries
skip-name-resolve
max_connections = 2000
max_allowed_packet = 262144
table_cache = 512
key_buffer_size = 256M
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
thread_cache = 24
thread_concurrency = 8
query_cache_size = 64M
query_cache_type = 1
query_cache_wlock_invalidate = 0

This is the contents of the .err log file in the mysql directory showing that the server stopped and restarted:

mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=268435456
read_buffer_size=2093056
max_used_connections=384
max_connections=2000
threads_connected=380
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 57520 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd=0x8b6c370
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0x5c94cc, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x812659c
0xb7d7c8
(nil)
0x813ad0e
0x8177580
0x817792f
0x813dd19
0x813ee63
0x813f4e4
0xb77341
0xa09fee
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do
resolve it
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x9ec303b0  is invalid pointer
thd->thread_id=27
The manual page at http://www.mysql.com/doc/en/Crashing.html contains
information that should help you find out what is causing the crash.

Number of processes running now: 0
051108 16:27:37  mysqld restarted
051108 16:27:39  InnoDB: Started; log sequence number 12 2444472294
051108 16:27:39 [Warning] 'user' entry 'root@localhost.localdomain' ignored in --skip-name-resolve mode.
051108 16:27:39 [Warning] 'user' entry '(null)@localhost.localdomain' ignored in --skip-name-resolve mode.
/usr/sbin/mysqld: ready for connections.
Version: '4.1.15-standard-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Edition - Standard (GPL)

How to repeat:
We werwe running Grinder processes against our system putting through 250 transactions/second over 4 other machines all linked to one database server via the connector/j driver mysql-connector-java-3.1.10.
I don't know quite how you would reproduce it.
[8 Nov 2005 12:39] Valeriy Kravchuk
Thank you for a problem report. Please, send the resolved stack trace (see http://dev.mysql.com/doc/refman/5.0/en/using-stack-trace.html on how to get it). The definitions of the tables used in your DELETE's will be useful too. How many rows are there in these tables?

By the way, why do you give so few memory to MySQL server? Having 3G RAM...
[8 Nov 2005 20:12] Rob Dey
We were running with more memory allocated but were having the same problem so we tuned the memory down to see if it would help. Here is the my.cnf before we reduced the memory allocations:

[mysqld]
skip-external-locking
#log
log-queries-not-using-indexes
log-slow-queries
skip-name-resolve
max_connections = 2000
max_allowed_packet = 1M
table_cache = 512
key_buffer_size = 512M
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
thread_cache = 24
thread_concurrency = 8
query_cache_size = 128M
query_cache_type = 1
query_cache_wlock_invalidate = 0

Here is the resolved stack trace data:

0x812659c handle_segfault + 610
0xb7d7c8 (?)
(nil)
0x813ad0e _Z21mysql_execute_commandP3THD + 9380
0x8177580 _Z12execute_stmtP3THDP18Prepared_statementP6Stringb + 346
0x817792f _Z18mysql_stmt_executeP3THDPcj + 165
0x813dd19 _Z16dispatch_command19enum_server_commandP3THDPcj + 711
0x813ee63 _Z10do_commandP3THD + 127
0x813f4e4 handle_one_connection + 1432
0xb77341 (?)
0xa09fee (?)

Here are the table definitions for the tables involved:

mysql> describe JBPM_PROCESSINSTANCE_SIMPLE;
+---------+------------+------+-----+---------+-------+
| Field   | Type       | Null | Key | Default | Extra |
+---------+------------+------+-----+---------+-------+
| id      | bigint(20) |      | PRI | 0       |       |
| startAt | datetime   | YES  |     | NULL    |       |
| endAt   | datetime   | YES  | MUL | NULL    |       |
| state   | blob       | YES  |     | NULL    |       |
+---------+------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

mysql> describe JBPM_TOKEN_SIMPLE;
+-------------------+--------------+------+-----+---------+-------+
| Field             | Type         | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| id                | bigint(20)   |      | PRI | 0       |       |
| processInstanceId | bigint(20)   | YES  | MUL | NULL    |       |
| actorId           | varchar(255) | YES  | MUL | NULL    |       |
+-------------------+--------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

mysql> describe NOTIFICATION_CASCADE;
+-----------------------+--------------+------+-----+---------------------+-------+
| Field                 | Type         | Null | Key | Default             | Extra |
+-----------------------+--------------+------+-----+---------------------+-------+
| ID                    | varchar(36)  |      | PRI |                     |       |
| PROCESS_ID            | varchar(36)  | YES  |     | NULL                |       |
| EVENT_ID              | varchar(36)  |      | MUL |                     |       |
| EVENT_TYPE            | varchar(50)  |      | MUL |                     |       |
| SUBSCRIBER_ID         | varchar(255) |      | MUL |                     |       |
| EVENT_SUBSCRIPTION_ID | varchar(36)  |      |     |                     |       |
| LOCALE                | varchar(32)  |      |     |                     |       |
| TEMPLATE              | varchar(255) |      |     |                     |       |
| START_AT              | datetime     |      |     | 0000-00-00 00:00:00 |       |
| FINISH_AT             | datetime     | YES  | MUL | NULL                |       |
+-----------------------+--------------+------+-----+---------------------+-------+

mysql> describe EVENT;
+-------------+-------------+------+-----+---------------------+-------+
| Field       | Type        | Null | Key | Default             | Extra |
+-------------+-------------+------+-----+---------------------+-------+
| ID          | varchar(36) |      | PRI |                     |       |
| EVENT_TYPE  | varchar(50) |      |     |                     |       |
| RECEIVED_AT | datetime    |      |     | 0000-00-00 00:00:00 |       |
| EXPIRES_AT  | datetime    | YES  |     | NULL                |       |
| DELIVER_AT  | datetime    | YES  |     | NULL                |       |
| ATTRIBUTES  | blob        | YES  |     | NULL                |       |
+-------------+-------------+------+-----+---------------------+-------+
6 rows in set (0.00 sec)

The indexes on these tables are create with:

CREATE INDEX IDX_DEFINITION_NAME ON JBPM_DEFINITION_SIMPLE(name)
CREATE INDEX IDX_PROCESSINSTANCE_ENDAT ON JBPM_PROCESSINSTANCE_SIMPLE(endAt)
CREATE INDEX IDX_TOKEN_PROCESSINSTANCEID ON JBPM_TOKEN_SIMPLE(processInstanceId)
CREATE INDEX IDX_TOKEN_ACTORID ON JBPM_TOKEN_SIMPLE(actorId)
CREATE INDEX IDX_Notification_Event_ID ON NOTIFICATION_CASCADE (EVENT_ID)
CREATE INDEX IDX_Notification_Finish_At ON NOTIFICATION_CASCADE (FINISH_AT)
CREATE INDEX IDX_Event_Type ON NOTIFICATION_CASCADE (EVENT_TYPE)
CREATE INDEX IDX_Subscriber_ID ON NOTIFICATION_CASCADE (SUBSCRIBER_ID)
[8 Nov 2005 20:56] Heikki Tuuri
Rob,

please post SHOW CREATE TABLE for the tables involved.

Regards,

Heikki
Oracle/Innobase
[8 Nov 2005 21:28] Rob Dey
| JBPM_PROCESSINSTANCE_SIMPLE | CREATE TABLE `JBPM_PROCESSINSTANCE_SIMPLE` (
  `id` bigint(20) NOT NULL,
  `startAt` datetime default NULL,
  `endAt` datetime default NULL,
  `state` blob,
  PRIMARY KEY  (`id`),
  KEY `IDX_PROCESSINSTANCE_ENDAT` (`endAt`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |

| JBPM_TOKEN_SIMPLE | CREATE TABLE `JBPM_TOKEN_SIMPLE` (
  `id` bigint(20) NOT NULL,
  `processInstanceId` bigint(20) default NULL,
  `actorId` varchar(255) default NULL,
  PRIMARY KEY  (`id`),
  KEY `IDX_TOKEN_PROCESSINSTANCEID` (`processInstanceId`),
  KEY `IDX_TOKEN_ACTORID` (`actorId`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |

| EVENT | CREATE TABLE `EVENT` (
  `ID` varchar(36) NOT NULL,
  `EVENT_TYPE` varchar(50) NOT NULL,
  `RECEIVED_AT` datetime NOT NULL,
  `EXPIRES_AT` datetime default NULL,
  `DELIVER_AT` datetime default NULL,
  `ATTRIBUTES` blob,
  PRIMARY KEY  (`ID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |

| NOTIFICATION_CASCADE | CREATE TABLE `NOTIFICATION_CASCADE` (
  `ID` varchar(36) NOT NULL,
  `PROCESS_ID` varchar(36) default NULL,
  `EVENT_ID` varchar(36) NOT NULL,
  `EVENT_TYPE` varchar(50) NOT NULL,
  `SUBSCRIBER_ID` varchar(255) NOT NULL,
  `EVENT_SUBSCRIPTION_ID` varchar(36) NOT NULL,
  `LOCALE` varchar(32) NOT NULL,
  `TEMPLATE` varchar(255) NOT NULL,
  `START_AT` datetime NOT NULL,
  `FINISH_AT` datetime default NULL,
  PRIMARY KEY  (`ID`),
  KEY `IDX_Notification_Event_ID` (`EVENT_ID`),
  KEY `IDX_Notification_Finish_At` (`FINISH_AT`),
  KEY `IDX_Event_Type` (`EVENT_TYPE`),
  KEY `IDX_Subscriber_ID` (`SUBSCRIBER_ID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
[9 Nov 2005 12:13] Valeriy Kravchuk
Thank you for the additional information. The only part missed is the estimated number of rows in each of the tables. We need these to try to create a test case.

As all the tables are MyISAM, please, inform about the maintenance actions you performed with them CHECK, (REPAIR, OPTIMIZE etc.), if any. Have you tried to dump and restore data on the other machine?
[9 Nov 2005 19:31] Rob Dey
I would expect each table based on the runs we were doing to have about 15000 rows inserted and updated each minute.  The deletion queries are scheduled to run every minute or so. The processes would run for about half an hour before mysql would terminate and restart.

We did not perform any maintenance actions on these tables before this problem occurred.

We could try this on another database server if you think it would identify anything different.
[10 Nov 2005 7:43] Valeriy Kravchuk
Please, send the results of

CHECK TABLE tbl_name EXTENDED

for the tables involved.

Describe your hardware also (how many CPUs, first of all).
[10 Nov 2005 21:11] Rob Dey
We have dropped and re-created the table since raising the problem.  I can run the check once I have the tables back again if you like.

Here is the machine info that our database is running on

Intel Xeon CPU 3.6 gHz $ processors.
Physical memory 3Gb

Linus version 2.6.9-11 Red Hat 3.4.3-22
[11 Nov 2005 8:02] Valeriy Kravchuk
If you recreated the tables already, then, please, send the check results I asked about if you get this same crash next time. I believe you can increase your configuration parameters' values to a more reasonable ones for your 3G of RAM.

I'll inform you as soon as I find some possible reason for this kind of crashes on your platform.
[13 Nov 2005 19:27] Rob Dey
I will make sure that I do the CHECK if we ever encounter this again.  We have converted to a NDB Cluster now to see what performance we get from that.  I will post aagain if we encounter this error again.

Thanks.
[14 Nov 2005 7:03] Valeriy Kravchuk
OK. I'll change status to need feedback then. Inform about this problem here if you get new crash.
[15 Dec 2005 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".