| Bug #21059 | Server crashes on join query with large dataset with NDB tables | | |
|---|---|---|---|
| Submitted: | 14 Jul 2006 13:10 | Modified: | 17 Aug 2006 8:51 |
| Reporter: | Giuseppe Maxia | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S1 (Critical) |
| Version: | 5.1.11, 5.0.19 | OS: | Linux (Linux) |
| Assigned to: | Martin Skold | CPU Architecture: | Any |
| Tags: | cluster | | |
[26 Jul 2006 8:30]
Hartmut Holzgraefe
The query did not crash for me, but after running it on the full table without the WHERE clause limitation, mysqld had grown to a virtual size of 8GB, consuming almost all available physical memory on my 64-bit test system:

  PID USER     PR NI VIRT  RES  SHR  S %CPU %MEM TIME+   COMMAND
12501 hholzgra 16 0  8084m 7.5g 6292 S 0.0  98.9 8:32.82 mysqld

So I'm pretty sure you experienced a crash caused by an out-of-memory situation.
[26 Jul 2006 8:46]
Giuseppe Maxia
As I said in my original report, the query with the WHERE clause is "safe" (for the cluster), while the query without the WHERE clause is not. Anyway, the problem could be an out-of-memory case, since my server has only 4 GB of RAM. The server should not crash, though. The same query with InnoDB or MyISAM tables works just fine, even on the full database rather than the reduced one, and much faster.
Giuseppe
[26 Jul 2006 9:44]
Hartmut Holzgraefe
Even in its simplest form
select player_name
from players
inner join results
using (player_id)
this query leaks memory at about the same speed.
When converting the tables from NDB to MyISAM all is well.
EXPLAIN for NDB tables:
explain select player_name from players inner join results on players.player_id = results.player_id \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: results
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 3400000
Extra:
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: players
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: test.results.player_id
rows: 1
Extra:
2 rows in set (0.01 sec)
EXPLAIN for MyISAM tables:
explain select player_name from players inner join results on players.player_id = results.player_id \G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: results
type: index
possible_keys: PRIMARY
key: PRIMARY
key_len: 8
ref: NULL
rows: 3400000
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: players
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: test.results.player_id
rows: 1
Extra:
2 rows in set (0.00 sec)
[26 Jul 2006 11:13]
Hartmut Holzgraefe
Raised to P1 (Critical) due to crashing behavior and probable customer impact.
[7 Aug 2006 11:51]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/10113

ChangeSet@1.2498, 2006-08-07 13:51:20+02:00, mskold@mysql.com +3 -0
  Fix for bug #21059 Server crashes on join query with large dataset with NDB tables: Releasing operation for each intermediate batch, before next call to trans->execute(NoCommit);
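The pattern named in this commit (releasing completed operation records after each intermediate batch, before the next call to trans->execute(NoCommit)) can be illustrated with a minimal, self-contained sketch. Note that `MockTransaction` and `Operation` below are hypothetical stand-ins invented for illustration, not the real NdbTransaction/NdbOperation API: the point is only that per-row lookup operations accumulate within one transaction unless they are released batch by batch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an NDB operation record, one per row lookup.
struct Operation { char payload[64]; };

// Hypothetical stand-in for an NDB-style transaction that owns its
// operation records until they are explicitly released.
class MockTransaction {
public:
    Operation* getOperation() {             // analogue of getNdbOperation()
        ops_.push_back(new Operation());
        return ops_.back();
    }
    // Analogue of the fix: free completed operation records after each
    // intermediate batch, before the next execute(NoCommit).
    void releaseCompletedOperations() {
        for (Operation* op : ops_) delete op;
        ops_.clear();
    }
    std::size_t pendingOperations() const { return ops_.size(); }
    ~MockTransaction() { releaseCompletedOperations(); }
private:
    std::vector<Operation*> ops_;
};

// Drive `rows` primary-key lookups in batches of `batch`. With per-batch
// release, the number of live operation records is capped at one batch;
// without it, every record stays allocated until the transaction ends.
std::size_t maxLiveOperations(std::size_t rows, std::size_t batch,
                              bool releasePerBatch) {
    MockTransaction trans;
    std::size_t peak = 0;
    for (std::size_t i = 0; i < rows; ++i) {
        trans.getOperation();                // one record per joined row
        if ((i + 1) % batch == 0) {
            // ... trans.execute(NoCommit) would run here ...
            if (peak < trans.pendingOperations()) peak = trans.pendingOperations();
            if (releasePerBatch) trans.releaseCompletedOperations();
        }
    }
    if (peak < trans.pendingOperations()) peak = trans.pendingOperations();
    return peak;
}
```

With 3.4 million joined rows, the unreleased variant corresponds to the unbounded memory growth Hartmut observed. The sketch ignores the refinements from the later commits in this report (keeping records for BLOB operations and for an on-going read_multi_range).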
[7 Aug 2006 12:49]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/10118

ChangeSet@1.2200, 2006-08-07 14:48:54+02:00, mskold@mysql.com +2 -0
  Fix for bug #21059 Server crashes on join query with large dataset with NDB tables: Releasing operation for each intermediate batch, before next call to trans->execute(NoCommit);
[7 Aug 2006 13:58]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/10119

ChangeSet@1.2201, 2006-08-07 15:58:43+02:00, mskold@mysql.com +1 -0
  Fix for bug #21059 Server crashes on join query with large dataset with NDB tables: missing friend declaration in change set
[9 Aug 2006 12:33]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/10198

ChangeSet@1.2203, 2006-08-09 14:32:56+02:00, mskold@mysql.com +2 -0
  Fix for bug #21059 Server crashes on join query with large dataset with NDB tables: added more tests
[14 Aug 2006 7:43]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/10339

ChangeSet@1.2204, 2006-08-14 09:42:51+02:00, mskold@mysql.com +2 -0
  Fix for bug #21059 Server crashes on join query with large dataset with NDB tables: Post review fix, not releasing operation records if BLOB operations
[15 Aug 2006 7:38]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/10424

ChangeSet@1.2537, 2006-08-15 09:38:21+02:00, mskold@mysql.com +5 -0
  ndb_lock.test, ndb_lock.result:
    bug #18184 SELECT ... FOR UPDATE does not work..: New test case
  ha_ndbcluster.h, ha_ndbcluster.cc, NdbConnection.hpp:
    Fix for bug #21059 Server crashes on join query with large dataset with NDB tables: Releasing operation for each intermediate batch, before next call to trans->execute(NoCommit);
[15 Aug 2006 11:12]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/10455

ChangeSet@1.2537, 2006-08-15 13:12:27+02:00, mskold@mysql.com +5 -0
  ndb_lock.test, ndb_lock.result:
    bug #18184 SELECT ... FOR UPDATE does not work..: New test case
  ha_ndbcluster.h, ha_ndbcluster.cc, NdbConnection.hpp:
    Fix for bug #21059 Server crashes on join query with large dataset with NDB tables: Releasing operation for each intermediate batch, before next call to trans->execute(NoCommit);
[15 Aug 2006 12:31]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/10468

ChangeSet@1.2254, 2006-08-15 14:31:21+02:00, mskold@mysql.com +2 -0
  Fix for bug #21059 Server crashes on join query with large dataset with NDB tables: do not release operation records for on-going read_multi_range
[16 Aug 2006 7:39]
Martin Skold
Pushed to 4.1.22, 5.0.25, 5.1.12
[17 Aug 2006 8:51]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.
If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at
http://dev.mysql.com/doc/en/installing-source.html
Documented bugfix in 4.1.22/5.0.25/5.1.12 changelogs.

Description:
In an NDB cluster, issuing a query involving two large tables and aggregate functions crashes the mysqld server. The tables are the following:

CREATE TABLE `players` (
  `player_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `player_name` varchar(40) NOT NULL DEFAULT '',
  `TS` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`player_id`),
  KEY `player_name` (`player_name`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1

CREATE TABLE `results` (
  `player_id` int(10) unsigned NOT NULL DEFAULT '0',
  `game_id` int(10) unsigned NOT NULL DEFAULT '0',
  `color` enum('w','b') NOT NULL DEFAULT 'w',
  `result` decimal(3,1) NOT NULL DEFAULT '0.5',
  `elo` int(11) NOT NULL DEFAULT '0',
  `TS` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`player_id`,`game_id`),
  KEY `game_id` (`game_id`),
  KEY `result` (`result`),
  KEY `color` (`color`),
  KEY `elo` (`elo`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1

show table status like "players"\G
*************************** 1. row ***************************
           Name: players
         Engine: NDBCLUSTER
        Version: 10
     Row_format: Dynamic
           Rows: 147664
 Avg_row_length: 16
    Data_length: 4849664
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: 147666
    Create_time: NULL
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment: number_of_replicas: 2

show table status like "results"\G
*************************** 1. row ***************************
           Name: results
         Engine: NDBCLUSTER
        Version: 10
     Row_format: Fixed
           Rows: 3400000
 Avg_row_length: 40
    Data_length: 137494528
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: NULL
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment: number_of_replicas: 2

The query that causes the crash is this:

select player_name, count(*) as how_many
from players inner join results using (player_id)
group by player_name
having how_many > 50;

As you see, table "results" has 3,400,000 rows. I could not reproduce this bug with a lower number of records. The stack trace resolved from the error log is:

0x81d02a8 handle_segfault + 356
0xd2e888 (?)
0x8311a09 hp_rec_key_cmp + 261
0x8419c12 _ZN3Ndb12getOperationEv + 38
0x842183f _ZN14NdbTransaction15getNdbOperationEPK12NdbTableImplP12NdbOperation + 71
0x8421902 _ZN14NdbTransaction15getNdbOperationEPKN13NdbDictionary5TableE + 38
0x8297f09 _ZN13ha_ndbcluster7pk_readEPKcjPcj + 93
0x829a249 _ZN13ha_ndbcluster23read_range_first_to_bufEPK12st_key_rangeS2_bbPc + 365
0x8299f93 _ZN13ha_ndbcluster10index_readEPcPKcj16ha_rkey_function + 75
0x822c076 _Z13join_read_keyP13st_join_table + 242
0x821e20b _Z10sub_selectP4JOINP13st_join_tableb + 259
0x8224425 _Z20evaluate_join_recordP4JOINP13st_join_tableiPc + 329
0x821e1ba _Z10sub_selectP4JOINP13st_join_tableb + 178
0x82240d8 _Z9do_selectP4JOINP4ListI4ItemEP8st_tableP9Procedure + 268
0x821962d _ZN4JOIN4execEv + 1349
0x821ace1 _Z12mysql_selectP3THDPPP4ItemP13st_table_listjR4ListIS1_ES2_jP8st_orderSB_S2_SB_mP13select_resultP18st_select_lex_unitP13st_sel + 305
0x8216c13 _Z13handle_selectP3THDP6st_lexP13select_resultm + 267
0x81e8443 _Z21mysql_execute_commandP3THD + 899
0x81ef817 _Z11mysql_parseP3THDPcj + 359
0x81e6ccf _Z16dispatch_command19enum_server_commandP3THDPcj + 1007
0x81e68a5 _Z10do_commandP3THD + 129
0x81e5d9d handle_one_connection + 621
0xd28371 (?)
0xbb29be (?)
How to repeat:
Download the data I used for this test (URL in the private section). Use a 2-node cluster with the following parameters:

[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=1000M
IndexMemory=1024M
MaxNoOfUniqueHashIndexes=100
MaxNoOfConcurrentOperations=250000

Create the offending tables, load the data, and issue the above-mentioned query. Notice that if you add a WHERE clause limiting the rows to 3,000,000, the bug does not strike:

select player_name, count(*) as how_many
from players inner join results using (player_id)
WHERE game_id < 1500000 # there are two rows for each game_id
group by player_name
having how_many > 50;

(it takes about 4 minutes to run)