Bug #9282 | API Node Crashes/Reloads on 'DELETE FROM' | ||
---|---|---|---|
Submitted: | 18 Mar 2005 16:15 | Modified: | 13 Oct 2005 11:04 |
Reporter: | David Seltzer | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | 4.1.10 | OS: | Linux (Redhat Enterprise Linux 3.2) |
Assigned to: | Jonas Oreland | CPU Architecture: | Any |
[18 Mar 2005 16:15]
David Seltzer
[21 Mar 2005 10:57]
Johan Andersson
Did the mysqld print a stacktrace? If so, please put the entire stack trace here.
[21 Mar 2005 14:29]
David Seltzer
This version doesnt seem to support the --debug flag for mysqld. The configure options are: ./configure '--prefix=/usr/local/mysql' '--localstatedir=/usr/local/mysql/data' '--libexecdir=/usr/local/mysql/bin' '--with-extra-charsets\ =complex' '--enable-thread-safe-client' '--enable-local-infile' '--with-ndbcluster' '--with-innodb' 'CC=gcc' 'CFLAGS=-march=nocona -O3' 'CXXFLAGS=-march=noco\ na -O3' 'CXX=gcc' The log output on the API node: ------------------------------------------------------------------------------- mysqld got signal 11; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=8388600 read_buffer_size=131072 max_used_connections=5 max_connections=100 threads_connected=5 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 225791 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Number of processes running now: 0 050317 14:18:07 mysqld restarted -------------------------------------------------------------------------------
[22 Mar 2005 1:47]
frank homann
i can confirm this bug with mysql 4.1.10a on a 3 nodes cluster configuration on linux 2.6.10 with nptl... the query responsable for the crash: DELETE FROM max_adviews WHERE t_stamp>=20050321160000 AND t_stamp<20050321170000; on a 587708 rows table: mysql> SHOW COLUMNS FROM max_adviews; +----------+--------------+------+-----+-------------------+-------+ | Field | Type | Null | Key | Default | Extra | +----------+--------------+------+-----+-------------------+-------+ | userid | varchar(32) | | MUL | | | | bannerid | mediumint(9) | | MUL | 0 | | | zoneid | mediumint(9) | | | 0 | | | t_stamp | timestamp | YES | MUL | CURRENT_TIMESTAMP | | | host | varchar(255) | | | | | | source | varchar(50) | | | | | | country | char(2) | | | | | +----------+--------------+------+-----+-------------------+-------+ the backtrace: Cannot determine thread, fp=0xb2f34c1c, backtrace may not be correct. Stack range sanity check OK, backtrace follows: 0x80c7387 0x8331508 0x14 0x826c871 0x8274e3a 0x8273c04 0x826ced8 0x826ce57 0x8287feb 0x832efd2 0x8365d7a New value of fp=(nil) failed sanity check, terminating stack trace! i've also tried to resolve the backtrace, but i get the following output: $ nm -n /usr/sbin/mysqld > mysqld.sym nm: /usr/sbin/mysqld: no symbols
[12 Apr 2005 13:54]
Jonas Oreland
Hi, I tried with a 2-node cluster and 4.1.11. And was unable to get this error. But there is to my knowledge no major difference between 4.1.11 and 4.1.10 regarding this I managed to get missed heartbeat once, but then the transaction was correctly aborted (we're working on improving heartbeat handling). /Jonas
[21 Apr 2005 15:13]
Jonas Oreland
I could not reproduce it simply...but I know johan can..but he's currently in polen
[26 Apr 2005 17:45]
frank homann
hi, i can give you a shell account on our test servers and tell you how to reproduce it... please contact me either with icq (96372521) or jabber (skunk@jabber.linux.it)
[26 Apr 2005 18:53]
frank homann
btw, running against your precompiled mysql-max binary i get the following backtrace: 0x8136a43 0xffffe420 (nil) 0x83db3d4 0x83e057e 0x83dfa76 0x83db936 0x83db8ae 0x83d22d6 0xb7fb8add 0xb7eea58a resolving the stack dump against the symbols file obtained from your mysql-debug binary: # ./resolve_stack_dump -s mysqld.sym -n mysqld.stack 0x8136a43 get_opt_type__14field_longlongP6StringUl + 259 0xffffe420 _end + -140810812 (nil) 0x83db3d4 field_separator + 55508 0x83e057e field_separator + 76414 0x83dfa76 field_separator + 73590 0x83db936 field_separator + 56886 0x83db8ae field_separator + 56750 0x83d22d6 field_separator + 18390 0xb7fb8add _end + -1349055359 0xb7eea58a _end + -1349900498 hope this helps a little more resolving this very nasty bug...
[23 May 2005 8:13]
frank homann
this bug shows up when deleting many rows (ex. 30k+) on a large table (ex. 100k+ rows) at once. as workaround just delete fewer records (ex. using the "limit 20000" statement) and repeat the query until affected_rows<limit_value. it's probably necessary to insert a few seconds delay between deletions in order to avoid this bug...
[16 Sep 2005 13:44]
Fabio Coatti
I can confirm this behaviour also for version 4.1.15-nightly-20050826, compiled for Opteron. (I'm testing this version because of bug #12118) I get no trace or whatever in ndbd logs, in mysqld.log i have only this: mysqld got signal 11; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=16777216 read_buffer_size=131072 max_used_connections=3 max_connections=100 threads_connected=1 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 233983 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Number of processes running now: 0 The system and configuration is the same as bugs #11776 and #12118 (two dual opterons with 8 Gb each, with gentoo linux) The table has 734847 rows. When delete from table is issued, I get this: mysql> delete from table; ERROR 2013 (HY000): Lost connection to MySQL server during query On mgm node, I get this: 2005-09-16 15:39:27 [MgmSrvr] ALERT -- Node 3: Node 6 Disconnected 2005-09-16 15:39:27 [MgmSrvr] INFO -- Node 3: Communication to Node 6 closed 2005-09-16 15:39:27 [MgmSrvr] ALERT -- Node 2: Node 6 Disconnected 2005-09-16 15:39:27 [MgmSrvr] INFO -- Node 2: Communication to Node 6 closed 2005-09-16 15:39:27 [MgmSrvr] INFO -- Mgmt server state: nodeid 6 freed, m_reserved_nodes 000000000000002e. 2005-09-16 15:39:27 [MgmSrvr] INFO -- Mgmt server state: nodeid 4 reserved for ip X.Y.Z.Q, m_reserved_nodes 000000000000003e. 2005-09-16 15:39:27 [MgmSrvr] INFO -- Node 3: Node 4 Connected 2005-09-16 15:39:27 [MgmSrvr] INFO -- Node 3: Node 4: API version 4.1.15 2005-09-16 15:39:27 [MgmSrvr] INFO -- Node 2: Node 4 Connected 2005-09-16 15:39:27 [MgmSrvr] INFO -- Node 2: Node 4: API version 4.1.15 (2,3 are ndb nodes, >= 4 API nodes) If more informations or test are needed, just let me know.
[16 Sep 2005 14:38]
Fabio Coatti
More on this; I've tried to find the number of rows that I can delete with one command; I've seen that with limit 100000 the rows are deleted, but after this I'm unable to re-issue the same command, see: (note, no on else was using the database) mysql> delete from table limit 200000; ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... Connection id: 1 Current database: test_d2c ERROR 2013 (HY000): Lost connection to MySQL server during query mysql> select count(*) from table; ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... Connection id: 1 Current database: test_d2c +----------+ | count(*) | +----------+ | 734848 | +----------+ 1 row in set (1.02 sec) mysql> delete from table limit 100000; Query OK, 100000 rows affected (2 min 5.01 sec) mysql> select count(*) from table; +----------+ | count(*) | +----------+ | 634848 | +----------+ 1 row in set (0.01 sec) mysql> delete from table limit 200000; ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction mysql> delete from table limit 100000; ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
[19 Sep 2005 10:05]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/internals/30038
[19 Sep 2005 10:58]
Jonas Oreland
Pushed into 4.1.15
[28 Sep 2005 7:28]
Jonas Oreland
And 5.0.14
[5 Oct 2005 16:02]
Jonas Oreland
is more info needed?
[6 Oct 2005 6:33]
Jonas Oreland
When performing large delete, (typically > 100k rows) there _could_ occur a deference of an invalid pointer causing mysqld to crash. The fix introduces lock of a mutex just before changing the mentioned pointer. The mutex is only used when expanding objidmap in ndbapi
[13 Oct 2005 11:04]
Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release. If necessary, you can access the source repository and build the latest available version, including the bugfix, yourself. More information about accessing the source trees is available at http://www.mysql.com/doc/en/Installing_source_tree.html Additional info: Documented bugfix in 4.1.15 and 5.0.14 changelogs. Closed.
[3 Jan 2010 13:41]
manish kumar
HI, When i delete some thing from table.Table getting crash. Manish Kumar