Bug #16572 | Server hangs randomly upon multiple SELECT NOW() | ||
---|---|---|---|
Submitted: | 17 Jan 2006 17:34 | Modified: | 8 Mar 2006 8:29 |
Reporter: | Björn Wiberg | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | MySQL Server | Severity: | S3 (Non-critical) |
Version: | 5.0.18 | OS: | IBM AIX 5.2.0.0 ML5 |
Assigned to: | CPU Architecture: | Any |
[17 Jan 2006 17:34]
Björn Wiberg
[18 Jan 2006 9:35]
Valeriy Kravchuk
Thank you for a problem report. Please, send your my.cnf file content. How many concurrent SELECT NOW() statements are usually needed for server to hang? Is it responding to commands in existing sessions during this "hang"?
[18 Jan 2006 9:51]
Björn Wiberg
Hi Valeriy! Configuration when the error last occurred (yesterday): ---8<--- # MySQL server options [mysqld] character-set-server=latin1 collation-server=latin1_swedish_ci ft_min_word_len=3 innodb_file_per_table interactive_timeout=300 log=/usr/local/mysql/logs/spinus.log log-error=/usr/local/mysql/logs/spinus.err max_connect_errors=999999999 max_user_connections=20 myisam-recover=BACKUP,FORCE skip-innodb sql-mode="NO_ENGINE_SUBSTITUTION" wait_timeout=300 # MySQL client options [mysql] pager=/usr/bin/less -S prompt=\\u@\\h\\_\\d>\\_ # myisamchk options [myisamchk] ft_min_word_len=3 --->8--- Current configuration (added default timezone + downloaded and unpacked timezone tables as AIX doesn't come with /usr/share/zoneinfo or similar): ---8<--- # MySQL server options [mysqld] character-set-server=latin1 collation-server=latin1_swedish_ci default-time-zone="Europe/Stockholm" ft_min_word_len=3 innodb_file_per_table interactive_timeout=300 log=/usr/local/mysql/logs/spinus.log log-error=/usr/local/mysql/logs/spinus.err max_connect_errors=999999999 max_user_connections=20 myisam-recover=BACKUP,FORCE skip-innodb sql-mode="NO_ENGINE_SUBSTITUTION" wait_timeout=300 # MySQL client options [mysql] pager=/usr/bin/less -S prompt=\\u@\\h\\_\\d>\\_ # myisamchk options [myisamchk] ft_min_word_len=3 --->8--- The number of SELECT NOW() required varies, as seen in the examples (13 or 27 such queries at the end). The queries need not be concurrent, however -- in our case, one user, connected but mostly idling, issuing those queries hung the server. It is strange, however, that the server didn't close the connection between the SELECT NOW() queries -- despite our timeout values of 300 seconds -- as some of the queries were more than 300 seconds apart (see the earlier logs). I managed to hang the server, too, after issuing the query containing TO_DAYS() followed by just a couple of SELECT NOW()s. When MySQL hangs due to this, netstat shows that it is still listening for connections, however, when you try to connect to it, nothing happens (no connection gets established). Also, no communication through the MySQL socket is possible, one gets no connection and no response. MySQL does not care about SIGTERM, only SIGKILL stops the server... Perhaps the timezone changes will eliminate the hangs -- I guess only time can tell. I will try to see if I can reproduce the error again by issuing the last queries again. Best regards, Björn
[18 Jan 2006 10:09]
Björn Wiberg
Added the BokLantagare table as an attachment, which the TO_DAYS() query is run against, in case this helps. I suspect that the TO_DAYS() calculations may affect the subsequent hangs when performing SELECT NOW()s. Best regards, Björn
[20 Jan 2006 20:46]
Björn Wiberg
Hi! The MySQL server hung again today, about one and a half hour after me issuing the previously mentioned SELECT with TO_DAYS() followed by SELECT NOW() several times. This time, the end of the query log looks like this: 59157 Connect n08web@spinus.its.UU.SE on 59157 Init DB smalanddb 59157 Query SELECT datum FROM smaland_events WHERE datum LIKE '2006-01-%' 59157 Query SELECT datum, slutdatum, dag FROM smaland_events se, smaland_events_upprepa seu WHERE se.i d=seu.event_id AND se.upprepa=1 AND seu.slutdatum > '2006-01-00' 59157 Quit 060120 18:33:01 59158 Connect geronauw@localhost on 59158 Init DB ittelefonidb 59158 Query SELECT status FROM inter_status WHERE enhet='51' 59158 Init DB ittelefonidb 59158 Query SELECT id,namn FROM inter_opus WHERE id='51' 59158 Init DB ittelefonidb 59158 Query SELECT p.id AS xxx,p.ansvarig,p.rubrik,p.updated,DATE_FORMAT(p.modified,'%Y-%m') AS ymod,C 59158 Init DB ittelefonidb 59158 Query SELECT status FROM inter_status WHERE enhet='51' 59158 Init DB ittelefonidb 59158 Query SELECT id,namn FROM inter_opus WHERE id='51' 59158 Init DB ittelefonidb 59158 Query SELECT p.id AS xxx,p.ansvarig,p.rubrik,p.updated,DATE_FORMAT(p.modified,'%Y-%m') AS ymod,C OUNT(*) AS antal FROM inter_proj AS p LEFT JOIN inter_coop AS c ON p.id=c.projid WHERE p.instkod='51' AND p.status!='r' GROUP BY p.id ORDER BY p.ansvarig,p.rubrik 59158 Init DB ittelefonidb 59158 Query SELECT id FROM inter_aktivitet WHERE user='hidden.hidden@pcr.uu.se' AND enhet='51' 59158 Query UPDATE inter_aktivitet SET hits=hits+1 WHERE user='hidden.hidden@pcr.uu.se' AND enhet='51' 59158 Quit 060120 18:33:04 59159 Connect geronauw@localhost on 59159 Init DB ittelefonidb 59159 Query SELECT status FROM inter_status WHERE enhet='51' 59159 Init DB ittelefonidb 59159 Query SELECT id,namn FROM inter_opus WHERE id='51' 59159 Init DB ittelefonidb 59159 Query SELECT y.name AS region,COUNT(*) AS antal FROM inter_proj AS p LEFT JOIN inter_coop AS c ON c.projid=p.id LEFT JOIN inter_opus AS d ON p.instkod=d.id LEFT JOIN iso3166 AS i ON c.landskod=i.id /* Borde infogas i någon annan tabell */ LEFT JOIN un_regions AS u ON u.id=i.iso LEFT JOIN un_regions AS x ON u.parent=x.id LEFT JOIN un_regions AS y ON x.parent=y.id WHERE p.status='p' AND land IS NOT NULL AND p.instkod='51' GROUP BY y.name ORDER BY antal DESC 59159 Init DB ittelefonidb 59159 Query SELECT p.id,c.land,c.stad,c.univ,p.namn,DATE_FORMAT(p.modified,'%Y-%m-%d') AS modifierad FROM inter_coop AS c LEFT JOIN inter_proj AS p ON c.projid=p.id WHERE p.instkod='51' AND p.status!='r' ORDER BY c.projid 59159 Quit 060120 18:33:09 59161 Connect mikaoste@spinus.its.UU.SE on 59161 Init DB teknatdb 59161 Query SELECT * FROM aktuellt where typ='internt' ORDER BY id DESC After this, the server stopped logging and I had to kill -9 it. Notice the recent call to DATE_FORMAT(). It seems strange to me that one of the date functions always seem to be there... Also notice the strange ordering of the queries in the log file. The same lock waiting message was shown by 'truss' this time. So, I guess the timezone tables (and setting the timezone correctly, instead of the default of SYSTEM) did not help. Best regards, Björn
[21 Jan 2006 19:07]
Björn Wiberg
Hi! Another hang today, similar to the others, after calling a date function: ---8<--- AND ((publish_up >= '2006-01-27 00:00:00' AND publish_up <= '2006-01-27 23:59:59') OR (publish_down >= '2006-01-27 00:00:00' AND publish_down <= '2006-01-27 23:59:59') OR (publish_up <= '2006-01-27 00:00:00' AND publish_down >= '2006-01-27 23:59:59')) AND state='1' ORDER BY publish_up ASC 18793 Query SELECT mos_events.* FROM mos_events, mos_categories as b WHERE mos_events.catid = b.id AND b.access <= 0 AND mos_events.access <= 0 AND ((publish_up >= '2006-01-28 00:00:00' AND publish_up <= '2006-01-28 23:59:59') OR (publish_down >= '2006-01-28 00:00:00' AND publish_down <= '2006-01-28 23:59:59') OR (publish_up <= '2006-01-28 00:00:00' AND publish_down >= '2006-01-28 23:59:59')) AND state='1' ORDER BY publish_up ASC 18793 Query SELECT mos_events.* FROM mos_events, mos_categories as b WHERE mos_events.catid = b.id AND b.access <= 0 AND mos_events.access <= 0 AND ((publish_up >= '2006-01-29 00:00:00' AND publish_up <= '2006-01-29 23:59:59') OR (publish_down >= '2006-01-29 00:00:00' AND publish_down <= '2006-01-29 23:59:59') OR (publish_up <= '2006-01-29 00:00:00' AND publish_down >= '2006-01-29 23:59:59')) AND state='1' ORDER BY publish_up ASC 18793 Query SELECT mos_events.* FROM mos_events, mos_categories as b WHERE mos_events.catid = b.id AND b.access <= 0 AND mos_events.access <= 0 AND ((publish_up >= '2006-01-30 00:00:00' AND publish_up <= '2006-01-30 23:59:59') OR (publish_down >= '2006-01-30 00:00:00' AND publish_down <= '2006-01-30 23:59:59') OR (publish_up <= '2006-01-30 00:00:00' AND publish_down >= '2006-01-30 23:59:59')) AND state='1' ORDER BY publish_up ASC 18793 Query SELECT mos_events.* FROM mos_events, mos_categories as b WHERE mos_events.catid = b.id AND b.access <= 0 AND mos_events.access <= 0 AND ((publish_up >= '2006-01-31 00:00:00' AND publish_up <= '2006-01-31 23:59:59') OR (publish_down >= '2006-01-31 00:00:00' AND publish_down <= '2006-01-31 23:59:59') OR (publish_up <= '2006-01-31 00:00:00' AND publish_down >= '2006-01-31 23:59:59')) AND state='1' ORDER BY publish_up ASC 18793 Query SELECT m.* FROM mos_menu AS m WHERE menutype='othermenu' AND published='1' ORDER BY parent,ordering 18793 Query SELECT mos_events.* FROM mos_events, mos_categories as b WHERE mos_events.catid = b.id AND b.access <= 0 AND mos_events.access <= 0 AND (mos_events.state='1' AND mos_events.checked_out='0') AND ((publish_up <= '2006-01-21 00:00:00%' AND publish_down >= '2006-01-21 00:00:00%') OR (publish_up <= '2006-01-21 23:59:59%' AND publish_down >= '2006-01-21 23:59:59%') OR (publish_up <= '2006-01-21 23:59:59%' AND publish_up >= '2006-01-21 00:00:00%') OR (publish_down <= '2006-01-21 23:59:59%' AND publish_down >= '2006-01-21 00:00:00%')) ORDER BY publish_up ASC 18793 Query SELECT mos_events.* FROM mos_events, mos_categories as b WHERE mos_events.catid = b.id AND b.access <= 0 AND mos_events.access <= 0 AND (mos_events.state='1' AND mos_events.checked_out='0') AND ((publish_up <= '2006-01-21 00:00:00%' AND publish_down >= '2006-01-21 00:00:00%') OR (publish_up <= '2006-01-20 23:59:59%' AND publish_down >= '2006-01-20 23:59:59%') OR (publish_up <= '2006-01-20 23:59:59%' AND publish_up >= '2006-01-21 00:00:00%') OR (publish_down <= '2006-01-20 23:59:59%' AND publish_down >= '2006-01-21 00:00:00%')) ORDER BY publish_up DESC 18794 Connect mikaoste@spinus.its.UU.SE on 18794 Init DB teknatdb 18794 Query SELECT * FROM aktuellt where typ='internt' ORDER BY id DESC 18794 Quit 18793 Quit 060121 16:12:11 18796 Connect mikaoste@spinus.its.UU.SE on 18796 Init DB teknatdb 18796 Query SELECT * FROM aktuellt where typ='utbildning' ORDER BY id DESC 18796 Query SELECT * FROM exjobb where aktiv=1 and utbprogram like '%%' ORDER BY id DESC 18796 Quit 18797 Connect geronauw@localhost on 18797 Init DB ittelefonidb 18797 Query SELECT status FROM inter_status WHERE enhet='51' 18797 Init DB ittelefonidb 18797 Query SELECT id,namn FROM inter_opus WHERE id='51' 18797 Init DB ittelefonidb 18797 Query SELECT p.id AS xxx,p.ansvarig,p.rubrik,p.updated,DATE_FORMAT(p.modified,'%Y-%m') AS ymod,COUNT(*) AS antal FROM inter_pro j AS p LEFT JOIN inter_coop AS c ON p.id=c.projid WHERE p.instkod='51' AND p.status!='r' GROUP BY p.id ORDER BY p.ansvarig,p.rubrik 18797 Init DB ittelefonidb 18797 Query SELECT id FROM inter_aktivitet WHERE user='eriknore' AND enhet='51' 18797 Query INSERT INTO inter_aktivitet (user,enhet,hits) VALUES ('eriknore','51',1) 18797 Quit 060121 16:12:13 18798 Connect sympoll@localhost on 18798 Init DB sympolldb 18798 Query SELECT * FROM sympoll_list,sympoll_data WHERE (sympoll_list.pid='1' AND sympoll_data.pid='1') ORDER BY cid 18798 Quit 060121 16:12:15 18799 Connect geronauw@localhost on 18799 Init DB ittelefonidb 18799 Query SELECT status FROM inter_status WHERE enhet='51' 18799 Init DB ittelefonidb 18799 Query SELECT id,namn FROM inter_opus WHERE id='51' 18799 Init DB ittelefonidb 18799 Query SELECT y.name AS region,COUNT(*) AS antal FROM inter_proj AS p LEFT JOIN inter_coop AS c ON c.projid=p.id LEFT JOIN inter_opus AS d ON p.instkod=d.id LEFT JOIN iso3166 AS i ON c.landskod=i.id /* Borde infogas i någon annan tabell */ LEFT JOIN un_regions AS u ON u.id=i.iso LEFT JOIN un_regions AS x ON u.parent=x.id LEFT JOIN un_regions AS y ON x.parent=y.id WHERE p.status='p' AND land IS NOT NULL AND p.instkod='51' GROUP BY y.name ORDER BY antal DESC 18799 Init DB ittelefonidb 18799 Query SELECT p.id,c.land,c.stad,c.univ,p.namn,DATE_FORMAT(p.modified,'%Y-%m-%d') AS modifierad FROM inter_coop AS c LEFT JOIN inter_proj AS p ON c.projid=p.id WHERE p.instkod='51' AND p.status!='r' ORDER BY c.projid 18799 Quit 060121 16:13:19 18291 Query SELECT true --->8--- Notice the DATE_FORMAT() call at the end. Also notice the "SELECT true", which was the replacement for the earlier "SELECT NOW()" that one of our clients issued to keep a connection (which would otherwise timeout) from timing out. Attaching the complete listings for the two sessions; 18291 and 18799. Sorry for the long listings, but maybe this helps spotting a pattern of some kind related to the date functions. Best regards, Björn
[21 Jan 2006 19:21]
Björn Wiberg
Another notice just to inform that AIX *does* come with timezone information in /usr/share/lib/zoneinfo/, however, this does not include timezone information specific for Sweden (i.e., "Europe/Stockholm"), so we're still using the timezone tables available for download from the MySQL site. Best regards, Björn
[21 Jan 2006 19:39]
Björn Wiberg
Uploaded a corrected version of the general query log excerpts (now with 18291 and 18799, not 18291 and 18797). Best regards, Björn
[23 Jan 2006 13:04]
Björn Wiberg
Hi again! I'm running a periodic mysqladmin command (with "version status processlist extended-status debug") every 10 seconds now to get more debug logs. The database user has adjusted his web application so that it will reconnect if thrown out. It does not perform no-op queries (SELECT true or SELECT NOW()) any longer. Right now we're using 300 seconds timeouts, so we'll see if the absence of no-op queries helps or not. The next step (if it still hangs) will probably be to lower our timeouts below 200 seconds so that the client will be thrown out, to see if "fresh" sessions will cure things. Best regards, Björn
[25 Jan 2006 8:13]
Björn Wiberg
The hang occurred again, despite our server seemingly kicking out the user after 5 minutes of inactivity. The truss/gdb output shows the same as before: ---8<--- root@spinus:/mysql/data# cat spinus.pid 966658 root@spinus:/mysql/data# truss -p 966658 thread_waitlock(0x00000000) (sleeping...) ^CPstatus: process is not stopped root@spinus:/mysql/data# gdb /usr/local/mysql/bin/mysqld 966658 GNU gdb 5.0-aix51-020209 Copyright 2000 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "powerpc-ibm-aix5.2.0.0"...(no debugging symbols found)... /mysql/data/966658: No such file or directory. Attaching to program: /usr/local/mysql/bin/mysqld, process 966658 (no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)... (no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)... (no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)... (no debugging symbols found)...(no debugging symbols found)...0xd339c5d8 in _global_lock_common () (gdb) bt #0 0xd339c5d8 in _global_lock_common () #1 0xffffffff in ?? () from (unknown load module) (gdb) bt full #0 0xd339c5d8 in _global_lock_common () No symbol table info available. #1 0xffffffff in ?? () from (unknown load module) No symbol table info available. (gdb) quit A debugging session is active. Do you still want to close the debugger?(y or n) y Detaching from program: /usr/local/mysql/bin/mysqld, process 966658 --->8--- I'm attaching a file containing the MySQL error log, excerpts from the general query log, and mysqladmin output just before the crash. Notice that the last query is one of the two queries that I attached logs of earlier on (ittelefonidb), which uses date functions. I have now installed the -debug build instead, I'll get back to you if the problem appears again with that one; I see it can give more information on the use of current locks etc. (plus perhaps some useful symbol information in gdb). Do you have any recommendations regarding tracing or anything else that I should try? Thanks in advance! Best regards, Björn
[25 Jan 2006 12:23]
Björn Wiberg
Hi again! Oops... happened again, but I accidentally lost the gdb output. :-/ Truss output was identical; gdb output indicated that it was waiting for a table lock (I do not remember the exact wording). It seems that this happens when there are 64 open tables (see also the last hang) and a lock is attempted for a 65th table! See attached log excerpts. The most interesting part is perhaps this, from the MySQL error log: ---8<--- safe_mutex: Trying to lock unitialized mutex at sql_test.cc, line 377 /../ mysqld got signal 6; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=8388600 read_buffer_size=131072 max_used_connections=5 max_connections=100 threads_connected=2 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 225791 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. --->8--- I'll try raising the memory limit for MySQL (currently it may hit the 128 MB limit because of both /etc/security/limits and the fact that maxDATA of the mysqld executable is set to 0x00000000 and may need to be set to 0x80000000), although the memory statistics above do not indicate any excessive memory usage anywhere near that limit (the mysqladmin samples were taken 10 seconds apart). I'll get back to you if we encounter more hangs -- the "magic" value of 64 for the number of open tables seems too magic to be random. :-) Best regards, Björn
[25 Jan 2006 12:36]
Valeriy Kravchuk
Thank you for a lot of additional information. This 64 open tables magic is really strange. So, please, send the ulimit -a results from the user you run mysql server as. SHOW STATUS results may be useful also.
[25 Jan 2006 12:40]
Björn Wiberg
...although that assertion(?) error was actually after the hang. But may indicate the source of the problems (not possible to use more than 64 locks?). Best regards, Björn
[25 Jan 2006 13:00]
Björn Wiberg
Hi again! root@spinus:/# su - mysql $ ulimit -a time(seconds) unlimited file(blocks) unlimited data(kbytes) 524288 stack(kbytes) 131072 memory(kbytes) 524288 coredump(blocks) 2097151 nofiles(descriptors) 2000 $ Excerpt from /etc/security/limits (512 byte block size, limits measured in blocks): * * For MySQL: Allow writing of large files, raise memory limits * * 512 MB data segment size * 512 MB real memory usage * 128 MB stack segment size * mysql: fsize = -1 data = 1048576 rss = 1048576 stack = 262144 MySQL seems to play nice with the memory, consumes between 10-40 MB on average: mysql 589836 10,2 1,0 19168 35644 pts/3 A 13.52.09 1:28 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data --user=mysql --pid-file=/usr/local/mysql/data/spinus.pid root 942166 0,0 0,0 704 712 pts/3 A 13.52.09 0:00 /bin/sh ./bin/mysqld_safe --datadir=/usr/local/mysql/data --pid-file=/usr/local/mysql/data/spinus.pid Best regards, Björn
[25 Jan 2006 13:02]
Björn Wiberg
The SHOW STATUS information is in 060125_1.txt as part of the mysqladmin output. A separate run of SHOW STATUS on a newly started server shows: root@localhost (none)> SHOW STATUS; +-----------------------------------+----------+ | Variable_name | Value | +-----------------------------------+----------+ | Aborted_clients | 0 | | Aborted_connects | 17 | | Binlog_cache_disk_use | 0 | | Binlog_cache_use | 0 | | Bytes_received | 109 | | Bytes_sent | 156 | | Com_admin_commands | 0 | | Com_alter_db | 0 | | Com_alter_table | 0 | | Com_analyze | 0 | | Com_backup_table | 0 | | Com_begin | 0 | | Com_change_db | 0 | | Com_change_master | 0 | | Com_check | 0 | | Com_checksum | 0 | | Com_commit | 0 | | Com_create_db | 0 | | Com_create_function | 0 | | Com_create_index | 0 | | Com_create_table | 0 | | Com_dealloc_sql | 0 | | Com_delete | 0 | | Com_delete_multi | 0 | | Com_do | 0 | | Com_drop_db | 0 | | Com_drop_function | 0 | | Com_drop_index | 0 | | Com_drop_table | 0 | | Com_drop_user | 0 | | Com_execute_sql | 0 | | Com_flush | 0 | | Com_grant | 0 | | Com_ha_close | 0 | | Com_ha_open | 0 | | Com_ha_read | 0 | | Com_help | 0 | | Com_insert | 0 | | Com_insert_select | 0 | | Com_kill | 0 | | Com_load | 0 | | Com_load_master_data | 0 | | Com_load_master_table | 0 | | Com_lock_tables | 0 | | Com_optimize | 0 | | Com_preload_keys | 0 | | Com_prepare_sql | 0 | | Com_purge | 0 | | Com_purge_before_date | 0 | | Com_rename_table | 0 | | Com_repair | 0 | | Com_replace | 0 | | Com_replace_select | 0 | | Com_reset | 0 | | Com_restore_table | 0 | | Com_revoke | 0 | | Com_revoke_all | 0 | | Com_rollback | 0 | | Com_savepoint | 0 | | Com_select | 1 | | Com_set_option | 0 | | Com_show_binlog_events | 0 | | Com_show_binlogs | 0 | | Com_show_charsets | 0 | | Com_show_collations | 0 | | Com_show_column_types | 0 | | Innodb_log_waits | 0 | | Innodb_log_write_requests | 0 | | Innodb_log_writes | 0 | | Innodb_os_log_fsyncs | 0 | | Innodb_os_log_pending_fsyncs | 0 | | Innodb_os_log_pending_writes | 0 | | Innodb_os_log_written | 0 | | Innodb_page_size | 0 | | Innodb_pages_created | 0 | | Innodb_pages_read | 0 | | Innodb_pages_written | 0 | | Innodb_row_lock_current_waits | 0 | | Innodb_row_lock_time | 0 | | Innodb_row_lock_time_avg | 0 | | Innodb_row_lock_time_max | 0 | | Innodb_row_lock_waits | 0 | | Innodb_rows_deleted | 0 | | Innodb_rows_inserted | 0 | | Innodb_rows_read | 0 | | Innodb_rows_updated | 0 | | Key_blocks_not_flushed | 0 | | Key_blocks_unused | 7185 | | Key_blocks_used | 7274 | | Key_read_requests | 10318059 | | Key_reads | 42306 | | Key_write_requests | 3925471 | | Key_writes | 40053 | | Last_query_cost | 0.000000 | | Max_used_connections | 5 | | Not_flushed_delayed_rows | 0 | | Open_files | 112 | | Open_streams | 0 | | Open_tables | 56 | | Opened_tables | 0 | | Qcache_free_blocks | 0 | | Qcache_free_memory | 0 | | Qcache_hits | 0 | | Qcache_inserts | 0 | | Qcache_lowmem_prunes | 0 | | Qcache_not_cached | 0 | | Qcache_queries_in_cache | 0 | | Qcache_total_blocks | 0 | | Questions | 8247 | | Rpl_status | NULL | | Select_full_join | 0 | | Select_full_range_join | 0 | | Select_range | 0 | | Select_range_check | 0 | | Select_scan | 1 | | Slave_open_temp_tables | 0 | | Slave_retried_transactions | 0 | | Slave_running | OFF | | Slow_launch_threads | 0 | | Slow_queries | 0 | | Sort_merge_passes | 0 | | Sort_range | 0 | | Sort_rows | 0 | | Sort_scan | 0 | | Table_locks_immediate | 8469 | | Table_locks_waited | 0 | | Tc_log_max_pages_used | 0 | | Tc_log_page_size | 0 | | Tc_log_page_waits | 0 | | Threads_cached | 0 | | Threads_connected | 2 | | Threads_created | 257 | | Threads_running | 1 | | Uptime | 505 | +-----------------------------------+----------+ 222 rows in set (0.00 sec) Best regards, Björn
[25 Jan 2006 14:17]
Björn Wiberg
Hi again! The default value for the table cache is 64 tables, which explains that number. However, the error I encountered (at the end of 060125_1.txt) where an unitialized mutex lock attempt was made looks strange. Shouldn't mysqld purge parts of the table cache, release associated locks, and use the freed resources for acquiring the new lock, given that it has reached the table cache limit (64)? (I'll send more debug info + gdb info whenever I encounter another of those hangs.) Best regards, Björn
[26 Jan 2006 9:24]
Björn Wiberg
Another note to say that I actually did find occurrences of 65 open tables in the MySQL error log, so it is indeed possible to open more than 64 tables. Although it failed once... We'll see if the raised resource limits affect this. Best regards, Björn
[26 Jan 2006 10:12]
Björn Wiberg
Shouldn't sessions that time out be logged in the MySQL general query log as "Quit", like normal sessions where the user explicitly asks to terminate the connection? Wondering if that may have something to do with it (some cleaning/purging not being performed)... No hangs yet, I'll get back to you as soon as I have more information. Best regards, Björn
[26 Jan 2006 14:39]
Valeriy Kravchuk
I found the following interesting clame among plenty of information you had provided already: "I managed to hang the server, too, after issuing the query containing TO_DAYS() followed by just a couple of SELECT NOW()s." So, is it possible to hang the server with several simple queries from one session only? If yes, please, provide this ideal test case!
[26 Jan 2006 14:47]
Björn Wiberg
Hi Valeriy! Unfortunately, I'm not alone on the server so it may have been other concurrent queries causing the hang, rather than only my queries causing the hang. The query was: SELECT TO_DAYS(now()) - TO_DAYS(Slutdatum) FROM BokLantagare WHERE fLantagare = 0 AND Slutdatum < now() AND aktiv AND fBokLantagareStatus = 1 ...followed by several SELECT NOW(). All this on the BokLantagare table (BokLantagare.sql attached earlier). I have also tried deliberately to hang the server using this query on other occasions, but it didn't hang until hours later that time. So I'm afraid the results may vary. You may wish to try the BokLantagare.sql dump and issuing the select to_days()... query followed by repeated select now()s for a long time to see if this may happen on other platforms as well. Preferrably so that no wait or interactive timeout kicks you out (if set); like in our case where the timeout was set to 5 minutes and the client performed the select now() queries every 2-3 minutes to avoid getting kicked out. I do feel that long-winded sessions increase the chance of achieving the hang... Hoping that the next gdb session (the next hang) will give more input; it did specify a function name, but I do not recall the exact name (and my gdb output was lost during the terminal session last time). Best regards, Björn
[30 Jan 2006 12:48]
Björn Wiberg
No hang so far after raising the memory limits... I'll update this bug report if anything occurs. Best regards, Björn
[31 Jan 2006 7:23]
Valeriy Kravchuk
Please, send the current my.cnf settings and describe your hardware.
[1 Feb 2006 9:21]
Björn Wiberg
No hangs yet, but I noticed that mysqld had been restarted by mysqld_safe this Monday afternoon (MySQL is restarted weekly, at midnight between Sunday and Monday, so I noticed that the uptime was incorrect when using phpMyAdmin). Attaching the output from the logs. Best regards, Björn
[2 Feb 2006 8:15]
Björn Wiberg
Doesn't the thread_waitlock(NULL) in the truss ouput look strange? Hoping for some suggestions on how to debug this issue further... Many thanks in advance! Best regards, Björn
[3 Feb 2006 13:23]
Björn Wiberg
Just came across a segfault crash on another machine, running IBM AIX 5.3.0.0 ML2: ---8<--- 060130 0:00:08 [Note] /usr/local/mysql/bin/mysqld: ready for connections. Version: '5.0.18-standard-log' socket: '/tmp/mysql.sock' port: 3306 MySQL Community Edition - Standard (GPL) mysqld got signal 11; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=8388600 read_buffer_size=131072 max_used_connections=11 max_connections=100 threads_connected=3 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 225791 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. 060202 15:50:32 mysqld restarted 060202 15:50:34 [Note] /usr/local/mysql/bin/mysqld: ready for connections. Version: '5.0.18-standard-log' socket: '/tmp/mysql.sock' port: 3306 MySQL Community Edition - Standard (GPL) 060202 15:51:05 [ERROR] /usr/local/mysql/bin/mysqld: Table './ortnamndb/sth1' is marked as crashed and should be repaired 060202 15:51:05 [Warning] Checking table: './ortnamndb/sth1' 060202 15:51:06 [ERROR] /usr/local/mysql/bin/mysqld: Table './ortnamndb/socken1' is marked as crashed and should be repaired 060202 15:51:06 [Warning] Checking table: './ortnamndb/socken1' 060202 15:54:30 [ERROR] /usr/local/mysql/bin/mysqld: Table './ortnamndb/sdm1' is marked as crashed and should be repaired 060202 15:54:30 [Warning] Checking table: './ortnamndb/sdm1' 060202 16:19:15 [ERROR] /usr/local/mysql/bin/mysqld: Table './mastersdb/students' is marked as crashed and should be repaired 060202 16:19:15 [Warning] Checking table: './mastersdb/students' 060202 16:32:06 [ERROR] /usr/local/mysql/bin/mysqld: Table './bncData/settings' is marked as crashed and should be repaired 060202 16:32:06 [Warning] Checking table: './bncData/settings' --->8--- ---8<--- root@aramus:/# su - mysql $ ulimit -a time(seconds) unlimited file(blocks) unlimited data(kbytes) 524288 stack(kbytes) 131072 memory(kbytes) 524288 coredump(blocks) 2097151 nofiles(descriptors) 2000 --->8--- /etc/security/limits: ---8<--- * * For MySQL: Allow writing of large files, raise memory limits * * 512 MB data segment size * 512 MB real memory usage * 128 MB stack segment size * mysql: fsize = -1 data = 1048576 rss = 1048576 stack = 262144 --->8--- Don't know if this is related, but it sure seems like we're having problems with 5.0.18 on AIX, both 5.2 and 5.3... Best regards, Björn
[3 Feb 2006 13:26]
Björn Wiberg
Forgot to give an excerpt of the MySQL general query log prior to the crash on the 5.3 machine: ---8<--- 060202 15:50:31 17247 Connect pma@spinus.its.UU.SE on 17247 Query SELECT VERSION() 17247 Query SET NAMES utf8 17247 Query SET collation_connection = 'utf8_unicode_ci' 17248 Connect soresigv@spinus.its.UU.SE on 17248 Query SET NAMES utf8 17248 Query SET collation_connection = 'utf8_unicode_ci' 17248 Query SHOW SESSION VARIABLES LIKE 'collation_connection' 17248 Query SHOW SESSION VARIABLES LIKE 'character_set_connection' 17248 Query SHOW CHARACTER SET 17248 Query SHOW COLLATION 17248 Query SELECT COUNT(*) FROM mysql.user 17248 Query SHOW GRANTS 17248 Query USE `sofidb` 17248 Query SHOW VARIABLES LIKE 'language' 17248 Query USE `ortnamndb` 17248 Init DB ortnamndb 17248 Query REPLACE DELAYED INTO socken1 VALUES ("a16201b","Lillhärdal","Svegs och Hede tg","JTL") /usr/local/mysql/bin/mysqld, Version: 5.0.18-standard-log. started with: Tcp port: 0 Unix socket: (null) Time Id Command Argument 060202 15:51:05 2 Connect soresigv@localhost on --->8--- Best regards, Björn
[3 Feb 2006 13:54]
Valeriy Kravchuk
I see this in your query log: "REPLACE DELAYED INTO socken1 ..." and a crash after that. That is why I have to ask you a question: have you created that socken1 table in 5.0.x or it was created in pre-5.0.x MySQL server and just used since than? Very familiar query, that will always fail after that upgrade...
[3 Feb 2006 14:25]
Björn Wiberg
We're using our 4.1 tables (which were 4.0 tables...) with 5.0. I've dumped and re-read some of the databases or tables that have been involved in crashes / have been marked as corrupted, but not done so for all database and/or tables. One interesting thing to note is that the corrupt tables, reported in the MySQL 5 on AIX 5.3 error log, were not found as corrupt by our daily check script, which uses the following for all MyISAM tables: Another interesting thing to notice is that I've encountered, a while back, that a (4.1/4.0) table, which was marked as crashed and repaired with myisamchk of 5.0, was later found corrupt again (just a server start/stop and a check again) by myisamchk. Several runs of 5.0's myisamchk didn't manage to repair the table, despite all -r arguments and such. A repair using 4.1 myisamchk, switching back to 4.1, running 4.1 myisamchk checks on it, indicated that all was well. Switched to 5.0 server again, ran myisamcheck checks on it, all OK, then started 5.0 server for a little while, stopped it, checked the table again, and it was found corrupt. Would you advise me to dump and re-read all databases to eliminate this possiblity? (I know this is recommended in the docs.) That's about the only thing I haven't tried, but would probably need to do so for all databases, not just a couple of them... As for the query that always fails -- by fail, do you mean crashes the server, or only that the query fails? Best regards, Björn
[3 Feb 2006 14:42]
Valeriy Kravchuk
Yes, I recommend to dump all the tables in 4.1 and restore them in 5.0.18. Look at bug #17028 and search our bugs database for INSERT or REPLACE DELAYED in 5.0.x... You'll find many interesting things. Anyway, please, if you can, try to recreate all your databases|tables in 5.0.18 from dump, try the same tests as before and inform about the results. There are many different bugs are mentioned in this report already, and I want to figure out is there something new and unknown in it, or just the results of upgrade...
[3 Feb 2006 14:57]
Björn Wiberg
Hi Valeriy! Thanks for your reply! Would it be OK to dump the databases one by one using 5.0.18 mysqldump, and reload them with 5.0.18 mysql client to the 5.0.18 mysqld server? Best regards, Björn
[3 Feb 2006 15:01]
Valeriy Kravchuk
I think, you'd better use mysqldump from 4.1.x connecting to 4.1.x. Yes, you can dump them one by one, as soon as there are no changes and inter-database relations (you need consisten data, aren't you?). You may start with that database from the latest crash.
[3 Feb 2006 15:08]
Björn Wiberg
I guess the inconsistency problem will be present even if we would use 4.1; we need to prevent access from the network to stop users from updating the tables when performing the dump. I'll perform the dumps early in the morning when usage is as low as possible to minimize the problems. I'll do it with 5.0.18, though. Theoretically, if I get a dump from the 5.0.18 server using the 5.0.18 mysqldump, assume that that one is correct, and recreate the database using that dump, there won't be anything left of the old files, right? And the new tables should be "correct" from 5.0.18:s point of view, right? (Just wanted to check, so that I'm not doing the dumps to no avail :-)). Best regards, Björn
[4 Feb 2006 10:37]
Björn Wiberg
Dumped all databases, one by one, using 5.0.18 mysqldump: mysqldump -u root --password=SECRET --allow-keywords --add-drop-database --create-options --databases --max-allowed-packet=256M --set-variable=max_allowed_packet=256M $1 > a.txt ...and restored them, one by one, using 5.0.18 mysql client: mysql -u root --password=SECRET --max-allowed-packet=256M $1 < ./a.txt Of course, set max_allowed_packet_size=256M in /etc/my.cnf. Just so that no table inserts would be too big durig restore. (Notice the --drop-database argument to mysqldump, meaning that the database directories also got re-created, and contains db.opt files.) Stopped server, ran a check/repair: myisamchk --force --update-state -O key_buffer=64M -O sort_buffer=64M -O read_buffer=1M -O write_buffer=1M $filename ...on each table. Then checked (just to be sure) with: /myisamchk --silent "$filename" ...on each table, all OK, no errors reported. Then restarted server, and ran: mysqlcheck --all-databases --auto-repair --extended --force --repair -u root -p ...just to be sure. Later that night, about an hour later, I got 4 crashes in close succession. See attached file (060204.txt). It seems that it disliked the phpMyAdmin question of listing all databases, when working against the information_schema virtual database: ---8<--- SELECT `information_schema`.`SCHEMATA`.*, COUNT(`information_schema`.`TABLES`.`TABLE_SCHEMA`) AS `SCHEMA_TABLES`, SUM(`information_schema`.`TABLES`.`TABLE_ROWS`) AS `SCHEMA_TABLE_ROWS`, SUM(`information_schema`.`TABLES`.`DATA_LENGTH`) AS `SCHEMA_DATA_LENGTH`, SUM(`information_schema`.`TABLES`.`MAX_DATA_LENGTH`) AS `SCHEMA_MAX_DATA_LENGTH`, SUM(`information_schema`.`TABLES`.`INDEX_LENGTH`) AS `SCHEMA_INDEX_LENGTH`, SUM(`information_schema`.`TABLES`.`DATA_LENGTH` + `information_schema`.`TABLES`.`INDEX_LENGTH`) AS `SCHEMA_LENGTH`, SUM(`information_schema`.`TABLES`.`DATA_FREE`) AS `SCHEMA_DATA_FREE` FROM `information_schema`.`SCHEMATA` LEFT JOIN `information_schema`.`TABLES` ON `information_schema`.`TABLES`.`TABLE_SCHEMA` = `information_schema`.`SCHEMATA`.`SCHEMA_NAME` GROUP BY `information_schema`.`SCHEMATA`.`SCHEMA_NAME` --->8--- No tables has been reported as "crashed and should be repaired" in the MySQL error log, which I am glad for, but the number of crashes close to each other in time suggest that something isn't quite right with the information_schema querying... Best regards, Björn
[4 Feb 2006 10:47]
Björn Wiberg
Also, worth noticing, our current /etc/my.cnf reads: ---8<--- # MySQL server options [mysqld] character-set-server=latin1 collation-server=latin1_swedish_ci default-time-zone="Europe/Stockholm" ft_min_word_len=3 innodb_file_per_table interactive_timeout=60 log=/usr/local/mysql/logs/spinus.log log-error=/usr/local/mysql/logs/spinus.err max_allowed_packet=16M max_connect_errors=999999999 max_user_connections=20 myisam-recover=BACKUP,FORCE skip-innodb sql-mode="NO_ENGINE_SUBSTITUTION" wait_timeout=60 # MySQL client options [mysql] pager=/usr/bin/less -S prompt=\\u@\\h\\_\\d>\\_ # myisamchk options [myisamchk] ft_min_word_len=3 --->8--- Changes from the previous version is the addition of max_allowed_packet to allow 16 MB packets instead of default of 1 MB, and lowered wait and inactive timeouts from 300 to 60 seconds. Best regards, Björn
[4 Feb 2006 11:09]
Björn Wiberg
Another note to inform you that we previously ran: cd /mysql/usr/local/mysql-debug-5.0.18-aix5.2-powerpc/bin /opt/freeware/bin/cp -d -p -f mysqld mysqld.ORIGINAL ldedit -b maxdata:0x80000000 ./mysqld ...to allow mysqld to use more than 128 MB of memory under AIX (old maxDATA value = 0x00000000). Didn't make any difference with the previous hangs, I think (mysqld seems to consume very little memory), but still worth noticing. Best regards, Björn
[5 Feb 2006 8:19]
Björn Wiberg
Hi again! OK, got a couple of crashes (again the "Trying to lock unitialized mutex at sql_test.cc, line 377" message and signal 6), followed by some table being detected as "crashed and should be repaired") yesterday evening: ---8<--- 060204 19:08:39 [ERROR] /usr/local/mysql/bin/mysqld: Table './genpatmambodb/mos_session' is marked as crashed and should be repaired 060204 19:08:39 [Warning] Checking table: './genpatmambodb/mos_session' 060204 19:08:39 [ERROR] /usr/local/mysql/bin/mysqld: Table './genpatmambodb/mos_stats_agents' is marked as crashed and should be repaired 060204 19:08:39 [Warning] Checking table: './genpatmambodb/mos_stats_agents' 060204 19:08:39 [ERROR] /usr/local/mysql/bin/mysqld: Table './genpatmambodb/mos_events' is marked as crashed and should be repaired 060204 19:08:39 [Warning] Checking table: './genpatmambodb/mos_events' 060204 19:08:39 [ERROR] /usr/local/mysql/bin/mysqld: Table './genpatmambodb/mos_content' is marked as crashed and should be repaired 060204 19:08:39 [Warning] Checking table: './genpatmambodb/mos_content' --->8--- ---8<--- 060204 19:08:45 [ERROR] /usr/local/mysql/bin/mysqld: Table './smalanddb/smaland_raknare' is marked as crashed and should be repaired 060204 19:08:45 [Warning] Checking table: './smalanddb/smaland_raknare' 060204 19:08:46 [ERROR] /usr/local/mysql/bin/mysqld: Table './smalanddb/smaland_events' is marked as crashed and should be repaired 060204 19:08:46 [Warning] Checking table: './smalanddb/smaland_events' --->8--- ---8<--- 060204 19:09:49 [ERROR] /usr/local/mysql/bin/mysqld: Table './engelskaparkendb/menylankar' is marked as crashed and should be repaired 060204 19:09:49 [Warning] Checking table: './engelskaparkendb/menylankar' --->8--- ---8<--- 060204 19:20:41 [ERROR] /usr/local/mysql/bin/mysqld: Table './ibkdb/Visitors' is marked as crashed and should be repaired 060204 19:20:41 [Warning] Checking table: './ibkdb/Visitors' --->8--- (I guess the tables were successfully repaired as we have that option in our my.cnf, and our nightly myisamchk --silent didn't find anything wrong.) So it seems that the assert error on uninitialized mutexes did not depend on the version of the MyISAM tables. Checking the general query log, the queries this time were not the information_schema ones that I reported about yesterday morning. See attached file (060204_2.txt). No crashes on the 5.3 machine, but that one has much less traffic. I'm adding the following to my.cnf on both servers now to see if it helps: delay-key-write=OFF flush flush_time=10 safe-mode skip-concurrent-insert skip-thread-priority thread_cache_size=0 Best regards, Björn
[6 Feb 2006 9:25]
Björn Wiberg
Hello Valeriy! Another crash this morning (mysql_060206.txt); please see the attached file. Lots of date functions involved. Note that we are using thread_cache_size=0 so reused threads should not be the cause. Also, flush_time=10 hopefully helps to limit the number of tables involved in the crashes. Does flush_time close open tables that are currently in use by some thread? Or does it close only open but unused tables? What would flush_time=1 or even flush_time=0 mean? Had to turn the "flush" directive off, though, as MySQL started to consume an awful lot of CPU (99-100%), don't know how it handles wait during flushes... Any suggestions as to how to proceed? I'm thinking of running MySQL outside the chroot, or to compile from source to see if that helps (like we used to do before, in the 4.0.x days), but perhaps you have some other things in mind that I should try first. Thanks in advance! Best regards, Björn
[6 Feb 2006 10:01]
Valeriy Kravchuk
Sorry, but this bug report now contains information about several different bugs and crashes, and tend to become a kind of free-form discussion on bugs, workarounds and general MySQL usage practicies. While it is appropriate for issues created by MySQL customers, or for discussions on our forums, our bugs database is not a proper place to do it. So, please, clarify the status of your initial report, about multiple SELECT NOW(), and try to create a small and repeatable test case for it and other crashes you already mentioned. Then we'll have to open a separate bug report for each particular crash case. I can summarize the information you already sent and propose synopsis for each of that potential bug reports, if you want. This report becomes simply hard to understand and maintain now.
[6 Feb 2006 14:37]
Björn Wiberg
Hello Valeriy! OK, I will submit a new bug report regarding the phpMyAdmin causing mysqld to crash; that one seems rather repeatable on our system, and I have also managed to get a coredump from it. I will see if I encounter any more hangs related to SELECT NOW(), but I doubt it -- the signs are probably to vague to go on. Thank you for your help, and see you in another bug report! :-) Best regards, Björn
[8 Feb 2006 8:29]
Valeriy Kravchuk
Reopen this report if you'll get that hang with SELECT NOW() (or SELECT something as a way to keep alive) again. I saw your new report, so let's continue to discuss the other possible bugs there.
[9 Mar 2006 0:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".