Bug #16572 Server hangs randomly upon multiple SELECT NOW()
Submitted: 17 Jan 2006 17:34 Modified: 8 Mar 2006 8:29
Reporter: Björn Wiberg Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.0.18 OS:IBM AIX 5.2.0.0 ML5
Assigned to: CPU Architecture:Any

[17 Jan 2006 17:34] Björn Wiberg
Description:
A database user utilizes our database server through MySQL Connector/J.

The application idles but repeatedly executes SELECT NOW() to avoid being thrown out by our interactive and non-interactive timeout of 300 seconds.

However, sometimes the server freezes (listens on port 3306 but one does not get connected when trying, socket file present but no response when using the MySQL client utility to connect through the socket), and a 'kill -9' on the mysqld process is necessary in order to force it down.

How to repeat:
As this happens sometimes, but not everytime, several factors probably play a role. Therefore, an exact guide on how to reproduce this is not available right now. However, my findings so far are available below.

The general query log shows:

                 155247 Query       SELECT TO_DAYS(now()) - TO_DAYS(Slutdatum) FROM BokLantagare WHERE fLantagare = 2433 AND Slutdatum < now() AND aktiv AND fBokLantagareStatus = 1
                 155247 Query       SELECT NOW()
                 155247 Query       SELECT NOW()
                 155247 Query       SELECT NOW()
060113 10:25:02  155247 Query       SELECT NOW()
                 155247 Query       SELECT NOW()
060113 10:31:43  155247 Query       SELECT NOW()
060113 10:35:02  155247 Query       SELECT NOW()
                 155247 Query       SELECT NOW()
                 155247 Query       SELECT NOW()
060113 10:45:02  155247 Query       SELECT NOW()
060113 10:48:22  155247 Query       SELECT NOW()
                 155247 Query       SELECT NOW()
060113 10:55:02  155247 Query       SELECT NOW()

On another occasion, the query log showed:

                  42276 Query       SELECT TO_DAYS(now()) - TO_DAYS(Slutdatum) FROM BokLantagare WHERE fLantagare = 0 AND Slutdatum < now() AND aktiv AND fBokLantagareStatus = 1
060117 12:58:18   42276 Query       SELECT NOW()
060117 13:01:38   42276 Query       SELECT NOW()
060117 13:04:58   42276 Query       SELECT NOW()
060117 13:08:18   42276 Query       SELECT NOW()
                  42276 Query       SELECT NOW()
                  42276 Query       SELECT NOW()
                  42276 Query       SELECT NOW()
060117 13:21:38   42276 Query       SELECT NOW()
060117 13:24:58   42276 Query       SELECT NOW()
                  42276 Query       SELECT NOW()
                  42276 Query       SELECT NOW()
                  42276 Query       SELECT NOW()
                  42276 Query       SELECT NOW()
060117 13:41:38   42276 Query       SELECT NOW()
                  42276 Query       SELECT NOW()
060117 13:48:18   42276 Query       SELECT NOW()
060117 13:51:38   42276 Query       SELECT NOW()
                  42276 Query       SELECT NOW()
060117 13:58:18   42276 Query       SELECT NOW()
060117 14:01:38   42276 Query       SELECT NOW()
                  42276 Query       SELECT NOW()
                  42276 Query       SELECT NOW()
060117 14:11:38   42276 Query       SELECT NOW()
                  42276 Query       SELECT NOW()
060117 14:18:18   42276 Query       SELECT NOW()
060117 14:21:38   42276 Query       SELECT NOW()
060117 14:24:58   42276 Query       SELECT NOW()

On a third occasion, we captured some process information using truss and other utilities:

spinus$ lsof -i | grep -i mysql
lsof: WARNING: compiled for AIX version 5.1.0.0; this is 5.2.0.0.
mysqld    6737926   mysql    4u  IPv4 0xf10000f005273358       0t0    TCP *:3306 (LISTEN)
mysqld    6737926   mysql    6u  IPv4 0xf10000f0050dc358    0t1099    TCP loopback:3306->loopback:63800 (ESTABLISHED)
mysqld    6737926   mysql    9u  IPv4 0xf10000f005594b58  0t196814    TCP www-hotel.uu.se:3306->nl105-252-42.student.uu.se:2062 (CLOSE_WAIT)
mysqld    6737926   mysql   10u  IPv4 0xf10000f0032ea358       0t0    TCP www-hotel.uu.se:3306->spinus.its.uu.se:63801 (CLOSE_WAIT)

spinus$ truss -p 6737926
thread_waitlock(0x00000000)     (sleeping...)

spinus$ gdb /usr/local/mysql/bin/mysqld 6737926
GNU gdb 5.0-aix51-020209
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "powerpc-ibm-aix5.2.0.0"...(no debugging symbols found)...
/home/pont/6737926: No such file or directory.
Attaching to program: /usr/local/mysql/bin/mysqld, process 6737926
thread_waitlock(0x00000000)                     = 1
truss: 0915-015 Cannot create subject process.
(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...
(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...
(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...
(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...
(no debugging symbols found)...0xd01445d8 in _global_lock_common ()
(gdb) bt
#0  0xd01445d8 in _global_lock_common ()
#1  0xffffffff in ?? () from (unknown load module)
(gdb) quit
A debugging session is active.
Do you still want to close the debugger?(y or n) y
Detaching from program: /usr/local/mysql/bin/mysqld, process 6737926

System load was not very high when this happened. It has happened both during I/O-intensive periods and non-I/O-intensive periods.

A ps listing shows, for example:

mysql    1265814  3,7  0,0 24332  492      - A      jan 16 174:42 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data --user=mysql --pid-file=/usr/local/mysql/data/spinus.pid 
root     3911720  0,0  0,0  704   24      - A      jan 16  0:00 /bin/sh ./bin/mysqld_safe --datadir=/usr/local/mysql/data --pid-file=/usr/local/mysql/data/spinus.pid 

Any ideas what could be wrong?

For now, we have advised the user to use something else than SELECT NOW(), in case this is what causes the hangs.

A "single" (or even multiple) SELECT NOW() queries seem to work fine, so perhaps this is something that accumulates over time?

Also, I'm a bit curious as to why the interactive and non-interactive timeouts haven't thrown the database user out, as it is set to 300 seconds but the general query log indicates that on several occasions, the amount of time between the SELECT NOW() calls is clearly longer than 300 seconds?

Furthermore, it seems that when this happens, chances are that log segments indicating the actions are lost. For instance, I executed the SELECT TO_DAYS(now())... query followed by a couple of SELECT NOW() queries, and the server hanged. No traces of this in the logfile whatsoever...

I just noticed that our AIX system doesn't have zoneinfo file, and the time zone tables are empty. Perhaps this affects the TO_DAYS() calls. I'll try installing the time zone tables from the ZIP file to see if this changes things.

Best regards,
Björn
[18 Jan 2006 9:35] Valeriy Kravchuk
Thank you for a problem report. Please, send your my.cnf file content. How many concurrent SELECT NOW() statements are usually needed for server to hang? Is it responding to commands in existing sessions during this "hang"?
[18 Jan 2006 9:51] Björn Wiberg
Hi Valeriy!

Configuration when the error last occurred (yesterday):

---8<---
# MySQL server options
[mysqld]
character-set-server=latin1
collation-server=latin1_swedish_ci
ft_min_word_len=3
innodb_file_per_table
interactive_timeout=300
log=/usr/local/mysql/logs/spinus.log
log-error=/usr/local/mysql/logs/spinus.err
max_connect_errors=999999999
max_user_connections=20
myisam-recover=BACKUP,FORCE
skip-innodb
sql-mode="NO_ENGINE_SUBSTITUTION"
wait_timeout=300

# MySQL client options
[mysql]
pager=/usr/bin/less -S
prompt=\\u@\\h\\_\\d>\\_

# myisamchk options
[myisamchk]
ft_min_word_len=3
--->8---

Current configuration (added default timezone + downloaded and unpacked timezone tables as AIX doesn't come with /usr/share/zoneinfo or similar):

---8<---
# MySQL server options
[mysqld]
character-set-server=latin1
collation-server=latin1_swedish_ci
default-time-zone="Europe/Stockholm"
ft_min_word_len=3
innodb_file_per_table
interactive_timeout=300
log=/usr/local/mysql/logs/spinus.log
log-error=/usr/local/mysql/logs/spinus.err
max_connect_errors=999999999
max_user_connections=20
myisam-recover=BACKUP,FORCE
skip-innodb
sql-mode="NO_ENGINE_SUBSTITUTION"
wait_timeout=300

# MySQL client options
[mysql]
pager=/usr/bin/less -S
prompt=\\u@\\h\\_\\d>\\_

# myisamchk options
[myisamchk]
ft_min_word_len=3
--->8---

The number of SELECT NOW() required varies, as seen in the examples (13 or 27 such queries at the end). The queries need not be concurrent, however -- in our case, one user, connected but mostly idling, issuing those queries hung the server.

It is strange, however, that the server didn't close the connection between the SELECT NOW() queries -- despite our timeout values of 300 seconds -- as some of the queries were more than 300 seconds apart (see the earlier logs).

I managed to hang the server, too, after issuing the query containing TO_DAYS() followed by just a couple of SELECT NOW()s.

When MySQL hangs due to this, netstat shows that it is still listening for connections, however, when you try to connect to it, nothing happens (no connection gets established). Also, no communication through the MySQL socket is possible, one gets no connection and no response. MySQL does not care about SIGTERM, only SIGKILL stops the server...

Perhaps the timezone changes will eliminate the hangs -- I guess only time can tell. I will try to see if I can reproduce the error again by issuing the last queries again.

Best regards,
Björn
[18 Jan 2006 10:09] Björn Wiberg
Added the BokLantagare table as an attachment, which the TO_DAYS() query is run against, in case this helps.

I suspect that the TO_DAYS() calculations may affect the subsequent hangs when performing SELECT NOW()s.

Best regards,
Björn
[20 Jan 2006 20:46] Björn Wiberg
Hi!

The MySQL server hung again today, about one and a half hour after me issuing the previously mentioned SELECT with TO_DAYS() followed by SELECT NOW() several times.

This time, the end of the query log looks like this:

                  59157 Connect     n08web@spinus.its.UU.SE on
                  59157 Init DB     smalanddb
                  59157 Query       SELECT datum FROM smaland_events WHERE
                  datum LIKE '2006-01-%'
                  59157 Query       SELECT datum, slutdatum, dag FROM
                  smaland_events se, smaland_events_upprepa seu WHERE se.i
d=seu.event_id AND se.upprepa=1 AND seu.slutdatum > '2006-01-00'
                  59157 Quit
060120 18:33:01   59158 Connect     geronauw@localhost on
                  59158 Init DB     ittelefonidb
                  59158 Query       SELECT status FROM inter_status WHERE
                  enhet='51'
                  59158 Init DB     ittelefonidb
                  59158 Query       SELECT id,namn FROM inter_opus WHERE
                  id='51'
                  59158 Init DB     ittelefonidb
                  59158 Query       SELECT p.id AS
                  xxx,p.ansvarig,p.rubrik,p.updated,DATE_FORMAT(p.modified,'%Y-%m')
                  AS ymod,C
                  59158 Init DB     ittelefonidb
                  59158 Query       SELECT status FROM inter_status WHERE
                  enhet='51'
                  59158 Init DB     ittelefonidb
                  59158 Query       SELECT id,namn FROM inter_opus WHERE
                  id='51'
                  59158 Init DB     ittelefonidb
                  59158 Query       SELECT p.id AS
                  xxx,p.ansvarig,p.rubrik,p.updated,DATE_FORMAT(p.modified,'%Y-%m')
                  AS ymod,C
OUNT(*) AS antal FROM inter_proj AS p
LEFT JOIN inter_coop AS c ON p.id=c.projid
WHERE p.instkod='51' AND p.status!='r'
GROUP BY p.id
ORDER BY p.ansvarig,p.rubrik
                  59158 Init DB     ittelefonidb
                  59158 Query       SELECT id FROM inter_aktivitet WHERE
                  user='hidden.hidden@pcr.uu.se' AND enhet='51'
                  59158 Query       UPDATE inter_aktivitet SET hits=hits+1
                  WHERE user='hidden.hidden@pcr.uu.se' AND enhet='51'
                  59158 Quit
060120 18:33:04   59159 Connect     geronauw@localhost on
                  59159 Init DB     ittelefonidb
                  59159 Query       SELECT status FROM inter_status WHERE
                  enhet='51'
                  59159 Init DB     ittelefonidb
                  59159 Query       SELECT id,namn FROM inter_opus WHERE
                  id='51'
                  59159 Init DB     ittelefonidb
                  59159 Query       SELECT y.name AS region,COUNT(*) AS
                  antal
FROM inter_proj AS p
LEFT JOIN inter_coop AS c ON  c.projid=p.id
LEFT JOIN inter_opus AS d ON p.instkod=d.id
LEFT JOIN iso3166 AS i ON c.landskod=i.id /* Borde infogas i någon annan
                  tabell */
LEFT JOIN un_regions AS u ON u.id=i.iso
LEFT JOIN un_regions AS x ON u.parent=x.id
LEFT JOIN un_regions AS y ON x.parent=y.id
WHERE p.status='p' AND land IS NOT NULL AND p.instkod='51'
GROUP BY y.name
ORDER BY antal DESC
                  59159 Init DB     ittelefonidb
                  59159 Query       SELECT
                  p.id,c.land,c.stad,c.univ,p.namn,DATE_FORMAT(p.modified,'%Y-%m-%d')
                  AS modifierad
FROM inter_coop AS c
LEFT JOIN inter_proj AS p ON c.projid=p.id
WHERE p.instkod='51' AND p.status!='r'
ORDER BY c.projid
                  59159 Quit
060120 18:33:09   59161 Connect     mikaoste@spinus.its.UU.SE on
                  59161 Init DB     teknatdb
                  59161 Query       SELECT * FROM aktuellt where
                  typ='internt' ORDER BY id DESC

After this, the server stopped logging and I had to kill -9 it. Notice the recent call to DATE_FORMAT(). It seems strange to me that one of the date functions always seem to be there...

Also notice the strange ordering of the queries in the log file.

The same lock waiting message was shown by 'truss' this time.

So, I guess the timezone tables (and setting the timezone correctly, instead of the default of SYSTEM) did not help.

Best regards,
Björn
[21 Jan 2006 19:07] Björn Wiberg
Hi!

Another hang today, similar to the others, after calling a date function:

---8<---
AND ((publish_up >= '2006-01-27 00:00:00' AND publish_up <= '2006-01-27 23:59:59')
        OR (publish_down >= '2006-01-27 00:00:00' AND publish_down <= '2006-01-27 23:59:59')
        OR (publish_up <= '2006-01-27 00:00:00' AND publish_down >= '2006-01-27 23:59:59')) AND state='1'
ORDER BY publish_up ASC
                  18793 Query       SELECT mos_events.* FROM mos_events, mos_categories as b
WHERE mos_events.catid = b.id AND b.access <= 0 AND mos_events.access <= 0
AND ((publish_up >= '2006-01-28 00:00:00' AND publish_up <= '2006-01-28 23:59:59')
        OR (publish_down >= '2006-01-28 00:00:00' AND publish_down <= '2006-01-28 23:59:59')
        OR (publish_up <= '2006-01-28 00:00:00' AND publish_down >= '2006-01-28 23:59:59')) AND state='1'
ORDER BY publish_up ASC
                  18793 Query       SELECT mos_events.* FROM mos_events, mos_categories as b
WHERE mos_events.catid = b.id AND b.access <= 0 AND mos_events.access <= 0
AND ((publish_up >= '2006-01-29 00:00:00' AND publish_up <= '2006-01-29 23:59:59')
        OR (publish_down >= '2006-01-29 00:00:00' AND publish_down <= '2006-01-29 23:59:59')
        OR (publish_up <= '2006-01-29 00:00:00' AND publish_down >= '2006-01-29 23:59:59')) AND state='1'
ORDER BY publish_up ASC
                  18793 Query       SELECT mos_events.* FROM mos_events, mos_categories as b
WHERE mos_events.catid = b.id AND b.access <= 0 AND mos_events.access <= 0
AND ((publish_up >= '2006-01-30 00:00:00' AND publish_up <= '2006-01-30 23:59:59')
        OR (publish_down >= '2006-01-30 00:00:00' AND publish_down <= '2006-01-30 23:59:59')
        OR (publish_up <= '2006-01-30 00:00:00' AND publish_down >= '2006-01-30 23:59:59')) AND state='1'
ORDER BY publish_up ASC
                  18793 Query       SELECT mos_events.* FROM mos_events, mos_categories as b
WHERE mos_events.catid = b.id AND b.access <= 0 AND mos_events.access <= 0
AND ((publish_up >= '2006-01-31 00:00:00' AND publish_up <= '2006-01-31 23:59:59')
        OR (publish_down >= '2006-01-31 00:00:00' AND publish_down <= '2006-01-31 23:59:59')
        OR (publish_up <= '2006-01-31 00:00:00' AND publish_down >= '2006-01-31 23:59:59')) AND state='1'
ORDER BY publish_up ASC
                  18793 Query       SELECT m.* FROM mos_menu AS m
WHERE menutype='othermenu' AND published='1'
ORDER BY parent,ordering
                  18793 Query       SELECT mos_events.* FROM mos_events, mos_categories as b
WHERE mos_events.catid = b.id AND b.access <= 0 AND mos_events.access <= 0 AND (mos_events.state='1' AND mos_events.checked_out='0')
        AND ((publish_up <= '2006-01-21 00:00:00%' AND publish_down >= '2006-01-21 00:00:00%')
        OR (publish_up <= '2006-01-21 23:59:59%' AND publish_down >= '2006-01-21 23:59:59%')
   OR (publish_up <= '2006-01-21 23:59:59%' AND publish_up >= '2006-01-21 00:00:00%')
   OR (publish_down <= '2006-01-21 23:59:59%' AND publish_down >= '2006-01-21 00:00:00%'))
ORDER BY publish_up ASC
                  18793 Query       SELECT mos_events.* FROM mos_events, mos_categories as b
WHERE mos_events.catid = b.id AND b.access <= 0 AND mos_events.access <= 0 AND (mos_events.state='1' AND mos_events.checked_out='0')
        AND ((publish_up <= '2006-01-21 00:00:00%' AND publish_down >= '2006-01-21 00:00:00%')
        OR (publish_up <= '2006-01-20 23:59:59%' AND publish_down >= '2006-01-20 23:59:59%')
   OR (publish_up <= '2006-01-20 23:59:59%' AND publish_up >= '2006-01-21 00:00:00%')
   OR (publish_down <= '2006-01-20 23:59:59%' AND publish_down >= '2006-01-21 00:00:00%'))
  ORDER BY publish_up DESC
                  18794 Connect     mikaoste@spinus.its.UU.SE on
                  18794 Init DB     teknatdb
                  18794 Query       SELECT * FROM aktuellt where typ='internt' ORDER BY id DESC
                  18794 Quit
                  18793 Quit
060121 16:12:11   18796 Connect     mikaoste@spinus.its.UU.SE on
                  18796 Init DB     teknatdb
                  18796 Query       SELECT * FROM aktuellt where typ='utbildning' ORDER BY id DESC
                  18796 Query       SELECT * FROM exjobb where aktiv=1 and utbprogram like '%%' ORDER BY id DESC
                  18796 Quit
                  18797 Connect     geronauw@localhost on
                  18797 Init DB     ittelefonidb
                  18797 Query       SELECT status FROM inter_status WHERE enhet='51'
                  18797 Init DB     ittelefonidb
                  18797 Query       SELECT id,namn FROM inter_opus WHERE id='51'
                  18797 Init DB     ittelefonidb
                  18797 Query       SELECT p.id AS xxx,p.ansvarig,p.rubrik,p.updated,DATE_FORMAT(p.modified,'%Y-%m') AS ymod,COUNT(*) AS antal FROM inter_pro
j AS p
LEFT JOIN inter_coop AS c ON p.id=c.projid
WHERE p.instkod='51' AND p.status!='r'
GROUP BY p.id
ORDER BY p.ansvarig,p.rubrik
                  18797 Init DB     ittelefonidb
                  18797 Query       SELECT id FROM inter_aktivitet WHERE user='eriknore' AND enhet='51'
                  18797 Query       INSERT INTO inter_aktivitet (user,enhet,hits) VALUES ('eriknore','51',1)
                  18797 Quit
060121 16:12:13   18798 Connect     sympoll@localhost on
                  18798 Init DB     sympolldb
                  18798 Query       SELECT * FROM sympoll_list,sympoll_data WHERE (sympoll_list.pid='1' AND sympoll_data.pid='1') ORDER BY cid
                  18798 Quit
060121 16:12:15   18799 Connect     geronauw@localhost on
                  18799 Init DB     ittelefonidb
                  18799 Query       SELECT status FROM inter_status WHERE enhet='51'
                  18799 Init DB     ittelefonidb
                  18799 Query       SELECT id,namn FROM inter_opus WHERE id='51'
                  18799 Init DB     ittelefonidb
                  18799 Query       SELECT y.name AS region,COUNT(*) AS antal
FROM inter_proj AS p
LEFT JOIN inter_coop AS c ON  c.projid=p.id
LEFT JOIN inter_opus AS d ON p.instkod=d.id
LEFT JOIN iso3166 AS i ON c.landskod=i.id /* Borde infogas i någon annan tabell */
LEFT JOIN un_regions AS u ON u.id=i.iso
LEFT JOIN un_regions AS x ON u.parent=x.id
LEFT JOIN un_regions AS y ON x.parent=y.id
WHERE p.status='p' AND land IS NOT NULL AND p.instkod='51'
GROUP BY y.name
ORDER BY antal DESC
                  18799 Init DB     ittelefonidb
                  18799 Query       SELECT p.id,c.land,c.stad,c.univ,p.namn,DATE_FORMAT(p.modified,'%Y-%m-%d') AS modifierad
FROM inter_coop AS c
LEFT JOIN inter_proj AS p ON c.projid=p.id
WHERE p.instkod='51' AND p.status!='r'
ORDER BY c.projid
                  18799 Quit
060121 16:13:19   18291 Query       SELECT true
--->8---

Notice the DATE_FORMAT() call at the end.

Also notice the "SELECT true", which was the replacement for the earlier "SELECT NOW()" that one of our clients issued to keep a connection (which would otherwise timeout) from timing out.

Attaching the complete listings for the two sessions; 18291 and 18799.

Sorry for the long listings, but maybe this helps spotting a pattern of some kind related to the date functions.

Best regards,
Björn
[21 Jan 2006 19:21] Björn Wiberg
Another notice just to inform that AIX *does* come with timezone information in /usr/share/lib/zoneinfo/, however, this does not include timezone information specific for Sweden (i.e., "Europe/Stockholm"), so we're still using the timezone tables available for download from the MySQL site.

Best regards,
Björn
[21 Jan 2006 19:39] Björn Wiberg
Uploaded a corrected version of the general query log excerpts (now with 18291 and 18799, not 18291 and 18797).

Best regards,
Björn
[23 Jan 2006 13:04] Björn Wiberg
Hi again!
I'm running a periodic mysqladmin command (with "version status processlist extended-status debug") every 10 seconds now to get more debug logs.

The database user has adjusted his web application so that it will reconnect if thrown out. It does not perform no-op queries (SELECT true or SELECT NOW()) any longer.

Right now we're using 300 seconds timeouts, so we'll see if the absence of no-op queries helps or not.

The next step (if it still hangs) will probably be to lower our timeouts below 200 seconds so that the client will be thrown out, to see if "fresh" sessions will cure things.

Best regards,
Björn
[25 Jan 2006 8:13] Björn Wiberg
The hang occurred again, despite our server seemingly kicking out the user after 5 minutes of inactivity.

The truss/gdb output shows the same as before:

---8<---
root@spinus:/mysql/data# cat spinus.pid
966658

root@spinus:/mysql/data# truss -p 966658
thread_waitlock(0x00000000)     (sleeping...)
^CPstatus: process is not stopped

root@spinus:/mysql/data# gdb /usr/local/mysql/bin/mysqld 966658
GNU gdb 5.0-aix51-020209
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "powerpc-ibm-aix5.2.0.0"...(no debugging symbols found)...
/mysql/data/966658: No such file or directory.
Attaching to program: /usr/local/mysql/bin/mysqld, process 966658
(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...
(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...
(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...
(no debugging symbols found)...(no debugging symbols found)...0xd339c5d8 in _global_lock_common ()
(gdb) bt
#0  0xd339c5d8 in _global_lock_common ()
#1  0xffffffff in ?? () from (unknown load module)
(gdb) bt full
#0  0xd339c5d8 in _global_lock_common ()
No symbol table info available.
#1  0xffffffff in ?? () from (unknown load module)
No symbol table info available.
(gdb) quit
A debugging session is active.
Do you still want to close the debugger?(y or n) y
Detaching from program: /usr/local/mysql/bin/mysqld, process 966658
--->8---

I'm attaching a file containing the MySQL error log, excerpts from the general query log, and mysqladmin output just before the crash.

Notice that the last query is one of the two queries that I attached logs of earlier on (ittelefonidb), which uses date functions.

I have now installed the -debug build instead, I'll get back to you if the problem appears again with that one; I see it can give more information on the use of current locks etc. (plus perhaps some useful symbol information in gdb).

Do you have any recommendations regarding tracing or anything else that I should try?

Thanks in advance!

Best regards,
Björn
[25 Jan 2006 12:23] Björn Wiberg
Hi again!

Oops... happened again, but I accidentally lost the gdb output. :-/

Truss output was identical; gdb output indicated that it was waiting for a table lock (I do not remember the exact wording).

It seems that this happens when there are 64 open tables (see also the last hang) and a lock is attempted for a 65th table!

See attached log excerpts. The most interesting part is perhaps this, from the MySQL error log:

---8<---
safe_mutex: Trying to lock unitialized mutex at sql_test.cc, line 377
/../

mysqld got signal 6;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8388600
read_buffer_size=131072
max_used_connections=5
max_connections=100
threads_connected=2
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 225791 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
--->8---

I'll try raising the memory limit for MySQL (currently it may hit the 128 MB limit because of both /etc/security/limits and the fact that maxDATA of the mysqld executable is set to 
 0x00000000 and may need to be set to 0x80000000), although the memory statistics above do not indicate any excessive memory usage anywhere near that limit (the mysqladmin samples were taken 10 seconds apart).

I'll get back to you if we encounter more hangs -- the "magic" value of 64 for the number of open tables seems too magic to be random. :-)

Best regards,
Björn
[25 Jan 2006 12:36] Valeriy Kravchuk
Thank you for a lot of additional information. This 64 open tables magic is really strange. So, please, send the ulimit -a results from the user you run mysql server as. SHOW STATUS results may be useful also.
[25 Jan 2006 12:40] Björn Wiberg
...although that assertion(?) error was actually after the hang. But may indicate the source of the problems (not possible to use more than 64 locks?).

Best regards,
Björn
[25 Jan 2006 13:00] Björn Wiberg
Hi again!

root@spinus:/# su - mysql
$ ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         524288
stack(kbytes)        131072
memory(kbytes)       524288
coredump(blocks)     2097151
nofiles(descriptors) 2000
$ 

Excerpt from /etc/security/limits (512 byte block size, limits measured in blocks):

*
* For MySQL: Allow writing of large files, raise memory limits
*
* 512 MB data segment size
* 512 MB real memory usage
* 128 MB stack segment size
*
mysql:
        fsize = -1
        data = 1048576
        rss = 1048576
        stack = 262144

MySQL seems to play nice with the memory, consumes between 10-40 MB on average:

mysql     589836 10,2  1,0 19168 35644  pts/3 A    13.52.09  1:28 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data --user=mysql --pid-file=/usr/local/mysql/data/spinus.pid 
root      942166  0,0  0,0  704  712  pts/3 A    13.52.09  0:00 /bin/sh ./bin/mysqld_safe --datadir=/usr/local/mysql/data --pid-file=/usr/local/mysql/data/spinus.pid 

Best regards,
Björn
[25 Jan 2006 13:02] Björn Wiberg
The SHOW STATUS information is in 060125_1.txt as part of the mysqladmin output.

A separate run of SHOW STATUS on a newly started server shows:

root@localhost (none)> SHOW STATUS;
+-----------------------------------+----------+
| Variable_name                     | Value    |
+-----------------------------------+----------+
| Aborted_clients                   | 0        |
| Aborted_connects                  | 17       |
| Binlog_cache_disk_use             | 0        |
| Binlog_cache_use                  | 0        |
| Bytes_received                    | 109      |
| Bytes_sent                        | 156      |
| Com_admin_commands                | 0        |
| Com_alter_db                      | 0        |
| Com_alter_table                   | 0        |
| Com_analyze                       | 0        |
| Com_backup_table                  | 0        |
| Com_begin                         | 0        |
| Com_change_db                     | 0        |
| Com_change_master                 | 0        |
| Com_check                         | 0        |
| Com_checksum                      | 0        |
| Com_commit                        | 0        |
| Com_create_db                     | 0        |
| Com_create_function               | 0        |
| Com_create_index                  | 0        |
| Com_create_table                  | 0        |
| Com_dealloc_sql                   | 0        |
| Com_delete                        | 0        |
| Com_delete_multi                  | 0        |
| Com_do                            | 0        |
| Com_drop_db                       | 0        |
| Com_drop_function                 | 0        |
| Com_drop_index                    | 0        |
| Com_drop_table                    | 0        |
| Com_drop_user                     | 0        |
| Com_execute_sql                   | 0        |
| Com_flush                         | 0        |
| Com_grant                         | 0        |
| Com_ha_close                      | 0        |
| Com_ha_open                       | 0        |
| Com_ha_read                       | 0        |
| Com_help                          | 0        |
| Com_insert                        | 0        |
| Com_insert_select                 | 0        |
| Com_kill                          | 0        |
| Com_load                          | 0        |
| Com_load_master_data              | 0        |
| Com_load_master_table             | 0        |
| Com_lock_tables                   | 0        |
| Com_optimize                      | 0        |
| Com_preload_keys                  | 0        |
| Com_prepare_sql                   | 0        |
| Com_purge                         | 0        |
| Com_purge_before_date             | 0        |
| Com_rename_table                  | 0        |
| Com_repair                        | 0        |
| Com_replace                       | 0        |
| Com_replace_select                | 0        |
| Com_reset                         | 0        |
| Com_restore_table                 | 0        |
| Com_revoke                        | 0        |
| Com_revoke_all                    | 0        |
| Com_rollback                      | 0        |
| Com_savepoint                     | 0        |
| Com_select                        | 1        |
| Com_set_option                    | 0        |
| Com_show_binlog_events            | 0        |
| Com_show_binlogs                  | 0        |
| Com_show_charsets                 | 0        |
| Com_show_collations               | 0        |
| Com_show_column_types             | 0        |
| Innodb_log_waits                  | 0        |
| Innodb_log_write_requests         | 0        |
| Innodb_log_writes                 | 0        |
| Innodb_os_log_fsyncs              | 0        |
| Innodb_os_log_pending_fsyncs      | 0        |
| Innodb_os_log_pending_writes      | 0        |
| Innodb_os_log_written             | 0        |
| Innodb_page_size                  | 0        |
| Innodb_pages_created              | 0        |
| Innodb_pages_read                 | 0        |
| Innodb_pages_written              | 0        |
| Innodb_row_lock_current_waits     | 0        |
| Innodb_row_lock_time              | 0        |
| Innodb_row_lock_time_avg          | 0        |
| Innodb_row_lock_time_max          | 0        |
| Innodb_row_lock_waits             | 0        |
| Innodb_rows_deleted               | 0        |
| Innodb_rows_inserted              | 0        |
| Innodb_rows_read                  | 0        |
| Innodb_rows_updated               | 0        |
| Key_blocks_not_flushed            | 0        |
| Key_blocks_unused                 | 7185     |
| Key_blocks_used                   | 7274     |
| Key_read_requests                 | 10318059 |
| Key_reads                         | 42306    |
| Key_write_requests                | 3925471  |
| Key_writes                        | 40053    |
| Last_query_cost                   | 0.000000 |
| Max_used_connections              | 5        |
| Not_flushed_delayed_rows          | 0        |
| Open_files                        | 112      |
| Open_streams                      | 0        |
| Open_tables                       | 56       |
| Opened_tables                     | 0        |
| Qcache_free_blocks                | 0        |
| Qcache_free_memory                | 0        |
| Qcache_hits                       | 0        |
| Qcache_inserts                    | 0        |
| Qcache_lowmem_prunes              | 0        |
| Qcache_not_cached                 | 0        |
| Qcache_queries_in_cache           | 0        |
| Qcache_total_blocks               | 0        |
| Questions                         | 8247     |
| Rpl_status                        | NULL     |
| Select_full_join                  | 0        |
| Select_full_range_join            | 0        |
| Select_range                      | 0        |
| Select_range_check                | 0        |
| Select_scan                       | 1        |
| Slave_open_temp_tables            | 0        |
| Slave_retried_transactions        | 0        |
| Slave_running                     | OFF      |
| Slow_launch_threads               | 0        |
| Slow_queries                      | 0        |
| Sort_merge_passes                 | 0        |
| Sort_range                        | 0        |
| Sort_rows                         | 0        |
| Sort_scan                         | 0        |
| Table_locks_immediate             | 8469     |
| Table_locks_waited                | 0        |
| Tc_log_max_pages_used             | 0        |
| Tc_log_page_size                  | 0        |
| Tc_log_page_waits                 | 0        |
| Threads_cached                    | 0        |
| Threads_connected                 | 2        |
| Threads_created                   | 257      |
| Threads_running                   | 1        |
| Uptime                            | 505      |
+-----------------------------------+----------+
222 rows in set (0.00 sec)

Best regards,
Björn
[25 Jan 2006 14:17] Björn Wiberg
Hi again!

The default value for the table cache is 64 tables, which explains that number.

However, the error I encountered (at the end of 060125_1.txt) where an unitialized mutex lock attempt was made looks strange. Shouldn't mysqld purge parts of the table cache, release associated locks, and use the freed resources for acquiring the new lock, given that it has reached the table cache limit (64)?

(I'll send more debug info + gdb info whenever I encounter another of those hangs.)

Best regards,
Björn
[26 Jan 2006 9:24] Björn Wiberg
Another note to say that I actually did find occurrences of 65 open tables in the MySQL error log, so it is indeed possible to open more than 64 tables. Although it failed once...

We'll see if the raised resource limits affect this.

Best regards,
Björn
[26 Jan 2006 10:12] Björn Wiberg
Shouldn't sessions that time out be logged in the MySQL general query log as "Quit", like normal sessions where the user explicitly asks to terminate the connection?

Wondering if that may have something to do with it (some cleaning/purging not being performed)...

No hangs yet, I'll get back to you as soon as I have more information.

Best regards,
Björn
[26 Jan 2006 14:39] Valeriy Kravchuk
I found the following interesting clame among plenty of information you had provided already:

"I managed to hang the server, too, after issuing the query containing TO_DAYS() followed by just a couple of SELECT NOW()s."

So, is it possible to hang the server with several simple queries from one session only? If yes, please, provide this ideal test case!
[26 Jan 2006 14:47] Björn Wiberg
Hi Valeriy!

Unfortunately, I'm not alone on the server so it may have been other concurrent queries causing the hang, rather than only my queries causing the hang.

The query was:

SELECT TO_DAYS(now()) - TO_DAYS(Slutdatum)
FROM BokLantagare WHERE fLantagare = 0 AND Slutdatum < now() AND aktiv AND
fBokLantagareStatus = 1

...followed by several SELECT NOW(). All this on the BokLantagare table (BokLantagare.sql attached earlier).

I have also tried deliberately to hang the server using this query on other occasions, but it didn't hang until hours later that time. So I'm afraid the results may vary.

You may wish to try the BokLantagare.sql dump and issuing the select to_days()... query followed by repeated select now()s for a long time to see if this may happen on other platforms as well. Preferrably so that no wait or interactive timeout kicks you out (if set); like in our case where the timeout was set to 5 minutes and the client performed the select now() queries every 2-3 minutes to avoid getting kicked out.

I do feel that long-winded sessions increase the chance of achieving the hang...

Hoping that the next gdb session (the next hang) will give more input; it did specify a function name, but I do not recall the exact name (and my gdb output was lost during the terminal session last time).

Best regards,
Björn
[30 Jan 2006 12:48] Björn Wiberg
No hang so far after raising the memory limits...

I'll update this bug report if anything occurs.

Best regards,
Björn
[31 Jan 2006 7:23] Valeriy Kravchuk
Please, send the current my.cnf settings and describe your hardware.
[1 Feb 2006 9:21] Björn Wiberg
No hangs yet, but I noticed that mysqld had been restarted by mysqld_safe this Monday afternoon (MySQL is restarted weekly, at midnight between Sunday and Monday, so I noticed that the uptime was incorrect when using phpMyAdmin).

Attaching the output from the logs.

Best regards,
Björn
[2 Feb 2006 8:15] Björn Wiberg
Doesn't the thread_waitlock(NULL) in the truss ouput look strange?

Hoping for some suggestions on how to debug this issue further...

Many thanks in advance!

Best regards,
Björn
[3 Feb 2006 13:23] Björn Wiberg
Just came across a segfault crash on another machine, running IBM AIX 5.3.0.0 ML2:

---8<---
060130  0:00:08 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: '5.0.18-standard-log'  socket: '/tmp/mysql.sock'  port: 3306  MySQL Community Edition - Standard (GPL)
mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8388600
read_buffer_size=131072
max_used_connections=11
max_connections=100
threads_connected=3
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 225791 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

060202 15:50:32  mysqld restarted
060202 15:50:34 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: '5.0.18-standard-log'  socket: '/tmp/mysql.sock'  port: 3306  MySQL Community Edition - Standard (GPL)
060202 15:51:05 [ERROR] /usr/local/mysql/bin/mysqld: Table './ortnamndb/sth1' is marked as crashed and should be repaired
060202 15:51:05 [Warning] Checking table:   './ortnamndb/sth1'
060202 15:51:06 [ERROR] /usr/local/mysql/bin/mysqld: Table './ortnamndb/socken1' is marked as crashed and should be repaired
060202 15:51:06 [Warning] Checking table:   './ortnamndb/socken1'
060202 15:54:30 [ERROR] /usr/local/mysql/bin/mysqld: Table './ortnamndb/sdm1' is marked as crashed and should be repaired
060202 15:54:30 [Warning] Checking table:   './ortnamndb/sdm1'
060202 16:19:15 [ERROR] /usr/local/mysql/bin/mysqld: Table './mastersdb/students' is marked as crashed and should be repaired
060202 16:19:15 [Warning] Checking table:   './mastersdb/students'
060202 16:32:06 [ERROR] /usr/local/mysql/bin/mysqld: Table './bncData/settings' is marked as crashed and should be repaired
060202 16:32:06 [Warning] Checking table:   './bncData/settings'
--->8---

---8<---
root@aramus:/# su - mysql
$ ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         524288
stack(kbytes)        131072
memory(kbytes)       524288
coredump(blocks)     2097151
nofiles(descriptors) 2000
--->8---

/etc/security/limits:

---8<---
*
* For MySQL: Allow writing of large files, raise memory limits
*
* 512 MB data segment size
* 512 MB real memory usage
* 128 MB stack segment size
*
mysql:
        fsize = -1
        data = 1048576
        rss = 1048576
        stack = 262144
--->8---

Don't know if this is related, but it sure seems like we're having problems with 5.0.18 on AIX, both 5.2 and 5.3...

Best regards,
Björn
[3 Feb 2006 13:26] Björn Wiberg
Forgot to give an excerpt of the MySQL general query log prior to the crash on the 5.3 machine:

---8<---
060202 15:50:31   17247 Connect     pma@spinus.its.UU.SE on 
                  17247 Query       SELECT VERSION()
                  17247 Query       SET NAMES utf8
                  17247 Query       SET collation_connection = 'utf8_unicode_ci'
                  17248 Connect     soresigv@spinus.its.UU.SE on 
                  17248 Query       SET NAMES utf8
                  17248 Query       SET collation_connection = 'utf8_unicode_ci'
                  17248 Query       SHOW SESSION VARIABLES LIKE 'collation_connection'
                  17248 Query       SHOW SESSION VARIABLES LIKE 'character_set_connection'
                  17248 Query       SHOW CHARACTER SET
                  17248 Query       SHOW COLLATION
                  17248 Query       SELECT COUNT(*) FROM mysql.user
                  17248 Query       SHOW GRANTS
                  17248 Query       USE `sofidb`
                  17248 Query       SHOW VARIABLES LIKE 'language'
                  17248 Query       USE `ortnamndb`
                  17248 Init DB     ortnamndb
                  17248 Query       REPLACE DELAYED INTO socken1 VALUES ("a16201b","Lillhärdal","Svegs och Hede tg","JTL")
/usr/local/mysql/bin/mysqld, Version: 5.0.18-standard-log. started with:
Tcp port: 0  Unix socket: (null)
Time                 Id Command    Argument
060202 15:51:05       2 Connect     soresigv@localhost on 
--->8---

Best regards,
Björn
[3 Feb 2006 13:54] Valeriy Kravchuk
I see this in your query log:

"REPLACE DELAYED INTO socken1 ..."

and a crash after that. That is why I have to ask you a question: have you created that socken1 table in 5.0.x or it was created in pre-5.0.x MySQL server and just used since than? Very familiar query, that will always fail after that upgrade...
[3 Feb 2006 14:25] Björn Wiberg
We're using our 4.1 tables (which were 4.0 tables...) with 5.0.

I've dumped and re-read some of the databases or tables that have been involved in crashes / have been marked as corrupted, but not done so for all database and/or tables.

One interesting thing to note is that the corrupt tables, reported in the MySQL 5 on AIX 5.3 error log, were not found as corrupt by our daily check script, which uses the following for all MyISAM tables:

Another interesting thing to notice is that I've encountered, a while back, that a (4.1/4.0) table, which was marked as crashed and repaired with myisamchk of 5.0, was later found corrupt again (just a server start/stop and a check again) by myisamchk. Several runs of 5.0's myisamchk didn't manage to repair the table, despite all -r arguments and such. A repair using 4.1 myisamchk, switching back to 4.1, running 4.1 myisamchk checks on it, indicated that all was well. Switched to 5.0 server again, ran myisamcheck checks on it, all OK, then started 5.0 server for a little while, stopped it, checked the table again, and it was found corrupt.

Would you advise me to dump and re-read all databases to eliminate this possiblity? (I know this is recommended in the docs.) That's about the only thing I haven't tried, but would probably need to do so for all databases, not just a couple of them...

As for the query that always fails -- by fail, do you mean crashes the server, or only that the query fails?

Best regards,
Björn
[3 Feb 2006 14:42] Valeriy Kravchuk
Yes, I recommend to dump all the tables in 4.1 and restore them in 5.0.18. Look at bug #17028 and search our bugs database for INSERT or REPLACE DELAYED in 5.0.x... You'll find many interesting things. 

Anyway, please, if you can, try to recreate all your databases|tables in 5.0.18 from dump, try the same tests as before and inform about the results. There are many different bugs are mentioned in this report already, and I want to figure out is there something new and unknown in it, or just the results of upgrade...
[3 Feb 2006 14:57] Björn Wiberg
Hi Valeriy!

Thanks for your reply!

Would it be OK to dump the databases one by one using 5.0.18 mysqldump, and reload them with 5.0.18 mysql client to the 5.0.18 mysqld server?

Best regards,
Björn
[3 Feb 2006 15:01] Valeriy Kravchuk
I think, you'd better use mysqldump from 4.1.x connecting to 4.1.x. Yes, you can dump them one by one, as soon as there are no changes and inter-database relations (you need consisten data, aren't you?). You may start with that database from the latest crash.
[3 Feb 2006 15:08] Björn Wiberg
I guess the inconsistency problem will be present even if we would use 4.1; we need to prevent access from the network to stop users from updating the tables when performing the dump. I'll perform the dumps early in the morning when usage is as low as possible to minimize the problems.

I'll do it with 5.0.18, though. Theoretically, if I get a dump from the 5.0.18 server using the 5.0.18 mysqldump, assume that that one is correct, and recreate the database using that dump, there won't be anything left of the old files, right? And the new tables should be "correct" from 5.0.18:s point of view, right? (Just wanted to check, so that I'm not doing the dumps to no avail :-)).

Best regards,
Björn
[4 Feb 2006 10:37] Björn Wiberg
Dumped all databases, one by one, using 5.0.18 mysqldump:

mysqldump -u root --password=SECRET --allow-keywords --add-drop-database --create-options --databases --max-allowed-packet=256M --set-variable=max_allowed_packet=256M $1 > a.txt

...and restored them, one by one, using 5.0.18 mysql client:

mysql -u root --password=SECRET --max-allowed-packet=256M $1 < ./a.txt

Of course, set max_allowed_packet_size=256M in /etc/my.cnf. Just so that no table inserts would be too big durig restore.

(Notice the --drop-database argument to mysqldump, meaning that the database directories also got re-created, and contains db.opt files.)

Stopped server, ran a check/repair:

myisamchk --force --update-state -O key_buffer=64M -O sort_buffer=64M -O read_buffer=1M -O write_buffer=1M $filename

...on each table.

Then checked (just to be sure) with:

/myisamchk --silent "$filename"

...on each table, all OK, no errors reported.

Then restarted server, and ran:

mysqlcheck --all-databases --auto-repair --extended --force --repair -u root -p

...just to be sure.

Later that night, about an hour later, I got 4 crashes in close succession. See attached file (060204.txt). It seems that it disliked the phpMyAdmin question of listing all databases, when working against the information_schema virtual database:

---8<---
SELECT `information_schema`.`SCHEMATA`.*,
                    COUNT(`information_schema`.`TABLES`.`TABLE_SCHEMA`)
                        AS `SCHEMA_TABLES`,
                    SUM(`information_schema`.`TABLES`.`TABLE_ROWS`)
                        AS `SCHEMA_TABLE_ROWS`,
                    SUM(`information_schema`.`TABLES`.`DATA_LENGTH`)
                        AS `SCHEMA_DATA_LENGTH`,
                    SUM(`information_schema`.`TABLES`.`MAX_DATA_LENGTH`)
                        AS `SCHEMA_MAX_DATA_LENGTH`,
                    SUM(`information_schema`.`TABLES`.`INDEX_LENGTH`)
                        AS `SCHEMA_INDEX_LENGTH`,
                    SUM(`information_schema`.`TABLES`.`DATA_LENGTH`
                      + `information_schema`.`TABLES`.`INDEX_LENGTH`)
                        AS `SCHEMA_LENGTH`,
                    SUM(`information_schema`.`TABLES`.`DATA_FREE`)
                        AS `SCHEMA_DATA_FREE`
               FROM `information_schema`.`SCHEMATA`
          LEFT JOIN `information_schema`.`TABLES`
                 ON `information_schema`.`TABLES`.`TABLE_SCHEMA`
                  = `information_schema`.`SCHEMATA`.`SCHEMA_NAME`

           GROUP BY `information_schema`.`SCHEMATA`.`SCHEMA_NAME`
--->8---

No tables has been reported as "crashed and should be repaired" in the MySQL error log, which I am glad for, but the number of crashes close to each other in time suggest that something isn't quite right with the information_schema querying...

Best regards,
Björn
[4 Feb 2006 10:47] Björn Wiberg
Also, worth noticing, our current /etc/my.cnf reads:

---8<---
# MySQL server options
[mysqld]
character-set-server=latin1
collation-server=latin1_swedish_ci
default-time-zone="Europe/Stockholm"
ft_min_word_len=3
innodb_file_per_table
interactive_timeout=60
log=/usr/local/mysql/logs/spinus.log
log-error=/usr/local/mysql/logs/spinus.err
max_allowed_packet=16M
max_connect_errors=999999999
max_user_connections=20
myisam-recover=BACKUP,FORCE
skip-innodb
sql-mode="NO_ENGINE_SUBSTITUTION"
wait_timeout=60

# MySQL client options
[mysql]
pager=/usr/bin/less -S
prompt=\\u@\\h\\_\\d>\\_

# myisamchk options
[myisamchk]
ft_min_word_len=3
--->8---

Changes from the previous version is the addition of max_allowed_packet to allow 16 MB packets instead of default of 1 MB, and lowered wait and inactive timeouts from 300 to 60 seconds.

Best regards,
Björn
[4 Feb 2006 11:09] Björn Wiberg
Another note to inform you that we previously ran:

  cd /mysql/usr/local/mysql-debug-5.0.18-aix5.2-powerpc/bin
  /opt/freeware/bin/cp -d -p -f mysqld mysqld.ORIGINAL
  ldedit -b maxdata:0x80000000 ./mysqld

...to allow mysqld to use more than 128 MB of memory under AIX (old maxDATA value = 0x00000000).

Didn't make any difference with the previous hangs, I think (mysqld seems to consume very little memory), but still worth noticing.

Best regards,
Björn
[5 Feb 2006 8:19] Björn Wiberg
Hi again!

OK, got a couple of crashes (again the "Trying to lock unitialized mutex at sql_test.cc, line 377" message and signal 6), followed by some table being detected as "crashed and should be repaired") yesterday evening:

---8<---
060204 19:08:39 [ERROR] /usr/local/mysql/bin/mysqld: Table './genpatmambodb/mos_session' is marked as crashed and should be repaired
060204 19:08:39 [Warning] Checking table:   './genpatmambodb/mos_session'
060204 19:08:39 [ERROR] /usr/local/mysql/bin/mysqld: Table './genpatmambodb/mos_stats_agents' is marked as crashed and should be repaired
060204 19:08:39 [Warning] Checking table:   './genpatmambodb/mos_stats_agents'
060204 19:08:39 [ERROR] /usr/local/mysql/bin/mysqld: Table './genpatmambodb/mos_events' is marked as crashed and should be repaired
060204 19:08:39 [Warning] Checking table:   './genpatmambodb/mos_events'
060204 19:08:39 [ERROR] /usr/local/mysql/bin/mysqld: Table './genpatmambodb/mos_content' is marked as crashed and should be repaired
060204 19:08:39 [Warning] Checking table:   './genpatmambodb/mos_content'
--->8---
---8<---
060204 19:08:45 [ERROR] /usr/local/mysql/bin/mysqld: Table './smalanddb/smaland_raknare' is marked as crashed and should be repaired
060204 19:08:45 [Warning] Checking table:   './smalanddb/smaland_raknare'
060204 19:08:46 [ERROR] /usr/local/mysql/bin/mysqld: Table './smalanddb/smaland_events' is marked as crashed and should be repaired
060204 19:08:46 [Warning] Checking table:   './smalanddb/smaland_events'
--->8---
---8<---
060204 19:09:49 [ERROR] /usr/local/mysql/bin/mysqld: Table './engelskaparkendb/menylankar' is marked as crashed and should be repaired
060204 19:09:49 [Warning] Checking table:   './engelskaparkendb/menylankar'
--->8---
---8<---
060204 19:20:41 [ERROR] /usr/local/mysql/bin/mysqld: Table './ibkdb/Visitors' is marked as crashed and should be repaired
060204 19:20:41 [Warning] Checking table:   './ibkdb/Visitors'
--->8---

(I guess the tables were successfully repaired as we have that option in our my.cnf, and our nightly myisamchk --silent didn't find anything wrong.)

So it seems that the assert error on uninitialized mutexes did not depend on the version of the MyISAM tables.

Checking the general query log, the queries this time were not the information_schema ones that I reported about yesterday morning. See attached file (060204_2.txt).

No crashes on the 5.3 machine, but that one has much less traffic.

I'm adding the following to my.cnf on both servers now to see if it helps:

delay-key-write=OFF
flush
flush_time=10
safe-mode
skip-concurrent-insert
skip-thread-priority
thread_cache_size=0

Best regards,
Björn
[6 Feb 2006 9:25] Björn Wiberg
Hello Valeriy!

Another crash this morning (mysql_060206.txt); please see the attached file.

Lots of date functions involved. Note that we are using thread_cache_size=0 so reused threads should not be the cause.

Also, flush_time=10 hopefully helps to limit the number of tables involved in the crashes.

Does flush_time close open tables that are currently in use by some thread? Or does it close only open but unused tables? What would flush_time=1 or even flush_time=0 mean?

Had to turn the "flush" directive off, though, as MySQL started to consume an awful lot of CPU (99-100%), don't know how it handles wait during flushes...

Any suggestions as to how to proceed? I'm thinking of running MySQL outside the chroot, or to compile from source to see if that helps (like we used to do before, in the 4.0.x days), but perhaps you have some other things in mind that I should try first.

Thanks in advance!

Best regards,
Björn
[6 Feb 2006 10:01] Valeriy Kravchuk
Sorry, but this bug report now contains information about several different bugs and crashes, and tend to become a kind of free-form discussion on bugs, workarounds and general MySQL usage practicies. While it is appropriate for issues created by MySQL customers, or for discussions on our forums, our bugs database is not a proper place to do it. 

So, please, clarify the status of your initial report, about multiple SELECT NOW(), and try to create a small and repeatable test case for it and other crashes you already mentioned. Then we'll have to open a separate bug report for each particular crash case.

I can summarize the information you already sent and propose synopsis for each of that potential bug reports, if you want. This report becomes simply hard to understand and maintain now.
[6 Feb 2006 14:37] Björn Wiberg
Hello Valeriy!

OK, I will submit a new bug report regarding the phpMyAdmin causing mysqld to crash; that one seems rather repeatable on our system, and I have also managed to get a coredump from it.

I will see if I encounter any more hangs related to SELECT NOW(), but I doubt it -- the signs are probably to vague to go on.

Thank you for your help, and see you in another bug report! :-)

Best regards,
Björn
[8 Feb 2006 8:29] Valeriy Kravchuk
Reopen this report if you'll get that hang with SELECT NOW() (or SELECT something as a way to keep alive) again. I saw your new report, so let's continue to discuss the other possible bugs there.
[9 Mar 2006 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".