Bug #12231 | chronic 'error 12's after long uptime (memory leak) | ||
---|---|---|---|
Submitted: | 27 Jul 2005 21:50 | Modified: | 7 Feb 2008 12:55 |
Reporter: | Nathan Cheng | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | MySQL Server | Severity: | S1 (Critical) |
Version: | 4.1.9,4.1.13 | OS: | Linux (RedHat release 3) |
Assigned to: | CPU Architecture: | Any |
[27 Jul 2005 21:50]
Nathan Cheng
[27 Jul 2005 21:57]
Nathan Cheng
I flushed tables and threads, and that seems to have bought me some time. The SZ being reported by ps for mysqld is now 604571. Before flushing it was 639780, and yesterday before the errors started, it was 636074. So it would follow that the error threshold is somewhere between 636074 and 639780. Are there any STATUS or VARIABLES entries that would be helpful for you to know? I have been pasting them into a log file periodically.
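For reference, the growth during the error period and the memory released by the flush can be read straight off the SZ figures quoted above; a minimal sketch (SZ units are whatever ps reports on this platform, typically pages):

```shell
# SZ values reported by ps for mysqld, as quoted in this comment
before_errors=636074   # yesterday, before the errors started
before_flush=639780    # during the error period
after_flush=604571     # after flush-tables / flush-threads

# growth while the errors developed, and amount released by the flush
echo $((before_flush - before_errors))   # grew by 3706
echo $((before_flush - after_flush))     # flush released 35209
```

So the suspected "error threshold" window is only about 3700 SZ units wide, which is why periodic logging of these values is worthwhile.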
[28 Jul 2005 12:34]
Matthew Lord
Hi, Operating system error 12 is "Cannot allocate memory". You need to adjust your configuration settings; I'm guessing you are using 32-bit hardware and hitting the addressable limit. If you are not a support customer, you can get help from these community avenues: mysql.com/IRC, forums.mysql.com, lists.mysql.com. Best Regards
[28 Jul 2005 12:38]
Matthew Lord
Hi, I apologize, I missed the memory leak portion of your report. This is an important subject but we do need a repeatable test case to proceed using this avenue. Have you been able to narrow it down to a set of steps that you can see leaking X bytes? Best Regards
[28 Jul 2005 16:50]
Nathan Cheng
I will see if I can do a massive repetition of the nested queries in our staging environment and recreate the problem there. The difficulty with this is that I have only seen the error begin after about 300 million db questions, so it may take a while.

I have thought that perhaps I just have bad memory settings, but that explanation doesn't make sense for the following reason: let's say the set of all queries run by my application is {q}; a "bug-free" db system must then behave in the following way: as long as the memory settings are such that all queries in {q} can run once successfully without any kind of memory error, then all queries in {q} can run an infinite number of times successfully without any kind of memory error. Let's call this "the inferior memory management axiom of a bug-free db system".

Now allow me to formulate "the superior memory management axiom of a bug-free db system": if {q}_S is the set of all queries that can run successfully at least once with memory settings S, and {p}_S is the set of all queries that cannot run successfully even once with memory settings S, then a bug-free db system must be able to run successfully any query from {q}_S an infinite number of times, no matter how many times and at what times a query from {p}_S is attempted.

I cannot rule out the possibility that what we are dealing with is a breach of the "superior" axiom, rather than the "inferior" axiom. In other words, it is a possibility that my application attempts a query that is not possible given my memory settings, and in so doing, something breaks and causes, for example, that one thread to not be able to do anything any more.

Here is an interesting fact that _may_ support that hypothesis: after restarting the db a little after midnight 7/22, at noon 7/26 the errors started. They continued to worsen throughout the day, and tapered off a little during the night (expected, since people use the application mainly during business hours).
The last error occurred at 7:06 AM 7/27, because right at about 7:07 I ran the following two commands:

mysqladmin flush-tables
mysqladmin flush-threads

And here's the interesting thing: it is now 9:44 AM 7/28, and the error has not yet re-occurred. In other words, flushing tables and threads seems to have cleared up whatever was causing the error. It will be interesting to see if the errors start up again in a few days, which would mean flushing tables and threads accomplishes the same thing as restarting the db. It will be baffling if the error never occurs again as long as we do not restart the db. That would be absolutely flabbergasting, and would defy all logic.
[28 Jul 2005 18:39]
Jorge del Conde
Hi! Errno 12 means "Not enough space". Can you confirm whether or not you have enough space on your server? I have been running 4.1.x for extended amounts of time, and I have never received those kinds of errors. Thanks!
[28 Jul 2005 18:52]
Nathan Cheng
If "space" means disk space, then yes, I definitely have enough disk space:

[root@localhost root]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda2             69408144  25362948  40519332  39% /
/dev/sda1               101089     14941     80929  16% /boot
/dev/sdb1            141106132  94874920  39063416  71% /mnt/drive2
none                   1934772         0   1934772   0% /dev/shm
[root@localhost root]#

(The data directory resides under /mnt/drive2.)

If "space" means memory, here is the present state of my memory:

 11:49:27 up 260 days, 8:55, 2 users, load average: 2.17, 1.98, 1.84
111 processes: 110 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   51.0%    0.0%    3.2%   0.2%     2.2%    0.5%   42.5%
           cpu00   37.8%    0.0%    4.6%   0.0%     1.6%    0.4%   55.6%
           cpu01   53.2%    0.0%    2.6%   1.2%     4.4%    0.6%   38.0%
           cpu02   38.3%    0.0%    3.5%   0.0%     2.9%    0.7%   54.2%
           cpu03   74.8%    0.0%    2.3%   0.0%     0.1%    0.3%   22.1%
Mem:  3869544k av, 3844552k used,   24992k free,       0k shrd,  196264k buff
      2955108k actv,  735780k in_d,   18908k in_c
Swap: 1052248k av,  770384k used,  281864k free                1725612k cached

Of course, right now I am not getting the errors. However, I think this particular line of inquiry is beside the point, because what happens is that if I do not restart mysqld, eventually ALL queries fail--even simple ones that would require only a few bytes of memory to run. Historical discussion of this problem can be found at: http://forums.mysql.com/read.php?21,33887,33887#msg-33887
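One caveat when reading output like the above: Linux counts page cache and buffers as "used", so the raw free figure understates what is actually available. A rough sketch of the effectively reclaimable memory, using the numbers from the Mem:/Swap: lines quoted in this comment:

```shell
# Values in kB taken from the top output above
free_kb=24992       # "free" column
buff_kb=196264      # "buff" column
cached_kb=1725612   # "cached" column

# Memory the kernel can reclaim on demand: free + buffers + cache
echo $((free_kb + buff_kb + cached_kb))   # roughly 1.9 GB
```

That is, the box has nearly 2 GB effectively available even though "used" reads 3.8 GB, which supports the argument that simple queries failing with ENOMEM points at the process, not the machine.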
[31 Jul 2005 18:56]
Jorge del Conde
Hi! I made a mistake in my previous comment. I stated error 12 was "not enough disk", and that was wrong. Under Linux, error 12 means not enough memory:

jorge-/home/jorge> perror 12
OS error code  12:  Cannot allocate memory
jorge-/home/jorge>
[31 Jul 2005 22:45]
Nathan Cheng
I don't agree that this is not a bug. If my memory configuration is such that a query can run once, it should be able to run that same query an infinite number of times without running out of memory. Otherwise, you should display the following disclaimer in a prominent place in the documentation: MYSQLD IN A PRODUCTION ENVIRONMENT MUST BE REBOOTED EVERY NIGHT TO AVOID THE THREAT OF RUNNING OUT OF MEMORY AND BEING UNABLE TO RESPOND TO ANY QUERY
[3 Aug 2005 15:41]
Puneesh Kumar
I am glad I am not the only one getting the "Got error 12 when reading table" error. Nathan, I agree with you--this is a highly critical BUG, as it is affecting our production system adversely. We are running the std-4.1.7 version on HP-UX 11.0.

The first few months of testing on DEV were fine; we never encountered an issue with "Got error 12 when reading table". One month into production and we started noticing error 12. It started with less frequency and now occurs every 3-4 days. The only way I could get rid of it was a stop/start of mysqld.

I was using my-medium.cnf as my configuration file. I thought that some internal variables might be running out of memory and increased the parameter values using my-huge.cnf. With those settings, I started getting mysql_query() failed / mysql_store_result() failed: "Got Error 12 from Storage Engine".

How can I check how much memory is being used by MySQL? With ps -le | grep mysqld, the SZ is very low--it is hovering around 8000 as of now. Any advice on how to anticipate the memory running out, or a command to check how much memory is being used by MySQL, would be highly appreciated. The flush-tables and flush-threads did not help me in the my-huge.cnf scenario. Thanks, Puneesh
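To answer the question about checking mysqld's memory: one low-tech option is to pull the SZ column out of ps -le, as Nathan has been doing. A sketch, run here against a sample line (the PID and SZ values are made up for illustration); on a live box the echo would be replaced by the real ps pipeline:

```shell
# Sample line in `ps -le` layout: F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
# (values are illustrative; on a live system use:  ps -le | grep mysqld)
sample='0 S 27 1234 1 0 76 0 - 8000 - ? 00:00:01 mysqld'

# SZ is the 10th whitespace-separated column of `ps -le` output
echo "$sample" | awk '{print $10}'
```

Logging this value from cron every few minutes gives a time series that shows whether mysqld's size is climbing toward whatever limit triggers error 12.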
[4 Aug 2005 16:57]
Nathan Cheng
Ok, here is one of the queries that periodically generates the "error 12" message. It usually only happens after calling this line several tens of thousands of times, and even then it is only sporadic:

SELECT a.queue, a.destination, b.binary_data, a.tries, a.send_after,
       a.status, a.created, a.created_by, a.modified, a.modified_by,
       a.sequence_id
FROM io_export_message AS a
LEFT JOIN io_export_message_blob AS b USING( export_message_id )
WHERE a.export_message_id = ?

Here are the two tables involved in this query:

CREATE TABLE `io_export_message` (
  `export_message_id` int(10) unsigned NOT NULL auto_increment,
  `queue` varchar(128) NOT NULL default '',
  `destination` varchar(128) NOT NULL default '',
  `tries` int(10) unsigned NOT NULL default '0',
  `send_after` int(10) unsigned NOT NULL default '0',
  `status` enum('creating','queued','sending','succeeded','failed','deleted') NOT NULL default 'creating',
  `created` int(10) unsigned NOT NULL default '0',
  `created_by` varchar(128) NOT NULL default '',
  `modified` int(10) unsigned NOT NULL default '0',
  `modified_by` varchar(128) NOT NULL default '',
  `sequence_id` int(11) NOT NULL default '1',
  PRIMARY KEY (`export_message_id`),
  KEY `idx_1` (`queue`),
  KEY `idx_2` (`created`),
  KEY `idx_3` (`status`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

CREATE TABLE `io_export_message_blob` (
  `export_message_id` int(10) unsigned NOT NULL default '0',
  `binary_data` longblob,
  UNIQUE KEY `idx_1` (`export_message_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=100000000 AVG_ROW_LENGTH=13000

Note that a.export_message_id is the PRIMARY KEY. I do not believe that anyone out there could possibly claim with a straight face that this query is legitimately taking up too much memory. THIS IS A BUG!
[5 Aug 2005 8:51]
Ingo Strüwing
OK. This is a first step towards a repeatable test case. But we still need some data to fill into the tables before running the SELECT command, right? Other questions:
- Did you run this yourself on a test machine and get the error?
- Is it sufficient to run it in a single session, or is it required that multiple sessions run it in parallel?
- Did you ever try to run it on a "debug" server, which might note non-freed memory in the error log?
- Can you please wait a minute between "mysqladmin flush-tables" and "mysqladmin flush-threads", and save the SZ value from "ps -le" right before both commands and a minute after the second one?
[7 Aug 2005 4:31]
Nathan Cheng
---------------------
- Did you run this yourself on a test machine and get the error?
---------------------
No, this has never happened on a test machine--dev, qa, staging, or otherwise--and our staging environment is the *exact* same configuration (hardware, software, network, and otherwise) as our production environment. The only difference between production and staging is traffic and diversity of usage: production has several orders of magnitude more of both.
---------------------
- Is it sufficient to run it in a single session, or is it required that multiple sessions run it in parallel?
---------------------
It seems that eventually the error occurs no matter what you do--whether single or multiple sessions, whether all at once or one at a time, whether complex query or trivial query.
---------------------
- Did you ever try to run it on a "debug" server, which might note non-freed memory in the error log?
---------------------
I lied in the answer to the first question, because as of last night our staging environment is running the debug version of mysql 4.1.13. How do I ensure that I have the correct level of logging turned on?
---------------------
- Can you please wait a minute between "mysqladmin flush-tables" and "mysqladmin flush-threads", and save the SZ value from "ps -le" right before both commands and a minute after the second one?
---------------------
I'll attach this as a file.
[7 Aug 2005 4:34]
Nathan Cheng
ps before and after flush-tables and flush-threads
Attachment: ps.log (application/octet-stream, text), 19.84 KiB.
[29 Aug 2005 23:09]
Nathan Cheng
As a work-around, we had been running flush-tables and flush-threads twice per day. But recently even this stopped working, so now we use crond to flush-tables and flush-threads every five minutes in our production environment. So far, this has contained the problem.
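For anyone wanting to replicate this workaround, a sketch of the cron entries (the mysqladmin path and the five-minute schedule are assumptions to be adapted to the local installation):

```shell
# Two crontab lines implementing the every-five-minutes flush
# workaround; the mysqladmin path below is an assumption.
cron_lines='*/5 * * * * /usr/local/mysql/bin/mysqladmin flush-tables
*/5 * * * * /usr/local/mysql/bin/mysqladmin flush-threads'

echo "$cron_lines"   # on a live system: echo "$cron_lines" | crontab -
```

This only contains the symptom, of course; the leaked memory is merely returned more often, not fixed.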
[30 Aug 2005 8:12]
Nathan Cheng
shows how closing all connections does nothing to free db memory
Attachment: debug-mysql.log (application/octet-stream, text), 70.27 KiB.
[30 Aug 2005 12:55]
Heikki Tuuri
Hi! To rule out a memory leak in InnoDB, you can monitor SHOW INNODB STATUS. It prints the total memory allocated by InnoDB. Regards, Heikki
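A sketch of how the figure Heikki mentions can be pulled out of the monitor output; the sample line and its byte count are illustrative only, and on a live server the echo would be replaced by a call to the mysql client:

```shell
# SHOW INNODB STATUS output contains a line like the sample below.
# On a live server:  mysql -e 'SHOW INNODB STATUS\G' | grep 'Total memory'
sample='Total memory allocated 21463685; in additional pool allocated 1048576'

# Extract just the total-allocation figure
echo "$sample" | grep -o 'Total memory allocated [0-9]*'
```

If this number stays flat while the process SZ keeps climbing, the leak is outside InnoDB (e.g. MyISAM buffers or per-thread allocations), which is exactly what the suggestion is meant to distinguish.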
[25 Sep 2005 9:53]
Valeriy Kravchuk
Hi, So, according to the error log provided, you've got:

050829 8:40:00 [ERROR] /usr/local/mysql/bin/mysqld: Got error 12 from storage engine
050829 8:40:00 [ERROR] /usr/local/mysql/bin/mysqld: Sort aborted
050829 8:40:19 [ERROR] Got error 12 when reading table './newco/io_export_message_blob'

that is, error 12 when reading a MyISAM table. Have you noted similar errors related to any of the InnoDB tables (io_export_message)? If yes, please send the results of the SHOW INNODB STATUS command just after that, as Heikki asked. We need to figure out whether this possible memory leak is related to the InnoDB storage engine.

The results of the ulimit -a command executed as the user who runs mysqld may also be useful, as well as the server variables values (I saw no my.cnf file posted yet).

You provided the table structures and a problem query. To make a test case we need to know the number of rows in each of these 2 tables. The results of

EXPLAIN SELECT a.queue, a.destination, b.binary_data, a.tries, a.send_after,
       a.status, a.created, a.created_by, a.modified, a.modified_by,
       a.sequence_id
FROM io_export_message AS a
LEFT JOIN io_export_message_blob AS b USING( export_message_id )
WHERE a.export_message_id = ?

with ? substituted by some reasonable value can also help to pinpoint the possible reason.

Have you tried to work with the latest 4.1.14 version? It will be used for verification in any case. Thank you for your patience and commitment to MySQL.
[25 Oct 2005 23:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[6 Jan 2008 2:51]
Chris Miller
We're having problems with this too. How do I re-open this issue?
[7 Jan 2008 12:55]
Valeriy Kravchuk
Chris, Please provide all the information you have on this similar issue.
[8 Feb 2008 0:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".