Bug #1758 | Mysql crashed and had to be manually restarted | ||
---|---|---|---|
Submitted: | 5 Nov 2003 9:36 | Modified: | 13 Jan 2004 11:37 |
Reporter: | Richard Thomas | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | MySQL Server | Severity: | S1 (Critical) |
Version: | 4.0.16-0 | OS: | Linux (Redhat 9) |
Assigned to: | CPU Architecture: | Any |
[5 Nov 2003 9:36]
Richard Thomas
[11 Nov 2003 7:20]
Martin Mokrejs
You have to resolve the stacktrace and post the resolved version at least. For example, this might help you: #!/bin/sh echo "This scripts expects stack trace from your mysql error log file" echo "placed in the current directory as ./stack" if [ ! -r ./stack ]; then echo "./stack file not present or readable" exit 255 fi if [ -r /usr/local/mysql/bin/mysqld.sym.gz ]; then gzip -dc /usr/local/mysql/bin/mysqld.sym.gz > ./symbols else echo "nm -n /usr/local/mysql/bin/mysqld > ./symbols" nm -n /usr/local/mysql/bin/mysqld > ./symbols fi echo "/usr/local/mysql/bin/resolve_stack_dump -s ./symbols -n ./stack" /usr/local/mysql/bin/resolve_stack_dump -s ./symbols -n ./stack You should find uout which table caused the crashed and what was the SQL command used.
[11 Nov 2003 8:29]
Richard Thomas
Results from provided script 0x8071fa4 + 134684580 0x8292178 + 136913272 0x82a3081 + 136982657 0x8292509 + 136914185 0x82a34a0 + 136983712 0x8309c98 + 137403544 0x8309cbd + 137403581 0x830a776 + 137406326 0x830a945 + 137406789 0x8309aac + 137403052 0x80c1499 + 135009433 0x80be417 + 134997015 0x80975c9 + 134837705 0x8097fb7 + 134840247 0x8096169 + 134832489 0x8095916 + 134830358 0x807cb4a + 134728522 0x8080775 + 134743925 0x807bcc3 + 134724803 0x807b70e + 134723342 0x807aef8 + 134721272 0x828f92c + 136902956 0x82c303a + 137113658 Suspect query from logs SELECT count(*) as parts FROM broker_parts.partall WHERE comp_id = 4099 Server has been up 6 days since reboot without problems now.
[12 Nov 2003 8:07]
Richard Thomas
Another crash today.. Same exact stack, diffrent query. query at 0x870f120 = SELECT COUNT(*) AS cnt FROM broker_users.companytable WHERE (approval=1 OR approval=2) AND signup_rep=16 Stacks match,query is diffrent.
[10 Dec 2003 6:21]
Tom Zlos
Running RH9/Apache 1.3.29/php 4.3.4/mysql 4.0.16 MySQL is running on a separate machine, WWW has its own, separate machine. I experience very same problems with queries using WHERE clause, running on frequently updated (by INSERT, UPDATE & DELETE) tables. Example of query: SELECT AVG( re.rating_value ) AS avgscore, r . * , r.date_added AS date_review_added, i . * , v . * FROM rating AS r, items AS i, vendors AS v, rating_entries AS re WHERE r.item_id = i.item_id AND v.vendor_id = i.vendor_id AND re.rating_id = r.rating_id GROUP BY rating_id ORDER BY r.date_added DESC LIMIT 5 This query requires extremely healthy indexes, as due to WHERE is used together with GROUP BY and ORDER BY. When index is not refreshed at the execution of DELETE query, problems with capability of handling huge temporary tables & files occurs immediately. Size vs. index -------------- The query is operating on 4 tables, which are not impressive by its size on standalone basis: - rating: approx. 27000 records - rating_entries: approx. 31000 records - items: approx. 4700 items - vendor: approx. 400 items But especially rating and rating_entries are updated approx. 5-10 times per minute (including issuing DELETE commands to these tables). This is having a tremendous impact on indexes, which for some reason are not refreshed after deleting rows from a database. Let's have a look on EXPLAIN of above mentioned query: table type possible_keys key key_len ref rows Extra i ALL PRIMARY,product_id,vendor_id,item_vendor_category NULL NULL NULL 4818 Using temporary; Using filesort v ref PRIMARY,vendor_id PRIMARY 4 i.vendor_id 1 r ref PRIMARY,product_id,rating_id product_id 4 i.item_id 6 re ref rating_id rating_id 4 r.rating_id 1 Using where This means we have a temporary table of 4818 x 1 x 6 x 1, total 28908 rows. Not small, right? But those 4 tables are equipped with right indexes and execution time goes below 1 second. As soon as DELETE command removes one row from tables rating and rating_entries, which are relating to each other, we have problem with indexes of tables rating & rating_entries. For some (???) reason those indexes cannot be accessed anymore and mysqld builds a sort file on all the rows from these two tables. Accordingly: This means we have a temporary table of 4818 x 27000 x 31000 x 1, total 4032666000000 rows. Nice... This hits hard disk and fills it immediately upto limits. At the same time CPU (1 x P4 2,4MHz) stays at 30-50%, RAM is still enough, no memory leaks at all. But simply it cannot work like that. One would need huge 40-60 GB RAM drive to serve it correctly. Of course there are ways to overcome it, for example by declaring cron job which constantly optimizes these tables. As I did. But then it becomes the question of luck, which process will be first right after deleting a row from these tables - SELECT query or OPTIMIZE query? Therefore my machine surrvives sometimes 8 days, but with huge load (like Google spidering my site) even each 5 minuts server has to be restarted. Hope this will lead in the right direction.
[10 Dec 2003 7:58]
Richard Thomas
I think your right, a fragmented index can cause higher memory usage and issues. I was already optimizing every night, the problems did not start till I moved to a new server dedicated to mysql ( the previous was shared with apache and sendmail ). I fine tuned the memory really close to using the max ram. After the crashes I scaled back those settings slightly and have had a uptime of 21 days now.. the longest yet. If your having issues try reducing your memory settings in my.cnf some and see if that helps.
[11 Dec 2003 0:36]
Tom Zlos
Well. I am even more surprised, as I have also recently moved to a dedicated mysql machine. For me the real issue is fragmentation of indexes. As I have 2GB of RAM only, exceeding some Linux system limites values is not a problem, I hope. Good way to repeat this bug is: - simple delete one row from table, which you use in your WHERE clause, - point your file manager (mc) to mysql temporary catalogue - then run the query - observe temporary file size If your machine is strong enough and table is not locked you can try to obtain EXPLAIN of your SELECT statement. Would be interesting for you to see how big Kartesian table is. In my case temporary files go immediately over 4GB of HDD space. This is not actually stopping server, it simply extremely slows down execution of mysql threads. MyISAM tables get locked and all the other threads (including OPTIMIZE) are waiting untill fat boys will be pushed through HDD. I am just wondering, should I report a separated bug related to index fragmentation?
[11 Dec 2003 0:52]
Tom Zlos
Sorry, I have forgotten to mention, that optimization (run from apache' cron job) goes each 20 minutes. As my mysql serves approx. 5000 queries per minute: - I can't rung cron more frequently, due to accesibility issued & performance of the server itself - Indexes got fragmented anyhow :-(
[13 Dec 2003 11:37]
Michael Widenius
This bug report contains two different, independent issues. I try to handle both here. Future discussion about 'framgendted indexes' should however be moded to a separte bug report. The crash: The stack trace doesn't help us as this is not symbolic (we need the names of the called functions to get an idea of the problem). Sorry but I don't know why the script supplied by Martin Mokrejs doesn't work :( Please read the link http://www.mysql.com/doc/en/Using_stack_trace.html for how to do the stack trace. If you have more crashes, I would recommend you to download the debug version of the MySQL server and start this with the --core-file option. If you now get a crash, then the generated core file should contain enough information to solve the problem. You can upload the core file, the used mysqld binary and a short description of the bug to ftp://support.mysql.com/pub/mysql/secret/ so that we can analyse it. About index fragementation (should be moved to another bug report) There are a few separate, independent problems with indexes when one does a lot of deletes on a table - The cardinality of the index changes (not likely but possible) and MySQL uses the wrong index in the query. In this case one could try to update the cardinality with ANALYZE TABLE (faster than OPTIMIZE TABLE) or use IGNORE INDEX / USE INDEX in the query to get MySQL to use the right indexes. - The indexes gets 'unbalanced' and the range optimizer estimates wrongly the wrong number of matching keys for expressions like 'key_column op constant'. In this case one should update the index with OPTIMIZE TABLE or use IGNORE INDEX / USE INDEX. - The index files could be fragmented on disk, causing more seeks than are optimal, but I don't think this is the problem here. Neither of the above problems are frequent and only happens to some specific tables and specific usage. One common case where the index gets unbalanced is when you have an indexed timestamp columns and you insert a lot of new rows and delete a lot of old rows with an old timestamp. In this case the optimizer will estimate wrongly the number of rows for queries like WHERE timestamp_column BETWEEN date and data2 In most cases the estimate is still 'good enough' but we know of cases when the estimate is so wrong that MySQL will instead choose another index to solve the query. Fragmentation of indexes should not happen for just a single delete of one row. If this is the case then there is some unknown factor involved and we would need to get a test case to be able to find this out. To know what is happening in this particular case I would need the explain from tables before the delete is done (when the query was fast) and the explain after the delete was done (and the query is slow). You can the the information what exactly happened during a query by doing: mysql> FLUSH STATUS; mysql> SELECT ...; <A short version of the output from SELECT, including the time taken to run the query> mysql> SHOW STATUS; <output from SHOW STATUS> (The above assumes you are the only one using the system as the STATUS variables are global). The best way to know what is happening is of course if we could get a copy of the problem tables when the query is slow so that we can test this yourself. (Feel free to reset any columns that are not part of the query with NULL or ""). Upload the tables, with a description of the problem, to ftp://support.mysql.com/pub/mysql/secret. We presently found out that 4.0.16 has a bug (4.0.15 was ok) that in some cases makes table scans when it doesn't have to. This will be fixed in 4.0.17 that will be out in a few days. Regads, Monty
[14 Feb 2005 22:54]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".