Bug #2104 after deleting row index got fragmented
Submitted: 12 Dec 2003 6:02 Modified: 30 Jan 2004 5:45
Reporter: Tom Zlos Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: MyISAM storage engine Severity:S1 (Critical)
Version:4.0.16 OS:Linux (RH 9, Apache 1.3.29)
Assigned to: Michael Widenius CPU Architecture:Any

[12 Dec 2003 6:02] Tom Zlos
Description:
Running RH9/Apache 1.3.29/php 4.3.4/mysql 4.0.16
MySQL is running on a separate machine, WWW has its own, separate machine.

I experience problems with queries using WHERE clause, running on frequently updated (by INSERT, UPDATE & DELETE) tables.

Example of query:
SELECT AVG( re.rating_value ) AS avgscore, r . * , r.date_added AS
date_review_added, i . * , v . * 
FROM rating AS r, items AS i, vendors AS v, rating_entries AS re
WHERE r.item_id = i.item_id AND v.vendor_id = i.vendor_id AND re.rating_id =
r.rating_id
GROUP BY rating_id
ORDER BY r.date_added DESC 
LIMIT 5  

This query requires extremely healthy indexes, as due to WHERE is used together
with GROUP BY and ORDER BY. When index is not refreshed at the execution of
DELETE query, problems with capability of handling huge temporary tables & files occurs immediately.

Size vs. index
--------------
The query is operating on 4 tables, which are not impressive by its size on
standalone basis:
- rating: approx. 27000 records
- rating_entries: approx. 31000 records
- items: approx. 4700 items
- vendor: approx. 400 items

But especially rating and rating_entries are updated approx. 5-10 times per
minute (including issuing DELETE commands to these tables). This is having a
tremendous impact on indexes, which for some reason are not refreshed after
deleting rows from a database.

Let's have a look on EXPLAIN of above mentioned query:

table  type  possible_keys  key  key_len  ref  rows  Extra  
i ALL PRIMARY,product_id,vendor_id,item_vendor_category NULL NULL NULL 4818
Using temporary; Using filesort 
v ref PRIMARY,vendor_id PRIMARY 4 i.vendor_id 1   
r ref PRIMARY,product_id,rating_id product_id 4 i.item_id 6   
re ref rating_id rating_id 4 r.rating_id 1 Using where 

This means we have a temporary table of 4818 x 1 x 6 x 1, total 28908 rows. Not
small, right? But those 4 tables are equipped with right indexes and execution
time goes below 1 second.

As soon as DELETE command removes one row from tables rating and rating_entries, which are relating to each other, we have problem with indexes of tables rating & rating_entries. For some (???) reason those indexes cannot be accessed anymore and mysqld builds a sort file on all the rows from these two tables.

Accordingly:

This means we have a temporary table of 4818 x 27000 x 31000 x 1, total
4032666000000 rows. Nice...

This hits hard disk and fills it immediately upto limits. At the same time CPU
(1 x P4 2,4MHz) stays at 30-50%, RAM is still enough, no memory leaks at all.
But simply it cannot work like that. One would need huge 40-60 GB RAM drive to
serve it correctly.

Of course there are ways to overcome it, for example by declaring cron job which constantly optimizes these tables. As I did. But then it becomes the question of luck, which process will be first right after deleting a row from these tables - SELECT query or OPTIMIZE query? 

Therefore my machine surrvives sometimes 8 days, but with huge load (like Google spidering my site) even each 5 minuts server has to be restarted.

Sorry, I have forgotten to mention, that optimization (run from apache' cron
job) goes each 20 minutes. As my mysql serves approx. 5000 queries per minute:

- I can't run cron more frequently, due to accesibility issues & performance of the server itself
- Indexes got fragmented anyhow :-(

How to repeat:
1. Good way to repeat this bug is:
- simply delete one row from table, which you use in your WHERE clause, 
- point your file manager (mc) to mysql temporary catalogue
- then run the query
- observe temporary file size

If your machine is strong enough and table is not locked you can try to obtain
EXPLAIN of your SELECT statement. Would be interesting for you to see how big
Kartesian table is.

In my case temporary files go immediately over 4GB of HDD space. This is not
actually stopping server, it simply extremely slows down execution of mysql
threads.  MyISAM tables get locked and all the other threads (including
OPTIMIZE) are waiting untill fat boys will be pushed through HDD.

2. Another way to repeat this bug is:
- run EXPLAIN command with your WHERE clause on healthy indexes, note down Kartesian size
- simply delete one row from table, which you use in your WHERE clause, 
- run EXPLAIN command with your WHERE clause, compare Kartesian size

Suggested fix:
Structure of the table must be stable & consistent after any data update.
[13 Dec 2003 19:40] Michael Widenius
The following answer is from bug report #1758

There are a few separate, independent problems with indexes when one
does a lot of deletes on a table

- The cardinality of the index changes (not likely but possible) and MySQL uses 
 the wrong index in the query.

  In this case one could try to update the cardinality with ANALYZE
  TABLE (faster than OPTIMIZE TABLE) or use IGNORE INDEX / USE INDEX in
  the query to get MySQL to use the right indexes.

- The indexes gets 'unbalanced' and the range optimizer estimates
  wrongly the wrong number of matching keys for expressions like
  'key_column op constant'.  In this case one should update the index
  with OPTIMIZE TABLE or use IGNORE INDEX / USE INDEX.

- The index files could be fragmented on disk, causing more seeks than
are optimal, but I don't think this is the problem here.

Neither of the above problems are frequent and only happens to some
specific tables and specific usage.  One common case where the index
gets unbalanced is when you have an indexed timestamp columns and you
insert a lot of new rows and delete a lot of old rows with an old
timestamp.  In this case the optimizer will estimate wrongly the
number of rows for queries like

WHERE timestamp_column BETWEEN date and data2

In most cases the estimate is still 'good enough' but we know of cases
when the estimate is so wrong that MySQL will instead choose another
index to solve the query.

Fragmentation of indexes should not happen for just a single delete of
one row.  If this is the case then there is some unknown factor
involved and we would need to get a test case to be able to find this
out.

To know what is happening in this particular case I would need the
explain from tables before the delete is done (when the query was fast) 
and the explain after the delete was done (and the query is slow).

You can the the information what exactly happened during a query by doing:

mysql> FLUSH STATUS;
mysql> SELECT ...;
       <A short version of the output from SELECT,
       including the time taken to run the query>
mysql> SHOW STATUS;
       <output from SHOW STATUS>

(The above assumes you are the only one using the system as the STATUS
variables are global).

The best way to know what is happening is of course if we could get a
copy of the problem tables when the query is slow so that we can test
this yourself. (Feel free to reset any columns that are not part of
the query with NULL or ""). Upload the tables, with a description of
the problem, to ftp://support.mysql.com/pub/mysql/secret.

If you are unsure about the safety of your data, you can always take a support contract from MySQL AB as this will give you legal ground that your data is safe.

NOTE: We resently found out that 4.0.16 has a bug (4.0.15 was ok) that it in
some cases makes table scans when it doesn't have to.  This will be
fixed in 4.0.17 that will be out in a few days.

Regards,
Monty
[14 Dec 2003 9:55] Tom Zlos
>To know what is happening in this particular case I would need the
>explain from tables before the delete is done (when the query was fast) 
>and the explain after the delete was done (and the query is slow).

I have tried my best to obtain this EXPLAIN, but as tables get locked it returns no result...

The only thing which may lead to some conclusions is my slow queries log:

# User@Host: user[user] @ www.foo.pl [192.168.10.11]
# Query_time: 194  Lock_time: 169  Rows_sent: 5  Rows_examined: 20262
SELECT AVG(re.rating_value) as avgscore,r.*,r.date_added as date_review_added,i.*,v.* FROM rating as r, items as i, vendors as v, rating_entries as re  WHERE (...)

and please compare Rows_examined with this one

# User@Host: user[user] @ www.foo.pl [192.168.10.11]
# Query_time: 257  Lock_time: 201  Rows_sent: 5  Rows_examined: 94676
SELECT AVG(re.rating_value) as avgscore,r.*,r.date_added as date_review_added,i.*,v.* FROM rating as r, items as i, vendors as v, rating_entries as re  WHERE (...)

I am still observing mysql host with top command, these are my findings from performance point of view:
- during ussual work free memory is approx. 400 MB, CPU load approx. 10%
- when processes start to hang:
   :: RAM consumption jumps to 100%, 
   :: CPU load approx. 50%, 
   :: at the same time mysqld processes jump from 60-100MB to 400-500MB of
      memory
   :: tmp directory gets filled with approx. 4GB of temporary MYD and MYI files

During hang I have tried to ANALYZE tables, but with no success. Only OPTIMIZE - while pushed through the process list, settles the problem.

Mysql restart during the problem does not help, right after restarting again processes start to be hanged. Again, after restart only OPTIMIZE and manual killing of the hanged processes until OPTIMIZE TABLE gets unlocked are helping to solve the issue.

Queries are using WHERE on logical condition and ORDER BY timestamp.
[14 Dec 2003 10:01] Tom Zlos
TOP command during the hang produces this:
------------------------------------------

19:13:53  up 2 days,  4:43,  1 user,  load average: 38.14, 22.85, 10.72
182 processes: 132 sleeping, 50 running, 0 zombie, 0 stopped
CPU states:  46.7% user  14.5% system   0.0% nice   0.0% iowait  38.6% idle
Mem:  1022808k av, 1013364k used,    9444k free,       0k shrd,  154144k buff
                    627256k actv,  168424k in_d,   40636k in_c
Swap: 1020088k av,  111620k used,  908468k free                  446172k cache

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
17075 mysql     16   0  440M 357M  1736 R     5.9 35.8   0:13   0 mysqld
17225 mysql     15   0  440M 357M  1736 S     3.9 35.8   0:10   0 mysqld
17249 mysql     16   0  440M 357M  1736 R     2.9 35.8   0:04   0 mysqld
17079 mysql     16   0  440M 357M  1736 R     2.7 35.8   0:26   0 mysqld
17237 mysql     16   0  440M 357M  1736 R     2.1 35.8   0:10   0 mysqld
16812 mysql     16   0  440M 357M  1736 R     1.9 35.8   0:45   0 mysqld
16817 mysql     15   0  440M 357M  1736 S     1.9 35.8   0:36   0 mysqld
17232 mysql     16   0  440M 357M  1736 R     1.9 35.8   0:06   0 mysqld
17235 mysql     15   0  440M 357M  1736 S     1.9 35.8   0:06   0 mysqld
16814 mysql     16   0  440M 357M  1736 R     1.7 35.8   0:57   0 mysqld
17233 mysql     15   0  440M 357M  1736 S     1.5 35.8   0:01   0 mysqld
17338 mysql     16   0  440M 357M  1736 R     1.5 35.8   0:00   0 mysqld
17240 mysql     15   0  440M 357M  1736 S     1.3 35.8   0:06   0 mysqld
17321 mysql     15   0  440M 357M  1736 S     1.3 35.8   0:00   0 mysqld
17334 mysql     15   0  440M 357M  1736 S     1.3 35.8   0:00   0 mysqld
16820 mysql     16   0  440M 357M  1736 R     1.1 35.8   0:33   0 mysqld
16822 mysql     16   0  440M 357M  1736 R     1.1 35.8   0:39   0 mysqld

TOP command during the normal work produces this:
--------------------------------------------------

19:53:40  up 2 days,  5:23,  2 users,  load average: 0.14, 0.13, 2.18
180 processes: 179 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:   0.0% user 100.0% system   0.0% nice   0.0% iowait   0.0% idle
Mem:  1022808k av,  541028k used,  481780k free,       0k shrd,  162776k buff
                    320360k actv,  130080k in_d,   36852k in_c
Swap: 1020088k av,   27472k used,  992616k free                  217356k

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
17711 root      16   0  1176 1176   808 R    99.9  0.1   0:00   0 top
    1 root      15   0   104   68    52 S     0.0  0.0   0:03   0 init
    2 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 keventd
    3 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kapmd
    4 root      34  19     0    0     0 SWN   0.0  0.0   0:02   0 ksoftirqd_CP
    9 root      15   0     0    0     0 SW    0.0  0.0   0:02   0 bdflush
    5 root      15   0     0    0     0 SW    0.0  0.0   0:06   0 kswapd
    6 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kscand/DMA
    7 root      16   0     0    0     0 SW    0.0  0.0   2:17   0 kscand/Norma
    8 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kscand/HighM
   10 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kupdated
   11 root      24   0     0    0     0 SW    0.0  0.0   0:00   0 mdrecoveryd
   17 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 scsi_eh_0
   18 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 scsi_eh_1
   21 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kjournald
   78 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 khubd
  291 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kjournald
  292 root      15   0     0    0     0 SW    0.0  0.0   0:22   0 kjournald
  293 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kjournald
  294 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kjournald
  295 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kjournald
  572 root      15   0   280  268   196 S     0.0  0.0   0:00   0 syslogd
  576 root      15   0    56    4     0 S     0.0  0.0   0:00   0 klogd
  585 root      15   0   364  224   140 S     0.0  0.0   0:00   0 sshd
  596 root      24   0   148    4     0 S     0.0  0.0   0:00   0 xinetd
  669 root      15   0   400  188   124 S     0.0  0.0   0:00   0 master
  684 root      15   0   152  132    84 S     0.0  0.0   0:00   0 crond
  696 root      21   0    56    4     0 S     0.0  0.0   0:00   0 mingetty
  697 root      21   0    56    4     0 S     0.0  0.0   0:00   0 mingetty
  698 root      21   0    56    4     0 S     0.0  0.0   0:00   0 mingetty
  699 root      21   0    56    4     0 S     0.0  0.0   0:00   0 mingetty
  700 root      21   0    56    4     0 S     0.0  0.0   0:00   0 mingetty
  701 root      21   0    56    4     0 S     0.0  0.0   0:00   0 mingetty
16196 root      19   0   708  564   536 S     0.0  0.0   0:00   0 su
16197 root      15   0  1300 1292   940 S     0.0  0.1   0:00   0 bash
17378 root      25   0  1124  952   948 S     0.0  0.0   0:00   0 mysqld_safe
17403 mysql     15   0  102M  90M  1780 S     0.0  9.1   0:00   0 mysqld
17404 mysql     15   0  102M  90M  1780 S     0.0  9.1   0:00   0 mysqld
17405 mysql     20   0  102M  90M  1780 S     0.0  9.1   0:00   0 mysqld
17406 mysql     25   0  102M  90M  1780 S     0.0  9.1   0:00   0 mysqld
17407 mysql     25   0  102M  90M  1780 S     0.0  9.1   0:00   0 mysqld
17408 mysql     20   0  102M  90M  1780 S     0.0  9.1   0:00   0 mysqld
17409 mysql     24   0  102M  90M  1780 S     0.0  9.1   0:00   0 mysqld
17410 mysql     15   0  102M  90M  1780 S     0.0  9.1   0:00   0 mysqld
17411 mysql     24   0  102M  90M  1780 S     0.0  9.1   0:00   0 mysqld
[14 Dec 2003 10:45] Tom Zlos
Here some files from my temp directory:

„       Name      „  Size  „    MTime    „ 
„ /..             „ UP--DIR„             „ 
„  #sql_43fb_0.MYD„ 118554K„ Dec 14 19:17„ 
„  #sql_43fb_0.MYI„    1024„ Dec 14 19:15„ 
„  #sql_43fb_1.MYD„ 118554K„ Dec 14 19:17„ 
„  #sql_43fb_1.MYI„    1024„ Dec 14 19:15„ 
„  #sql_43~_10.MYD„ 107529K„ Dec 14 19:17„ 
„  #sql_43~_10.MYI„    1024„ Dec 14 19:16„ 
„  #sql_43~_11.MYD„ 100192K„ Dec 14 19:17„ 
„  #sql_43~_11.MYI„    1024„ Dec 14 19:16„ 
„  #sql_43~_12.MYD„  84877K„ Dec 14 19:17„ 
„  #sql_43~_12.MYI„    1024„ Dec 14 19:16„ 
„  #sql_43~_13.MYD„  95633K„ Dec 14 19:17„ 
„  #sql_43~_13.MYI„    1024„ Dec 14 19:16„ 
„  #sql_43~_14.MYD„  67684K„ Dec 14 19:17„ 
„  #sql_43~_14.MYI„    1024„ Dec 14 19:16„ 
„  #sql_43~_15.MYD„ 108857K„ Dec 14 19:17„ 
„  #sql_43~_15.MYI„    1024„ Dec 14 19:16„ 
„  #sql_43~_16.MYD„  76829K„ Dec 14 19:17„ 
„  #sql_43~_16.MYI„    1024„ Dec 14 19:16„ 
„  #sql_43~_17.MYD„  59156K„ Dec 14 19:17„ 
„  #sql_43~_17.MYI„    1024„ Dec 14 19:16„ 
„  #sql_43~_18.MYD„  62723K„ Dec 14 19:17„ 
„  #sql_43~_18.MYI„    1024„ Dec 14 19:16„ 
„  #sql_43fb_2.MYD„ 118554K„ Dec 14 19:16„ 
„  #sql_43fb_2.MYI„    1024„ Dec 14 19:15„ 
„  #sql_43~_20.MYD„  52171K„ Dec 14 19:17„ 
„  #sql_43~_20.MYI„    1024„ Dec 14 19:16„ 
„  #sql_43~_23.MYD„  11741K„ Dec 14 19:17„ 
„  #sql_43~_23.MYI„    1024„ Dec 14 19:17„ 
„  #sql_43~_25.MYD„  36821K„ Dec 14 19:17„ 
„  #sql_43~_25.MYI„    1024„ Dec 14 19:16„ 
„  #sql_43~_26.MYD„  41636K„ Dec 14 19:17„ 
„  #sql_43~_26.MYI„    1024„ Dec 14 19:16„ 
„  #sql_43~_28.MYD„  41697K„ Dec 14 19:17„ 
„  #sql_43~_28.MYI„    1024„ Dec 14 19:16„ 
„  #sql_43fb_3.MYD„ 118554K„ Dec 14 19:17„ 
„  #sql_43fb_3.MYI„    1024„ Dec 14 19:15„ 
„  #sql_43~_31.MYD„       0„ Dec 14 19:16„ 
„  #sql_43~_31.MYI„    1024„ Dec 14 19:16„ 
„  #sql_43~_32.MYD„  33901K„ Dec 14 19:17„ 
„  #sql_43~_32.MYI„    1024„ Dec 14 19:16„ 
„  #sql_43~_35.MYD„  15362K„ Dec 14 19:17„ 
„  #sql_43~_35.MYI„    1024„ Dec 14 19:17„ 
„  #sql_43~_37.MYD„  11971K„ Dec 14 19:17„ 
„  #sql_43~_37.MYI„    1024„ Dec 14 19:17„ 
„  #sql_43~_39.MYD„ 8368668„ Dec 14 19:17„ 
„  #sql_43~_39.MYI„    1024„ Dec 14 19:17„ 
„  #sql_43fb_4.MYD„ 118554K„ Dec 14 19:17„ 
„  #sql_43fb_4.MYI„    1024„ Dec 14 19:15„ 

Please correct me if I am wrong, but MYI files with 1024 size are just empty indexes, right?
[30 Jan 2004 5:45] Michael Widenius
Don't understand what you mean with that you can't get an EXPLAIN
becasue of table locks.

EXPLAIN doesn't cause any new table locks compared to the SELECT that
you are doing an EXPLAIN of.  If you can do a SELECT you can do an EXPLAIN

I am sorry, but it doesn't help to get a description of the problem
you have as we know that MySQL in general works perfectly in
situations likes yours.  There must be something unique to your setup
and until we know this, by having an exact, preferably repeatable
example, it's very hard to help you solve this problem.

If the situation is as you describe, it should be trivial to do a test
case of this on another computer.  Have you tried this ?

MYI files with 1024 size are index files without index. (They just
contain metadata for the .MYD file).

Note that this public bug systems is only intended for bugs that we
are able to trivially repeat. If you can create a repeatable bug of
this, we will do our best to fix this to the next MySQL release.  We
can't however afford to spend time on finding a bug when we don't have
enough information to repeat it as this could take any amount of time.

Regards,
Monty