Bug #17126 CHECK TABLE on InnoDB causes a short hang during check of adaptive hash
Submitted: 4 Feb 2006 23:28 Modified: 23 Jun 2008 21:31
Reporter: Gabriel Barazer
Status: Closed
Category:Server: InnoDB Severity:S1 (Critical)
Version:5.0.18 / 4.1.13a OS:Linux (Linux x86_64)
Assigned to: Bugs System Target Version:

[4 Feb 2006 23:28] Gabriel Barazer
Description:
This has been tested on 2 production servers, one is version 4.1.13a compiled from
sources, the other is 5.0.18 Standard x86_64.

Config: the 2 servers are Dual-Opteron 246 with 4 and 8GB RAM, disks are RAID 10 arrays
on 10KRPM hard drives. OS is Fedora Core 3 x86_64, kernel 2.6.12.

MySQL config :
innodb_additional_mem_pool_size = 32M
innodb_buffer_pool_size = 2G (6G on the 2nd server)
innodb_data_file_path =
ibdata1:2G;ibdata2:2G;ibdata3:2G;ibdata4:2G;ibdata5:2G;ibdata6:2G
innodb_file_io_threads = 4
innodb_thread_concurrency = 16
innodb_flush_log_at_trx_commit = 2
innodb_fast_shutdown
innodb_log_buffer_size = 16M
innodb_log_file_size = 500M
innodb_log_files_in_group = 4
innodb_max_dirty_pages_pct = 90
innodb_lock_wait_timeout = 30

Context: the servers have an average of 800req/sec each, but no latency, and no overload
: less than 1.0 on the system

When using the CHECK TABLE command on InnoDB tables, all the next queries on InnoDB
tables are locked. When there is too many queries (and if the CHECK query is not manually
killed), the server does not crash (no error message, no core dump, nothing), but eats all
the system memory and connecting to the server is slower until trying to connect gives a
timeout.

This happens even when checking a small table like this one with only 1 row (from a
customer) :
CREATE TABLE `system2_thema` (
  `id_thema` tinyint(4) NOT NULL default '0',
  `url_thema` varchar(255) character set ascii NOT NULL default '',
  `nom` varchar(255) NOT NULL default '',
  `resum` varchar(255) NOT NULL default '',
  `id_admin` int(11) NOT NULL default '0',
  `image` varchar(255) NOT NULL default '',
  `id_forum` tinyint(4) NOT NULL default '0',
  `espo` varchar(255) NOT NULL default '',
  `etat` tinyint(4) NOT NULL default '0',
  PRIMARY KEY  (`id_thema`),
  KEY `url_thema` (`url_thema`,`etat`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

How to repeat:
Run MySQL Server 4.1.13a or 5.0.18 (this happens on both) on x86_64 platform.

I've been able to test the case only on production server, i think it happens only with
load on the server

do a CHECK TABLE on any InnoDB table. during the CHECK, do many queries on other InnoDB
tables (not the one that is checked).

The processlist growth, and next the server timeouts when trying to connect.

I have the complete scenario timeline and INNODB STATUS at time - few seconds, time = 0
(check table command), time +  4 seconds (many queries in queue on the innodb status).

Suggested fix:
The only "workaround" is a the moment to kill the CHECK query and kill all the queued
processes until the processlist becomes empty.
[5 Feb 2006 10:31] Heikki Tuuri
Gabriel,

your server uses InnoDB's adaptive hash indexes extensively. You have 3.7 million cells
used, and the hash heap is about 200 MB. CHECK TABLE validates the whole adaptive hash
index, regardless of the table that you are checking.

That explains why queries stop to wait until the check has completed.

The question is why your server seems to hang altogether? Maybe the pile-up of queries
eats all the memory?

If you reduce the the innodb_buffer_pool_size, does the CHECK then run to completion?

The simple workaround is not to run CHECK TABLE on a heavily loaded server.

Regards,

Heikki
[5 Feb 2006 16:14] Gabriel Barazer
Heikki,

Because i don't know what is the adaptive hash indexes, i haven't taken care of these
numbers in the past. But now i've seen them : what are theses 3.7 millions of indexes ?
If all InnoDB queries need them and CHECK TABLE lock all of them to complete, I now
understand why all the queries stop. Is there not anything to do to make CHECK TABLE
validating only the parts related to the checked tables ?

I reduced the innodb_buffer_pool_size (from 2G to 1G on the first server, and from 6G to
4G on the 2nd) and tested the CHECK command on small tables with less than 10 rows and
after ~10 seconds of "freeze" (all innodb related queries stopped and waiting) the check
completed successfully. I think if the CHECK was a bit longer in time, the server would
have timeout as before. Additionally, with the innodb buffer pool reduced, the server
can't handle as many queries as before without using temp space, and there is now plenty
of unused memory...

I don't use the CHECK command on InnoDB tables, (when i want to clean up things, i do a
simple ALTER TABLE ... TYPE='INNODB' to rebuild indexes), but i discovered this issue
from a customer, who has checked a table and froze the server...(the default table type
is innodb, because this engine is far better than MyISAM). As i can't disable the CHECK
command (i remember that the SELECT privilege is enough to do CHECK TABLE), i can't
explain to my current and future customers to not use the CHECK command (which is
sometimes used automatically in web applications installed by the customers, and they
don't even know how all this works...).

The workarounds I think about are : 
- Having a special privileges to do maintenance tasks (check, repair even if 'repair' is
not used by the innodb engine), so it is possible to deny doing CHECK TABLE

- Reduce the adaptive hash index (how ? I don't know what is it, and i don't know how it
works). I can do all the tests if you need.

- Make any action done on 1 table totally independant from the others. If 1 table is
blocking the queries on itself , it is a normal behavior, but blocking the whole innodb
engine allow any customer to do a denial of service on the server...

Regards,
Gabriel
[6 Feb 2006 10:53] Heikki Tuuri
Gabriel,

ok, looks like it was not a genuine deadlock. The pile-up of connections just exhausted
the server's resources.

Hmm.. this is a tradeoff. If we remove the check of adaptive hash indexes from CHECK
TABLE, we will not get bug reports if something is wrong with them. A bug was discovered
just 1 month ago through this mechanism.

We could add a my.cnf parameter to switch off the adaptive hash index check?

I wonder if you should really configure less memory for innodb_buffer_pool_size? Also
many other factors can cause the server to stall for a few seconds. If that causes an
infinite hang, that is not good!

Regards,

Heikki
[6 Feb 2006 14:56] Gabriel Barazer
Heikki,

Was the bug discovered related to the CHECK TABLE command ?

Just a thought : I read the manual page for the adaptive hash indexes :
http://dev.mysql.com/doc/refman/5.0/en/innodb-adaptive-hash.html

It seems to be a kind a cache system. Maybe is it possible to invalidate or destroy the
data related to a table when it is checked, so no more validation to do :
- some queries are issued to an innodb table
- the adaptive hash index for the table is built
- a CHECK TABLE is issued
- CHECK TABLE checks the integrity / consistency of the table
- the hash index related to the table is invalidated / destroyed
- futures queries to the table will make the hash index to be rebuilt.

As i don't know the Innodb internals, it is just my point of view. Maybe i'm saying crazy
things :)

But at the moment an option in my.cnf is right for me !

I tested different size for innodb_buffer_pool_size, and clearly the server goes faster
(in fact, it can handle more threads at the same time) when the value is higher. The
example config file advices to set innodb_buffer_pool_size to 80% of the physical RAM,
which gives 6.5GB for a 8GB server and I set 6GB. Same calc method for the 4GB server
(2GB set). And even with theses values, the server never swaps (or maybe 150KB max, but
not related to MySQL). When I lower this value, the memory not allocated is not used at
all, except when doing CHECK TABLE.

I used MySQL for many years, and since i discovered InnoDB, i never had a case when the
server stalled with a query, so i think this problem is not related to other factors.

Is it complicated to make a patch to add the my.cnf option ? Can the fact of not checking
this hash index cause integrity problem, or worse cause database corruption not repairable
with a server restart ?

Regards,
Gabriel
[6 Feb 2006 18:19] Heikki Tuuri
Hi!

I found a better solution. We can split ha_validate() and btr_search_validate() to small
phases, and re-acquire the X-latch on the btr_search_latch before each phase. The
validation operation simply loops through the hash array. We can release the
btr_search_latch after each, say, 10,000 loops.

I am assigning this to Osku. This should be fixed in 5.0 and 5.1. We will not fix this in
4.1.

Regards,

Heikki
[7 Feb 2006 10:52] Gabriel Barazer
Hi guys!

I don't understand how all of this works, but the solution you found will limit the CHECK
command to a maximum fixed time ? If i'm right, this can be a good compromise.

I will soon migrate my last 4.1 server to 5.0.

I suppose this will be commited in the next release ?

Thanks,
Gabriel
[5 Apr 2006 21:14] Elliot Murphy
Fixed in InnoDB snapshot368; fixes are in 5.0.20.
[11 Apr 2006 4:33] Paul DuBois
Noted in 5.0.20 changelog.

<literal>InnoDB</literal> tables with an adaptive hash blocked
other queries during <literal>CHECK TABLE</literal> statements
while the entire hash was checked. This could be a long time
for a large hash. (Bug #17126)
[23 Jun 2008 21:08] Brett Garland
I am also having this same issue on a heavily loaded mysql server.  I'm running version
5.0.51a of mysql.  Do I need to open a new bug, or just post my issue here?
[23 Jun 2008 21:31] Gabriel Barazer
I don't know, it depends on the real cause of your problem. Does your server have a big
adaptive hash index? Check your innodb status and look for "Hash table size..., used
cells ...".