Bug #43541 Innodb hangs leaving mysql datbase in corrupt state
Submitted: 10 Mar 19:26 Modified: 20 Aug 19:14
Reporter: bhushan uparkar
Status: Open
Category:Server: InnoDB Severity:S2 (Serious)
Version:5.1.30 OS:Other (Debian 4.1.1-21 - Linux version 2.6.24-etchnhalf.1-amd64 (Debian 2.6.24-6~etchnhalf.7) (dannf@debian)
Assigned to: Target Version:
Tags: Innodb hangs, corrupt tables

[10 Mar 19:26] bhushan uparkar
Description:
We have dataware house environment where we use mysql 5.1 with partitioning and innodb
engines for tables. The database size is around 1000GB, and we use innodb file per table
option for all tables. 
 During normal operations we have 8-10 threads writing to the db server and 6-8 threads
running queries on db. Also we have daily report job where we manipulate the data from
multiple tables and then populate the data into table which is further used for
reporting. When the daily report job is executing the db load on server goes from 10 [
during normal hours ] to 15. The system is working for last 6 months without any problems
till yesterday [3.4.2009] where it crashed the mysql server and corrupted entire database.
 We have to use the slave db server to restore database back , but the whole process was
time consuming and cumbersome.

How to repeat:
Have 8-10 db writes going simultaenously and peform complex query spanning 2-3 large
tables [100GB or more] .  This will most of the time produce the same scenario as
described above.

Suggested fix:
 At no points innodb should corrupt table, and entire databases because of high load on
db server. Since 5.1 uses partition , it should restrict it to the only related
partitions and error log should point out information about corrupt partitions. Also we
use file per table option, which should be used to pin point corrupt pages/partitions. 

These kind of bugs makes mysql/innodb less appealing to dataware house environment.
[10 Mar 19:33] bhushan uparkar
error log for dmdb904

Attachment: dmdb904.err.gz (application/x-gzip, text), 27.92 KiB.

[10 Mar 19:43] bhushan uparkar
I have attached my.cnf as well as error log from the db server where crash happened. As
can be seen from the error log, innodb  thread #  1156966752 has waited at
dict/dict0dict.c line 763 for 255.00 seconds the semaphore and the details for thread #
1156966752 
=======================
---TRANSACTION 0 640503234, not started, process no 23974, OS thread id 1156966752
mysql tables in use 101, locked 0
MySQL thread id 825004, query id 635801759 dm101.sc9.admob.int 10.11.16.124 queryuser
select universal_date_key, admob_dm_rep.fact_account_transaction.owner_key, owner_id,
owner_email, payment_type, transaction_type, transaction_subtype, amount  from
admob_dm_rep.fact_account_transaction inner join admob_dm.dim_account  on
admob_dm.dim_account.owner_key = admob_dm_rep.fact_account_transaction.owner_key  inner
join admob_dm.dim_transaction_type  on
admob_dm.dim_transaction_type.transaction_type_key=
admob_dm_rep.fact_account_transaction.transaction_type_key  where
admob_dm_rep.fact_account_transaction.universal_date_key >= 20081126  and
admob_dm_rep.fact_account_transaction.unive
=======================

 I checked the hardware for server, and we do not found anything wrong with it. Also we
do not use blob data type as well as few  TEXT datatype columns.
==================
mysql> select TABLE_CATALOG    , TABLE_SCHEMA   ,  TABLE_NAME , COLUMN_NAME , DATA_TYPE  
from columns where DATA_TYPE='TEXT' or DATA_TYPE='BLOB' and TABLE_SCHEMA != 'mysql' ;
+---------------+--------------+-----------------------+------------------+-----------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME            | COLUMN_NAME      | DATA_TYPE |
+---------------+--------------+-----------------------+------------------+-----------+
| NULL          | admob_dm     | dim_ad                | ad_text          | text      | 
| NULL          | admob_dm     | dim_ad                | ad_url           | text      | 
| NULL          | admob_dm     | dim_ad                | mob_page_text    | text      | 
| NULL          | admob_dm     | dim_ad                | ad_content       | text      | 
| NULL          | admob_dm     | dim_ad_20081111       | ad_text          | text      | 
| NULL          | admob_dm     | dim_ad_20081111       | ad_url           | text      | 
| NULL          | admob_dm     | dim_ad_20081111       | mob_page_text    | text      | 
| NULL          | admob_dm     | dim_ad_20081111       | ad_content       | text      | 
| NULL          | admob_dm     | dim_ad_bak_20090114   | ad_text          | text      | 
| NULL          | admob_dm     | dim_ad_bak_20090114   | ad_url           | text      | 
| NULL          | admob_dm     | dim_ad_bak_20090114   | mob_page_text    | text      | 
| NULL          | admob_dm     | dim_ad_bak_20090114   | ad_content       | text      | 
| NULL          | admob_dm     | dim_page              | page_content     | text      | 
| NULL          | admob_dm     | dim_site              | site_description | text      | 
| NULL          | admob_dm     | dim_site              | site_ad_filter   | text      | 
| NULL          | admob_dm     | dim_site              | site_content     | text      | 
| NULL          | admob_dm     | dim_site_bak_20090114 | site_description | text      | 
| NULL          | admob_dm     | dim_site_bak_20090114 | site_ad_filter   | text      | 
| NULL          | admob_dm     | dim_site_bak_20090114 | site_content     | text      | 
| NULL          | mysql        | help_topic            | description      | text      | 
| NULL          | mysql        | help_topic            | example          | text      | 
+---------------+--------------+-----------------------+------------------+-----------+
21 rows in set (0.12 sec)
==============
 Let me know what additional things you need from my end.