Bug #43541 Innodb hangs leaving mysql datbase in corrupt state
Submitted: 10 Mar 2009 18:26 Modified: 23 Aug 2010 17:53
Reporter: bhushan uparkar Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:5.1.30 OS:Other (Debian 4.1.1-21 - Linux version 2.6.24-etchnhalf.1-amd64 (Debian 2.6.24-6~etchnhalf.7) (dannf@debian)
Assigned to: CPU Architecture:Any
Tags: corrupt tables, Innodb hangs

[10 Mar 2009 18:26] bhushan uparkar
Description:
We have dataware house environment where we use mysql 5.1 with partitioning and innodb engines for tables. The database size is around 1000GB, and we use innodb file per table option for all tables. 
 During normal operations we have 8-10 threads writing to the db server and 6-8 threads running queries on db. Also we have daily report job where we manipulate the data from multiple tables and then populate the data into table which is further used for reporting. When the daily report job is executing the db load on server goes from 10 [ during normal hours ] to 15. The system is working for last 6 months without any problems till yesterday [3.4.2009] where it crashed the mysql server and corrupted entire database.  We have to use the slave db server to restore database back , but the whole process was time consuming and cumbersome.

How to repeat:
Have 8-10 db writes going simultaenously and peform complex query spanning 2-3 large tables [100GB or more] .  This will most of the time produce the same scenario as described above.

Suggested fix:
 At no points innodb should corrupt table, and entire databases because of high load on db server. Since 5.1 uses partition , it should restrict it to the only related partitions and error log should point out information about corrupt partitions. Also we use file per table option, which should be used to pin point corrupt pages/partitions. 

These kind of bugs makes mysql/innodb less appealing to dataware house environment.
[10 Mar 2009 18:33] bhushan uparkar
error log for dmdb904

Attachment: dmdb904.err.gz (application/x-gzip, text), 27.92 KiB.

[10 Mar 2009 18:43] bhushan uparkar
I have attached my.cnf as well as error log from the db server where crash happened. As can be seen from the error log, innodb  thread #  1156966752 has waited at dict/dict0dict.c line 763 for 255.00 seconds the semaphore and the details for thread # 1156966752 
=======================
---TRANSACTION 0 640503234, not started, process no 23974, OS thread id 1156966752
mysql tables in use 101, locked 0
MySQL thread id 825004, query id 635801759 dm101.sc9.admob.int 10.11.16.124 queryuser
select universal_date_key, admob_dm_rep.fact_account_transaction.owner_key, owner_id, owner_email, payment_type, transaction_type, transaction_subtype, amount  from admob_dm_rep.fact_account_transaction inner join admob_dm.dim_account  on admob_dm.dim_account.owner_key = admob_dm_rep.fact_account_transaction.owner_key  inner join admob_dm.dim_transaction_type  on admob_dm.dim_transaction_type.transaction_type_key=
admob_dm_rep.fact_account_transaction.transaction_type_key  where admob_dm_rep.fact_account_transaction.universal_date_key >= 20081126  and admob_dm_rep.fact_account_transaction.unive
=======================

 I checked the hardware for server, and we do not found anything wrong with it. Also we do not use blob data type as well as few  TEXT datatype columns.
==================
mysql> select TABLE_CATALOG    , TABLE_SCHEMA   ,  TABLE_NAME , COLUMN_NAME , DATA_TYPE   from columns where DATA_TYPE='TEXT' or DATA_TYPE='BLOB' and TABLE_SCHEMA != 'mysql' ;
+---------------+--------------+-----------------------+------------------+-----------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME            | COLUMN_NAME      | DATA_TYPE |
+---------------+--------------+-----------------------+------------------+-----------+
| NULL          | admob_dm     | dim_ad                | ad_text          | text      | 
| NULL          | admob_dm     | dim_ad                | ad_url           | text      | 
| NULL          | admob_dm     | dim_ad                | mob_page_text    | text      | 
| NULL          | admob_dm     | dim_ad                | ad_content       | text      | 
| NULL          | admob_dm     | dim_ad_20081111       | ad_text          | text      | 
| NULL          | admob_dm     | dim_ad_20081111       | ad_url           | text      | 
| NULL          | admob_dm     | dim_ad_20081111       | mob_page_text    | text      | 
| NULL          | admob_dm     | dim_ad_20081111       | ad_content       | text      | 
| NULL          | admob_dm     | dim_ad_bak_20090114   | ad_text          | text      | 
| NULL          | admob_dm     | dim_ad_bak_20090114   | ad_url           | text      | 
| NULL          | admob_dm     | dim_ad_bak_20090114   | mob_page_text    | text      | 
| NULL          | admob_dm     | dim_ad_bak_20090114   | ad_content       | text      | 
| NULL          | admob_dm     | dim_page              | page_content     | text      | 
| NULL          | admob_dm     | dim_site              | site_description | text      | 
| NULL          | admob_dm     | dim_site              | site_ad_filter   | text      | 
| NULL          | admob_dm     | dim_site              | site_content     | text      | 
| NULL          | admob_dm     | dim_site_bak_20090114 | site_description | text      | 
| NULL          | admob_dm     | dim_site_bak_20090114 | site_ad_filter   | text      | 
| NULL          | admob_dm     | dim_site_bak_20090114 | site_content     | text      | 
| NULL          | mysql        | help_topic            | description      | text      | 
| NULL          | mysql        | help_topic            | example          | text      | 
+---------------+--------------+-----------------------+------------------+-----------+
21 rows in set (0.12 sec)
==============
 Let me know what additional things you need from my end.
[23 Jul 2010 17:53] Valeriy Kravchuk
Please, check if this ever happens with a newer version, 5.1.49.
[23 Aug 2010 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".