Bug #40901 Corrupt partition causes mysql server to crash
Submitted: 20 Nov 2008 19:34 Modified: 9 Jan 2009 8:01
Reporter: bhushan uparkar Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:5.1.28 OS:Any
Assigned to: CPU Architecture:Any
Tags: corrupt partition

[20 Nov 2008 19:34] bhushan uparkar
Description:
We have a range based partitioned table using innodb engine, few of the partitions got courrupted [ dont know why ]. When ever a query refers to these partitions it causes mysql server to crash.

Here is back trace from the server error log file,
================================================

ttempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
/usr/sbin/mysqld(print_stacktrace+0x1e) [0x736dfe]
/usr/sbin/mysqld(handle_segfault+0x346) [0x6106e6]
/lib/libpthread.so.0 [0x2b9027c76410]
/lib/libc.so.6(gsignal+0x3b) [0x2b90287ab07b]
/lib/libc.so.6(abort+0x10e) [0x2b90287ac84e]
/usr/sbin/mysqld(mem_area_alloc+0x281) [0x80d491]
/usr/sbin/mysqld(mem_heap_create_block+0x157) [0x80be77]
/usr/sbin/mysqld(mem_heap_add_block+0x56) [0x80bee6]
/usr/sbin/mysqld(trx_undo_get_undo_rec_low+0x19e) [0x84d52e]
/usr/sbin/mysqld(trx_undo_get_undo_rec+0x4a) [0x84d5da]
/usr/sbin/mysqld(trx_undo_prev_version_build+0x15c) [0x84efac]
/usr/sbin/mysqld(row_vers_build_for_consistent_read+0x1c0) [0x840960]
/usr/sbin/mysqld(row_search_for_mysql+0x29a9) [0x83a2e9]
/usr/sbin/mysqld(ha_innobase::index_read(unsigned char*, unsigned char const*, unsigned int, ha_rkey_function)+0x1d4) [0x7a8b94
]
/usr/sbin/mysqld(ha_partition::handle_ordered_index_scan(unsigned char*, bool)+0x1dc) [0x6f294c]
/usr/sbin/mysqld(handler::read_multi_range_first(st_key_multi_range**, st_key_multi_range*, unsigned int, bool, st_handler_buff
er*)+0xd6) [0x6eb286]
/usr/sbin/mysqld(QUICK_RANGE_SELECT::get_next()+0x13d) [0x6d43bd]
/usr/sbin/mysqld [0x6e5b2d]
/usr/sbin/mysqld(sub_select(JOIN*, st_join_table*, bool)+0xa9) [0x678319]
/usr/sbin/mysqld [0x679973]
/usr/sbin/mysqld(JOIN::exec()+0x8f9) [0x687de9]
/usr/sbin/mysqld(mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*,
 Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*)+0x17b) [0x6898cb]
/usr/sbin/mysqld(handle_select(THD*, st_lex*, select_result*, unsigned long)+0x169) [0x68a1b9]
/usr/sbin/mysqld [0x61b65e]
/usr/sbin/mysqld(mysql_execute_command(THD*)+0x2c83) [0x6202c3]
/usr/sbin/mysqld(mysql_parse(THD*, char const*, unsigned int, char const**)+0x1c2) [0x623202]
/usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x3dc) [0x6235fc]
/usr/sbin/mysqld(do_command(THD*)+0xc7) [0x624547]
/usr/sbin/mysqld(handle_one_connection+0x746) [0x618246]
/lib/libpthread.so.0 [0x2b9027c6ff1a]
/lib/libc.so.6(__clone+0x72) [0x2b90288456c2]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x35b61db0 = SELECT f.universal_date_key, COUNT(distinct d1.advertiser_id)  FROM admob_dm_rep.fact_cpc_day_ad_hse
t f JOIN admob_dm.dim_ad d1 USING (ad_key) WHERE 20080401 <= f.universal_date_key AND f.universal_date_key < 20080701   AND f.h
andset_key IN     (SELECT d2.handset_key FROM admob_dm.dim_handset d2 where d2.model LIKE 'iPhone%') GROUP BY f.universal_date_
key
thd->thread_id=350050
thd->killed=NOT_KILLED
=================================================

How to repeat:
Here is the query which could cause this crash,
SELECT f.universal_date_key, COUNT(distinct d1.advertiser_id)  FROM admob_dm_rep.fact_cpc_day_ad_hset f JOIN admob_dm.dim_ad d1 
USING (ad_key) 
WHERE 20080401 <= f.universal_date_key AND 
f.universal_date_key < 20080701   AND 
f.handset_key IN     (SELECT d2.handset_key FROM admob_dm.dim_handset d2 
    where d2.model LIKE 'iPhone%') 
GROUP BY f.universal_date_key

Suggested fix:
Ideally if partitions gets corrupted the query should fail giving the proper error message would be ideal. Information schema should provide information on corrupt partitions so far identified/known or a command which will list corrupt partitions from a table/database would be ideal.

 Its nice if we could provide a way to get the partition data out of the corrupts one so we could recreate the partitions
[20 Nov 2008 19:58] bhushan uparkar
Some information in our environment,

OS : Debian Linux  2.6.18-6-amd64 x86_64 GNU/Linux
Hardware : Dell 2950 with MD3000 as attached storage
MySql Version : 5.1.28
Replication: Row based
[21 Nov 2008 15:53] MySQL Verification Team
Thank you for the bug report. Could you please provide a complete test case with all tables involved and insert data to run the offended query?. Thanks in advance.
[25 Nov 2008 23:24] bhushan uparkar
Additional table schema

Attachment: part_corrupt_additional_tab.sql.rtf (application/rtf, text), 6.87 KiB.

[25 Nov 2008 23:27] bhushan uparkar
I have updated the ticket with couple of table defination. Unfortunately I can not provide data as the server dies on the call to this partition. 

To me it appears easy to simulate test [ I may be wrong ] , since our objective is create corrupt partition and then query the corrupt partition.
[9 Dec 2008 8:01] Valeriy Kravchuk
Please, try to repeat with a newer version, 5.1.30, and inform about the results.
[10 Jan 2009 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".