MySQL Bugs: #40791: Inserting a large number of rows in Maria causes a hang

Bug #40791	Inserting a large number of rows in Maria causes a hang
Submitted:	17 Nov 2008 16:45	Modified:	18 Dec 2008 8:56
Reporter:	Vemund Østgaard	Email Updates:
Status:	Can't repeat	Impact on me:	None
Category:	MySQL Server: Maria storage engine	Severity:	S3 (Non-critical)
Version:	6.0.8	OS:	Linux (Linux siv35 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:56:28 EST 2006 x86_64 x86_64 x86_64 GNU/Linux)
Assigned to:		CPU Architecture:	Any

Description:
Problem is observed when running suite/large_tests after changing engine in the .test file to maria.

The test inserts larger and larger chunks of data into the same table by repeatedly running "insert into t1 select * from t1". After about 15 minutes of this, the test has reached an insert of about 67 million records (and the current size of the table is also 67 million records). At around this point in the test, activity on the server seemed to slow down. I connected to mysqld with the mysql client and ran "select count(*) from t1;", which did not return an answer and has now been hanging for 2 hours. The test is also hanging at the same insert statement.
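
For readers who want to see how quickly the repeated "insert into t1 select * from t1" grows the table, here is a rough sketch of the arithmetic. It assumes the table starts with a single row, which the report does not state; the function name and the starting size are illustrative only and are not taken from the test file.

```python
def doublings_until(target_rows, start_rows=1):
    """Count how many self-inserts are needed before the table
    holds at least target_rows rows.

    start_rows=1 is an assumption; the report does not give the
    initial size of t1.
    """
    rows = start_rows
    steps = 0
    while rows < target_rows:
        # "insert into t1 select * from t1" adds a copy of every
        # existing row, so the row count doubles on each statement.
        rows *= 2
        steps += 1
    return steps, rows


if __name__ == "__main__":
    steps, rows = doublings_until(67_000_000)
    print(steps, rows)  # 26 67108864
```

Under that assumption, the table reaches roughly 67 million rows after 26 statements, and the statement that hangs would be the one attempting to double it again to about 134 million rows.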

The problem has been observed repeatedly and seems 100% reproducible when running with the regular mysqld. When using mysqld-debug I was not able to reproduce the problem (the test completed the 67 million record insert and proceeded to the next).

The stacktrace of all the threads will be attached after the bug has been created. 

How to repeat:
Run suite/large_tests after changing engine in the .test file to maria. Might not reproduce with a mysqld compiled with debug.

Attachment: threaddump (application/octet-stream, text), 13.71 KiB.

Thank you for this bug report. Unfortunately, the threaddump is invalid, as it states that
#8  0x0000000000a1e968 in pagecache_unlock_by_link (pagecache=0x1aaf930, block=0x1aaf320, lock=PAGECACHE_LOCK_READ_UNLOCK, pin=28009498,
    first_REDO_LSN_for_page=3003604992, lsn=35, was_changed=1 '\001', any=0 '\0') at ma_pagecache.c:3013
#9  0x0000000000086745 in ?? ()
#10 0x0000000001aaf320 in ?? ()
#11 0x0000000001aaf930 in ?? ()
#12 0x00000000b3076000 in ?? ()
#13 0x0000000000a36dd8 in _ma_set_share_data_file_length (share=0x1abe5a0, new_length=0) at ma_state.c:550

which is impossible (_ma_set_share_data_file_length() is a very short function which does not call pagecache_unlock_by_link()).

Such a bad thread dump makes me wonder about memory corruption. Could you please re-run the test with mysqld under Valgrind?

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Actually Vemund provided feedback, and the problem seems to be gone now (could be explained by some recent fixes by Monty and Serg for bugs corrupting memory).