Bug #61735 index->lock held while preads are done
Submitted: 4 Jul 2011 11:30 Modified: 8 Jul 2011 6:01
Reporter: Domas Mituzas Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.1, etc OS:Any
Assigned to: CPU Architecture:Any

[4 Jul 2011 11:30] Domas Mituzas
Description:
there are multiple "pessimistic" update/insert/delete code paths where pread() is done while holding index lock, killing performance of busy systems, e.g.:

#0  0x0000003ee120d7a8 in pread64 () from /lib64/libpthread.so.0
#1  0x0000000000752d2c in os_file_pread (file=46, buf=0x2aaab3b70000, n=16384, offset=1465221120, offset_high=<value optimized out>) at os/os0file.c:2194
#2  0x0000000000752e1c in os_file_read (file=46, buf=0x2aaab3b70000, offset=1465221120, offset_high=0, n=16384) at os/os0file.c:2448
#3  0x000000000075710b in os_aio (type=10, mode=<value optimized out>, name=<value optimized out>, file=46, buf=0x2aaab3b70000, offset=1465221120, offset_high=0, n=16384, message1=0x3160468, message2=0x2
aaaab83f200, io_perf2=0x31601d0, table_io_perf=0x32abe80) at os/os0file.c:3748
#4  0x00000000008011d8 in _fil_io (type=10, sync=<value optimized out>, space_id=8865, zip_size=<value optimized out>, block_offset=<value optimized out>, byte_offset=<value optimized out>, len=16384, bu
f=0x2aaab3b70000, message=0x2aaaab83f200, table_io_perf=0x32abe80) at fil/fil0fil.c:4771
#5  0x00000000007e0408 in buf_read_page_low (err=0x41ed8a28, sync=1, mode=1465221120, space=8865, zip_size=<value optimized out>, unzip=<value optimized out>, tablespace_version=6054, offset=89430, trx=0
x32abab8) at buf/buf0rea.c:148
#6  0x00000000007e0634 in buf_read_page (space=8865, zip_size=0, offset=89430, trx=0x32abab8) at buf/buf0rea.c:188
#7  0x00000000007d6e81 in buf_page_get_gen (space=8865, zip_size=<value optimized out>, offset=89430, rw_latch=<value optimized out>, guess=0x0, mode=<value optimized out>, file=0x97415a "./include/btr0b
tr.ic", line=53, mtr=0x41ed94d0) at buf/buf0buf.c:2274
#8  0x00000000007c2101 in btr_cur_latch_leaves (page=0x2aaab5fc8000 "?\026eT", space=8865, zip_size=0, page_no=188058, latch_mode=<value optimized out>, cursor=<value optimized out>, mtr=0x41ed94d0) at .
/include/btr0btr.ic:53
#9  0x00000000007c4351 in btr_cur_search_to_nth_level (index=0x2aaac002c858, level=<value optimized out>, tuple=<value optimized out>, mode=<value optimized out>, latch_mode=<value optimized out>, cursor
=0x41ed9d70, has_search_latch=0, file=0x96a0f6 "row/row0ins.c", line=2014, mtr=0x41ed94d0) at btr/btr0cur.c:585
#10 0x000000000077420a in row_ins_index_entry_low (mode=33, index=0x2aaac002c858, entry=0x2aaac8005ef8, n_ext=<value optimized out>, thr=0x2aaac8005588) at row/row0ins.c:2012
#11 0x00000000007758bf in row_ins_step (thr=<value optimized out>) at row/row0ins.c:2195
#12 0x00000000007805b6 in row_insert_for_mysql (mysql_rec=<value optimized out>, prebuilt=0x2aaac800eb88) at row/row0mysql.c:1142
#13 0x000000000073049f in ha_innobase::write_row (this=0x2aaac800e730, record=0x2aaac80104f0 "? 89ea151ef602f18db3060544ff5e7613") at handler/ha_innodb.cc:5180
#14 0x000000000068c098 in handler::ha_write_row (this=0x2aaac800e730, buf=0x2aaac80104f0 "? 89ea151ef602f18db3060544ff5e7613") at handler.cc:4786
#15 0x0000000000615794 in write_record (thd=0x326f090, table=0x2aaac800de90, info=0x41eda1b0) at sql_insert.cc:1386

also, read-write lock can block both reads and writes, if a write comes in asking for an exclusive lock, but a read thread is holding it in share mode while doing pread(), e.g.:

Thread 432
#0  0x00007fe6d8ba1243 in pread64
#1  0x000000000074a22f in pread
#2  os_file_pread
#3  0x000000000074a3a4 in os_file_read
#4  0x000000000074c902 in os_aio
#5  0x00000000007e8bb4 in _fil_io
#6  0x00000000007cd028 in buf_read_page_low
#7  0x00000000007cd374 in buf_read_page
#8  0x00000000007c40c1 in buf_page_get_gen
#9  0x00000000007ae4db in btr_copy_blob_prefix
#10 0x00000000007b20c9 in btr_copy_externally_stored_field_prefix_low
#11 0x00000000007b27b0 in btr_copy_externally_stored_field
#12 btr_rec_copy_externally_stored_field
#13 0x0000000000777bfe in row_sel_store_mysql_rec
#14 0x000000000077b649 in row_search_for_mysql
#15 0x0000000000720709 in ha_innobase::index_read
#16 0x000000000068bd98 in handler::index_read_idx_map
#17 0x00000000005fe8b2 in join_read_const
#18 0x00000000006094ad in join_read_const_table
#19 0x000000000060c92c in make_join_statistics
#20 0x000000000060e017 in JOIN::optimize
#21 0x0000000000611574 in mysql_select

How to repeat:
run insert/update benchmark with large rows on a system where working dataset falls out of memory, example:

#!/usr/bin/python

item_size = 7000
item_space = 16*1024*1024*1024

import MySQLdb
import time
import random
import hashlib
import string

def gen_data():
	return "".join([ random.choice(string.letters) for x in range(item_size) ])

def gen_key():
	return hashlib.md5(str(random.randint(0, item_space / item_size))).hexdigest()

random.seed(time.time())

data = gen_data()

conn = MySQLdb.Connect(unix_socket="/tmp/mysql.sock", db="test")
conn.autocommit(True)

while True:
  c = conn.cursor()
  c.execute("INSERT INTO blobtest0 VALUES (%s, %s) ON DUPLICATE KEY UPDATE data=VALUES(data)", (gen_key(), data))
  c = conn.cursor()
  c.execute("SELECT * FROM blobtest0 WHERE k=%s", (gen_key(),))
  c.fetchall()

Suggested fix:
don't lock whole b-tree when a block read is done
[5 Jul 2011 6:41] MySQL Verification Team
http://dom.as/2011/07/03/innodb-index-lock/