Description:
There are multiple "pessimistic" update/insert/delete code paths where pread() is performed while the index lock is held, killing the performance of busy systems, e.g.:
#0 0x0000003ee120d7a8 in pread64 () from /lib64/libpthread.so.0
#1 0x0000000000752d2c in os_file_pread (file=46, buf=0x2aaab3b70000, n=16384, offset=1465221120, offset_high=<value optimized out>) at os/os0file.c:2194
#2 0x0000000000752e1c in os_file_read (file=46, buf=0x2aaab3b70000, offset=1465221120, offset_high=0, n=16384) at os/os0file.c:2448
#3 0x000000000075710b in os_aio (type=10, mode=<value optimized out>, name=<value optimized out>, file=46, buf=0x2aaab3b70000, offset=1465221120, offset_high=0, n=16384, message1=0x3160468, message2=0x2aaaab83f200, io_perf2=0x31601d0, table_io_perf=0x32abe80) at os/os0file.c:3748
#4 0x00000000008011d8 in _fil_io (type=10, sync=<value optimized out>, space_id=8865, zip_size=<value optimized out>, block_offset=<value optimized out>, byte_offset=<value optimized out>, len=16384, buf=0x2aaab3b70000, message=0x2aaaab83f200, table_io_perf=0x32abe80) at fil/fil0fil.c:4771
#5 0x00000000007e0408 in buf_read_page_low (err=0x41ed8a28, sync=1, mode=1465221120, space=8865, zip_size=<value optimized out>, unzip=<value optimized out>, tablespace_version=6054, offset=89430, trx=0x32abab8) at buf/buf0rea.c:148
#6 0x00000000007e0634 in buf_read_page (space=8865, zip_size=0, offset=89430, trx=0x32abab8) at buf/buf0rea.c:188
#7 0x00000000007d6e81 in buf_page_get_gen (space=8865, zip_size=<value optimized out>, offset=89430, rw_latch=<value optimized out>, guess=0x0, mode=<value optimized out>, file=0x97415a "./include/btr0btr.ic", line=53, mtr=0x41ed94d0) at buf/buf0buf.c:2274
#8 0x00000000007c2101 in btr_cur_latch_leaves (page=0x2aaab5fc8000 "?\026eT", space=8865, zip_size=0, page_no=188058, latch_mode=<value optimized out>, cursor=<value optimized out>, mtr=0x41ed94d0) at ./include/btr0btr.ic:53
#9 0x00000000007c4351 in btr_cur_search_to_nth_level (index=0x2aaac002c858, level=<value optimized out>, tuple=<value optimized out>, mode=<value optimized out>, latch_mode=<value optimized out>, cursor=0x41ed9d70, has_search_latch=0, file=0x96a0f6 "row/row0ins.c", line=2014, mtr=0x41ed94d0) at btr/btr0cur.c:585
#10 0x000000000077420a in row_ins_index_entry_low (mode=33, index=0x2aaac002c858, entry=0x2aaac8005ef8, n_ext=<value optimized out>, thr=0x2aaac8005588) at row/row0ins.c:2012
#11 0x00000000007758bf in row_ins_step (thr=<value optimized out>) at row/row0ins.c:2195
#12 0x00000000007805b6 in row_insert_for_mysql (mysql_rec=<value optimized out>, prebuilt=0x2aaac800eb88) at row/row0mysql.c:1142
#13 0x000000000073049f in ha_innobase::write_row (this=0x2aaac800e730, record=0x2aaac80104f0 "? 89ea151ef602f18db3060544ff5e7613") at handler/ha_innodb.cc:5180
#14 0x000000000068c098 in handler::ha_write_row (this=0x2aaac800e730, buf=0x2aaac80104f0 "? 89ea151ef602f18db3060544ff5e7613") at handler.cc:4786
#15 0x0000000000615794 in write_record (thd=0x326f090, table=0x2aaac800de90, info=0x41eda1b0) at sql_insert.cc:1386
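The effect can be shown in miniature: when the thread that misses the buffer pool keeps the latch across the blocking read, every other thread needing the same latch stalls for the full I/O latency. A sketch in plain Python threading (a sleep stands in for the 16 KB pread() seen in frame #0; the function names are illustrative, not InnoDB's):

```python
import threading
import time

index_lock = threading.Lock()

def read_page_slow():
    # stand-in for the synchronous pread() of frame #0; assume ~50 ms latency
    time.sleep(0.05)
    return b"page"

def insert_holding_lock():
    # pattern from the trace: btr_cur_search_to_nth_level holds the index
    # latch while buf_read_page issues a synchronous read
    with index_lock:
        read_page_slow()

def insert_lock_released_for_io():
    # alternative: drop the latch for the duration of the read, then
    # re-take it and re-validate the cursor position
    with index_lock:
        pass                 # descend the tree, find the page is not cached
    read_page_slow()         # I/O with no latch held
    with index_lock:
        pass                 # re-latch and restart the search

def elapsed(worker, n=4):
    threads = [threading.Thread(target=worker) for _ in range(n)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start

serialized = elapsed(insert_holding_lock)          # ~n * 50 ms: reads serialize
overlapped = elapsed(insert_lock_released_for_io)  # ~50 ms: reads overlap
print("lock held across I/O: %.3fs, lock released: %.3fs"
      % (serialized, overlapped))
```

With four concurrent "inserts", the lock-held variant takes roughly four times as long, which is exactly the serialization the backtrace above produces on a busy server.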
Also, the read-write lock can end up blocking both reads and writes: if a writer queues for an exclusive lock while a reader thread holds the lock in shared mode during a pread(), subsequent readers block behind the waiting writer as well, e.g.:
Thread 432
#0 0x00007fe6d8ba1243 in pread64
#1 0x000000000074a22f in pread
#2 os_file_pread
#3 0x000000000074a3a4 in os_file_read
#4 0x000000000074c902 in os_aio
#5 0x00000000007e8bb4 in _fil_io
#6 0x00000000007cd028 in buf_read_page_low
#7 0x00000000007cd374 in buf_read_page
#8 0x00000000007c40c1 in buf_page_get_gen
#9 0x00000000007ae4db in btr_copy_blob_prefix
#10 0x00000000007b20c9 in btr_copy_externally_stored_field_prefix_low
#11 0x00000000007b27b0 in btr_copy_externally_stored_field
#12 btr_rec_copy_externally_stored_field
#13 0x0000000000777bfe in row_sel_store_mysql_rec
#14 0x000000000077b649 in row_search_for_mysql
#15 0x0000000000720709 in ha_innobase::index_read
#16 0x000000000068bd98 in handler::index_read_idx_map
#17 0x00000000005fe8b2 in join_read_const
#18 0x00000000006094ad in join_read_const_table
#19 0x000000000060c92c in make_join_statistics
#20 0x000000000060e017 in JOIN::optimize
#21 0x0000000000611574 in mysql_select
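The reader-blocks-reader effect comes from writer preference: a *queued* writer already stops new readers from entering. A minimal sketch of such a lock (simplified, not InnoDB's actual rw_lock implementation), showing that while reader 1 is "in pread()" and a writer is waiting, reader 2 cannot get the lock either:

```python
import threading
import time

class WriterPreferenceRWLock:
    """Reader-writer lock where a waiting writer blocks new readers
    (a simplified sketch, not the real InnoDB rw_lock)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False
        self._waiting_writers = 0

    def acquire_read(self):
        with self._cond:
            # key point: a merely *queued* writer already blocks new readers
            while self._writer or self._waiting_writers:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            self._waiting_writers += 1
            while self._writer or self._readers:
                self._cond.wait()
            self._waiting_writers -= 1
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

lock = WriterPreferenceRWLock()
reader2_done = threading.Event()

lock.acquire_read()                    # reader 1 holds S while "in pread()"

writer = threading.Thread(target=lambda: (lock.acquire_write(),
                                          lock.release_write()))
writer.start()
deadline = time.monotonic() + 2.0      # wait until the writer is queued for X
while lock._waiting_writers == 0 and time.monotonic() < deadline:
    time.sleep(0.001)

def reader2():
    lock.acquire_read()                # blocks behind the queued writer
    reader2_done.set()
    lock.release_read()

threading.Thread(target=reader2).start()
blocked_during_io = not reader2_done.wait(timeout=0.1)

lock.release_read()                    # reader 1's pread() finishes
writer.join()
reader2_done.wait(timeout=2.0)
print("reader 2 blocked while reader 1 was in pread():", blocked_during_io)
```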
How to repeat:
Run an insert/update benchmark with large rows on a system where the working dataset falls out of memory. Example:
#!/usr/bin/python
item_size = 7000
item_space = 16*1024*1024*1024

import MySQLdb
import time
import random
import hashlib
import string

def gen_data():
    return "".join([random.choice(string.letters) for x in range(item_size)])

def gen_key():
    return hashlib.md5(str(random.randint(0, item_space / item_size))).hexdigest()

random.seed(time.time())
data = gen_data()
conn = MySQLdb.Connect(unix_socket="/tmp/mysql.sock", db="test")
conn.autocommit(True)
while True:
    c = conn.cursor()
    c.execute("INSERT INTO blobtest0 VALUES (%s, %s) ON DUPLICATE KEY UPDATE data=VALUES(data)", (gen_key(), data))
    c = conn.cursor()
    c.execute("SELECT * FROM blobtest0 WHERE k=%s", (gen_key(),))
    c.fetchall()
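The script assumes the table already exists; its definition is not given in the report. A plausible schema, inferred from the INSERT/SELECT statements (a 32-character md5 hex key column k and a blob column data), might be:

```sql
CREATE TABLE blobtest0 (
  k    CHAR(32) NOT NULL PRIMARY KEY,  -- md5 hex digest from gen_key()
  data MEDIUMBLOB                      -- ~7000-byte value from gen_data()
) ENGINE=InnoDB;
```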
Suggested fix:
Don't lock the whole b-tree while a block read is being done.
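One way to sketch that pattern (illustrative Python, not InnoDB code): on a buffer-pool miss, release the latch, perform the blocking read with no latch held, then re-take the latch and restart the search, since the tree may have changed in the meantime.

```python
import threading

index_latch = threading.Lock()        # stand-in for the b-tree index lock
buffer_pool = {}                      # page_no -> page bytes
buffer_pool_mutex = threading.Lock()

def read_page_from_disk(page_no):
    # stand-in for the synchronous buf_read_page()/pread() path
    return b"page-%d" % page_no

def get_page(page_no):
    """Fetch a page without holding the index latch across disk I/O."""
    while True:
        with index_latch:
            # search under the latch; if the page is cached we are done
            with buffer_pool_mutex:
                if page_no in buffer_pool:
                    return buffer_pool[page_no]
        # cache miss: the blocking read happens with NO index latch held,
        # so other operations on the tree can proceed in the meantime
        with buffer_pool_mutex:
            buffer_pool[page_no] = read_page_from_disk(page_no)
        # loop to re-take the latch and restart the search: the tree may
        # have been modified while the latch was released

print(get_page(89430))
```

The first call misses the "buffer pool", reads the page outside the latch, and retries; the retry then finds the page cached and returns it under the latch.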