Bug #82592 MySQL InnoDB update locking
Submitted: 16 Aug 2016 3:55 Modified: 23 Aug 2016 14:27
Reporter: Ivan Ma Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S3 (Non-critical)
Version:5.7.14 OS:Any
Assigned to: CPU Architecture:Any

[16 Aug 2016 3:55] Ivan Ma
Description:
The system is reported to fail (lock wait timeout) under concurrent updates to the same row.

After a deeper dive into the issue, the following pattern was identified: the same table, the same row, and the same VARCHAR column being updated, but with data of different lengths.

The locking becomes very expensive, causing timeouts and failing the system.

To repeat :
Database.Table : test.test (with Only 1 row)

Preparation :
drop table if exists test.test;

create table test.test(col1 int primary key, col2 bigint,col3 varchar(40)) engine=innodb charset=latin1;
begin;
insert into test.test(col1,col2,col3) values(1,1,'123');
commit;

Test :
(Terminal 1 or background job)
TESTA : mysqlslap -S </..../socketfile> --concurrency=100 --iterations=1000000000  --query="update test.test set col2=col2+1,col3='aaaa' where col1=1 limit 1;" --create-schema=test

(Terminal 2 or background job)
TESTB : mysqlslap -S </..../socketfile> --concurrency=30 --iterations=100000000 --query="update test.test set col2=col2+1,col3='bb' where col1=1 limit 1;" --create-schema=test

The two concurrent jobs update the same column with values of different lengths ('aaaa' vs 'bb').

While the tests run, check continuously with:
 show status like '%innodb_rows_update%';

With only a single mysqlslap job running (TESTA or TESTB), the counter grows steadily. Even with multiple TESTA instances running, the system functions correctly.

Running TESTA and TESTB concurrently, the system behaves abnormally, and eventually TESTA, TESTB, or both die with a lock wait timeout.

e.g.
mysqlslap: Cannot run query update test.test set col2=col2+1,col3='aaaa' where col1=1 limit 1; ERROR : Lock wait timeout exceeded; try restarting transaction
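
While the two jobs are running, the waiting and blocking transactions can be inspected from another session. This is a diagnostic sketch using the MySQL 5.7 INFORMATION_SCHEMA tables (in 8.0 these moved to performance_schema):

SELECT r.trx_id AS waiting_trx, r.trx_query AS waiting_query,
       b.trx_id AS blocking_trx, b.trx_query AS blocking_query
FROM information_schema.innodb_lock_waits w
JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id
JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id;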

How to repeat:
The preparation and test steps are identical to those in the Description above.

Suggested fix:
Please help to check whether any metadata lock is involved.
[16 Aug 2016 4:29] Ivan Ma
The lock wait timeout value was changed to 8 seconds; the default is too long.
innodb_lock_wait_timeout =8
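
The same setting can also be applied at runtime without a restart (standard MySQL syntax; a GLOBAL change takes effect for sessions started afterwards):

SET GLOBAL innodb_lock_wait_timeout = 8;
SHOW GLOBAL VARIABLES LIKE 'innodb_lock_wait_timeout';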
[16 Aug 2016 4:38] Ivan Ma
Configuration File attached :
[mysqld]
explicit_defaults_for_timestamp
server-id=1
datadir=/u02/mysql/data/test1
basedir=/home/mysql/mysql
port=3316
socket=/u02/mysql/data/test1/mysqld.sock
character-set-server=utf8
log-error=/u02/mysql/data/test1/mysqld.error
sql-mode=NO_ENGINE_SUBSTITUTION,ONLY_FULL_GROUP_BY
log-bin=mysqllog.bin
gtid-mode=on
enforce-gtid-consistency
master_info_repository=TABLE
relay_log_info_repository=TABLE
innodb_lock_wait_timeout =8
max_connections = 3500
sync_binlog=0
innodb_flush_log_at_trx_commit = 2
[16 Aug 2016 14:13] Sinisa Milivojevic
Hi,

First of all, InnoDB is a transactional storage engine (SE), and as such, lock wait timeouts and deadlocks are expected to happen; the application programmer has to accommodate them.

Second, in order to be able to repeat the behavior, we require a full test case, including table definitions, their contents, and the commands that were run.

Third, and last, since ACID SEs like InnoDB have lock escalation, this could be the case here. In order to find out whether it is so, we require the output of:

SHOW ENGINE INNODB STATUS;
[17 Aug 2016 3:05] Ivan Ma
Understood regarding the lock timeout, but the impact is significant.

With concurrency and same-length updates to the same column, the system runs perfectly fine.

With concurrency and different-length VARCHAR updates to the same column, system performance is significantly impacted.

The video is attached.
[23 Aug 2016 14:27] Sinisa Milivojevic
Hi!

Updating a column with a value of the same length should be MUCH faster than updating the column with values of different lengths. This is due to the organization of the rows within the InnoDB page. The organization depends on both the row format and the file format that you have chosen for the InnoDB storage engine, or for your table, as this is part of the table definition within the InnoDB SE.

Hence, this is the expected behavior. As I wrote before, if the difference in wait time is too large, send us the InnoDB status output for both operations.
[1 Sep 2016 1:16] Libing Song
perf top showed the functions below consuming most of the CPU:
26.58%  mysqld                                [.] lock_rec_has_to_wait_in_queue
12.13%  mysqld                                [.] lock_rec_dequeue_from_page
11.02%  mysqld                                [.] RecLock::lock_add
 9.59%  mysqld                                [.] lock_rec_add_to_queue
 0.86%  libc-2.23.so                          [.] _int_malloc
 0.82%  mysqld                                [.] ut_delay

A simple test case to repeat it.

CREATE TABLE t1(c1 INT PRIMARY KEY, c2 VARCHAR(40)) ENGINE=INNODB;
INSERT INTO t1 VALUES(1, 'abcd');
SET GLOBAL innodb_deadlock_detect=OFF;
--exec $MYSQL_SLAP -S var/tmp/mysqld.1.sock -u root --concurrency=100 --iterations=100000000  --query="UPDATE t1 SET c2=repeat('b', rand()*10) WHERE c1=1;" --create-schema=test
[1 Sep 2016 13:35] Sinisa Milivojevic
For the operations in question, the perf ranking is approximately what is expected.
[1 Sep 2016 13:53] Sinisa Milivojevic
Some further explanations on why this is not a bug.

There is no secondary index, and the PRIMARY KEY is not being updated.  The clustered index record can be updated within the same page.

A variable-length record cannot be updated in-place if the length is changing. So, btr_cur_optimistic_update() cannot invoke the shortcut implemented in btr_cur_update_in_place().

However, the latching should be the same: the table should always consist of a single clustered index root page. The slowness should not propagate to user lock waits, but it could, due to a starvation problem. Still, the 8-second user lock wait timeout should not be exceeded.
[1 Sep 2016 14:08] Marko Mäkelä
I think that "Not a Bug" might be wrong.
"Won't fix" could be more appropriate. Or we could try to do something
about the problem.

There is no secondary index, and the PRIMARY KEY is not being updated.
The clustered index record can be updated within the same page, no matter if it is fixed-length or variable-length.

A variable-length record cannot be updated in-place if the length is
changing. So, btr_cur_optimistic_update() cannot invoke the shortcut
implemented in btr_cur_update_in_place().

In either case, the table should always consist of the single clustered index
root page. I do not see why the increased slowness of the operation should propagate to user lock timeouts. A page latch should not be held for longer than some milliseconds. Adding a millisecond to the latch hold time should not explain why the 8-second user lock wait timeout should be exceeded.
Everything is in main memory, too.

When it comes to InnoDB record locking, there is a small difference between in-place updates and optimistic updates: When we are updating a record in-place, the heap number will not change, and there is no need to update
the heap numbers in the explicit locks that point to the page.
A delete+insert would change the heap number and have to
update the lock structures accordingly.

An even more expensive class of updates would be a pessimistic update.
In a pessimistic operation, pages have to be allocated or freed. Normally,
they are caused by page splits or merges.

Also BLOB operations will be handled as pessimistic operations.
I would suggest a test where we have a column that is at least half
the innodb_page_size. Then, UPDATE the BLOB column in the tests. Any
update will do; InnoDB will do copy-on-write, keeping the index tree
X-latched so that no other thread can access the data, not even for the
SELECT part of the UPDATE.

Something like this:

create table test.test(col1 int primary key, col2 bigint,col3 blob) engine=innodb charset=latin1;
begin;
insert into test.test(col1,col2,col3) values(1,1,repeat('1',12345));
commit;

Then,
--query="update test.test set col2=col2+1,col3=repeat('a',12345) where col1=1 limit 1;"
and so on. How would it perform? Does it make a difference if the table is defined as ROW_FORMAT=DYNAMIC instead of ROW_FORMAT=COMPACT or ROW_FORMAT=REDUNDANT?

In the above BLOB case, the heap number should remain unchanged, because everything in the record itself would be fixed-length.
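
To test the row-format question above, the table from the earlier preparation could be re-created with an explicit row format (a hypothetical variant of the original CREATE TABLE; swap in COMPACT or REDUNDANT for comparison):

create table test.test(col1 int primary key, col2 bigint, col3 blob)
engine=innodb row_format=dynamic charset=latin1;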
[2 Sep 2016 13:11] Marko Mäkelä
If you want to test the impact of page splits and merges, you could try SQL like this:

BEGIN;INSERT INTO test.test VALUES
(2,2,REPEAT('b',7000)),(3,3,REPEAT('c',7000)),(4,4,REPEAT('d',7000));
UPDATE test.test set col2=col2+1,col3='bb' where col1=1;
ROLLBACK;

The INSERTs should force page splits, and the ROLLBACK should invoke
btr_compress() to merge pages.

For the INSERTs, you would have to take gap locks into account.
The easiest way would be to INSERT with negative col1 values (-2,-3,-4) in one thread, and positive (2,3,4) in the other.

In this way, you should still get transactional locking conflicts on the UPDATE only.
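
The second thread's variant of the transaction above might then look like this (hypothetical, following the negative-key suggestion):

BEGIN;INSERT INTO test.test VALUES
(-2,2,REPEAT('b',7000)),(-3,3,REPEAT('c',7000)),(-4,4,REPEAT('d',7000));
UPDATE test.test set col2=col2+1,col3='bb' where col1=1;
ROLLBACK;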

Yet another case would be to use BLOBs for col3 in the UPDATE part of this test. Note that REPEAT('b',7000) is chosen on purpose to be below the InnoDB BLOB storage threshold when using innodb_page_size=16k (forcing in-page storage), while the REPEAT('a',12345) in my previous comment should be above the BLOB storage threshold (forcing off-page storage).