MySQL Bugs: #98103: unexpected behavior while logging an aborted query in the slow query log

Bug #98103	unexpected behavior while logging an aborted query in the slow query log
Submitted:	1 Jan 2020 5:14	Modified:	13 Jan 2020 11:28
Reporter:	Pranay Motupalli	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Logging	Severity:	S3 (Non-critical)
Version:	All, 8.0.18	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	mdl, slow_query_log

Description:
When a query gets aborted due to exceeding lock_wait_timeout while the query was waiting for table metadata lock, the aborted query is being written to the slow query log.

Also, the time it spent in "waiting for table metadata lock" is getting counted under query_time and lock_time is taken as 0.

The same is the case when I kill the query while it was waiting for table meta data lock after long_query_time.

According to MySQL documentation https://dev.mysql.com/doc/refman/5.6/en/slow-query-log.html , the below is mentioned 

"The time to acquire the initial locks is not counted as execution time. MySQL writes a statement to the slow query log after it has been executed and after all locks have been released" 

Ideally, MySQL shouldn't have logged this query in slow query log. 

How to repeat:
######### Scenario 1 #########

Query got aborted after lock_wait_timeout and the query was logged in the slow query log. Also, it has counted the time while it was waiting for table metadata lock in the Query_time.

mysql> select @@slow_query_log,@@long_query_time,@@log_output,@@lock_wait_timeout;
+------------------+-------------------+--------------+---------------------+
| @@slow_query_log | @@long_query_time | @@log_output | @@lock_wait_timeout |
+------------------+-------------------+--------------+---------------------+
|                1 |         10.000000 | FILE         |                  30 |
+------------------+-------------------+--------------+---------------------+
1 row in set (0.00 sec)

Session 1:
----------

mysql> lock table t3 write;
Query OK, 0 rows affected (0.00 sec)

Session 2:
----------
mysql> select SQL_NO_CACHE count(*) from t3;

Session 1:
----------
mysql> show full processlist;
+----+------+-----------+------+---------+------+---------------------------------+--------------------------------------+
| Id | User | Host      | db   | Command | Time | State                           | Info                                 |
+----+------+-----------+------+---------+------+---------------------------------+--------------------------------------+
|  1 | root | localhost | db99 | Query   |    0 | init                            | show full processlist                |
|  2 | root | localhost | db99 | Query   |   12 | Waiting for table metadata lock | select SQL_NO_CACHE count(*) from t3 |
|  3 | root | localhost | NULL | Sleep   | 1049 |                                 | NULL                                 |
+----+------+-----------+------+---------+------+---------------------------------+--------------------------------------+
3 rows in set (0.00 sec)

After lock_wait_timeout,

Session 2:
----------
mysql> select SQL_NO_CACHE count(*) from t3;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

Checked the slow query log: [ The above query was logged ]
---------------------------

root@ip-172-31-26-131:/usr/local/mysql/data# tail -5  ip-172-31-26-131-slow.log
# User@Host: root[root] @ localhost []  Id:     2
# Query_time: 30.000452  Lock_time: 0.000000 Rows_sent: 0  Rows_examined: 0
use db99;
SET timestamp=1577769017;
select SQL_NO_CACHE count(*) from t3;

######## Scenario 2 #########
Killed the query after long_query_time while it was waiting for table metadata lock the and the killed query was logged in the slow query log. Also, it has counted the time while it was waiting for table metadata lock in the Query_time before it got killed.

mysql> select @@slow_query_log,@@long_query_time,@@log_output,@@lock_wait_timeout;
+------------------+-------------------+--------------+---------------------+
| @@slow_query_log | @@long_query_time | @@log_output | @@lock_wait_timeout |
+------------------+-------------------+--------------+---------------------+
|                1 |         10.000000 | FILE         |                  30 |
+------------------+-------------------+--------------+---------------------+
1 row in set (0.00 sec)

Session 1:
----------
mysql> lock table t3 write;
Query OK, 0 rows affected (0.00 sec)

Session 2:
----------
mysql> select SQL_NO_CACHE count(*) from t3;

Session 1:
----------

mysql> show full processlist;
+----+------+-----------+------+---------+------+---------------------------------+--------------------------------------+
| Id | User | Host      | db   | Command | Time | State                           | Info                                 |
+----+------+-----------+------+---------+------+---------------------------------+--------------------------------------+
|  1 | root | localhost | db99 | Query   |    0 | init                            | show full processlist                |
|  2 | root | localhost | db99 | Query   |   12 | Waiting for table metadata lock | select SQL_NO_CACHE count(*) from t3 |
|  3 | root | localhost | NULL | Sleep   | 1049 |                                 | NULL                                 |
+----+------+-----------+------+---------+------+---------------------------------+--------------------------------------+
3 rows in set (0.00 sec)

After long_query_time I.e. 10 sec, I killed the query.

Session 2:
----------
mysql> select SQL_NO_CACHE count(*) from t3;
^CCtrl-C -- sending "KILL QUERY 2" to server ...
Ctrl-C -- query aborted.
ERROR 1317 (70100): Query execution was interrupted

Checked the slow query log: [ The above query was logged ]
---------------------------

root@ip-172-31-26-131:/usr/local/mysql/data# tail -5  ip-172-31-26-131-slow.log
# Time: 191231  5:54:19
# User@Host: root[root] @ localhost []  Id:     1
# Query_time: 16.521522  Lock_time: 0.000000 Rows_sent: 0  Rows_examined: 1012505
SET timestamp=1577771659;
select SQL_NO_CACHE count(*) from t3;

By checking the trace, noticed that , In the case of MDL, utime_after_lock and thd->start_utime have the same value while calling update_server_status() which is basically set by THD::set_time function and utime_after_lock value is not getting updated at all after that. Because of this, while updating the server status, the following condition is resulting to true and hence, query is getting logged.

void update_server_status()
  {
    ulonglong end_utime_of_query= current_utime();
    if (end_utime_of_query > utime_after_lock + variables.long_query_time)
      server_status|= SERVER_QUERY_WAS_SLOW;
  }

So, the condition "end_utime_of_query > utime_after_lock + variables.long_query_time" will be true.

(gdb) p this->start_utime
$22 = 1577699611659829
(gdb) p utime_after_lock
$18 = 1577699611659829
(gdb) p end_utime_of_query
$19 = 1577699635781233
1577699635781233 > 1577699611659829 + 10000000 (long_query_time) -> True

In the case of RECORD LOCKS, utime_after_lock is getting updated with the help of the following function and hence the aborted query is not getting logged in slow query log.
void thd_storage_lock_wait(THD *thd, long long value)
{
  thd->utime_after_lock+= value;
}

Stack Trace:
-----------

MDL:
----
Thread 25 "mysqld" hit Breakpoint 4, THD::update_server_status (this=0x555557e93e80) at /root/mysql-server/sql/sql_class.h:3450
3450	    ulonglong end_utime_of_query= current_utime();
(gdb) n
3451	    if (end_utime_of_query > utime_after_lock + variables.long_query_time)
(gdb) n
3452	      server_status|= SERVER_QUERY_WAS_SLOW;
(gdb) n
3453	  }
(gdb) p utime_after_lock
$18 = 1577699611659829
(gdb) p end_utime_of_query
$19 = 1577699635781233
(gdb) p end_utime_of_query > utime_after_lock + variables.long_query_time
$20 = true
(gdb) p end_utime_of_query - utime_after_lock
$21 = 24121404
(gdb) p this->start_utime
$22 = 1577699611659829
(gdb) p this->start_utime - this->utime_after_lock
$23 = 0
(gdb) bt
#0  THD::update_server_status (this=0x555557e93e80) at /root/mysql-server/sql/sql_class.h:3453
#1  0x0000555555b76601 in dispatch_command (command=COM_QUERY, thd=0x555557e93e80, packet=0x555557e97ae1 "select SQL_NO_CACHE count(*) from t3", packet_length=36) at /root/mysql-server/sql/sql_parse.cc:1821
#2  0x0000555555b74138 in do_command (thd=0x555557e93e80) at /root/mysql-server/sql/sql_parse.cc:1064
#3  0x0000555555b398c6 in do_handle_one_connection (thd_arg=0x555557e93e80) at /root/mysql-server/sql/sql_connect.cc:982
#4  0x0000555555b39614 in handle_one_connection (arg=0x555557e93e80) at /root/mysql-server/sql/sql_connect.cc:899
#5  0x00005555561ee149 in pfs_spawn_thread (arg=0x5555575d13f0) at /root/mysql-server/storage/perfschema/pfs.cc:1861
#6  0x00007ffff77816db in start_thread (arg=0x7fffce152700) at pthread_create.c:463
#7  0x00007ffff6b6b88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) 
------------------------
RECORD LOCKS:
------------
Thread 25 "mysqld" hit Breakpoint 2, THD::update_server_status (this=0x555557e937d0) at /root/mysql-server/sql/sql_class.h:3450
3450     ulonglong end_utime_of_query= current_utime();
(gdb) n
3451     if (end_utime_of_query > utime_after_lock + variables.long_query_time)
(gdb) n
3453   }
(gdb) p end_utime_of_query
$2 = 1577788557020643
(gdb) p utime_after_lock
$3 = 1577788554290580
(gdb) p variables.long_query_time
$4 = 10000000
(gdb) p end_utime_of_query > utime_after_lock + variables.long_query_time
$5 = false
(gdb) bt
#0  THD::update_server_status (this=0x555557e937d0) at /root/mysql-server/sql/sql_class.h:3453
#1  0x0000555555b76601 in dispatch_command (command=COM_QUERY, thd=0x555557e937d0, packet=0x555557e97431 "update t3 set t1_id=t1_id+1", packet_length=27)
    at /root/mysql-server/sql/sql_parse.cc:1821
#2  0x0000555555b74138 in do_command (thd=0x555557e937d0) at /root/mysql-server/sql/sql_parse.cc:1064
#3  0x0000555555b398c6 in do_handle_one_connection (thd_arg=0x555557e937d0) at /root/mysql-server/sql/sql_connect.cc:982
#4  0x0000555555b39614 in handle_one_connection (arg=0x555557e937d0) at /root/mysql-server/sql/sql_connect.cc:899
#5  0x00005555561ee149 in pfs_spawn_thread (arg=0x5555575e6070) at /root/mysql-server/storage/perfschema/pfs.cc:1861
#6  0x00007ffff77816db in start_thread (arg=0x7fffce152700) at pthread_create.c:463
#7  0x00007ffff6b6b88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
--------
(gdb) bt
#0  thd_storage_lock_wait (thd=0x555557e937d0, value=51007068) at /root/mysql-server/sql/sql_class.cc:669
#1  0x0000555555ea53ea in thd_set_lock_wait_time (thd=0x555557e937d0, value=51007068) at /root/mysql-server/storage/innobase/handler/ha_innodb.cc:1432
#2  0x0000555555f0fcf2 in lock_wait_suspend_thread (thr=0x7fff8000aff0) at /root/mysql-server/storage/innobase/lock/lock0wait.cc:372
#3  0x0000555555fa3b20 in row_mysql_handle_errors (new_err=0x7fffce14f3f4, trx=0x7fff80007f78, thr=0x7fff8000aff0, savept=0x0)
    at /root/mysql-server/storage/innobase/row/row0mysql.cc:634
#4  0x0000555555fd75b3 in row_search_for_mysql (buf=0x7fff8000a010 "\377", mode=1, prebuilt=0x7fff8000a898, match_mode=0, direction=0)
    at /root/mysql-server/storage/innobase/row/row0sel.cc:5282
#5  0x0000555555eaf82a in ha_innobase::index_read (this=0x7fff80008f50, buf=0x7fff8000a010 "\377", key_ptr=0x0, key_len=0, find_flag=HA_READ_AFTER_KEY)
    at /root/mysql-server/storage/innobase/handler/ha_innodb.cc:7695
#6  0x0000555555eb054d in ha_innobase::index_first (this=0x7fff80008f50, buf=0x7fff8000a010 "\377")
    at /root/mysql-server/storage/innobase/handler/ha_innodb.cc:8063
#7  0x0000555555eb079f in ha_innobase::rnd_next (this=0x7fff80008f50, buf=0x7fff8000a010 "\377")
    at /root/mysql-server/storage/innobase/handler/ha_innodb.cc:8160
#8  0x00005555559cf8dc in handler::ha_rnd_next (this=0x7fff80008f50, buf=0x7fff8000a010 "\377") at /root/mysql-server/sql/handler.cc:2696
#9  0x0000555555d34ee2 in rr_sequential (info=0x7fffce1500f0) at /root/mysql-server/sql/records.cc:480
#10 0x0000555555c13a94 in mysql_update (thd=0x555557e937d0, table_list=0x7fff80004f98, fields=..., values=..., conds=0x0, order_num=0, order=0x0, 
    limit=18446744073709551615, handle_duplicates=DUP_ERROR, ignore=false, found_return=0x7fffce150740, updated_return=0x7fffce1507d0)
    at /root/mysql-server/sql/sql_update.cc:744
#11 0x0000555555b7a04c in mysql_execute_command (thd=0x555557e937d0) at /root/mysql-server/sql/sql_parse.cc:3375
#12 0x0000555555b82842 in mysql_parse (thd=0x555557e937d0, rawbuf=0x7fff80004eb0 "update t3 set t1_id=t1_id+1", length=27, parser_state=0x7fffce1516a0)
    at /root/mysql-server/sql/sql_parse.cc:6422
#13 0x0000555555b751c4 in dispatch_command (command=COM_QUERY, thd=0x555557e937d0, packet=0x555557e97431 "update t3 set t1_id=t1_id+1", packet_length=27)
    at /root/mysql-server/sql/sql_parse.cc:1399
#14 0x0000555555b74138 in do_command (thd=0x555557e937d0) at /root/mysql-server/sql/sql_parse.cc:1064
#15 0x0000555555b398c6 in do_handle_one_connection (thd_arg=0x555557e937d0) at /root/mysql-server/sql/sql_connect.cc:982
#16 0x0000555555b39614 in handle_one_connection (arg=0x555557e937d0) at /root/mysql-server/sql/sql_connect.cc:899
#17 0x00005555561ee149 in pfs_spawn_thread (arg=0x5555575e6070) at /root/mysql-server/storage/perfschema/pfs.cc:1861
#18 0x00007ffff77816db in start_thread (arg=0x7fffce152700) at pthread_create.c:463
#19 0x00007ffff6b6b88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

I have taken another scenario where the query was waiting for table metadata lock and then after some time (before lock_wait_timeout exceedes), the lock was acquired. The query executed for more than long_query_time.

In this scenairo, lock_time is getting counted as expected. Checked the stack for the same 

If the query acquires lock before lock_wait_timeout, then utime_after_lock is getting updated with the help of the below function 

void set_time_after_lock() 
{ 
utime_after_lock= my_micro_time(); 
MYSQL_SET_STATEMENT_LOCK_TIME(m_statement_psi, (utime_after_lock - start_utime)); 
} 

The above function is called in mysql_lock_tables. However, when the query gets aborted after lock_wait_timeout or got killed while waiting for MDL, the function mysql_lock_tables itself is not called. Hence, utime_after_lock is not getting updated. 

Stack trace: 
------------ 

(gdb) bt 
#0 THD::set_time_after_lock (this=0x555557e02240) at /root/mysql-server/sql/sql_class.h:3437 
#1 0x0000555555cefb2f in mysql_lock_tables (thd=0x555557e02240, tables=0x7fffa0006088, count=1, flags=0) at /root/mysql-server/sql/lock.cc:341 
#2 0x0000555555b0bfc5 in lock_tables (thd=0x555557e02240, tables=0x7fffa0005550, count=1, flags=0) at /root/mysql-server/sql/sql_base.cc:6005 
#3 0x0000555555baaf86 in mysql_select (thd=0x555557e02240, tables=0x7fffa0005550, wild_num=0, fields=..., conds=0x0, order=0x555557e049a8, 
group=0x555557e048e0, having=0x0, select_options=2147748608, result=0x7fffa0005b68, unit=0x555557e04198, select_lex=0x555557e047e0) 
at /root/mysql-server/sql/sql_select.cc:1202 
#4 0x0000555555ba8cd8 in handle_select (thd=0x555557e02240, result=0x7fffa0005b68, setup_tables_done_option=0) at /root/mysql-server/sql/sql_select.cc:101 
#5 0x0000555555b7fa4a in execute_sqlcom_select (thd=0x555557e02240, all_tables=0x7fffa0005550) at /root/mysql-server/sql/sql_parse.cc:5153 
#6 0x0000555555b783cb in mysql_execute_command (thd=0x555557e02240) at /root/mysql-server/sql/sql_parse.cc:2711 
#7 0x0000555555b82842 in mysql_parse (thd=0x555557e02240, rawbuf=0x7fffa0005270 "select SQL_NO_CACHE count(*) from t3", length=36, 
parser_state=0x7fffce1936a0) at /root/mysql-server/sql/sql_parse.cc:6422 
#8 0x0000555555b751c4 in dispatch_command (command=COM_QUERY, thd=0x555557e02240, packet=0x555557e8f461 "select SQL_NO_CACHE count(*) from t3", 
packet_length=36) at /root/mysql-server/sql/sql_parse.cc:1399 
#9 0x0000555555b74138 in do_command (thd=0x555557e02240) at /root/mysql-server/sql/sql_parse.cc:1064 
#10 0x0000555555b398c6 in do_handle_one_connection (thd_arg=0x555557e02240) at /root/mysql-server/sql/sql_connect.cc:982 
#11 0x0000555555b39614 in handle_one_connection (arg=0x555557e02240) at /root/mysql-server/sql/sql_connect.cc:899 
#12 0x00005555561ee149 in pfs_spawn_thread (arg=0x5555575c6e10) at /root/mysql-server/storage/perfschema/pfs.cc:1861 
#13 0x00007ffff77816db in start_thread (arg=0x7fffce194700) at pthread_create.c:463 
#14 0x00007ffff6b6b88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Suggested fix:
May be we can call set_time_after_lock() when the query gets aborted while waiting for MDL so that utime_after_lock gets updated and accordingly the query will not be considered to get logged in slow query log.

Hello Pranay,

Thank you for the report and test case.

regards,
Umesh

Posted by developer:
 
This issue can no longer be repeated and is considered fixed by the
patch for Bug#33236909 ("Lock_time computed differently by slow log
and performance_schema").