Bug #79822 Large values in Time column of SHOW PROCESSLIST
Submitted: 2 Jan 2016 2:26 Modified: 2 Jan 2016 4:40
Reporter: Roel Van de Paar Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:5.7.10 OS:Any
Assigned to: CPU Architecture:Any

[2 Jan 2016 2:26] Roel Van de Paar
Description:
Doing any serious QA run, one can see the following outputs in SHOW PROCESSLIST;

+----+-----------------+-----------+------+---------+------------+-----------------------------+------------------------------------------------------------------------------------------------------+
| Id | User            | Host      | db   | Command | Time       | State                       | Info                                                                                                 |
+----+-----------------+-----------+------+---------+------------+-----------------------------+------------------------------------------------------------------------------------------------------+
|  6 | root            | localhost | test | Query   | 1451698799 | executing                   | SELECT t.table_name, c1.column_name FROM information_schema.tables t INNER JOIN information_schema.c |
|  8 | event_scheduler | localhost | NULL | Daemon  |         12 | Waiting for next activation | NULL                                                                                                 |
| 16 | root            | localhost | NULL | Query   |          0 | starting                    | SHOW PROCESSLIST                                                                                     |
+----+-----------------+-----------+------+---------+------------+-----------------------------+------------------------------------------------------------------------------------------------------+

+----+-----------------+-----------+------+---------+------------+------------------------+---------------------------------------------------------------------------------+
| Id | User            | Host      | db   | Command | Time       | State                  | Info                                                                            |
+----+-----------------+-----------+------+---------+------------+------------------------+---------------------------------------------------------------------------------+
|  7 | root            | localhost | test | Query   | 1118400196 | closing tables         | INSERT INTO t1  (col1) VALUES(STR_TO_DATE('15.13.2004 15.30','%d.%m.%Y %H.%i')) |
| 10 | event_scheduler | localhost | NULL | Daemon  |          0 | Waiting on empty queue | NULL                                                                            |
| 11 | root            | localhost | NULL | Query   |          0 | starting               | SHOW PROCESSLIST                                                                |
+----+-----------------+-----------+------+---------+------------+------------------------+---------------------------------------------------------------------------------+

+----+-----------------+-----------+------+---------+-----------+------------------------+------------------+
| Id | User            | Host      | db   | Command | Time      | State                  | Info             |
+----+-----------------+-----------+------+---------+-----------+------------------------+------------------+
|  7 | root            | localhost | test | Sleep   | 218483511 | cleaning up            | NULL             |
| 10 | event_scheduler | localhost | NULL | Daemon  |         2 | Waiting on empty queue | NULL             |
| 13 | root            | localhost | NULL | Query   |         0 | starting               | SHOW PROCESSLIST |
+----+-----------------+-----------+------+---------+-----------+------------------------+------------------+

+----+-----------------+-----------+------+---------+------------+------------------------+------------------+
| Id | User            | Host      | db   | Command | Time       | State                  | Info             |
+----+-----------------+-----------+------+---------+------------+------------------------+------------------+
|  7 | root            | localhost | test | Sleep   | 1451698806 | cleaning up            | NULL             |
| 11 | event_scheduler | localhost | NULL | Daemon  |          0 | Waiting on empty queue | NULL             |
| 12 | root            | localhost | NULL | Query   |          0 | System lock            | SHOW PROCESSLIST |
+----+-----------------+-----------+------+---------+------------+------------------------+------------------+

+----+-----------------+-----------+------+---------+-----------+------------------------+--------------------------------------+
| Id | User            | Host      | db   | Command | Time      | State                  | Info                                 |
+----+-----------------+-----------+------+---------+-----------+------------------------+--------------------------------------+
|  1 | event_scheduler | localhost | NULL | Daemon  |         0 | Waiting on empty queue | NULL                                 |
|  7 | root            | localhost | test | Query   | 315666426 | starting               | SET @@global.default_week_format = 5 |
|  9 | root            | localhost | NULL | Query   |         0 | starting               | SHOW PROCESSLIST                     |
+----+-----------------+-----------+------+---------+-----------+------------------------+--------------------------------------+

+----+-----------------+-----------+------+---------+-----------+-----------------------------+----------------------------------------------------------------------------+
| Id | User            | Host      | db   | Command | Time      | State                       | Info                                                                       |
+----+-----------------+-----------+------+---------+-----------+-----------------------------+----------------------------------------------------------------------------+
|  1 | event_scheduler | localhost | NULL | Daemon  |         1 | Waiting for next activation | NULL                                                                       |
|  7 | root            | localhost | test | Query   | 411376332 | statistics                  | SELECT MIN( a ) AS min_a FROM t1  WHERE a > 1 AND a IS NULL ORDER BY min_a |
|  9 | root            | localhost | NULL | Query   |         0 | starting                    | SHOW PROCESSLIST                                                           |
+----+-----------------+-----------+------+---------+-----------+-----------------------------+----------------------------------------------------------------------------+

+----+-----------------+-----------+------+---------+------------+------------------------+------------------------------------------------------------------------------------------------------+
| Id | User            | Host      | db   | Command | Time       | State                  | Info                                                                                                 |
+----+-----------------+-----------+------+---------+------------+------------------------+------------------------------------------------------------------------------------------------------+
|  7 | root            | localhost | test | Query   | 1451700044 | init                   | INSERT INTO t VALUES (3066160928799469981,-557922679,'Qyrmg2HxSi6EpKZvCe9sxm7fkrJx9wvpuz9','NDXHAYyO |
| 14 | event_scheduler | localhost | NULL | Daemon  |          2 | Waiting on empty queue | NULL                                                                                                 |
| 16 | root            | localhost | NULL | Query   |          0 | starting               | SHOW PROCESSLIST                                                                                     |
+----+-----------------+-----------+------+---------+------------+------------------------+------------------------------------------------------------------------------------------------------+

The issue being an excessively large value in Time. This is seen for all queries, all thread states.

How to repeat:
1) Setup a large QA run (pquery or RQG)
2) Use something like

DIR=/dev/shm/202355; while(true); do /sdc/mysql-5.7.10/bin/mysql -uroot -S${DIR}/$(ls ${DIR})/socket.sock -e"SHOW PROCESSLIST; sleep 1; done

3) Monitor output

Alternatively, and likely better, in 5.7.10, edit sql/sql_show.cc and around line 2324, add the following assertions:

=============================
[roel@localhost mysql-server_TIMEHACK]$ git diff
diff --git a/sql/sql_show.cc b/sql/sql_show.cc
index 5711250..03356fb 100644
--- a/sql/sql_show.cc
+++ b/sql/sql_show.cc
@@ -2320,9 +2320,13 @@ void mysqld_list_processes(THD *thd,const char *user, bool verbose)
       protocol->store(thd_info->proc_info, system_charset_info);
     else
       protocol->store(command_name[thd_info->command].str, system_charset_info);
-    if (thd_info->start_time)
+    if (thd_info->start_time){
       protocol->store_long ((longlong) (now - thd_info->start_time));
-    else
+
+      DBUG_ASSERT(thd_info->start_time>0);
+      DBUG_ASSERT(now>thd_info->start_time);
+      DBUG_ASSERT((now-thd_info->start_time)<1000000);
+    }else
       protocol->store_null();
     protocol->store(thd_info->state_info, system_charset_info);
     protocol->store(thd_info->query_string.str(),
=============================

Suggested fix:
Besides fixing this bug, please consider adding one or more DBUG_ASSERT's to catch these in the future before they happen.
[2 Jan 2016 2:26] Roel Van de Paar
Ref 
- https://bugs.launchpad.net/percona-server/+bug/1379582
- http://bugs.mysql.com/bug.php?id=74416
[2 Jan 2016 2:26] Roel Van de Paar
Example with negative time;

+----+-----------------+-----------+------+---------+-----------+-----------------------------+------------------+
| Id | User            | Host      | db   | Command | Time      | State                       | Info             |
+----+-----------------+-----------+------+---------+-----------+-----------------------------+------------------+
|  7 | root            | localhost | test | Sleep   | -10000000 | cleaning up                 | NULL             |
| 15 | event_scheduler | localhost | NULL | Daemon  |         1 | Waiting for next activation | NULL             |
| 17 | root            | localhost | NULL | Query   |         0 | starting                    | SHOW PROCESSLIST |
+----+-----------------+-----------+------+---------+-----------+-----------------------------+------------------+

This is a serious output bug.
[2 Jan 2016 2:29] Roel Van de Paar
Using standard pquery-run.sh and asserts added above, it's easy to reproduce;

[roel@localhost 051115]$ ~/percona-qa/pquery-results.sh
================ Sorted unique issue strings (61 trials executed, 12 remaining reducer scripts)
.now-thd_info->start_time                              (Seen  10 times: reducers 10,16,18,19,34,40,43,50,55,59)
now>thd_info->start_time                               (Seen   2 times: reducers 30,46)
[2 Jan 2016 2:41] Roel Van de Paar
Just for a bit of humor for those who get it;

Knowing the issue to be timing/race related, I was very surprised to see that one of these crashes started reproducing & reducing well using reducer.sh.

Picturing it to be something related to setting time, I was anxiously looking forward to see the testcase...

[roel@localhost 051115]$ cat /sda/051115/30/pquery_thread-0.sql_out
DROP DATABASE test;
SHOW PROCESSLIST;

:)
[2 Jan 2016 3:12] Roel Van de Paar
DBUG_ASSERT(now>thd_info->start_time);

Can be slightly improved to

DBUG_ASSERT(now>=thd_info->start_time);

To avoid quick-processing queries like the one from the last comment :)
[2 Jan 2016 3:16] Roel Van de Paar
Same for >0:

DBUG_ASSERT(thd_info->start_time>=0);
DBUG_ASSERT(now>=thd_info->start_time);
DBUG_ASSERT((now-thd_info->start_time)<1000000);
[2 Jan 2016 4:06] MySQL Verification Team
Check and make sure you're not doing something like this:

mysql> show processlist;
+----+------+-----------+------+---------+------+----------+------------------+
| Id | User | Host      | db   | Command | Time | State    | Info             |
+----+------+-----------+------+---------+------+----------+------------------+
|  6 | root | localhost | test | Query   |    0 | starting | show processlist |
+----+------+-----------+------+---------+------+----------+------------------+
1 row in set (0.00 sec)

mysql> set timestamp=unix_timestamp(now()-interval 1451698799 second);
Query OK, 0 rows affected (0.00 sec)

mysql> show processlist;
+----+------+-----------+------+---------+------------+----------+------------------+
| Id | User | Host      | db   | Command | Time       | State    | Info             |
+----+------+-----------+------+---------+------------+----------+------------------+
|  6 | root | localhost | test | Query   | 1451698800 | starting | show processlist |
+----+------+-----------+------+---------+------------+----------+------------------+
1 row in set (0.00 sec)
[2 Jan 2016 4:07] MySQL Verification Team
Make sure whatever reducer does,  that it starts with a new connection and/or new server instance, so old session variables don't influence previous run ;)
[2 Jan 2016 4:12] Roel Van de Paar
Reducer starts 100% fresh for each trial.
[2 Jan 2016 4:38] Roel Van de Paar
Yep, you're right Shane. It's that TIMESTAMP thing again. Not a bug on MS.
[2 Jan 2016 4:39] Roel Van de Paar
Reducer halted because of the > which if it had been >= (=0) would not have failed (quick-executing commands)
[2 Jan 2016 4:39] Roel Van de Paar
(The > assert that was - i.e. no bug in reducer)
[2 Jan 2016 4:40] Roel Van de Paar
Thank you @ Shane
[2 Jan 2016 5:04] MySQL Verification Team
remember we still have this verified:
https://bugs.mysql.com/bug.php?id=73999