Bug #22047 Time in SHOW PROCESSLIST for SQL thread in replication seems to become negative
Submitted: 6 Sep 2006 11:09 Modified: 14 Apr 18:21
Reporter: Oli Sennhauser
Status: Closed
Category:Server Severity:S3 (Non-critical)
Version:4.1.20, 5.0 BK OS:Linux (linux)
Assigned to: Georgi Kodinov Target Version:5.0+
Triage: Triaged: D4 (Minor)

[6 Sep 2006 11:09] Oli Sennhauser
Description:
Time in SHOW PROCESSLIST for SQL thread in replication seems to become negative

How to repeat:
mysql> show processlist;

+--------+-------------+---------------------------+--------+---------+------------+-----------------------------------------------------------------------+------------------+
| Id     | User        | Host                      | db     | Command | Time       |
State                                                                 | Info            
|
+--------+-------------+---------------------------+--------+---------+------------+-----------------------------------------------------------------------+------------------+
|      2 | system user |                           | NULL   | Connect |     723995 |
Waiting for master to send event                                      | NULL            
|
|      3 | system user |                           | NULL   | Connect | 4294967184 | Has
read all relay log; waiting for the slave I/O thread to update it | NULL             |
...

This looks like HEX FFFFFF90 or DEC -112

mysql> show status like 'uptime%';
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| Uptime        | 723790 |
+---------------+--------+

Suggested fix:
no idea

Seems similar to Bug #5676: http://bugs.mysql.com/bug.php?id=5676

We have simple master slave replication with high load. The only system which is NOT
lagging ist the one with the symptoms described in this bug!!! All other masters were
lagging some seconds (approx. 70)
[7 Sep 2006 8:32] Sveta Smirnova
Hi Oli,

our manual at http://dev.mysql.com/doc/refman/5.0/en/show-processlist.html says
"#Time

The time in seconds between the start of the statement or command and now."

What is the correlation between uptime and time field in output of SHOW PROCESSLIST?
[7 Sep 2006 10:05] Oli Sennhauser
Hi Sveta

I just wanted to show that the time cannot be more than uptime 723790 s because system is
not up more than this amount of seconds... So I guess that no query can run longer than
uptime. We have a problem in this area (replication und high load lags and does not
recover by itself) and I hope it is somehow related to this...

Regards Oli
[7 Sep 2006 10:53] Sveta Smirnova
Oli, do you issue SHOW PROCESSLIST on the slave?

I so, did you read part of the
http://dev.mysql.com/doc/refman/5.0/en/replication-faq.html Jorge pointed:

----<start quote>----
 A: You can read the Seconds_Behind_Master column in SHOW SLAVE STATUS. See Section 6.3,
“Replication Implementation Details”.

When the slave SQL thread executes an event read from the master, it modifies its own
time to the event timestamp. (This is why TIMESTAMP is well replicated.) In the Time
column in the output of SHOW PROCESSLIST, the number of seconds displayed for the slave
SQL thread is the number of seconds between the timestamp of the last replicated event
and the real time of the slave machine. You can use this to determine the date of the
last replicated event. Note that if your slave has been disconnected from the master for
one hour, and then reconnects, you may immediately see Time values like 3600 for the
slave SQL thread in SHOW PROCESSLIST. This is because the slave is executing statements
that are one hour old. 
----<end quote>----

Your case is other? Which value of Seconds_Behind_Master?
[11 Sep 2006 9:13] Oli Sennhauser
Hi Sveta

As you can see from the posted output, the SQL thread is 4 Mia seconds behind the master.
This cannot be because the system is running only 724k seconds:

|      2 | system user |                           | NULL   | Connect |    
723995 | Waiting for master to send event                                      |
NULL             |
|      3 | system user |                           | NULL   | Connect |
4294967184 | Has read all relay log; waiting for the slave I/O thread to update
it | NULL             |

Because 4.2 Mia is very close to the max of an int (4 bytes). I assume that we have/had
some troubles with a int overflow or negative values

4294967184 looks like HEX FFFFFF90 or DEC -112

In the article you mentioned it is written, that the SQL thread can have a lag of some
seconds (e.g. 3600) but no NEGATIVE lag (this would mean that the Slave is IN FRONT of
the master which technically is not possible.

Regards Oli
[13 Sep 2006 8:54] Sveta Smirnova
Slave and master are running on the same operation system?
[13 Sep 2006 10:46] Oli Sennhauser
Yes, 64-bit RHEL
[13 Sep 2006 11:16] Sveta Smirnova
Verified as described: "Waiting for master to send event" > uptime on Windows (master) and
Solaris (slave). System clocks on these 2 computers are unsynchronized.

To repeat start replication before midnight and issue show processlist;, show status like
'uptime%'; statements after midnight. To see the problem you should wait about 3-4 hours.
[13 Sep 2006 11:20] Oli Sennhauser
I will check at Cust. if they have ntp or similar...
[13 Sep 2006 13:24] Oli Sennhauser
Customer replied: "All servers are synchronized using ntp every 5 minutes."
[4 Jul 2007 11:17] Sveta Smirnova
Bug #29538 was marked as duplicate of this one.
[6 Mar 16:31] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/68517

2829 Georgi Kodinov	2009-03-06
      Bug #22047: Time in SHOW PROCESSLIST for SQL thread in replication seems to 
      become negative
            
      THD::start_time has a dual meaning : it's either the time since the process
      entered a given state or is the transaction time returned by e.g. NOW().
      This causes problems, as sometimes THD::start_time may be set to a value
      that is correct and needed when used as a base for NOW(), but these times
      may be arbitrary (SET @@timestamp) or non-local (coming from the master 
      through the replication feed).
      If one such non-local time is set there's no way to return a correct value
      for e.g. SHOW PROCESSLIST or SELECT ... FROM INFORMATION_SCHEMA.PROCESSLIST.
      Fixed by introducing a local equivalent of start_time (local_start_time) 
      and using the new member exclusively for timing and reporting execution 
      intervals.
            
      Note that no reliable test suite can be constructed, since it would require
      knowing the local time and can't be achieved by the means of the current test
      suite.
     @ sql/sql_class.cc
        Bug #22047: introduced the local time for measuring time intervals
     @ sql/sql_class.h
        Bug #22047: introduced the local time for measuring time intervals
     @ sql/sql_insert.cc
        Bug #22047: set local time for the delayed insert thread
     @ sql/sql_show.cc
        Bug #22047: use local time in I_S.PROCESSLIST
[12 Mar 15:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/69041

3116 Georgi Kodinov	2009-03-12
      Bug #22047: Time in SHOW PROCESSLIST for SQL thread in replication seems to 
            become negative
      
      THD::start_time has a dual meaning : it's either the time since the process
      entered a given state or is the transaction time returned by e.g. NOW().
      This causes problems, as sometimes THD::start_time may be set to a value
      that is correct and needed when used as a base for NOW(), but these times
      may be arbitrary (SET @@timestamp) or non-local (coming from the master 
      through the replication feed).
      If one such non-local time is set there's no way to return a correct value
      for e.g. SHOW PROCESSLIST or SELECT ... FROM INFORMATION_SCHEMA.PROCESSLIST.
      Fixed by making the Time column in SHOW PROCESSLIST SIGNED LONG instead of 
      UNSIGNED LONG and doing the correct conversions.
      
      Note that no reliable test suite can be constructed, since it would require
      knowing the local time and can't be achieved by the means of the current test
      suite.
     @ sql/sql_show.cc
        Bug #22047: make the Time column SIGNED LONG from UNSIGNED LONG
[13 Mar 14:52] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/69145

2778 Georgi Kodinov	2009-03-13
      Bug #22047 : Time in SHOW PROCESSLIST for SQL thread in replication 
      seems to become negative
      
      THD::start_time has a dual meaning : it's either the time since the process
      entered a given state or is the transaction time returned by e.g. NOW().
      This causes problems, as sometimes THD::start_time may be set to a value
      that is correct and needed when used as a base for NOW(), but these times
      may be arbitrary (SET @@timestamp) or non-local (coming from the master 
      through the replication feed).
      If one such non-local time is set there's no way to return a correct value
      for e.g. SHOW PROCESSLIST or SELECT ... FROM INFORMATION_SCHEMA.PROCESSLIST.
      Fixed by making the Time column in SHOW PROCESSLIST SIGNED LONG instead of 
      UNSIGNED LONG and doing the correct conversions.
            
      Note that no reliable test suite can be constructed, since it would require
      knowing the local time and can't be achieved by the means of the current test
      suite.
     @ sql/sql_show.cc
        Bug #22047: make the Time in SHOW PROCESSLIST LONG from 
        LONG UNSIGNED
[13 Mar 14:53] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/69146

2774 Georgi Kodinov	2009-03-13
      Bug #22047 : Time in SHOW PROCESSLIST for SQL thread in replication 
      seems to become negative
      
      THD::start_time has a dual meaning : it's either the time since the process
      entered a given state or is the transaction time returned by e.g. NOW().
      This causes problems, as sometimes THD::start_time may be set to a value
      that is correct and needed when used as a base for NOW(), but these times
      may be arbitrary (SET @@timestamp) or non-local (coming from the master 
      through the replication feed).
      If one such non-local time is set there's no way to return a correct value
      for e.g. SHOW PROCESSLIST or SELECT ... FROM INFORMATION_SCHEMA.PROCESSLIST.
      Fixed by making the Time column in SHOW PROCESSLIST SIGNED LONG instead of 
      UNSIGNED LONG and doing the correct conversions.
            
      Note that no reliable test suite can be constructed, since it would require
      knowing the local time and can't be achieved by the means of the current test
      suite.
     @ sql/sql_show.cc
        Bug #22047: make the Time in SHOW PROCESSLIST LONG from 
        LONG UNSIGNED
[13 Mar 15:27] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/69159

2839 Georgi Kodinov	2009-03-13 [merge]
      Bug #22047: Time in SHOW PROCESSLIST for SQL thread in replication seems to 
      become negative
      
      - merged the fix to 5.1
      - extended to cover I_S.PROCESSLIST.TIME
      - Changed the column type of I_S.PROCESSLIST.TIME from LOGNLONG UNSIGNED
        to LONG (to match the SHOW PROCESSLIST type)
      - Added a test case
[27 Mar 15:32] Bugs System
Pushed into 5.0.80 (revid:joro@sun.com-20090327142516-55gumdxj39z6eijj) (version source
revid:leonard@mysql.com-20090324072904-7w2lxdxzw8hx1rnm) (merge vers: 5.0.80) (pib:6)
[27 Mar 15:57] Bugs System
Pushed into 5.1.34 (revid:joro@sun.com-20090327143448-wuuuycetc562ty6o) (version source
revid:leonard@mysql.com-20090316090622-sr8lylqvsl1jrcnv) (merge vers: 5.1.34) (pib:6)
[13 Apr 11:21] Bugs System
Pushed into 6.0.11-alpha (revid:alik@sun.com-20090413084402-snnrocwzktcl88ny) (version
source revid:joro@sun.com-20090317155017-yldq76zghpxoln3p) (merge vers: 6.0.11-alpha)
(pib:6)
[14 Apr 18:21] Paul DuBois
Noted in 5.0.80, 5.1.34, 6.0.11 changelogs.

The Time column for SHOW PROCESSLIST output and the value of the TIME
column of the INFORMATION_SCHEMA.PROCESSLIST table now can have
negative values. Previously, the column was unsigned and negative
values were displayed incorrectly as large positive values. Negative
values can occur if a thread alters the time into the future with SET
TIMESTAMP = value or the thread is executing on a slave and
processing events from a master that has its clock set ahead of the
slave.
[9 May 18:44] Bugs System
Pushed into 5.1.34-ndb-6.2.18 (revid:jonas@mysql.com-20090508185236-p9b3as7qyauybefl)
(version source revid:jonas@mysql.com-20090508185236-p9b3as7qyauybefl) (merge vers:
5.1.34-ndb-6.2.18) (pib:6)
[9 May 19:41] Bugs System
Pushed into 5.1.34-ndb-6.3.25 (revid:jonas@mysql.com-20090509063138-1u3q3v09wnn2txyt)
(version source revid:jonas@mysql.com-20090509063138-1u3q3v09wnn2txyt) (merge vers:
5.1.34-ndb-6.3.25) (pib:6)
[9 May 20:39] Bugs System
Pushed into 5.1.34-ndb-7.0.6 (revid:jonas@mysql.com-20090509154927-im9a7g846c6u1hzc)
(version source revid:jonas@mysql.com-20090509154927-im9a7g846c6u1hzc) (merge vers:
5.1.34-ndb-7.0.6) (pib:6)