Bug #57600 output of I/O sum[%lu] can go negative
Submitted: 20 Oct 2010 12:04 Modified: 13 Jan 2011 19:15
Reporter: Shane Bester (Platinum Quality Contributor) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB Plugin storage engine Severity:S3 (Non-critical)
Version:5.1.49, 5.1.54 bzr OS:Any
Assigned to: Jimmy Yang CPU Architecture:Any

[20 Oct 2010 12:04] Shane Bester
Description:
got this in innodb status output:

Pages read ahead 0.00/s, evicted without access 0.00/s
LRU len: 2187061, unzip_LRU len: 0
I/O sum[18446744073709551603]:cur[0], unzip sum[0]:cur[0]

fprintf(file,
		"LRU len: %lu, unzip_LRU len: %lu\n"
		"I/O sum[%lu]:cur[%lu], unzip sum[%lu]:cur[%lu]\n",
		UT_LIST_GET_LEN(buf_pool->LRU),
		UT_LIST_GET_LEN(buf_pool->unzip_LRU),
		buf_LRU_stat_sum.io, buf_LRU_stat_cur.io,
		buf_LRU_stat_sum.unzip, buf_LRU_stat_cur.unzip);

So somewhere buf_LRU_stat_sum.io got a wrong value.
Only place that updates this value is the function:
buf_LRU_stat_update(void)

/* Add the current value and subtract the obsolete entry. */
	buf_LRU_stat_sum.io += buf_LRU_stat_cur.io - item->io;
	buf_LRU_stat_sum.unzip += buf_LRU_stat_cur.unzip - item->unzip;

How to repeat:
no idea yet.
[13 Nov 2010 11:31] Sveta Smirnova
Thank you for the report.

Verified as described.

To repeat modify sources as follow:

=== modified file 'storage/innodb_plugin/buf/buf0lru.c'
--- storage/innodb_plugin/buf/buf0lru.c 2010-05-20 13:07:34 +0000
+++ storage/innodb_plugin/buf/buf0lru.c 2010-11-12 21:05:31 +0000
@@ -1957,6 +1957,7 @@
 
        /* Add the current value and subtract the obsolete entry. */
        buf_LRU_stat_sum.io += buf_LRU_stat_cur.io - item->io;
+fprintf(stderr, "I/O sum[%lu], buf_LRU_stat_cur.io[%lu], item->io[%lu]\n", buf_LRU_stat_sum.io, buf_LRU_stat_cur.io, item->io);
        buf_LRU_stat_sum.unzip += buf_LRU_stat_cur.unzip - item->unzip;
 
        /* Put current entry in the array. */

Then start mysqld from mysql-test directory as:

./mtr innodb_plugin.innodb_bug39438  --mysqld=--innodb_flush_log_at_trx_commit=0 --start &

Then run sysbench:

$../client/mysql -uroot -S ./var/tmp/mysqld.1.sock
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.1.54-debug-log Source distribution

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.
This software comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to modify and redistribute it under the GPL v2 license

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database sbtest;
Query OK, 1 row affected (0.02 sec)

mysql> \q
Bye

$sysbench --test=oltp --mysql-table-engine=innodb --oltp-table-size=1000000 --mysql-socket=./var/tmp/mysqld.1.sock --mysql-user=root prepare

$sysbench --test=oltp --mysql-table-engine=innodb --oltp-table-size=1000000 --mysql-socket=./var/tmp/mysqld.1.sock --mysql-user=root --num-threads=16 --max-requests=100000  run

Wait when test finishes.

Then see in the error log:

I/O sum[938], buf_LRU_stat_cur.io[0], item->io[28]
I/O sum[918], buf_LRU_stat_cur.io[0], item->io[20]
I/O sum[892], buf_LRU_stat_cur.io[0], item->io[26]
I/O sum[870], buf_LRU_stat_cur.io[0], item->io[22]
I/O sum[851], buf_LRU_stat_cur.io[0], item->io[19]
I/O sum[835], buf_LRU_stat_cur.io[0], item->io[16]
I/O sum[691], buf_LRU_stat_cur.io[0], item->io[144]
I/O sum[555], buf_LRU_stat_cur.io[0], item->io[136]
I/O sum[18446744073709551520], buf_LRU_stat_cur.io[0], item->io[651]
I/O sum[18446744073709551520], buf_LRU_stat_cur.io[0], item->io[0]
I/O sum[18446744073709551520], buf_LRU_stat_cur.io[0], item->io[0]
I/O sum[18446744073709551520], buf_LRU_stat_cur.io[0], item->io[0]
I/O sum[18446744073709551520], buf_LRU_stat_cur.io[0], item->io[0]
I/O sum[18446744073709551520], buf_LRU_stat_cur.io[0], item->io[0]
I/O sum[18446744073709551520], buf_LRU_stat_cur.io[0], item->io[0]
...

Connect via mysql command line client and see:

mysql> show engine innodb status \G
*************************** 1. row ***************************
  Type: InnoDB
  Name: 
Status: 
=====================================
101113 14:30:37 INNODB MONITOR OUTPUT
=====================================
...
Pages read ahead 0.00/s, evicted without access 0.00/s
LRU len: 7291, unzip_LRU len: 0
I/O sum[18446744073709551520]:cur[0], unzip sum[0]:cur[0]
[13 Nov 2010 11:34] Sveta Smirnova
High write concurrency seems to be required to repeat the problem.
[13 Nov 2010 11:39] Sveta Smirnova
From error log:

  28 I/O sum[1084], buf_LRU_stat_cur.io[651], item->io[0]
...
 78 I/O sum[6105], buf_LRU_stat_cur.io[0], item->io[651]
  79 I/O sum[6104], buf_LRU_stat_cur.io[0], item->io[1]
...
2050 I/O sum[2975], buf_LRU_stat_cur.io[651], item->io[25]
...
2100 I/O sum[18446744073709551520], buf_LRU_stat_cur.io[0], item->io[651]
[3 Dec 2010 2:42] Jimmy Yang
Let's look at buf_LRU_stat_cur:

/** Current operation counters.  Not protected by any mutex.
Cleared by buf_LRU_stat_update(). */
extern buf_LRU_stat_t   buf_LRU_stat_cur;

Since it is not protected by any mutex, so it can changing while we do buf_LRU_stat_update()

buf_LRU_stat_update()
{
     ...
     buf_LRU_stat_sum.io += buf_LRU_stat_cur.io - item->io; <===

     ....
     memcpy(item, &buf_LRU_stat_cur, sizeof *item);<== here 
                                                   <== buf_LRU_stat_cur.io
                                                   <== could be a larger value
     ...
}

So when we copy buf_LRU_stat_cur to item, it could change to a larger value than the one when we add to buf_LRU_stat_sum.io. Thus result we have larger total value in item (buf_LRU_stat_arr[]) than total value in buf_LRU_stat_sum.
[3 Dec 2010 2:56] Jimmy Yang
Above scenario can be confirmed using following step:

1) Break in buf_LRU_stat_update right after following line

        /* Add the current value and subtract the obsolete entry. */
        buf_LRU_stat_sum.io += buf_LRU_stat_cur.io - item->io; <==

2) print buf_LRU_stat_cur.io, item->io and buf_LRU_stat_cur.io
(gdb) p buf_LRU_stat_cur.io
$13 = 32
(gdb) p item->io
$14 = 0
(gdb) p buf_LRU_stat_sum.io
$15 = 32

3) before this proceeds, let the thread loop

4) Do insertion in another session

5) Back to session 1, break the loop, you will find buf_LRU_stat_cur becomes a much large value:

(gdb)  p  buf_LRU_stat_cur.io
$18 = 1277

so it copies this much larger value to item(buf_LRU_stat_arr) than it added to buf_LRU_stat_sum.io. This will result a negative value when we loop back to fetch this item (buf_LRU_stat_arr) and subtract it.
[22 Dec 2010 21:31] Bugs System
Pushed into mysql-trunk 5.6.1 (revid:alexander.nozdrin@oracle.com-20101222212842-y0t3ibtd32wd9qaw) (version source revid:alexander.nozdrin@oracle.com-20101222212842-y0t3ibtd32wd9qaw) (merge vers: 5.6.1) (pib:24)
[8 Jan 2011 15:09] Bugs System
Pushed into mysql-5.5 5.5.9 (revid:vasil.dimov@oracle.com-20110108150508-gpanhz48z8069qot) (version source revid:vasil.dimov@oracle.com-20110108150048-b1y9m8xe72hay0ch) (merge vers: 5.5.9) (pib:24)
[8 Jan 2011 15:12] Bugs System
Pushed into mysql-5.1 5.1.55 (revid:vasil.dimov@oracle.com-20110108145923-0tjqdpa5w3d6tuwn) (version source revid:vasil.dimov@oracle.com-20110108145923-0tjqdpa5w3d6tuwn) (merge vers: 5.1.55) (pib:24)
[13 Jan 2011 19:15] John Russell
Adding to change log:

In InnoDB status output, the value for I/O sum[] could be incorrect,
displayed as a very large number.