Bug #15935 update query: searching for rows to update: never-ending query
Submitted: 22 Dec 2005 11:59  Modified: 26 Jan 2006 0:33
Reporter: Xing Li
Status: Closed
Category: MySQL Server  Severity: S1 (Critical)
Version: 4.1.16  OS: Any
Assigned to: Sergey Petrunya  CPU Architecture: Any

[22 Dec 2005 11:59] Xing Li
Description:
Upgraded two servers, one CentOS 4.2 and one Ubuntu Breezy, from 4.1.15 to 4.1.16. After both were upgraded I started to experience an infinite-loop-like problem: MySQL eats CPU with an unstoppable query, causing load to spike to a few hundred.

Each time this happened was with a query like:

update mytable set sequence = '1' where primarykey = '12345' and sequence = '2'

The query status shows it is "searching for rows to update", which looks strange since the first condition is on a primary key, so it should only need to touch the index to find the rows.

The query runs forever (I didn't actually wait forever, but you get the point). The table is about 6 million rows and 22GB. Either the query went into an infinite loop, or the above UPDATE for some reason did a full scan instead of using the index.

Please note the actual table structure and real queries are in the private comment area (dev-only) for this bug note.

How to repeat:
Leave the server running 4.1.16 for a few hours.
[22 Dec 2005 23:46] Chris Clarke
We've run into the same issue on our 4.1.16 table.

When updating a column that is contained in a key, the query sits in the "Searching rows for update" state for an inordinate amount of time.
[23 Dec 2005 0:23] Chris Clarke
For us, rolling back to 4.1.15 appears to have resolved this issue temporarily.
[28 Dec 2005 0:43] Xing Li
Any update on this bug? Like Chris, I had to roll back to 4.1.15 as well.
[28 Dec 2005 8:29] Valeriy Kravchuk
Xing Li:

Please, send the results of the following statements:

select * from storytexts where storytextid = '12345' and storyid =
'1234' and chapter != '2' limit 1;

select * from storytexts where storytextid = '12345' and storyid =
'1234' limit 1;

show index from storytexts;

(I sent this question as a private comment, so you may not have received it. Sorry.)

Chris Clarke:

Can you please provide a simple set of steps to repeat the problem (table definition, table data, and other SQL statements that demonstrate it)?
[28 Dec 2005 17:37] Chris Elsworth
I'd like to chip in with a "me too" since upgrading to 4.1.16.
My case is a bit simpler; the query (from replication) is:
UPDATE News SET PostID = 1608665 WHERE PostID IN (1608140)

However, the numbers do vary; I'm seeing this repeated quite a lot.

mysql> show create table News\G
*************************** 1. row ***************************
       Table: News
Create Table: CREATE TABLE `News` (
  `FileID` int(8) unsigned NOT NULL auto_increment,
  `PostID` mediumint(8) unsigned NOT NULL default '0',
  `Subject` varchar(255) NOT NULL default '',
  `Category` tinyint(3) unsigned NOT NULL default '0',
  `SubCategory` smallint(5) unsigned NOT NULL default '0',
  `Date` int(10) unsigned NOT NULL default '0',
  `Parts` smallint(5) unsigned NOT NULL default '0',
  `Bytes` int(10) unsigned NOT NULL default '0',
  `fi_junk` int(10) unsigned NOT NULL default '0',
  `AuthorID` mediumint(8) unsigned NOT NULL default '0',
  `isNFO` tinyint(3) unsigned NOT NULL default '0',
  `SegFirstTime` int(10) unsigned NOT NULL default '0',
  `SegLastTime` int(10) unsigned NOT NULL default '0',
  `Segments` smallint(5) unsigned NOT NULL default '0',
  `SegmentsSeen` smallint(5) unsigned NOT NULL default '0',
  `GroupsHash` mediumint(8) unsigned NOT NULL default '0',
  PRIMARY KEY  (`FileID`),
  KEY `C_D_P_j` (`Category`,`Date`,`PostID`,`fi_junk`),
  KEY `P_j_F` (`PostID`,`fi_junk`,`FileID`),
  KEY `A_j_P_D` (`AuthorID`,`fi_junk`,`PostID`,`Date`),
  KEY `C_P_D_j` (`Category`,`PostID`,`Date`,`fi_junk`),
  KEY `Date` (`Date`),
  KEY `S_D` (`SubCategory`,`Date`),
  KEY `Subject` (`Subject`(10)),
  KEY `C_D` (`Category`,`Date`),
  KEY `C_B` (`Category`,`Bytes`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.48 sec)

A SELECT performs as you'd expect:
mysql> SELECT COUNT(*) FROM News WHERE PostID IN (1608140);
+----------+
| COUNT(*) |
+----------+
|       33 |
+----------+
1 row in set (0.00 sec)

Running the query in the mysql client also takes ages. However, I am not seeing a complete lock as implied in the other comments; the queries just take a very long time, up to 2 minutes each, leading to replication falling behind.
I have 2 slaves with this InnoDB table, only one is 4.1.16 and exhibiting the problem. The other, 4.1.14, is not.

I'm trying an OPTIMIZE TABLE (InnoDB will just recreate the entire thing, I know) to see if it fixes it, and will report back if it does. (7.5 million rows, it'll take a while.)
[28 Dec 2005 17:42] Chris Elsworth
Sorry for the second comment so soon. The OPTIMIZE finished and made no difference. Thought you might like this too (the number has changed; this is the query it's currently trying to do an UPDATE on):

mysql> explain SELECT COUNT(*) FROM News WHERE PostID IN (1608139);
+----+-------------+-------+------+---------------+-------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra                    |
+----+-------------+-------+------+---------------+-------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | News  | ref  | P_j_F         | P_j_F |       3 | const |   15 | Using where; Using index |
+----+-------------+-------+------+---------------+-------+---------+-------+------+--------------------------+
[28 Dec 2005 21:54] Greg Whalin
Just throwing in another me too. I also rolled back (in my case to 4.1.9) to resolve the issue. I was seeing this happen on a slave machine across several of our tables. I did not have time to debug the issue, however; this machine was running Fedora Core 2 on a dual Opteron w/ the binary tarball from mysql.com.
[30 Dec 2005 18:07] Valeriy Kravchuk
All reporters:

Please, send the SHOW PROCESSLIST results for the period, when your updates "hang" on 4.1.16.

Chris Elsworth:

How many rows are there in your News table? With a small table I was not able to repeat the problem you described.
[1 Jan 2006 19:17] Chris Elsworth
There are 7497970 rows in the table. The problem is still present:

mysql> UPDATE News SET PostID = 1608665 WHERE PostID IN (1608140);
Query OK, 0 rows affected (38.45 sec)
Rows matched: 0  Changed: 0  Warnings: 0

Avg_row_length is 203, making it around 1.5GB before indexes - a bit big to send to you :(

Show processlist reports:
Command: Query
   Time: 37
  State: Searching rows for update
   Info: UPDATE News SET PostID = 1608665 WHERE PostID IN (1608140)
[5 Jan 2006 19:07] Steven Roussey
OK, this is one of those problems that scares me, because it will be hard to track down why it is occurring. The best thing I have to offer is that it is something in the changeset from 4.1.15 to 4.1.16.

Anyhow, the story: at the beginning of the year we updated our db servers from 4.1.15 to 4.1.16. For the next several days we had intermittent problems, more easily seen in a minute-by-minute report of web traffic and other metrics. The webservers were timing out on the application servers, which we tracked down to waiting on MySQL, which was locking tables (MyISAM) due to simple updates taking several minutes to update (or not) a single record:

# Time: 060104 11:59:27
# User@Host: [apache] @ dozer.i [10.1.1.19]
# Query_time: 148  Lock_time: 0  Rows_sent: 0  Rows_examined: 0 
update forums_posts_new_0 set deleted = 'yes' where messageid = yyyyyyyyy AND forumid = xxxxxxx;

Just for reference:

explain select  deleted from forums_posts_new_0 where messageid = yyyyyyyyy AND forumid = xxxxxxx;
+----+-------------+--------------------+-------+----------------------------+---------+---------+-------------+------+-------+
| id | select_type | table              | type  | possible_keys              | key     | key_len | ref         | rows | Extra |
+----+-------------+--------------------+-------+----------------------------+---------+---------+-------------+------+-------+
|  1 | SIMPLE      | forums_posts_new_0 | const | PRIMARY,status,parent,root | PRIMARY |       8 | const,const |    1 |       |
+----+-------------+--------------------+-------+----------------------------+---------+---------+-------------+------+-------+
1 row in set (0.00 sec)

This would normally take less than a second, though these days it can often take over a second. The PRIMARY KEY is (forumid,messageid). Updates like this took quite a while on 4.1.16, but not after returning to 4.1.15. It obviously does not happen every time, only occasionally.

Oddly, the really slow update seems not to find the row, if I understand the slow query log correctly (Rows_examined: 0), but the select finds it very quickly.

RHEL 4
[5 Jan 2006 22:40] Pete Harlan
I have found that on 4.1.16, updates on our InnoDB tables no longer use prefixes of compound keys. This is on the precompiled x86_64 version. Selects use them; just not updates. An "explain select" looks normal. OPTIMIZE/ANALYZE TABLE doesn't make a difference.
[5 Jan 2006 22:57] Steven Roussey
BTW: I'm using MyISAM
[5 Jan 2006 22:59] Pete Harlan
I have since verified that we see the same behavior on our MyISAM tables as on our InnoDB tables.
[7 Jan 2006 2:36] Xing Li
Just posting to see if the bug status can be upgraded from feedback to something else. My earlier private message already contains the show processlist result.
[9 Jan 2006 13:09] Andreas Müller
Hello,
I can confirm the same problem after upgrading from 4.1.15 to 4.1.16 on a Fedora Core 4 machine.

A select with this condition works absolutely normally - EXPLAIN shows the correct keys.

So it seems there really is a bug - and it should be fixed.

Regards,
Andreas
[10 Jan 2006 16:39] Valeriy Kravchuk
All reporters:

So far the bug has been reported only with replication in use (and I am not able to repeat it with a simpler and smaller test case not involving replication). So please clarify: do you use replication, and if you do, is this bug observed on the master, on the slave (which looks true for most reports), or on both servers?
[10 Jan 2006 17:37] Pete Harlan
In our case (with the index prefixes not being used for an update), there is no replication.  This is repeatable on a standalone setup.

If you cannot find a test case, I can see if I can generate something reproducible here with nonsensitive data.  Part of the difficulty is that the number of records has to be large enough that I can tell whether the index is being used; is there a way to determine that without simply timing the update query?

Thank you for looking into this.
[10 Jan 2006 20:22] Steven Roussey
In our case, we are using replication, and I notice it on the master. The slave is for backup snapshots, so I haven't really looked at that, but I'll try.

Too bad you can't do an EXPLAIN UPDATE..., though that may not help.
[10 Jan 2006 20:32] Xing Li
Happened on both my master and slaves.
[14 Jan 2006 10:38] Valeriy Kravchuk
Test case as a separate file

Attachment: bug15935.txt (text/plain), 10.13 KiB.

[14 Jan 2006 10:45] Valeriy Kravchuk
I verified that UPDATE time grows with the number of rows in the table, while SELECT time for the same rows does not change significantly. The entire test case is uploaded as a separate file, but here is the main idea:

1. Start server with --log-slow-queries=/tmp/slow.log and --log-queries-not-using-indexes, to be able to register slow queries and queries not using indexes.

2. Create table with index having multiple columns:

CREATE TABLE `News` (
`FileID` int(8) unsigned NOT NULL auto_increment,
`PostID` mediumint(8) unsigned NOT NULL default '0',
`Subject` varchar(255) NOT NULL default '',
`Category` tinyint(3) unsigned NOT NULL default '0',
`SubCategory` smallint(5) unsigned NOT NULL default '0',
`Date` int(10) unsigned NOT NULL default '0',
`Parts` smallint(5) unsigned NOT NULL default '0',
`Bytes` int(10) unsigned NOT NULL default '0',
`fi_junk` int(10) unsigned NOT NULL default '0',
`AuthorID` mediumint(8) unsigned NOT NULL default '0',
`isNFO` tinyint(3) unsigned NOT NULL default '0',
`SegFirstTime` int(10) unsigned NOT NULL default '0',
`SegLastTime` int(10) unsigned NOT NULL default '0',
`Segments` smallint(5) unsigned NOT NULL default '0',
`SegmentsSeen` smallint(5) unsigned NOT NULL default '0',
`GroupsHash` mediumint(8) unsigned NOT NULL default '0',
PRIMARY KEY  (`FileID`),
KEY `C_D_P_j` (`Category`,`Date`,`PostID`,`fi_junk`),
KEY `P_j_F` (`PostID`,`fi_junk`,`FileID`),
KEY `A_j_P_D` (`AuthorID`,`fi_junk`,`PostID`,`Date`),
KEY `C_P_D_j` (`Category`,`PostID`,`Date`,`fi_junk`),
KEY `Date` (`Date`),
KEY `S_D` (`SubCategory`,`Date`),
KEY `Subject` (`Subject`(10)),
KEY `C_D` (`Category`,`Date`),
KEY `C_B` (`Category`,`Bytes`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

3. Fill the table with some dummy data, up to several thousand rows:

mysql> insert into News (PostID) values (1), (2), (3), (4);
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> insert into News (PostID) select PostID from News;
Query OK, 4 rows affected (0.03 sec)
Records: 4  Duplicates: 0  Warnings: 0

...

mysql> insert into News (PostID) select PostID from News;
Query OK, 16400 rows affected (11.02 sec)
Records: 16400  Duplicates: 0  Warnings: 0

4. Add 3 specific rows:

mysql> insert into News (PostID) values (1608140);
Query OK, 1 row affected (0.00 sec)

mysql> insert into News (PostID) values (1608140);
Query OK, 1 row affected (0.01 sec)

mysql> insert into News (PostID) values (1608140);
Query OK, 1 row affected (0.00 sec)

5. Check, that index is used for SELECT:

mysql> explain SELECT COUNT(*) FROM News WHERE PostID IN (1608140);
+----+-------------+-------+------+---------------+-------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra                    |
+----+-------------+-------+------+---------------+-------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | News  | ref  | P_j_F         | P_j_F |       3 | const |    3 | Using where; Using index |
+----+-------------+-------+------+---------------+-------+---------+-------+------+--------------------------+
1 row in set (0.00 sec)

mysql> SELECT COUNT(*) FROM News WHERE PostID IN (1608140);
+----------+
| COUNT(*) |
+----------+
|        3 |
+----------+
1 row in set (0.01 sec)

6. Try to UPDATE:

mysql> UPDATE News SET PostID = 1608665 WHERE PostID IN (1608140);
Query OK, 3 rows affected (1.43 sec)
Rows matched: 3  Changed: 3  Warnings: 0

7. So this update of 3 rows out of 16K+ took 1.43 seconds. Surely the index is not used, yet the UPDATE was not logged as a query not using indexes. Let's double the number of rows once more:

[root@Fedora tmp]# /usr/bin/mysql -uroot test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4 to server version: 4.1.16-standard-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> update News set PostID=1;
Query OK, 3 rows affected (0.75 sec)
Rows matched: 32803  Changed: 3  Warnings: 0

mysql> insert into News (PostID) select PostID from News;
Query OK, 32803 rows affected (18.20 sec)
Records: 32803  Duplicates: 0  Warnings: 0

mysql> insert into News (PostID) values (1608140);
Query OK, 1 row affected (0.01 sec)

mysql> insert into News (PostID) values (1608140);
Query OK, 1 row affected (0.00 sec)

mysql> insert into News (PostID) values (1608140);
Query OK, 1 row affected (0.01 sec)

mysql> SELECT COUNT(*) FROM News WHERE PostID IN (1608140);
+----------+
| COUNT(*) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)

mysql> UPDATE News SET PostID = 1608665 WHERE PostID IN (1608140);
Query OK, 3 rows affected (3.25 sec)
Rows matched: 3  Changed: 3  Warnings: 0

mysql> exit
Bye

8. Check the slow.log:

[root@Fedora tmp]# tail /tmp/slow.log
insert into News (PostID) select PostID from News;
# Time: 060114 13:14:16
# User@Host: root[root] @ localhost []
# Query_time: 0  Lock_time: 0  Rows_sent: 0  Rows_examined: 0
update News set PostID=1;
# Time: 060114 13:14:41
# User@Host: root[root] @ localhost []
# Query_time: 18  Lock_time: 0  Rows_sent: 0  Rows_examined: 65606
SET last_insert_id=32804,insert_id=32804;
insert into News (PostID) select PostID from News;

So the SELECT is instant, while the UPDATE took about twice as long on data twice as large. I believe this is enough evidence that the index is NOT USED FOR UPDATE. And (!) it looks like the server thinks this query does use an index, by the way.
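The doubling pattern above can be captured in a toy cost model (a hypothetical sketch for illustrating the diagnosis, not MySQL internals): a full index scan costs time proportional to the whole table, while a range scan costs time proportional to the matching rows.

```python
# Toy cost model (illustrative only): how a full index scan and an
# index range scan respond to doubling the table size.

def full_index_scan_cost(table_rows):
    # A full scan must examine every index entry in the table.
    return table_rows

def range_scan_cost(matching_rows):
    # A range scan examines only the entries matching the WHERE condition.
    return matching_rows

# The 32K-row table vs the doubled 65K-row table, 3 matching rows in each.
small, large, matches = 32803, 65606, 3

# UPDATE time doubled with the table size -> consistent with a full scan.
assert full_index_scan_cost(large) == 2 * full_index_scan_cost(small)

# SELECT time stayed constant -> consistent with a range scan.
assert range_scan_cost(matches) == range_scan_cost(matches)
```

This is exactly the evidence pattern used above: the access path reveals itself through how the query time scales with table size.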
[20 Jan 2006 10:00] MySQL Verification Team
Attached a snippet of two traces from debug binaries of 4.1.15 and 4.1.16.

Attachment: bug15935_traces.zip (application/zip, text), 4.95 KiB.

[20 Jan 2006 10:29] MySQL Verification Team
Could the fix for http://bugs.mysql.com/bug.php?id=12915 (UPDATE/DELETE with ORDER BY, index, LIMIT not optimized well) have anything to do with this bug?
[25 Jan 2006 20:22] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/1626
[25 Jan 2006 20:58] Steven Roussey
Just a note that it would have been easier to show that the bug existed if EXPLAIN worked for UPDATE (and DELETE). It would be nice to see this added some day.
[25 Jan 2006 21:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/1634
[25 Jan 2006 22:45] Sergey Petrunya
The fix has been also reviewed by Evgen Potemkin.
[25 Jan 2006 23:00] Sergey Petrunya
The fix has been pushed into 4.1.18 tree.

Notes for the changelog:
The bug affected execution of single-table UPDATE queries without an ORDER BY clause, such that
 * the best way to search for records to update is to scan a range on some index IDX1, and
 * the field being modified is covered by the index IDX1.
The bug was that for these queries the entire index IDX1 was scanned instead of just the range.
The resolution is that this no longer happens.

The bug was introduced when adding another optimization: BUG#12915.
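The affected shape described above can be checked mechanically. A minimal sketch, with a hypothetical helper name and simplified index metadata (not MySQL code): a query has the shape if some index both starts with the WHERE columns (so a range scan on it is the natural access path) and contains a column the UPDATE modifies.

```python
# Hypothetical checker for the affected query shape (illustration only).
def is_affected(updated_cols, where_cols, indexes):
    """Return True if some index has where_cols as a prefix (so a range
    scan on it is the natural access path) AND contains a column the
    UPDATE modifies -- the combination hit by this bug."""
    for cols in indexes.values():
        prefix_ok = cols[:len(where_cols)] == where_cols
        if prefix_ok and set(updated_cols) & set(cols):
            return True
    return False

# UPDATE News SET PostID=... WHERE PostID=...: index P_j_F starts with
# PostID and also contains PostID, so the query has the affected shape.
indexes = {"P_j_F": ["PostID", "fi_junk", "FileID"], "Date": ["Date"]}
print(is_affected({"PostID"}, ["PostID"], indexes))  # True
```

This matches the reports above: every slow UPDATE filtered on and modified the same indexed column.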
[26 Jan 2006 0:33] Mike Hillyer
Noted in 4.1.18 changelog:

   <listitem>
        <para>
          Single table <literal>UPDATE</literal> statements without
          <literal>ORDER BY</literal> clauses which updated the same indexed
          column that was being filtered on were optimized with a full
          index scan instead of a more appropriate index range scan.
          (Bug #15935)
        </para>
      </listitem>