Bug #14182 Large query with subselects gets stuck in state 'statistics', can lock server
Submitted: 20 Oct 2005 14:16 Modified: 21 Oct 2005 10:37
Reporter: [ name withheld ] Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:4.1.13, 4.1.14, mysql.com RPMs OS:Linux (RHEL4, RH 9.0)
Assigned to: CPU Architecture:Any

[20 Oct 2005 14:16] [ name withheld ]
Description:
This is a little complex to describe but in short we have a very large query with subselects that is generated as part of our internal content management system. The first time this query runs it will get stuck in an undocumented state of 'statistics' (I've opened a doc bug for that and another undocumented state at bugid 13462). It will not ever finish and leave this state. The problem comes from a second instance of this query running. As soon as it starts it too will get stuck in 'statistics' but at that point all other queries against the same tables will block in a 'Locked' state. This blocking only happens as soon as a second query enters the 'statistics' state though. At that point everything on our servers fall over until both 'statistics' queries are killed.

We've witnessed this on both 4.1.13 and 4.1.14 on two different servers running RHEL4 and RH9 respectively.

The full processlist showing the query in question will be sent as a separate attachment as it is too large for inclusion here. I've included the table structure of the tables in question in the How To Repeat section below.

There seems to be two problems here. First is that these queries get stuck in 'statistics' and second is the blocking that happens when multiple instances of these queries occur.

Please let me know what additional information I can provide to help you track this down. It's causing stability problems for us.

Thanks,

How to repeat:

Table structures in question:

| content |CREATE TABLE `content` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `name` varchar(64) default NULL,
  `content_type_id` int(10) unsigned NOT NULL default '0',
  `content_class` enum('editorial','marketing','sales','tech') NOT NULL default 'editorial',
  `site_id` int(10) unsigned NOT NULL default '0',
  `status` enum('live','archived','hidden','deleted') NOT NULL default 'live',
  `note` text,
  `create_time` datetime NOT NULL default '0000-00-00 00:00:00',
  `updated` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  `lock_user_id` int(10) unsigned NOT NULL default '0',
  `lock_time` datetime NOT NULL default '0000-00-00 00:00:00',
  `live_version_id` int(10) unsigned NOT NULL default '0',
  `staging_version_id` int(10) unsigned NOT NULL default '0',
  `former_id` int(10) unsigned NOT NULL default '0',
  `former_content_type_id` int(11) NOT NULL default '0',
  PRIMARY KEY  (`id`),
  UNIQUE KEY `name` (`name`),
  KEY `content_type_id_idx` (`content_type_id`),
  KEY `status_idx` (`status`),
  KEY `site_id_idx` (`site_id`),
  KEY `idx_former_id` (`former_id`),
  KEY `idx_former_content_type_id` (`former_content_type_id`),
  KEY `idx_former_incarnation` (`former_content_type_id`,`former_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |

| stories |CREATE TABLE `stories` (
  `version_id` int(10) unsigned NOT NULL auto_increment,
  `content_id` int(10) unsigned NOT NULL default '0',
  `version_number` int(10) unsigned NOT NULL default '0',
  `version_status` enum('staging','live','archived','autosave') NOT NULL default 'staging',
  `version_updated` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  `version_owner` int(10) unsigned NOT NULL default '0',
  `version_note` text NOT NULL,
  `title` tinytext,
  `subtitle` tinytext,
  `redtext` tinytext,
  `url` text,
  `expdate` date NOT NULL default '0000-00-00',
  `flagdate` date NOT NULL default '0000-00-00',
  `template` tinytext,
  `oneliner` text,
  `summary` text,
  `story` text,
  `booking` text,
  `priority` int(11) NOT NULL default '0',
  `seq_name` varchar(50) default NULL,
  `seq_num` int(10) unsigned default NULL,
  PRIMARY KEY  (`version_id`),
  KEY `idx_story_group` (`seq_name`),
  KEY `idx_story_group_seq` (`seq_num`),
  KEY `idx_story_group_and_seq` (`seq_name`,`seq_num`),
  KEY `idx_content_id` (`content_id`),
  KEY `idx_status` (`version_status`),
  KEY `idx_content_id_status` (`version_status`,`content_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |

| content_relationships | CREATE TABLE `content_relationships` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `content_id` int(10) unsigned NOT NULL default '0',
  `content_version_id` int(10) unsigned NOT NULL default '0',
  `relationship_id` int(10) unsigned NOT NULL default '0',
  `related_content_id` int(10) unsigned NOT NULL default '0',
  `argument` int(11) NOT NULL default '0',
  PRIMARY KEY  (`id`),
  UNIQUE KEY `relationship` (`content_id`,`content_version_id`,`relationship_id`,`related_content_id`,`argument`),
  KEY `content_id_idx` (`content_id`),
  KEY `content_version_id_idx` (`content_version_id`),
  KEY `content_id_version_idx` (`content_id`,`content_version_id`),
  KEY `relationship_id_idx` (`relationship_id`),
  KEY `related_content_id_idx` (`related_content_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
[20 Oct 2005 14:18] [ name withheld ]
The large query associated with this bug report.

Attachment: statsquery.txt (text/plain), 18.78 KiB.

[20 Oct 2005 15:37] Valeriy Kravchuk
Thank you for a problem report. You'll be able to upload private files using the File tab (up to 200 K in size), but for now, please, just inform about the number of rows in tables involved. Or you may just send the results of EXPLAIN for that huge query from your machine.
[20 Oct 2005 15:42] [ name withheld ]
Here are the row counts for each table.

mysql> select count(*) from content;
+----------+
| count(*) |
+----------+
|   100636 |
+----------+
1 row in set (0.00 sec)
 
mysql> select count(*) from stories;
+----------+
| count(*) |
+----------+
|    10248 |
+----------+
1 row in set (0.00 sec)
 
mysql> select count(*) from content_relationships;
+----------+
| count(*) |
+----------+
|   271004 |
+----------+
1 row in set (0.00 sec)

I'll try to do the explain but it may take some time as I need to create a suitable development instance to run it on as I'm worried about running it on the production database servers where the bug is occuring.
[20 Oct 2005 15:50] [ name withheld ]
I've fired off an EXPLAIN on that query in one of our dev mysql instances (4.1.13 on RH9.0) and it's in a state of 'statistics' as well. I'll leave it running for a while, but I expect it may never return. I'll provide another update after it's run for a few hours or finished whichever comes first.

Thanks,

Tabor
[20 Oct 2005 16:09] [ name withheld ]
Sorry for the comment flurry. I've had to kill the EXPLAIN on that query because it was impacting the performance of that dev instance which is used for other development work. I'll re-run it later when the system is idle.

One thing I did note is that while it was in this state, mysql used up all available CPU on the system  for the 15 minutes it ran before being killed.

When I have more data to report on the EXPLAIN, I'll update the ticket again.

Thanks,

Tabor
[20 Oct 2005 17:42] MySQL Verification Team
Hi !

I have analysed your incomplete query (found in statement.txt) and it looks very much
a kind of problem that will be fixed by a new optimizer that will be cost based and is scheduled for MySQL 5.2.

You could also try 5.0 and try to set those optimizer variables (described in our manual) and see if you get any better timing, but I doubt it.
[20 Oct 2005 18:54] [ name withheld ]
Sinisa,

Thanks for the update. Just so that I understand more fully what's happening here is the following an accurate description of the issue?:

We have a ridiculously large query that isn't actually stuck but that is attempting to be optimized by the query optimizer (I'm assuming that's what the state of 'statistics' means -- gathering statistics about optimization?). Presumably if left alone for some indeterminate amount of time it will end up determining what the best execution method is and execute it. That amount of time could be hours or even days. At least that's how I read the text in http://dev.mysql.com/doc/refman/5.0/en/controlling-optimizer.html

So really our best choice is to optimize how our CMS creates that query to generate something much less taxing on the system.

But barring that 5.2 should have a fix which would allow this to be optimized more efficiently? I'm guessing the ETA on a stable 5.2 is probably a year or more off? That's unfortunate because I just upgraded from 4.0.24 to 4.1.13/14 to support this CMS. Upgrading again is undesirable but if the variables like optimizer_search_depth (I'm assuming that's what you're talking about) help we may do that.

One question though, why would the optimizer cause all queries against the tables to block if two of these very large queries are run at the same time? Why would any have to block at all if they're all read-only queries? That strikes me as ungraceful behavior on the part of the server.

Thanks for the help

Tabor
[21 Oct 2005 10:37] Sergei Golubchik
A couple of more comments.

First - our optimizer is cost based (at least since 3.22, but probably earlier).

Second - the problem is - you have too many tables in a join and MySQL is trying to chose a best join order by doing an exhaustive search over all possible combination (8!= 40320). Usually it doesn't have to do it, because indexes, statistics, etc allows to stop this process early, but it does not happen in yout case.

Workaround - use STRAIGHT_JOIN instead of INNER JOIN when you join content_relationships to itself.

Solution - upgrade to 5.0, it uses a different search logic (see the manual for "greedy search") that does not have to try all the combinations and is much faster when many tables are joined.