MySQL Bugs: #41109: Query takes >5 hrs and locks db, when optimised manually takes <1s

Bug #41109	Query takes >5 hrs and locks db, when optimised manually takes <1s
Submitted:	28 Nov 2008 17:13	Modified:	1 Apr 2009 6:51
Reporter:	Anthony Wright	Email Updates:
Status:	Verified	Impact on me:	None
Category:	MySQL Server: Optimizer	Severity:	S2 (Serious)
Version:	5.1.30, 5.0, 5.1, 6.0 bzr	OS:	Any
Assigned to:		CPU Architecture:	Any
Tags:	regression

Description:
Moodle is a system used by a large number of colleges and schools which can use MySQL as the backend database. However moodle isn't very good at writing SQL, so tends to create very convoluted SQL which could be re-written in a much simpler form.

Within moodle there is a feature that allows sub-databases to be created within its mysql database and for kids to search through the database. Unfortunately by entering anything more than simple search criteria, a user can cause moodle to generate SQL which takes forever to run, and locks the database while it does it, effectively pulling down the whole site for all users. This is a lot of fun for the pupils that have discovered it, and a nightmare for everybody else.

While the SQL generated by moodle is fairly convoluted, it can very easily be re-formed into something that runs very quickly.

How to repeat:
This is a query that goes wrong:

SELECT  c.recordid  FROM mdl_data_content c,mdl_data_records r,mdl_data_content cs, mdl_user u , mdl_data_content c144 , mdl_data_content c145 , mdl_data_content c146 , mdl_data_content c147 , mdl_data_content c148 , mdl_data_content c149 , mdl_data_content c150 , mdl_data_content c151 , mdl_data_content c152 , mdl_data_content c153 , mdl_data_content c155 , mdl_data_content c156  WHERE c.recordid = r.id
                         AND r.dataid = 16
                         AND r.userid = u.id
                         AND cs.recordid = r.id  AND c144.recordid = r.id AND c145.recordid = r.id AND c146.recordid = r.id AND c147.recordid = r.id AND c148.recordid = r.id AND c149.recordid = r.id AND c150.recordid = r.id AND c151.recordid = r.id AND c152.recordid = r.id AND c153.recordid = r.id AND c155.recordid = r.id AND c156.recordid = r.id;

This is an query that does exactly the same thing, but completes in <1s:

SELECT  c.recordid  FROM mdl_data_content c,mdl_data_records r,mdl_user u WHERE c.recordid = r.id
                         AND r.dataid = 16
                         AND r.userid = u.id ;

I will attach an archive that contains these two queries and a mysql dump that can be loaded into a clean 5.0.67 mysql install to run the queries against.

archive containing dump and copies of faultly and manually optimised queries

Attachment: mysql-bug.tar.bz2 (application/x-bzip2, text), 12.89 KiB.

Can you try to explain, please, how (any) query optimizer may find out that the first query is "equivalent" to the second? In any other way than doing them boss and comparing result sets?

The query that goes wrong repeats itself over and over again.

The table mdl_data_content is referenced as c,cs,c144,c145,c146,c147,c148,c149,c150,c151,c152,c153,c154,c155 & c156.

The query then goes on to select where:
    c.recordid=r.id,
    cs.recordid=r.id,
    c144.record_id=r.id,
    etc.

for each one of these references. Which is the equivalent of:
    mdl_data_content.recordid=r.id,
    mdl_data_content.recordid=r.id,
    mdl_data_content.recordid=r.id,
    etc.

By simply removing this duplication the query completes very quickly.

I realise that the code that produced the query is fairly dumb, but I would expect MySQL to spot this redundancy in the SQL.

Ok, I've made a fairly stupid mistake here. The SQL I posted is broken, so there is a bug in moodle which needs to be fixed.

However my quick test of the obvious fix isn't going to work, and I'm back with a problem in the MySQL optimiser. The obvious fix would be to change:

SELECT  c.recordid  FROM ...

to

SELECT  DISTINCT(c.recordid)  FROM ...

This should allow mysql to optimise away all the c144, c145 references, but it doesn't seem to have the desired effect.

I've just tested this on 5.1.30.

Can you, please, send a complete test case, with table(s) used and some data in them, to demonstrate the problem?

A database dump was attached with the initial report

The query that fails is:

SELECT  DISTINCT(c.recordid)  FROM mdl_data_content c,mdl_data_records r,mdl_data_content cs,
mdl_user u , mdl_data_content c144 , mdl_data_content c145 , mdl_data_content c146 ,
mdl_data_content c147 , mdl_data_content c148 , mdl_data_content c149 , mdl_data_content
c150 , mdl_data_content c151 , mdl_data_content c152 , mdl_data_content c153 ,
mdl_data_content c155 , mdl_data_content c156  WHERE c.recordid = r.id
                         AND r.dataid = 16
                         AND r.userid = u.id
                         AND cs.recordid = r.id  AND c144.recordid = r.id AND
c145.recordid = r.id AND c146.recordid = r.id AND c147.recordid = r.id AND c148.recordid =
r.id AND c149.recordid = r.id AND c150.recordid = r.id AND c151.recordid = r.id AND
c152.recordid = r.id AND c153.recordid = r.id AND c155.recordid = r.id AND c156.recordid =
r.id;

This is an query that does exactly the same thing, but completes in <1s:

SELECT  DISTINCT(c.recordid)  FROM mdl_data_content c,mdl_data_records r,mdl_user u WHERE c.recordid
= r.id
                         AND r.dataid = 16
                         AND r.userid = u.id ;

Thank you for the feedback.

Verified as described. Although problem query is surplus in version 4.1 in only took 7 sec on my box instead of current more than 500 (when I stopped waiting).

test case for the testsuite

Attachment: bug41109.test (application/octet-stream, text), 54.83 KiB.