| Bug #2565 | subselect poor performance (shockingly) | ||
|---|---|---|---|
| Submitted: | 29 Jan 2004 17:14 | Modified: | 18 Feb 2004 3:41 |
| Reporter: | Carson Battlelite | Email Updates: | |
| Status: | Not a Bug | Impact on me: | |
| Category: | MySQL Server | Severity: | S1 (Critical) |
| Version: | 4.1.1 and above | OS: | ANY PLATFORM |
| Assigned to: | Oleksandr Byelkin | CPU Architecture: | Any |
[2 Feb 2004 15:35]
Carson Battlelite
Have tested this on Windows 2000 - same performance problems
[2 Feb 2004 16:10]
Carson Battlelite
generates three data files and an sql file to load the data
Attachment: buggy.pl (text/plain), 2.21 KiB.
[2 Feb 2004 16:10]
Carson Battlelite
I grabbed 4.1.1a-alpha-nt and had the same performance problems - generated from buggy.pl I will attempt to add buggy.pl to the files so that people don't have to cut and paste
[2 Feb 2004 16:18]
Carson Battlelite
I have found that rewriting the query not to use subqueries performs just as badly i.e. SELECT count(*) FROM sess s, inf i, act a WHERE s.id = i.session_id AND i.action_id = a.id and a.code = 'GOOD'
[18 Feb 2004 3:41]
Oleksandr Byelkin
Thank you for bugreport, it was really helpful to testing subqueries performance! Of course list of constants will be faster then using tables (especially because MySQL sort constants before serching). I think subqueries should be compared with JOINs in this case to be identical. If this query will be rewriten with JOINs it will be slowly then subquery. I have tested following 2 queries: SELECT count(*) FROM sess s, inf i WHERE s.id = i.session_id AND i.action_id IN (SELECT id FROM act WHERE code = 'GOOD'); results are (sec.) 7.43, 9.04, 8.33, 8.07 SELECT count(*) FROM sess s, inf i, act WHERE s.id = i.session_id AND i.action_id = act.id and act.code = 'GOOD' results are (sec.) 9.01, 9.58, 9.66, 10.24 i.e. subquery is faster then join in this case (because of special optimisation for subqueries like "value IN (SELECT key_field FROM single_table WHERE some_conditions)" because such subqueries are widely used. In some of 5.x version we plan to add optimisation for IN subqueries with hash tables and in this case it will be about same performance as constant list. But you can't compare constant list with subquery, because: 1. subqery you can use everywere, constant list only if you completely sure that query with constants fit in maximum length of command 2. for subqueries all tables locked at one moment (or it will be done alwais as one transaction), for variant with constant and autocommit mode you can get inconsistent results, because tables can be changed between 2 queries (it is not important for you application as I understand but it is still argument :)

Description: Please see the repeat section. How to repeat: I have pasted a perl script (buggy.pl) that demonstrates the problem. output from the script is of the following form: sysdate query with subquery sysdate query with (subquery expanded by hand) sysdate 87 secs versus 7 secs seems to imply the subquery is run for every record in the outer query??? Of course please read the perl to see that I only create four files, three data files and an sql file. Please also read the sql file to see the point of the demonstration. Change the values for max_session and/or infs_per_session if needed. On a unix box ./buggy.pl /usr/local/mysql/bin/mysql < inf_sess_act.sql On my box with the current values in the perl script I get SYSDATE() 2004-01-30 11:47:09 count(*) 318182 SYSDATE() 2004-01-30 11:48:36 count(*) 318182 SYSDATE() 2004-01-30 11:48:43 ######################### PERL SCRIPT START ############################## #!/usr/bin/perl $max_session = 5000; $infs_per_session = 140; $max_inf = $infs_per_session * $max_session; @sessions = (1..$max_session); foreach $i (reverse 0 .. @sessions ) { #remove every 13th entry if (!($i % 13)) { splice(@sessions, $i, 1); } } @infs = (1..$max_inf); foreach $i (reverse 0 .. @infs ) { #remove every 11th entry if (!($i % 11)) { splice(@infs, $i, 1); } } foreach $i (0..@infs) { $inf_sess{$i} = $sessions[$i % @sessions]; } @action = (0..9); foreach $i (@action) { $act{$i} = (($i + 1)% 2) ? "'GOOD'" : "'BAD'"; push(@good_action, $i) if (($i + 1)% 2) ; } open(SESSION, "> session.dat"); print SESSION join("\n", @sessions),"\n"; close SESSION; open(INF, "> inf.dat"); foreach $a (keys %inf_sess) { print INF "$a,$inf_sess{$a},$action[$a % @action]\n"; } close INF; open(ACTION, "> action.dat"); foreach $a (sort {$a <=> $b} keys %act) { print ACTION "$a,$act{$a}\n"; } close ACTION ; open(SCRIPT, "> inf_sess_act.sql"); print SCRIPT "USE TEST;\n"; print SCRIPT "DROP TABLE IF EXISTS sess;\n"; print SCRIPT "DROP TABLE IF EXISTS inf;\n"; print SCRIPT "DROP TABLE IF EXISTS act;\n"; print SCRIPT "create table sess (id integer, PRIMARY KEY (id));\n"; print SCRIPT "create table inf (id integer, session_id integer, action_id integer,PRIMARY KEY (id));\n"; print SCRIPT "create table act (id integer, code varchar(20), PRIMARY KEY (id));\n"; print SCRIPT "LOAD DATA LOCAL INFILE 'session.dat' INTO TABLE sess FIELDS TERMINATED BY ',';\n"; print SCRIPT "LOAD DATA LOCAL INFILE 'inf.dat' INTO TABLE inf FIELDS TERMINATED BY ',';\n"; print SCRIPT "LOAD DATA LOCAL INFILE 'action.dat' INTO TABLE act FIELDS TERMINATED BY ',' ENCLOSED BY \"'\";\n"; print SCRIPT "SELECT SYSDATE() FROM DUAL;\n"; print SCRIPT "SELECT count(*) FROM sess s, inf i WHERE s.id = i.session_id AND i.action_id IN (SELECT id FROM act WHERE code = 'GOOD');\n"; print SCRIPT "SELECT SYSDATE() FROM DUAL;\n"; print SCRIPT "SELECT count(*) FROM sess s, inf i WHERE s.id = i.session_id AND i.action_id IN (" . join(",", @good_action) . ");\n"; print SCRIPT "SELECT SYSDATE() FROM DUAL;\n"; close SCRIPT ; ######################### PERL SCRIPT END ############################## Suggested fix: If there is no functional dependency from the inner query to the outer query then the results should be cached as per the second query in the example.