Bug #68818 Large Value List in WHERE Clause Spends Too Much Time in 'statistics' State
Submitted: 30 Mar 2013 5:29 Modified: 1 Apr 2013 14:37
Reporter: Jervin R Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S3 (Non-critical)
Version:5.6.10 OS:Any
Assigned to: CPU Architecture:Any

[30 Mar 2013 5:29] Jervin R
Description:
If I have list of column values in the where clause like 'SELECT * FROM t WHERE col != 1 AND col != 2 ... AND col != 1000', the optimizer spends too much time on statistics state unlike on 5.5. The performance worsens, say you have 500 values, statistics time increases as the number of values grow.

Based on oprofile (I run the queries in concurrent manner to get enough samples) most of the time is spent on test_rb_tree:

[root@forge ~]# opreport --demangle=smart --symbols --merge tgid session:mysql_t2 /wok/bin/mysql/5.6.10/bin/mysqld
CPU: Intel Sandy Bridge microarchitecture, speed 3192.52 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
133116   90.4093  test_rb_tree(SEL_ARG*, SEL_ARG*)
4541      3.0841  Field_long::cmp(unsigned char const*, unsigned char const*)
2474      1.6803  SEL_ARG::insert(SEL_ARG*)
1842      1.2510  key_and(RANGE_OPT_PARAM*, SEL_ARG*, SEL_ARG*, unsigned int)
1521      1.0330  SEL_ARG::rb_insert(SEL_ARG*)
1240      0.8422  Field::key_cmp(unsigned char const*, unsigned char const*)
411       0.2791  SEL_ARG::find_range(SEL_ARG*)
311       0.2112  SEL_ARG::SEL_ARG(Field*, unsigned char, unsigned char*, unsigned char*, unsigned char, unsigned char, unsigned char)
289       0.1963  alloc_root
281       0.1908  Arg_comparator::compare_int_unsigned_signed()

Could this be a regression of the new equality range optimization? http://dev.mysql.com/doc/refman/5.6/en/equality-range-optimization.html

I've tried setting optimizer_switch to similar to 5.5, several smaller optimizer_search_depth values, also based on this bug http://bugs.mysql.com/bug.php?id=68145 tmp_table_size and max_heap_table_size but none has any effect.

How to repeat:
CREATE TABLE `t` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `u` int(10) unsigned NOT NULL,
  `cn` int(10) unsigned NOT NULL,
  `f` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`),
  KEY `h_u_cn_f` (`cn`,`f`,`u`),
  KEY `u` (`u`)
) ENGINE=InnoDB;

Sample data and query to follow.

Suggested fix:
Workaround is to use NOT IN, however this may not be always backward compatible with older applications that cannot immediately modify application code.
[30 Mar 2013 5:32] Jervin R
Test data, WARNING: 110M compressed, 23M rows.
http://dotmanila.com/dld/mysql-bug-68818-t.csv.tgz

Test query:
http://dotmanila.com/dld/mysql-bug-68818-t.sql
[30 Mar 2013 10:00] MySQL Verification Team
I'll say this is a duplicate of http://bugs.mysql.com/bug.php?id=68046 ?
[30 Mar 2013 10:35] Jervin R
Shane,

Looks like indeed this is a duplicate, however there is a difference in impact between teh multiple values versus IN(). In any case, if its the same code, feel free to tag as so. Thanks!
[30 Mar 2013 12:41] Ovais Tariq
Jervin,

According to my tests I do not think equality range optimization in 5.6 has to do anything with it, I tested by setting eq_range_index_dive_limit to 0 and the performance was same. Even by default eq_range_index_dive_limit=10 and I can see that the test query has more ranges than that, so that wouldn't have been used anyhow.
[30 Mar 2013 14:14] Ovais Tariq
I tested the query with the provided data on 5.6.5 and it completes in 0.29 seconds, which I guess confirms that this is the same bug as 68046.

mysql [localhost] {msandbox} (test) > select version();
+-----------+
| version() |
+-----------+
| 5.6.5-m8  |
+-----------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > set profiling=1;
Query OK, 0 rows affected (0.00 sec)

mysql [localhost] {msandbox} (test) > SELECT sql_no_cache COUNT(DISTINCT u) FROM t WHERE f = 75901 AND cn = 42233 AND (u != 1075 AND u != 1094 AND u != 1180 AND u != 1185 AND u != 1334 AND u != 1489 
.....
complete query as available here: http://dotmanila.com/dld/mysql-bug-68818-t.sql

+-------------------+
| COUNT(DISTINCT u) |
+-------------------+
|                 0 |
+-------------------+
1 row in set (0.29 sec)

mysql [localhost] {msandbox} (test) > show profile for query 1;
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.001413 |
| checking permissions | 0.000006 |
| Opening tables       | 0.000014 |
| System lock          | 0.000007 |
| init                 | 0.000316 |
| optimizing           | 0.000373 |
| statistics           | 0.288687 |
| preparing            | 0.000303 |
| executing            | 0.000054 |
| Sending data         | 0.002975 |
| end                  | 0.000004 |
| removing tmp table   | 0.000007 |
| end                  | 0.000003 |
| query end            | 0.000003 |
| closing tables       | 0.000006 |
| freeing items        | 0.000062 |
| cleaning up          | 0.000005 |
+----------------------+----------+
17 rows in set (0.00 sec)
[1 Apr 2013 14:16] MySQL Verification Team
mysql> source d:/tmp/mysql-bug-68818-t.sql
+-------------------+
| COUNT(DISTINCT u) |
+-------------------+
|               268 |
+-------------------+
1 row in set (0.26 sec)

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.6.12    |
+-----------+
1 row in set (0.00 sec)
[1 Apr 2013 14:37] MySQL Verification Team
Looks fixed indeed.  Here's my 5.6.10 output for comparison.
mysql> source d:/tmp/mysql-bug-68818-t.sql
+-------------------+
| COUNT(DISTINCT u) |
+-------------------+
|               268 |
+-------------------+
1 row in set (2.14 sec)

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.6.10    |
+-----------+
1 row in set (0.00 sec)