Bug #55645 LIKE search is slow using varbinary indexes on cluster
Submitted: 30 Jul 2010 9:16
Reporter: Mikiya Okuno Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:mysql-5.1-telco-7.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: 7.0.14
Triage: Triaged: D4 (Minor) / R6 (Needs Assessment) / E6 (Needs Assessment)

[30 Jul 2010 9:16] Mikiya Okuno
Description:
If a column type is varbinary and the column is indexed, LIKE search (prefix search) over that column is very slow. The problem doesn't happen on varchar or binary columns.

How to repeat:
1. Create a table like below

CREATE TABLE t1 (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  c1 varbinary(10) NOT NULL DEFAULT '',
  KEY c1 (c1)
) ENGINE=ndbcluster;

2. Populate the table

insert into t1 select NULL, rand() * 10000000000.0;

insert into t1 select NULL, rand() * 10000000000.0 from t1;

* repeat the later command until 0.5M rows or so

3. Do select!

select * from t1 where c1 like '123%';

The problem doesn't happen if the range is specified using BETWEEN.

select * from t1 where c1 between '123000000' and '123999999';

The problem doesn't happen if the column c1 is converted to varchar or binary.

Suggested fix:
no idea
[16 Aug 2010 10:57] Hartmut Holzgraefe
With VARCHAR both the LIKE and BETWEEN queries are pushed down to ndb and return results immediately.

With VARBINARY the LIKE variant still uses pushdown but takes noticeable time (e.g. about 0.1s for a 64K row table. BETWEEN on the other hand does not cause a pushdown and returns results immediately.

  DROP TABLE IF EXISTS t1;

  CREATE TABLE t1 (
      id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      vc VARCHAR(10) NOT NULL DEFAULT '',
      KEY vc (vc),
      vb VARBINARY(10) NOT NULL DEFAULT '',
      KEY vb (vb)
    ) ENGINE=ndbcluster;

  INSERT INTO t1 SELECT NULL, rand() * 10000000000.0, rand() * 10000000000.0;
  INSERT INTO t1 SELECT NULL, rand() * 10000000000.0, rand() * 10000000000.0 FROM t1;
  INSERT INTO t1 SELECT NULL, rand() * 10000000000.0, rand() * 10000000000.0 FROM t1;
  INSERT INTO t1 SELECT NULL, rand() * 10000000000.0, rand() * 10000000000.0 FROM t1;
  INSERT INTO t1 SELECT NULL, rand() * 10000000000.0, rand() * 10000000000.0 FROM t1;
  INSERT INTO t1 SELECT NULL, rand() * 10000000000.0, rand() * 10000000000.0 FROM t1;
  INSERT INTO t1 SELECT NULL, rand() * 10000000000.0, rand() * 10000000000.0 FROM t1;
  INSERT INTO t1 SELECT NULL, rand() * 10000000000.0, rand() * 10000000000.0 FROM t1;
  INSERT INTO t1 SELECT NULL, rand() * 10000000000.0, rand() * 10000000000.0 FROM t1;
  INSERT INTO t1 SELECT NULL, rand() * 10000000000.0, rand() * 10000000000.0 FROM t1;
  INSERT INTO t1 SELECT NULL, rand() * 10000000000.0, rand() * 10000000000.0 FROM t1;
  INSERT INTO t1 SELECT NULL, rand() * 10000000000.0, rand() * 10000000000.0 FROM t1;
  INSERT INTO t1 SELECT NULL, rand() * 10000000000.0, rand() * 10000000000.0 FROM t1;
  INSERT INTO t1 SELECT NULL, rand() * 10000000000.0, rand() * 10000000000.0 FROM t1;
  INSERT INTO t1 SELECT NULL, rand() * 10000000000.0, rand() * 10000000000.0 FROM t1;
  INSERT INTO t1 SELECT NULL, rand() * 10000000000.0, rand() * 10000000000.0 FROM t1;
  INSERT INTO t1 SELECT NULL, rand() * 10000000000.0, rand() * 10000000000.0 FROM t1;

Checking timing:

mysql> select count(*) from t1 where vc like '123%';
+----------+
| count(*) |
+----------+
|        5 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from t1 where vb like '123%';
+----------+
| count(*) |
+----------+
|       11 |
+----------+
1 row in set (0.13 sec)

mysql> select count(*) from t1 where vc between '123000000' and '123999999';
+----------+
| count(*) |
+----------+
|        5 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from t1 where vb between '123000000' and '123999999';
+----------+
| count(*) |
+----------+
|       11 |
+----------+
1 row in set (0.00 sec)

Checking execution plans:

mysql> explain select count(*) from t1 where vc like '123%';
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                             |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------+
|  1 | SIMPLE      | t1    | range | vc            | vc   | 12      | NULL |   10 | Using where with pushed condition |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------+
1 row in set (0.00 sec)

mysql> explain select count(*) from t1 where vb like '123%';
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                             |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------+
|  1 | SIMPLE      | t1    | range | vb            | vb   | 12      | NULL |   10 | Using where with pushed condition |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------+
1 row in set (0.00 sec)

mysql> explain select count(*) from t1 where vc between '123000000' and '123999999';
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra                             |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------+
|  1 | SIMPLE      | t1    | range | vc            | vc   | 12      | NULL |   10 | Using where with pushed condition |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------------------+
1 row in set (0.00 sec)

mysql> explain select count(*) from t1 where vb between '123000000' and '123999999';
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | t1    | range | vb            | vb   | 12      | NULL |   10 | Using where |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+

So for VARCHAR both LIKE and BETWEEN are pushed down, and return results fast.

For VARBINARY LIKE is pushed down while BETWEEN is not, and the pushed down LIKE is much slower than the BETWEEN or than both operations on VARCHAR.