Bug #49937 RQG's query simplifier can not handle multi-part indexes
Submitted: 25 Dec 2009 16:18 Modified: 30 Dec 2009 11:51
Reporter: Patrick Crews Email Updates:
Status: Closed Impact on me:
None 
Category:Tools: Random Query Generator Severity:S4 (Feature request)
Version: OS:Any
Assigned to: Bernt Marius Johnsen CPU Architecture:Any
Tags: multi-part keys, simplification

[25 Dec 2009 16:18] Patrick Crews
Description:
The random query generator can accidentally remove portions of a multi-part key via its simplification tools.  

The fields do not participate in the actual query, but are a part of a multi-part key that *is* necessary for a bug to be reproduced.

For example, this table was simplified via the RQG:
CREATE TABLE `G` (
  `pk` int(11) NOT NULL AUTO_INCREMENT,
  `col_int_key` int(11) DEFAULT NULL,
  `col_varchar_1024_latin1` varchar(1024) DEFAULT NULL,
  `col_varchar_10_utf8` varchar(10) CHARACTER SET utf8 DEFAULT NULL,
  PRIMARY KEY (`pk`),
  KEY `col_int_key` (`col_int_key`),
  KEY `test_idx` (`col_varchar_10_utf8`) USING BTREE
) ENGINE=MyISAM AUTO_INCREMENT=21 DEFAULT CHARSET=latin1;

However, the original failure, required *this* definition of test_idx on the table to reproduce the bug:
ALTER TABLE G ADD INDEX `test_idx` USING BTREE (`col_varchar_10_utf8` (10) , `col_varchar_1024_utf8` (5) );

As col_varchar_1024_utf8 did not participate in the actual failing query, it was removed, the index was changed and the MTR test case could not reproduce the bug.

How to repeat:
Use the attached grammar and gendata files and the following command line against the 6.0-codebase-bugfixing tree:
perl ./runall.pl --basedir=<path>/mysql-6.0 --threads=1 --gendata=conf/range_access.zz
--grammar=conf/range_access.yy --mtr-build-thread=71 --queries=10000 --debug
--seed=1261617604
[25 Dec 2009 16:19] Patrick Crews
grammar and gendata files for demonstrating the bug against 6.0-codebase

Attachment: bug49906.tar.gz (application/x-gzip, text), 2.83 KiB.

[25 Dec 2009 22:29] Patrick Crews
Addendum:

Running the RQG as noted in 'How to reproduce' will produce a crash.  Simplifying the crashing query via util/simplify-crash will produce an MTR test case,  but it will not crash (the indexes / columns are different)
[28 Dec 2009 9:40] Bernt Marius Johnsen
The current simplifier does brute force simplification token by token with no information whatsoever on the schema nor the SQL involved. This bug report is a request for a much more advanced simplifier.
[28 Dec 2009 9:54] Philip Stoev
Bernt, this bug is not about the simplification of the query per-se, but about the dumping of the minimal set of tables and columns that are required to reproduce the bug. It is during that this part of the process that multi-part keys get converted to single-part.

The code is located in Simplify::Tables and does not use any form of parsing. The only thing that is needed is to be able to identify multi-part indexes and dump all participating columns.
[28 Dec 2009 12:30] Bernt Marius Johnsen
Thanx Philip. Did some research, and found that there is no easy way to determine that a column is a part of a multi-column index with SHOW FIELDS or INFORMATION_SCHEMA: e.g.:

mysql> create table t1 (pk integer primary key, i integer, j integer, k integer,l integer, key(i,j), key(k));
Query OK, 0 rows affected (0.01 sec)

mysql> show fields from t1;
+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| pk    | int(11) | NO   | PRI | NULL    |       |
| i     | int(11) | YES  | MUL | NULL    |       |
| j     | int(11) | YES  |     | NULL    |       |
| k     | int(11) | YES  | MUL | NULL    |       |
| l     | int(11) | YES  |     | NULL    |       |
+-------+---------+------+-----+---------+-------+
5 rows in set (0.02 sec)

(INFORMATION_SCHEMA.COLUMNS gives basically the same information).
There is no evidence here that j is part of an index, while l is not. Any pointers would be appreciated.
[28 Dec 2009 13:04] Bernt Marius Johnsen
SHOW INDEX has the needed information;
[28 Dec 2009 13:05] Philip Stoev
If the I_S schema does not provide the required information, there would be two ways to get it. SHOW KEYS, where you can see which field participated in which key, so a two or more fields that participate in the same key would be a multi-part key. If one of the fields participates in the query, we need to preserve and dump (not drop) the other fields as well, causing mysqld to preserve the original key definition.

The other option is to use SHOW CREATE TABLE, and use a simple regexp to parse the key definitions for multi-part keys.
[30 Dec 2009 11:51] Bernt Marius Johnsen
Fixed