Bug #71334 Small join on composite PK not performing well versus IN-list
Submitted: 9 Jan 2014 20:26 Modified: 11 Sep 8:32
Reporter: Greg Kemnitz Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:5.6.15-enterprise-commercial-advanced-lo OS:Solaris
Assigned to: CPU Architecture:Any

[9 Jan 2014 20:26] Greg Kemnitz
Description:
If I have a table like

create table foo (int a, float b, int c, d, e, primary key (a, b, c)) engine=innodb partition by hash(c) partitions 300;

and I create a very small table with only three elements:

create temporary table bar (int a) engine=memory;

and I do the following

select f.* from foo f, bar b where b.a = f.a and f.b <= -5.7 and f.c between 1234 and 2345;

...I get performance that is a whole lot slower than this query:

select f.* from foo f where f.a in (1, 2, 3) and b.a = f.a and f.b <= -5.7 and f.c between 1234 and 2345;

In the actual table, performance is about 30 ms for the IN form and about 3 seconds for the join form if everything's in the buffer pool, even though the join table only has the same three recs.  The actual "foo" table has about 500M records in it, with about 2.5M recs per partition.  I originally thought partition pruning wasn't working correctly, but performance appears consistent with the composite PK direct search inside each partition only being used for the join key and not the literal used in the "b" part.  (Note that I tried to put "5.7" in the "t" table and performance was the same.)

How to repeat:
See above.  Create a decently big "foo" table with similar structure to above.

Suggested fix:
Fix the join so the lookup on the PK uses both parts of the key, not just the leading key.
[9 Jan 2014 20:29] Greg Kemnitz
The IN form is actually this:

select f.* from foo f where f.a in (1, 2, 3) and f.b <= -5.7 and f.c between 1234 and 2345;
[24 Jan 2014 22:26] Sveta Smirnova
Thank you for the report.

I can not repeat described behavior. Please provide configuration options you use.
[25 Feb 2014 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[10 Sep 2:48] Yoseph Phillips
This is related to Bug #116335.
This reproduces what Greg has observed using MySQL 8.0.43:

DROP PROCEDURE IF EXISTS createTestData;

DELIMITER $$

CREATE PROCEDURE createTestData() 
BEGIN
  DECLARE counter INT DEFAULT 0;
  
  SET time_zone = '+00:00';
  
  DROP TABLE IF EXISTS test_data;
  
  CREATE TABLE test_data (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    column1 TINYINT UNSIGNED NOT NULL,
    column2 MEDIUMINT UNSIGNED NOT NULL,
    column3 TIMESTAMP NOT NULL,
    column4 VARCHAR(1000),
    PRIMARY KEY (id),
    INDEX column1_column2_index (column1, column2),
    INDEX column1_column3_index (column1, column3)
  );
  
  INSERT INTO test_data (column1, column2, column3, column4)
  VALUES (
    floor(rand() * 256),
    floor(rand() * 16777216), 
    '2025-09-10 00:00:00' - INTERVAL floor(rand() * 16777216) MINUTE, 
    'The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.'
  );
  
  WHILE counter < 20 DO
	INSERT INTO test_data (column1, column2, column3, column4)
    SELECT 
      floor(rand() * 256),
      floor(rand() * 16777216), 
      '2025-09-10 00:00:00' - INTERVAL floor(rand() * 16777216) MINUTE, 
      t.column4
    FROM test_data t;
    
    SET counter = counter + 1;
  END WHILE;
END$$

DELIMITER ;

CALL createTestData();

DROP PROCEDURE IF EXISTS createTestData;

DROP TEMPORARY TABLE IF EXISTS test_data2;
  
CREATE TEMPORARY TABLE test_data2 (id TINYINT UNSIGNED NOT NULL, PRIMARY KEY (id));

INSERT INTO test_data2 (id) VALUES (50), (100), (150), (200);

EXPLAIN ANALYZE SELECT count(*) FROM test_data t WHERE t.column1 IN (50, 100, 150, 200) AND t.column2 > (16777216 - 100000);

EXPLAIN ANALYZE SELECT count(*) FROM test_data t WHERE t.column1 IN (50, 100, 150, 200) AND t.column3 > ('2025-09-10 00:00:00' - INTERVAL 100000 MINUTE);

EXPLAIN ANALYZE SELECT count(*) FROM test_data t INNER JOIN test_data2 t2 ON t2.id = t.column1 WHERE t.column2 > (16777216 - 100000);

EXPLAIN ANALYZE SELECT count(*) FROM test_data t INNER JOIN test_data2 t2 ON t2.id = t.column1 WHERE t.column3 > ('2025-09-10 00:00:00' - INTERVAL 100000 MINUTE);

DROP TEMPORARY TABLE IF EXISTS test_data2;