Description:
Problem:
When an outer table column is of type DOUBLE and the inner subquery returns VARCHAR values, the semijoin Nested Loop weedout path may perform index lookups on the outer DOUBLE column using the raw VARCHAR string values. If a VARCHAR string like '1e500' is cast to DOUBLE it overflows to DBL_MAX (1.7976931348623157e308). The B-tree index on the DOUBLE column stores the numeric DBL_MAX, not the string '1e500'. Because the semijoin index lookup uses the unconverted string as the lookup key, the lookup does not find the numeric DBL_MAX entry and the matching row is incorrectly excluded.
Scope and conditions:
- Triggered when: outer column is numeric (DOUBLE), inner subquery returns VARCHAR, and the string value changes representation when converted to the numeric type (overflow or other conversion differences).
- Only occurs when the optimizer chooses a semijoin Nested Loop plan that does index lookups (weedout path).
- With semijoin=off, the query evaluates the comparison with proper type conversion and returns the correct result.
- Observed in SQLancer-generated logs (database6 and database10) and reproducible with the provided minimal test case.
With default optimizer returns 3: ✗ WRONG - missing 1 row
+-----+
| cnt |
+-----+
| 3 |
+-----+
With semijoin=off, returns 4: ✓ CORRECT
+-----+
| cnt |
+-----+
| 4 |
+-----+
How to repeat:
DROP DATABASE IF EXISTS min_semijoin_overflow;
CREATE DATABASE min_semijoin_overflow;
USE min_semijoin_overflow;
CREATE TABLE t0 (c1 DOUBLE);
CREATE TABLE idx_t1 (c5 VARCHAR(255) NOT NULL);
-- Insert the DBL_MAX value and many filler rows to bias the optimizer
INSERT INTO t0 VALUES (1.7976931348623157e308);
INSERT INTO t0 VALUES (1.5),(2.5),(3.5),(4.5),(5.5),(6.5),(7.5),(8.5),(9.5),(10.5);
INSERT INTO t0 VALUES (11.5),(12.5),(13.5),(14.5),(15.5),(16.5),(17.5),(18.5),(19.5),(20.5);
INSERT INTO t0 VALUES (21.5),(22.5),(23.5),(24.5),(25.5),(26.5),(27.5),(28.5),(29.5),(30.5);
INSERT INTO t0 VALUES (31.5),(32.5),(33.5),(34.5),(35.5),(36.5),(37.5),(38.5),(39.5),(40.5);
INSERT INTO t0 VALUES (41.5),(42.5),(43.5),(44.5),(45.5),(46.5),(47.5),(48.5),(49.5),(50.5);
INSERT INTO t0 VALUES (51.5),(52.5),(53.5),(54.5),(55.5),(56.5),(57.5),(58.5),(59.5),(60.5);
INSERT INTO t0 VALUES (61.5),(62.5),(63.5),(64.5),(65.5),(66.5),(67.5),(68.5),(69.5),(70.5);
INSERT INTO t0 VALUES (71.5),(72.5),(73.5),(74.5),(75.5),(76.5),(77.5),(78.5),(79.5),(80.5);
INSERT INTO t0 VALUES (81.5),(82.5),(83.5),(84.5),(85.5),(86.5),(87.5),(88.5),(89.5),(90.5);
INSERT INTO t0 VALUES (91.5),(92.5),(93.5),(94.5),(95.5),(96.5),(97.5),(98.5),(99.5),(100.5);
INSERT INTO t0 VALUES (101.5),(102.5),(103.5),(104.5),(105.5),(106.5),(107.5),(108.5),(109.5),(110.5);
INSERT INTO t0 VALUES (111.5),(112.5),(113.5),(114.5),(115.5),(116.5),(117.5),(118.5),(119.5),(120.5);
INSERT INTO t0 VALUES (121.5),(122.5),(123.5),(124.5),(125.5),(126.5),(127.5),(128.5),(129.5),(130.5);
INSERT INTO t0 VALUES (131.5),(132.5),(133.5),(134.5),(135.5),(136.5),(137.5),(138.5),(139.5),(140.5);
INSERT INTO t0 VALUES (141.5),(142.5),(143.5),(144.5),(145.5),(146.5),(147.5),(148.5),(149.5),(150.5);
INSERT INTO t0 VALUES (151.5),(152.5),(153.5),(154.5),(155.5),(156.5),(157.5),(158.5),(159.5),(160.5);
INSERT INTO t0 VALUES (161.5),(162.5),(163.5),(164.5),(165.5),(166.5),(167.5),(168.5),(169.5),(170.5);
INSERT INTO t0 VALUES (171.5),(172.5),(173.5),(174.5),(175.5),(176.5),(177.5),(178.5),(179.5),(180.5);
INSERT INTO t0 VALUES (181.5),(182.5),(183.5),(184.5),(185.5),(186.5),(187.5),(188.5),(189.5),(190.5);
INSERT INTO idx_t1 VALUES ('1e500');
INSERT INTO idx_t1 VALUES ('1.5');
INSERT INTO idx_t1 VALUES ('2.5');
INSERT INTO idx_t1 VALUES ('3.5');
CREATE INDEX btree_idx ON idx_t1 (c5);
CREATE INDEX null_idx ON t0 (c1);
ANALYZE TABLE t0;
ANALYZE TABLE idx_t1;
-- Default optimizer (semijoin enabled) returns 3 (incorrect)
SELECT 'Default optimizer (BUG: expect 4):' AS label;
SELECT COUNT(*) AS cnt FROM t0 WHERE c1 IN (SELECT c5 FROM idx_t1);
-- Show the plan
EXPLAIN FORMAT=TREE SELECT c1 FROM t0 WHERE c1 IN (SELECT c5 FROM idx_t1);
-- Disable semijoin and get the correct result 4
SET SESSION optimizer_switch = 'semijoin=off';
SELECT 'Semijoin=off (CORRECT: expect 4):' AS label;
SELECT COUNT(*) AS cnt FROM t0 WHERE c1 IN (SELECT c5 FROM idx_t1);
SET SESSION optimizer_switch = 'default';
DROP DATABASE min_semijoin_overflow;
Suggested fix:
When the semijoin Nested Loop weedout path performs an index lookup on the outer table, convert the inner-subquery value to the outer column’s storage/index key type before performing the B-tree lookup. For numeric targets (e.g., DOUBLE), perform the same cast/normalization that the engine uses for stored index keys (including mapping overflow to DBL_MAX/+Inf as appropriate), and use that converted value as the lookup key. If conversion is lossy/ambiguous, fall back to a safe path (e.g., do not use index lookup for that key; use row comparison or full scan).
Description: Problem: When an outer table column is of type DOUBLE and the inner subquery returns VARCHAR values, the semijoin Nested Loop weedout path may perform index lookups on the outer DOUBLE column using the raw VARCHAR string values. If a VARCHAR string like '1e500' is cast to DOUBLE it overflows to DBL_MAX (1.7976931348623157e308). The B-tree index on the DOUBLE column stores the numeric DBL_MAX, not the string '1e500'. Because the semijoin index lookup uses the unconverted string as the lookup key, the lookup does not find the numeric DBL_MAX entry and the matching row is incorrectly excluded. Scope and conditions: - Triggered when: outer column is numeric (DOUBLE), inner subquery returns VARCHAR, and the string value changes representation when converted to the numeric type (overflow or other conversion differences). - Only occurs when the optimizer chooses a semijoin Nested Loop plan that does index lookups (weedout path). - With semijoin=off, the query evaluates the comparison with proper type conversion and returns the correct result. - Observed in SQLancer-generated logs (database6 and database10) and reproducible with the provided minimal test case. With default optimizer returns 3: ✗ WRONG - missing 1 row +-----+ | cnt | +-----+ | 3 | +-----+ With semijoin=off, returns 4: ✓ CORRECT +-----+ | cnt | +-----+ | 4 | +-----+ How to repeat: DROP DATABASE IF EXISTS min_semijoin_overflow; CREATE DATABASE min_semijoin_overflow; USE min_semijoin_overflow; CREATE TABLE t0 (c1 DOUBLE); CREATE TABLE idx_t1 (c5 VARCHAR(255) NOT NULL); -- Insert the DBL_MAX value and many filler rows to bias the optimizer INSERT INTO t0 VALUES (1.7976931348623157e308); INSERT INTO t0 VALUES (1.5),(2.5),(3.5),(4.5),(5.5),(6.5),(7.5),(8.5),(9.5),(10.5); INSERT INTO t0 VALUES (11.5),(12.5),(13.5),(14.5),(15.5),(16.5),(17.5),(18.5),(19.5),(20.5); INSERT INTO t0 VALUES (21.5),(22.5),(23.5),(24.5),(25.5),(26.5),(27.5),(28.5),(29.5),(30.5); INSERT INTO t0 VALUES (31.5),(32.5),(33.5),(34.5),(35.5),(36.5),(37.5),(38.5),(39.5),(40.5); INSERT INTO t0 VALUES (41.5),(42.5),(43.5),(44.5),(45.5),(46.5),(47.5),(48.5),(49.5),(50.5); INSERT INTO t0 VALUES (51.5),(52.5),(53.5),(54.5),(55.5),(56.5),(57.5),(58.5),(59.5),(60.5); INSERT INTO t0 VALUES (61.5),(62.5),(63.5),(64.5),(65.5),(66.5),(67.5),(68.5),(69.5),(70.5); INSERT INTO t0 VALUES (71.5),(72.5),(73.5),(74.5),(75.5),(76.5),(77.5),(78.5),(79.5),(80.5); INSERT INTO t0 VALUES (81.5),(82.5),(83.5),(84.5),(85.5),(86.5),(87.5),(88.5),(89.5),(90.5); INSERT INTO t0 VALUES (91.5),(92.5),(93.5),(94.5),(95.5),(96.5),(97.5),(98.5),(99.5),(100.5); INSERT INTO t0 VALUES (101.5),(102.5),(103.5),(104.5),(105.5),(106.5),(107.5),(108.5),(109.5),(110.5); INSERT INTO t0 VALUES (111.5),(112.5),(113.5),(114.5),(115.5),(116.5),(117.5),(118.5),(119.5),(120.5); INSERT INTO t0 VALUES (121.5),(122.5),(123.5),(124.5),(125.5),(126.5),(127.5),(128.5),(129.5),(130.5); INSERT INTO t0 VALUES (131.5),(132.5),(133.5),(134.5),(135.5),(136.5),(137.5),(138.5),(139.5),(140.5); INSERT INTO t0 VALUES (141.5),(142.5),(143.5),(144.5),(145.5),(146.5),(147.5),(148.5),(149.5),(150.5); INSERT INTO t0 VALUES (151.5),(152.5),(153.5),(154.5),(155.5),(156.5),(157.5),(158.5),(159.5),(160.5); INSERT INTO t0 VALUES (161.5),(162.5),(163.5),(164.5),(165.5),(166.5),(167.5),(168.5),(169.5),(170.5); INSERT INTO t0 VALUES (171.5),(172.5),(173.5),(174.5),(175.5),(176.5),(177.5),(178.5),(179.5),(180.5); INSERT INTO t0 VALUES (181.5),(182.5),(183.5),(184.5),(185.5),(186.5),(187.5),(188.5),(189.5),(190.5); INSERT INTO idx_t1 VALUES ('1e500'); INSERT INTO idx_t1 VALUES ('1.5'); INSERT INTO idx_t1 VALUES ('2.5'); INSERT INTO idx_t1 VALUES ('3.5'); CREATE INDEX btree_idx ON idx_t1 (c5); CREATE INDEX null_idx ON t0 (c1); ANALYZE TABLE t0; ANALYZE TABLE idx_t1; -- Default optimizer (semijoin enabled) returns 3 (incorrect) SELECT 'Default optimizer (BUG: expect 4):' AS label; SELECT COUNT(*) AS cnt FROM t0 WHERE c1 IN (SELECT c5 FROM idx_t1); -- Show the plan EXPLAIN FORMAT=TREE SELECT c1 FROM t0 WHERE c1 IN (SELECT c5 FROM idx_t1); -- Disable semijoin and get the correct result 4 SET SESSION optimizer_switch = 'semijoin=off'; SELECT 'Semijoin=off (CORRECT: expect 4):' AS label; SELECT COUNT(*) AS cnt FROM t0 WHERE c1 IN (SELECT c5 FROM idx_t1); SET SESSION optimizer_switch = 'default'; DROP DATABASE min_semijoin_overflow; Suggested fix: When the semijoin Nested Loop weedout path performs an index lookup on the outer table, convert the inner-subquery value to the outer column’s storage/index key type before performing the B-tree lookup. For numeric targets (e.g., DOUBLE), perform the same cast/normalization that the engine uses for stored index keys (including mapping overflow to DBL_MAX/+Inf as appropriate), and use that converted value as the lookup key. If conversion is lossy/ambiguous, fall back to a safe path (e.g., do not use index lookup for that key; use row comparison or full scan).