Description:
## Summary
The MIN/MAX optimization for ALL/ANY subqueries returns incorrect results when comparing expressions of different data types (e.g., INT vs VARCHAR).
## Problem Description
### Background: The MIN/MAX Optimization
MySQL optimizes ALL/ANY subqueries by transforming them into simpler MIN/MAX aggregate queries. This transformation is implemented in `Item_in_subselect::single_value_transformer()` (`sql/item_subselect.cc`).
For example:
- `x > ALL(SELECT y FROM t)` → `x > (SELECT MAX(y) FROM t)`
- `x < ALL(SELECT y FROM t)` → `x < (SELECT MIN(y) FROM t)`
- `x > ANY(SELECT y FROM t)` → `x > (SELECT MIN(y) FROM t)`
- `x < ANY(SELECT y FROM t)` → `x < (SELECT MAX(y) FROM t)`
This optimization is valid when the ordering used by MIN/MAX aggregation is consistent with the ordering used in the comparison.
### The Bug
The optimization does not account for implicit type conversion. When the left expression and subquery expression have different types, two different ordering semantics come into play:
1. **MIN/MAX aggregation**: Uses the column's native type ordering
2. **Comparison operation**: Uses ordering after implicit type conversion
For numeric vs string comparisons, these orderings can differ significantly:
- String (lexicographic) ordering: `'2' > '10'` (because `'2'` > `'1'` character-by-character)
- Numeric ordering: `2 < 10`
How to repeat:
## Reproduce
```sql
CREATE TABLE t1 (id INT, a INT);
CREATE TABLE t2 (c VARCHAR(20));
INSERT INTO t1 VALUES (1, 2), (2, 10);
INSERT INTO t2 VALUES ('2'), ('10');
SELECT * FROM t1 WHERE a > ALL(SELECT c FROM t2);
```
**Expected**: Empty set (because no `a` is greater than both 2 and 10)
**Actual**: Returns `(2, 10)`
**What happens**:
1. Optimizer transforms to: `a > (SELECT MAX(c) FROM t2)`
2. `MAX(c)` uses string ordering: `MAX('2', '10') = '2'` (since `'2' > '10'` lexicographically)
3. Comparison becomes: `a > '2'` -> after type conversion -> `a > 2`
4. Row with `a = 10` matches incorrectly
Suggested fix:
## Proposed Fix
### Solution
Add a type compatibility check before applying the MIN/MAX optimization. Skip the optimization when the left expression and subquery expression have incompatible types (one is string, the other is numeric).
### Code Change
In `sql/item_subselect.cc`, function `single_value_transformer()`, add the following check before the existing optimization conditions:
```cpp
/*
Check type compatibility for MIN/MAX optimization. The optimization is
only valid when both sides use the same ordering semantics. If one side
is a string type and the other is numeric, string ordering (lexicographic)
differs from numeric ordering after implicit type conversion.
*/
bool minmax_type_compatible = true;
Item *right_expr = select->base_ref_items[0];
if (right_expr != nullptr) {
Item_result left_type = left_expr->result_type();
Item_result right_type = right_expr->result_type();
// Compatible if: same type, or both are numeric types
minmax_type_compatible = (left_type == right_type) ||
(left_type != STRING_RESULT && right_type != STRING_RESULT);
}
```
Then add `minmax_type_compatible` as condition #4 in the optimization check.
### Why This Fix Works
The fix ensures the MIN/MAX optimization is only applied when both expressions use the same ordering semantics:
| Left Type | Right Type | Same Ordering? | Optimization |
|-----------|------------|----------------|--------------|
| STRING | STRING | Yes (both lexicographic) | Applied |
| INT | INT | Yes (both numeric) | Applied |
| INT | STRING | No | **Skipped** |
| STRING | INT | No | **Skipped** |
When skipped, the query falls back to the IN→EXISTS transformation, which correctly evaluates each comparison with proper type conversion.
### Impact
- **Correctness**: Fixes wrong results for type-mismatched ALL/ANY subqueries
- **Performance**: Minimal impact. Only type-mismatched cases skip the optimization; same-type cases retain the performance benefit
- **Compatibility**: No behavioral change for correctly-typed queries
## Patch
See attached file: `bug_fix_minmax_type_mismatch.patch`
Description: ## Summary The MIN/MAX optimization for ALL/ANY subqueries returns incorrect results when comparing expressions of different data types (e.g., INT vs VARCHAR). ## Problem Description ### Background: The MIN/MAX Optimization MySQL optimizes ALL/ANY subqueries by transforming them into simpler MIN/MAX aggregate queries. This transformation is implemented in `Item_in_subselect::single_value_transformer()` (`sql/item_subselect.cc`). For example: - `x > ALL(SELECT y FROM t)` → `x > (SELECT MAX(y) FROM t)` - `x < ALL(SELECT y FROM t)` → `x < (SELECT MIN(y) FROM t)` - `x > ANY(SELECT y FROM t)` → `x > (SELECT MIN(y) FROM t)` - `x < ANY(SELECT y FROM t)` → `x < (SELECT MAX(y) FROM t)` This optimization is valid when the ordering used by MIN/MAX aggregation is consistent with the ordering used in the comparison. ### The Bug The optimization does not account for implicit type conversion. When the left expression and subquery expression have different types, two different ordering semantics come into play: 1. **MIN/MAX aggregation**: Uses the column's native type ordering 2. **Comparison operation**: Uses ordering after implicit type conversion For numeric vs string comparisons, these orderings can differ significantly: - String (lexicographic) ordering: `'2' > '10'` (because `'2'` > `'1'` character-by-character) - Numeric ordering: `2 < 10` How to repeat: ## Reproduce ```sql CREATE TABLE t1 (id INT, a INT); CREATE TABLE t2 (c VARCHAR(20)); INSERT INTO t1 VALUES (1, 2), (2, 10); INSERT INTO t2 VALUES ('2'), ('10'); SELECT * FROM t1 WHERE a > ALL(SELECT c FROM t2); ``` **Expected**: Empty set (because no `a` is greater than both 2 and 10) **Actual**: Returns `(2, 10)` **What happens**: 1. Optimizer transforms to: `a > (SELECT MAX(c) FROM t2)` 2. `MAX(c)` uses string ordering: `MAX('2', '10') = '2'` (since `'2' > '10'` lexicographically) 3. Comparison becomes: `a > '2'` -> after type conversion -> `a > 2` 4. Row with `a = 10` matches incorrectly Suggested fix: ## Proposed Fix ### Solution Add a type compatibility check before applying the MIN/MAX optimization. Skip the optimization when the left expression and subquery expression have incompatible types (one is string, the other is numeric). ### Code Change In `sql/item_subselect.cc`, function `single_value_transformer()`, add the following check before the existing optimization conditions: ```cpp /* Check type compatibility for MIN/MAX optimization. The optimization is only valid when both sides use the same ordering semantics. If one side is a string type and the other is numeric, string ordering (lexicographic) differs from numeric ordering after implicit type conversion. */ bool minmax_type_compatible = true; Item *right_expr = select->base_ref_items[0]; if (right_expr != nullptr) { Item_result left_type = left_expr->result_type(); Item_result right_type = right_expr->result_type(); // Compatible if: same type, or both are numeric types minmax_type_compatible = (left_type == right_type) || (left_type != STRING_RESULT && right_type != STRING_RESULT); } ``` Then add `minmax_type_compatible` as condition #4 in the optimization check. ### Why This Fix Works The fix ensures the MIN/MAX optimization is only applied when both expressions use the same ordering semantics: | Left Type | Right Type | Same Ordering? | Optimization | |-----------|------------|----------------|--------------| | STRING | STRING | Yes (both lexicographic) | Applied | | INT | INT | Yes (both numeric) | Applied | | INT | STRING | No | **Skipped** | | STRING | INT | No | **Skipped** | When skipped, the query falls back to the IN→EXISTS transformation, which correctly evaluates each comparison with proper type conversion. ### Impact - **Correctness**: Fixes wrong results for type-mismatched ALL/ANY subqueries - **Performance**: Minimal impact. Only type-mismatched cases skip the optimization; same-type cases retain the performance benefit - **Compatibility**: No behavioral change for correctly-typed queries ## Patch See attached file: `bug_fix_minmax_type_mismatch.patch`