Bug #119706 Incorrect Results from ALL/ANY Subquery MIN/MAX Optimization with Type Mismatch
Submitted: 16 Jan 10:46
Reporter: Zike Wang Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0.44, 9.5.0 OS:Any
Assigned to: CPU Architecture:Any
Tags: Optimizer;Subquery;ImplicitCast

[16 Jan 10:46] Zike Wang
Description:
## Summary

The MIN/MAX optimization for ALL/ANY subqueries returns incorrect results when comparing expressions of different data types (e.g., INT vs VARCHAR).

## Problem Description

### Background: The MIN/MAX Optimization

MySQL optimizes ALL/ANY subqueries by transforming them into simpler MIN/MAX aggregate queries. This transformation is implemented in `Item_in_subselect::single_value_transformer()` (`sql/item_subselect.cc`).

For example:
- `x > ALL(SELECT y FROM t)` → `x > (SELECT MAX(y) FROM t)`
- `x < ALL(SELECT y FROM t)` → `x < (SELECT MIN(y) FROM t)`
- `x > ANY(SELECT y FROM t)` → `x > (SELECT MIN(y) FROM t)`
- `x < ANY(SELECT y FROM t)` → `x < (SELECT MAX(y) FROM t)`

This optimization is valid when the ordering used by MIN/MAX aggregation is consistent with the ordering used in the comparison.

### The Bug

The optimization does not account for implicit type conversion. When the left expression and subquery expression have different types, two different ordering semantics come into play:

1. **MIN/MAX aggregation**: Uses the column's native type ordering
2. **Comparison operation**: Uses ordering after implicit type conversion

For numeric vs string comparisons, these orderings can differ significantly:
- String (lexicographic) ordering: `'2' > '10'` (because `'2'` > `'1'` character-by-character)
- Numeric ordering: `2 < 10`

How to repeat:
## Reproduce
```sql
CREATE TABLE t1 (id INT, a INT);
CREATE TABLE t2 (c VARCHAR(20));

INSERT INTO t1 VALUES (1, 2), (2, 10);
INSERT INTO t2 VALUES ('2'), ('10');

SELECT * FROM t1 WHERE a > ALL(SELECT c FROM t2);
```

**Expected**: Empty set (because no `a` is greater than both 2 and 10)

**Actual**: Returns `(2, 10)`

**What happens**:
1. Optimizer transforms to: `a > (SELECT MAX(c) FROM t2)`
2. `MAX(c)` uses string ordering: `MAX('2', '10') = '2'` (since `'2' > '10'` lexicographically)
3. Comparison becomes: `a > '2'` -> after type conversion -> `a > 2`
4. Row with `a = 10` matches incorrectly

Suggested fix:
## Proposed Fix

### Solution

Add a type compatibility check before applying the MIN/MAX optimization. Skip the optimization when the left expression and subquery expression have incompatible types (one is string, the other is numeric).

### Code Change

In `sql/item_subselect.cc`, function `single_value_transformer()`, add the following check before the existing optimization conditions:

```cpp
/*
  Check type compatibility for MIN/MAX optimization. The optimization is
  only valid when both sides use the same ordering semantics. If one side
  is a string type and the other is numeric, string ordering (lexicographic)
  differs from numeric ordering after implicit type conversion.
*/
bool minmax_type_compatible = true;
Item *right_expr = select->base_ref_items[0];
if (right_expr != nullptr) {
  Item_result left_type = left_expr->result_type();
  Item_result right_type = right_expr->result_type();
  // Compatible if: same type, or both are numeric types
  minmax_type_compatible = (left_type == right_type) ||
      (left_type != STRING_RESULT && right_type != STRING_RESULT);
}
```

Then add `minmax_type_compatible` as condition #4 in the optimization check.

### Why This Fix Works

The fix ensures the MIN/MAX optimization is only applied when both expressions use the same ordering semantics:

| Left Type | Right Type | Same Ordering? | Optimization |
|-----------|------------|----------------|--------------|
| STRING | STRING | Yes (both lexicographic) | Applied |
| INT | INT | Yes (both numeric) | Applied |
| INT | STRING | No | **Skipped** |
| STRING | INT | No | **Skipped** |

When skipped, the query falls back to the IN→EXISTS transformation, which correctly evaluates each comparison with proper type conversion.

### Impact

- **Correctness**: Fixes wrong results for type-mismatched ALL/ANY subqueries
- **Performance**: Minimal impact. Only type-mismatched cases skip the optimization; same-type cases retain the performance benefit
- **Compatibility**: No behavioral change for correctly-typed queries

## Patch

See attached file: `bug_fix_minmax_type_mismatch.patch`