Bug #120278 mts_event_coord_cmp() uses strcmp() for binlog filename comparison, causing incorrect ordering when binlog index exceeds
Submitted: 17 Apr 2:44 Modified: 28 Apr 2:52
Reporter: Demon Chen Email Updates:
Status: Open Impact on me:
None 
Category:MySQL Server: Replication Severity:S3 (Non-critical)
Version:8.0.44 8.4.9 OS:Any
Assigned to: CPU Architecture:Any

[17 Apr 2:44] Demon Chen
Description:
The function `mts_event_coord_cmp()` in `sql/rpl_slave.cc` uses `strcmp()` to compare binlog filenames when ordering MTS (Multi-Threaded Slave) recovery jobs. This is a lexicographic comparison, which produces incorrect results when the binlog sequence number crosses a digit-width boundary (e.g., from 999999 to 1000000).

For example:
```
strcmp("binlog.1073660", "binlog.918815") < 0
```
Because `'1' < '9'` in ASCII, `strcmp()` considers `binlog.1073660` to be **less than** `binlog.918815`, even though the actual sequence number 1073660 is greater than 918815.

## Code Location

```cpp
// sql/rpl_slave.cc, function mts_event_coord_cmp()
int mts_event_coord_cmp(LOG_POS_COORD *id1, LOG_POS_COORD *id2) {
  longlong filecmp = strcmp(id1->file_name, id2->file_name);
  longlong poscmp = id1->pos - id2->pos;
  return (filecmp < 0
              ? -1
              : (filecmp > 0 ? 1 : (poscmp < 0 ? -1 : (poscmp > 0 ? 1 : 0))));
}
```

## Impact

This function is called in `mts_recovery_groups()` at two places:

1. **Worker filtering (line ~6432):** Compares each worker's last executed position against the coordinator's checkpoint to determine which workers have un-checkpointed transactions. An incorrect comparison may cause a worker to be wrongly excluded from recovery, **silently losing transactions that need to be replayed after crash recovery**.

2. **Event matching (line ~6547):** Scans the relay log to find the exact event matching a worker's last position. An incorrect comparison may fail to locate the correct recovery starting point.

**Note:** This bug is **not triggered** when `gtid_mode=ON` and `master_auto_position=1`, because `mts_recovery_groups()` returns early in that case, relying on GTID auto-skip for crash recovery instead. The bug only affects installations using **file-position-based replication with MTS enabled**.

How to repeat:
1. Set up a replica with `slave_parallel_workers > 0`, `gtid_mode=OFF` (or `master_auto_position=0`).
2. Run the master long enough for the binlog index to exceed 999999 (e.g., `binlog.1000000`). This can be accelerated by setting `max_binlog_size` to a small value and issuing `FLUSH BINARY LOGS` repeatedly.
3. Ensure some workers have their last executed position in `binlog.9xxxxx` while the coordinator checkpoint is in `binlog.10xxxxx` (or vice versa).
4. Kill the replica process (simulate crash).
5. Restart the replica — `mts_recovery_groups()` will compare these filenames using `strcmp()` and may incorrectly determine which workers need recovery.

Suggested fix:
When the filename prefix is the same, a longer numeric suffix always represents a larger sequence number. So we can compare lengths first (longer = larger), and only fall back to `strcmp()` when lengths are equal (which guarantees same digit count, making lexicographic order correct):

```cpp
int mts_event_coord_cmp(LOG_POS_COORD *id1, LOG_POS_COORD *id2) {
  int filecmp;
  size_t len1 = strlen(id1->file_name);
  size_t len2 = strlen(id2->file_name);
  if (len1 != len2)
    filecmp = (len1 < len2) ? -1 : 1;
  else
    filecmp = strcmp(id1->file_name, id2->file_name);

  if (filecmp != 0) return filecmp;

  longlong poscmp = id1->pos - id2->pos;
  return (poscmp < 0 ? -1 : (poscmp > 0 ? 1 : 0));
}
```

This works because MySQL binlog filenames share the same prefix (e.g., `binlog.`), so the only variable part is the numeric suffix. A filename with more characters necessarily has a larger sequence number.

**Alternative approach:** Extract the numeric suffix and compare as integers, which is more explicit but slightly more complex:

```cpp
int mts_event_coord_cmp(LOG_POS_COORD *id1, LOG_POS_COORD *id2) {
  const char *dot1 = strrchr(id1->file_name, '.');
  const char *dot2 = strrchr(id2->file_name, '.');

  int filecmp;
  if (dot1 && dot2) {
    // Compare prefix part
    size_t prefix_len1 = dot1 - id1->file_name;
    size_t prefix_len2 = dot2 - id2->file_name;
    if (prefix_len1 != prefix_len2)
      filecmp = strcmp(id1->file_name, id2->file_name);
    else {
      filecmp = strncmp(id1->file_name, id2->file_name, prefix_len1);
      if (filecmp == 0) {
        // Same prefix, compare numeric suffix as integers
        long long seq1 = strtoll(dot1 + 1, nullptr, 10);
        long long seq2 = strtoll(dot2 + 1, nullptr, 10);
        filecmp = (seq1 < seq2) ? -1 : (seq1 > seq2 ? 1 : 0);
      }
    }
  } else {
    filecmp = strcmp(id1->file_name, id2->file_name);
  }

  if (filecmp != 0) return filecmp;

  longlong poscmp = id1->pos - id2->pos;
  return (poscmp < 0 ? -1 : (poscmp > 0 ? 1 : 0));
}
```

I recommend the first (length-based) approach for its simplicity and minimal code change.
[28 Apr 2:52] Demon Chen
~