Description:
The function `mts_event_coord_cmp()` in `sql/rpl_slave.cc` uses `strcmp()` to compare binlog filenames when ordering MTS (Multi-Threaded Slave) recovery jobs. This is a lexicographic comparison, which produces incorrect results when the binlog sequence number crosses a digit-width boundary (e.g., from 999999 to 1000000).
For example:
```
strcmp("binlog.1073660", "binlog.918815") < 0
```
Because `'1' < '9'` in ASCII, `strcmp()` considers `binlog.1073660` to be **less than** `binlog.918815`, even though the actual sequence number 1073660 is greater than 918815.
## Code Location
```cpp
// sql/rpl_slave.cc, function mts_event_coord_cmp()
int mts_event_coord_cmp(LOG_POS_COORD *id1, LOG_POS_COORD *id2) {
longlong filecmp = strcmp(id1->file_name, id2->file_name);
longlong poscmp = id1->pos - id2->pos;
return (filecmp < 0
? -1
: (filecmp > 0 ? 1 : (poscmp < 0 ? -1 : (poscmp > 0 ? 1 : 0))));
}
```
## Impact
This function is called in `mts_recovery_groups()` at two places:
1. **Worker filtering (line ~6432):** Compares each worker's last executed position against the coordinator's checkpoint to determine which workers have un-checkpointed transactions. An incorrect comparison may cause a worker to be wrongly excluded from recovery, **silently losing transactions that need to be replayed after crash recovery**.
2. **Event matching (line ~6547):** Scans the relay log to find the exact event matching a worker's last position. An incorrect comparison may fail to locate the correct recovery starting point.
**Note:** This bug is **not triggered** when `gtid_mode=ON` and `master_auto_position=1`, because `mts_recovery_groups()` returns early in that case, relying on GTID auto-skip for crash recovery instead. The bug only affects installations using **file-position-based replication with MTS enabled**.
How to repeat:
1. Set up a replica with `slave_parallel_workers > 0`, `gtid_mode=OFF` (or `master_auto_position=0`).
2. Run the master long enough for the binlog index to exceed 999999 (e.g., `binlog.1000000`). This can be accelerated by setting `max_binlog_size` to a small value and issuing `FLUSH BINARY LOGS` repeatedly.
3. Ensure some workers have their last executed position in `binlog.9xxxxx` while the coordinator checkpoint is in `binlog.10xxxxx` (or vice versa).
4. Kill the replica process (simulate crash).
5. Restart the replica — `mts_recovery_groups()` will compare these filenames using `strcmp()` and may incorrectly determine which workers need recovery.
Suggested fix:
When the filename prefix is the same, a longer numeric suffix always represents a larger sequence number. So we can compare lengths first (longer = larger), and only fall back to `strcmp()` when lengths are equal (which guarantees same digit count, making lexicographic order correct):
```cpp
int mts_event_coord_cmp(LOG_POS_COORD *id1, LOG_POS_COORD *id2) {
int filecmp;
size_t len1 = strlen(id1->file_name);
size_t len2 = strlen(id2->file_name);
if (len1 != len2)
filecmp = (len1 < len2) ? -1 : 1;
else
filecmp = strcmp(id1->file_name, id2->file_name);
if (filecmp != 0) return filecmp;
longlong poscmp = id1->pos - id2->pos;
return (poscmp < 0 ? -1 : (poscmp > 0 ? 1 : 0));
}
```
This works because MySQL binlog filenames share the same prefix (e.g., `binlog.`), so the only variable part is the numeric suffix. A filename with more characters necessarily has a larger sequence number.
**Alternative approach:** Extract the numeric suffix and compare as integers, which is more explicit but slightly more complex:
```cpp
int mts_event_coord_cmp(LOG_POS_COORD *id1, LOG_POS_COORD *id2) {
const char *dot1 = strrchr(id1->file_name, '.');
const char *dot2 = strrchr(id2->file_name, '.');
int filecmp;
if (dot1 && dot2) {
// Compare prefix part
size_t prefix_len1 = dot1 - id1->file_name;
size_t prefix_len2 = dot2 - id2->file_name;
if (prefix_len1 != prefix_len2)
filecmp = strcmp(id1->file_name, id2->file_name);
else {
filecmp = strncmp(id1->file_name, id2->file_name, prefix_len1);
if (filecmp == 0) {
// Same prefix, compare numeric suffix as integers
long long seq1 = strtoll(dot1 + 1, nullptr, 10);
long long seq2 = strtoll(dot2 + 1, nullptr, 10);
filecmp = (seq1 < seq2) ? -1 : (seq1 > seq2 ? 1 : 0);
}
}
} else {
filecmp = strcmp(id1->file_name, id2->file_name);
}
if (filecmp != 0) return filecmp;
longlong poscmp = id1->pos - id2->pos;
return (poscmp < 0 ? -1 : (poscmp > 0 ? 1 : 0));
}
```
I recommend the first (length-based) approach for its simplicity and minimal code change.
Description: The function `mts_event_coord_cmp()` in `sql/rpl_slave.cc` uses `strcmp()` to compare binlog filenames when ordering MTS (Multi-Threaded Slave) recovery jobs. This is a lexicographic comparison, which produces incorrect results when the binlog sequence number crosses a digit-width boundary (e.g., from 999999 to 1000000). For example: ``` strcmp("binlog.1073660", "binlog.918815") < 0 ``` Because `'1' < '9'` in ASCII, `strcmp()` considers `binlog.1073660` to be **less than** `binlog.918815`, even though the actual sequence number 1073660 is greater than 918815. ## Code Location ```cpp // sql/rpl_slave.cc, function mts_event_coord_cmp() int mts_event_coord_cmp(LOG_POS_COORD *id1, LOG_POS_COORD *id2) { longlong filecmp = strcmp(id1->file_name, id2->file_name); longlong poscmp = id1->pos - id2->pos; return (filecmp < 0 ? -1 : (filecmp > 0 ? 1 : (poscmp < 0 ? -1 : (poscmp > 0 ? 1 : 0)))); } ``` ## Impact This function is called in `mts_recovery_groups()` at two places: 1. **Worker filtering (line ~6432):** Compares each worker's last executed position against the coordinator's checkpoint to determine which workers have un-checkpointed transactions. An incorrect comparison may cause a worker to be wrongly excluded from recovery, **silently losing transactions that need to be replayed after crash recovery**. 2. **Event matching (line ~6547):** Scans the relay log to find the exact event matching a worker's last position. An incorrect comparison may fail to locate the correct recovery starting point. **Note:** This bug is **not triggered** when `gtid_mode=ON` and `master_auto_position=1`, because `mts_recovery_groups()` returns early in that case, relying on GTID auto-skip for crash recovery instead. The bug only affects installations using **file-position-based replication with MTS enabled**. How to repeat: 1. Set up a replica with `slave_parallel_workers > 0`, `gtid_mode=OFF` (or `master_auto_position=0`). 2. Run the master long enough for the binlog index to exceed 999999 (e.g., `binlog.1000000`). This can be accelerated by setting `max_binlog_size` to a small value and issuing `FLUSH BINARY LOGS` repeatedly. 3. Ensure some workers have their last executed position in `binlog.9xxxxx` while the coordinator checkpoint is in `binlog.10xxxxx` (or vice versa). 4. Kill the replica process (simulate crash). 5. Restart the replica — `mts_recovery_groups()` will compare these filenames using `strcmp()` and may incorrectly determine which workers need recovery. Suggested fix: When the filename prefix is the same, a longer numeric suffix always represents a larger sequence number. So we can compare lengths first (longer = larger), and only fall back to `strcmp()` when lengths are equal (which guarantees same digit count, making lexicographic order correct): ```cpp int mts_event_coord_cmp(LOG_POS_COORD *id1, LOG_POS_COORD *id2) { int filecmp; size_t len1 = strlen(id1->file_name); size_t len2 = strlen(id2->file_name); if (len1 != len2) filecmp = (len1 < len2) ? -1 : 1; else filecmp = strcmp(id1->file_name, id2->file_name); if (filecmp != 0) return filecmp; longlong poscmp = id1->pos - id2->pos; return (poscmp < 0 ? -1 : (poscmp > 0 ? 1 : 0)); } ``` This works because MySQL binlog filenames share the same prefix (e.g., `binlog.`), so the only variable part is the numeric suffix. A filename with more characters necessarily has a larger sequence number. **Alternative approach:** Extract the numeric suffix and compare as integers, which is more explicit but slightly more complex: ```cpp int mts_event_coord_cmp(LOG_POS_COORD *id1, LOG_POS_COORD *id2) { const char *dot1 = strrchr(id1->file_name, '.'); const char *dot2 = strrchr(id2->file_name, '.'); int filecmp; if (dot1 && dot2) { // Compare prefix part size_t prefix_len1 = dot1 - id1->file_name; size_t prefix_len2 = dot2 - id2->file_name; if (prefix_len1 != prefix_len2) filecmp = strcmp(id1->file_name, id2->file_name); else { filecmp = strncmp(id1->file_name, id2->file_name, prefix_len1); if (filecmp == 0) { // Same prefix, compare numeric suffix as integers long long seq1 = strtoll(dot1 + 1, nullptr, 10); long long seq2 = strtoll(dot2 + 1, nullptr, 10); filecmp = (seq1 < seq2) ? -1 : (seq1 > seq2 ? 1 : 0); } } } else { filecmp = strcmp(id1->file_name, id2->file_name); } if (filecmp != 0) return filecmp; longlong poscmp = id1->pos - id2->pos; return (poscmp < 0 ? -1 : (poscmp > 0 ? 1 : 0)); } ``` I recommend the first (length-based) approach for its simplicity and minimal code change.