| Bug #102651 | Crash in CONTINUEB when REDO log problem | ||
|---|---|---|---|
| Submitted: | 18 Feb 2021 18:18 | Modified: | 17 Mar 2021 18:08 |
| Reporter: | Mikael Ronström | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
| Version: | 8.0.23 | OS: | Any |
| Assigned to: | | CPU Architecture: | Any |
[22 Feb 2021 16:33]
MySQL Verification Team
Hi Mikael, Thanks for the report and the fix. all best Bogdan
[17 Mar 2021 18:08]
Jon Stephens
Documented fix as follows in the NDB 8.0.25 changelog:
To ensure that the log records kept for the redo log in main
memory are written to redo log file within one second, a time
supervisor in DBLQH acquires a lock on the redo log part prior
to the write. A fix for a previous issue caused a continueB
signal (introduced as part of that fix) to be sent when the redo
log file was not yet opened and ready for the write, then to
return without releasing the lock. Now in such cases we release the
acquired lock before waiting for the redo log file to be open
and ready for the write.
Closed.

Description:
When getting a REDO log problem it is possible that we send a CONTINUEB with a case that doesn't exist and we also miss unlocking the log part.

This is the problematic code with the changes:

         if ((logPartPtr.p->m_log_problems &
              LogPartRecord::P_FILE_CHANGE_PROBLEM) != 0)
         {
           jam();
ADDED      unlock_log_part(logPartPtr.p);
           g_eventLogger->info("LDM(%u): Gci record write is waiting for "
                               "the redo log file to be changed: "
                               "logpart: %u log part state: %u "
                               "log part problem: %u "
                               "file: %u ref %u logFileStatus %u"
                               "fileChangeState %u "
                               "current mbyte: %u "
                               "logPagePtr.i %u ",
                               instance(),
                               logPartPtr.p->logPartNo,
                               logPartPtr.p->logPartState,
                               logPartPtr.p->m_log_problems,
                               logFilePtr.p->fileNo,
                               logFilePtr.p->fileRef,
                               logFilePtr.p->logFileStatus,
                               logFilePtr.p->fileChangeState,
                               logFilePtr.p->currentMbyte,
                               logPagePtr.i);
           /* Wait for current file to be ready for writes */
ADDED      signal->theData[0] = ZTIME_SUPERVISION;
ADDED      signal->theData[1] = logPartPtr.i;
           sendSignalWithDelay(cownref, GSN_CONTINUEB, signal, 50, 2);
           return;
         }

How to repeat:
Run sysbench with a too small REDO log

Suggested fix:
See above
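
For readers without the NDB source at hand, the two parts of the fix (release the log part lock before returning early, and fill in the CONTINUEB case and log part index before sending the delayed signal) can be illustrated with a small standalone sketch. This is not the DBLQH code: `LogPart`, `Signal`, the `delayed` queue, `time_supervision`, and the numeric values of `P_FILE_CHANGE_PROBLEM` and `ZTIME_SUPERVISION` are all hypothetical stand-ins chosen for illustration, and the sketch assumes the lock is taken at the top of the same function.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Hypothetical stand-ins for illustration only; not the real NDB definitions.
constexpr std::uint32_t P_FILE_CHANGE_PROBLEM = 0x1;  // assumed bit value
constexpr std::uint32_t ZTIME_SUPERVISION = 1;        // assumed case number

struct LogPart {
  std::uint32_t problems = 0;  // bitmask of current redo log problems
  bool locked = false;         // models the log part lock
  int writes = 0;              // number of completed writes
};

struct Signal {
  std::uint32_t data[2];  // data[0] = CONTINUEB case, data[1] = log part index
};

// Stands in for the delayed-signal mechanism (sendSignalWithDelay).
std::queue<Signal> delayed;

void lock_log_part(LogPart& p) {
  assert(!p.locked);  // taking the lock twice would indicate a leaked lock
  p.locked = true;
}

void unlock_log_part(LogPart& p) {
  assert(p.locked);
  p.locked = false;
}

// Returns true if the write happened, false if a retry was scheduled.
bool time_supervision(LogPart& part, std::uint32_t part_index) {
  lock_log_part(part);
  if ((part.problems & P_FILE_CHANGE_PROBLEM) != 0) {
    // Fix, part 1: release the lock before the early return.
    unlock_log_part(part);
    // Fix, part 2: send a well-defined case and the log part index,
    // so the retry is dispatched to an existing CONTINUEB case.
    delayed.push(Signal{{ZTIME_SUPERVISION, part_index}});
    return false;
  }
  ++part.writes;  // the redo log file is ready, so perform the write
  unlock_log_part(part);
  return true;
}
```

In this model, dropping the `unlock_log_part` call from the early-return branch leaves `locked` set, so the retry trips the assertion in `lock_log_part`. That is the analogue of the missed unlock described in the report.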