Bug #60173 Partially written insert statement in binlog -- no errors reported
Submitted: 18 Feb 2011 20:54 Modified: 30 May 2013 19:33
Reporter: Mark Callaghan Email Updates:
Status: Closed Impact on me:
Category:MySQL Server: Replication Severity:S2 (Serious)
Version:5.1.47, 5.1.64, 5.5.27 OS:Any
Assigned to: CPU Architecture:Any
Tags: corruption, Partial, replication
Triage: Needs Triage: D2 (Serious)

[18 Feb 2011 20:54] Mark Callaghan
I have a binlog with: BEGIN, INSERT, COMMIT (XID). The insert statement was partially written. No errors were reported and the XID event was written so I don't think mysqld noticed an error. I don't know yet whether this was an auto-commit transaction.

I started to read the code and noticed that in MYSQL_BIN_LOG::write_cache code assumes that 0 returned from my_b_fill always means end-of-cache:

  } while ((length= my_b_fill(cache)));
  DBUG_ASSERT(carry == 0);
  return 0;                                     // All OK

But that is incorrect:

  Fill buffer of the cache.

    This assumes that you have already used all characters in the CACHE,
    independent of the read_pos value!

  0  On error or EOF (info->error = -1 on error)
  #  Number of characters

size_t my_b_fill(IO_CACHE *info)

And other callers of my_b_fill don't check for error:
my_b_copy_to_file, maybe my_b_gets, 

How to repeat:
read the code

Suggested fix:
check for errors
[18 Feb 2011 20:58] Mark Callaghan
I still don't know whether the transaction cache was used, but if it was the binlog_cache_size is 262144 and the corrupt event is ~4 MB so I assume the cache spilled to disk.
[18 Feb 2011 21:14] Mark Callaghan
From the corrupt event, the size from start to end_log_pos is 4042650

# at 849548868
#110217  2:58:24 server id 10211982  end_log_pos 853591518 
# Position  Timestamp   Type   Master ID        Size      Master Pos    Flags 
# 32a31644 50 ff 5c 4d   02   8e d2 9b 00   9a af 3d 00   de c5 e0 32   10 00
[18 Feb 2011 21:48] Mark Callaghan
My other theory is that the host ran out out of memory, malloc failed and mysqld ignored/missed the failure. I had one obscure bug in replication internals from that a few years back. But I have no evidence this host was near OOM.
[25 Feb 2011 22:38] Sveta Smirnova
Thank you for the report.

Could you please send us binary log with corrupted event?
[26 Mar 2011 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[11 Jul 2012 17:59] Mark Callaghan
I cannot send you the binlog with the corruption. But you can have an engineer read the code to confirm that errors from my_b_fill are ignored, even in 5.1.63 and 5.6 m8
[13 Jul 2012 19:38] Sveta Smirnova
Thank you for the feedback.

Verified by examining source code. I could not repeat corruption of binary log.
[13 Jul 2012 19:39] Sveta Smirnova
In trunk my_b_fill does not exists.
[30 May 2013 19:33] Daniel So
Route to DOCUMENTING as per status on Oracle Bug system.
[30 May 2013 19:33] Daniel So
Added changelog entry in reference manuals of versions 5.1.70,5.5.32,
5.6.12, 5.7.2:

'The binary log contents got corrupted sometimes, because the
"MYSQL_BIN_LOG::write_cache" function always thought it had reached the
"end-of-cache" when the function "my_b_fill()" reported a "0," while
that could also mean an error had occurred. This fix makes sure that
whenever my_b_fill() returns a "0," an error check is performed on