MySQL Bugs: #58129: Valgrind error when sanity check fails in read_log

Bug #58129	Valgrind error when sanity check fails in read_log_event
Submitted:	11 Nov 2010 1:31	Modified:	6 Jan 2011 4:53
Reporter:	Luis Soares	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Row Based Replication ( RBR )	Severity:	S3 (Non-critical)
Version:	next-mr, mysql-trunk	OS:	Any
Assigned to:	Luis Soares	CPU Architecture:	Any

Description:
If one forces the sanity check in read_log_event to fail (while
SQL thread is reading events from the relay log), we can get this
valgrind error when restarting the slave (after fixing the relay
log file) and start processing the remaining events:

==2603== Use of uninitialised value of size 8
==2603==    at 0xA8BBB0: _db_dump_ (dbug.c:1398)
==2603==    by 0x706C51: Field_blob::unpack(unsigned char*, unsigned char const*, unsigned int, bool) (field.cc:7820)
==2603==    by 0x890DFD: unpack_row(Relay_log_info const*, TABLE*, unsigned int, unsigned char const*, st_bitmap const*, unsigned char const**, unsigned long*, bool, bool) (rpl_record.cc:349)
==2603==    by 0x878928: Rows_log_event::unpack_current_row(Relay_log_info const*, st_bitmap const*, bool) (log_event.h:3748)
==2603==    by 0x875E20: Update_rows_log_event::do_exec_row(Relay_log_info const*) (log_event.cc:9957)
==2603==    by 0x86FE54: Rows_log_event::do_apply_event(Relay_log_info const*) (log_event.cc:7799)
==2603==    by 0x88EA31: Log_event::apply_event(Relay_log_info const*) (log_event.h:1153)
==2603==    by 0x8A177B: apply_event_and_update_pos(Log_event*, THD*, Relay_log_info*) (rpl_slave.cc:2572)
==2603==    by 0x8A1DA3: exec_relay_log_event(THD*, Relay_log_info*) (rpl_slave.cc:2756)
==2603==    by 0x8A44DA: handle_slave_sql (rpl_slave.cc:3593)
==2603==    by 0xA3D64D: pfs_spawn_thread (pfs.cc:1078)
==2603==    by 0x4E35970: start_thread (pthread_create.c:304)
==2603==    by 0x414370F: ???

Found while testing checksums for WL#2540. 

In WL#2540 tests we force the slave thread to corrupt an event
inside read_log_event, thence the CRC verification would
fail. Then we can check whether the slave stopped gracefully and
reported the correct error (number and message).

This was done using @@global.debug to inject the faults. The
tests end by restoring @@global.debug to original, harmless,
value and by restarting the slave. This is similar to stopping
the slave, fix the relay log files and restarting the slave.

How to repeat:
1. clone mysql-next-mr (revid: alexander.nozdrin@oracle.com-20101109084653-og7ehwexzhxc4g80)

2. patch log_event.cc with:

=== modified file 'sql/log_event.cc'
--- sql/log_event.cc	2010-10-17 23:27:40 +0000
+++ sql/log_event.cc	2010-11-11 01:03:24 +0000
@@ -1203,7 +1203,8 @@
   /* Check the integrity */
   if (event_len < EVENT_LEN_OFFSET ||
       buf[EVENT_TYPE_OFFSET] >= ENUM_END_EVENT ||
-      (uint) event_len != uint4korr(buf+EVENT_LEN_OFFSET))
+      (uint) event_len != uint4korr(buf+EVENT_LEN_OFFSET) ||
+      DBUG_EVALUATE_IF("read_log_event_fail_sanity", 1, 0))
   {
     *error="Sanity check failed";		// Needed to free buffer
     DBUG_RETURN(NULL); // general sanity check - will fail on a partial read

3. build with: ./BUILD/compile-pentium64-valgrind-max

4. run the following test case (lets call it rpl_sanity):

--source include/have_debug.inc
--source include/master-slave.inc
--source include/have_binlog_format_row.inc

call mtr.add_suppression('\'Sanity check failed\', data_len: 105, event_type: 15');

CREATE TABLE t1 (a INT NOT NULL PRIMARY KEY, b VARCHAR(10), c TEXT);
INSERT INTO t1 VALUES (1, 'a', NULL);
sync_slave_with_master;
source include/stop_slave.inc;
connection master;
UPDATE t1 SET c = REPEAT('a', 20);
DELETE FROM t1 WHERE a= 1;

-- connection slave
SET GLOBAL debug="+d,read_log_event_fail_sanity";
START SLAVE;
let $slave_sql_errno= 1593;
--source include/wait_for_slave_sql_error.inc
SET GLOBAL debug="-d,read_log_event_fail_sanity";
--source include/stop_slave.inc

--connection slave
--source include/start_slave.inc
--connection master
sync_slave_with_master;
let $diff_table_1= master:test.t1;
let $diff_table_2= slave:test.t1;
--source include/diff_tables.inc

--connection master
SET GLOBAL debug= "";
DROP TABLE t1;
--sync_slave_with_master
SET GLOBAL debug= "";

-- exit

  mysql-test$ perl mtr --valgrind --valgrind-options="--gen-suppressions=all --show-reachable=yes" rpl.rpl_sanity.test

5. Observe valgrind reporting errors:

   "Use of uninitialised value of size 8"
    ...

Env details
===========

  - gcc: gcc (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5
  - valgrind: valgrind-3.6.0.SVN-Debian
  - os: Linux diso 2.6.35-22-generic #35-Ubuntu SMP Sat Oct 16 20:45:36 UTC 2010 x86_64 GNU/Linux

Suggested fix:
.

Thank you for the report.

Verified as described.

Pushed into mysql-trunk 5.6.1-m5 (revid:alexander.nozdrin@oracle.com-20101129111021-874if2qsp0i8d5ze) (version source revid:alexander.nozdrin@oracle.com-20101129111021-874if2qsp0i8d5ze) (merge vers: 5.6.1-m5) (pib:23)

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/126806

3422 Luis Soares	2010-12-14
      BUG#58129: Valgrind error when sanity check fails in read_log_event
        
      The DBUG_DUMP instruction was being called with wrong pointer and
      wrong length.
        
      Fixed it by deploying two calls to DBUG_DUMP, one for the blob 
      field itself and another to print the blob value.

Queued in mysql-trunk-bugfixing.

Pushed into mysql-trunk 5.6.1 (revid:georgi.kodinov@oracle.com-20101217125013-y8pb3az32rtbplc9) (version source revid:mats.kindahl@oracle.com-20101215103340-a0wp0yq8t4byel1o) (merge vers: 5.6.1) (pib:24)

Does not appear in any release; no changelog entry necessary. Closed.