Bug #71400 | Binary log repeatedly corrupt on master - replication fails with error 1236 | ||
---|---|---|---|
Submitted: | 16 Jan 2014 15:10 | Modified: | 31 Jan 2014 14:56 |
Reporter: | Martin Kirchner | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
Version: | 5.6.15 | OS: | Linux (Debian 7.1) |
Assigned to: | | CPU Architecture: | Any |
[16 Jan 2014 15:10]
Martin Kirchner
[16 Jan 2014 17:39]
Sveta Smirnova
Thank you for the report. Have you checked disk space, as the error message suggests? Have you also checked the disk space dedicated to the MySQL tmpdir? If you run huge transactions or queries that process a lot of data, you can fill up the temporary space. Please double-check and update the report.
[16 Jan 2014 19:11]
Martin Kirchner
Thanks for your answer.

On the virtual machines (pia-db1/2) the disk levels (df -h) are something like this:

Filesystem | Size | Used | Free | Use% | Mount point |
---|---|---|---|---|---|
rootfs | 40G | 4,4G | 33G | 12% | / |
udev | 10M | 0 | 10M | 0% | /dev |
tmpfs | 3,2G | 240K | 3,2G | 1% | /run |
/dev/disk/by-uuid/2133fe96-a39b-4dd6-b7e4-364af99f5eba | 40G | 4,4G | 33G | 12% | / |
tmpfs | 5,0M | 0 | 5,0M | 0% | /run/lock |
tmpfs | 6,3G | 23M | 6,3G | 1% | /run/shm |
/dev/sdb1 | 443G | 147G | 275G | 35% | /piasql |
//10.1.0.70/backup | 1,8T | 782G | 1,1T | 43% | /mnt/backup |

On pia-db3 (physical machine and active master):

Filesystem | Size | Used | Free | Use% | Mount point |
---|---|---|---|---|---|
rootfs | 130G | 2,4G | 121G | 2% | / |
udev | 10M | 0 | 10M | 0% | /dev |
tmpfs | 3,2G | 292K | 3,2G | 1% | /run |
/dev/disk/by-uuid/1a829d2f-921d-4d46-8dfc-450e4d80d1c1 | 130G | 2,4G | 121G | 2% | / |
tmpfs | 5,0M | 0 | 5,0M | 0% | /run/lock |
tmpfs | 11G | 23M | 11G | 1% | /run/shm |
/dev/cciss/c0d1p1 | 388G | 292G | 77G | 80% | /piasql |
//10.1.0.70/backup | 1,8T | 782G | 1,1T | 43% | /mnt/backup |

/piasql is the MySQL data partition. Both / and /piasql are under constant monitoring via Nagios/check_mk and did not raise any alarm. The diagrams do not show any peaks that differ significantly from the values listed above.

MySQL uses /tmp as its temp folder, which seems to be a subfolder of /.

    kirchner@localhost [(none)]>show global variables like '%tmp%';
    +----------------------------+------------+
    | Variable_name              | Value      |
    +----------------------------+------------+
    | default_tmp_storage_engine | InnoDB     |
    | max_tmp_tables             | 32         |
    | slave_load_tmpdir          | /tmp       |
    | tmp_table_size             | 1073741824 |
    | tmpdir                     | /tmp       |
    +----------------------------+------------+
    5 rows in set (0,00 sec)

Thank you in advance for any help.
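A single df -h snapshot, or a monitoring check with a coarse interval, may miss a short-lived spike in tmpdir usage. The following is a minimal sketch, assuming Python 3 is available on the host, that samples free space on the filesystem holding the tmpdir and reports the lowest value seen. The function names, sample count and interval are illustrative and not part of the original report.

```python
import shutil
import time


def free_bytes(path):
    """Return the free bytes on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def low_water_mark(path, samples, interval_s, probe=free_bytes, sleep=time.sleep):
    """Sample free space `samples` times and return the smallest value seen.

    `probe` and `sleep` are injectable so the loop can be exercised
    without touching a real disk or actually waiting.
    """
    lowest = None
    for i in range(samples):
        current = probe(path)
        if lowest is None or current < lowest:
            lowest = current
        if i < samples - 1:
            sleep(interval_s)
    return lowest
```

For example, `low_water_mark("/tmp", samples=600, interval_s=1)` would watch /tmp for roughly ten minutes while the suspect workload runs. The path /tmp is taken from the tmpdir value shown above.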
[18 Jan 2014 8:08]
MySQL Verification Team
Hi Martin,

I think we'll have to recommend that you wait for 5.6.16. It should contain a fix for this:

Bug 17842137 - ASSERT IN ROW BASED REPLICATION WHEN TRANSACTION CACHE SIZE IS EXACTLY 32768

Release (non-debug) build ends up with corrupted binlog, and debug build asserts.
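To check a fleet of servers against the release named above, a version comparison along these lines may help. It is only a sketch: it assumes Oracle MySQL 5.6 version strings such as those returned by `SELECT VERSION()`, it says nothing about which release introduced the problem, and it does not apply to other release series or to forks such as MariaDB. The helper names are hypothetical.

```python
FIX_VERSION = (5, 6, 16)  # release expected to contain the fix, per the comment above


def parse_version(text):
    """Turn '5.6.15' or '5.6.15-log' into the tuple (5, 6, 15)."""
    numeric = text.split("-", 1)[0]
    return tuple(int(part) for part in numeric.split("."))


def predates_fix(version_text):
    """True for a 5.6.x release older than 5.6.16.

    Assumption: only the 5.6 series is judged; any other series returns False
    because this thread gives no information about it.
    """
    version = parse_version(version_text)
    return version[:2] == FIX_VERSION[:2] and version < FIX_VERSION
```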
[20 Jan 2014 10:23]
Martin Kirchner
Hi Shane, is there a roadmap for when MySQL 5.6.16 will be available? Is there a workaround for the internal bug you mentioned? Can you give me some more information on that bug? Thanks a lot. Regards, Martin
[31 Jan 2014 14:56]
MySQL Verification Team
5.6.16 will be out sometime during the next month.
[28 Dec 2016 19:52]
Arthur Burkart
Hi, I know this thread is long dead, but were there any steps for reproduction related to this issue? I'm trying to figure out if my release (10.1.20) of MariaDB is also impacted by this bug. Cheers
[28 Dec 2016 21:36]
MySQL Verification Team
On the topic of testcases, I did once make a pseudo-random one. You could try it in a test setup.

How to repeat:
--------------

Start server like this:
------------------------

    --log-bin --log-slave-updates --binlog-format=row --binlog-checksum=CRC32 --enforce-gtid-consistency=true --gtid-mode=on --server-id=21 --binlog-rows-query-log-events=1

Setup testcase:
-----------------

    reset master;
    set global sync_binlog=0;
    set global lock_wait_timeout=1;
    set global innodb_lock_wait_timeout=1;
    set global innodb_flush_log_at_trx_commit=0;
    drop table if exists t1;
    create table t1(id tinyint unsigned primary key,a longblob)engine=innodb;

    delimiter $
    drop procedure if exists p1 $
    create procedure p1()
    begin
      declare continue handler for sqlexception begin end;
      repeat
        set @sql:=concat('replace into t1 set id=floor(rand()*255),a="',repeat(char(rand()*255),cast(rand()*100000 as unsigned)),'"');
        #select @sql;
        prepare stmt from @sql;
        execute stmt;
        set global max_binlog_size=cast(1024*100 + rand()*1024*1024 as unsigned);
        delete from t1 where id=cast(floor(rand()*255) as unsigned);
        if rand()*1000 > 998 then
          flush logs;
        end if;
      until 1=2 end repeat;
    end $
    delimiter ;

    call p1(); #Run this in ~3 threads.

Launch a slave to read from the master. Here is a shortcut, just launch this command to read your binlogs:

    mysqlbinlog --read-from-remote-server -uroot --stop-never test-bin.000001 --to-last-log --base64-output=decode-rows -vvvv --verify-binlog-checksum > /dev/null &

Wait some minutes.
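The "run this in ~3 threads" step and the mysqlbinlog shortcut can be driven from one small script. This is a hedged sketch, not part of the original testcase: it assumes the `mysql` and `mysqlbinlog` clients are on the PATH, that root can connect locally without a password on a throwaway test server, and that the table and procedure were created in a database named `test` (the database name is an assumption; the testcase does not name one). The function names are hypothetical.

```python
import subprocess


def client_command(user="root", database="test"):
    """argv for one mysql client session that calls the stored procedure p1()."""
    return ["mysql", f"--user={user}", "--execute=call p1()", database]


def binlog_reader_command(first_binlog="test-bin.000001", user="root"):
    """argv for the mysqlbinlog reader, using the options given in the testcase."""
    return [
        "mysqlbinlog",
        "--read-from-remote-server",
        f"-u{user}",
        "--stop-never",
        first_binlog,
        "--to-last-log",
        "--base64-output=decode-rows",
        "-vvvv",
        "--verify-binlog-checksum",
    ]


def launch(workers=3):
    """Start the worker clients and the binlog reader; return their Popen handles.

    The reader's output is discarded, mirroring the '> /dev/null' in the testcase.
    """
    clients = [subprocess.Popen(client_command()) for _ in range(workers)]
    reader = subprocess.Popen(binlog_reader_command(), stdout=subprocess.DEVNULL)
    return clients, reader
```

Calling `launch()` starts everything in the background. If the reader process exits on its own, that is a cue to inspect its error output, since with --stop-never it is otherwise expected to keep running.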
[30 Dec 2016 3:37]
Arthur Burkart
@Shane, thanks so much for the repro steps. After spending way more time than I'd care to admit setting up a pair of MySQL servers to test with, I was able to reproduce the issue in 5.6.15. I was unable to reproduce it in MariaDB 10.1.20. I very much appreciate your help!