MySQL Bugs: #47211: backup_events.test fails randomly

Bug #47211	backup_events.test fails randomly
Submitted:	9 Sep 2009 9:09	Modified:	10 Nov 2009 21:22
Reporter:	Ingo Strüwing
Status:	Won't fix
Category:	MySQL Server: Backup	Severity:	S3 (Non-critical)
Version:	5.4.4	OS:	Any (multiple)
Assigned to:	Chuck Bell	CPU Architecture:	Any
Tags:	experimental, pushbuild, pushbuild failure, sporadic, test failure

Description:
backup.backup_events w5 [ fail ]
        Test ended at 2009-09-08 12:27:11

CURRENT_TEST: backup.backup_events
mysqltest: At line 269: query 'RESTORE FROM 'events.bak' OVERWRITE' failed: 2013: Lost connection to MySQL server during query

or:

backup.backup_events w5 [ fail ]
        Test ended at 2009-09-09 02:13:27

CURRENT_TEST: backup.backup_events
mysqltest: At line 324: query 'RESTORE FROM 'events1.bak'' failed: 1698: Could not restore table `events`.`t2`

Warnings from just before the error:
Error 1050 Table 't2' already exists 
Error 1698 Could not restore table `events`.`t2`

How to repeat:
http://pb2.norway.sun.com/web.py?template=mysql_show_test_failure&test_failure_id=2311254&...

To clarify: the title is misleading; this is not a test issue.

The backup_events test is being removed because it is no longer relevant: it has been deemed unreliable and does not add value to the quality checking of events and backup.

The test has been replaced with the work from BUG#37445.

The following is from an email explaining this decision.

-------8<---------

BACKGROUND
----------
I have run the backup_events test for over 200 iterations on three different platforms and am unable to reproduce the problem described in BUG#47211. However, the test has failed a number of times in pushbuild. My theory is that we are seeing yet another predictability issue with respect to event scheduler thread execution. I think it is entirely possible that these failures occur only when the system is under high contention.

It is known and accepted (by most) that testing event firing in MTR is unpredictable, and we have concluded that test cases relying on the event scheduler to fire events are likewise unpredictable. We therefore concluded that any test case that relies on predictable event firing should be removed. Furthermore, the wait_condition loop used in these test cases is known to be unreliable on some platforms.
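To illustrate how a polling wait can fail under contention without any defect in backup or restore, here is a minimal Python sketch of a wait_condition-style loop. It re-evaluates a condition at a fixed interval and gives up after a timeout. The names `SimulatedClock`, `wait_condition` and `event_fires_at`, the 30-second timeout and the 0.1-second interval are all hypothetical choices for this example. The code is not the actual mysqltest include. A simulated clock keeps the result deterministic. If the scheduler thread is starved long enough that the event fires after the deadline, the loop reports failure even though the event does eventually fire.

```python
from typing import Callable


class SimulatedClock:
    """Hypothetical deterministic clock, so the example does not depend on real scheduling."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def wait_condition(
    check: Callable[[], bool],
    clock: SimulatedClock,
    timeout_s: float = 30.0,   # assumed timeout, for illustration only
    interval_s: float = 0.1,   # assumed polling interval
) -> bool:
    """Poll `check` every `interval_s` seconds until it returns True
    or `timeout_s` seconds have elapsed (simplified model of a wait loop)."""
    deadline = clock.time() + timeout_s
    while True:
        if check():
            return True
        if clock.time() >= deadline:
            return False
        clock.sleep(interval_s)


def event_fires_at(fire_time_s: float, clock: SimulatedClock) -> Callable[[], bool]:
    """Condition that becomes true once the simulated event has fired."""
    return lambda: clock.time() >= fire_time_s


if __name__ == "__main__":
    # Lightly loaded machine: the scheduler fires the event after 2 s.
    clock = SimulatedClock()
    print(wait_condition(event_fires_at(2.0, clock), clock))   # True
    # Heavily contended machine: the same event fires only after 45 s,
    # so the wait times out and the test fails spuriously.
    clock = SimulatedClock()
    print(wait_condition(event_fires_at(45.0, clock), clock))  # False
```

The only difference between the two runs is how long the event takes to fire. A timeout-based check therefore measures scheduler latency as much as correctness.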

The backup_events test contains many test cases that rely on the event scheduler and the wait_condition loop behaving predictably. Disabling those test cases would amount to disabling the entire test.

SUMMARY
-------
I think the BUG#37445 concept of disabling events on restore by default is a sound one and will improve the user experience. I have written a new proposed solution in the bug report that shows this can be done with minimal effort and without modifying the stream format or creating new event states.

Regardless of whether we decide to implement BUG#37445, I think it is imperative that we redesign the event test cases in backup_events to test only that restore can correctly recreate an event. We should *not* test the firing of events, given the unreliable behavior described above.

Lastly, I have argued strongly that testing event firing before or after restore is well beyond the scope of reasonable testing efforts for the backup team. I understand the original premise that these tests verify that backup (or restore) does not 'damage' the firing of events, but that premise is absurd given what backup and restore do with events. There is no connection to the event scheduler: each event is simply stored as a CREATE EVENT statement on backup and executed again on restore. It is enough to establish that an event is the same after restore as it was before backup. Nothing more need be done. If you accept this argument, the backup_events test should be dropped. Note: if BUG#37445 is implemented, that work can reuse the backup_events test, replacing the existing test cases.
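The check argued for above is to verify that restore recreates each event with the same definition, without waiting for the event to fire. It can be sketched as a plain comparison of event metadata. This minimal Python sketch assumes the definitions have already been captured, for example from the INFORMATION_SCHEMA.EVENTS table. They are stored in dictionaries keyed by (schema, event name). The capture step, the helper name `diff_events` and the sample values are hypothetical. An `ignore` set lets a caller skip attributes that restore is expected to change. One example is STATUS, if events are disabled on restore as proposed in BUG#37445.

```python
from typing import Dict, FrozenSet, List, Tuple

# Key: (EVENT_SCHEMA, EVENT_NAME); value: column name -> column value,
# e.g. as read from INFORMATION_SCHEMA.EVENTS (capture step not shown).
EventKey = Tuple[str, str]
EventDefs = Dict[EventKey, Dict[str, str]]


def diff_events(
    before_backup: EventDefs,
    after_restore: EventDefs,
    ignore: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Return human-readable differences; an empty list means every event
    was recreated with the same definition (ignored columns excepted)."""
    problems: List[str] = []
    for key in sorted(before_backup.keys() - after_restore.keys()):
        problems.append(f"missing after restore: {key[0]}.{key[1]}")
    for key in sorted(after_restore.keys() - before_backup.keys()):
        problems.append(f"unexpected after restore: {key[0]}.{key[1]}")
    for key in sorted(before_backup.keys() & after_restore.keys()):
        old, new = before_backup[key], after_restore[key]
        for column in sorted((old.keys() | new.keys()) - ignore):
            if old.get(column) != new.get(column):
                problems.append(
                    f"{key[0]}.{key[1]}: {column} "
                    f"{old.get(column)!r} -> {new.get(column)!r}"
                )
    return problems


if __name__ == "__main__":
    # Hypothetical sample data for one event in the `events` database.
    before = {("events", "e1"): {
        "EVENT_DEFINITION": "INSERT INTO events.t1 VALUES (1)",
        "INTERVAL_VALUE": "1", "INTERVAL_FIELD": "SECOND", "STATUS": "ENABLED"}}
    after = {("events", "e1"): {
        "EVENT_DEFINITION": "INSERT INTO events.t1 VALUES (1)",
        "INTERVAL_VALUE": "1", "INTERVAL_FIELD": "SECOND", "STATUS": "DISABLED"}}
    print(diff_events(before, after))
    print(diff_events(before, after, ignore=frozenset({"STATUS"})))  # []
```

The comparison never depends on when, or whether, the scheduler runs the event, so it does not share the timing sensitivity of the firing-based test cases.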