Bug #47211 backup_events.test fails randomly
Submitted: 9 Sep 11:09 Modified: 10 Nov 22:22
Reporter: Ingo Strüwing
Status: Won't fix
Category:Server: Backup Severity:S3 (Non-critical)
Version:5.4.4 OS:Any (multiple)
Assigned to: Chuck Bell Target Version:
Tags: pushbuild failure, pushbuild, test failure, sporadic, experimental
Triage: Triaged: D1 (Critical)

[9 Sep 11:09] Ingo Strüwing
Description:
backup.backup_events w5 [ fail ]
        Test ended at 2009-09-08 12:27:11

CURRENT_TEST: backup.backup_events
mysqltest: At line 269: query 'RESTORE FROM 'events.bak' OVERWRITE' failed: 2013: Lost connection to MySQL server during query

or:

backup.backup_events w5 [ fail ]
        Test ended at 2009-09-09 02:13:27

CURRENT_TEST: backup.backup_events
mysqltest: At line 324: query 'RESTORE FROM 'events1.bak'' failed: 1698: Could not restore table `events`.`t2`

Warnings from just before the error:
Error 1050 Table 't2' already exists 
Error 1698 Could not restore table `events`.`t2`

How to repeat:
http://pb2.norway.sun.com/web.py?template=mysql_show_test_failure&test_failure_id=2311254&...
[15 Sep 20:37] Omer BarNir
To clarify: the title is misleading; this is not a test issue.
[10 Nov 22:22] Chuck Bell
The backup_events test is being removed because it is no longer relevant: the test has
been deemed unreliable and does not add value to the quality checking of events and
backup.

The test has been replaced with the work from BUG#37445.

The following is from an email explaining this decision.

-------8<---------

BACKGROUND
----------
I have run the backup_events test for over 200 iterations on three different platforms and
am unable to reproduce the problem described in BUG#47211. However, the test has failed a
number of times in pushbuild. My theory is that we are seeing yet another predictability
issue with respect to event scheduler thread execution. I think it is entirely possible
that these failures occur only when the system is under high contention.

It is known and accepted (by most) that testing event firing in MTR is unpredictable. We
have also concluded that test cases relying on the event scheduler to fire events are
unpredictable, and that any test case relying on events firing predictably should be
removed. Furthermore, it is known that the wait_condition loop employed in these test
cases can be unreliable on some platforms.

The backup_events test has many test cases that rely on the scheduler and the
wait_condition loop working predictably. Disabling them would mean disabling the entire test.

SUMMARY
-------
I think the BUG#37445 concept of disabling events on restore by default is a sound one
and will benefit the user experience. I have written a new proposed solution in the bug
report showing that this can be done with minimal effort and without modifying the stream
format or creating new event states.

Regardless of whether we decide to implement BUG#37445, I think it is imperative that we
redesign the event test cases in backup_events to test only that restore can correctly
recreate an event. We should *not* test the firing of events, given the unreliable
behavior described above.

Lastly, I have argued strongly that testing event firing before or after restore is well
beyond the scope of reasonable testing efforts for the backup team. I understand the
original premise that these tests can verify that backup (or restore) does not 'damage'
the firing of events, but that is absurd given what backup and restore do with events.
There is no connection to the event scheduler: the events are simply stored as a CREATE
statement on backup and later executed on RESTORE. It is enough to verify that an event
is the same after restore as it was before backup. Nothing more need be done. If you
accept this argument, the backup_events test should be dropped. Note: if BUG#37445 is
implemented, that work can reuse the backup_events test, replacing its existing test
cases.