MySQL Bugs: #31383: Consistent Snapshot fails to initiate consistent read

Submitted:	3 Oct 2007 19:21	Modified:	26 Feb 2008 0:39
Reporter:	Chuck Bell
Status:	Closed
Category:	MySQL Server: Backup	Severity:	S1 (Critical)
Version:	6.0	OS:	Any
Assigned to:	Chuck Bell	CPU Architecture:	Any

Description:
The consistent snapshot driver no longer works as it should. After a backup of an InnoDB table starts, any changes made to the table while the backup is reading it are included in the backup archive.

How to repeat:
 1) Insert the BACKUP_SYNC call into be_default.cc as shown:

===== be_default.cc 1.8 vs edited =====
--- 1.8/sql/backup/be_default.cc        2007-10-03 15:17:03 -04:00
+++ edited/be_default.cc        2007-10-03 14:02:57 -04:00
@@ -293,6 +293,7 @@
     cur_blob= 0;
     cur_table->use_all_columns();
     last_read_res = hdl->rnd_next(cur_table->record[0]);
+    BACKUP_SYNC("backup_snapshot");
     DBUG_EXECUTE_IF("SLEEP_DRIVER", sleep(4););
     /*
       If we are end of file, stop the read process and signal the

 2) Compile the server with "EXTRA_DEBUG" defined.
 3) Create an InnoDB table and insert some data.
 4) Connect with 2 clients.
 5) In client 1, issue the backup command to back up the database with
    the table. Note the size of the backup file.
 6) In client 2, issue: SELECT get_lock("backup_snapshot", 100);
 7) In client 1, issue the backup command to back up the database
    with the table. Note: the client will halt while the lock is held.
 8) In client 2, insert some more rows.
 9) In client 2, issue: SELECT release_lock("backup_snapshot");
10) In client 1, observe that the backup now includes the extra rows
    (its size should be the same as in step 5).

Suggested fix:
Research how the consistent read is issued for InnoDB tables. The cause is most likely a parameter or condition that the consistent snapshot driver does not meet when it initiates the consistent read for InnoDB.

Cause of problem discovered:

The call to open_and_lock_tables() must come after the call that begins the consistent snapshot.

Solution pending...

Patch ready for review. See: http://lists.mysql.com/commits/34982

A second look at this problem has led to the decision to implement a threading mechanism similar to Guilhem's MyISAM driver. There are about 20 hours of work remaining before it is ready for review.

The solution to this bug is not trivial. The patch must satisfy these constraints:

* The default driver must open and lock its tables to create its validity point.
* The snapshot driver must start a transaction using a consistent read to start its validity point.
* The tables for each driver cannot be opened at the same time.
* One can only call open_and_lock_tables() once per thread (execution thread and THD thread).

http://lists.mysql.com/commits/35838

New patch ready. See http://lists.mysql.com/commits/35955

This patch also improves the code that fills in the state column of the process list. If the code is blocked on a user lock, the state shows the string "debug_sync_point: XXXX", where XXXX is the name of the user lock. So if the backup is blocked on backup_snapshot, the process list looks like this:

mysql> select * from information_schema.processlist\G
*************************** 1. row ***************************
     ID: 3
   USER: system user
   HOST:
     DB: NULL
COMMAND: Daemon
   TIME: 6
  STATE: NULL
   INFO: NULL
*************************** 2. row ***************************
     ID: 2
   USER: root
   HOST: localhost:2734
     DB: NULL
COMMAND: Query
   TIME: 6
  STATE: debug_sync_point: backup_snapshot
   INFO: backup database test to 'test.bak'
*************************** 3. row ***************************
     ID: 1
   USER: root
   HOST: localhost:2733
     DB: NULL
COMMAND: Query
   TIME: 0
  STATE: preparing
   INFO: select * from information_schema.processlist
3 rows in set (0.02 sec)

This gives the test a deterministic step that reliably tells when the backup (running in another client) has reached a given breakpoint. For example, to make the test wait until the backup_snapshot breakpoint has been reached, I can use this construct:

# Must wait to know when backup has entered lock.
let $wait_condition = SELECT state = "debug_sync_point: backup_snapshot"
                      FROM INFORMATION_SCHEMA.PROCESSLIST
                      WHERE info LIKE "backup database %";
--source include/wait_condition.inc

Cool.

Submitted a second patch for review. See http://lists.mysql.com/commits/36307

Submitted another patch. It has the following improvements:

* better encapsulation of thread code
* state used instead of booleans
* simplified thread code
* split thread handling to make prelock() fast
* sub classing of Backup_driver class to add variables
* ...and much more! :)

http://lists.mysql.com/commits/36809

Good to push.

Pushed into 6.0.5-alpha

Noted in 6.0.5 changelog.

The consistent snapshot driver for Online Backup failed to initiate a
consistent read, so that changes to an InnoDB table during the backup
were included in the backup data.

Correction: No changelog entry needed; this bug did not appear in any released version.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

http://lists.mysql.com/commits/95009

3015 Chuck Bell	2009-12-18
      BUG#31383 : Consistent Snapshot driver not consistent

      This patch changes the default driver to use a separate thread to open and lock tables.
      The default driver opens tables on the prelock() call from the kernel. The snapshot
      driver initiates the CS read on the lock() call from the kernel and opens tables in the first
      call to get_data() after the lock is taken. This is due to the fact that the default driver's
      validity point is at open_and_lock_tables() while the snapshot driver's validity point
      is at the start of the transaction.

      original changeset: 2476.1260.1
     @ sql/backup/CMakeLists.txt
        Added the new be_thread source file and dependency for backup.
     @ sql/backup/Makefile.am
        Added the new be_thread source file and dependency for backup.
     @ sql/backup/be_default.cc
        Added new methods to support using a separate thread to
        open and lock tables for backup.
     @ sql/backup/be_default.h
        Added new methods to support using a separate thread to
        open and lock tables for backup.
     @ sql/backup/be_snapshot.cc
        Modifies the CS driver to open and close its own tables
        while executing in the kernel's thread.
     @ sql/backup/be_snapshot.h
        Modifies the CS driver to open and close its own tables
        while executing in the kernel's thread.
     @ sql/backup/be_thread.cc
        New source file for mutex initialization and helper
        methods for using a thread to open and lock tables
        in default and snapshot drivers.
     @ sql/backup/be_thread.h
        New source file for mutex initialization and helper
        methods for using a thread to open and lock tables
        in default and snapshot drivers.
     @ sql/backup/data_backup.cc
        Removed code to call open and lock tables from kernel.

A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/95019

3015 Chuck Bell	2009-12-18
      BUG#31383 : Consistent Snapshot driver not consistent
        
      This patch changes the default driver to use a separate thread to open and lock tables.
      The default driver opens tables on the prelock() call from the kernel. The snapshot
      driver initiates the CS read on the lock() call from the kernel and opens tables in the first
      call to get_data() after the lock is taken. This is due to the fact that the default driver's
      validity point is at open_and_lock_tables() while the snapshot driver's validity point
      is at the start of the transaction.
      
      original changeset: 2476.1260.1
     @ sql/backup/be_thread.cc
        Added new error message for error handling in threads in default and snapshot drivers.
     @ sql/backup/be_thread.h
        New source file for mutex initialization and helper methods for using a thread to open
        and lock tables in default and snapshot drivers.