Bug #31383 Consistent Snapshot fails to initiate consistent read
Submitted: 3 Oct 2007 19:21 Modified: 26 Feb 2008 0:39
Reporter: Chuck Bell
Status: Closed
Category: MySQL Server: Backup Severity: S1 (Critical)
Version: 6.0 OS: Any
Assigned to: Chuck Bell
Triage: D2 (Serious)

[3 Oct 2007 19:21] Chuck Bell
Description:
The consistent snapshot driver is no longer working as it should. Once the backup starts on an InnoDB table, any changes made to the table while the backup is reading it are included in the backup archive.

How to repeat:
 1) Insert the backup_sync call in the be_default.cc file as shown:

===== be_default.cc 1.8 vs edited =====
--- 1.8/sql/backup/be_default.cc        2007-10-03 15:17:03 -04:00
+++ edited/be_default.cc        2007-10-03 14:02:57 -04:00
@@ -293,6 +293,7 @@
     cur_blob= 0;
     cur_table->use_all_columns();
     last_read_res = hdl->rnd_next(cur_table->record[0]);
+    BACKUP_SYNC("backup_snapshot");
     DBUG_EXECUTE_IF("SLEEP_DRIVER", sleep(4););
     /*
       If we are end of file, stop the read process and signal the

 2) Compile the server with "EXTRA_DEBUG" defined.
 3) Create an InnoDB table and insert some data.
 4) Connect with 2 clients.
 5) In client 1, issue the backup command to back up the database with 
    the table. Note the size of the backup file.
 6) In client 2, issue SELECT get_lock("backup_snapshot", 100);
 7) In client 1, issue the backup command to back up the database
    with the table. Note: the client will halt while the lock is held.
 8) In client 2, insert some more rows.
 9) In client 2, issue SELECT release_lock("backup_snapshot");
10) In client 1, observe that the backup now includes more data (it should
    be the same size as in step 5).

Suggested fix:
Research how the consistent read is issued for InnoDB tables. The cause is most likely a parameter or condition that is not being met by the consistent snapshot driver to properly initiate the consistent read for InnoDB.
[3 Oct 2007 20:50] Chuck Bell
Cause of problem discovered:

The call to open_and_lock_tables() must come after the call that begins the consistent snapshot.

Solution pending...
[5 Oct 2007 14:56] Chuck Bell
Patch ready for review. See: http://lists.mysql.com/commits/34982
[15 Oct 2007 12:40] Chuck Bell
A second look at this problem has resulted in the decision to implement a threading mechanism similar to Guilhem's MyISAM driver. About 20 hours of work remain before the patch is ready for review.
[15 Oct 2007 22:05] Chuck Bell
The solution to this bug is not trivial. The patch must satisfy these constraints:

* The default driver must open and lock its tables to create its validity point.
* The snapshot driver must start a transaction using a consistent read to start its validity point.
* The tables for each driver cannot be opened at the same time.
* One can only call open_and_lock_tables() once per thread (execution thread and THD thread).
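
To make the validity-point constraint concrete, here is a minimal sketch of why the consistent read has to begin before any concurrent change can slip in. It is a toy model only, not server code: ToyTable, Row, begin_snapshot(), count_as_of() and count_current() are hypothetical names invented for this illustration, and a simple commit counter stands in for InnoDB's real transaction machinery.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy row: remembers the commit number of the transaction that created it.
// (Hypothetical model, not an InnoDB structure.)
struct Row {
    int value;
    std::uint64_t created_at;
};

class ToyTable {
public:
    // Insert a row as its own committed transaction.
    void insert(int value) { rows_.push_back({value, ++last_commit_}); }

    // Validity point of a consistent read: the newest commit visible now.
    std::uint64_t begin_snapshot() const { return last_commit_; }

    // Consistent read: count only rows committed at or before the snapshot.
    std::size_t count_as_of(std::uint64_t snapshot) const {
        std::size_t n = 0;
        for (const Row& r : rows_) {
            if (r.created_at <= snapshot) ++n;
        }
        return n;
    }

    // Read without a snapshot: sees every row committed so far.
    std::size_t count_current() const { return rows_.size(); }

private:
    std::vector<Row> rows_;
    std::uint64_t last_commit_ = 0;
};
```

In this model, inserting two rows, calling begin_snapshot(), and then inserting a third row (the equivalent of step 8 in the repeat instructions) leaves count_as_of(snapshot) at 2 while count_current() reports 3. The bug corresponds to the backup effectively reading with count_current().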
[18 Oct 2007 13:17] Chuck Bell
http://lists.mysql.com/commits/35838
[19 Oct 2007 22:09] Chuck Bell
New patch ready. See http://lists.mysql.com/commits/35955

This patch includes enhancements to the code that fills in the state for the process list. If the code is blocked on a user lock, the state shows the string "debug_sync_point: XXXX", where XXXX is the name of the user lock. So if the backup is blocked on backup_snapshot, the process list will look like this:

mysql> select * from information_schema.processlist\G
*************************** 1. row ***************************
     ID: 3
   USER: system user
   HOST:
     DB: NULL
COMMAND: Daemon
   TIME: 6
  STATE: NULL
   INFO: NULL
*************************** 2. row ***************************
     ID: 2
   USER: root
   HOST: localhost:2734
     DB: NULL
COMMAND: Query
   TIME: 6
  STATE: debug_sync_point: backup_snapshot
   INFO: backup database test to 'test.bak'
*************************** 3. row ***************************
     ID: 1
   USER: root
   HOST: localhost:2733
     DB: NULL
COMMAND: Query
   TIME: 0
  STATE: preparing
   INFO: select * from information_schema.processlist
3 rows in set (0.02 sec)

This allows one to use a deterministic step in the test to reliably tell when the backup (running in another client) has reached a given breakpoint. For example, to have the test wait until the backup_snapshot breakpoint has been reached, I can use this construct:

# Must wait to know when backup has entered lock.
let $wait_condition = SELECT state = "debug_sync_point: backup_snapshot"
                      FROM INFORMATION_SCHEMA.PROCESSLIST
                      WHERE info LIKE "backup database %";
--source include/wait_condition.inc

Cool.
[25 Oct 2007 13:49] Chuck Bell
Submitted a second patch for review. See http://lists.mysql.com/commits/36307
[1 Nov 2007 13:15] Chuck Bell
Submitted another patch. It has the following improvements:

* better encapsulation of thread code
* state used instead of booleans
* simplified thread code
* split thread handling to make prelock() fast
* sub classing of Backup_driver class to add variables
* ...and much more! :)

http://lists.mysql.com/commits/36809
[14 Nov 2007 19:47] Rafal Somla
Good to push.
[25 Feb 2008 20:19] Bugs System
Pushed into 6.0.5-alpha
[26 Feb 2008 0:39] Paul Dubois
Noted in 6.0.5 changelog.

The consistent snapshot driver for Online Backup failed to initiate a
consistent read, so that changes to an InnoDB table during the backup
were included in the backup data.
[14 Mar 2008 1:29] Paul Dubois
Correction: No changelog entry needed; this bug did not appear in any released version.
[18 Dec 2009 20:05] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/95009

3015 Chuck Bell	2009-12-18
      BUG#31383 : Consistent Snapshot driver not consistent
        
      This patch changes the default driver to use a separate thread to open and lock tables.
      The default driver opens tables on the prelock() call from the kernel. The snapshot
      driver initiates the CS read on the lock() call from the kernel and opens tables in the first
      call to get_data() after the lock is taken. This is due to the fact that the default driver's
      validity point is at open_and_lock_tables() while the snapshot driver's validity point
      is at the start of the transaction.
      
      original changeset: 2476.1260.1
     @ sql/backup/CMakeLists.txt
        Added the new be_thread source file and dependency for backup.
     @ sql/backup/Makefile.am
        Added the new be_thread source file and dependency for backup.
     @ sql/backup/be_default.cc
        Added new methods to support using a separate thread to 
        open and lock tables for backup.
     @ sql/backup/be_default.h
        Added new methods to support using a separate thread to 
        open and lock tables for backup.
     @ sql/backup/be_snapshot.cc
        Modifies the CS driver to open and close its own tables
        while executing in the kernel's thread.
     @ sql/backup/be_snapshot.h
        Modifies the CS driver to open and close its own tables
        while executing in the kernel's thread.
     @ sql/backup/be_thread.cc
        New source file for mutex initialization and helper 
        methods for using a thread to open and lock tables 
        in default and snapshot drivers.
     @ sql/backup/be_thread.h
        New source file for mutex initialization and helper 
        methods for using a thread to open and lock tables 
        in default and snapshot drivers.
     @ sql/backup/data_backup.cc
        Removed code to call open and lock tables from kernel.
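
The default-driver half of the commit message above (a helper thread opens and locks tables, started from prelock() so that call stays fast) can be sketched roughly as follows. This is an illustration using standard C++ threads, not the code from be_thread.cc: TableLockThread, LockState, opened_count() and the "empty name means failure" rule are all assumptions made up for the example, and the real patch works with the server's own thread and THD facilities.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical states for the helper thread (a state value, not booleans).
enum class LockState { Idle, Opening, Ready, Error };

class TableLockThread {
public:
    explicit TableLockThread(std::vector<std::string> tables)
        : tables_(std::move(tables)) {}

    ~TableLockThread() {
        if (worker_.joinable()) worker_.join();
    }

    // prelock(): start the helper thread and return without waiting.
    void prelock() {
        {
            std::lock_guard<std::mutex> guard(mu_);
            if (state_ != LockState::Idle) return;  // already started
            state_ = LockState::Opening;
        }
        worker_ = std::thread([this] { open_and_lock(); });
    }

    // lock(): wait until the helper thread has finished opening tables.
    // Returns false if prelock() was never called or opening failed.
    bool lock() {
        std::unique_lock<std::mutex> lk(mu_);
        if (state_ == LockState::Idle) return false;
        cv_.wait(lk, [this] { return state_ != LockState::Opening; });
        return state_ == LockState::Ready;
    }

    LockState state() const {
        std::lock_guard<std::mutex> guard(mu_);
        return state_;
    }

    std::size_t opened_count() const {
        std::lock_guard<std::mutex> guard(mu_);
        return opened_;
    }

private:
    // Runs in the helper thread. Stands in for open_and_lock_tables();
    // an empty table name simulates a failure to open.
    void open_and_lock() {
        std::size_t opened = 0;
        bool ok = true;
        for (const std::string& name : tables_) {
            if (name.empty()) { ok = false; break; }
            ++opened;
        }
        {
            std::lock_guard<std::mutex> guard(mu_);
            opened_ = opened;
            state_ = ok ? LockState::Ready : LockState::Error;
        }
        cv_.notify_all();
    }

    std::vector<std::string> tables_;
    mutable std::mutex mu_;
    std::condition_variable cv_;
    LockState state_ = LockState::Idle;
    std::size_t opened_ = 0;
    std::thread worker_;
};
```

A caller would construct the helper with its table list, call prelock() early, and call lock() only when it actually needs the tables. Keeping a single state value makes the "not started", "in progress", "done" and "failed" cases mutually exclusive, which is harder to guarantee with separate boolean flags.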
[18 Dec 2009 20:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/95019

3015 Chuck Bell	2009-12-18
      BUG#31383 : Consistent Snapshot driver not consistent
        
      This patch changes the default driver to use a separate thread to open and lock tables.
      The default driver opens tables on the prelock() call from the kernel. The snapshot
      driver initiates the CS read on the lock() call from the kernel and opens tables in the first
      call to get_data() after the lock is taken. This is due to the fact that the default driver's
      validity point is at open_and_lock_tables() while the snapshot driver's validity point
      is at the start of the transaction.
      
      original changeset: 2476.1260.1
     @ sql/backup/be_thread.cc
        Added new error message for error handling in threads in default and snapshot drivers.
     @ sql/backup/be_thread.h
        New source file for mutex initialization and helper methods for using a thread to open
        and lock tables in default and snapshot drivers.