Bug #44569 Online backup crashes the server when run on large tables under load
Submitted: 30 Apr 2009 7:29 Modified: 6 Aug 2009 19:47
Reporter: Omer Barnir (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Backup Severity:S3 (Non-critical)
Version:6.0.11 OS:Any
Assigned to: Rafal Somla CPU Architecture:Any

[30 Apr 2009 7:29] Omer Barnir
Description:
This bug is a 'private' case of bug#44568. 

When the scenario described in the above bug (running online backup against a database that has load run against it) includes a larger table (150K rows) the backup crashes the server with the following message:

090430  9:09:58 [Note] Backup: Starting backup process
090430  9:09:58 [Note] Backup: Backing up 1 database(s) `FlightStats_test`
mysqld: stream_v1_transport.c:1273: bstream_end_chunk: Assertion `s->buf.pos < s->buf.end' failed.
090430  9:11:46 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=21
max_threads=151
thread_count=1
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 60634 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x2627fa0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x4b56a180 thread_stack 0x40000
/data0/wl4044/bin_6011/bin/mysqld(my_print_stacktrace+0x29) [0xacca99]
/data0/wl4044/bin_6011/bin/mysqld(handle_segfault+0x33b) [0x654d0b]
/lib64/libpthread.so.0 [0x2b2b1cecdc10]
/lib64/libc.so.6(gsignal+0x35) [0x2b2b1d6acb95]
/lib64/libc.so.6(abort+0x110) [0x2b2b1d6adf90]
/lib64/libc.so.6(__assert_fail+0xf6) [0x2b2b1d6a6256]
/data0/wl4044/bin_6011/bin/mysqld [0xb5f0af]
/data0/wl4044/bin_6011/bin/mysqld(backup::Block_writer::write_buf(backup::Buffer const&)+0x41) [0xb51e71]
/data0/wl4044/bin_6011/bin/mysqld(backup::Backup_pump::pump(unsigned long*)+0x162) [0xb52962]
/data0/wl4044/bin_6011/bin/mysqld(backup::Scheduler::step()+0x82) [0xb53452]
/data0/wl4044/bin_6011/bin/mysqld(backup::write_table_data(THD*, Backup_info&, backup::Output_stream&)+0x528) [0xb53af8]
/data0/wl4044/bin_6011/bin/mysqld(Backup_restore_ctx::do_backup()+0x113) [0xb4f953]
/data0/wl4044/bin_6011/bin/mysqld(execute_backup_command(THD*, LEX*, String*, bool, bool)+0x4d0) [0xb50650]
/data0/wl4044/bin_6011/bin/mysqld(mysql_execute_command(THD*)+0x1b87) [0x667027]
/data0/wl4044/bin_6011/bin/mysqld(mysql_parse(THD*, char const*, unsigned int, char const**)+0x37f) [0x66aa2f]
/data0/wl4044/bin_6011/bin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0xa89) [0x66b909]
/data0/wl4044/bin_6011/bin/mysqld(handle_one_connection+0x1a9) [0x65e4c9]
/lib64/libpthread.so.0 [0x2b2b1cec6143]
/lib64/libc.so.6(__clone+0x6d) [0x2b2b1d73c74d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x21b6ab8 = backup database FlightStats_test to '/data0/wl4044/bin_6011/mysql-test/fl_bk.bck'
thd->thread_id=207
thd->killed=NOT_KILLED

This is not observed with a low number of client threads and or small tables.

How to repeat:
1) Download the 'bug_test.tar.gz' file attached to bug#44568 and untar it in the
   mysql-test directory.
2) Download the 'ontime_1week.tar.gz' file located in 
   ml370-b.mysql.com:/data0/wl4044 to the mysql-test/bug_files directory and
   untar it
3) If running in a source tree, edit the run_crash script and change the value
   of  CLIENT_PATH=$CUR_PATH/bin to CLIENT_PATH=$CUR_PATH/client
4) Run the ./run_wait script (it will take about 5 minutes to complete)
5) You will observe output similar to the following:
   Thu Apr 30 08:08:00 CEST 2009 - Creating the Flight_Stat_test database and schema
   Thu Apr 30 08:08:00 CEST 2009 - Altering the tables
   Thu Apr 30 08:08:00 CEST 2009 - Loading the data
   Thu Apr 30 08:09:18 CEST 2009 - Adding procedures triggers  and IUDS tables
   Thu Apr 30 08:09:28 CEST 2009 - Started IUDS clients (in the background)
   Thu Apr 30 08:09:58 CEST 2009 - Running a backup
   Thu Apr 30 08:11:45 CEST 2009 - IUDS client run completed
   ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query
worker[1] Server [mysqld.1 - pid: 7198, winpid: 7198, exit: 256] died
   Thu Apr 30 08:11:46 CEST 2009 - Backup done
   qauser@ml370-b:/data0/wl4044/bin_6011/mysql-test> mysql-test-run: *** ERROR: Test suite aborted

5) Similar to bug#44568 show processlist will show the backup process 'waiting
   for a read lock'. Once the client lad stops the state of the backup thread 
   will change to 'NULL' and soon after the server will crash.

Suggested fix:
Server should not crash and backup should e performed under load.
[30 Apr 2009 7:32] Omer Barnir
Observed with 6.0.11 but likely to have been there before
[30 Apr 2009 10:41] Rafal Somla
I'd like to look at this if no one objects.
[15 Jul 2009 11:59] Rafal Somla
Trying to repeat the failure with Omer's help.
[15 Jul 2009 23:44] Omer Barnir
Step #4 above should be 'run_crash' and not 'run_wait' - cut and paste error form bug#44568 - :-(
[27 Jul 2009 14:31] Rafal Somla
REFINED PROBLEM DESCRIPTION
===========================
The problem lies in this fragment of bstream_end_chunk() function 
(stream_v1_transport.c:1279):

  if (s->buf.pos == s->buf.header)
  {
    ASSERT(s->buf.pos < s->buf.end);
    *(s->buf.header++)= FR_EOC;
    s->buf.pos= s->buf.header;
  }

The code assumes that in the case of empty fragment (buf.pos == buf.header) we 
always have at least one byte left to the end of the output block. But this 
assumption is not true. As illustrated in BUG#45737 it is possible to arrive at 
a situation where buf.pos == buf.header == buf.end in which case the above 
assertion fails.
[28 Jul 2009 9:24] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/79388

2851 Rafal Somla	2009-07-28
      BUG#44569 - Online backup crashes the server when run on large tables under load
      
      The logic of bstream_end_chunk() was using wrong assumptions. This patch 
      refactors this logic adding comments explaining how it works.
      
      Patch also adds a unit test for testing the logic of bstream_end_chunk().
     @ sql/backup/stream_v1.h
        - In st_backup_stream, replace FIRST_BLOCK state with an independent first_block 
          flag.
        - Add NEW_CHUNK state for handling empty chunks.
     @ sql/backup/stream_v1_transport.c
        - Replaced FIRST_BLOCK state with first_block flag, to make the two independent.
        - In bstream_write_part() added code for handling the case where there is only one
          byte left in the current output block.
        - fixed logic of bstream_end_chunk().
        - In bstream_read_part() added check for the situation where the next fragemnt 
          in the stream contains no data but just an EOC marker.
     @ unittest/backup/Makefile.am
        Added new unit test bstr_eoc-t.
     @ unittest/backup/bstr_eoc-t.c
        A unit test which checks correct behaviour of bstream_end_chunk() function.
[28 Jul 2009 13:34] Chuck Bell
Approved with changes.
[3 Aug 2009 8:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/79858

2852 Rafal Somla	2009-08-03
      BUG#44569 - Online backup crashes the server when run on large tables under load
      
      The logic of bstream_end_chunk() was using wrong assumptions. This patch 
      refactors this logic adding comments explaining how it works.
      
      Patch also adds a unit test for testing the logic of bstream_end_chunk().
     @ sql/backup/stream_v1.h
        - In st_backup_stream, replace FIRST_BLOCK state with an independent first_block 
          flag.
        - Add NEW_CHUNK state for handling empty chunks.
     @ sql/backup/stream_v1_transport.c
        - Replaced FIRST_BLOCK state with first_block flag, to make the two independent.
        - In bstream_write_part() added code for handling the case where there is only one
          byte left in the current output block.
        - fixed logic of bstream_end_chunk().
        - In bstream_read_part() added check for the situation where the next fragment 
          in the stream contains no data but just an EOC marker.
     @ unittest/backup/Makefile.am
        Added new unit test bstr_eoc-t.
     @ unittest/backup/bstr_eoc-t.c
        A unit test which checks correct behaviour of bstream_end_chunk() function.
[4 Aug 2009 6:42] Rafal Somla
Pushed to mysql-6.0-backup tree on 2009-08-03.
[6 Aug 2009 8:26] Bugs System
Pushed into 5.4.4-alpha (revid:alik@sun.com-20090806082225-qssc912qdv1mm6xv) (version source revid:rafal.somla@sun.com-20090803075951-s5vvarlxj53muy8l) (merge vers: 5.4.4-alpha) (pib:11)
[6 Aug 2009 19:47] Paul DuBois
Not in any released version. No changelog entry needed.
[14 Aug 2009 2:20] Omer Barnir
Verified original test scenario not crashing when tested on the backup tree (rev 2857)
[23 Dec 2009 15:46] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/95559

3011 Chuck Bell	2009-12-23
      BUG#44569 - Online backup crashes the server when run on large tables under load
      
      The logic of bstream_end_chunk() was using wrong assumptions. This patch 
      refactors this logic adding comments explaining how it works.
      
      Patch also adds a unit test for testing the logic of bstream_end_chunk().
      
      
      original changeset: 2599.157.12
     @ unittest/backup/bstr_eoc-t.c
        A unit test which checks correct behaviour of bstream_end_chunk() function.