Bug #36795 Backup crash in backup::Mem_allocator::free at kernel.cc:911
Submitted: 19 May 2008 8:44 Modified: 26 Aug 2008 20:04
Reporter: Philip Stoev
Status: Closed
Category:MySQL Server: Backup Severity:S1 (Critical)
Version:6.0-backup OS:Any
Assigned to: Øystein Grøvlen CPU Architecture:Any

[19 May 2008 8:44] Philip Stoev
Description:
When running a concurrent DML workload that includes BACKUP, the server crashed as follows:

#0  0x00110402 in __kernel_vsyscall ()
#1  0x00bdc617 in pthread_kill () from /lib/libpthread.so.0
#2  0x0844f763 in write_core (sig=11) at stacktrace.c:305
#3  0x082a5d0e in handle_segfault (sig=11) at mysqld.cc:2640
#4  <signal handler called>
#5  0x087a2764 in backup::Mem_allocator::free (this=0x9768c38, ptr=0x9962690) at kernel.cc:911
#6  0x087a27df in bstream_free (ptr=0x9962690 "ю") at kernel.cc:953
#7  0x087b81d2 in bstream_close (s=0x970435c) at stream_v1_transport.c:928
#8  0x087b9956 in backup::Output_stream::close (this=0x9704358) at stream.cc:237
#9  0x087a474d in Backup_restore_ctx::close (this=0xaefc2434) at kernel.cc:664
#10 0x087a593d in execute_backup_command (thd=0x98b06b0, lex=0x98b1754) at kernel.cc:210
#11 0x082b6e18 in mysql_execute_command (thd=0x98b06b0) at sql_parse.cc:2153
#12 0x082bf7f6 in mysql_parse (thd=0x98b06b0, inBuf=0x99486d0 "BACKUP DATABASE test TO \"/build/mysql-6.0-backup/mysql-test/var/backup38\"", length=73,
    found_semicolon=0xaefc3270) at sql_parse.cc:5747
#13 0x082c0243 in dispatch_command (command=COM_QUERY, thd=0x98b06b0,
    packet=0x97f7eb9 "BACKUP DATABASE test TO \"/build/mysql-6.0-backup/mysql-test/var/backup38\"", packet_length=73) at sql_parse.cc:1045
#14 0x082c14b6 in do_command (thd=0x98b06b0) at sql_parse.cc:722
#15 0x082aeb11 in handle_one_connection (arg=0x98b06b0) at sql_connect.cc:1134
#16 0x00bd750b in start_thread () from /lib/libpthread.so.0
#17 0x00b18b2e in clone () from /lib/libc.so.6

How to repeat:
A simplified test case will hopefully follow shortly.
[19 May 2008 18:34] Philip Stoev
To reproduce, please use the second test case from bug 34547, available at:

http://bugs.mysql.com/file.php?id=9350

Please place the .txt files in mysql-test and the .test files in mysql-test/t. Then run:

$ perl ./mysql-test-run.pl --stress --stress-init-file=bug34547_2_init.txt --stress-test-file=bug34547_2_run.txt --stress-test-duration=60 --stress-threads=5  --skip-ndb --mysqld=--skip-innodb

This test crashes in one of several ways, all related to memory management. Ideally, the test should run without problems until bug 34547 is observed at server shutdown.
[29 Jul 2008 11:19] Øystein Grøvlen
I have seen similar core dumps while experimenting with a way to reproduce Bug#36792.
[31 Jul 2008 7:33] Øystein Grøvlen
When running the supplied test case, I get either the segfault mentioned above or the following assertion failure:

mysqld: kernel.cc:1138: bstream_byte* bstream_alloc(long unsigned int): Assertion `Backup_restore_ctx::mem_alloc' failed.

#0  0x0000003f1100b122 in pthread_kill () from /lib64/libpthread.so.0
#1  0x0000000000a32320 in my_write_core (sig=6) at stacktrace.c:307
#2  0x000000000064ba79 in handle_segfault (sig=6) at mysqld.cc:2654
#3  <signal handler called>
#4  0x0000003f10430045 in raise () from /lib64/libc.so.6
#5  0x0000003f10431ae0 in abort () from /lib64/libc.so.6
#6  0x0000003f10429756 in __assert_fail () from /lib64/libc.so.6
#7  0x0000000000a9cd53 in bstream_alloc (size=<value optimized out>)
    at kernel.cc:1138
#8  0x0000000000ab34c1 in bstream_open_wr (s=0x837dc78, block_size=29137,
    offset=6) at stream_v1_transport.c:837
#9  0x0000000000aacbb2 in backup::Output_stream::init (this=0x837dc70)
    at stream.cc:289
#10 0x0000000000aad2b3 in backup::Output_stream::open (this=0x837dc70)
    at stream.cc:352
#11 0x0000000000a9eaed in Backup_restore_ctx::prepare_for_backup (
    this=0x47b75be0, location=
      {str = 0x8425550 "/home/og136792/mysql/shared/mysql-6.0-backup-clean/mysql-test/var/backup7", length = 73}, query=<value optimized out>,
    with_compression=false) at kernel.cc:503
#12 0x0000000000aa0400 in execute_backup_command (thd=0x841a638,
    lex=0x841c090) at kernel.cc:144
#13 0x00000000006575e4 in mysql_execute_command (thd=0x841a638)
    at sql_parse.cc:2172
#14 0x000000000065df74 in mysql_parse (thd=0x841a638,
    inBuf=0x8425150 "BACKUP DATABASE test TO \"/home/og136792/mysql/shared/mysql-6.0-backup-clean/mysql-test/var/backup7\"", length=99,
    found_semicolon=0x47b77028) at sql_parse.cc:5800
#15 0x000000000065ecac in dispatch_command (command=COM_QUERY, thd=0x841a638,
    packet=<value optimized out>, packet_length=99) at sql_parse.cc:1050
#16 0x000000000065fc07 in do_command (thd=0x841a638) at sql_parse.cc:723
#17 0x0000000000650b91 in handle_one_connection (arg=<value optimized out>)
    at sql_connect.cc:1153
#18 0x0000003f110062e7 in start_thread () from /lib64/libpthread.so.0
#19 0x0000003f104ce3bd in clone () from /lib64/libc.so.6
[31 Jul 2008 8:06] Øystein Grøvlen
The following test script reproduces the assert:

======
CREATE DATABASE backup_concurrent;
USE backup_concurrent;

CREATE TABLE t (
t1 INTEGER NOT NULL,
t2 CHAR(36),
PRIMARY KEY (t1)
);

connect (backup1,localhost,root,,);
USE backup_concurrent;
send BACKUP DATABASE backup_concurrent TO 'backup1';

# Second backup should fail because another backup is running
connection default;
--error ER_BACKUP_RUNNING
BACKUP DATABASE backup_concurrent TO 'backup2';

INSERT INTO t VALUES (1, 'test');
BACKUP DATABASE backup_concurrent TO 'backup3';
[31 Jul 2008 8:26] Øystein Grøvlen
The assertion fails because Backup_restore_ctx::mem_alloc is static. When the failing backup terminates, it sets mem_alloc to null, so the next time the still-running backup wants to allocate memory, the assertion fails.
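The mechanism can be illustrated with a minimal, single-threaded C++ sketch that replays the interleaving from the test script above. The classes are hypothetical simplifications, not the real backup kernel code: `prepare()`, `close()`, `owns_run_` and `replay_concurrent_backups()` are invented for illustration, and the sketch assumes that cleanup both frees and nulls the shared allocator. Where the real `bstream_alloc()` asserts, the sketch returns a null pointer so the outcome can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical, heavily simplified stand-in for backup::Mem_allocator.
struct Mem_allocator {
  void *alloc(std::size_t size) { return std::malloc(size); }
  void free(void *ptr) { std::free(ptr); }
};

class Backup_restore_ctx {
 public:
  static Mem_allocator *mem_alloc;  // shared by ALL contexts: the bug
  static bool is_running;

  // Returns false (like ER_BACKUP_RUNNING) if another operation is active.
  bool prepare() {
    if (is_running) return false;
    is_running = true;
    owns_run_ = true;
    mem_alloc = new Mem_allocator;
    return true;
  }

  // Cleanup runs for every context, including one whose prepare() failed.
  // Assumption: cleanup frees and nulls the (shared) allocator.
  void close() {
    if (owns_run_) is_running = false;
    delete mem_alloc;  // also destroys the allocator the other backup uses
    mem_alloc = nullptr;
  }

 private:
  bool owns_run_ = false;  // hypothetical: did this context start the run?
};

Mem_allocator *Backup_restore_ctx::mem_alloc = nullptr;
bool Backup_restore_ctx::is_running = false;

// C-style callback with no context argument, like bstream_alloc().
void *bstream_alloc(std::size_t size) {
  if (Backup_restore_ctx::mem_alloc == nullptr)
    return nullptr;  // the real code asserts here instead
  return Backup_restore_ctx::mem_alloc->alloc(size);
}

// Replays the test script; returns true if the first (still running)
// backup has lost its allocator.
bool replay_concurrent_backups() {
  Backup_restore_ctx backup1, backup2;
  bool started1 = backup1.prepare();  // BACKUP ... TO 'backup1' starts
  bool started2 = backup2.prepare();  // BACKUP ... TO 'backup2' is rejected
  backup2.close();                    // cleanup of the rejected backup
  void *p = bstream_alloc(64);        // backup1 wants more memory
  bool lost_allocator = started1 && !started2 && p == nullptr;
  std::free(p);                       // no-op when p is null
  backup1.close();
  return lost_allocator;
}
```

Calling `replay_concurrent_backups()` returns `true`: the cleanup of the rejected second backup leaves the first backup without an allocator.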

I suggest that we make mem_alloc non-static, so that concurrent backups will not interfere with each other. This requires that bstream_alloc can find the right Backup_restore_ctx to use. I suggest fixing that by changing the static is_running flag to a static pointer to the Backup_restore_ctx of the currently running backup. If the pointer is null, no backup is currently running.
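A minimal sketch of the suggested fix, under the same simplifying assumptions: the classes and helpers are hypothetical, and the `std::mutex` guarding the check-and-set is an assumption, not something stated in this report. Each context owns its allocator, and the C-style `bstream_alloc()`/`bstream_free()` callbacks, which take no context argument, reach it through a static `current_op` pointer that is null when no backup is running.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>

// Hypothetical, simplified allocator; the real backup::Mem_allocator differs.
struct Mem_allocator {
  void *alloc(std::size_t size) { return std::malloc(size); }
  void free(void *ptr) { std::free(ptr); }
};

class Backup_restore_ctx {
 public:
  // Replaces the static is_running flag: null means no backup is running.
  static Backup_restore_ctx *current_op;
  Mem_allocator *mem_alloc = nullptr;  // now per context, not static

  // Returns false (like ER_BACKUP_RUNNING) if another operation is active.
  bool prepare() {
    std::lock_guard<std::mutex> guard(lock_);  // assumption: serialized check-and-set
    if (current_op != nullptr) return false;
    current_op = this;
    mem_alloc = new Mem_allocator;
    return true;
  }

  // Cleanup touches only this context's own state.
  void close() {
    std::lock_guard<std::mutex> guard(lock_);
    if (current_op == this) current_op = nullptr;
    delete mem_alloc;
    mem_alloc = nullptr;
  }

 private:
  static std::mutex lock_;  // hypothetical
};

Backup_restore_ctx *Backup_restore_ctx::current_op = nullptr;
std::mutex Backup_restore_ctx::lock_;

// The callbacks find the running operation's allocator via current_op.
// They read current_op without the lock, assuming only the thread of the
// running operation calls them.
void *bstream_alloc(std::size_t size) {
  assert(Backup_restore_ctx::current_op != nullptr);
  assert(Backup_restore_ctx::current_op->mem_alloc != nullptr);
  return Backup_restore_ctx::current_op->mem_alloc->alloc(size);
}

void bstream_free(void *ptr) {
  assert(Backup_restore_ctx::current_op != nullptr);
  Backup_restore_ctx::current_op->mem_alloc->free(ptr);
}

// Same interleaving as the test script; returns true if the first backup
// can still allocate after the second one is rejected and cleaned up.
bool replay_concurrent_backups() {
  Backup_restore_ctx backup1, backup2;
  bool started1 = backup1.prepare();
  bool started2 = backup2.prepare();  // rejected: current_op is backup1
  backup2.close();                    // no longer affects backup1
  void *p = bstream_alloc(64);
  bool ok = started1 && !started2 && p != nullptr;
  bstream_free(p);
  backup1.close();
  return ok;
}
```

Here `replay_concurrent_backups()` returns `true`, and `current_op` is null again once the first backup has closed. The design still assumes that at most one backup or restore runs at a time, which is what `current_op` enforces.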
[1 Aug 2008 13:30] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50809

2674 Oystein Grovlen	2008-08-01
      Bug#36795 Concurrency issues when starting backups in parallel.
      
      Race condition on Backup_restore_ctx::mem_alloc since it is static.
      The failing backup will set mem_alloc to null when terminating.  The
      next time the running backup wants to allocate memory, an assert will
      fail.
      
      Makes mem_alloc non-static.  That way, concurrent backups will not
      interfere.  This requires that bstream_alloc is able to find
      the right Backup_restore_ctx to use.  Fixes that by changing the
      static is_running flag to a static pointer to the Backup_restore_ctx
      of the currently running backup.  If the pointer is null, it means
      that no backup is currently running.
[6 Aug 2008 11:28] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/50989

2676 Oystein Grovlen	2008-08-06
      Bug#36795 Concurrency issues when starting backups in parallel.
      
      Race condition on Backup_restore_ctx::mem_alloc since it is static.
      The failing backup will set mem_alloc to null when terminating.  The
      next time the running backup wants to allocate memory, an assert will
      fail.
      
      Makes mem_alloc non-static.  That way, concurrent backups will not
      interfere.  This requires that bstream_alloc is able to find
      the right Backup_restore_ctx to use.  Fixes that by changing the
      static is_running flag to a static pointer, current_op, to the Backup_restore_ctx
      of the currently running backup.  If the pointer is null, it means
      that no backup is currently running.
[7 Aug 2008 15:45] Chuck Bell
Patch approved on the following condition:

Please add:

SET DEBUG_SYNC= 'reset';

to the end of your backup_concurrent test. This command is required to put the debug sync facility in a stable state. Without it, any subsequent test that uses DEBUG_SYNC may be skipped (rather than fail); see backup_ddl_blocker:

main.backup_concurrent         [ pass ]            483
main.backup_ddl_blocker        [ skipped ]   Query 'SELECT ('$value' LIKE 'ON %') AS debug_sync' failed, required functionality not supported
[8 Aug 2008 9:39] Jørgen Løland
Good to push
[8 Aug 2008 11:18] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/51184

2678 Oystein Grovlen	2008-08-08
      Bug#36795 Concurrency issues when starting backups in parallel.
      
      Race condition on Backup_restore_ctx::mem_alloc since it is static.
      The failing backup will set mem_alloc to null when terminating.  The
      next time the running backup wants to allocate memory, an assert will
      fail.
      
      Makes mem_alloc non-static.  That way, concurrent backups will not
      interfere.  This requires that bstream_alloc is able to find
      the right Backup_restore_ctx to use.  Fixes that by changing the
      static is_running flag to a static pointer, current_op, to the Backup_restore_ctx
      of the currently running backup.  If the pointer is null, it means
      that no backup is currently running.
[8 Aug 2008 11:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/51185

2678 Oystein Grovlen	2008-08-08
      Bug#36795 Concurrency issues when starting backups in parallel.
      
      Race condition on Backup_restore_ctx::mem_alloc since it is static.
      The failing backup will set mem_alloc to null when terminating.  The
      next time the running backup wants to allocate memory, an assert will
      fail.
      
      Makes mem_alloc non-static.  That way, concurrent backups will not
      interfere.  This requires that bstream_alloc is able to find
      the right Backup_restore_ctx to use.  Fixes that by changing the
      static is_running flag to a static pointer, current_op, to the Backup_restore_ctx
      of the currently running backup.  If the pointer is null, it means
      that no backup is currently running.
[8 Aug 2008 11:35] Øystein Grøvlen
Pushed up to revision 2678.
[26 Aug 2008 12:33] Øystein Grøvlen
Pushed into main for 6.0.7.

Documentation input:
The server crashed when a BACKUP or RESTORE command was started while another BACKUP or RESTORE operation was in progress.
[26 Aug 2008 20:04] Paul DuBois
Noted in 6.0.7 changelog.
[13 Sep 2008 22:39] Bugs System
Pushed into 6.0.6-alpha  (revid:oystein.grovlen@sun.com-20080808111737-t4tpz8zwgnr3a79l) (version source revid:hakan@mysql.com-20080716105246-eg0utbybp122n2w9) (pib:3)