Bug #43511 The test backup_bml fails inconsistently on Windows and Linux
Submitted: 9 Mar 2009 20:52 Modified: 26 Mar 2009 23:42
Reporter: Chuck Bell Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Backup Severity:S3 (Non-critical)
Version:6.0.11 OS:Windows
Assigned to: Chuck Bell CPU Architecture:Any

[9 Mar 2009 20:52] Chuck Bell
Description:
The backup_bml fails sometimes on Linux with:

backup.backup_bml                        [ fail ]
        Test ended at 2009-03-09 16:41:18

CURRENT_TEST: backup.backup_bml
--- /home/cbell/source/bzr/mysql-6.0-test-backup/mysql-test/suite/backup/r/backu
p_bml.result    2009-03-09 22:11:14.000000000 +0300
+++ /home/cbell/source/bzr/mysql-6.0-test-backup/mysql-test/suite/backup/r/backu
p_bml.reject    2009-03-09 23:41:18.000000000 +0300
@@ -39,7 +39,6 @@
 (8, 'INDEX',     'i1 ON t2', '(b)');
 DROP PROCEDURE IF EXISTS test.check_results\\
 Warnings:
-Note   1305    PROCEDURE test.check_results does not exist
 CREATE PROCEDURE test.check_results()
 BEGIN
 # show databases

The same test fails on Windows with a more serious crash:

backup.backup_bml                        [ fail ]
        Test ended at 2009-03-09 16:48:02

CURRENT_TEST: backup.backup_bml
mysqltest: At line 361: query 'reap' failed: 2013: Lost connection to MySQL serv
er during query

The result from queries just before the failure was:
< snip >
v1      CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFIN
ER VIEW `bml_test`.`v1` AS select `bml_test`.`t2`.`b` AS `b` from `bml_test`.`t2
`       latin1  latin1_swedish_ci
Function        sql_mode        Create Function character_set_client    collatio
n_connection    Database Collation
f1              CREATE DEFINER=`root`@`localhost` FUNCTION `f1`() RETURNS int(11
)
RETURN 1        latin1  latin1_swedish_ci       latin1_swedish_ci
Procedure       sql_mode        Create Procedure        character_set_clientcoll
ation_connection        Database Collation
p1              CREATE DEFINER=`root`@`localhost` PROCEDURE `p1`()
    COMMENT 'testing alter'
SET @foo=1      latin1  latin1_swedish_ci       latin1_swedish_ci
event_name
e2
==================

# Resume BACKUP/RESTORE - this allows DDL3 and DDL4 to complete.
SET DEBUG_SYNC= 'now SIGNAL finish_bup';
# con5: Reaping BACKUP/RESTORE
backup_id
500
SET debug="-d";
# con3: Completing DDL3
# con4: Completing DDL4

Note: The mysqld.err file is attached to this bug report.

How to repeat:
1) build the mysql-6.0-backup tree on Windows and Linux.
2) run ./mysql-test-run.pl backup_bml --retry=0

Suggested fix:
There are two parts to this solution:

1) Research the cause of the missing warning and if safe to record the new result file.

2) Determine why the test fails on Windows.
[9 Mar 2009 20:53] Chuck Bell
Server error log.

Attachment: mysqld.err (application/octet-stream, text), 6.03 KiB.

[10 Mar 2009 0:04] Chuck Bell
I experimented with this (on Windows):

./mysql-test-run.pl backup_bml --record

Once I did this, the test fails with the crash described above randomly. The crash occurs in this segment from bml_test.inc:

@line 272
--echo # Resume BACKUP/RESTORE - this allows DDL3 and DDL4 to complete.
SET DEBUG_SYNC= 'now SIGNAL finish_bup';

         connection con5;
         --echo # con5: Reaping BACKUP/RESTORE
         reap;
         SET debug="-d";

     connection con3;
     --echo # con3: Completing DDL3
     reap;  <----- CRASHES HERE

       connection con4;
       --echo # con4: Completing DDL4
       reap;
[10 Mar 2009 10:38] Rafal Somla
A stack trace from the reported crash follows:

thd: 0x35419c8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
017B075C    mysqld.exe!strcmp()[strcmp.asm:139]
01555C72    mysqld.exe!InList()[dbug.c:1583]
01555D6B    mysqld.exe!_db_keyword_()[dbug.c:1830]
015550B7    mysqld.exe!_db_doprnt_()[dbug.c:1342]
014E9F63    mysqld.exe!alloc_root()[my_alloc.c:223]
013E088B    mysqld.exe!open_table_from_share()[table.cc:2009]
0129E9D8    mysqld.exe!open_temporary_table()[sql_base.cc:4695]
013B5E8C    mysqld.exe!create_altered_table()[sql_table.cc:5931]
013B8911    mysqld.exe!mysql_alter_table()[sql_table.cc:7102]
013041EC    mysqld.exe!mysql_execute_command()[sql_parse.cc:2888]
0130BC3D    mysqld.exe!mysql_parse()[sql_parse.cc:5823]
01300F58    mysqld.exe!dispatch_command()[sql_parse.cc:1057]
013007D9    mysqld.exe!do_command()[sql_parse.cc:739]
01421579    mysqld.exe!handle_one_connection()[sql_connect.cc:1146]
0150FC18    mysqld.exe!pthread_start()[my_winthread.c:61]
017C5A63    mysqld.exe!_callthreadstartex()[threadex.c:348]
017C5A04    mysqld.exe!_threadstartex()[threadex.c:331]
75C54911    kernel32.dll!BaseThreadInitThunk()
773CE4B6    ntdll.dll!RtlInitializeExceptionChain()
773CE489    ntdll.dll!RtlInitializeExceptionChain()
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0354BF10=ALTER TABLE t1 ADD INDEX `i` (a)
thd->thread_id=6
thd->killed=NOT_KILLED

From this one can see that the crash happened when executing ALTER TABLE statement, during open_temporary_table() operation. The crash point is here in alloc_root():

  ...
  point= (uchar*) ((char*) next+ (next->size-next->left));
  /*TODO: next part may be unneded due to mem_root->first_block_usage counter*/
  if ((next->left-= length) < mem_root->min_malloc)
  {						/* Full block */
    *prev= next->next;				/* Remove block from list */
    next->next= mem_root->used;
    mem_root->used= next;
    mem_root->first_block_usage= 0;
  }
  DBUG_PRINT("exit",("ptr: %p", point));  <----------- HERE
  DBUG_RETURN((void*) point);
  ...

Probably points at invalid memory location. But hard to tell why it is so...
[10 Mar 2009 14:14] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/68775

2791 Chuck Bell	2009-03-10
      BUG#43511 : The test backup_bml fails inconsistently on Windows and Linux
      
      This patch solves the problem with the missing warning messages
      found on Linux. It does not solve the Windows issue.
      
      This is part 1 of 2 patches. Patch 2 forthcoming.
      modified:
        sql/sql_class.h
[10 Mar 2009 14:18] Rafal Somla
First patch good to push.
[10 Mar 2009 14:46] Jørgen Løland
First patch good to push.
[10 Mar 2009 15:19] Chuck Bell
The origin of the crash is in the InList() method in dbug.c. The linkp->next_link is pointing to an unused memory location.

This appears random as the crash has occurred where DBUG_PRINT() was called from various points in the code. Here is one example:

mysqld.exe!strcmp(unsigned char * str1=0x011f2e48, unsigned char * str2=0x00010001)  Line 139	Asm
mysqld.exe!InList(link * linkp=0xdddddddd, const char * cp=0x011f2e48)  Line 1583 + 0x10 bytes	C
mysqld.exe!_db_keyword_(_db_code_state_ * cs=0x035989d8, const char * keyword=0x011f2e48, int strict=65538)  Line 1830 + 0x34 bytes	C
mysqld.exe!_db_doprnt_(const char * format=0x011f2e50, ...)  Line 1342 + 0x12 bytes	C
mysqld.exe!tablename_to_filename(const char * from=0x011fa63e, char * to=0x0910b048, unsigned int to_length=512)  Line 160 + 0x20 bytes	C++
mysqld.exe!build_table_filename(char * buff=0x0910b924, unsigned int bufflen=512, const char * db=0x0352e350, const char * table_name=0x011fa63e, const char * ext=0x011fa5ef, unsigned int flags=0)  Line 222 + 0x15 bytes	C++
mysqld.exe!mysql_register_view(THD * thd=0x0347c460, TABLE_LIST * view=0x0352e150, enum_view_create_mode mode=VIEW_ALTER)  Line 878 + 0x24 bytes	C++
mysqld.exe!mysql_create_view(THD * thd=0x0347c460, TABLE_LIST * views=0x0352e150, enum_view_create_mode mode=VIEW_ALTER)  Line 621 + 0x11 bytes	C++
mysqld.exe!mysql_execute_command(THD * thd=0x0347c460)  Line 4573 + 0x1a bytes	C++
mysqld.exe!mysql_parse(THD * thd=0x0347c460, const char * inBuf=0x0352de78, unsigned int length=25, const char * * found_semicolon=0x0910f678)  Line 5823 + 0x9 bytes	C++
mysqld.exe!dispatch_command(enum_server_command command=COM_QUERY, THD * thd=0x0347c460, char * packet=0x0346ffb1, unsigned int packet_length=25)  Line 1057 + 0x1e bytes	C++
mysqld.exe!do_command(THD * thd=0x0347c460)  Line 739 + 0x1b bytes	C++
 	mysqld.exe!handle_one_connection(void * arg=0x0347c460)  Line 1146 + 0x9 bytes	C++

I have tried to trace down all assignments to next_link but so far nothing seems out of place.
[10 Mar 2009 16:08] Chuck Bell
Eureeka! I see the problem. If you have a set of debug keywords inserted with SET SESSION debug="+d... and later reset them using SET debug="-d", the keywords are removed. However, if you have multiple connections open, the code that frees the keyword list is not protected from concurrency and thus it is possible for a portion of the code to reference a partially deleted keyword list.

The solution is to not attempt the SET debug="-d" until after all threads (connections in a test) and complete and/or no longer reference the debug keyword list.

The patch for part b) of this bug report corrects the backup_bml test in this manner.
[10 Mar 2009 16:19] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/68798

2794 Chuck Bell	2009-03-10
      BUG#43511 : The test backup_bml fails inconsistently on Windows and Linux
      
      This patch solves the problem with the backup_bml test
      crashing on Windows.
            
      This is part 2 of 2 patches
      modified:
        mysql-test/suite/backup/include/bml_test.inc
        mysql-test/suite/backup/r/backup_bml.result
[10 Mar 2009 16:25] Chuck Bell
Ticking Jorgen as he approved patch 1 of 2 and only 1 reviewer needed for patch 2 of 2 (test changes only).
[10 Mar 2009 16:28] Rafal Somla
Good to push.
[10 Mar 2009 16:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/68800

2774 Chuck Bell	2009-03-10 [merge]
      Local merge of patch for BUG#43511.
      modified:
        mysql-test/suite/backup/include/bml_test.inc
        mysql-test/suite/backup/r/backup_bml.result
[26 Mar 2009 12:35] Bugs System
Pushed into 6.0.11-alpha (revid:alik@sun.com-20090326121822-pt84kzxxayzho4mn) (version source revid:charles.bell@sun.com-20090310163259-7iw2kc4ouktjebyq) (merge vers: 6.0.11-alpha) (pib:6)
[26 Mar 2009 23:42] Paul DuBois
Test case changes. No changelog entry needed.