Bug #39465 Random mysql-test failure under Windows XP SP3
Submitted: 15 Sep 2008 22:06 Modified: 15 Jun 2009 11:00
Reporter: Igor Solodovnikov
Status: Can't repeat
Category: Tests: Server  Severity: S7 (Test Cases)
Version: 5.0.67  OS: Windows (XP (SP3))
Assigned to: Patrick Crews  CPU Architecture: Any
Tags: Failure, random, test

[15 Sep 2008 22:06] Igor Solodovnikov
Description:
I executed "perl.exe mysql-test-run.pl" several times. 'mysql-test\README' states: "All tests must pass". But on my system the test run fails on a random test.

I ran mysql-test-run.pl 15 times and divided all failures into 3 classes:

Class 1 (10 of 15 runs) - error 13
Various tests (alter_table, create, ctype_big5, archive_gis, compress) failed with error messages reporting errno 13 (or, in some cases, Errcode 13).
For example, here is the error message from the 'create' test: mysqltest: At line 295: query 'create table t2 like t3' failed: 1004: Can't create file '.\test\t2.frm' (errno: 13)
This class of failures has something in common with bug 33114, which also involves a random error 13.

Class 2 (3 of 15 runs) - 'rm' does not exist on Windows
Failed test: ctype_big5
Error message: "rm" is not internal or external command...
mysqltest: At line 82: command "rm $MYSQLTEST_VARDIR/master-data/test/t1.txt" failed

Class 3 (2 of 15 runs) - "Strange" errors
This class contains 2 "strange" failures:
Failed test: compress
Error message: mysqltest: In included file ".\include\common-tests.inc": At line 1435: query 'select distinct companynr,rtrim(space(512+companynr)) from t3 order by 1,2' failed: 2006: MySQL server has gone away
Failed test: alter_table
Error message: mysqltest: At line 109: query 'drop database mysqltest' failed: 1010: Error dropping database (can't rmdir '.\mysqltest', errno: 41)
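The classification above can be sketched as a small hypothetical helper that buckets a failure message using the patterns quoted in this report. The function name and the regular expressions are my own assumptions for illustration; mysql-test-run.pl does not classify failures this way.

```python
import re

# Patterns derived from the error messages quoted in this report (assumption:
# other runs print the same wording).
FAILURE_CLASSES = [
    # "errno: 13" or "Errcode 13"; \b after 13 keeps e.g. "errno: 130" out.
    ("error 13", re.compile(r"\b(errno|Errcode):? 13\b")),
    # The Windows shell's reply when a test tries to run the Unix 'rm' command.
    ("missing 'rm'", re.compile(r'"rm" is not internal or external command')),
]


def classify_failure(message: str) -> str:
    """Return the class label for a test failure message."""
    for label, pattern in FAILURE_CLASSES:
        if pattern.search(message):
            return label
    return "strange"  # anything else, e.g. "server has gone away" or errno 41
```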

How to repeat:
1. Download mysql-5.0.67-win32.zip, extract Setup.exe and install it on Windows XP SP3.
2. Download mysql-noinstall-5.0.67-win32.zip and extract its contents to an arbitrary directory (for example, c:\MySQL)
3. Install ActivePerl (I used ActivePerl 5.8.6 Build 811)
4. Go to "c:\MySQL\mysql-test" directory and execute "perl.exe mysql-test-run.pl" command

Suggested fix:
I monitored my test runs with Sysinternals Process Monitor, which gave me an idea about the Class 1 failures: every time before such a failure, Process Monitor logged exactly the same sequence of file operations by mysqld-nt.exe:
1. Open file with ShareMode=Read|Write|Delete (Result=SUCCESS)
2. Delete file (Result=SUCCESS)
3. Close file (Result=SUCCESS)
4. Open same file with ShareMode=Read|Write|Delete (Result=DELETE PENDING)
After this sequence, some test fails with errno=13. On the other hand, "DELETE PENDING" results did not appear in runs without a failed test. So I think the "DELETE PENDING" result is the source of errno=13.
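The logged sequence can be replayed in a toy, in-memory model of Windows delete semantics. This is a simplified sketch under one loudly stated assumption: step 4 can only fail if some other handle to the file, not shown in the log excerpt, is still open after step 3 (otherwise the file would simply be gone once the last handle closed). The class and method names are hypothetical, and the model does no real file I/O.

```python
import errno


class DeletePending(Exception):
    """Stands in for the DELETE PENDING result logged by Process Monitor."""


class ToyVolume:
    """Toy model: deleting a file that still has open handles (all opened with
    FILE_SHARE_DELETE) only marks it delete-pending; the name stays in the
    directory until the last handle is closed."""

    def __init__(self):
        self.files = set()           # names present in the directory
        self.handles = {}            # name -> number of open handles
        self.delete_pending = set()  # names marked for deletion, still open

    def create(self, name):
        if name in self.delete_pending:
            raise DeletePending(name)  # cf. "Can't create file ... t2.frm"
        self.files.add(name)

    def open(self, name):
        if name in self.delete_pending:
            raise DeletePending(name)
        if name not in self.files:
            raise FileNotFoundError(name)
        self.handles[name] = self.handles.get(name, 0) + 1

    def delete(self, name):
        if self.handles.get(name, 0) == 0:
            self.files.discard(name)       # nobody has it open: gone at once
        else:
            self.delete_pending.add(name)  # removed only at the last close

    def close(self, name):
        self.handles[name] -= 1
        if self.handles[name] == 0:
            del self.handles[name]
            if name in self.delete_pending:
                self.delete_pending.discard(name)
                self.files.discard(name)


def replay_logged_sequence():
    vol = ToyVolume()
    vol.create("t2.frm")
    vol.open("t2.frm")    # ASSUMPTION: another handle, not in the log excerpt
    vol.open("t2.frm")    # 1. open with ShareMode=Read|Write|Delete
    vol.delete("t2.frm")  # 2. delete: only marks the file delete-pending
    vol.close("t2.frm")   # 3. close: the other handle keeps the file alive
    try:
        vol.open("t2.frm")  # 4. open the same file again
    except DeletePending:
        return errno.EACCES  # the report observes this surfacing as errno 13
    return 0


if __name__ == "__main__":
    print(replay_logged_sequence())  # prints 13
```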
Searching the web, I found some information about the "DELETE PENDING" result:
http://blogs.msdn.com/junfeng/archive/2005/05/11/416570.aspx
http://blogs.msdn.com/junfeng/archive/2004/04/09/110278.aspx
Based on that information, my conclusion is that the root cause of the problem is the CreateFile flag FILE_SHARE_DELETE. The only safe way to delete a file opened with FILE_SHARE_DELETE is to rename it to a unique temporary name and then delete it.
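A minimal Python sketch of the suggested rename-then-delete approach, assuming the file was opened with FILE_SHARE_DELETE (that share mode is what lets Windows rename a file that is still open). The function name and the ".<random>.deleted" suffix are hypothetical choices for illustration; this is not the server's actual code.

```python
import os
import uuid


def delete_via_rename(path: str) -> None:
    """Rename `path` to a unique sibling name, then delete the renamed file.

    The rename frees the original name immediately, so a later create of the
    same name cannot collide with a delete that is still pending because some
    other handle is open; only the temporary name stays delete-pending.
    """
    directory, base = os.path.split(path)
    # Same directory, so the rename never has to cross volumes.
    tmp = os.path.join(directory, f"{base}.{uuid.uuid4().hex}.deleted")
    os.rename(path, tmp)
    os.remove(tmp)
```

A random suffix avoids clashing with an earlier temporary name that may itself still be delete-pending.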
I can post the Process Monitor logs here if necessary.
[15 Sep 2008 22:37] Igor Solodovnikov
I originally filed this bug under the Tests: Server category, but I now think it better matches MySQL:Tests or MyISAM Storage Engine, because the source of the problem lies in the way mysqld-nt.exe handles database files.
[16 Sep 2008 11:21] MySQL Verification Team
Thank you for the bug report. This looks like a duplicate of, or related to, bug: http://bugs.mysql.com/bug.php?id=38831.
[16 Sep 2008 13:01] Igor Solodovnikov
Yes, that bug is related, but it is mainly about Unix commands missing on Windows.

In my report I described the conditions leading to errno 13 and found that mysqld-nt.exe generates this error code when the Windows result code for a file operation is DELETE PENDING.
[12 Oct 2008 20:17] MySQL Verification Team
Thank you for the bug report.
[13 Nov 2008 22:48] MySQL Verification Team
Bug: http://bugs.mysql.com/bug.php?id=40720 has been marked as duplicate of this one.
[23 Jan 2009 1:47] Patrick Crews
Error class 2 is due to the use of UNIX commands in the test suite.

See #38831 [Com]: 11 test cases fail on Windows due to missing commands
Currently working on removing UNIXisms from the test suite or ensuring a test is temporarily disabled until it can work properly on Windows without resorting to UNIX commands.
[15 Jun 2009 11:00] Patrick Crews
Ran the test suite multiple times, using both Cygwin and the standard Windows command shell.

My experiments were run on Windows XP, 32 bit.

The sporadic 'server went away' failures seem to point to issues other than faulty tests, but I personally have not seen such issues occurring on a regular basis, either on my own machines or on Pushbuild.
If these issues occur again, please open a new bug. Please note whether the failure is constant or random, and whether it seems limited to certain tests (if possible).

The 'rm' does not exist failures have been corrected as part of the fix for another bug: Bug#38311 (Some tests use 'rm', which is not portable). I have removed the use of Unix-specific calls from the 5.0 test suite, and we have taken some steps to prevent such calls from being added back in.

The first class of failures does seem to be similar to Bug#33114, but feedback on that bug points to anti-virus software rather than the server itself. If these issues continue, please update that bug or open a new one.
[14 Jul 2009 23:00] Alexander Ljungberg
I do not believe this bug is related to anti-virus software. While I haven't run this test in particular, I regularly see the same error message on the Windows platform with ALTER TABLE: several consecutive ALTER TABLE statements on the same table sometimes cause this error. With the exact same database and the exact same sequence of commands, the statements sometimes succeed and sometimes do not. The error appears to be timing-sensitive: waiting, or restarting mysql, between ALTER TABLE commands seems to alleviate the problem.

It is most likely caused by Windows' delete-pending behavior. The workaround is to retry the ALTER TABLE command until it succeeds.
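A sketch of that workaround, retrying a bounded number of times rather than forever and retrying only errno 13 failures. `run_query` and `ServerError` are hypothetical stand-ins for whatever client library is in use; a real driver exposes the server error (and the OS errno embedded in its message) in its own way.

```python
import time


class ServerError(Exception):
    """Hypothetical error carrying the OS errno embedded in a server message,
    e.g. "Can't create file '.\\test\\t2.frm' (errno: 13)"."""

    def __init__(self, os_errno):
        super().__init__(f"server error (errno: {os_errno})")
        self.os_errno = os_errno


def retry_on_errno_13(run_query, sql, attempts=5, delay=0.2):
    """Run `sql` through the hypothetical `run_query` callable, retrying only
    errno 13 failures, with a linearly growing pause between attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return run_query(sql)
        except ServerError as exc:
            if exc.os_errno != 13 or attempt == attempts:
                raise  # a different error, or out of attempts: give up
            time.sleep(delay * attempt)  # give the pending delete time to finish
```

Capping the attempts keeps a genuine permission problem (also errno 13) from turning into an endless loop.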