Bug #47055 unconditional exit(1) on ERROR_WORKING_SET_QUOTA 1453 (0x5AD) for InnoDB backend
Submitted: 2 Sep 2009 8:11 Modified: 19 Jun 2010 17:52
Reporter: George Danchev Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: InnoDB storage engine Severity:S2 (Serious)
Version:5.0.51b OS:Microsoft Windows
Assigned to: Satya B
Triage: Triaged: D2 (Serious)

[2 Sep 2009 8:11] George Danchev
Description:
We use MySQL v 5.0.51b  and libmysql v.5.0.19 linked with our applications under Windows 2003.
We have a database with multiple InnoDB and MyISAM tables.

The following problem occurs on several of our server which appears to be related to the InnoDB tables.
Here is an excerpt from MySQL error log:

090624 20:35:04  InnoDB: Operating system error number 1453 in a file operation.
InnoDB: Some operating system error numbers are described at
InnoDB: http://dev.mysql.com/doc/refman/5.0/en/operating-system-error-codes.html
InnoDB: File name .\ibdata1
InnoDB: File operation call: 'aio write'.
InnoDB: Cannot continue operation.
InnoDB: Log scan progressed past the checkpoint lsn 0 3192273582
090624 20:38:52  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number 0 3192289463
090624 20:38:53  InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 

53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
InnoDB: Apply batch completed
InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 51270062, file name binlog_slave.000685
InnoDB: Last MySQL binlog file position 0 48504357, file name D:\REPLICATION\LOGS_BIN_MYSQL\binlog_slave.000524
090624 20:38:54  InnoDB: Started; log sequence number 0 3192289463
090624 20:38:54 [Note] Recovering after a crash using D:/REPLICATION/LOGS_BIN_MYSQL/binlog_slave
090624 20:38:54 [Note] Starting crash recovery...
090624 20:38:54 [Note] Crash recovery finished.
090624 20:38:55 [Note] D:\MySQL_sever\bin\mysqld-nt: ready for connections.
Version: '5.0.51b-community-nt-log'  socket: ''  port: 3306  MySQL Community Edition (GPL)

We looked at the source of MySQL 5.0.51b and realized that the following code from innobase/os/os0file.c is involved:
a function named os_file_handle_error_cond_exit() is being called by the os_file_get_last_error() and which calls a Windows specific function GetLastError().
It  happens that the latter function returns ERROR_WORKING_SET_QUOTA 1453 (0x5AD) - Insufficient quota to complete the requested service [1], which leads to 

unconditional exit(1) later within os_file_handle_error_cond_exit(). 

Our questions being:

1) Is that return code of ERROR_WORKING_SET_QUOTA 1453 (0x5AD) bogus as returned by the operating system, since we do not even use quotas?
2) Why the rest of possible return values of GetLastError() are not checked as well, but an unconditional exit(1) is preffered?

From the log file we can see File operation call: 'aio write', which might be a failed asynchronous write file operation.

We also found similar problem in MSSQL when the ERROR_WORKING_SET_QUOTA 1453 (0x5AD) is returned [2]. 

Citing from the above document:
-------------------------------------
Resource-based retries
SQL Server 2000 performs read retries only when beginning the read operations fails and returns an operating system error of ERROR_WORKING_SET_QUOTA (1453) 

or ERROR_NO_SYSTEM_RESOURCES (1450). Unlike the SQL Server 2005 enhancements, SQL Server 2000 does not try any other form of read retry other than sort 

failures.
When the error occurs, the SQL Server worker yields for 100ms and tries the read operation again. This loop continues until the I/O is successfully issued. A 

simplified example demonstrates this behavior.

WHILE(		FALSE == ReadFile() 
    && (1450 == GetLastError() ||  1453 == GetLastError()) 
)
{
    Yield(100);
}

SQL Server 2005 maintains the same logic when it is trying to start a read operation. 
-------------------------------------

Should MySQL insist on writing to the file, until GetLastError() reports an successful operation ? We also had a look at MySQL 6.0 code and it is roughtly the same error handling.

[1] http://msdn.microsoft.com/en-us/library/ms681385(VS.85).aspx
[2] http://download.microsoft.com/download/4/7/a/47a548b9-249e-484c-abd7-29f31282b04d/SQLIOBas...

How to repeat:
It is probably related to the InnoDB under high load under Windows.

Suggested fix:
Just looking at the code in innobase/os/os0file.c it should be clear that either my OS returns bad error code or either the MySQL daemon does not check them all and behave appropriately.
[3 Sep 2009 16:06] Sveta Smirnova
Thank you for the report.

> 1) Is that return code of ERROR_WORKING_SET_QUOTA 1453 (0x5AD) bogus as returned by the operating system, since we do not even use quotas?

We can't know this, because this can be some problem in your environment. Please check OS log files. Also version 5.0.51 is old and many bugs were fixed since. It makes sense to upgrade to current version 5.0.85.

> 2) Why the rest of possible return values of GetLastError() are not checked as well, but
an unconditional exit(1) is preffered?

As I understand this is not possible to check every single OS error, because always would be unexpected things. There is a sense to do it only for particular errors which can be fixed while server is running.
[4 Sep 2009 8:13] George Danchev
> Thank you for the report.

> > 1) Is that return code of ERROR_WORKING_SET_QUOTA 1453 (0x5AD) bogus as 
> > returned by the operating system, since we do not even use quotas?

> We can't know this, because this can be some problem in your environment.  
> Please check OS log files. Also version 5.0.51 is old and many bugs 
> were fixed since. 

As I already wrote, we faced that problem on several of our servers. 
There are some more bugs reported as ours, though not in your bug tracking system:

http://forums.mysql.com/read.php?132,237235,237235
http://forums.theplanet.com/index.php?showtopic=85485

> It makes sense to upgrade to current version 5.0.85.

The relevant code found in os/os0file.c is the same. It is the same even in 6.0.

> > 2) Why the rest of possible return values of GetLastError() are not checked 
> > as well, but an unconditional exit(1) is preffered?

> As I understand this is not possible to check every single OS error, because > always would be unexpected things. There is a sense to do it only for 
> particular errors which can be fixed while server is running.

How it is possible for MSSQL to deal with that, but MySQL can not? Please, have a deeper look at the example I provided in my previous message.
[4 Sep 2009 8:59] Sveta Smirnova
Thank you for the feedback.

> As I already wrote, we faced that problem on several of our servers. 

MySQL just uses OS error code. This is OS what prevent io operation. This is why I ask to check OS error log.

> The relevant code found in os/os0file.c is the same. It is the same even in 6.0.

If this is MySQL bug real problem is not in error handling, but wrong operation before error occurs. It is not clear from description if this is the case, but can be occasionally solved.

> How it is possible for MSSQL to deal with that, but MySQL can not? Please, have a deeper
look at the example I provided in my previous message.

I can verify this report as feature request "Please handle ERROR_WORKING_SET_QUOTA error". Let us know if this is your only concern regarding to this situation.
[4 Sep 2009 9:56] George Danchev
> MySQL just uses OS error code. This is OS what prevent io operation. This is > why I ask to check OS error log.

Okay, that makes sense. We will be waiting for the problem to occur, and then grab Event Viewer | System and Application logs.

> > The relevant code found in os/os0file.c is the same. It is the same even in > >6.0.

> If this is MySQL bug real problem is not in error handling, but wrong 
> operation before error occurs. It is not clear from description if 
> this is the case, but can be occasionally solved.

It occures upon os_file_write() operation. If the 'write operation' fails, then why not just sleep() for a while and try again.

> > How it is possible for MSSQL to deal with that, but MySQL can not? Please, 
> >have a deeper look at the example I provided in my previous message.

> I can verify this report as feature request "Please handle 
> ERROR_WORKING_SET_QUOTA error".
> Let us know if this is your only concern regarding to this situation.

It makes no big difference how you classified the bug-report, when your SQL daemon performs exit(1) and leave the clients speechless. Yes, it is a request to handle ERROR_WORKING_SET_QUOTA and all possible return codes of GetLastError(), since they are already a stable set of codes available at:

http://msdn.microsoft.com/en-us/library/ms681381(VS.85).aspx

I know that it is not always possible to recover, but the codes MySQL currently handle is just far from a sensible set of possible failures. 

Otherwise, either we change the daemon or the OS as well. Sad, but true.
[4 Sep 2009 18:34] Sveta Smirnova
Thank you for the feedback.

Verified as feature request "Please handle ERROR_WORKING_SET_QUOTA more smart, for example, like reporter suggested in the initial description".
[14 Oct 2009 14:39] Bugs System
Pushed into 5.1.41 (revid:joro@sun.com-20091014143611-cphb0enjlx6lpat1) (version source revid:satya.bn@sun.com-20091009140218-24h3v55dgsxgs609) (merge vers: 5.1.40) (pib:13)
[22 Oct 2009 6:36] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091022063126-l0qzirh9xyhp0bpc) (version source revid:alik@sun.com-20091019135554-s1pvptt6i750lfhv) (merge vers: 6.0.14-alpha) (pib:13)
[22 Oct 2009 7:09] Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091022060553-znkmxm0g0gm6ckvw) (version source revid:alik@sun.com-20091019131022-2o2ymjfjjoraq833) (merge vers: 5.5.0-beta) (pib:13)
[2 Nov 2009 23:11] Calvin Sun
Before the fix, when having a failed IO operation with return code of ERROR_WORKING_SET_QUOTA from the Windows operating system, InnoDB will intentionally crash the server. Now, InnoDB will sleep for 100ms and retry the failed operation.
[12 Nov 2009 4:00] Paul Dubois
Noted in 5.1.41, 5.5.0, 6.0.14 changelogs.

On WIndows, when a failed I/O operation occurred with return code of
ERROR_WORKING_SET_QUOTA, InnoDB intentionally crashed the server. Now
InnoDB sleeps for 100ms and retries the failed operation.
[18 Dec 2009 10:39] Bugs System
Pushed into 5.1.41-ndb-7.1.0 (revid:jonas@mysql.com-20091218102229-64tk47xonu3dv6r6) (version source revid:jonas@mysql.com-20091218095730-26gwjidfsdw45dto) (merge vers: 5.1.41-ndb-7.1.0) (pib:15)
[18 Dec 2009 10:54] Bugs System
Pushed into 5.1.41-ndb-6.2.19 (revid:jonas@mysql.com-20091218100224-vtzr0fahhsuhjsmt) (version source revid:jonas@mysql.com-20091217101452-qwzyaig50w74xmye) (merge vers: 5.1.41-ndb-6.2.19) (pib:15)
[18 Dec 2009 11:09] Bugs System
Pushed into 5.1.41-ndb-6.3.31 (revid:jonas@mysql.com-20091218100616-75d9tek96o6ob6k0) (version source revid:jonas@mysql.com-20091217154335-290no45qdins5bwo) (merge vers: 5.1.41-ndb-6.3.31) (pib:15)
[18 Dec 2009 11:23] Bugs System
Pushed into 5.1.41-ndb-7.0.11 (revid:jonas@mysql.com-20091218101303-ga32mrnr15jsa606) (version source revid:jonas@mysql.com-20091218064304-ezreonykd9f4kelk) (merge vers: 5.1.41-ndb-7.0.11) (pib:15)
[5 May 2010 15:24] Bugs System
Pushed into 5.1.47 (revid:joro@sun.com-20100505145753-ivlt4hclbrjy8eye) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[6 May 2010 17:43] Paul Dubois
Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug.
Re-closing.
[28 May 2010 6:14] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (pib:16)
[28 May 2010 6:42] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[28 May 2010 7:10] Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[29 May 2010 22:41] Paul Dubois
Push resulted from incorporation of InnoDB tree. No changes pertinent to this bug.
Re-closing.
[15 Jun 2010 8:21] Bugs System
Pushed into 5.5.5-m3 (revid:alik@sun.com-20100615080459-smuswd9ooeywcxuc) (version source revid:mmakela@bk-internal.mysql.com-20100415070122-1nxji8ym4mao13ao) (merge vers: 5.1.47) (pib:16)
[15 Jun 2010 8:38] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100615080558-cw01bzdqr1bdmmec) (version source revid:mmakela@bk-internal.mysql.com-20100415070122-1nxji8ym4mao13ao) (pib:16)
[17 Jun 2010 12:20] Bugs System
Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 13:08] Bugs System
Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)
[17 Jun 2010 13:48] Bugs System
Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)