Bug #38970 Crash in function called from falcon_init when running test cases
Submitted: 22 Aug 2008 17:45
Modified: 15 May 2009 12:52
Reporter: Sven Sandberg
Status: Closed
Category: MySQL Server: Falcon storage engine
Severity: S1 (Critical)
Version: 6.0-rpl
OS: Any
Assigned to: Vladislav Vaintroub
CPU Architecture: Any
Tags: 6.0-rpl-green, core, crash, F_ERROR HANDLING, replication, test failure

[22 Aug 2008 17:45] Sven Sandberg
Description:
When running the test suite in 6.0-rpl locally, I got coredumps inside falcon code during the slave's startup in five replication tests. The coredumps are not repeatable every time.

I will attach the stack traces (gdb /path/to/mysqld /path/to/core --eval-command='thread apply all bt' --batch).

How to repeat:
Run the suite in 5.1-rpl.
[22 Aug 2008 17:47] Sven Sandberg
list of stack traces for five crashes

Attachment: stack-traces (application/octet-stream, text), 31.18 KiB.

[22 Aug 2008 18:11] Sven Sandberg
Correction: the coredumps did not happen only during the startup of slave servers; they also happened during the startup of master servers.
[25 Aug 2008 2:11] Kevin Lewis
This looks less like a Falcon issue than a disk or I/O system issue. The errors are of two types. The first is IO::writePages, IO.cpp:338, "write error on page %d (%d/%d/%d) of \"%s\": %s (%d)", which sets a fatal error flag. After that, the second type occurs in other threads: IO::writePages, IO.cpp:308, "can't continue after fatal error".

Since the error is not consistent, my guess is that it is not in the way the file is opened.
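
The two error types described above follow a common "sticky fatal flag" pattern. The following is only a minimal sketch of that pattern, not Falcon's actual IO class: the names PageWriter, writePage, and hasFatalError are hypothetical, and the device failure is simulated with a constructor argument.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical stand-in for an I/O layer; NOT Falcon's real IO class.
class PageWriter {
public:
    // `failing` simulates a device whose writes return an I/O error.
    explicit PageWriter(bool failing) : failing_(failing) {}

    // Returns an empty string on success, otherwise the error text.
    std::string writePage(int pageNumber) {
        // Once any write has failed, every later caller (possibly on
        // another thread) is refused with the second kind of error.
        if (fatalError_.load())
            return "can't continue after fatal error";
        if (failing_) {
            fatalError_.store(true);  // first failure sets the sticky flag
            return "write error on page " + std::to_string(pageNumber);
        }
        return "";
    }

    bool hasFatalError() const { return fatalError_.load(); }

private:
    bool failing_;
    std::atomic<bool> fatalError_{false};
};
```

Under these assumptions, the first failing write reports the page-level error, and any later write, from any thread, reports only that it cannot continue. That would be consistent with some stack traces showing one message and some showing the other.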
[25 Aug 2008 17:12] Vladislav Vaintroub
Found this in the stacktraces...

[Falcon] Error: write error on page 0 (4096/4096/4) of "/home/sven/bzr/merge/6.0-rpl_from_5.1-rpl/mysql-test/var/2/mysqld.2/data/falcon_master.fts": Input/output error (5)

So pwrite is getting an I/O error (errno 5). I have no idea how this can be. A wild guess is that file descriptor 5 was opened by Falcon, then closed by somebody else, and we're doing pwrite on a socket or similar.
[25 Aug 2008 17:13] Vladislav Vaintroub
File descriptor 5 should really be file descriptor 4 in the previous comment.
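
One way to probe the guess above is to check which errno pwrite reports when the descriptor has been closed, or refers to a socket. The following is a small sketch for a POSIX system; the helper names pwriteErrno, errnoForClosedDescriptor, and errnoForSocket are hypothetical and are not part of Falcon or the test suite.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: attempts a 1-byte pwrite at offset 0 and returns
// 0 on success, or the errno value on failure.
int pwriteErrno(int fd) {
    const char byte = 0;
    errno = 0;
    if (pwrite(fd, &byte, 1, 0) == 1)
        return 0;
    return errno;
}

// errno reported when the descriptor has already been closed.
int errnoForClosedDescriptor() {
    int fd = open("/dev/null", O_WRONLY);
    assert(fd >= 0);
    close(fd);               // simulate "closed by somebody else"
    return pwriteErrno(fd);  // fd number is now stale
}

// errno reported when the descriptor refers to a socket.
int errnoForSocket() {
    int sv[2];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);
    (void)rc;
    int err = pwriteErrno(sv[0]);
    close(sv[0]);
    close(sv[1]);
    return err;
}
```

On typical POSIX systems these two cases are expected to report EBADF and ESPIPE respectively, so comparing the values against the errno in the log may help confirm or rule out the guess on the affected machine.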
[13 Nov 2008 18:52] Sven Sandberg
Note that only crashes 3 and 4 contain the text "Error: write error on page 0 (4096/4096/4)..."

Crashes 1, 2, and 5 instead contain the text "can't continue after fatal error" in the stack trace. I just reproduced crash 1/2/5 on my local machine in 6.0-rpl, which has BUG#39458 fixed. So this is not a symptom of BUG#39458.
[13 Nov 2008 18:56] Sven Sandberg
stack trace from crash 6

Attachment: stacktrace (application/octet-stream, text), 7.86 KiB.

[13 Nov 2008 20:22] Kevin Lewis
Sven, I assume that you now consider this bug verified. Has it happened again since Aug 25? Assigning this to Vlad to look into. If this is a one-time problem, we may have to mark it Can't Repeat.
[14 Nov 2008 8:45] Sven Sandberg
Kevin, yes, crash 6 happened yesterday with a 6.0-rpl tree. With 6.0-rpl it usually happens at least a couple of times each time I run the suite. Let me know if I should try to repeat it with 6.0 main.
[14 Nov 2008 11:24] Vladislav Vaintroub
Sven, please try to reproduce with main. It is not a usual crash and looks very much as if somebody (pointing to rpl ;)) closes file descriptors that belong to Falcon.
[18 Dec 2008 12:33] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/61964

2945 Vladislav Vaintroub	2008-12-18
       Bug #38970 Crash in function called from falcon_init when running test cases 
      Problem: Upon encountering I/O errors, Falcon crashes with an assert.
      Solution: Instead of asserting, throw an exception. This allows more graceful error handling during
      Falcon startup. The error text will be written to the error log and Falcon will not load.
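
The approach described in the commit message can be sketched as follows. This is an illustration under stated assumptions, not the committed patch: StorageIOError, writePageOrThrow, and engineInit are hypothetical names, the I/O failure is simulated with a flag, and the error log is modelled as a vector of strings.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical exception type; Falcon's real exception classes are not
// shown in this report.
struct StorageIOError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical low-level write. Before a change of this kind, a failure
// here would abort the whole server through an assert; now it throws.
void writePageOrThrow(bool ioFails) {
    if (ioFails)
        throw StorageIOError("write error on page 0");
}

// Hypothetical engine init hook: returns 0 on success and 1 on failure.
// On failure the error text is appended to errorLog and the engine is
// simply reported as not loaded, while the server keeps running.
int engineInit(bool ioFails, std::vector<std::string>& errorLog) {
    try {
        writePageOrThrow(ioFails);
    } catch (const StorageIOError& e) {
        errorLog.push_back(std::string("[Engine] Error: ") + e.what());
        return 1;
    }
    return 0;
}
```

With this shape, a failed write during startup leaves one line in the log and a non-zero return from the init hook instead of a core dump.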
[18 Dec 2008 13:04] Vladislav Vaintroub
Pushed into falcon-team
[13 Feb 2009 7:25] Bugs System
Pushed into 6.0.10-alpha (revid:alik@sun.com-20090211182317-uagkyj01fk30p1f8) (version source revid:hky@sun.com-20081218223730-ujuygclo2fezfurq) (merge vers: 6.0.9-alpha) (pib:6)
[15 May 2009 12:52] MC Brown
A note has been added to the 6.0.10 changelog: 

When the Falcon storage engine encountered an I/O error, mysqld would crash. Errors now raise an exception, which is reported to the error log and Falcon will fail to initialize.