MySQL Bugs: #346: assertion in raid.cc:160

Bug #346	assertion in raid.cc:160
Submitted:	30 Apr 2003 1:44	Modified:	8 Feb 2005 19:42
Reporter:	Lenz Grimmer
Status:	No Feedback
Category:	MySQL Server: ISAM storage engine	Severity:	S3 (Non-critical)
Version:	4.0.12	OS:	Linux (Red Hat Linux 7.2)
Assigned to:		CPU Architecture:	Any

Description:
ok, I've seen that problem in the archives but there was no solution to it. 
I'm running the 4.0.12-max RPMs on my redhat-7.2-system. 
Every now and then (currently nearly each day), it crashes with something like 
the following lines: 
######### 
mysqld-max: raid.cc:160: my_off_t my_raid_seek(int, long long unsigned int, 
int, int): Assertion `pos != (~(my_off_t) 0)' failed. 
 
Number of processes running now: 1 
mysqld-max process hanging, pid 10176 - killed 
030426 01:15:16  mysqld restarted 
030426  1:15:16  InnoDB: Database was not shut down normally. 
InnoDB: Starting recovery from log files... 
InnoDB: Starting log scan based on checkpoint at 
InnoDB: log sequence number 0 199454254 
InnoDB: Doing recovery: scanned up to log sequence number 0 199454254 
030426  1:15:17  InnoDB: Flushing modified pages from the buffer pool... 
030426  1:15:17  InnoDB: Started 
/usr/sbin/mysqld-max: ready for connections. 
Version: '4.0.12-Max'  socket: '/var/lib/mysql/mysql.sock'  port: 3306 
 
######## 
 
It's not that bad as it's getting restarted, but I often have corrupted tables 
because of this. 
 
I'm using MyISAM tables only and no RAID functions or similar. I'm just using the 
max binary because I have some BDB tables in the tree which caused crashes 
without -max ;-). 
 
There are no errors/warnings shown before or after that crash (besides the same 
lines for the other crashes ...). 
 
Any idea on how to fix or at least trace this problem? 
 
Is there a way to disable the raid-code with a config-option? 
 

How to repeat:

For the time being, we will disable RAID for the 4.x Max binaries again (starting with 4.0.13), until 
this bug is fixed.

It seems to be hitting other people too: 
 
David Garamond wrote: 
>> On Fri, Apr 04, 2003 at 01:22:39PM +0700, David Garamond wrote: 
>> 
>>> i found this on my server log: 
>>> 
>>> mysqld-max: raid.cc:160: my_off_t my_raid_seek(int, long long 
>>> unsigned int, int, int): Assertion `pos != (~(my_off_t) 0)' failed. 
>>> 
>>> and then mysqld shuts down. i start it again but after a short while 
>>> the same error appears and mysqld stops again. what does this 
>>> indicate? a disk failure? 
>> 
>> Oh, good.  It's not just the machines at Yahoo, then. 
>> 
>> I haven't looked into it much yet, but we had a machine hit that a few 
>> times.  That made me realize that I had been building our MySQL 
>> servers with raid support.  We don't have any need for it, so I've 
>> removed it.  But clearly something is funky with the raid code. 
>> 
>> I've yet to figure out a way to reproduce the bug.  Well, I haven't 
>> tried very hard either... 
>> 
>> Any chance you can?  If so, getting it fixed shouldn't be a problem. 
> 
> we are using 4.0.12, binary RPMs provided at mysql.com. the machine got 
> rebooted and reiserfsck shows some errors. i guess we'll be replacing 
> the disk with another one for now... 
 
after the filesystem is clean, mysqld is still behaving the same. since 
we could not afford to have any more downtime, i downgraded the 
installation to 3.23.xx (it's 3.23.54a, not the latest but the RPM files 
were lying around so i just used them). the system's been running nicely 
since then. so i guess it's probably the 4.0.x code. that's all i could 
say for now.

To be able to solve this, we need a resolved stack trace when the assert happens.

The problem is not in the raid code but in some other code that passes a wrong value to my_seek(). By pure chance the normal my_seek() code handles this case gracefully, but we would really like a stack trace or a test case to find the code that calls my_seek() with a wrong value.
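
To illustrate the failure mode described above, here is a minimal sketch, not the actual MySQL source. It assumes that the rejected value ~(my_off_t) 0 is an "invalid position" sentinel that some caller passes to the seek routine unchecked. The names FILEPOS_ERROR, find_record_pos, strict_seek and tolerant_seek are hypothetical; only the assertion expression is taken from the log above.

```cpp
#include <cassert>

// Hypothetical stand-in for the server's offset type (assumption).
using my_off_t = unsigned long long;

// All bits set: the value the assertion in the log rejects.
const my_off_t FILEPOS_ERROR = ~(my_off_t) 0;

// Hypothetical lookup that signals "not found" with the sentinel.
my_off_t find_record_pos(bool found) {
  return found ? (my_off_t) 4096 : FILEPOS_ERROR;
}

// Strict variant: aborts the process when handed the sentinel,
// which is the behaviour reported for the RAID-enabled binary.
my_off_t strict_seek(my_off_t pos) {
  assert(pos != (~(my_off_t) 0));
  return pos;
}

// Tolerant variant: passes the error back to the caller instead of
// aborting, so the bad call goes unnoticed but the server survives.
my_off_t tolerant_seek(my_off_t pos) {
  if (pos == FILEPOS_ERROR)
    return FILEPOS_ERROR;
  return pos;
}
```

In this sketch, a caller that runs strict_seek(find_record_pos(false)) without checking the lookup result would abort on the assertion, while tolerant_seek would simply return the error value. A resolved stack trace would show which caller passed the bad value.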

No feedback was provided. The bug is being suspended because
we assume that you are no longer experiencing the problem.
If this is not the case and you are able to provide the
information that was requested earlier, please do so and
change the status of the bug back to "Open". Thank you.