Bug #34796: ioctl crash under Solaris with ZFS
Submitted: 24 Feb 2008 22:31    Modified: 4 Jun 2009 11:53
Reporter: benjamin grant
Status: No Feedback
Category: MySQL Proxy: Core    Severity: S1 (Critical)
Version: 0.6.1    OS: Solaris (SunOS 5.11 x86)
Assigned to:    CPU Architecture: Any
Tags: ioctl, solaris, zone

[24 Feb 2008 22:31] benjamin grant
Description:
ioctl(53, FIONREAD, ...) said there is something to read, oops: 18011
ioctl(53, FIONREAD, ...) said there is something to read, oops: 14602
ioctl(53, FIONREAD, ...) said there is something to read, oops: 557
[debug] (command) unhandled type COM_STATISTICS
ioctl(53, FIONREAD, ...) said there is something to read, oops: 3097
ioctl(53, FIONREAD, ...) said there is something to read, oops: 9
[debug] (command) unhandled type COM_STATISTICS
network-mysqld-proxy.c.3412: COM_(0x03), packet 11 should not be (NULL|EOF), got: fffffffe
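
For context on the FIONREAD lines: they look like the output of a readability check built on the usual socket pattern of asking the kernel how many bytes are queued before reading them. The sketch below only illustrates that pattern -- it is not mysql-proxy's actual code, and the helper name and error handling are made up:

/* Illustrative sketch only (not mysql-proxy source): the common
 * ioctl(FIONREAD) pattern -- ask the kernel how many bytes are queued
 * on a socket before draining it with recv(). */
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#ifdef __sun
#include <sys/filio.h>   /* FIONREAD lives here on Solaris */
#endif
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* hypothetical helper: returns bytes read, 0 if nothing pending, -1 on error */
static ssize_t drain_readable(int fd, char *buf, size_t buflen)
{
    int avail = 0;

    /* FIONREAD reports how many bytes are currently readable on fd. */
    if (ioctl(fd, FIONREAD, &avail) < 0) {
        fprintf(stderr, "ioctl(FIONREAD) failed: %s\n", strerror(errno));
        return -1;
    }
    if (avail == 0) {
        return 0;
    }

    ssize_t n = recv(fd, buf, buflen, 0);
    if (n < 0) {
        fprintf(stderr, "recv() failed: %s\n", strerror(errno));
        return -1;
    }
    if (n == 0) {
        /* FIONREAD claimed avail > 0 but recv() saw EOF -- the kind of
         * mismatch the "oops: N" messages above appear to be reporting. */
        fprintf(stderr, "FIONREAD reported %d bytes, recv() got EOF\n", avail);
    }
    return n;
}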

How to repeat:
Not sure how to repeat this outside of our env.  Might have something to do with ZFS -- don't know.

We get it when we use siege to run a load/concurrency test against a single master / dual slave setup with mysql-proxy in front of it. Our app generates a fair number of queries at that level -- hundreds/sec. Mostly reads, but a good amount of updates/inserts as well due to session instantiation, etc.

It seems to happen more reliably with a concurrency level of 32 or more. 

This is what we're starting mysql-proxy with:

#!/usr/bin/bash

export EVENT_NOEVPORT=1

LUA_PATH="/opt/csw/share/mysql-proxy/?.lua" \
/opt/csw/sbin/mysql-proxy \
--proxy-backend-addresses=10.17.86.210:3306 \
--proxy-read-only-backend-addresses=10.12.43.194:3306 \
--proxy-read-only-backend-addresses=10.12.43.195:3306 \
--daemon \
--pid-file=/opt/csw/var/run/mysql-proxy.pid \
--proxy-lua-script=/opt/csw/share/mysql-proxy/rw-splitting.lua

We get this with EVENT_NOEVPORT = 0 also.
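
For what it's worth, EVENT_NOEVPORT is the environment variable libevent checks in order to skip its Solaris event-ports backend and fall back to /dev/poll, poll, or select. A quick standalone way to confirm which backend libevent actually picks -- this assumes libevent 2.x headers and is not part of mysql-proxy -- is:

/* Standalone check (assumes libevent 2.x is installed): print which kernel
 * notification backend libevent selects. Run it with and without
 * EVENT_NOEVPORT to see the effect. */
#include <stdio.h>
#include <stdlib.h>
#include <event2/event.h>

int main(void)
{
    /* Setting EVENT_NOEVPORT=1 before creating the event base tells
     * libevent to skip the Solaris event-ports backend. */
    setenv("EVENT_NOEVPORT", "1", 1);

    struct event_base *base = event_base_new();
    if (base == NULL) {
        fprintf(stderr, "event_base_new() failed\n");
        return 1;
    }

    printf("libevent backend: %s\n", event_base_get_method(base));
    event_base_free(base);
    return 0;
}

On Solaris this should print "evport" by default and "devpoll" (or "poll"/"select") once the variable is set.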

The host is a Solaris zone hosted at Joyent.

We get this problem with the binary distro of 0.6.1 AND with our own locally compiled version against the latest libevent etc.
[27 Feb 2008 20:57] Sveta Smirnova
Thank you for the report.

Do you do something before you get this error, or do you get it immediately after starting MySQL Proxy?
[19 Mar 2008 4:04] Paolo Saul
We are also having the same problem but under FC5. The proxy outputs

[debug] (command) unhandled type COM_STATISTICS

while on the application side the queries return errors:

Lost connection to mysql.

Our thread concurrency is 6.

This usually happens during peak hours, when our QPS is in the 500s and inserts/updates are very high.

PS: Reading the commands.lua file, am I right to infer that the COM_STATISTICS packet is from a real MySQL server and not a client?

Thanks!
[25 Mar 2008 0:42] Sveta Smirnova
Yes, COM_STATISTICS should be from a real server.
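
(For readers following the log output above: in the MySQL client/server protocol each packet starts with a 3-byte little-endian payload length and a 1-byte sequence number, and the first payload byte of a command packet is the command code -- 0x03 is COM_QUERY, which is the "COM_(0x03)" in the reporter's error, and 0x09 is COM_STATISTICS; the "got: fffffffe" is most likely the 0xfe EOF marker byte after sign extension. The following is only a hypothetical decoding sketch, not code taken from mysql-proxy or commands.lua.)

/* Hypothetical sketch of decoding a MySQL client/server protocol packet
 * header -- not mysql-proxy code. */
#include <stdint.h>
#include <stdio.h>

enum {
    COM_QUERY      = 0x03,  /* the command in "COM_(0x03)" above */
    COM_STATISTICS = 0x09   /* the "unhandled type" in the debug output */
};

#define MYSQL_EOF_MARKER 0xfe  /* likely what shows up as "got: fffffffe" */

/* buf points at a complete packet: 3-byte little-endian payload length,
 * 1-byte sequence number, then the payload. For a command packet the
 * first payload byte is the command code. */
static void describe_packet(const uint8_t *buf)
{
    uint32_t len = buf[0] | (buf[1] << 8) | ((uint32_t)buf[2] << 16);
    uint8_t  seq = buf[3];
    uint8_t  cmd = buf[4];

    printf("len=%u seq=%u first byte=0x%02x%s\n",
           (unsigned)len, (unsigned)seq, (unsigned)cmd,
           cmd == COM_STATISTICS   ? " (COM_STATISTICS)" :
           cmd == COM_QUERY        ? " (COM_QUERY)"      :
           cmd == MYSQL_EOF_MARKER ? " (EOF marker)"     : "");
}

int main(void)
{
    /* A minimal one-byte COM_STATISTICS command packet:
     * payload length 1, sequence 0, command byte 0x09. */
    const uint8_t pkt[] = { 0x01, 0x00, 0x00, 0x00, COM_STATISTICS };
    describe_packet(pkt);
    return 0;
}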
[7 Apr 2008 22:30] benjamin grant
> Do you do something before get this error or you get it immediately after start of MySQL Proxy?

This error occurred after mysql-proxy had been running for many minutes, under test loads generated through our application via HTTP benchmarking/testing tools (such as httperf, siege, apachebench, etc.). Query throughput was roughly several hundred queries/sec.
[4 Apr 2009 2:45] Diego Medina
Could you try the latest code from the launchpad repository?

We have made many improvements to the code since 0.6.1.

You can find the latest source code here:
https://launchpad.net/mysql-proxy

Thank you.
[4 Apr 2009 4:06] benjamin grant
Diego, we are no longer in a position to reproduce this, having migrated our systems to a different OS and hardware. We -are- using the latest mysql-proxy now in both testing and production environments and are not experiencing this issue, but that's most likely irrelevant to this report, since we're on an entirely different platform now.

If someone who has a MySQL deployment on a Joyent accelerator, using InnoDB (file-per-table) storage situated on a ZFS filesystem, can test the current mysql-proxy at equivalent levels of throughput (high hundreds/low thousands of queries per second), that might be relevant. All I can add at this point is that, when asked, the folks managing the ZFS storage systems indicated the ZFS block size was -not- set to match InnoDB's block size (16 KB). Whether that's a factor remains unproven.
[4 Jun 2009 11:53] Kay Roepke
No feedback was provided. The bug is being suspended because we assume that you are no longer experiencing the problem. If this is not the case and you are able to provide the information that was requested earlier, please do so and change the status of the bug back to "Open". Thank you.