Bug #35163 memlock makes raw disk partition inaccessible
Submitted: 8 Mar 2008 16:04 Modified: 30 Oct 2008 17:57
Reporter: Bin Tian
Status: Closed
Category: MySQL Server: Documentation Severity: S3 (Non-critical)
Version: 5.0.32 OS: Linux
Assigned to: Paul DuBois CPU Architecture:Any

[8 Mar 2008 16:04] Bin Tian
Description:
I have already added the mysql user to an additional group that can access the raw disk partition. With --memlock disabled, mysqld can access the disk; with --memlock enabled, it cannot.

I hooked the open() call and executed 'id' inside the hook:

mailgw-01:~# LD_PRELOAD=/root/aa.so HOME=/etc/mysql /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --pid-file=/var/run/mysqld/mysqld.pid --skip-external-locking --port=3306 --socket=/var/run/mysqld/mysqld.sock
open(/dev/sdb1, 00000002, 0000003e)
uid=0(root) gid=0(root) euid=101(mysql) egid=103(mysql) groups=0(root)
080308 23:03:09  InnoDB: Operating system error number 13 in a file operation.
InnoDB: The error means mysqld does not have the access rights to
InnoDB: the directory.
InnoDB: File name /dev/sdb1
InnoDB: File operation call: 'open'.
InnoDB: Cannot continue operation.

You can see that the mysqld process's identity is uid=0(root) gid=0(root) euid=101(mysql) egid=103(mysql) groups=0(root).

With memlock disabled, it is uid=101(mysql) gid=103(mysql) groups=25(floppy),103(mysql).

The workaround is to disable memlock, or to add the root user to a group that can access the raw disk.

How to repeat:
1. Add a block device to the InnoDB tablespace.
2. Add the mysql user to a group that can access the block device.
3. Disable memlock.
4. Start mysqld as user mysql. mysqld starts successfully.
5. Stop mysqld.
6. Enable memlock.
7. Start mysqld as user mysql. mysqld fails to start.

Or you can try this:

1. Add a block device to the InnoDB tablespace.

mailgw-01:~# sudo -u mysql id
uid=101(mysql) gid=103(mysql) groups=25(floppy),103(mysql)
mailgw-01:~# sudo -u mysql dd if=/dev/sdb1 of=/tmp/out count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 5.1e-05 seconds, 10.0 MB/s
mailgw-01:~# cp -a /usr/bin/id /tmp/
mailgw-01:~# cp -a /bin/dd /tmp/
mailgw-01:~# chown mysql.mysql /tmp/id /tmp/dd
mailgw-01:~# chmod g+s,u+s /tmp/id /tmp/dd
mailgw-01:~# /tmp/id
uid=0(root) gid=0(root) euid=101(mysql) egid=103(mysql) groups=0(root)
mailgw-01:~# /tmp/dd if=/dev/sdb1 of=/tmp/out count=1
/tmp/dd: opening `/dev/sdb1': Permission denied
mailgw-01:~#

Suggested fix:
Call set_user in mysqld.cc even if memlock is enabled.
[8 Mar 2008 16:21] Bin Tian
Adding the root user to a group that can access the disk may raise security issues. It also does not work during system boot, because the boot process does not call `initgroups`.

It works only if the root user logs in and starts mysqld manually. If you modify the MySQL startup scripts to start mysqld through sudo, it also works during system boot.
[10 Mar 2008 13:28] Heikki Tuuri
Why do you want to use raw partitions? Usually, there is no performance benefit. Managing raw partitions is more difficult than managing files.

Regards,

Heikki
[11 Mar 2008 0:19] Bin Tian
For your question:
I have another server that is connected to a SAN. I'm planning to allocate some particular LUNs to MySQL as InnoDB tablespace. LUNs are quite easy to manage. Before doing that, I'm trying to run some tests on a normal PC server (not connected to the SAN), using partitions in place of LUNs.

My question:
What you said is quite different from the MySQL manual. It says `a significant performance gain can be achieved by placing InnoDB data files and log files on raw devices or on a separate direct I/O UFS filesystem (using mount option forcedirectio; see mount_ufs(1M))`. Have I misunderstood something, or are there other tips I didn't find?

For the problem:
When memlock is enabled, mysqld.cc calls set_effective_user instead of set_user. To correct this, we can call initgroups in set_effective_user, as set_user does. Maybe we should call getgroups before initgroups (to get the root user's groups), then call initgroups, then call getgroups again (to get the mysql user's groups). Finally, we merge the two lists and call setgroups.
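
The getgroups / initgroups / getgroups / setgroups sequence proposed here could look roughly like the following C sketch. It is not code from mysqld.cc: `merge_gid_lists` and `add_user_groups` are hypothetical helper names, and the sketch assumes it runs while the process still has root privileges, since setgroups fails otherwise.

```c
#define _DEFAULT_SOURCE
#include <grp.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: merge two group lists into `out`, dropping
 * duplicates. Returns the number of entries written, or -1 if `out`
 * (capacity `out_cap`) is too small.
 */
int merge_gid_lists(const gid_t *a, size_t na,
                    const gid_t *b, size_t nb,
                    gid_t *out, size_t out_cap)
{
    const gid_t *lists[2] = { a, b };
    size_t lens[2] = { na, nb };
    size_t n = 0;

    for (int l = 0; l < 2; l++) {
        for (size_t i = 0; i < lens[l]; i++) {
            int seen = 0;
            for (size_t j = 0; j < n; j++) {
                if (out[j] == lists[l][i]) {
                    seen = 1;
                    break;
                }
            }
            if (seen)
                continue;
            if (n == out_cap)
                return -1;
            out[n++] = lists[l][i];
        }
    }
    return (int)n;
}

/*
 * Hypothetical helper: give the process the union of its current
 * supplementary groups and those of `user`. Returns 0 on success,
 * -1 on failure. Needs root privileges.
 */
int add_user_groups(const char *user, gid_t user_gid)
{
    gid_t *before = NULL, *after = NULL, *merged = NULL;
    int nb, na, nm, rc = -1;

    /* 1. Remember the groups the process started with (root's groups). */
    nb = getgroups(0, NULL);
    if (nb < 0)
        return -1;
    before = malloc((size_t)(nb + 1) * sizeof *before);
    if (before == NULL || (nb = getgroups(nb, before)) < 0)
        goto done;

    /* 2. Load the target user's groups, then read them back. */
    if (initgroups(user, user_gid) != 0)
        goto done;
    na = getgroups(0, NULL);
    if (na < 0)
        goto done;
    after = malloc((size_t)(na + 1) * sizeof *after);
    if (after == NULL || (na = getgroups(na, after)) < 0)
        goto done;

    /* 3. Merge both lists and install the result. */
    merged = malloc((size_t)(nb + na + 1) * sizeof *merged);
    if (merged == NULL)
        goto done;
    nm = merge_gid_lists(before, (size_t)nb, after, (size_t)na,
                         merged, (size_t)(nb + na));
    if (nm >= 0)
        rc = setgroups((size_t)nm, merged);

done:
    free(before);
    free(after);
    free(merged);
    return rc;
}
```

Note that merging keeps root's own groups (for example group 0) in the process, which may not be desirable; calling initgroups alone would give the process only the mysql user's groups.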
[11 Mar 2008 14:43] Heikki Tuuri
Bin,

you are right that the manual is incorrect about the benefits of raw disk partitions. I am a bit worried about you using a little-used feature on a SAN. Our impression is that SANs are usually reliable persistent storage; that is, there are no blatant bugs in their implementation of fsync(). But a little-used feature might not be as reliable.

Note that we discourage people from using NFS, because too many reliability problems have been reported over the years.

Paul,

I am setting this bug report to the 'Documenting' status. Please remove from the manual the claims that raw disk partitions give better performance. I have seen a 1% improvement in some benchmarks.

Regards,

Heikki
[11 Mar 2008 14:44] Heikki Tuuri
Bin,

should we also add something about memlock to the manual?

--Heikki
[12 Mar 2008 0:18] Bin Tian
Heikki, 

Thanks for your explanation. You are right, reliability is more important than performance in some situations.

The memlock option does cause a filesystem access problem. Under Linux and some other operating systems, a process has supplementary group IDs in addition to its effective group ID, and the OS uses them when determining file access permissions. Some people may use this feature with MySQL to get more flexibility. But if they enable memlock, they will be confused about why it doesn't work.

The reason is that when memlock is enabled, mysqld changes only the process's effective user and group; the supplementary group IDs are left unchanged. In my opinion, we should fix this rather than document it.
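
To check this from inside a process, a small C helper can print the same fields that `id` reports. This is an illustrative sketch, not part of MySQL; `print_identity` is a hypothetical name, and it prints numeric IDs only.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: print real and effective user/group IDs plus the
 * supplementary group list to `fp`. Returns the number of supplementary
 * groups, or -1 on error.
 */
int print_identity(FILE *fp)
{
    gid_t *groups;
    int n = getgroups(0, NULL);   /* size 0 only asks for the count */

    if (n < 0)
        return -1;
    groups = malloc((size_t)(n + 1) * sizeof *groups);
    if (groups == NULL)
        return -1;
    n = getgroups(n, groups);
    if (n < 0) {
        free(groups);
        return -1;
    }

    fprintf(fp, "uid=%ld gid=%ld euid=%ld egid=%ld groups=",
            (long)getuid(), (long)getgid(),
            (long)geteuid(), (long)getegid());
    for (int i = 0; i < n; i++)
        fprintf(fp, "%s%ld", i ? "," : "", (long)groups[i]);
    fputc('\n', fp);

    free(groups);
    return n;
}
```

If only the effective IDs were changed, the `euid` and `egid` fields would show the mysql user while the `groups` field would still list the groups of the user that started the process.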

BTW: This bug's category should be changed to `Server`. It's not InnoDB's problem.
[30 Oct 2008 17:57] Paul DuBois
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.

* Pointed out that --memlock affects raw partition access
* Pointed out that raw partitions do not necessarily improve performance