Bug #12061 CPU usage
Submitted: 20 Jul 2005 15:06 Modified: 9 Apr 2008 9:29
Reporter: Pascal Gloor Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S3 (Non-critical)
Version:4.0.24 OS:FreeBSD (FreeBSD 5.2.1)
Assigned to: CPU Architecture:Any

[20 Jul 2005 15:06] Pascal Gloor
Description:
I'm running MySQL 4.0.24 with FreeBSD 5.2.1 and have the same issue as in this case:

http://bugs.mysql.com/bug.php?id=414

  PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
84666 mysql    129    0   166M 84860K RUN    0 1837.4 76.61% 76.61% mysqld

it happens after 2-3 weeks. Restarting mysql resolves the problem
for 2-3 weeks.

I've been doing further investigation and noticed a few things:

1) strace -c -p <pid>

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.98    0.326158          25     12932           poll
 48.69    0.311519          24     12935           gettimeofday
  0.20    0.001308         327         4           write
  0.07    0.000422          21        20         8 read
  0.04    0.000278          17        16           fcntl
  0.02    0.000128          16         8           lseek
  0.00    0.000015          15         1           clock_gettime
------ ----------- ----------- --------- --------- ----------------
100.00    0.639828                 25916         8 total

we can note that poll and gettimeofday are probably the reason for taking all
that cpu time, but probably not the source of the problem.

2) strace -tt -p <pid>

16:33:28.100282 gettimeofday({1121870008, 100334}, NULL) = 0
16:33:28.100777 poll([{fd=3, events=POLLRDNORM, revents=POLLRDNORM}, {fd=42,
events=POLLRDNORM}, {fd=50, events=POLLRDNORM}, {fd=5, events=POLLRDNORM},
{fd=6, events=POLLRDNORM}, {fd=31, events=POLLRDNORM}, {fd=27,
events=POLLRDNORM}, {fd=155, events=POLLRDNORM}, {fd=154, events=POLLRDNORM},
{fd=153, events=POLLRDNORM}, {fd=104, events=POLLRDNORM}, {fd=102,
events=POLLRDNORM}, {fd=11, events=POLLRDNORM}], 13, 456) = 1
16:33:28.101175 gettimeofday({1121870008, 101282}, NULL) = 0
16:33:28.101620 poll([{fd=3, events=POLLRDNORM, revents=POLLRDNORM}, {fd=42,
events=POLLRDNORM}, {fd=50, events=POLLRDNORM}, {fd=5, events=POLLRDNORM},
{fd=6, events=POLLRDNORM}, {fd=31, events=POLLRDNORM}, {fd=27,
events=POLLRDNORM}, {fd=155, events=POLLRDNORM}, {fd=154, events=POLLRDNORM},
{fd=153, events=POLLRDNORM}, {fd=104, events=POLLRDNORM}, {fd=102,
events=POLLRDNORM}, {fd=11, events=POLLRDNORM}], 13, 455) = 1
16:33:28.102007 gettimeofday({1121870008, 102035}, NULL) = 0
16:33:28.102298 poll([{fd=3, events=POLLRDNORM, revents=POLLRDNORM}, {fd=42,
events=POLLRDNORM}, {fd=50, events=POLLRDNORM}, {fd=5, events=POLLRDNORM},
{fd=6, events=POLLRDNORM}, {fd=31, events=POLLRDNORM}, {fd=27,
events=POLLRDNORM}, {fd=155, events=POLLRDNORM}, {fd=154, events=POLLRDNORM},
{fd=153, events=POLLRDNORM}, {fd=104, events=POLLRDNORM}, {fd=102,
events=POLLRDNORM}, {fd=11, events=POLLRDNORM}], 13, 454) = 1

woahh... nice poll()ing in a loop. note the very short interval between the
functions execution. this is probably the reason of the high cpu usage. its
checking if it can read from a certain amount of FDs. poll returns 1, which
means there's one FD which can be read. we see its FD 3. who/what is FD 3?

3) fstat -v -p <pid>

USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
...
mysql    mysqld     84666    3* pipe ccbfc204 <-> cf7c6408  16384 rw
mysql    mysqld     84666    4* pipe cf7c6408 <-> ccbfc204      0 rw
...

ok, FD 3 is a pipe and FD 4 is the other end. it seems to have some data waiting
to be read on the FD 3 side.

I'm wondering what for this pipe is..

4) searching the libc pipe() function in the source

I found it only once, in mysqlmanager.c it seems to be used to communicate with
a child after fork()+exec() at startup.

In short, there's a pipe open, there's something to read but none is reading it.
the effect is that poll() is always returning >=1 (and doesnt do the expected "sleep until data or timeout") and takes all the CPU.

Has this been fixed yet?

How to repeat:
Hard to say, but it seems others had/have the same problem (regarding bug id 414). I hope my description of the problem is clear enough, otherwise, let me know.
[19 Sep 2005 22:09] Andreas Longwitz
I have the same problem runnig FreeBSD 4.11 Stable and MySQL 4.0.24.

The pipe you mentioned was not created by MySQL, it is the thread_kern_pipe from
the thread-library libc_r and is created in uthread_init.c. This pipe is needed to catch
signals being missed in calls to _select. 

Maybe the problem can be understand if libc_r is compiled with DEBUG_SIGNAL.
[20 Sep 2005 22:28] Andreas Longwitz
Using DEBUG for signals shows that the problem arises if a signal is
ignored in libc_r. Thats a bug in FreeBSD, a patch can be found at 
   http://www.FreeBSD.org/cgi/query-pr.cgi?pr=bin/32295
[1 Oct 2005 7:14] Greg Lehey
Nice analysis so far.  It does look as if the server may not be handling the return values of poll()
correctly.  It's not made any easier by the fact that it takes a long time before it reoccurs.
The obvious thing to do is to build the server with debug symbols (if it isn't already) and attach
a debugger when it happens.  That way we can at least find out where this is happening, and
hopefully why the return value isn't resulting in an I/O.
[1 Oct 2005 7:34] Valeriy Kravchuk
Pascal,

Have you tried to apply the patch to FreeBSD proposed in http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/32295? Does it really help in your case? Please, inform about the results, if any.
[2 Nov 2005 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[9 Apr 2008 9:29] Susanne Ebrecht
I'll close this issue now because it don't occur anymore by using newer MySQL with newer FreeBSD versions.

If you will against our expectactions still have problems, please, feel free to open this bug again.