Bug #46461 Running test case archive_aio_posix with valgrind takes several minutes
Submitted: 29 Jul 2009 20:08
Modified: 22 Sep 2009 10:15
Reporter: Davi Arnaut (OCA)
Status: Closed
Category: MySQL Server: Archive storage engine
Severity: S5 (Performance)
Version: azalea-bzr
OS: Linux
Assigned to: Davi Arnaut
CPU Architecture: Any

[29 Jul 2009 20:08] Davi Arnaut
Description:
When running the test suite with valgrind, the archive_aio_posix test case takes around 60 minutes to complete, causing the test suite to fail due to timeout. This happens often on PB2.

The problem is due to a busy wait (spin) inside the archive storage engine. Profiling revealed a lot of time spent inside the azio_ready function, which spins waiting for a thread to change status, locking and unlocking a mutex each time it verifies the status.

Unfortunately, this is bound to perform badly if scheduling is not good enough (starvation). Just for the sake of testing, I placed a pthread_yield within the loop and the run time for the test with valgrind came down to less than one minute.
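
For illustration only, here is a minimal sketch of the polling pattern described above. It is not the actual azio.c code: the struct, its fields, and all function names are hypothetical, and sched_yield is used as the POSIX stand-in for the non-standard pthread_yield.

```c
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-in for the azio container; names are illustrative. */
typedef struct {
    pthread_mutex_t lock;
    int io_pending;              /* nonzero while the I/O thread has work */
} spin_container;

/* Busy wait: lock, read the status, unlock, repeat until it changes.
   Returns the number of polls that were needed. */
unsigned long spin_until_ready(spin_container *c, int yield)
{
    unsigned long polls = 0;
    for (;;) {
        pthread_mutex_lock(&c->lock);
        int pending = c->io_pending;
        pthread_mutex_unlock(&c->lock);
        polls++;
        if (!pending)
            return polls;
        if (yield)
            sched_yield();       /* the "hack": give the I/O thread a turn */
    }
}

/* Simulated I/O thread: marks the pending work as done. */
void *spin_io_thread(void *arg)
{
    spin_container *c = arg;
    pthread_mutex_lock(&c->lock);
    c->io_pending = 0;
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

/* One waiter against one I/O thread. Returns the poll count,
   or 0 if the I/O thread could not be started. */
unsigned long run_spin_demo(int yield)
{
    spin_container c;
    pthread_t t;
    unsigned long polls = 0;

    pthread_mutex_init(&c.lock, NULL);
    c.io_pending = 1;
    if (pthread_create(&t, NULL, spin_io_thread, &c) == 0) {
        polls = spin_until_ready(&c, yield);
        pthread_join(t, NULL);
    }
    pthread_mutex_destroy(&c.lock);
    return polls;
}
```

The poll count returned by run_spin_demo depends entirely on how the scheduler interleaves the two threads, which is the point: with yield set to 0, the waiter can burn through its whole time slice before the I/O thread gets to run.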

How to repeat:
./mtr --valgrind-mysqld archive_aio_posix

Suggested fix:
Since the pthread_yield is just a hack that plays games with the scheduler, I guess we can use the azio container condition variable to also signal status changes. As a bonus, the patch also reduces the runtime of the test from 8 seconds to 1 second on an old non-SMP Pentium 4 2.20GHz.
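
A rough sketch of the condition-variable approach, under the same assumptions as before: the struct and function names are hypothetical and this is not the attached patch, only the general pthread pattern it relies on.

```c
#include <pthread.h>

/* Hypothetical container: a status flag guarded by a mutex, plus a
   condition variable that is signalled whenever the status changes. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    int io_pending;              /* nonzero while the I/O thread has work */
} cond_container;

/* Simulated I/O thread: finishes the work and wakes up any waiter. */
void *cond_io_thread(void *arg)
{
    cond_container *c = arg;
    pthread_mutex_lock(&c->lock);
    c->io_pending = 0;
    pthread_cond_broadcast(&c->changed);
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

/* Sleep until the status changes instead of polling it. The while loop
   re-checks the predicate because spurious wakeups are allowed. */
void wait_until_ready(cond_container *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->io_pending)
        pthread_cond_wait(&c->changed, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* One waiter against one I/O thread. Returns the final value of
   io_pending (expected 0), or -1 if the thread could not be started. */
int run_cond_demo(void)
{
    cond_container c;
    pthread_t t;
    int result = -1;

    pthread_mutex_init(&c.lock, NULL);
    pthread_cond_init(&c.changed, NULL);
    c.io_pending = 1;
    if (pthread_create(&t, NULL, cond_io_thread, &c) == 0) {
        wait_until_ready(&c);
        pthread_join(t, NULL);
        result = c.io_pending;
    }
    pthread_cond_destroy(&c.changed);
    pthread_mutex_destroy(&c.lock);
    return result;
}
```

Because pthread_cond_wait releases the mutex and blocks, the waiter consumes no CPU while the I/O thread runs, so the result no longer depends on scheduler fairness.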

Patch attached, see below.
[29 Jul 2009 20:11] Davi Arnaut
Wait for thread status change.

Attachment: archive-azio-spin.patch (application/octet-stream, text), 1.23 KiB.

[7 Aug 2009 19:02] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/80392

3522 Davi Arnaut	2009-08-07
      Bug#46461: Running test case archive_aio_posix with valgrind takes several minutes
      
      A busy wait (spin) inside the archive storage engine might cause
      an excessive slowdown when running the server with valgrind. The
      problem can also show up (to a lesser extent) on single-CPU systems.
      
      The busy wait could occur when the storage engine used asynchronous
      I/O, where it used busy waiting to repeatedly check if pending I/O
      for a table had been flushed.
      
      The solution is to replace the busy wait with a synchronized wait
      that sleeps on a condition variable waiting for the I/O thread to
      signal once pending I/O has been flushed.
     @ storage/archive/azio.c
        Replace busy wait with a pthread waiting mechanism.
[21 Aug 2009 21:06] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/81352

3546 Davi Arnaut	2009-08-21
      Bug#46461: Running test case archive_aio_posix with valgrind takes several minutes
      
      A busy wait (spin) inside the archive storage engine might cause
      an excessive slowdown when running the server with valgrind. The
      problem can also show up (to a lesser extent) on single-CPU systems.
      
      The busy wait could occur when the storage engine used asynchronous
      I/O, where it used busy waiting to repeatedly check if pending I/O
      for a table had been flushed.
      
      The solution is to replace the busy wait with a synchronized wait
      that sleeps on a condition variable waiting for the I/O thread to
      signal once pending I/O has been flushed.
     @ storage/archive/azio.c
        Replace busy wait with a pthread waiting mechanism.
[21 Aug 2009 21:14] Davi Arnaut
Queued to mysql-pe
[14 Sep 2009 16:05] Bugs System
Pushed into 5.4.4-alpha (revid:alik@sun.com-20090914155317-m1g9wodmndzdj4l1) (version source revid:alik@sun.com-20090914155317-m1g9wodmndzdj4l1) (merge vers: 5.4.4-alpha) (pib:11)
[22 Sep 2009 10:15] Tony Bedford
An entry has been added to the 5.4.4 changelog:

When running the test suite with Valgrind the archive_aio_posix test case took approximately 60 minutes to complete, causing the test suite to fail due to timeout.
[2 Apr 2013 18:46] Linhai Song
I am quite interested in this bug. I hope I can use it in my concurrent performance tuning project. Could you help me find the buggy version? 

I have checked the source code of mysql-5.4.3, but it does not contain the buggy code. 

Thanks a lot!