Bug #72539 InnoDB mutex atomics implementation incorrect on POWER and ARM
Submitted: 5 May 2014 18:08 Modified: 12 May 2014 5:03
Reporter: Morgan Tocker
Status: Duplicate
Category: MySQL Server: InnoDB storage engine Severity: S4 (Feature request)
Version: 5.6+ OS: Any
Assigned to: Assigned Account CPU Architecture: ARM

[5 May 2014 18:08] Morgan Tocker
Description:
(Originally sent to internals@lists.mysql.com)

-----

Hi all,

seeing as the page to register an account for filing a MySQL bug doesn't
work (even from a text based browser) and just times out or gives a 500
Internal Server Error.... I'll "file" this one here.

So, at least for MySQL 5.6 (I haven't looked at previous versions, but I
strongly suspect the last few have this as well), the code in
sync0sync.ic for manipulating and looking at mutex->lock_word is
missing a barrier.

This would affect multi-CPU systems on both POWER and ARM (although I
haven't tried ARM). I could reproduce it extremely reliably on a POWER8
system, and likely also hit it on POWER7.

I've reproduced *very* reliably with a fresh MySQL 5.6.17 build (from
source) on a big endian host (e.g. Fedora) with a simple mysqlslap run:
mysqlslap --concurrency=128 --iterations=2 --number-int-cols=2
--number-char-cols=3 --auto-generate-sql --number-of-queries=10000

(this could probably be pared down)

What's interesting to note is that the test suite passes, but a very
small, simple stress test like the above makes the server hit an assert in
the InnoDB list code (in create read view) that's protected by... yep,
one of these mutexes. Obviously nobody's hammering this too hard on
multicore ARM yet.

If you disable the use of GCC atomics for these mutexes and instead fall
back to pthread mutex, it all works fine.

The fix is fairly simple:
1) ib_mutex_test_and_set needs a __sync_synchronize() before the call to
os_atomic_test_and_set_byte()
2) mutex_reset_lock_word needs a __sync_synchronize() before the
os_atomic_test_and_set_byte() call
3) and a __sync_synchronize() is needed after the read in
mutex_get_lock_word().

The barriers aren't required on x86.

MariaDB, Percona Server and Percona XtraBackup are probably also
affected, although I haven't tried these.

-----

How to repeat:
I've reproduced *very* reliably with a fresh MySQL 5.6.17 build (from source) on a big endian host (e.g. Fedora) with a simple mysqlslap run:
mysqlslap --concurrency=128 --iterations=2 --number-int-cols=2 --number-char-cols=3 --auto-generate-sql --number-of-queries=10000

Suggested fix:
The fix is fairly simple:
1) ib_mutex_test_and_set needs a __sync_synchronize() before the call to
os_atomic_test_and_set_byte()
2) mutex_reset_lock_word needs a __sync_synchronize() before the
os_atomic_test_and_set_byte() call
3) and a __sync_synchronize() is needed after the read in
mutex_get_lock_word().
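
For illustration only (this is not the InnoDB source and not the uploaded patch): a minimal sketch of where the three barriers would sit. It assumes os_atomic_test_and_set_byte() maps onto GCC's __sync_lock_test_and_set() builtin, which is documented as an acquire barrier only. The sketch_* names and the one-byte struct are hypothetical stand-ins for the real mutex type and functions in sync0sync.ic.

```c
#include <assert.h>

/* Hypothetical stand-in for the InnoDB mutex; only the lock word is modelled. */
typedef struct {
    volatile unsigned char lock_word; /* 0 = free, 1 = held */
} sketch_mutex_t;

/* Item 1: full barrier before the atomic test-and-set.
   Returns the previous lock word; 0 means the caller now holds the mutex. */
static unsigned char sketch_test_and_set(sketch_mutex_t *m)
{
    __sync_synchronize();
    return __sync_lock_test_and_set(&m->lock_word, 1);
}

/* Item 2: full barrier before the lock word is reset, so that writes made
   inside the critical section are visible before the mutex looks free.
   Storing 0 through this builtin is an assumption about the target; GCC
   notes that some targets only support storing the constant 1. */
static void sketch_reset_lock_word(sketch_mutex_t *m)
{
    __sync_synchronize();
    (void) __sync_lock_test_and_set(&m->lock_word, 0);
}

/* Item 3: full barrier after the plain read of the lock word. */
static unsigned char sketch_get_lock_word(const sketch_mutex_t *m)
{
    unsigned char w = m->lock_word;
    __sync_synchronize();
    return w;
}

/* Single-threaded walk through acquire, contended acquire, and release.
   This checks only the lock-word transitions, not the memory ordering. */
static int sketch_demo(void)
{
    sketch_mutex_t m = { 0 };

    assert(sketch_test_and_set(&m) == 0);   /* was free: acquired */
    assert(sketch_get_lock_word(&m) == 1);  /* now held */
    assert(sketch_test_and_set(&m) == 1);   /* already held: not acquired */
    sketch_reset_lock_word(&m);             /* release */
    assert(sketch_get_lock_word(&m) == 0);  /* free again */
    return 0;
}
```

A single-threaded check like sketch_demo() cannot expose the ordering problem itself; that still needs a concurrent load such as the mysqlslap run above on a POWER or ARM machine.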
[5 May 2014 18:11] Morgan Tocker
Link to original thread with bug report from Stewart Smith:
http://lists.mysql.com/internals/38786
[6 May 2014 5:22] MySQL Verification Team
Thanks for the detailed report.  Looks like a duplicate I filed internally.
Bug 17573535 - INVESTIGATE INNODB OWN ATOMICS/CACHE/MEMORY COHERENCY ISSUE ON MIPS64
[6 May 2014 5:23] MySQL Verification Team
also: http://bugs.mysql.com/bug.php?id=47213
[12 May 2014 5:03] Yasufumi Kinoshita
This seems to be a sibling problem of bug#47213.
I'd like to merge the discussion into that bug.

I have uploaded an experimental patch for mysql-5.6,
because I don't have an account on an affected server and cannot confirm the fix myself.

If you see this problem on your server (!x86 && !x86_64),
please test the patch and confirm that it fixes the problem.

Once it is confirmed that the patch actually fixes some cases,
this bug will be fixed in public releases.

Thanks.