Bug #34175 | Use GCC-provided atomic lock operations to replace pthread_mutex_lock functions | ||
---|---|---|---|
Submitted: | 30 Jan 2008 20:24 | Modified: | 17 Mar 2009 11:48 |
Reporter: | Larry Zhou | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S4 (Feature request) |
Version: | 5.0 | OS: | Linux (2.6.18.5-gg16-mixed64-32) |
Assigned to: | Marko Mäkelä | CPU Architecture: | Any |
Tags: | atomic, innodb, lock, mutex, pthread, sync |
[30 Jan 2008 20:24]
Larry Zhou
[30 Jan 2008 20:26]
Larry Zhou
change OS details.
[31 Jan 2008 9:19]
Susanne Ebrecht
Many thanks for writing a feature request.
[5 Feb 2008 23:37]
Larry Zhou
Here are other things I did:
1. Make lock_word smaller, using "byte" instead of "ulint".
2. Remove os_fast_mutex_t from mutex_struct if lock_word is used for the lock. That saved 40 bytes on a 64-bit machine.
I have tested the above changes, and all tests I am aware of passed.
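To illustrate where a saving of that size can come from, here is a minimal sketch with two hypothetical structs. The struct names, field names, and layout are assumptions made for illustration and do not reproduce InnoDB's actual mutex_struct. On x86-64 Linux with glibc, pthread_mutex_t alone occupies 40 bytes.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical layout: word-sized lock word plus an embedded OS mutex. */
struct demo_mutex_wide {
    unsigned long   lock_word;   /* stand-in for ulint */
    pthread_mutex_t os_mutex;    /* stand-in for os_fast_mutex_t */
    unsigned long   waiters;
};

/* Hypothetical layout: byte-sized lock word, no embedded OS mutex. */
struct demo_mutex_slim {
    unsigned char   lock_word;   /* stand-in for byte */
    unsigned long   waiters;
};

/* Bytes saved by switching from the wide layout to the slim one. */
size_t demo_bytes_saved(void)
{
    return sizeof(struct demo_mutex_wide) - sizeof(struct demo_mutex_slim);
}
```

In this sketch alignment padding keeps the byte-sized lock word in a full word-sized slot, so the saving comes from dropping the embedded OS mutex rather than from shrinking the lock word itself.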
[5 Feb 2008 23:40]
Larry Zhou
My comment 3 (private to MySQL developers) is not quite right. WIN32 and WIN64 need different inline assembly code, since the size of a pointer differs between them. I was mistaken.
[6 Feb 2008 15:23]
MySQL Verification Team
Thank you, Mr. Zhou. This is a very interesting contribution. A few questions from me:
* are there any solutions for GCC 3.*.* ???
* are there any solutions for Windows
* are there any solutions for other CPU's, like SPARC ???
[6 Feb 2008 15:28]
Heikki Tuuri
Larry, does changing the lock word to a byte help? The processor may actually do more work to change a byte in memory than to change a 32-bit or 64-bit unsigned integer.
I replaced the inline assembly for test-and-set around 2003 because gcc-2.96 (?) was not able to compile it. And using pthread_mutex_trylock() as 'test-and-set' is portable. But it looks like it is time to go back to inline assembly again.
I am assigning this feature request to Marko, who is right now working on this matter.
Regards, Heikki
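The portable approach mentioned above, using pthread_mutex_trylock() in the role of a test-and-set, can be sketched roughly as follows. The demo_* names are hypothetical and this is not InnoDB's actual code; it only shows that a failed trylock plays the part of "the lock word was already set".

```c
#include <pthread.h>

/* Hypothetical wrapper; stands in for a lock word backed by an OS mutex. */
typedef struct {
    pthread_mutex_t os_mutex;
} demo_os_lock_t;

void demo_os_lock_init(demo_os_lock_t *l)
{
    pthread_mutex_init(&l->os_mutex, NULL);
}

/* Emulated test-and-set: returns 0 if the lock was free and is now
   taken, 1 if it was already held (pthread_mutex_trylock failed). */
int demo_os_test_and_set(demo_os_lock_t *l)
{
    return pthread_mutex_trylock(&l->os_mutex) == 0 ? 0 : 1;
}

/* Emulated reset of the lock word: release the OS mutex. */
void demo_os_reset(demo_os_lock_t *l)
{
    pthread_mutex_unlock(&l->os_mutex);
}
```

The cost of this portability is that every mutex carries a full pthread_mutex_t and every acquisition goes through a library call, which is the overhead this feature request aims to remove.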
[6 Feb 2008 17:03]
Larry Zhou
For Sinisa Milivojevic:

* are there any solutions for GCC 3.*.* ???
Yes, here is the code. Not thoroughly tested.

+#if defined(__GNUC__)
+/* The following piece of code can be found
+ in function AtomicExchange of file atomicops-internals-x86.h
+ under http://google-perftools.googlecode.com/svn/trunk/src/base/
+*/
+#if defined(__x86_64__)
+#define TAS(_lw, _res) \
+ asm volatile("xchgq %1,%0":"=r"(_res):"m"(*_lw),"0"(_res):"memory")
+#elif defined(__i386__)
+#define TAS(_lw, _res) \
+ asm volatile("xchgl %1,%0":"=r"(_res):"m"(*_lw),"0"(_res):"memory")
+#endif
+/* TAS is only defined for GNUC on x86_64 and i386,
+ that is where GNUC x86 inline assembly can be used */
+#endif

For lock, use TAS(lw, 1); for unlock, use TAS(lw, 0). We did very limited tests, and it works. Since we will never, ever move back to GCC 3.x, we did not bother to do more testing.

* are there any solutions for Windows
Yes, for _WIN32:

LOCK:
__asm MOV ECX, lw
__asm MOV EDX, 1
__asm XCHG DL, BYTE PTR [ECX]
__asm MOV res, DL
UNLOCK:
__asm MOV EDX, 0
__asm MOV ECX, lw
__asm XCHG DL, BYTE PTR [ECX]

It should be very similar for _WIN64.

* are there any solutions for other CPU's, like SPARC ???
Sort of. Whenever the compiler is GCC with a version greater than 4.1.0, __sync_lock_test_and_set can be used throughout. No tests have been done.

I have been reading a lot lately. There was an AMD hardware bug, and a GCC bug reported by Boehm. So using __sync_lock_test_and_set is more portable than other implementations that rely on a full memory barrier (for example, mfence on x86).

Let me know if you have more questions.
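The GCC 4.1+ variant described above, with __sync_lock_test_and_set used for both lock and unlock, might look like the following minimal sketch. The demo_* names and the choice of unsigned long for the lock word are assumptions for illustration, not the submitted patch. One caveat from the GCC documentation: on targets without a full exchange operation, the only value this builtin is guaranteed to store is 1, so the unlock path here depends on the target (x86 does provide a full exchange through xchg).

```c
/* Hypothetical lock word type; the thread discusses both ulint and byte. */
typedef volatile unsigned long demo_lock_word_t;

/* Atomically store val into *lw and return the previous value. */
static inline unsigned long demo_tas(demo_lock_word_t *lw, unsigned long val)
{
    return __sync_lock_test_and_set(lw, val);
}

/* Try once to take the lock; returns 1 on success, 0 if already held. */
int demo_try_lock(demo_lock_word_t *lw)
{
    return demo_tas(lw, 1) == 0;
}

/* Release by exchanging 0 into the lock word, mirroring TAS(lw, 0). */
void demo_unlock(demo_lock_word_t *lw)
{
    (void) demo_tas(lw, 0);
}
```

A caller that fails demo_try_lock() would spin or fall back to waiting on an event; that part is outside this sketch.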
[6 Feb 2008 17:33]
Larry Zhou
For Heikki Tuuri:

> does changing the lock word to a byte help? The processor may actually do more
> work to change a byte in memory than to change a 32-bit or 64-bit unsigned integer.

At least it doesn't hurt, since we use xchg for both lock and unlock. Here is the assembly code:

lock:
00000000000005b0 <mutex_enter_noninline>:
 5b0: b8 01 00 00 00    mov $0x1,%eax
 5b5: 86 47 08          xchg %al,0x8(%rdi)
 5b8: 84 c0             test %al,%al
 5ba: 75 04             jne 5c0 <mutex_enter_noninline+0x10>

unlock:
00000000000003f0 <mutex_exit_noninline>:
 3f0: 48 8d 47 08       lea 0x8(%rdi),%rax
 3f4: 31 d2             xor %edx,%edx
 3f6: 86 10             xchg %dl,(%rax)
 3f8: 48 8b 47 10       mov 0x10(%rdi),%rax
 3fc: 48 85 c0          test %rax,%rax
 3ff: 75 02             jne 403 <mutex_exit_noninline+0x13>

> I replaced the inline assembly for test-and-set around 2003 because gcc-2.96 (?) was not
> able to compile it. And using pthread_mutex_trylock() as 'test-and-set' is portable.

I think so.

> But looks like it is time to go back to inline assembly again. I am assigning this feature
> request to Marko, who is right now working on this matter.

Let me know if you have any questions.
Thanks, Larry
[6 Feb 2008 18:12]
Sergei Golubchik
Also check GCC's __sync_* builtins. They should be as good as assembly but work on more platforms.
[6 Feb 2008 18:17]
Larry Zhou
Yes, I would guess that GCC's __sync_* builtins work with versions greater than 4.1.0, regardless of CPU. But here is the catch: only __sync_lock_test_and_set, used for both lock and unlock, works reliably on 64-bit Opteron machines when compiled with GCC 4.1.1.
[6 Feb 2008 18:39]
Sergei Golubchik
Interesting, could you elaborate please? What does it mean that __sync_lock_test_and_set works "unreliably"?
[6 Feb 2008 18:39]
Sergei Golubchik
Eh, sorry. Not __sync_lock_test_and_set, but the others work unreliably. Anyway, what does that mean?
[6 Feb 2008 19:21]
Larry Zhou
Sergei,

In the ideal world, I would just use __sync_lock_test_and_set for lock and __sync_lock_release for unlock. It doesn't work now, at least not with GCC 4.1.1 on 64-bit Opteron: reload tests got deadlocked pretty quickly.

There are all sorts of sync primitives and helpers, for example compiler barriers and memory barriers. Here is Google's implementation:
http://google-perftools.googlecode.com/svn/trunk/src/base/atomicops-internals-x86.h

Hans Boehm's claims (from his talk and PDF file):
http://www.youtube.com/watch?v=mrvAqvtWYb4
http://www.hpl.hp.com/personal/Hans_Boehm/misc_slides/c++threads.pdf
Page 11:
* gcc __sync_synchronize() “full memory barrier”. erroneously generates no-op (& perhaps has to).
* P4 lfence, sfence instructions appeared to be no-ops in most cases, but the spec didn't say.

After having gone through all this, we just want to be on the safe side. As in the _WIN32 implementation, where both lock and unlock use xchg on x86, we think that is probably the safest way at the present time.
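For reference, the "ideal world" pairing described above (__sync_lock_test_and_set to acquire, __sync_lock_release to release) can be sketched as below, together with a compiler-only barrier of the kind the linked header uses. The ideal_* names and the byte-sized lock word are assumptions for illustration. This is the combination the comment reports as deadlocking with GCC 4.1.1 on 64-bit Opteron, so it is shown only to make the comparison concrete, not as a recommendation.

```c
/* Hypothetical byte-sized lock word. */
typedef volatile unsigned char ideal_lock_t;

/* Compiler-only barrier: prevents the compiler from reordering memory
   accesses across this point; it emits no machine instruction. */
static inline void ideal_compiler_barrier(void)
{
    __asm__ __volatile__("" ::: "memory");
}

/* Try once to take the lock; the builtin is an acquire barrier.
   Returns 1 on success, 0 if the lock was already held. */
int ideal_try_lock(ideal_lock_t *lw)
{
    return __sync_lock_test_and_set(lw, 1) == 0;
}

/* Spin until the lock is taken, reading the lock word between
   attempts so that the atomic exchange is not issued continuously. */
void ideal_lock(ideal_lock_t *lw)
{
    while (!ideal_try_lock(lw)) {
        while (*lw) {
            ideal_compiler_barrier();
        }
    }
}

/* Release: the builtin writes 0 to *lw with release semantics. */
void ideal_unlock(ideal_lock_t *lw)
{
    __sync_lock_release(lw);
}
```

The difference from the xchg-for-both approach is only in the unlock path: __sync_lock_release is documented as a release barrier rather than a full barrier, whereas xchg with a memory operand is implicitly locked on x86.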
[17 Mar 2009 11:48]
Marko Mäkelä
The InnoDB Plugin 1.0.3 makes use of GCC built-in functions for atomic memory access where available. It was released on March 11, 2009.