| Bug #61179 | Unoptimized versions of *korr/*store macros in my_global.h are used on x86_64 | ||
|---|---|---|---|
| Submitted: | 15 May 2011 15:45 | Modified: | 5 Jun 2013 12:27 |
| Reporter: | Alexey Kopytov | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: General | Severity: | S3 (Non-critical) |
| Version: | 5.1, 5.5 | OS: | Any |
| Assigned to: | Tor Didriksen | CPU Architecture: | Any |
[17 May 2011 18:09]
MySQL Verification Team
Kaamos, my friend,
I was not able to repeat the problem. I have new iMac, which has 64-bit CPU.
I was checking your bug by getting pre-processor output only , for mi_key.c source file, with the following command:
gcc -DHAVE_CONFIG_H -I. -I../../include -I../../include -I../../include -I../../regex -I../../sql -I. -g -DSAFE_MUTEX -DSAFEMALLOC -Wimplicit -Wreturn-type -Wswitch -Wtrigraphs -Wcomment -W -Wchar-subscripts -Wformat -Wparentheses -Wsign-compare -Wwrite-strings -Wunused-function -Wunused-label -Wunused-value -Wunused-variable -Wunused-parameter -mtune=nocona -DUNIV_MUST_NOT_INLINE -DEXTRA_DEBUG -DFORCE_INIT_OF_VARS -DSAFEMALLOC -DPEDANTIC_SAFEMALLOC -DSAFE_MUTEX -D_P1003_1B_VISIBLE -DSIGNAL_WITH_VIO_CLOSE -DSIGNALS_DONT_BREAK_READ -DIGNORE_SIGHUP_SIGQUIT -DDONT_DECLARE_CXA_PURE_VIRTUAL -MT mi_key.o -MD -MP -MF .deps/mi_key.Tpo -E mi_key.c
If you look at the bottom of the mi_key.c source file, you will see one switch command. The end of that command , after pre-processing, looks like this:
case HA_KEYTYPE_LONGLONG:
s_value= (*((longlong *) (key)));
break;
case HA_KEYTYPE_ULONGLONG:
value= (*((ulonglong *) (key)));
break;
default:
(__builtin_expect(!(0), 0) ? __assert_rtn(__func__, "mi_key.c", 557, "0") : (void)0);
value=0;
break;
which shows that it works nicely with those macros.
Tell me, please, if I am wrong and where am I wrong .....
[17 May 2011 19:02]
Alexey Kopytov
Dear Sinisa,
This is what I get from the same command using 4.1.2 on a 64-bit Linux machine:
case HA_KEYTYPE_LONGLONG:
s_value= (longlong) ((ulonglong)(((uint32) ((uchar) (key)[0])) + (((uint32) ((uchar) (key)[1])) << 8) + (((uint32) ((uchar) (key)[2])) << 16) + (((uint32) ((uchar) (key)[3])) << 24)) + (((ulonglong) (((uint32) ((uchar) (key)[4])) + (((uint32) ((uchar) (key)[5])) << 8) + (((uint32) ((uchar) (key)[6])) << 16) + (((uint32) ((uchar) (key)[7])) << 24))) << 32));
break;
case HA_KEYTYPE_ULONGLONG:
value= ((ulonglong)(((uint32) ((uchar) (key)[0])) + (((uint32) ((uchar) (key)[1])) << 8) + (((uint32) ((uchar) (key)[2])) << 16) + (((uint32) ((uchar) (key)[3])) << 24)) + (((ulonglong) (((uint32) ((uchar) (key)[4])) + (((uint32) ((uchar) (key)[5])) << 8) + (((uint32) ((uchar) (key)[6])) << 16) + (((uint32) ((uchar) (key)[7])) << 24))) << 32));
break;
default:
((0) ? (void) (0) : (__assert_fail ("0", "mi_key.c", 557, __PRETTY_FUNCTION__), (void) (0)));
value=0;
break;
}
It is the same with gcc 4.2.1 on Mac OS X.
I also disassembled that function in mysql-5.5.10-linux2.6-x86_64.tar.gz downloaded from dev.mysql.com. It's hard to read optimized code, but there are clearly lots of shifts in that code:
...
0x00000000008e0cdc <retrieve_auto_increment+380>: xchg %ax,%ax
0x00000000008e0ce0 <retrieve_auto_increment+384>: movzbl 0x5(%rsi),%ecx
0x00000000008e0ce4 <retrieve_auto_increment+388>: movzbl 0x6(%rsi),%eax
0x00000000008e0ce8 <retrieve_auto_increment+392>: movzbl 0x1(%rsi),%edx
0x00000000008e0cec <retrieve_auto_increment+396>: shl $0x10,%eax
0x00000000008e0cef <retrieve_auto_increment+399>: shl $0x8,%ecx
0x00000000008e0cf2 <retrieve_auto_increment+402>: add %eax,%ecx
0x00000000008e0cf4 <retrieve_auto_increment+404>: movzbl 0x4(%rsi),%eax
0x00000000008e0cf8 <retrieve_auto_increment+408>: shl $0x8,%edx
0x00000000008e0cfb <retrieve_auto_increment+411>: add %eax,%ecx
0x00000000008e0cfd <retrieve_auto_increment+413>: movzbl 0x7(%rsi),%eax
0x00000000008e0d01 <retrieve_auto_increment+417>: shl $0x18,%eax
0x00000000008e0d04 <retrieve_auto_increment+420>: add %eax,%ecx
0x00000000008e0d06 <retrieve_auto_increment+422>: movzbl 0x2(%rsi),%eax
0x00000000008e0d0a <retrieve_auto_increment+426>: shl $0x20,%rcx
0x00000000008e0d0e <retrieve_auto_increment+430>: shl $0x10,%eax
0x00000000008e0d11 <retrieve_auto_increment+433>: add %eax,%edx
0x00000000008e0d13 <retrieve_auto_increment+435>: movzbl (%rsi),%eax
0x00000000008e0d16 <retrieve_auto_increment+438>: add %eax,%edx
0x00000000008e0d18 <retrieve_auto_increment+440>: movzbl 0x3(%rsi),%eax
0x00000000008e0d1c <retrieve_auto_increment+444>: shl $0x18,%eax
0x00000000008e0d1f <retrieve_auto_increment+447>: add %eax,%edx
0x00000000008e0d21 <retrieve_auto_increment+449>: lea (%rcx,%rdx,1),%rax
0x00000000008e0d25 <retrieve_auto_increment+453>: jmpq 0x8e0bd0 <retrieve_auto_increment+112>
...
Are you sure the compiler on your machine generates 64-bit code with those options? What is the output of "gcc -dM -E - </dev/null | grep 86" in your case?
[18 May 2011 5:03]
Jonas Oreland
I got the "unoptimized" versions...does it matter that I tried debug build... probably not... gcc (Gentoo 4.4.3-r2 p1.2) 4.4.3 Linux eel 2.6.34.5-jonas #5 SMP Mon Nov 22 14:03:35 CET 2010 x86_64 Intel(R) Core(TM) i5 CPU M 520 @ 2.40GHz GenuineIntel GNU/Linux [fk-patches/]:jonas@eel:~/src/telco-7.0/storage/myisam> file mi_key.o mi_key.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
[18 May 2011 17:13]
MySQL Verification Team
Kaamos, Jonas, If I insist on the tuning to be done for Nocona, which is strictly 64-bit CPU, I get fully optimized code, with gcc 4.0.1. I will try 4.2.1 once I buy USB disk, do backup, install Snow Leopard etc .... ;o) However, with -m64 I get non-optimized code, which is kind of strange. So, fixing is necessary, but could you tell me why do you think that this is a bug in our code and not in GCC ??? Why couldn't they (GCC) support i386 on 64-bit as well ????
[5 Jun 2013 12:27]
Tor Didriksen
Fixed in 5.6 by patch tor.didriksen@oracle.com-20120111093352-wca2tmilh4sq9sd5
[5 Jun 2013 16:45]
Paul DuBois
Noted in 5.6.13 changelog. Unoptimized versions of the xxxkorr() macros in my_global.h were used on 64-bit x86 processors.

Description: my_global.h defines a set of macros for reading/storing integers of various sizes in a machine-independent format. Since the format is little-endian, optimized versions of those macros are used on x86, with more heavy-weight implementations being used on big-endian architectures, for example: #if defined(__i386__) || defined(_WIN32) ... #define uint8korr(A) (*((ulonglong *) (A))) ... #define int8store(T,A) *((ulonglong *) (T))= (ulonglong) (A) ... #else ... #define uint8korr(A) ((ulonglong)(((uint32) ((uchar) (A)[0])) +\ (((uint32) ((uchar) (A)[1])) << 8) +\ (((uint32) ((uchar) (A)[2])) << 16) +\ (((uint32) ((uchar) (A)[3])) << 24)) +\ (((ulonglong) (((uint32) ((uchar) (A)[4])) +\ (((uint32) ((uchar) (A)[5])) << 8) +\ (((uint32) ((uchar) (A)[6])) << 16) +\ (((uint32) ((uchar) (A)[7])) << 24))) <<\ 32)) ... #define int8store(T,A) do { uint def_temp= (uint) (A), def_temp2= (uint) ((A) >> 32); \ int4store((T),def_temp); \ int4store((T+4),def_temp2); } while(0) ... #endif The problem is that the same optimized versions could be used on x86_64 as well, but GCC does not define __i386__ on x86_64 and hence, heavy-weight versions are used in binaries (and no, those implementations are not optimized by the compiler even in release binaries). How to repeat: 1. Examine the code in my_global.h starting from: /* Optimized store functions for Intel x86 */ #if defined(__i386__) || defined(_WIN32) 2. gcc -dM -E - </dev/null | grep 86