| Bug #14668 | functions storing in machine independent format not optimized for PPC | ||
|---|---|---|---|
| Submitted: | 5 Nov 2005 8:19 | Modified: | 20 May 2009 7:20 |
| Reporter: | Gunnar von Boehn | Email Updates: | |
| Status: | Verified | Impact on me: | |
| Category: | MySQL Server | Severity: | S5 (Performance) |
| Version: | 5.0.16-BK, 5.1.34 | OS: | Any (any) |
| Assigned to: | | CPU Architecture: | Any |
[6 Nov 2005 8:31]
Valeriy Kravchuk
Verified also on 5.0.16-BK sources (ChangeSet@1.1957.1.18, 2005-11-03 20:29:21+02:00, jani@ua141d10.elisa.omakaista.fi).
Line 1183 of include/my_global.h (not in #ifdef block):
#define int4net(A) (int32) (((uint32) ((uchar) (A)[3])) |\
(((uint32) ((uchar) (A)[2])) << 8) |\
(((uint32) ((uchar) (A)[1])) << 16) |\
(((uint32) ((uchar) (A)[0])) << 24))
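For readers unfamiliar with the macro, the following sketch shows what it computes: it assembles a 32-bit value from four bytes in network (big-endian) order, one byte at a time, so it works at any alignment. The typedefs are stand-ins for MySQL's own `int32`, `uint32` and `uchar`, and `read_be32_bytewise` and `int4net_example` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the MySQL typedefs used by the macro (assumed here). */
typedef int32_t       int32;
typedef uint32_t      uint32;
typedef unsigned char uchar;

/* The macro as quoted in the comment above. */
#define int4net(A) (int32) (((uint32) ((uchar) (A)[3])) |\
                   (((uint32) ((uchar) (A)[2])) << 8) |\
                   (((uint32) ((uchar) (A)[1])) << 16) |\
                   (((uint32) ((uchar) (A)[0])) << 24))

/* Hypothetical wrapper: reads 4 bytes, most significant byte first. */
static int32 read_be32_bytewise(const uchar *p)
{
  return int4net(p);
}

/* Hypothetical usage: the read starts at an odd (unaligned) offset. */
static void int4net_example(void)
{
  const uchar buf[5] = { 0xFF, 0x12, 0x34, 0x56, 0x78 };
  assert(read_be32_bytewise(buf + 1) == 0x12345678);
}
```

Each byte is loaded, shifted and OR-ed separately, which is where the instruction count mentioned in this report comes from.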
[12 Oct 2007 18:40]
Konstantin Osipov
Thank you for a valid performance request.
[20 May 2009 7:20]
Valeriy Kravchuk
In 5.1.34 my_global.h still has this same definition (not in #ifdef):
/* Macro for reading 32-bit integer from network byte order (big-endian)
   from unaligned memory location. */
#define int4net(A) (int32) (((uint32) ((uchar) (A)[3])) |\
(((uint32) ((uchar) (A)[2])) << 8) |\
(((uint32) ((uchar) (A)[1])) << 16) |\
(((uint32) ((uchar) (A)[0])) << 24))
[27 May 2014 3:10]
Stewart Smith
Just a FYI: I haven't yet seen this show up on a profile. So I'm not sure this is actually much of an issue.

Description:
In include/my_global.h many macros are defined to load and store data in a machine-independent format. For x86 the macros are optimized; for other architectures they currently are not.

The PowerPC/POWER architecture can safely access unaligned memory and has native load/store instructions which support both big-endian and little-endian. Many of the macros, which are currently defined with up to 10 instructions, could be implemented faster with just a single asm instruction. For examples and a suggested fix see below.

Adding optimized macros for PowerPC would improve our builds for the following platforms:
- Linux (IBM/Motorola/Freescale POWER/PowerPC)
- Mac OS X
- IBM AIX (IBM POWER)

Cheers
Gunnar

How to repeat:
n/a

Suggested fix:
Some examples:

a) Macro to read big-endian:
#define int4net(A) (int32) (((uint32) ((uchar) (A)[3])) |\
(((uint32) ((uchar) (A)[2])) << 8) |\
(((uint32) ((uchar) (A)[1])) << 16) |\
(((uint32) ((uchar) (A)[0])) << 24))
On PowerPC, which is big-endian native, this could be done with:
uint=*(uint32*)(A);

b) Macro to store long as little-endian:
#define longstore(T,A) do { *(((char*)T)+3)=((A));\
*(((char*)T)+2)=(((A) >> 8));\
*(((char*)T)+1)=(((A) >> 16));\
*(((char*)T)+0)=(((A) >> 24)); } while(0)
PowerPC has native instructions to load/store little-endian and big-endian. The above could be implemented with one stwbrx (Store Word Byte-Reverse Indexed) instruction:
stwbrx A,r0,T

c) Macro for reading short with little-endian:
#define ushortget(V,M) do { V = (uint16) (((uint16) ((uchar) (M)[1]))+\
((uint16) ((uint16) (M)[0]) << 8)); } while(0)
PowerPC has native instructions to load/store little-endian and big-endian. The above could be implemented with one lhbrx (Load Halfword Byte-Reverse Indexed) instruction:
lhbrx V,r0,M
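
One way to approximate the effect described above without inline assembly is to express each access as a fixed-size `memcpy` plus an optional byte swap. Optimizing compilers can often reduce this pattern to a single load or store on targets that allow unaligned access, though that depends on the compiler and flags. The sketch below is an illustration under that assumption, not the fix adopted by MySQL. All function names (`host_is_big_endian`, `swap32`, `swap16`, `load_be32`, `store_be32`, `load_be16`, `roundtrip_check`) are hypothetical. The byte order follows the macro bodies quoted above, which place the most significant byte at the lowest address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: detect the host byte order at run time. */
static int host_is_big_endian(void)
{
  const uint16_t probe = 0x0102;
  unsigned char first;
  memcpy(&first, &probe, 1);
  return first == 0x01;
}

static uint32_t swap32(uint32_t v)
{
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
         ((v << 8) & 0x00FF0000u) | (v << 24);
}

static uint16_t swap16(uint16_t v)
{
  return (uint16_t) ((v >> 8) | (v << 8));
}

/* Counterpart of int4net: unaligned read, most significant byte first. */
static uint32_t load_be32(const unsigned char *p)
{
  uint32_t v;
  memcpy(&v, p, sizeof v);              /* valid at any alignment */
  return host_is_big_endian() ? v : swap32(v);
}

/* Counterpart of longstore as quoted: writes the high byte to p[0]. */
static void store_be32(unsigned char *p, uint32_t v)
{
  if (!host_is_big_endian())
    v = swap32(v);
  memcpy(p, &v, sizeof v);
}

/* Counterpart of ushortget as quoted: p[0] is the high byte. */
static uint16_t load_be16(const unsigned char *p)
{
  uint16_t v;
  memcpy(&v, p, sizeof v);
  return host_is_big_endian() ? v : swap16(v);
}

/* Hypothetical usage: store and reload at an unaligned offset. */
static void roundtrip_check(void)
{
  unsigned char buf[8] = { 0 };
  store_be32(buf + 1, 0x12345678u);
  assert(buf[1] == 0x12 && buf[4] == 0x78);
  assert(load_be32(buf + 1) == 0x12345678u);
  assert(load_be16(buf + 1) == 0x1234);
}
```

On a big-endian host the swap branch is never taken, so each helper is just a 4-byte or 2-byte copy. Whether that actually beats the byte-wise macros is something to confirm with a profile on the target platform.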