Bug #14668 Functions storing in machine-independent format not optimized for PPC
Submitted: 5 Nov 2005 8:19 Modified: 20 May 2009 7:20
Reporter: Gunnar von Boehn
Status: Verified
Category:MySQL Server Severity:S5 (Performance)
Version:5.0.16-BK, 5.1.34 OS:Any (any)
Assigned to: CPU Architecture:Any

[5 Nov 2005 8:19] Gunnar von Boehn
Description:
In include/my_global.h
many macros are defined to load and store data in a machine independent format.
For x86 the macros are optimized; for other architectures they currently are not.

The PowerPC/POWER architecture can safely access unaligned memory
and has native load/store instructions which support both big-endian and little-endian.
Many of the macros, which are currently defined with up to 10 operations,
could be implemented faster with just a single asm instruction.

For examples and suggested fix see below

Adding optimized macros for PowerPC would improve our builds for the following platforms:
- Linux (IBM/Motorola/Freescale POWER/PowerPC)
- Mac OS X
- IBM AIX (IBM POWER)

Cheers
Gunnar

How to repeat:
na

Suggested fix:
Some examples:

a) Macro to read big-endian;
#define int4net(A)        (int32) (((uint32) ((uchar) (A)[3]))        |\
                                  (((uint32) ((uchar) (A)[2])) << 8)  |\
                                  (((uint32) ((uchar) (A)[1])) << 16) |\
                                  (((uint32) ((uchar) (A)[0])) << 24))
On PowerPC, which is natively big-endian, this could be done with a single load:
uint = *(uint32 *)(A);
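
For illustration, a minimal sketch of example a) in portable C. The function names are hypothetical and not part of my_global.h, and the sketch assumes GCC or Clang (for __BYTE_ORDER__ and __builtin_bswap32). The byte-wise version mirrors the int4net macro; the word-wise version does one unaligned 32-bit load via memcpy and only swaps bytes on little-endian hosts, so on a big-endian PowerPC build it should reduce to a plain load. I have not checked the generated code on PPC.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if !defined(__BYTE_ORDER__)
#error "this sketch assumes the GCC/Clang predefined byte-order macros"
#endif

/* Byte-wise read of a big-endian 32-bit value, same logic as int4net. */
static inline int32_t int4net_bytewise(const unsigned char *a)
{
    return (int32_t)(((uint32_t)a[3])       |
                     ((uint32_t)a[2] << 8)  |
                     ((uint32_t)a[1] << 16) |
                     ((uint32_t)a[0] << 24));
}

/* Hypothetical word-wise variant: one unaligned load, then a byte swap
   only when the host is little-endian. memcpy keeps the access legal
   even if the pointer is not 4-byte aligned. */
static inline int32_t int4net_wordwise(const unsigned char *a)
{
    uint32_t v;
    memcpy(&v, a, sizeof v);
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap32(v);
#endif
    return (int32_t)v;
}

/* Small self-check: both variants must agree, also at an odd address. */
static inline void int4net_selfcheck(void)
{
    unsigned char buf[5] = {0xAA, 0x12, 0x34, 0x56, 0x78};
    assert(int4net_bytewise(buf + 1) == 0x12345678);
    assert(int4net_wordwise(buf + 1) == int4net_bytewise(buf + 1));
}
```

Going through memcpy rather than a pointer cast avoids alignment and aliasing problems on platforms that are stricter than PowerPC.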

b) Macro to store long as little-endian
#define longstore(T,A)  do { *(((char*)T)+3)=((A));\
                             *(((char*)T)+2)=(((A) >> 8));\
                             *(((char*)T)+1)=(((A) >> 16));\
                             *(((char*)T)+0)=(((A) >> 24)); } while(0)
PowerPC has native instructions to load and store both little-endian and big-endian data.
The above could be implemented with a single stwbrx (Store Word Byte-Reverse Indexed) instruction:
stwbrx A,r0,T
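
A rough sketch of example b) in portable C, with hypothetical helper names and the same GCC/Clang assumption (__BYTE_ORDER__, __builtin_bswap32). Note that the longstore macro as quoted puts the most significant byte at offset 0, so store32_be below is the variant that matches it; store32_le shows the byte-reversed store that a single stwbrx would perform on a big-endian host. Whether the compiler actually emits one store instruction for either is an assumption I have not verified.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if !defined(__BYTE_ORDER__)
#error "this sketch assumes the GCC/Clang predefined byte-order macros"
#endif

/* Hypothetical helper: store v most significant byte first, which is
   the byte layout the quoted longstore macro produces. */
static inline void store32_be(unsigned char *t, uint32_t v)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap32(v);
#endif
    memcpy(t, &v, sizeof v);   /* unaligned-safe 4-byte store */
}

/* Hypothetical helper: store v least significant byte first. On a
   big-endian host this is the byte-reversed store. */
static inline void store32_le(unsigned char *t, uint32_t v)
{
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    v = __builtin_bswap32(v);
#endif
    memcpy(t, &v, sizeof v);
}

/* Small self-check of both byte layouts. */
static inline void store32_selfcheck(void)
{
    unsigned char b[4];
    store32_be(b, 0x11223344u);
    assert(b[0] == 0x11 && b[3] == 0x44);
    store32_le(b, 0x11223344u);
    assert(b[0] == 0x44 && b[3] == 0x11);
}
```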

c) Macro for reading short with little-endian
#define ushortget(V,M)  do { V = (uint16) (((uint16) ((uchar) (M)[1]))+\
                                 ((uint16) ((uint16) (M)[0]) << 8)); } while(0)
PowerPC has native instructions to load and store both little-endian and big-endian data.
The above could be implemented with a single lhbrx (Load Halfword Byte-Reverse Indexed) instruction:
lhbrx V,r0,M
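
A similar sketch for example c), again with hypothetical names and assuming the GCC/Clang __BYTE_ORDER__ macros. As quoted, ushortget treats M[0] as the high byte, which load16_be reproduces; load16_le is the byte-reversed read that lhbrx would give on a big-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if !defined(__BYTE_ORDER__)
#error "this sketch assumes the GCC/Clang predefined byte-order macros"
#endif

/* Swap the two bytes of a 16-bit value. */
static inline uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Hypothetical helper: read 16 bits with m[0] as the high byte,
   matching the quoted ushortget macro. */
static inline uint16_t load16_be(const unsigned char *m)
{
    uint16_t v;
    memcpy(&v, m, sizeof v);   /* unaligned-safe 2-byte load */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = swap16(v);
#endif
    return v;
}

/* Hypothetical helper: read 16 bits with m[0] as the low byte. */
static inline uint16_t load16_le(const unsigned char *m)
{
    uint16_t v;
    memcpy(&v, m, sizeof v);
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    v = swap16(v);
#endif
    return v;
}

/* Small self-check of both interpretations. */
static inline void load16_selfcheck(void)
{
    const unsigned char m[2] = {0x12, 0x34};
    assert(load16_be(m) == 0x1234);
    assert(load16_le(m) == 0x3412);
}
```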
[6 Nov 2005 8:31] Valeriy Kravchuk
Verified also on 5.0.16-BK sources (ChangeSet@1.1957.1.18, 2005-11-03 20:29:21+02:00, jani@ua141d10.elisa.omakaista.fi).

Line 1183 of include/my_global.h (not in #ifdef block):

#define int4net(A)        (int32) (((uint32) ((uchar) (A)[3]))        |\
                                  (((uint32) ((uchar) (A)[2])) << 8)  |\
                                  (((uint32) ((uchar) (A)[1])) << 16) |\
                                  (((uint32) ((uchar) (A)[0])) << 24))
[12 Oct 2007 18:40] Konstantin Osipov
Thank you for a valid performance request.
[20 May 2009 7:20] Valeriy Kravchuk
In 5.1.34 my_global.h still has this same definition (not in #ifdef):

/*
  Macro for reading 32-bit integer from network byte order (big-endian)
  from unaligned memory location.
*/
#define int4net(A)        (int32) (((uint32) ((uchar) (A)[3]))        |\
				  (((uint32) ((uchar) (A)[2])) << 8)  |\
				  (((uint32) ((uchar) (A)[1])) << 16) |\
				  (((uint32) ((uchar) (A)[0])) << 24))
[27 May 2014 3:10] Stewart Smith
Just an FYI: I haven't yet seen this show up on a profile.

So I'm not sure this is actually much of an issue.