Bug #31761 Code for cluster is not safe for strict-alias optimization in new gcc
Submitted: 22 Oct 2007 19:55 Modified: 5 Nov 2007 23:26
Reporter: Joerg Bruehe
Status: Closed
Category:Server: Cluster Severity:S3 (Non-critical)
Version:5.0.52, 5.1-telco OS:Any (gcc 4.2.1)
Assigned to: Bugs System Target Version:

[22 Oct 2007 19:55] Joerg Bruehe
Description:
This was found by doing a build on a SuSE 10.3 installation (x86),
this version uses gcc 4.2.1.

"BUILD/compile-pentium-max" defaults to use "-O3 -fno-omit-frame-pointer" as compiler
options.
The resulting binary will *fail* the following 42 tests, all involving NDB / cluster, with
error 879:

loaddata_autocom_ndb ndb_alter_table ndb_alter_table2 ndb_autodiscover ndb_autodiscover2
ndb_autodiscover3 ndb_backup_print ndb_basic ndb_bitfield ndb_blob ndb_bug26793 ndb_cache
ndb_cache2 ndb_cache_multi ndb_cache_multi2 ndb_charset ndb_condition_pushdown ndb_gis
ndb_index ndb_index_ordered ndb_index_unique ndb_insert ndb_limit ndb_load
ndb_loaddatalocal ndb_lock ndb_minmax ndb_multi ndb_read_multi_range ndb_replace
ndb_restore ndb_restore_different_endian_data ndb_restore_print ndb_single_user
ndb_subquery ndb_transaction ndb_trigger ndb_truncate ndb_update ps_7ndb
rpl_ndb_innodb_trans strict_autoinc_5ndb

For some tests, it is "create table" that fails, for others, "insert".

Reducing the optimization to "-O2" will not change this.

Setting it to "-O2 -fno-strict-aliasing" will make *all* tests pass.

Setting it to "-O3 -fno-strict-aliasing" will let all tests pass except "ndb_alter_table"
and "ndb_replace" which report timeout:
   Warning;  Aborted waiting on pid file: 
   '/MySQL/M50/work-5.0/mysql-test/var/run/master.pid' after 70 seconds

How to repeat:
Build using gcc 4.2.1 and optimize including "-fstrict-aliasing".

Suggested fix:
Immediately: 
- Make the default optimization depend on the compiler version
     OR
- Change the Makefiles so that for "cluster" a separate, lower optimization
  takes effect (and ensure the lower default just for cluster).

Consider both the builds in a BK source tree and the release builds.

Alternative:
- Fix the source code so that "-O3" can still be applied.
[22 Oct 2007 21:59] Kristian Nielsen
I managed to repeat this with 5.0.52 on x86_64 by compiling a gcc 4.2.1.
[22 Oct 2007 23:03] Kristian Nielsen
Tried also with latest 5.1-telco.
It also fails, though in a different way:

ndb.ndb_alter_table            [ fail ]

Errors are (from /tmp/mysql-5.1-telco/mysql-test/var/log/mysqltest-time) :
mysqltest: In included file "./include/ndb_not_readonly.inc": At line 25: query
'die("Failed while waiting for mysqld to come out of readonly mode")' failed: 1064: You
have an error in your SQL syntax; check the manual that corresponds to your MySQL server
version for the right syntax to use near 'die("Failed while waiting for mysqld to come out
of readonly mode")' at line 1
[23 Oct 2007 15:17] Kristian Nielsen
Found the problem.

In ndb/src/ndbapi/NdbOperationDefine.cpp, we have this:

  Uint32 ahValue;

  (void) AttributeHeader::init(&ahValue, tAttrId, totalSizeInWords);
  insertATTRINFO( ahValue );

Strict aliasing rules allows the compiler to ignore the call to AttributeHeader::init(),
since it allows assuming that ahValue will not alias with any AttributeHeader object.

So gcc 4.2.1 does not in fact generate any code for the init() call, and even warns that
ahValue is used uninitialized ...

Still thinking about the best way to solve this. I think there is a lot of similar code
using AttributeHeader invalidly like this...

Code generated by GCC 4.2.1:

// const Uint32 bitsInLastWord = 8 * (sizeInBytes & 3) ;
// if (len != sizeInBytes && (len != 0)) { ... }
.LVL233:
	imull	96(%rbx), %r13d
	.loc 2 560 0
	cmpl	%r13d, %r14d
	je	.L219
	testl	%r14d, %r14d
	jne	.L240
.L219:
// const Uint32 totalSizeInWords = (sizeInBytes + 3)/4;last word
// const Uint32 sizeInWords = sizeInBytes / 4;
// (void) AttributeHeader::init(&ahValue, tAttrId, totalSizeInWords);
	.loc 2 565 0
	movl	%r13d, %eax
.LVL234:
	shrl	$2, %eax
	movl	%eax, -8052(%rbp)
.LVL235:
// insertATTRINFO( ahValue );
	.loc 2 567 0
	movl	%ebx, %esi
.LVL236:
	movq	%r12, %rdi
	call	_ZN12NdbOperation14insertATTRINFOEj@PLT

Note that _no_ code is generated at all for the init() call ...
[23 Oct 2007 16:46] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/36149

ChangeSet@1.2487, 2007-10-23 16:45:56+02:00, knielsen@ymer.(none) +5 -0
  BUG#31761: Code for cluster is not safe for strict-alias optimization in new gcc
  
  Fix problem with AttributeHeader::init() seen with gcc 4.2.1.
  
  Using the same object as both Uint32 and class AttributeHeader violates
  strict aliasing rule.
[23 Oct 2007 23:36] Kristian Nielsen
See also BUG#31799 for another bug seen with gcc 4.2.1.
[25 Oct 2007 11:18] Kristian Nielsen
Pushed into mysql-5.0-ndb, mysql-5.1-new-ndb, mysql-5.1-telco-gca, mysql-5.1-telco-6.1,
mysql-5.1-telco-6.2, and mysql-5.1-telco-6.3.
[5 Nov 2007 14:53] Bugs System
Pushed into 6.0.4-alpha
[5 Nov 2007 14:56] Bugs System
Pushed into 5.1.23-rc
[5 Nov 2007 14:58] Bugs System
Pushed into 5.0.52
[5 Nov 2007 23:26] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of
that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version,
including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

Documented in changelogs: 5.0.52, 5.1.15-ndb-6.1.23, 5.1.22-ndb-6.2.8, 5.1.22-ndb-6.3.6,
5.1.23, and 6.0.4. Closed.