Bug #31761 Code for cluster is not safe for strict-alias optimization in new gcc
Submitted: 22 Oct 2007 17:55 Modified: 5 Nov 2007 22:26
Reporter: Joerg Bruehe Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:5.0.52, 5.1-telco OS:Any (gcc 4.2.1)
Assigned to: Kristian Nielsen CPU Architecture:Any

[22 Oct 2007 17:55] Joerg Bruehe
Description:
This was found by doing a build on a SuSE 10.3 installation (x86),
this version uses gcc 4.2.1.

"BUILD/compile-pentium-max" defaults to use "-O3 -fno-omit-frame-pointer" as compiler options.
The resulting binary will *fail* the following 42 tests, all involving NDB / cluster, with error 879:

loaddata_autocom_ndb ndb_alter_table ndb_alter_table2 ndb_autodiscover ndb_autodiscover2 ndb_autodiscover3 ndb_backup_print ndb_basic ndb_bitfield ndb_blob ndb_bug26793 ndb_cache ndb_cache2 ndb_cache_multi ndb_cache_multi2 ndb_charset ndb_condition_pushdown ndb_gis ndb_index ndb_index_ordered ndb_index_unique ndb_insert ndb_limit ndb_load ndb_loaddatalocal ndb_lock ndb_minmax ndb_multi ndb_read_multi_range ndb_replace ndb_restore ndb_restore_different_endian_data ndb_restore_print ndb_single_user ndb_subquery ndb_transaction ndb_trigger ndb_truncate ndb_update ps_7ndb rpl_ndb_innodb_trans strict_autoinc_5ndb

For some tests, it is "create table" that fails, for others, "insert".

Reducing the optimization to "-O2" will not change this.

Setting it to "-O2 -fno-strict-aliasing" will make *all* tests pass.

Setting it to "-O3 -fno-strict-aliasing" will let all tests pass except "ndb_alter_table" and "ndb_replace" which report timeout:
   Warning;  Aborted waiting on pid file: 
   '/MySQL/M50/work-5.0/mysql-test/var/run/master.pid' after 70 seconds

How to repeat:
Build using gcc 4.2.1 and optimize including "-fstrict-aliasing".

Suggested fix:
Immediately: 
- Make the default optimization depend on the compiler version
     OR
- Change the Makefiles so that for "cluster" a separate, lower optimization
  takes effect (and ensure the lower default just for cluster).

Consider both the builds in a BK source tree and the release builds.

Alternative:
- Fix the source code so that "-O3" can still be applied.
[22 Oct 2007 19:59] Kristian Nielsen
I managed to repeat this with 5.0.52 on x86_64 by compiling a gcc 4.2.1.
[22 Oct 2007 21:03] Kristian Nielsen
Tried also with latest 5.1-telco.
It also fails, though in a different way:

ndb.ndb_alter_table            [ fail ]

Errors are (from /tmp/mysql-5.1-telco/mysql-test/var/log/mysqltest-time) :
mysqltest: In included file "./include/ndb_not_readonly.inc": At line 25: query 'die("Failed while waiting for mysqld to come out of readonly mode")' failed: 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'die("Failed while waiting for mysqld to come out of readonly mode")' at line 1
[23 Oct 2007 13:17] Kristian Nielsen
Found the problem.

In ndb/src/ndbapi/NdbOperationDefine.cpp, we have this:

  Uint32 ahValue;

  (void) AttributeHeader::init(&ahValue, tAttrId, totalSizeInWords);
  insertATTRINFO( ahValue );

Strict aliasing rules allows the compiler to ignore the call to AttributeHeader::init(), since it allows assuming that ahValue will not alias with any AttributeHeader object.

So gcc 4.2.1 does not in fact generate any code for the init() call, and even warns that ahValue is used uninitialized ...

Still thinking about the best way to solve this. I think there is a lot of similar code using AttributeHeader invalidly like this...

Code generated by GCC 4.2.1:

// const Uint32 bitsInLastWord = 8 * (sizeInBytes & 3) ;
// if (len != sizeInBytes && (len != 0)) { ... }
.LVL233:
	imull	96(%rbx), %r13d
	.loc 2 560 0
	cmpl	%r13d, %r14d
	je	.L219
	testl	%r14d, %r14d
	jne	.L240
.L219:
// const Uint32 totalSizeInWords = (sizeInBytes + 3)/4;last word
// const Uint32 sizeInWords = sizeInBytes / 4;
// (void) AttributeHeader::init(&ahValue, tAttrId, totalSizeInWords);
	.loc 2 565 0
	movl	%r13d, %eax
.LVL234:
	shrl	$2, %eax
	movl	%eax, -8052(%rbp)
.LVL235:
// insertATTRINFO( ahValue );
	.loc 2 567 0
	movl	%ebx, %esi
.LVL236:
	movq	%r12, %rdi
	call	_ZN12NdbOperation14insertATTRINFOEj@PLT

Note that _no_ code is generated at all for the init() call ...
[23 Oct 2007 14:46] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/36149

ChangeSet@1.2487, 2007-10-23 16:45:56+02:00, knielsen@ymer.(none) +5 -0
  BUG#31761: Code for cluster is not safe for strict-alias optimization in new gcc
  
  Fix problem with AttributeHeader::init() seen with gcc 4.2.1.
  
  Using the same object as both Uint32 and class AttributeHeader violates
  strict aliasing rule.
[23 Oct 2007 21:36] Kristian Nielsen
See also BUG#31799 for another bug seen with gcc 4.2.1.
[25 Oct 2007 9:18] Kristian Nielsen
Pushed into mysql-5.0-ndb, mysql-5.1-new-ndb, mysql-5.1-telco-gca, mysql-5.1-telco-6.1, mysql-5.1-telco-6.2, and mysql-5.1-telco-6.3.
[5 Nov 2007 13:53] Bugs System
Pushed into 6.0.4-alpha
[5 Nov 2007 13:56] Bugs System
Pushed into 5.1.23-rc
[5 Nov 2007 13:58] Bugs System
Pushed into 5.0.52
[5 Nov 2007 22:26] Jon Stephens
Thank you for your bug report. This issue has been committed to our source repository of that product and will be incorporated into the next release.

If necessary, you can access the source repository and build the latest available version, including the bug fix. More information about accessing the source trees is available at

    http://dev.mysql.com/doc/en/installing-source.html

Documented in changelogs: 5.0.52, 5.1.15-ndb-6.1.23, 5.1.22-ndb-6.2.8, 5.1.22-ndb-6.3.6, 5.1.23, and 6.0.4. Closed.