Bug #27634 group_by test fails
Submitted: 4 Apr 2007 5:36 Modified: 15 Jun 2007 13:45
Reporter: Lenz Grimmer
Status: Closed
Category: MySQL Server: Tests Severity: S3 (Non-critical)
Version: 5.1-bk OS: Linux
Assigned to: Martin Hansson CPU Architecture:Any

[4 Apr 2007 5:36] Lenz Grimmer
Description:
The "group_by" test performed by the daily snapshot builds fails with the following output:

up_by
Logging: ./mysql-test-run.pl group_by
MySQL Version 5.1.18
Using binlog format 'mixed'
Using ndbcluster when necessary, mysqld supports it
Setting mysqld to support SSL connections
Using MTR_BUILD_THREAD      = 0
Using MASTER_MYPORT         = 9306
Using MASTER_MYPORT1        = 9307
Using SLAVE_MYPORT          = 9308
Using SLAVE_MYPORT1         = 9309
Using SLAVE_MYPORT2         = 9310
Using NDBCLUSTER_PORT       = 9310
Using NDBCLUSTER_PORT_SLAVE = 9311
Using IM_PORT               = 9312
Using IM_MYSQLD1_PORT       = 9313
Using IM_MYSQLD2_PORT       = 9314
Killing Possible Leftover Processes
mysql-test-run: WARNING: Found non pid file master-slow.log in /data0/autobuild/my/mysql-5.1.18-beta-20070403-build/build/mysql-test/var/run
Removing Stale Files
Creating Directories
Installing Master Database
Installing Master Database
=======================================================
Starting Tests in the 'main' suite

TEST                           RESULT         TIME (ms)
-------------------------------------------------------

group_by                       [ fail ]

Errors are (from /data0/autobuild/my/mysql-5.1.18-beta-20070403-build/build/mysql-test/var/log/mysqltest-time) :
mysqltest: Result length mismatch
(the last lines may be the most important ones)
Below are the diffs between actual and expected results:
-------------------------------------------------------
*** r/group_by.result   2007-04-03 15:17:48.000000000 +0300
--- r/group_by.reject   2007-04-04 08:31:30.000000000 +0300
***************
*** 1049,1061 ****
  test.t1       analyze status  OK
  EXPLAIN SELECT a FROM t1 WHERE a < 2;
  id    select_type     table   type    possible_keys   key     key_len ref     rows    Extra
! 1     SIMPLE  t1      range   PRIMARY,i2      PRIMARY 4       NULL    2       Using where; Using index
  EXPLAIN SELECT a FROM t1 WHERE a < 2 ORDER BY a;
  id    select_type     table   type    possible_keys   key     key_len ref     rows    Extra
! 1     SIMPLE  t1      range   PRIMARY,i2      PRIMARY 4       NULL    2       Using where; Using index
  EXPLAIN SELECT a FROM t1 WHERE a < 2 GROUP BY a;
  id    select_type     table   type    possible_keys   key     key_len ref     rows    Extra
! 1     SIMPLE  t1      range   PRIMARY,i2      PRIMARY 4       NULL    2       Using where; Using index
  EXPLAIN SELECT a FROM t1 IGNORE INDEX (PRIMARY,i2);
  id    select_type     table   type    possible_keys   key     key_len ref     rows    Extra
  1     SIMPLE  t1      ALL     NULL    NULL    NULL    NULL    256
--- 1049,1061 ----
  test.t1       analyze status  OK
  EXPLAIN SELECT a FROM t1 WHERE a < 2;
  id    select_type     table   type    possible_keys   key     key_len ref     rows    Extra
! 1     SIMPLE  t1      range   PRIMARY,i2      i2      4       NULL    2       Using where; Using index
  EXPLAIN SELECT a FROM t1 WHERE a < 2 ORDER BY a;
  id    select_type     table   type    possible_keys   key     key_len ref     rows    Extra
! 1     SIMPLE  t1      range   PRIMARY,i2      i2      4       NULL    2       Using where; Using index
  EXPLAIN SELECT a FROM t1 WHERE a < 2 GROUP BY a;
  id    select_type     table   type    possible_keys   key     key_len ref     rows    Extra
! 1     SIMPLE  t1      range   PRIMARY,i2      i2      4       NULL    2       Using where; Using index
  EXPLAIN SELECT a FROM t1 IGNORE INDEX (PRIMARY,i2);
  id    select_type     table   type    possible_keys   key     key_len ref     rows    Extra
  1     SIMPLE  t1      ALL     NULL    NULL    NULL    NULL    256
-------------------------------------------------------
Please follow the instructions outlined at
http://www.mysql.com/doc/en/Reporting_mysqltest_bugs.html
to find the reason to this problem and how to report this.

Result from queries before failure can be found in /data0/autobuild/my/mysql-5.1.18-beta-20070403-build/build/mysql-test/var/log/group_by.log

Aborting: group_by failed in default mode. To continue, re-run with '--force'.
Stopping All Servers

How to repeat:
Take a current mysql-5.1 BK tree, compile it with "BUILD/compile-dist", run the "group_by" test and observe the diff.
[4 Apr 2007 6:05] Valeriy Kravchuk
Thank you for the bug report. Verified just as described.
[22 May 2007 14:05] Martin Hansson
Another experiment:
(as mysqldev)
>bk clone /data0/autobuild/my/mysql-5.1 mysql-5.1-b
>cd mysql-5.1-b/
>MTR_BUILD_THREAD=86
>CC="ccache gcc" CXX="ccache gcc" BUILD/compile-dist
>cd mysql-test;./mtr group_by

And it fails!

I pulled a new clone, mysql-5.1-a, and did everything that I did for mysql-5.1-b, except setting CC and CXX.
And the group_by test ... passes.

Conclusion: The error is inside the compiler cache.
[22 May 2007 15:12] Martin Hansson
I cleared the compiler cache and rebuilt mysql-5.1-b:
>ccache -C
>make clean
>CC="ccache gcc" CXX="ccache gcc" BUILD/compile-dist
>cd mysql-test/ 
>./mtr group_by 

fails.

So maybe it's not the compiler cache either. Obviously, setting these variables causes compile-dist to build the binaries wrong somehow.

I would suggest removing the CC and CXX options from the autobuild scripts until this is fixed.
[22 May 2007 15:33] Lenz Grimmer
Could it possibly be reproduced by setting CXX=gcc only?
[23 May 2007 14:19] Martin Hansson
Hi Lenz, I'm not 100% sure what you mean, but here's what I did:

>bk clone /data0/autobuild/my/mysql-5.1 mysql-5.1-c
>cd mysql-5.1-c
>MTR_BUILD_THREAD=86
>CC="ccache gcc" CXX=gcc BUILD/compile-dist
>cd mysql-test/; ./mtr group_by

failure
[23 May 2007 14:43] Martin Hansson
It's reasonable to believe that the optimizer binaries are the ones afflicted by this build glitch, so I tried rebuilding the sql/ directory, to no avail:

>cd mysql-5.1-a/sql
>touch *.h
>CC="ccache gcc" CXX="ccache gcc" make

group_by still passes

After some detective work I noticed that the Makefiles were different, so I did

mysqldev@production:~/mhansson/mysql-5.1-a/sql> mv Makefile Makefile.old
mysqldev@production:~/mhansson/mysql-5.1-a/sql> cp ../../mysql-5.1-b/sql/Makefile .
mysqldev@production:~/mhansson/mysql-5.1-a/sql> touch *.h
mysqldev@production:~/mhansson/mysql-5.1-a/sql> make 

it fails.

 diff . ../../mysql-5.1-b/sql/
Common subdirectories: ./.deps and ../../mysql-5.1-b/sql/.deps
Common subdirectories: ./examples and ../../mysql-5.1-b/sql/examples
Binary files ./gen_lex_hash and ../../mysql-5.1-b/sql/gen_lex_hash differ
Binary files ./gen_lex_hash.o and ../../mysql-5.1-b/sql/gen_lex_hash.o differ
Common subdirectories: ./.libs and ../../mysql-5.1-b/sql/.libs
Only in .: Makefile.old
Binary files ./mysqld and ../../mysql-5.1-b/sql/mysqld differ
Binary files ./net_serv.o and ../../mysql-5.1-b/sql/net_serv.o differ
Common subdirectories: ./SCCS and ../../mysql-5.1-b/sql/SCCS
Common subdirectories: ./share and ../../mysql-5.1-b/sql/share

A diff between the working (<) and non-working (>) Makefiles, with path name differences edited out, follows:

228c228
< CONF_COMMAND = ./configure '--with-embedded-server' '--with-archive-storage-engine' '--with-blackhole-storage-engine' '--with-csv-storage-engine' '--with-example-storage-engine' '--with-federated-storage-engine' '--with-innodb' '--with-ssl' '--enable-thread-safe-client' '--with-extra-charsets=complex' '--with-ndbcluster' '--with-zlib-dir=bundled' 'CC=ccache gcc' 'CXXFLAGS=-felide-constructors -fno-exceptions -fno-rtti' 'CXX=ccache gcc'
---
> CONF_COMMAND = ./configure '--with-embedded-server' '--with-archive-storage-engine' '--with-blackhole-storage-engine' '--with-csv-storage-engine' '--with-example-storage-engine' '--with-federated-storage-engine' '--with-innodb' '--with-ssl' '--enable-thread-safe-client' '--with-extra-charsets=complex' '--with-ndbcluster' '--with-zlib-dir=bundled' 'CC=ccache gcc' 'CXX=ccache gcc'
235c235
< CXXFLAGS =  -felide-constructors -fno-exceptions -fno-rtti   -fno-implicit-templates -fno-exceptions -fno-rtti
---
> CXXFLAGS = -O3    -fno-implicit-templates -fno-exceptions -fno-rtti
342c342
< SAVE_CXXFLAGS = -felide-constructors -fno-exceptions -fno-rtti
---
> SAVE_CXXFLAGS = 

We can immediately draw the following conclusions:
1) There is no point in calling compile-dist with CC="ccache gcc" CXX="ccache gcc"; these will be set anyway.
2) The cause of the binaries that fail the unit tests is to be found in some of the side effects of this.
[23 May 2007 15:35] Martin Hansson
We have now reached a verdict: The difference has been boiled down to the following line in sql/Makefile (once the system is built):

working
CXXFLAGS=-fno-implicit-templates -fno-exceptions -fno-rtti -felide-constructors

failing
CXXFLAGS =  -O3    -fno-implicit-templates -fno-exceptions -fno-rtti -felide-constructors

So it's the -O3 flag that causes it. As do -O2, -O1 and -O.

However, the following passes:

CXXFLAGS = -fdefer-pop -fmerge-constants -fthread-jumps -floop-optimize -fcrossjumping -fif-conversion -fif-conversion2 -fdelayed-branch -fguess-branch-probability -fcprop-registers     -fno-implicit-templates -fno-exceptions -fno-rtti -felide-constructors

Which is kinda odd, considering that -O should only be a shorthand for
          -fdefer-pop 
          -fmerge-constants 
          -fthread-jumps 
          -floop-optimize 
          -fcrossjumping 
          -fif-conversion 
          -fif-conversion2 
          -fdelayed-branch 
          -fguess-branch-probability 
          -fcprop-registers

A gcc bug perhaps?
[23 May 2007 15:45] Martin Hansson
A correction to the last post. The -O flag also sets -fomit-frame-pointer. The following also passes, however:

CXXFLAGS = -fdefer-pop -fmerge-constants -fthread-jumps -floop-optimize -fcrossjumping -fif-conversion -fif-conversion2 -fdelayed-branch -fguess-branch-probability -fcprop-registers -fomit-frame-pointer    -fno-implicit-templates -fno-exceptions -fno-rtti -felide-constructors
[24 May 2007 13:35] Martin Hansson
The problem has now been narrowed down to this: It occurs when opt_range is compiled with the -O flag (or higher) to gcc. Not surprising, since the GROUP BY code is in there.
[4 Jun 2007 16:05] Martin Hansson
This is a gcc bug. The function get_key_scans_params() has a loop with a nested if statement where the body gets executed regardless of the guards.

The following code does not get compiled correctly:

for (idx= 0, key= tree->keys, end= key + param->keys;
     key != end;
     key++, idx++)
{
  ...
  if (*key)
  {
    ...
    if (read_time > found_read_time && found_records != HA_POS_ERROR)
    {
      read_time=    found_read_time;
      best_records= found_records;
      key_to_read=  key;
    }
  }
}

'key_to_read' will always be assigned to 'key', regardless of the value of
the expression 'found_records != HA_POS_ERROR'.

As concluded earlier, the workaround is not to set CC="ccache gcc" CXX=gcc when
building.
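
For reference, here is a minimal, self-contained sketch of what the loop above is meant to do: keep the cheapest usable key, and on an exact tie keep the key that was seen first, because the comparison is a strict '>'. The names KeyEstimate, usable and pick_best_key are hypothetical stand-ins invented for this illustration, not identifiers from the MySQL source; the 'usable' flag merely plays the role of 'found_records != HA_POS_ERROR'.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one candidate key and its estimated scan cost.
struct KeyEstimate {
  double found_read_time;  // estimated cost of a range scan on this key
  bool usable;             // plays the role of found_records != HA_POS_ERROR
};

// Returns the position of the cheapest usable key, or -1 if no key beats the
// initial read_time. A tie keeps the earlier key: the test is a strict '>'.
int pick_best_key(const std::vector<KeyEstimate>& keys, double read_time) {
  assert(read_time >= 0.0);  // assumption: costs are non-negative
  int key_to_read = -1;
  for (std::size_t idx = 0; idx < keys.size(); ++idx) {
    const KeyEstimate& key = keys[idx];
    if (read_time > key.found_read_time && key.usable) {
      read_time = key.found_read_time;
      key_to_read = static_cast<int>(idx);
    }
  }
  return key_to_read;
}
```

With two usable keys of exactly equal cost, say PRIMARY first and i2 second, this sketch returns the first one. The failing builds pick the second, which is only possible if the comparison sees the two costs as different.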
[4 Jun 2007 16:06] Martin Hansson
This is a gcc bug. See comments.
[5 Jun 2007 18:04] Sergei Golubchik
This is not a gcc bug. This is the effect of the gcc option -ffloat-store, or, more precisely, of the optimization that it inhibits.
[6 Jun 2007 12:22] Martin Hansson
Quite correct. Passing -ffloat-store to gcc avoids this problem by not keeping the cost of reading the (wrong) key i2 in register st0. However, I don't understand why this works. The offending expression is read_time > found_read_time. The value of read_time is always in memory and is 3.4100000000000001, equal to the supposed value of found_read_time. When st0 is used for found_read_time, however, its value is

3.4100000000000001421085471520200372

In other words, *even greater* than it's supposed to be, and still the expression is true.

Anyway, -ffloat-store is the solution to this problem.
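
The effect can be imitated portably with a rough analogy: below, float stands in for the narrower in-memory format and double for the wider register format. This is only a sketch under that assumption, not the x87 behaviour itself, and round_through_memory and second_key_wins are hypothetical helper names. It shows how two costs that are nominally equal stop comparing equal once only one of them has been rounded by a store to memory.

```cpp
#include <cassert>

// Rounds a cost to float via a volatile store, standing in for a value that
// has been written from a wide register to a narrower memory slot.
double round_through_memory(double cost) {
  assert(cost == cost);  // assumption: NaN costs are not meaningful here
  volatile float stored = static_cast<float>(cost);
  return static_cast<double>(stored);
}

// Same shape as the comparison in the loop: is the remembered best cost
// strictly greater than the newly computed one?
bool second_key_wins(double read_time, double found_read_time) {
  return read_time > found_read_time;
}
```

When both costs are 3.41 at the same precision, second_key_wins(3.41, 3.41) is false and the first key is kept. When only the remembered cost has gone through memory, second_key_wins(round_through_memory(3.41), 3.41) is true, because 3.41 happens to round upward in the narrower format, so the second key replaces the first even though the two estimates are nominally identical.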
[6 Jun 2007 13:20] Sergei Golubchik
No, the solution is not to use non-deterministic tests. Add a few more rows to the table to ensure that the EXPLAIN result is stable.
[12 Jun 2007 13:09] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/28576

ChangeSet@1.2561, 2007-06-12 15:10:33+03:00, mhansson@linux-st28.site +2 -0
  Bug#27634: group_by test fails
  
  On many architectures, e.g. 68000, x86, the double registers have higher precision 
  than the IEEE standard prescribes. When compiled with flags -O and higher, some double's 
  go into registers and therefore have higher precision. In one test case the cost 
  information of the best and second-best key were close enough to be influenced by this 
  effect, causing a failed test in distribution builds.
  
  Fixed by removing some rows from the table in question so that cost information is not
  influenced by decimals beyond standard definition of double.
[14 Jun 2007 19:00] Bugs System
Pushed into 5.1.20-beta
[15 Jun 2007 13:45] Paul DuBois
No changelog entry needed.