Bug #27634 | group_by test fails | | |
---|---|---|---|
Submitted: | 4 Apr 2007 5:36 | Modified: | 15 Jun 2007 13:45 |
Reporter: | Lenz Grimmer | | |
Status: | Closed | | |
Category: | MySQL Server: Tests | Severity: | S3 (Non-critical) |
Version: | 5.1-bk | OS: | Linux |
Assigned to: | Martin Hansson | CPU Architecture: | Any |
[4 Apr 2007 5:36]
Lenz Grimmer
[4 Apr 2007 6:05]
Valeriy Kravchuk
Thank you for the bug report. Verified just as described.
[22 May 2007 14:05]
Martin Hansson
Another experiment (as mysqldev):

    >bk clone /data0/autobuild/my/mysql-5.1 mysql-5.1-b
    >cd mysql-5.1-b/
    >MTR_BUILD_THREAD=86
    >CC="ccache gcc" CXX="ccache gcc" BUILD/compile-dist
    >cd mysql-test;./mtr group_by

And it fails! I pulled a new clone, mysql-5.1-a, and did everything that I did for mysql-5.1-b, except setting CC and CXX. And the group_by test ... passes. Conclusion: the error is inside the compiler cache.
[22 May 2007 15:12]
Martin Hansson
I cleared the compiler cache and rebuilt mysql-5.1-b:

    >ccache -C
    >make clean
    >CC="ccache gcc" CXX="ccache gcc" BUILD/compile-dist
    >cd mysql-test/
    >./mtr group_by

It still fails. So maybe it's not the compiler cache either. Obviously, setting these variables causes compile-dist to build the binaries incorrectly somehow. I would suggest removing the CC and CXX options from the autobuild scripts until this is fixed.
[22 May 2007 15:33]
Lenz Grimmer
Could it possibly be reproduced by setting CXX=gcc only?
[23 May 2007 14:19]
Martin Hansson
Hi Lenz, I'm not 100% sure what you mean, but here's what I did:

    >bk clone /data0/autobuild/my/mysql-5.1 mysql-5.1-c
    >cd mysql-5.1-c
    >MTR_BUILD_THREAD=86
    >CC="ccache gcc" CXX=gcc BUILD/compile-dist
    >cd mysql-test/; ./mtr group_by

Result: failure.
[23 May 2007 14:43]
Martin Hansson
It's reasonable to believe that the optimizer binaries are the ones afflicted by this build glitch, so I tried rebuilding the sql/ directory, to no avail:

    >cd mysql-5.1-a/sql
    >touch *.h
    >CC="ccache gcc" CXX="ccache gcc" make

group_by still passes. After some detective work I noticed that the Makefiles were different, so I did:

    mysqldev@production:~/mhansson/mysql-5.1-a/sql> mv Makefile Makefile.old
    mysqldev@production:~/mhansson/mysql-5.1-a/sql> cp ../../mysql-5.1-b/sql/Makefile .
    mysqldev@production:~/mhansson/mysql-5.1-a/sql> touch *.h
    mysqldev@production:~/mhansson/mysql-5.1-a/sql> make

Now it fails.

    diff . ../../mysql-5.1-b/sql/
    Common subdirectories: ./.deps and ../../mysql-5.1-b/sql/.deps
    Common subdirectories: ./examples and ../../mysql-5.1-b/sql/examples
    Binary files ./gen_lex_hash and ../../mysql-5.1-b/sql/gen_lex_hash differ
    Binary files ./gen_lex_hash.o and ../../mysql-5.1-b/sql/gen_lex_hash.o differ
    Common subdirectories: ./.libs and ../../mysql-5.1-b/sql/.libs
    Only in .: Makefile.old
    Binary files ./mysqld and ../../mysql-5.1-b/sql/mysqld differ
    Binary files ./net_serv.o and ../../mysql-5.1-b/sql/net_serv.o differ
    Common subdirectories: ./SCCS and ../../mysql-5.1-b/sql/SCCS
    Common subdirectories: ./share and ../../mysql-5.1-b/sql/share

A diff between the working (<) and non-working (>) Makefiles, with path name differences edited out, follows:

    228c228
    < CONF_COMMAND = ./configure '--with-embedded-server' '--with-archive-storage-engine' '--with-blackhole-storage-engine' '--with-csv-storage-engine' '--with-example-storage-engine' '--with-federated-storage-engine' '--with-innodb' '--with-ssl' '--enable-thread-safe-client' '--with-extra-charsets=complex' '--with-ndbcluster' '--with-zlib-dir=bundled' 'CC=ccache gcc' 'CXXFLAGS=-felide-constructors -fno-exceptions -fno-rtti' 'CXX=ccache gcc'
    ---
    > CONF_COMMAND = ./configure '--with-embedded-server' '--with-archive-storage-engine' '--with-blackhole-storage-engine' '--with-csv-storage-engine' '--with-example-storage-engine' '--with-federated-storage-engine' '--with-innodb' '--with-ssl' '--enable-thread-safe-client' '--with-extra-charsets=complex' '--with-ndbcluster' '--with-zlib-dir=bundled' 'CC=ccache gcc' 'CXX=ccache gcc'
    235c235
    < CXXFLAGS = -felide-constructors -fno-exceptions -fno-rtti -fno-implicit-templates -fno-exceptions -fno-rtti
    ---
    > CXXFLAGS = -O3 -fno-implicit-templates -fno-exceptions -fno-rtti
    342c342
    < SAVE_CXXFLAGS = -felide-constructors -fno-exceptions -fno-rtti
    ---
    > SAVE_CXXFLAGS =

We can immediately draw the following conclusions:

1) There is no point in calling compile-dist with CC="ccache gcc" CXX="ccache gcc"; these will be set anyway.
2) The cause of binaries that fail the unit tests is to be found in some of the side effects of this.
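To make the 235c235 hunk easier to read, here is a minimal sketch that compares the two CXXFLAGS values as sets of flags. The flag strings are copied from the diff in this comment; the helper name `flag_difference` is made up for illustration and is not part of any MySQL build script.

```python
# CXXFLAGS values copied from the 235c235 hunk of the Makefile diff.
working = ("-felide-constructors -fno-exceptions -fno-rtti "
           "-fno-implicit-templates -fno-exceptions -fno-rtti")
failing = "-O3 -fno-implicit-templates -fno-exceptions -fno-rtti"


def flag_difference(a, b):
    """Return the flags that occur in `a` but not in `b`, in first-seen order.

    Hypothetical helper: it treats each whitespace-separated token as one flag.
    """
    flags_in_b = set(b.split())
    result = []
    for flag in a.split():
        if flag not in flags_in_b and flag not in result:
            result.append(flag)
    return result


only_in_working = flag_difference(working, failing)
only_in_failing = flag_difference(failing, working)

print("only in working build:", only_in_working)
print("only in failing build:", only_in_failing)
```

Under these assumptions the two builds differ in exactly two flags: the working one has -felide-constructors, the failing one has -O3. Duplicated flags (-fno-exceptions and -fno-rtti appear twice in the working line) are ignored by the comparison.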
[23 May 2007 15:35]
Martin Hansson
We have now reached a verdict: the difference has been boiled down to the following line in sql/Makefile (once the system is built).

Working:

    CXXFLAGS = -fno-implicit-templates -fno-exceptions -fno-rtti -felide-constructors

Failing:

    CXXFLAGS = -O3 -fno-implicit-templates -fno-exceptions -fno-rtti -felide-constructors

So it's the -O3 flag that causes it. As do -O2, -O1 and -O. However, the following passes:

    CXXFLAGS = -fdefer-pop -fmerge-constants -fthread-jumps -floop-optimize -fcrossjumping -fif-conversion -fif-conversion2 -fdelayed-branch -fguess-branch-probability -fcprop-registers -fno-implicit-templates -fno-exceptions -fno-rtti -felide-constructors

Which is kinda odd, considering that -O should only be a shorthand for

    -fdefer-pop -fmerge-constants -fthread-jumps -floop-optimize -fcrossjumping -fif-conversion -fif-conversion2 -fdelayed-branch -fguess-branch-probability -fcprop-registers

A gcc bug perhaps?
[23 May 2007 15:45]
Martin Hansson
A correction to the last post. The -O flag also sets -fomit-frame-pointer. The following also passes, however:

    CXXFLAGS = -fdefer-pop -fmerge-constants -fthread-jumps -floop-optimize -fcrossjumping -fif-conversion -fif-conversion2 -fdelayed-branch -fguess-branch-probability -fcprop-registers -fomit-frame-pointer -fno-implicit-templates -fno-exceptions -fno-rtti -felide-constructors
[24 May 2007 13:35]
Martin Hansson
The problem has now been narrowed down to this: It occurs when opt_range is compiled with the -O flag (or higher) to gcc. Not surprising, since the GROUP BY code is in there.
[4 Jun 2007 16:05]
Martin Hansson
This is a gcc bug. The function get_key_scans_params() has a loop with a nested if statement where the body gets executed regardless of the guards. The following code does not get compiled correctly:

    for (idx= 0,key=tree->keys, end=key+param->keys; key != end ; key++,idx++)
    {
      ...
      if (*key)
        ...
        if (read_time > found_read_time && found_records != HA_POS_ERROR)
        {
          read_time= found_read_time;
          best_records= found_records;
          key_to_read= key;
        }
      }
    }

'key_to_read' will always be assigned to 'key', regardless of the value of the expression 'found_records != HA_POS_ERROR'. As concluded earlier, the workaround is not to set CC="ccache gcc" CXX=gcc when building.
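For readers who want to see what the guarded update is meant to do, here is a simplified model of the loop in Python. It is only a sketch: the variable names follow the C++ fragment in this comment, but the candidate tuples, the key name "i1", the starting cost of 100.0 and the `HA_POS_ERROR` placeholder object are assumptions made for illustration and do not reflect MySQL's real data structures or constant.

```python
# Placeholder sentinel; MySQL's real HA_POS_ERROR is a numeric constant.
HA_POS_ERROR = object()


def pick_key(candidates, read_time):
    """Return (key_to_read, best_records) for the cheapest usable candidate.

    `candidates` is a list of (key, found_read_time, found_records) tuples.
    A candidate replaces the current best only if it is strictly cheaper
    and its record estimate is not the error sentinel.
    """
    key_to_read = None
    best_records = None
    for key, found_read_time, found_records in candidates:
        if read_time > found_read_time and found_records is not HA_POS_ERROR:
            read_time = found_read_time
            best_records = found_records
            key_to_read = key
    return key_to_read, best_records


# Two keys with exactly equal cost: the strict '>' keeps the first one.
print(pick_key([("i1", 3.41, 10), ("i2", 3.41, 10)], 100.0))

# A cheaper key whose estimate is the error sentinel must be skipped.
print(pick_key([("i1", 3.41, 10), ("i2", 1.0, HA_POS_ERROR)], 100.0))
```

Because the comparison is strict, an exact tie leaves the earlier key in place, so any tiny difference in how the two costs are computed or stored can change which key wins.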
[4 Jun 2007 16:06]
Martin Hansson
This is a gcc bug. See comments.
[5 Jun 2007 18:04]
Sergei Golubchik
This is not a gcc bug. This is the -ffloat-store gcc option or, more precisely, the optimization that it inhibits.
[6 Jun 2007 12:22]
Martin Hansson
Quite correct. Passing -ffloat-store to gcc avoids this problem by not storing the cost for reading of (wrong) key i2 in register st0. However, I don't understand why this works. The offending expression is read_time > found_read_time. The value of read_time is always in memory and its value is 3.4100000000000001, equal to the supposed value of found_read_time. When st0 is used for found_read_time, however, its value is 3.4100000000000001421085471520200372. In other words, *even greater* than it's supposed to be, and still the expression is true. Anyway, -ffloat-store is the solution to this problem.
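One thing that can be checked independently of gcc is how the two quoted numbers relate to each other. The sketch below prints the 64-bit double nearest to 3.41 with 17 and with 35 significant digits. It says nothing about what the 80-bit st0 register actually held, whose extra bits may not be visible at the number of digits a debugger prints; it only shows that, read as ordinary doubles, the two decimal strings in this comment denote the same value.

```python
# The double closest to 3.41, printed at two different precisions.
read_time = 3.41

short_form = "%.17g" % read_time   # 17 significant digits
long_form = "%.35g" % read_time    # 35 significant digits

print(short_form)
print(long_form)

# Parsing both strings back yields the same 64-bit double.
print(float(short_form) == float(long_form))
```

The short form comes out as 3.4100000000000001 and the long form as 3.4100000000000001421085471520200372, so the longer number is not evidence of a larger value by itself.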
[6 Jun 2007 13:20]
Sergei Golubchik
No, the solution is not to use non-deterministic tests. Add a few more rows to the table to ensure that the EXPLAIN result is stable.
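As a rough illustration of what a stable result means here: the best and second-best costs should differ by clearly more than rounding noise. The helper below is hypothetical, as is the tolerance of 1e-9; it is not part of the MySQL test suite, just a sketch of the kind of sanity check a test author could apply to the costs behind an EXPLAIN plan.

```python
import math


def choice_is_stable(best_cost, second_cost, rel_tol=1e-9):
    """Return True when the two costs are far enough apart that
    last-digit floating-point differences cannot swap their order.

    `rel_tol` is an assumed margin, chosen only for this sketch.
    """
    return not math.isclose(best_cost, second_cost, rel_tol=rel_tol)


print(choice_is_stable(3.41, 3.41))                # exact tie
print(choice_is_stable(3.41, 3.4100000000000006))  # differs in the last bits
print(choice_is_stable(3.41, 4.2))                 # clearly separated
```

Only the last pair would give a plan choice that does not depend on compiler flags or register precision.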
[12 Jun 2007 13:09]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:

http://lists.mysql.com/commits/28576

ChangeSet@1.2561, 2007-06-12 15:10:33+03:00, mhansson@linux-st28.site +2 -0

Bug#27634: group_by test fails

On many architectures, e.g. 68000, x86, the double registers have higher precision than the IEEE standard prescribes. When compiled with flags -O and higher, some doubles go into registers and therefore have higher precision. In one test case the cost information of the best and second-best key were close enough to be influenced by this effect, causing a failed test in distribution builds. Fixed by removing some rows from the table in question so that cost information is not influenced by decimals beyond standard definition of double.
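The effect described in the commit message can be imitated on any machine by analogy. In the sketch below a 64-bit float stands in for the wider register value and a 32-bit float stands in for the value after it has been stored to memory. This is an assumption made for illustration only: the real case involves 80-bit x87 registers versus 64-bit doubles, and the two cost values here are invented.

```python
import struct


def to_narrow(x):
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Invented costs that differ only far beyond 32-bit float precision.
wide_a = 3.41 + 1e-9   # plays the role of a value kept in a wide register
wide_b = 3.41

# Compared at the wider precision, the two costs are distinguishable.
print(wide_a > wide_b)

# After both are rounded to the narrower format, they collapse to a tie.
print(to_narrow(wide_a) > to_narrow(wide_b))
```

The first comparison is true and the second is false, so a strict "is this key cheaper?" test gives different answers depending on whether the operands were rounded first, which is the kind of instability the test table change is meant to avoid.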
[14 Jun 2007 19:00]
Bugs System
Pushed into 5.1.20-beta
[15 Jun 2007 13:45]
Paul DuBois
No changelog entry needed.