| Bug #17355 | Combination of "Group By" query crashes server | ||
|---|---|---|---|
| Submitted: | 13 Feb 2006 23:37 | Modified: | 21 Mar 2006 13:31 |
| Reporter: | Mark Moseley | Email Updates: | |
| Status: | No Feedback | Impact on me: | |
| Category: | MySQL Server | Severity: | S1 (Critical) |
| Version: | 5.0.18-linux-x86_64-glibc23 | OS: | Linux (Red Hat Enterprise AS 4 Update 2) |
| Assigned to: | CPU Architecture: | Any | |
[14 Feb 2006 11:35]
Valeriy Kravchuk
Thank you for a detailed problem report. Please, send the SHOW CREATE TABLE MigrationStatus results. EXPLAIN results for your query (from 32-bit systems) and your my.cnf content will be also useful.
[14 Feb 2006 16:35]
Mark Moseley
I've added the three things you've asked for in attached files. The my.cnf file is from mysqlmp02 but it is identical to mysqlmp01 except for the server_id and the names in paths.
[14 Feb 2006 17:03]
Mark Moseley
I was able to do a little more testing today in 64-bit mode, with a dispensation to do so from my employer since the 32-bit version is somehow vastly slower than it was on a 32-bit box with 4 gigs and 2 processors (is that normal?). The important thing here is that I mysqldumped the MigrationStatus table and reimported it. While running the 64-bit mysqld, the query did *not* segfault on those queries. So apparently there's something in the (old) data itself that the 64-bit binary doesn't like but that the 32-bit binary doesn't mind. Both those TestGroup and Status_Migrated fields are very simple. Here's the full output of the latter query: mysql> select TestGroup,Status_Migrated, count(*) as count from MigrationStatus where origin=131 group by TestGroup,Status_Migrated order by TestGroup,Status_Migrated; +-----------+-----------------+-------+ | TestGroup | Status_Migrated | count | +-----------+-----------------+-------+ | LastGroup | 1 | 126 | | LastGroup | 2 | 27 | | LastGroup | 4 | 685 | | LastGroup | 5 | 2673 | | LastGroup | 6 | 4 | | LastGroup | 7 | 4 | | LastGroup | 11 | 2094 | | LastGroup | 12 | 43 | | LastGroup | 13 | 794 | | MWeek0 | 4 | 497 | | MWeek0 | 5 | 10 | | MWeek0 | 11 | 2 | | MWeek0 | 13 | 40 | | MWeek1 | 1 | 206 | | MWeek1 | 4 | 5207 | | MWeek1 | 5 | 44 | | MWeek1 | 11 | 4 | | MWeek1 | 12 | 74 | | MWeek1 | 13 | 482 | | MWeek2 | 1 | 1270 | | MWeek2 | 2 | 5 | | MWeek2 | 4 | 5786 | | MWeek2 | 5 | 474 | | MWeek2 | 11 | 91 | | MWeek2 | 12 | 353 | | MWeek2 | 13 | 30 | | MWeek3 | 1 | 8908 | | MWeek3 | 2 | 4 | | MWeek3 | 4 | 1870 | | MWeek3 | 5 | 60 | | MWeek3 | 6 | 1 | | MWeek3 | 11 | 29 | | MWeek3 | 12 | 14 | | MWeek3 | 13 | 39 | | MWeek4 | 1 | 6323 | | MWeek4 | 4 | 38 | | MWeek4 | 5 | 51 | | MWeek4 | 6 | 1 | | MWeek4 | 11 | 131 | | MWeek4 | 12 | 6 | | MWeek4 | 13 | 30 | | MWeek5 | 1 | 6226 | | MWeek5 | 2 | 1 | | MWeek5 | 4 | 82 | | MWeek5 | 5 | 53 | | MWeek5 | 6 | 3 | | MWeek5 | 11 | 31 | | MWeek5 | 12 | 10 | | MWeek5 | 13 | 59 | | MWeek6 | 1 | 7808 | | MWeek6 | 2 | 3 | | MWeek6 | 4 | 26 | | MWeek6 | 5 | 43 | | MWeek6 | 11 | 73 | | MWeek6 | 12 | 11 | | MWeek6 | 13 | 37 | | MWeek7 | 1 | 6248 | | MWeek7 | 2 | 2 | | MWeek7 | 4 | 374 | | MWeek7 | 5 | 79 | | MWeek7 | 7 | 2 | | MWeek7 | 11 | 196 | | MWeek7 | 12 | 15 | | MWeek7 | 13 | 115 | | sqluser | 1 | 3 | | sqluser | 5 | 2328 | | sqluser | 6 | 76 | | sqluser | 11 | 45 | | sqluser | 13 | 12 | +-----------+-----------------+-------+ 69 rows in set (1.14 sec) I neglected to mention in my initial bug report that these tables were originally on a server running 4.0.22 (different server, of course) on a 32-bit box, and were copied over cleanly. Running myisamchk (the one that came with the 64-bit 5.0.18 package) using "-a -o" does not seem to have any effect, as I had run that on MigrationStatus and was still able to get this same error.
[21 Feb 2006 13:31]
Valeriy Kravchuk
The only right and recommended way to migrate from 4.x.y to 5.0.x is with mysqldump and restore, even in case of 32-bit only. Especially when your table has varchar columns. Can you, please, upload files for your original MyISAM table, from 4.0.22?
[22 Mar 2006 0:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".

Description: We have found a query that when run on mysqld from mysql-standard-5.0.18-linux-x86_64-glibc23 causes the server to segfault. When running the same query on the same server using mysql-standard-5.0.18-linux-i686-glibc23, the query runs fine and nothing amiss happens. This has been replicated on both of our 64-bit boxes The servers are each a 4-way box with 16 gigs of memory. The bug occurs with exact consistency. When the query is run, the following message is sent to console: *** glibc detected *** double free or corruption (out): 0x0000002b1a2b8f40 *** This is the error log in its entirety: <begin> mysqld got signal 11; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=822083584 read_buffer_size=2093056 max_used_connections=1501 max_connections=1500 threads_connected=448 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 6940804 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. 060213 13:46:09 [Note] /usr/local/mysql-standard-5.0.18-linux-x86_64-glibc23/bin/mysqld: ready for connections. Version: '5.0.18-standard-log' socket: '/mysql/mysql.sock' port: 3306 MySQL Community Edition - Standard (GPL) *** glibc detected *** malloc(): memory corruption: 0x0000002af91ecb80 *** mysqld got signal 6; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=822083584 read_buffer_size=2093056 max_used_connections=723 max_connections=1500 threads_connected=682 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 6940804 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. 060213 15:52:57 [Note] /usr/local/mysql-standard-5.0.18-linux-x86_64-glibc23/bin/mysqld: ready for connections. Version: '5.0.18-standard-log' socket: '/mysql/mysql.sock' port: 3306 MySQL Community Edition - Standard (GPL) *** glibc detected *** free(): invalid next size (fast): 0x0000000000dbae40 *** <end> I've gotten this error too in the error logs: Fatal signal 11 while backtracing The table is very large, size- and column-wise: Output of "DESC MigrationStatus" will be attached. mysql> show table status like "MigrationStatus"\G *************************** 1. row *************************** Name: MigrationStatus Engine: MyISAM Version: 7 Row_format: Dynamic Rows: 421234 Avg_row_length: 1941 Data_length: 817658572 Max_data_length: 4294967295 Index_length: 48335872 Data_free: 2168 Auto_increment: 99955424 Create_time: 2006-01-23 13:22:43 Update_time: 2006-02-13 18:20:22 Check_time: 2006-02-13 17:28:07 Collation: latin1_swedish_ci Checksum: NULL Create_options: Comment: 1 row in set (0.00 sec) The query is this: mysql> select TestGroup, count(*) from MigrationStatus where origin=131 group by TestGroup,Status_Migrated order by TestGroup; *** glibc detected *** double free or corruption (!prev): 0x0000000000bbb8f0 *** +-----------+----------+ | TestGroup | count(*) | +-----------+----------+ | LastGroup | 125 | | LastGroup | 27 | | LastGroup | 682 | | LastGroup | 2666 | | LastGroup | 4 | | LastGroup | 4 | | LastGroup | 2108 | | LastGroup | 45 | | LastGroup | 789 | | MWeek0 | 497 | | MWeek0 | 10 | | MWeek0 | 2 | | MWeek0 | 40 | | MWeek1 | 206 | | MWeek1 | 5204 | | MWeek1 | 44 | | MWeek1 | 4 | | MWeek1 | 74 | | MWeek1 | 485 | | MWeek2 | 1273 | | MWeek2 | 5 | | MWeek2 | 5785 | | MWeek2 | 472 | | MWeek2 | 92 | | MWeek2 | 353 | | MWeek2 | 29 | | MWeek3 | 10326 | | MWeek3 | 4 | | MWeek3 | 462 | | MWeek3 | 53 | | MWeek3 | 1 | | MWeek3 | 27 | | MWeek3 | 15 | | MWeek3 | 37 | | MWeek4 | 6343 | | MWeek4 | 32 | | MWeek4 | 41 | | MWeek4 | 1 | | MWeek4 | 130 | | MWeek4 | 6 | | MWeek4 | 27 | | MWeek5 | 6232 | | MWeek5 | 1 | | MWeek5 | 79 | | MWeek5 | 53 | | MWeek5 | 3 | | MWeek5 | 27 | | MWeek5 | 10 | | MWeek5 | 60 | | MWeek6 | 7810 | | MWeek6 | 3 | | MWeek6 | 24 | | MWeek6 | 42 | | MWeek6 | 74 | | MWeek6 | 12 | | MWeek6 | 36 | | MWeek7 | 6260 | | MWeek7 | 2 | | MWeek7 | 365 | | MWeek7 | 77 | | MWeek7 | 2 | | MWeek7 | 195 | | MWeek7 | 16 | | MWeek7 | 114 | | sqluser | 3 | | sqluser | 2330 | | sqluser | 76 | | sqluser | 43 | | sqluser | 12 | +-----------+----------+ 69 rows in set (1.38 sec) As you can see, it returns the results of the query. In this output, the server was started from the same session, so that glibc error has gone to stdout. From this point on, the server is unusable. Subsequent commands on the mysql console typically hang (though a couple have not, but the query still appears in "show processlist") and the server will not accept any further connections via socket or TCP. This query will also crash it every single time: select TestGroup,Status_Migrated, count(*) as count from MigrationStatus where origin=131 group by TestGroup,Status_Migrated order by TestGroup,Status_Migrated I've tried other combinations of fields within that table (combinations with other fields and TestGroup and other fields and Status_Migrated) and they seem to work fine. I've tried leaving out Status_Migrated from the entire query and it's fine. I've left off the "order by" portion (with TestGroup and Status_Migrated both present as selected fields and in the 'group by') and it still crashes. I've tried removing the 'as count' part of the select and it still crashes. I tried running the 32-bit version and the above queries all work just fine. It's only the 64-bit version that seems to have any issue. Please understand that we only have 2 64-bit servers and both are in production use. I am not going to be able to run any further tests as they've both been switched to 32-bit versions of mysql (they are our core database servers). Hopefully this won't make it impossible to debug. The data in those tables is highly sensitive, so I will not be able to furnish you with a sample of the data. I'm hoping that the facts presented here will be enough. Please note too I've never submitted a MySQL bug report, so please forgive any shortcomings :) Some other things that might be of use (both boxes are running identical OS's and have same rpm updates applied): # cat /etc/redhat-release Red Hat Enterprise Linux AS release 4 (Nahant Update 2) # uname -a Linux mysqlmp02 2.6.9-22.0.2.ELsmp #1 SMP Thu Jan 5 17:11:56 EST 2006 x86_64 x86_64 x86_64 GNU/Linux # rpm -q glibc glibc-2.3.4-2.13 How to repeat: Run: select TestGroup,Status_Migrated, count(*) as count from MigrationStatus where origin=131 group by TestGroup,Status_Migrated order by TestGroup,Status_Migrated