Bug #17355 Combination of "Group By" query crashes server
Submitted: 13 Feb 2006 23:37 Modified: 21 Mar 2006 13:31
Reporter: Mark Moseley Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:5.0.18-linux-x86_64-glibc23 OS:Linux (Red Hat Enterprise AS 4 Update 2)
Assigned to: CPU Architecture:Any

[13 Feb 2006 23:37] Mark Moseley
Description:
We have found a query that when run on mysqld from mysql-standard-5.0.18-linux-x86_64-glibc23 causes the server to segfault. When running the same query on the same server using mysql-standard-5.0.18-linux-i686-glibc23, the query runs fine and nothing amiss happens. This has been replicated on both of our 64-bit boxes

The servers are each a 4-way box with 16 gigs of memory. The bug occurs with exact consistency.

When the query is run, the following message is sent to console:
*** glibc detected *** double free or corruption (out): 0x0000002b1a2b8f40 ***

This is the error log in its entirety:

<begin>
mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=822083584
read_buffer_size=2093056
max_used_connections=1501
max_connections=1500
threads_connected=448
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 6940804 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

060213 13:46:09 [Note] /usr/local/mysql-standard-5.0.18-linux-x86_64-glibc23/bin/mysqld: ready for connections.
Version: '5.0.18-standard-log'  socket: '/mysql/mysql.sock'  port: 3306  MySQL Community Edition - Standard (GPL)
*** glibc detected *** malloc(): memory corruption: 0x0000002af91ecb80 ***
mysqld got signal 6;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=822083584
read_buffer_size=2093056
max_used_connections=723
max_connections=1500
threads_connected=682
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 6940804 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

060213 15:52:57 [Note] /usr/local/mysql-standard-5.0.18-linux-x86_64-glibc23/bin/mysqld: ready for connections.
Version: '5.0.18-standard-log'  socket: '/mysql/mysql.sock'  port: 3306  MySQL Community Edition - Standard (GPL)
*** glibc detected *** free(): invalid next size (fast): 0x0000000000dbae40 ***
<end>

I've gotten this error too in the error logs:
Fatal signal 11 while backtracing

The table is very large, size- and column-wise:

Output of "DESC MigrationStatus" will be attached.

mysql> show table status like "MigrationStatus"\G
*************************** 1. row ***************************
           Name: MigrationStatus
         Engine: MyISAM
        Version: 7
     Row_format: Dynamic
           Rows: 421234
 Avg_row_length: 1941
    Data_length: 817658572
Max_data_length: 4294967295
   Index_length: 48335872
      Data_free: 2168
 Auto_increment: 99955424
    Create_time: 2006-01-23 13:22:43
    Update_time: 2006-02-13 18:20:22
     Check_time: 2006-02-13 17:28:07
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)

The query is this:
mysql> select TestGroup, count(*)  from MigrationStatus where origin=131 group by TestGroup,Status_Migrated order by TestGroup;
*** glibc detected *** double free or corruption (!prev): 0x0000000000bbb8f0 ***
+-----------+----------+
| TestGroup | count(*) |
+-----------+----------+
| LastGroup |      125 |
| LastGroup |       27 |
| LastGroup |      682 |
| LastGroup |     2666 |
| LastGroup |        4 |
| LastGroup |        4 |
| LastGroup |     2108 |
| LastGroup |       45 |
| LastGroup |      789 |
| MWeek0    |      497 |
| MWeek0    |       10 |
| MWeek0    |        2 |
| MWeek0    |       40 |
| MWeek1    |      206 |
| MWeek1    |     5204 |
| MWeek1    |       44 |
| MWeek1    |        4 |
| MWeek1    |       74 |
| MWeek1    |      485 |
| MWeek2    |     1273 |
| MWeek2    |        5 |
| MWeek2    |     5785 |
| MWeek2    |      472 |
| MWeek2    |       92 |
| MWeek2    |      353 |
| MWeek2    |       29 |
| MWeek3    |    10326 |
| MWeek3    |        4 |
| MWeek3    |      462 |
| MWeek3    |       53 |
| MWeek3    |        1 |
| MWeek3    |       27 |
| MWeek3    |       15 |
| MWeek3    |       37 |
| MWeek4    |     6343 |
| MWeek4    |       32 |
| MWeek4    |       41 |
| MWeek4    |        1 |
| MWeek4    |      130 |
| MWeek4    |        6 |
| MWeek4    |       27 |
| MWeek5    |     6232 |
| MWeek5    |        1 |
| MWeek5    |       79 |
| MWeek5    |       53 |
| MWeek5    |        3 |
| MWeek5    |       27 |
| MWeek5    |       10 |
| MWeek5    |       60 |
| MWeek6    |     7810 |
| MWeek6    |        3 |
| MWeek6    |       24 |
| MWeek6    |       42 |
| MWeek6    |       74 |
| MWeek6    |       12 |
| MWeek6    |       36 |
| MWeek7    |     6260 |
| MWeek7    |        2 |
| MWeek7    |      365 |
| MWeek7    |       77 |
| MWeek7    |        2 |
| MWeek7    |      195 |
| MWeek7    |       16 |
| MWeek7    |      114 |
| sqluser   |        3 |
| sqluser   |     2330 |
| sqluser   |       76 |
| sqluser   |       43 |
| sqluser   |       12 |
+-----------+----------+
69 rows in set (1.38 sec)

As you can see, it returns the results of the query. In this output, the server was started from the same session, so that glibc error has gone to stdout. From this point on, the server is unusable. Subsequent commands on the mysql console typically hang (though a couple have not, but the query still appears in "show processlist") and the server will not accept any further connections via socket or TCP.

This query will also crash it every single time:
select TestGroup,Status_Migrated, count(*) as count from MigrationStatus where origin=131 group by TestGroup,Status_Migrated order by TestGroup,Status_Migrated

I've tried other combinations of fields within that table (combinations with other fields and TestGroup and other fields and Status_Migrated) and they seem to work fine. I've tried leaving out Status_Migrated from the entire query and it's fine. I've left off the "order by" portion (with TestGroup and Status_Migrated both present as selected fields and in the 'group by') and it still crashes. I've tried removing the 'as count' part of the select and it still crashes. 

I tried running the 32-bit version and the above queries all work just fine. It's only the 64-bit version that seems to have any issue.

Please understand that we only have 2 64-bit servers and both are in production use. I am not going to be able to run any further tests as they've both been switched to 32-bit versions of mysql (they are our core database servers). Hopefully this won't make it impossible to debug. The data in those tables is highly sensitive, so I will not be able to furnish you with a sample of the data. I'm hoping that the facts presented here will be enough. Please note too I've never submitted a MySQL bug report, so please forgive any shortcomings :)

Some other things that might be of use (both boxes are running identical OS's and have same rpm updates applied):

# cat /etc/redhat-release
Red Hat Enterprise Linux AS release 4 (Nahant Update 2)

# uname -a
Linux mysqlmp02 2.6.9-22.0.2.ELsmp #1 SMP Thu Jan 5 17:11:56 EST 2006 x86_64 x86_64 x86_64 GNU/Linux

# rpm -q glibc
glibc-2.3.4-2.13

How to repeat:
Run:

select TestGroup,Status_Migrated, count(*) as count from MigrationStatus where origin=131 group by TestGroup,Status_Migrated order by TestGroup,Status_Migrated
[14 Feb 2006 11:35] Valeriy Kravchuk
Thank you for a detailed problem report. Please, send the SHOW CREATE TABLE MigrationStatus results. EXPLAIN results for your query (from 32-bit systems) and your my.cnf content will be also useful.
[14 Feb 2006 16:35] Mark Moseley
I've added the three things you've asked for in attached files. The my.cnf file is from mysqlmp02 but it is identical to mysqlmp01 except for the server_id and the names in paths.
[14 Feb 2006 17:03] Mark Moseley
I was able to do a little more testing today in 64-bit mode, with a dispensation to do so from my employer since the 32-bit version is somehow vastly slower than it was on a 32-bit box with 4 gigs and 2 processors (is that normal?).

The important thing here is that I mysqldumped the MigrationStatus table and reimported it. While running the 64-bit mysqld, the query did *not* segfault on those queries. So apparently there's something in the (old) data itself that the 64-bit binary doesn't like but that the 32-bit binary doesn't mind. Both those TestGroup and Status_Migrated fields are very simple. Here's the full output of the latter query:

mysql> select TestGroup,Status_Migrated, count(*) as count from MigrationStatus where origin=131 group by TestGroup,Status_Migrated order by TestGroup,Status_Migrated;
+-----------+-----------------+-------+
| TestGroup | Status_Migrated | count |
+-----------+-----------------+-------+
| LastGroup |               1 |   126 |
| LastGroup |               2 |    27 |
| LastGroup |               4 |   685 |
| LastGroup |               5 |  2673 |
| LastGroup |               6 |     4 |
| LastGroup |               7 |     4 |
| LastGroup |              11 |  2094 |
| LastGroup |              12 |    43 |
| LastGroup |              13 |   794 |
| MWeek0    |               4 |   497 |
| MWeek0    |               5 |    10 |
| MWeek0    |              11 |     2 |
| MWeek0    |              13 |    40 |
| MWeek1    |               1 |   206 |
| MWeek1    |               4 |  5207 |
| MWeek1    |               5 |    44 |
| MWeek1    |              11 |     4 |
| MWeek1    |              12 |    74 |
| MWeek1    |              13 |   482 |
| MWeek2    |               1 |  1270 |
| MWeek2    |               2 |     5 |
| MWeek2    |               4 |  5786 |
| MWeek2    |               5 |   474 |
| MWeek2    |              11 |    91 |
| MWeek2    |              12 |   353 |
| MWeek2    |              13 |    30 |
| MWeek3    |               1 |  8908 |
| MWeek3    |               2 |     4 |
| MWeek3    |               4 |  1870 |
| MWeek3    |               5 |    60 |
| MWeek3    |               6 |     1 |
| MWeek3    |              11 |    29 |
| MWeek3    |              12 |    14 |
| MWeek3    |              13 |    39 |
| MWeek4    |               1 |  6323 |
| MWeek4    |               4 |    38 |
| MWeek4    |               5 |    51 |
| MWeek4    |               6 |     1 |
| MWeek4    |              11 |   131 |
| MWeek4    |              12 |     6 |
| MWeek4    |              13 |    30 |
| MWeek5    |               1 |  6226 |
| MWeek5    |               2 |     1 |
| MWeek5    |               4 |    82 |
| MWeek5    |               5 |    53 |
| MWeek5    |               6 |     3 |
| MWeek5    |              11 |    31 |
| MWeek5    |              12 |    10 |
| MWeek5    |              13 |    59 |
| MWeek6    |               1 |  7808 |
| MWeek6    |               2 |     3 |
| MWeek6    |               4 |    26 |
| MWeek6    |               5 |    43 |
| MWeek6    |              11 |    73 |
| MWeek6    |              12 |    11 |
| MWeek6    |              13 |    37 |
| MWeek7    |               1 |  6248 |
| MWeek7    |               2 |     2 |
| MWeek7    |               4 |   374 |
| MWeek7    |               5 |    79 |
| MWeek7    |               7 |     2 |
| MWeek7    |              11 |   196 |
| MWeek7    |              12 |    15 |
| MWeek7    |              13 |   115 |
| sqluser   |               1 |     3 |
| sqluser   |               5 |  2328 |
| sqluser   |               6 |    76 |
| sqluser   |              11 |    45 |
| sqluser   |              13 |    12 |
+-----------+-----------------+-------+
69 rows in set (1.14 sec)

I neglected to mention in my initial bug report that these tables were originally on a server running 4.0.22 (different server, of course) on a 32-bit box, and were copied over cleanly. Running myisamchk (the one that came with the 64-bit 5.0.18 package) using "-a -o" does not seem to have any effect, as I had run that on MigrationStatus and was still able to get this same error.
[21 Feb 2006 13:31] Valeriy Kravchuk
The only right and recommended way to migrate from 4.x.y to 5.0.x is with mysqldump and restore, even in case of 32-bit only. Especially when your table has varchar columns. 

Can you, please, upload files for your original MyISAM table, from 4.0.22?
[22 Mar 2006 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".