Bug #47659 DISTINCT causes an OUT OF MEMORY error
Submitted: 27 Sep 2009 7:48     Modified: 28 Oct 2009 3:26
Reporter: Bing Wu               Email Updates:
Status: No Feedback             Impact on me: None
Category: MySQL Server: General Severity: S2 (Serious)
Version: 5.0.75                 OS: Linux (Ubuntu 9.04)
Assigned to:                    CPU Architecture: Any
Tags: distinct, out of memory

[27 Sep 2009 7:48] Bing Wu
Description:
Whenever I try to insert into a table from a DISTINCT select, I receive an Out of Memory error.
The query -
[code]
insert into emails (id, email)
select distinct individual_id, email from contact_records where
email is not null and individual_id is not null;
[/code]

The tables -
[code]
CREATE TABLE `emails` (
  `id` bigint(20) NOT NULL auto_increment,
  `email` varchar(256) NOT NULL,
  PRIMARY KEY  (`id`),
  UNIQUE KEY `email_uindx` (`email`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=100000000

CREATE TABLE `contact_records` (
  `id` bigint(20) NOT NULL auto_increment,
  `cell_phone` varchar(255) character set latin1 default NULL,
  `city` varchar(255) character set latin1 default NULL,
  `company_name` varchar(255) character set latin1 default NULL,
  `creationDate` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  `email` varchar(255) character set latin1 default NULL,
  `fax` varchar(255) character set latin1 default NULL,
  `first_name` varchar(255) character set latin1 default NULL,
  `home_phone` varchar(255) character set latin1 default NULL,
  `last_name` varchar(255) character set latin1 default NULL,
  `state` varchar(255) character set latin1 default NULL,
  `street` varchar(255) character set latin1 default NULL,
  `zip` varchar(255) character set latin1 default NULL,
  `update_date` timestamp NOT NULL default '0000-00-00 00:00:00',
  `individual_id` bigint(20) default NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=4110000 DEFAULT CHARSET=utf8 MAX_ROWS=100000000 

[/code]

System -
An old AMD box running Ubuntu 9.04 and MySQL 5.0.75, with 4G of memory.

Some system variable configurations -
key_buffer_size         = 1600M
max_allowed_packet      = 512M
#thread_stack            = 128M
thread_cache_size       = 32
tmp_table_size          = 2048M
max_heap_table_size     = 3072M
sort_buffer_size        = 128M

How to repeat:
1. Insert 5 million records into the contact_records table (random strings are fine; see the sketch below)
2. Run the insert (with a DISTINCT select)
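
For reference, a minimal sketch of how such random test data can be generated (illustrative only, not the exact script I used):

[code]
-- seed one row, then repeat the doubling INSERT ... SELECT ~22 times
-- (each pass doubles the row count, so 22-23 passes reach ~5M rows)
INSERT INTO contact_records (email, individual_id)
VALUES (CONCAT(SUBSTRING(MD5(RAND()), 1, 12), '@example.com'),
        FLOOR(RAND() * 1000000));

INSERT INTO contact_records (email, individual_id)
SELECT CONCAT(SUBSTRING(MD5(RAND()), 1, 12), '@example.com'),
       FLOOR(RAND() * 1000000)
FROM contact_records;
[/code]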

I consistently got an "out of memory" error, even after I restarted the server.

Further testing suggests that DISTINCT causes this problem: everything is fine without it, even when selecting from another test table with fewer columns than contact_records.
[27 Sep 2009 8:32] Valeriy Kravchuk
Thank you for the problem report. Please, send the results of:

show table status like 'contact_records'\G

and

file mysqld

Linux command. I want to check how big the table is and whether your MySQL server is a 32-bit binary.

In the meantime, try setting tmp_table_size and max_heap_table_size to values more reasonable for the RAM you have, 100M or so. This should allow you to execute the query successfully.
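
For example, a minimal way to try this without editing my.cnf (session scope only, so other connections are unaffected; values are in bytes):

[code]
SET SESSION tmp_table_size      = 100*1024*1024;
SET SESSION max_heap_table_size = 100*1024*1024;
-- then re-run the INSERT ... SELECT DISTINCT in the same session
[/code]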
[27 Sep 2009 23:36] Bing Wu
Valeriy

Per your request, here is the result of "show table status":

*************************** 1. row ***************************
           Name: contact_records
         Engine: MyISAM
        Version: 10
     Row_format: Dynamic
           Rows: 4109999
 Avg_row_length: 166
    Data_length: 684243488
Max_data_length: 1099511627775
   Index_length: 197521408
      Data_free: 0
 Auto_increment: 4110000
    Create_time: 2009-09-27 16:14:21
    Update_time: 2009-09-27 16:15:27
     Check_time: 2009-09-27 16:16:20
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options: max_rows=100000000
        Comment: 
-----------------------

"file mysqld" is - 
~# file /usr/sbin/mysqld
/usr/sbin/mysqld: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, stripped
------------------------

Please note, the OOM error does not happen with the contact_records table only. It happens whenever I run a DISTINCT or a COUNT with GROUP BY on a large set.
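
For example, queries of this shape also fail on large tables (illustrative statements, not the exact ones I ran):

[code]
SELECT DISTINCT email FROM contact_records;

SELECT individual_id, COUNT(*)
FROM contact_records
GROUP BY individual_id;
[/code]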

As for tmp_table_size and max_heap_table_size: I need to dynamically create and destroy some large temporary tables, and I assumed that when memory runs low, the temp/MEMORY tables would hit an out-of-memory error first, so I kept those limits high. I will change them and try again.
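
One way to keep large scratch tables off the in-memory path regardless of those limits is to give them an explicit on-disk engine (a sketch; scratch_emails is a hypothetical name):

[code]
-- an explicit ENGINE=MyISAM keeps the table on disk instead of the
-- MEMORY engine, so it is not bounded by max_heap_table_size
CREATE TEMPORARY TABLE scratch_emails (
  individual_id bigint(20),
  email varchar(255)
) ENGINE=MyISAM;
[/code]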
[28 Sep 2009 3:26] Valeriy Kravchuk
Please change the settings as I suggested and report the results. A 32-bit mysqld is limited to roughly 2G of memory allocated for all its needs, so in-memory temporary tables can NOT be 2+ G in size in any case.
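
Rough arithmetic against the posted configuration (illustrative):

[code]
# ~2-3G of usable address space for a 32-bit process, versus:
#   key_buffer_size                    1600M
# + one in-memory tmp table, up to     2048M  (tmp_table_size)
# + per-connection sort/join buffers    128M+ (sort_buffer_size)
# => allocations can exceed the address space, hence "out of memory"
[/code]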
[29 Oct 2009 0:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".