Bug #14682 big LOAD DATA causes too many connections
Submitted: 6 Nov 2005 18:12 Modified: 4 Oct 2006 23:24
Reporter: Rick James Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server Severity:S5 (Performance)
Version:4.0.23-4.0.26 OS:FreeBSD (FreeBSD 4.8)
Assigned to: CPU Architecture:Any

[6 Nov 2005 18:12] Rick James
Description:
At the end of 2-hour LOAD DATA, connections are denied for a few minutes.  Usually the state of existing connections is "Opening table" or "Closing table".

(More details in "How to Repeat".)

Offending command:
LOAD DATA      LOCAL INFILE  foo.txt      INTO TABLE CapIQ.FinancialData         FIELDS TERMINATED BY "'~'"        LINES  TERMINATED BY "#@#@#"

Table:
CREATE TABLE `FinancialData` (
  `financialPeriodID` int(11) NOT NULL default '0',
  `dataItemId` mediumint(9) NOT NULL default '0',
  `dataItemValue` float default NULL,
  `unitTypeID` smallint(6) default NULL,
  `nmFlag` smallint(6) default NULL,
  KEY `financialPeriodID` (`financialPeriodID`,`dataItemId`)
) TYPE=MyISAM ;

After loading:
           Name: FinancialData
           Type: MyISAM
     Row_format: Fixed
           Rows: 288733324
 Avg_row_length: 16
    Data_length: 4905946112
Max_data_length: 68719476735
   Index_length: 4335301632
      Data_free: 286212928
 Auto_increment: NULL
    Create_time: 2005-05-12 16:43:16
    Update_time: 2005-11-06 09:16:13
     Check_time: 2005-11-05 20:51:37
 Create_options: 
        Comment: 

How to repeat:
Set max_connections to 500.
Have 4 connections per second, with query(s) amounting to 350/sec.  (The connections can disconnect or timeout.  Steady state is 100-150 connections at a time, most sleeping.)
Run a LOAD DATA that takes more than 5000 seconds.
Note: The one LOAD DATA is the only connection touching its database; the rest are running unblocked in other databases (MyISAM/InnoDB).

At the end of the LOAD, it will take about 4 minutes for the system to finish any operations.  During this time, max_connections will be hit and further connections will be denied.

Suggested fix:
unk
[6 Nov 2005 20:59] Rick James
The same happens just when DELETE on that big table commences before the LOAD even started.  So basicly ending of any massive operation on that huge table (5GB) triggers that.

It looks like db hangs on the "cleanup after the big operation" stage.
[7 Nov 2005 7:18] Valeriy Kravchuk
Thank you for a problem report. Please, send the results (or, better, upload them as file) of the SHOW PROCESSLIST command executed when you have this hang. Send the content of your my.cnf file too.

Have you tried to work with newer versions (4.0.26)? Have you seen similar problems on other operating systems? If you built the server yourself from sources, please, inform about the configure options used.
[7 Nov 2005 19:07] Rick James
Have not tried any newer versions.  Do you know of any possible fixes?

Can I send processlist some other way?  It may contain sensitive info.

Here is my.cnf:

# -*- sh -*-

[mysqld]
port            = 3306
socket          = /tmp/mysql.sock
user            = mysql
skip-locking
set-variable    = max_connect_errors=999999999
basedir         = /home/y
datadir         = /home/y/var/mysql/data
log-error       = /home/y/logs/mysql/mysqld.err
tmpdir          = /tmp/
pid-file        = /home/y/var/mysql/mysqld.pid

skip-bdb

set-variable = thread_cache=100

## Used in full joins (when indexes are not invoved). This should be
## used only when adding an index is impractical.

set-variable = join_buffer=1M

## key_buffer is ised to cache index blocks (MyISAM only).
## This is the best use of memory other than the OS filesystem cache

## This helps out sequential scans (such as a full table scan in an
## query which does not (or can not) use an index.

set-variable = record_buffer=1M

## Each thread that needs to do a sort will allocate a buffer of this
## size. Used in ORDER BY queries.

set-variable = sort_buffer=1M

## What's the largest a temporary table may be? Used in GROUP BY
## queries.

set-variable = tmp_table_size=32M

## table_cache limits how many tables mysqld will keep open.
## Each table requires two file desciptors.

## wait_timeout prevents folks from holding idle connections for very long.
## They can reconnect if they need.
## Note: Raising wait_timeout or max_connections may require
##   raising the other.

## avoid FreeBSD DNS problems
skip-name-resolve
## IP vs Hostname (also skip-host-cache)

## avoid "niced" "threads" on FreeBSD
skip-thread-priority

## max_connections is the total number of open connections we're
## willing to handle at a given time.
## If we set this too high, we could run out of file descriptiors.

## max_user_connectionslimits the number of connections that any one
## user can have to the server at a given time.
## Prevents one runaway from slamming the
## server and using up all available connections.

## replication -- how a slave sees the master:

##master-host = db.finance.yahoo.com
##master-user = repl
##master-password = C0pyIT
##master-connect-retry = 15
set-variable = slave_net_timeout=60

## Don't bother replicating these:
binlog-ignore-db=CapIQ
binlog-ignore-db=Log
#binlog-ignore-table=IDX.Dirty
#binlog-ignore-table=IDX.Dist
#binlog-ignore-table=IDX.Q4Quotes

## (used to include Moreover)

# Master sends data to slave's IO_thread, which writes it to relay-log:
# These are needed for Backup Master and Relay slave:

relay-log           = /home/y/var/mysql/repl/relay-log
relay-log-index     = /home/y/var/mysql/repl/relay-index
relay-log-info-file = /home/y/var/mysql/repl/relay-log.info

# Relay slave records some databases into the binary log

## RELAY: Acting as intl-east.db.finance, we need these:
####binlog-do-db=Yoda
####binlog-do-db=Live
####binlog-do-db=Datascope
####binlog-do-db=Chart
####binlog-do-db=CoreData

log-bin        = /home/y/var/mysql/repl/binary-log
log-bin-index  = /home/y/var/mysql/repl/index
log-slave-updates
## End RELAY

## Tuned InnoDB settings for Finance Feeds

innodb_data_file_path = ibdata1:4000M;ibdata2:4000M;ibdata3:4000M;ibdata4:4000M;ibdata5:4000M;ibdata6:100M:autoextend
innodb_data_home_dir      = /home/y/var/mysql/ibdata/
innodb_log_group_home_dir = /home/y/var/mysql/iblog/
innodb_log_arch_dir       = /home/y/var/mysql/iblog/
# Probably optimal for RAID:
innodb_flush_log_at_trx_commit = 1
innodb_log_archive = 0
set-variable = innodb_mirrored_log_groups=1
set-variable = innodb_log_files_in_group=3
set-variable = innodb_log_file_size=16M
set-variable = innodb_log_buffer_size=8M
set-variable = innodb_buffer_pool_size=800M
set-variable = innodb_additional_mem_pool_size=4M
set-variable = innodb_file_io_threads=4
set-variable = innodb_lock_wait_timeout=50

## Tuned parameters for Finance Feeds

set-variable = key_buffer=700M
set-variable = max_allowed_packet=16M
set-variable = table_cache=600
set-variable = sort_buffer=1M
set-variable = net_buffer_length=8K
set-variable = max_connections=500
set-variable = max_user_connections=150
set-variable = myisam_sort_buffer_size=32M
set-variable = wait_timeout=40
set-variable = thread_concurrency=6
set-variable = long_query_time=1
log-slow-queries

query_cache_size = 4M
query_cache_type = 2

# This is set in /home/y/var/mysql/my.cnf
server-id = 999

## End Tuned.

## SSL Support
## Wired off until fixed (not critical to Finance)

#ssl-ca=/home/y/etc/mysql/cacert.pem
#ssl-cert=/home/y/etc/mysql/server-cert.pem
#ssl-key=/home/y/etc/mysql/server-key.pem

[myisamchk]
set-variable    = key_buffer=200M
set-variable    = sort_buffer=100M
set-variable    = read_buffer=10M
set-variable    = write_buffer=10M

# In case things tacked on...
[mysqld]
server-id = 126
[8 Nov 2005 11:57] Valeriy Kravchuk
Thank you for the additional information.

I will check for any possible fixes in a newer versions. You may upload your SHOW PROCESSLIST results as a private file using the File tab. It will be visible to MySQL developers only. Please, try.
[8 Nov 2005 17:32] Rick James
can you make the last submission private?
[8 Nov 2005 17:44] Rick James
I have dozens more, but most are longer than 200K.  While I complain about most processes being in "closing" or "opening", sometimes there is a large number of "Lock" cases.  But note that in most cases, the "time" is many seconds even though very few of the commands should take even one second.  Actions on CapIQ tables is very long because its tables are very large.
[8 Nov 2005 18:15] Valeriy Kravchuk
Thank you for the additional information. All files you submitted are private now. Let me analyze this for some time.
[13 Nov 2005 13:05] Valeriy Kravchuk
Thank you for the additional information. I have a couple of questions and one suggestion for you. 

What version of FreeBSD do you use (I had to ask it earlier...)?
Have you compiled 4.0.23 yourself or used MySQL supplied binaries?

And, please, try to upgrade to 4.0.26. I was no able to find a change that may influent your case directly, but the fix of bug #10600 in 4.0.26 makes this upgrade desirable, I think.
[17 Nov 2005 18:01] Rick James
This morning's hiccup came at the end of a long (3 hour) InnoDB DELETE:
Shortly before the huccup, PROCESSLIST had this for the dying item:
25132825        yahoo   216.109.124.67:46680    CapIQ   Query   13859   end4    DELETE FROM FinancialData \n                             USING  FinancialData,FinancialPeriod__\n                             WHERE \n                             FinancialData.financialPeriodID\n                                 =\n                             FinancialPeriod__.financialPeriodID

There was also a LOAD DATA running thru the time.

Longest running processes:
SUMMARY of processes running on Master:
Seconds  ID (kill)  Host         Action   Table(s)
-------  ---------  -----------  -------  --------
  14084   25132825  proc20..dcn  end4     FROM FinancialData                           
    290   25214036  biz11..scd   closing  REPLACE INTO AutoReport                      
    290   25214040  biz10..scd   closing  REPLACE INTO AutoReport                      
    290   25195865  proc12..scd  closing  FROM Headlines H, S2H X                      
    290   25214050  proc20..dcn           INTO TABLE CapIQ.FinancialPeriod             
    290   25195188  proc21..dcn  statisti FROM NewsFlash                               
    290   25196620  proc13..scd  closing  REPLACE INTO IDX.HeadlineTransaction         
    290   25212698  proc20..dcn  Copying  FROM MultexDocToTicker,MultexReports         
    290   25212404  db26..dcn    Updating UPDATE Data                                  
    290   25212879  proc13..scd  closing  REPLACE INTO IDX.HeadlineTransaction         
    289   25195892  proc14..scd  statisti FROM NewsFlash                               
    289   25195934  proc10..scd  update   REPLACE INTO IDX.HeadlineTransaction         
    289   25102916  biz23..dcn   update   into BizRedirects values (1132247198, 1, 'fpc
    288   25214052  mbl-gtw2.oa. closing  REPLACE INTO AutoReport                      
    288   13463095  proc10..scd  Locked   LOCKED  FROM Data                            
    288   24806527  db26..dcn    statisti FROM Data                                    
    288   24806359  db26..dcn    statisti FROM Data                                    
    288   25214051  biz26..dcn   update   INTO NotFound (url,ref,hostname)             
    288   25214053  mbl-gtw4.oa. update   REPLACE INTO AutoReport                      
    288   25214054  biz12..scd   Locked   LOCKED  INTO NotFound (url,ref,hostname)     
    288   25214063  proc21..dcn  Sending  FROM ArticleHtmlFiles                        
    287   24806642  db26..dcn    closing  FROM Data                                    
    287   25214064  biz20..dcn   Locked   LOCKED  REPLACE INTO AutoReport              
    285   24806549  db26..dcn    statisti FROM Data                                    
    285   25214066  biz13..scd   Locked   LOCKED  INTO NotFound (url,ref,hostname)     
    283   25135189  biz30..mud   Locked   LOCKED  into BizRedirects values (1132247201,
    283   25214068  feed5.news.d Locked   LOCKED  REPLACE INTO AutoReport              
    283   25214074  proc12..scd  closing  FROM Locks                                   
    283   25214076  proc14..scd  Locked   LOCKED  REPLACE INTO AutoReport              
    282   25083073  biz33..mud   Locked   LOCKED  into BizRedirects values (1132247205,
    282   25214077  biz33..mud   Locked   LOCKED  INTO NotFound (url,ref,hostname)     
    281   25210240  proc12..scd  closing  INTO Headlines                               
    281   25214078  biz32..mud   closing  FROM FeedInfo                                
...
    267   25213363  tempest.corp Opening  INTO AdminAccess.UsageLog (                  
    267   25214303  biz12..scd   Opening  REPLACE INTO AutoReport                      
    266   13818888  beta3..scd   Opening  from tstamp                                  
    266   25214304  biz21..dcn   Opening  REPLACE INTO AutoReport                      
    266   21373849  ichart11..dc Opening  replace into UpdatedSymbols (Symbol, Source) 
    266   25185209  biz10..scd   Opening  into BizRedirects values (1132247220, 10, 'se
    265   25212788  wire3.nuq    Opening  INTO FeedData.RawData                        
    265   25214305  biz12..scd   Opening  FROM FeedInfo                                
...
Locks:
Database        Table                    In_use  Name_locked
--------------- ----------------             --   ---
Finance         Locks                       175     0
DbDist          AutoReport                   16     0
IDX             NewsFlash                     2     0
CapIQ           FinancialPeriod               1     0
Log             NotFound                     11     0
DbDist          Data                         12     0
IDX             S2H                           1     0
IDX             Dist                          1     0
Reports         MultexReports                 1     0
IDX             Headlines                     2     0
IDX             HeadlineTransaction           3     0
Yoda            Data                          3     0
IDX             Dirty                         1     0
CapIQ           FinancialPeriod__             1     0
IDX             ArticleHtmlFiles              1     0
FeedData        Item                         42     0
Finance         BizRedirects                  6     0
Reports         MultexDocToTicker             1     0
FeedData        RawData                       1     0
IDX             Nab                           3     0
Chart           UpdatedSymbols                1     0
IDX             FeedInfo                      6     0
MySQL_Admin     Heartbeat                     3     0
CapIQ           FinancialData                 1     0
(738 others)                                  0     0

[I have not upgraded server yet -- not something to do casually in a 24/7 master.]
[18 Nov 2005 16:59] Valeriy Kravchuk
Thank you for the additional information. I am not able to analyze all that really soon, but I still had not got the answer to the simple question in the previous comment:

What version of FreeBSD do you use?

I want to figure out are there any possible problems with threads implementation...
[18 Nov 2005 21:33] Rick James
FreeBSD tempest.corp.yahoo.com 4.8-YAHOO-20030628 FreeBSD 4.8-YAHOO-20030628 #1: Mon Mar 29 16:26:49 PST 2004      i386
Note:  linuxthreads is used.
(Jeremy Cole jcole@yahoo-inc.com may be able to explain further.)
[15 Dec 2005 17:22] Rick James
Does "end4" give you any clues?...

SUMMARY of processes running on Master:
Seconds  ID (kill)  Host         Action   Table(s)
-------  ---------  -----------  -------  --------
  14257    7665799  proc21..dcn  end4     FROM FinancialData                           
    338    1151588  proc10..scd  closing  UPDATE Data                                  
    338    7300405  db26..dcn    closing  FROM Data                                    
    338    7750843  proc20..dcn  closing  REPLACE INTO Data                            
    338    7751030  proc21..dcn           INTO TABLE CapIQ.FinancialPeriod             
    338    7748066  db26..dcn    closing  UPDATE Data                                  
...
[16 Dec 2005 19:03] Rick James
Or maybe Reopen:

SUMMARY of processes running on Master:
Seconds  ID (kill)  Host         Action   Table(s)
-------  ---------  -----------  -------  --------
    309    8189088  proc10..scd  Opening  FROM FeaturedArticle.Schedule S,             
    309    8189178  proc20..dcn           INTO TABLE CapIQ.FinancialPeriod             
    271    8189310  proc20..dcn  init     FROM Dist                                    
    251    8189440  biz20..dcn   closing  INTO NotFound (url,ref,hostname)             
    251    8189441  biz31..mud   closing  INTO NotFound (url,ref,hostname)             
    249    8189450  mbl-gtw1.oa. closing  REPLACE INTO AutoReport                      
    249    8189451  proc21..dcn  closing  REPLACE INTO AutoReport                      
    249    8189453  p14.oa.scd   Reopen t REPLACE INTO AutoReport                      
    248    8189457  biz22..dcn   Reopen t INTO NotFound (url,ref,hostname)             
    246    8189458  biz13..scd   closing  FROM FeedInfo                                
    246    8189459  biz26..dcn   closing  INTO NotFound (url,ref,hostname)             

Types:
  346 Active
    2 Binlog Dump
    2 Connect
   31 Sleep
Infos:
    1 
   34 Opening table
   72 Opening tables
    2 Reopen tables
    1 Waiting for tables
  191 closing tables
    1 init
   41 logging slow query
    9 undef
    3 update

  8189453 dbdist_report@p1    Query     249 Reopen table
REPLACE INTO AutoReport
                    SET Hostname         = 'p14.oa.scd.yahoo.com',
                        Pid              = 43736,
                        Frequency        = '60',
                        ClientVersion    = '2.38',
                        TargetGroup      = 'mobile-news',
                        PingTime         = 1134750324,
                        CursorId         = '171022565',
                        CursorInsertTime = '1134750247',
                        CursorUpdateTime = 1134750253

CREATE TABLE `AutoReport` (
  `TargetGroup` varchar(40) NOT NULL default '',
  `Hostname` varchar(40) NOT NULL default '',
  `Pid` int(10) unsigned NOT NULL default '0',
  `PingTime` int(10) unsigned NOT NULL default '0',
  `Frequency` int(10) unsigned NOT NULL default '0',
  `CursorId` int(10) unsigned NOT NULL default '0',
  `CursorInsertTime` int(10) unsigned NOT NULL default '0',
  `CursorUpdateTime` int(10) unsigned NOT NULL default '0',
  `ClientVersion` varchar(10) NOT NULL default '',
  `CursorTime` int(10) unsigned NOT NULL default '0',
  UNIQUE KEY `TargetGroup` (`TargetGroup`,`Hostname`),
  KEY `Hostname` (`Hostname`)
) TYPE=MyISAM;
[22 Dec 2005 18:16] Valeriy Kravchuk
According to the manual (http://dev.mysql.com/doc/refman/5.0/en/show-processlist.html, for example):

"Reopen table

The thread got a lock for the table, but noticed after getting the lock that the underlying table structure changed. It has freed the lock, closed the table, and is trying to reopen it."

So, is it possible that table was altered (ALTER TABLE ...) during that REPLACE statement execution?

As for 'end4' state, I have no clue yet. There is no such state in current sources:

[openxs@Fedora sql]$ grep -n Reopen *.cc
slave.cc:4783:  // The master has switched to a new log file; Reopen the old log file
sql_base.cc:1386:** Reopen an table because the definition has changed. The date file for the
sql_base.cc:1501:  Reopen all tables with closed data files
sql_base.cc:1660:    thd->proc_info="Reopen tables";
[openxs@Fedora sql]$ grep -n end4 *.cc
[openxs@Fedora sql]$
[22 Dec 2005 18:59] Rick James
It is extremely unlikely that either of those two tables was being ALTERed at that time.  Perhaps this SHOW TABLE STATUS 'proves' that it was not happening on 12/20 to AutoReport:

Name  	AutoReport
Type 	MyISAM
Row_format 	Dynamic
Rows 	252
Avg_row_length 	73
Data_length 	18512
Max_data_length 	4294967295
Index_length 	21504
Data_free 	0
Auto_increment 	
Create_time 	2004-11-05 00:45:01
Update_time 	2005-12-22 10:26:46
Check_time 	2005-12-18 03:25:27
Create_options 	
Comment 	

Or does CHECK count?  (I doubt if a CHECK was done at that time of day.)

The PROCESSLIST with "Reopen"s was taken about 2005-12-16 8:32 AM (server time is Pacific).

Binlog was not rolling over at that time:
...
-rw-rw----  1 mysql  users  1073742146 Dec 16 03:16 binary-log.2146
-rw-rw----  1 mysql  users  1073742078 Dec 16 12:42 binary-log.2147
...

TRUNCATE TABLE NotFound is done on Mondays (only).  But the max_connections problem happens about twice a day.
[25 Dec 2005 7:24] Valeriy Kravchuk
Is it possible that mysqldump is performed for the databases during that long-running operations with big tables? 

Please, provide OS-level disk I/O statistics for your server after that long-running operations? I mean, the results of the proper iostat and/or gstat commands.
[27 Dec 2005 4:49] Rick James
The system was upgraded to 4.0.26 29 days ago (about Nov 27).  Problem continues to occur.

mysqldump is rarely used (against the master), and not likely to have been used at when the spikes occurred.

Am sending disk/cpu/mem/etc stats around the several spikes separately.
[26 Jan 2006 11:21] Valeriy Kravchuk
Thank you for the additional information. I can neither repeat the load needed, nor explain the exact reasons for the results you got. So, I'll put this bug report back to "Open" status for somebody else to look at it.

I can recommend you to upgrade to MySQL 4.1.18, as soon as it will be released, and try to use it. But I can not point out any particular reason for this problem to disappear.
[4 Sep 2006 23:24] MySQL Verification Team
Please test with the 4.1 latest release according how Valeriy asked before.

http://dev.mysql.com/downloads/mysql/4.1.html

Thanks in advance.
[5 Oct 2006 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".