Bug #67454 SELECT ... INTO OUTFILE ends before all records are written
Submitted: 2 Nov 2012 9:50 Modified: 22 Dec 2012 21:04
Reporter: Per Lindberg Email Updates:
Status: No Feedback Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:5.1.63 OS:Linux
Assigned to: CPU Architecture:Any

[2 Nov 2012 9:50] Per Lindberg
Description:
I try to save old records from a huge MyISAM table by using SELECT ... INTO OUTFILE. The operation terminates nicely, without any error message. but when I check the resulting CSV text file with 'tail foo.csv', I see that the operation has quit before all specified records are written.

This is really scary. It was pure luck that I discovered that the output was not complete. No error message!

More detail:

SELECT * FROM shown_ads
WHERE date < '20120601'
INTO OUTFILE 'shown_ads.csv'

FIELDS TERMINATED BY ','
  OPTIONALLY ENCLOSED BY '\''
  ESCAPED BY '\\'
LINES TERMINATED BY '\r\n'
;

The table contains records for all days, up to today (20121101), but the resulting CSV file ends around 20120403!
                         
'wc -l shown_ads.csv' gives a lower number (105010711) than SELECT COUNT(*) FROM shown_ads WHERE date < '20120601'.

Selecting on 'timestamp' instead gives a similar (but not exactly the same) result.

I run mysql  Ver 14.14 Distrib 5.1.63, for debian-linux-gnu (x86_64) using readline 6.1

Here's the table:

mysql> show create table shown_ads;
                                    
CREATE TABLE `shown_ads` (
  `timestamp` datetime NOT NULL,
  `date` char(8) COLLATE utf8_swedish_ci NOT NULL,
  `displayer_id` int(10) unsigned DEFAULT NULL,
  `displayer_group_id` int(10) unsigned DEFAULT NULL,
  `schedule_fpdb_id` int(10) unsigned DEFAULT NULL,
  `contract_ref` varchar(25) COLLATE utf8_swedish_ci DEFAULT NULL,
  `shown_time` int(10) unsigned DEFAULT NULL,
  `price` int(10) unsigned DEFAULT NULL,
  `ad_id` int(10) unsigned DEFAULT NULL,
  `advertiser_id` int(10) unsigned DEFAULT NULL,
  `is_car` tinyint(1) DEFAULT NULL,
  `car_model_id` int(10) unsigned DEFAULT NULL,
  `car_model_group_id` int(10) unsigned DEFAULT NULL,
  `regnumber` char(40) COLLATE utf8_swedish_ci DEFAULT NULL,
  `zipcode` char(5) COLLATE utf8_swedish_ci DEFAULT NULL,
  UNIQUE KEY `u1` (`displayer_id`,`timestamp`),
  KEY `idx1` (`timestamp`),
  KEY `idx2` (`date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci

How to repeat:
See above; hope that the provided information is sufficient.
[13 Nov 2012 1:56] Sveta Smirnova
Thank you for the report.

I can not repeat described behavior.

Please provide us output of SHOW TABLE STATUS LIKE 'shown_ads' and output of SELECT INTO OUTFILE query you got in the MySQL command line client.
[13 Nov 2012 9:01] Per Lindberg
Below is the output for a *current* SHOW TABLE STATUS LIKE 'shown_ads'.
But please note that this table no longer holds the huge amount
of records as when I submitted the report. The purpose if the whole
operation was to roll out most of the old data, since the disk was
beginning to fll up. (I succeeded by choosing an earlier cutoff date).

I can understand if this is not easy to reproduce, since the number of
records were huge.

Hypotheis: perhaps the the resulting output file filled up the disk?
(There *was* a limited amout of free disk at the time).
If a 'disk full' error occurred during write, perhaps the unusual
problem situation was not handled correctly, and the error message
was lost? Just a thought.

I no longer have a copy of the faulty output file (it was huge,
and besides I don't think it would be a good idea to upload it
to bugs.mysql.com) but it looked good at the head and tail, except,
of course, that it ended way before the specified end date.

Hope this clarifies matters a bit. 

mysql> SHOW TABLE STATUS LIKE 'shown_ads';
+-----------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+----------------+---------+
| Name      | Engine | Version | Row_format | Rows     | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time          | Collation       | Checksum | Create_options | Comment |
+-----------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+----------------+---------+
| shown_ads | MyISAM |      10 | Dynamic    | 40778582 |             52 |  2120486280 | 281474976710655 |   1802455040 |         0 |           NULL | 2011-11-29 17:07:17 | 2012-11-13 09:45:32 | 2012-11-02 12:59:01 | utf8_swedish_ci |     NULL |                |         |
+-----------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-----------------+----------+----------------+---------+
1 row in set (0.00 sec)
[13 Nov 2012 9:22] Sveta Smirnova
Thank you for the feedback.

> Hypotheis: perhaps the the resulting output file filled up the disk?

Yes, this can be the reason. Do you have MySQL error log file from that time?

Actually your words: "The operation terminates nicely, without any error
message." confuses me. If disk was filled up, this query should not complete. 

> I no longer have a copy of the faulty output file 

We don't need this. I was asked about the output of the command. In my test environment it was:

mysql [localhost] {msandbox} (test) > SELECT * FROM shown_ads WHERE date < '20120601' INTO OUTFILE 'shown_ads.csv'  FIELDS TERMINATED BY ','   OPTIONALLY ENCLOSED BY '\''   ESCAPED BY '\\' LINES TERMINATED BY '\r\n';
Query OK, 7430144 rows affected (3 min 46.78 sec)

I want to ensure there were no warnings information. Do you have such an output?
[13 Nov 2012 12:53] Per Lindberg
> Actually your words: "The operation terminates nicely, without any error
message." confuses me. If disk was filled up, this query should not complete. 

Exactly. But the query DID complete (in a way), without any error message on the command line.

Sorry, I did not keep the output to the command line (like "Query OK", etc).
But if I remember correctly, there was no error message there. That's why I think this is scary, and perhaps a bug in the error handling.

Here's the last output in /var/log/mysql/error.log. It appears to be related to the problem.

121101 16:18:15 [ERROR] /usr/sbin/mysqld: Disk is full writing './histdb/shown_ads.MYD' (Errcode: 28). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)
121101 16:18:15 [ERROR] /usr/sbin/mysqld: Retry in 60 secs. Message reprinted in 600 secs
121101 16:18:25 [ERROR] /usr/sbin/mysqld: Disk is full writing '/mnt/persist/mysql/histdb/arrived_cars.MYI' (Errcode: 28). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)
121101 16:18:25 [ERROR] /usr/sbin/mysqld: Retry in 60 secs. Message reprinted in 600 secs
[22 Nov 2012 21:04] Sveta Smirnova
Thank you for the feedback.

I still can not repeat described behavior:

mysql> SELECT * FROM shown_ads WHERE date < '20120601' INTO OUTFILE '/home/sveta/sandboxes/msb_5_5_28/bug67454/shown_ads.csv'  FIELDS TERMINATED BY ','   OPTIONALLY ENCLOSED BY '\''   ESCAPED BY '\\' LINES TERMINATED BY '\r\n';
ERROR 3 (HY000): Error writing file '/home/sveta/sandboxes/msb_5_5_28/bug67454/shown_ads.csv' (Errcode: 28)

So we need repeatable test case from you. We also should be sure you got "Query OK"
[23 Dec 2012 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".