Bug #71174 | LOAD DATA LOCAL INFILE not loading all records in MySQL 5.7.3 (OK in 5.6.10) | ||
---|---|---|---|
Submitted: | 18 Dec 2013 15:11 | Modified: | 23 Dec 2013 14:13 |
Reporter: | Oscar González | Email Updates: | |
Status: | Not a Bug | Impact on me: | |
Category: | MySQL Server: Windows | Severity: | S2 (Serious) |
Version: | 5.7.3 | OS: | Windows (Windows 7 Enterprise 64-bit) |
Assigned to: | CPU Architecture: | Any | |
Tags: | 5.7.3, data, infile, load |
[18 Dec 2013 15:11]
Oscar González
[18 Dec 2013 16:34]
MySQL Verification Team
Thank you for the bug report. Are you able to provide the file and create table command?. Thanks.
[18 Dec 2013 19:29]
Oscar González
Yes, I'm able to provide the data file, but I'm a little concerned about your privacy policies, because that file holds company confidential information. Will your policies protect the confideatility of the contents of the file? If yes, I can upload it. Please, let me know. The CREATE TABLE command is: CREATE TABLE comcel.inf_app_bl ( RF_ID VARCHAR(100), TECH VARCHAR(100), REGION VARCHAR(100), GRUPO SMALLINT(6), BCF_WBTS_NAME VARCHAR(100), BTS_WCEL_NAME VARCHAR(100), CELL_ID MEDIUMINT(9) NOT NULL, CLUSTER_TOP VARCHAR(100), LATITUD DECIMAL(10,8), LONGITUD DECIMAL(10,8), AZIMUTH SMALLINT(6), BSC_RNC_NAME VARCHAR(100) NOT NULL, TILT_ELECTRICO SMALLINT(6), TILT_MECANICO SMALLINT(6), TIPO_ANTENA VARCHAR(100), ALTURA_ANTENA DECIMAL(5,2), DEPARTAMENTO VARCHAR(100), LOCALIDAD VARCHAR(100), MARKET VARCHAR(100), CLUSTER_AMX VARCHAR(100), PRIMARY KEY (BSC_RNC_NAME, CELL_ID) ) ENGINE = innodb ROW_FORMAT = DEFAULT;
[19 Dec 2013 14:32]
Oscar González
I was looking in detail at the data contained in the file, and I noticed that "LOAD DATA LOCAL INFILE" stops when found the spanish character "ñ" into the text to import. I removed some lines having "ñ" and LOAD DATA LOCAL INFILE was able to load some more rows into the table, but then it stopped again when found another line having "ñ" I checked the table's character-set and collation, and is setted to "utf-8" and "utf8_general_ci". Like I stated before, this same file loaded FULLY into the same table in MySQL 5.6.10 (even with several rows having the spanish character "ñ"), so I think there is something different in the way 5.7.3 processes the LOAD DATA LOCAL INFILE command
[19 Dec 2013 14:41]
Peter Laursen
aahhhh! Thre is a 'character set' option in LOAD DATA statement - please refer: http://dev.mysql.com/doc/refman/5.6/en/load-data.html If not specified I think it defaults to the server default character set. Could you have had different server default character set in your 5.6 and 5.7 server installations - the one in 5.6 matching the file encoding and the one in 5.7 not? Try using the character set option with a setting matching the file encoding ('latin1' if file is ANSI encoded - 'utf8' if file is UTF8 encoded. And also note that files encoded withht 'Windows Unicode' (UTF16 with little endian byte order cannot be used with MySQL LOAD DATA command - I wrote a blog about that: http://blog.webyog.com/2012/08/25/character-sets-in-mysql-still-not-for-windows-users/) Peter (not a MySQL/Oracle person)
[19 Dec 2013 21:28]
Oscar González
Thanks for your great help, Peter! You were ABSOLUTELY and COMPLETELY right :-) I changed the command to: LOAD DATA LOCAL INFILE 'C:\\NetDBLoad\\NetDBLoad\\local\\plantillas\\bl.csv' INTO TABLE inf_app_baseline CHARACTER SET latin1 FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n' IGNORE 1 LINES; ... and the file loaded completely!!!! The magic was in "CHARACTER SET latin1". But now I wonder what should be the correct behavior of "LOAD DATA LOCAL INFILE", because as far as I know, the command is supposed to --SKIP-- any records not following the table structure, and continue loading the other records that are OK. But in my case, 5.7.3 stops loading any further records beyond the first one having the "ñ" character, even tough there are many other records that are OK! Normal behavior or 5.7.3 bug?
[20 Dec 2013 11:44]
Sveta Smirnova
Oscar, thank you for the feedback. In order to identify if 5.7 server skips records in wrong way or not we need to know which exact rows you are trying to load. Please send us excerpt of your file from the beginning to couple of lines which follow line with the "ñ" character.
[20 Dec 2013 16:55]
Oscar González
I managed to edit the file so I removed all the company confidential information. The CREATE TABLE command is: CREATE TABLE comcel.inf_app_baseline_text ( REG VARCHAR(100), CLU VARCHAR(100), DEP VARCHAR(100), LOC VARCHAR(100), MKT VARCHAR(100), AMX VARCHAR(100) ) ENGINE = innodb ROW_FORMAT = DEFAULT; And what I do is first TRUNCATE the TABLE: TRUNCATE inf_app_baseline_text; ... and then LOAD the data into the TABLE: LOAD DATA LOCAL INFILE 'C:\\NetDBLoad\\NetDBLoad\\local\\plantillas\\bl_text.csv' INTO TABLE inf_app_baseline_text FIELDS TERMINATED BY ';' LINES TERMINATED BY '\r\n' IGNORE 1 LINES; And then again, only 8,916 records loads into the table, because record #8,917 has the "ñ" character, and the LOAD DATA FILE stops loading any further records. File will be attached in the next post...
[20 Dec 2013 16:56]
Oscar González
Text CSV file for LOAD DATA LOCAL INFILE
Attachment: bl_text.csv (application/vnd.ms-excel, text), 2.82 MiB.
[20 Dec 2013 19:29]
Sveta Smirnova
Thank you for the feedback and file. Your file proceeded correctly in default SQL mode and works as you described if strict SQL mode is set. Since strict SQL mode set by default by Windows installer, most likely you have this mode too. So this is not MySQL bug.
[23 Dec 2013 14:13]
Oscar González
Sorry for retaking this subject, but I'm still concerned because I always use MySQL STRICT Mode (STRICT_ALL_TABLES) and had no problems loading the file in MySQL 5.6.10. In other words, I just mantained the same my.ini file from 5.6.10 to 5.7.3 and this same file loaded fully in 5.6.10, but now is having this problem with 'ñ' in 5.7.3, both version having the same STRICT_ALL_TABLES mode in configuration. Any other ideas?
[24 Dec 2013 14:05]
Sveta Smirnova
Thank you for the feedback. You see error in 5.7 not only because use STRICT SQL mode, but also because default character set changed from latin1 to utf8. Both factors matter. To confirm run queries SHOW VARIABLES LIKE '%char%' on both servers and compare output.