Bug #99080 Load Data from .csv file not possible with TIMESTAMP
Submitted: 26 Mar 2020 11:21 Modified: 26 Mar 2020 19:05
Reporter: yoloname yolonametoo Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server Severity:S2 (Serious)
Version:8.0.19 / 5.7.21 OS:Windows (Windows 10 Pro 1909)
Assigned to: CPU Architecture:x86 (i7-8550U)

[26 Mar 2020 11:21] yoloname yolonametoo
Description:
Hello my friendly developer,

my name is Justin Räder, i am a student at TU Darmstadt in Germany, and i am about to write my bachelor thesis. therefore i downloaded a 100GB file of .csv files i want to analyse (which resulted in 380GB extracted) from ghtorrent.org including a short instruction to import it to a mysql database. there is one point where i should execute 

LOAD DATA INFILE '/full/path/to/users.csv' INTO TABLE users FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n';

which stops with an Error:

ERROR 1292 (22007): Incorrect datetime value: '2010-03-28 02:37:50' for column 'created_at' at row 36878

If i just delete this one line in my users.csv file, i get a different Error:

ERROR 1292 (22007): Incorrect datetime value: '2011-03-27 02:14:29' for column 'created_at' at row 63327

So there are different values in my dataset, which can not be imported. But as mentioned in 
https://dev.mysql.com/doc/refman/8.0/en/datetime.html
these dates should be valid TIMESTAMP dates. 
I also copied the String into a Hex-Editor, to look if there is some bad type of space, but that's not the case.
--------------------------------------

The schema of the table is:

CREATE TABLE IF NOT EXISTS `ghtorrent`.`users` (
  `id` INT(11) NOT NULL AUTO_INCREMENT COMMENT '',
  `login` VARCHAR(255) NOT NULL COMMENT '',
  `company` VARCHAR(255) NULL DEFAULT NULL COMMENT '',
  `created_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '',
  `type` VARCHAR(255) NOT NULL DEFAULT 'USR' COMMENT '',
  `fake` TINYINT(1) NOT NULL DEFAULT '0' COMMENT '',
  `deleted` TINYINT(1) NOT NULL DEFAULT '0' COMMENT '',
  `long` DECIMAL(11,8) COMMENT '',
  `lat` DECIMAL(10,8) COMMENT '',
  `country_code` CHAR(3) COMMENT '',
  `state` VARCHAR(255) COMMENT '',
  `city` VARCHAR(255) COMMENT '',
  `location` VARCHAR(255) NULL DEFAULT NULL COMMENT '',
  PRIMARY KEY (`id`)  COMMENT '')
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8;

I will upload the complete schema.sql and a shortened part of the users.csv (the complete file is 2GB). 
I executed this with the zipped binary files for Windows (not the installer), in both versions 8.0.19 and 5.7.21.
All of this code and the data is open source from ghtorrent.org
I used the Windows CLI or PowerShell to execute the commands.

How to repeat:
I tried to manipulate the data within my users.csv, and this is the result:

--Manipulate Day
'2010-03-28 02:37:50'  --> Error    --> origin data, row 36878
'2010-03-27 02:37:50'  --> success
'2010-03-29 02:37:50'  --> success

-- Manipulate Month
'2010-04-28 02:37:50'  --> success
'2010-02-28 02:37:50'  --> success

--Manipulate hour
'2010-03-28 03:37:50'  --> success
'2010-03-28 01:37:50'  --> success

--manipulate minute
'2010-03-28 02:38:50'  --> Error
'2010-03-28 02:36:50'  --> Error

--Manipulate second
'2010-03-28 02:37:51'  --> Error
'2010-03-28 02:37:49'  --> Error

Because i have a little fun for testing, i also tested the second Error. And i got the same behaviour.

--Manipulate Month
'2011-03-27 02:14:29'  --> Error   --> origin data, row 63327
'2011-03-27 02:14:29'  --> Success
'2011-03-27 02:14:29'  --> Success

--Manipulate Day
'2011-03-28 02:14:29'  --> Success
'2011-03-26 02:14:29'  --> Success

--Manipulate hour
'2011-03-27 03:14:29'  --> Success
'2011-03-27 01:14:29'  --> Success

--Manipulate minute
'2011-03-27 02:15:29'  --> Error
'2011-03-27 02:13:29'  --> Error

--Manipulate second
'2011-03-27 02:14:30'  --> Error
'2011-03-27 02:14:28'  --> Error

Suggested fix:
 I think, there is some bug in the parsing and converting String to Date.
[26 Mar 2020 11:29] yoloname yolonametoo
id 38870 and id 70035 are the bad rows

Attachment: users_small.csv (application/vnd.ms-excel, text), 1.83 MiB.

[26 Mar 2020 11:30] yoloname yolonametoo
used SQL schema

Attachment: schema.sql (application/octet-stream, text), 18.39 KiB.

[26 Mar 2020 13:16] MySQL Verification Team
Are you affected by DST time?. If yes then that the problem.
[26 Mar 2020 18:48] yoloname yolonametoo
oh thanks, that really solved my problem. By setting my system time to UTC without automatically switch to DST time, made it possible to import the "dead hour".

problem solved, case can be closed :)
[26 Mar 2020 19:05] MySQL Verification Team
Thank you for the feedback.