Bug #102486 | util.dumpSchema fails if any worker thread times out (error 2006 / error 2013) | |
---|---|---|---|
Submitted: | 5 Feb 2021 3:45 | Modified: | 14 Mar 2021 20:39 |
Reporter: | Robert Roland | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server: Document Store: MySQL Shell | Severity: | S1 (Critical) |
Version: | 8.0.23 | OS: | Windows (10 x64) |
Assigned to: | MySQL Verification Team | CPU Architecture: | x86 (i9 9900K) |
Tags: | ERROR 2006, ERROR 2013, net_read_timeout, util.dumpInstance, util.dumpSchema, util.loadDump |
[5 Feb 2021 3:45]
Robert Roland
[5 Feb 2021 4:19]
Robert Roland
Under "How to Repeat" the command should be:

```
util.dumpSchemas(["tablename"], "C:/dumpfolder", {ocimds: true})
```

or

```
util.dumpTables("schemaname", ["tablename"], "C:/dumpfolder", {ocimds: true})
```
[6 Feb 2021 5:53]
Robert Roland
I also encountered "MySQL server has gone away" (error 2006) when using util.loadDump to populate the new instance. This happened even though all of the steps I outlined above were taken. At this point I am unsure how to proceed and cannot work without the ability to restore these databases. I have already lost several days and it looks like there are going to be more to come. I find it hard to believe that the shell's connection parameters or the util.loadDump options cannot be modified to account for a very straightforward restore of a large table created by the util.dump utility. I am pretty frustrated, to say the least.
[6 Feb 2021 6:11]
Robert Roland
wait_timeout and interactive_timeout are set to the standard 8hrs
[6 Feb 2021 6:14]
Robert Roland
The error arrives in under a minute after launching the data load, so it is more likely packet size, but I do have global max_allowed_packet=1073741824 everywhere. I do not know if the client utility needs that setting also, and I cannot find mention of it anywhere.
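As a quick sanity check, the server-side value can be compared against the 1 GiB the dump was produced with. This is a minimal sketch only: the live `mysql` query is commented out (it needs a running server) and its result is substituted by hand.

```shell
# Minimal sketch: verify the server's max_allowed_packet is at least the
# 1 GiB (1073741824 bytes) the dump assumes. The mysql call requires a
# live server, so it is commented out and its result hard-coded here.
required=1073741824
# server_value=$(mysql -N -B -e "SELECT @@GLOBAL.max_allowed_packet")
server_value=1073741824   # substitute the value the query above returns
if [ "$server_value" -ge "$required" ]; then
  echo "max_allowed_packet OK: $server_value bytes"
else
  echo "max_allowed_packet too small: $server_value < $required"
fi
```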
[6 Feb 2021 6:39]
Robert Roland
In mysql-shell/modules/util/dump/dumper.cc:

```cpp
1097   // set size of max packet (~size of 1 row) we can get from server
1098   if (!co.has(mysqlshdk::db::kMaxAllowedPacket)) {
1099     const auto k_one_gb = "1073741824";
1100     co.set(mysqlshdk::db::kMaxAllowedPacket, k_one_gb);
1101   }
```

The dumper is clearly creating dump files with large packets in mind if the server has max_allowed_packet=1G. In mysql-shell/modules/util/load/load_dump_options.cc there is no reference to packet size (I assume the sender just sends what it has and the server setting determines whether it can be received), but there are clear references to long timeouts being set:

```cpp
61   m_target = get_classic_connection_options(m_base_session);
62
63   if (m_target.has(mysqlshdk::db::kLocalInfile)) {
64     m_target.remove(mysqlshdk::db::kLocalInfile);
65   }
66   m_target.set(mysqlshdk::db::kLocalInfile, "true");
67
68   // Set long timeouts by default
69   std::string timeout = "86400000";  // 1 day in milliseconds
70   if (!m_target.has(mysqlshdk::db::kNetReadTimeout)) {
71     m_target.set(mysqlshdk::db::kNetReadTimeout, timeout);
72   }
73   if (!m_target.has(mysqlshdk::db::kNetWriteTimeout)) {
74     m_target.set(mysqlshdk::db::kNetWriteTimeout, timeout);
75   }

231  m_wait_dump_timeout_ms = wait_dump_timeout * 1000;
```

This adds to my belief that it isn't a timeout issue. This Q&A: https://stackoverflow.com/questions/51946078/change-and-show-max-allowed-packet-of-client-... seems to clarify that any max_allowed_packet setting for the client only matters for the dumper, not the loader. So if timeouts are ruled out, and I have confirmed the server has a global max packet of 1G:

```
show variables like 'max_allowed_packet';

Variable_name       Value
max_allowed_packet  1073741824
```

I am at a loss as to how to restore my data created by util.dumpSchema.
[6 Feb 2021 6:56]
Robert Roland
This is mentioned in the documentation as one possible cause for server closing connection (server has gone away): "You can also encounter this error with applications that fork child processes, all of which try to use the same connection to the MySQL server. This can be avoided by using a separate connection for each child process." I was not sure if the threads created in the shell util are considered forked child processes which could cause a problem.
[6 Feb 2021 8:06]
Robert Roland
After executing `set global wait_timeout=60000;` I tried loading the dump again and it worked. A second dump load of 1.2TB is also almost complete without error. I don't know if this made the difference, but it was the only change I made. wait_timeout was 28800 prior to the change.
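For anyone following along, MySQL 8.0's SET PERSIST can make this change survive a restart. This is a sketch only: the statements are just written to a file here, and the `mysql` invocation that would apply them is commented out because it needs a live server.

```shell
# Sketch: persist the timeout change with SET PERSIST (MySQL 8.0+) so it
# survives a server restart. The statements are written to a file;
# applying them is commented out since it requires a live server.
cat <<'SQL' > set_timeouts.sql
SET PERSIST wait_timeout = 60000;
SET PERSIST interactive_timeout = 60000;
SQL
# mysql -u root -p < set_timeouts.sql
echo "wrote set_timeouts.sql with $(wc -l < set_timeouts.sql) statements"
```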
[6 Feb 2021 20:26]
Robert Roland
screenshot of error showing thread losing connection at ~25 sec elapsed
Attachment: screenshot.jpg (image/jpeg, text), 1.19 MiB.
[6 Feb 2021 20:31]
Robert Roland
I added a screenshot of the lost connection error (I did not have a text log available). After having success with changing wait_timeout to 60000 I have continued having more success throughout the last 12 hours and have now been able to restore 1.6 TB across 480 tables and 5 schema dumps. I am still unsure why changing wait_timeout from default 8hrs to 60000s made the difference but I am relieved.
[6 Feb 2021 20:41]
Robert Roland
These are all of the server timeout settings that are currently working:

```
# set global wait_timeout=60000;
show variables like '%timeout';

Variable_name                      Value
-------------                      -----
connect_timeout                    10000
delayed_insert_timeout             10000
have_statement_timeout             YES
innodb_flush_log_at_timeout        1
innodb_lock_wait_timeout           50
innodb_rollback_on_timeout         OFF
interactive_timeout                60000
lock_wait_timeout                  31536000
mysqlx_connect_timeout             30
mysqlx_idle_worker_thread_timeout  60
mysqlx_interactive_timeout         28800
mysqlx_port_open_timeout           0
mysqlx_read_timeout                30
mysqlx_wait_timeout                28800
mysqlx_write_timeout               60
net_read_timeout                   360000
net_write_timeout                  360000
rpl_stop_slave_timeout             31536000
slave_net_timeout                  60
wait_timeout                       60000
```
[6 Feb 2021 22:01]
Robert Roland
screenshot2 - Error occurred again even after wait_timeout=60000
Attachment: screenshot2.jpg (image/jpeg, text), 1.40 MiB.
[6 Feb 2021 22:04]
Robert Roland
Unfortunately the error did come up again after some good success with wait_timeout=60000. I attached a screenshot that shows the error occurred on chunk 80 of the 34th table, after 21G of data had successfully loaded in 20 min.
[7 Feb 2021 20:39]
Robert Roland
I continued to experiment with timeout settings. I continued to see errors of this type:

```
MySQL Error 2013 (HY000): Lost connection to MySQL server during query: LOAD DATA LOCAL INFILE
```

Shell log excerpt:

```
2021-02-06 05:44:32: Info: Using credential store helper: c:\program files\mysql\mysql shell 8.0\bin\mysql-secret-store-windows-credential.exe
2021-02-06 05:44:32: Info: Setting Python home to 'c:\program files\mysql\mysql shell 8.0\lib\Python3.7'
2021-02-06 05:44:32: Info: Loading startup files...
2021-02-06 05:44:32: Info: Loading plugins...
2021-02-06 05:44:32: Info: Dump \\?\e:\dumps\soa is complete
2021-02-06 05:44:32: Info: Logging load progress to \\?\e:\dumps\soa\load-progress.0273c489-6824-11eb-a362-6e33b4eb80bf.json
---------------------------------------------------------------------
2021-02-06 05:44:58: Error: [Worker002] soa@ilec0215@@0.tsv.zst: MySQL Error 2013 (HY000): Lost connection to MySQL server during query: LOAD DATA LOCAL INFILE '\\\\?\\e:\\dumps\\soa\\soa@ilec0215@@0.tsv.zst' REPLACE INTO TABLE `soa`.`ilec0215` CHARACTER SET 'utf8mb4' FIELDS TERMINATED BY ' ' ESCAPED BY '\\' LINES STARTING BY '' TERMINATED BY '\n' (`ID`, `Observation_Year`, `Common_Company_Indicator`, `Preferred_Indicator`, `Gender`, `Smoker_Status`, `Insurance_Plan`, `Issue_Age`, `Duration`, `Attained_Age`, `Age_Basis`, `Face_Amount_Band`, `Issue_Year`, `Number_Of_Preferred_Classes`, `Preferred_Class`, `SOA_Anticipated_Level_Term_Period`, `SOA_Guaranteed_Level_Term_Period`, `SOA_Post_Level_Term_Indicator`, `Select_Ultimate_Indicator`, `Number_of_Deaths`, `Death_Claim_Amount`, `Policies_Exposed`, `Amount_Exposed`, `Expected_Death_QX7580E_by_Amount`, `Expected_Death_QX2001VBT_by_Amount`, `Expected_Death_QX2008VBT_by_Amount`, `Expected_Death_QX2008VBTLU_by_Amount`, `Expected_Death_QX2015VBT_by_Amount`, `Expected_Death_QX7580E_by_Policy`, `Expected_Death_QX2001VBT_by_Policy`, `Expected_Death_QX2008VBT_by_Policy`, `Expected_Death_QX2008VBTLU_by_Policy`, `Expected_Death_QX2015VBT_by_Policy`, `Contestable`, `Age_Group`)
2021-02-06 05:44:58: Error: Aborting load..
--------------------------------------------------------------------
```

I finally had some consistent success with a large schema load (schema dump characteristics: 339 chunks, 332M rows, 56.8G), which loaded without any warnings or errors in 1hr and 39min. The settings I used were:

```
SHOW SESSION VARIABLES LIKE "%timeout%";

Variable_name                      Value
-------------                      -----
connect_timeout                    99999
delayed_insert_timeout             10000
have_statement_timeout             YES
innodb_flush_log_at_timeout        1
innodb_lock_wait_timeout           50
innodb_rollback_on_timeout         OFF
interactive_timeout                99999
lock_wait_timeout                  31536000
mysqlx_connect_timeout             99999
mysqlx_idle_worker_thread_timeout  3600
mysqlx_interactive_timeout         99999
mysqlx_port_open_timeout           0
mysqlx_read_timeout                99999
mysqlx_wait_timeout                99999
mysqlx_write_timeout               99999
net_read_timeout                   360000
net_write_timeout                  360000
rpl_stop_slave_timeout             31536000
slave_net_timeout                  60
wait_timeout                       99999
```

I started the shell as follows:

```
mysqlsh.exe --mc --interactive --file restore.js
```

And "restore.js" was as follows:

```
\connect root@localhost:3306?connect-timeout=99999
shell.options.setPersist("dba.restartWaitTimeout", 99999)
shell.options.setPersist("dba.gtidWaitTimeout", 99999)
util.loadDump("e:/dumps/schemaname", {threads: 5, skipBinlog: true, analyzeTables: "on"})
```

I cannot pinpoint the exact reason why this worked. A few things I considered were:

1) Connecting using SSL vs. the classic MySQL protocol (perhaps a default timeout that I could not directly control was different for each type).
2) Setting timeouts to arbitrarily large values might be ignored if they were above a certain number, such as a smallint or a length of 5 digits (i.e. 99999 was the max timeout the server would recognize).
3) Since I start the shell using --interactive, I was sure to change the setting for interactive_timeout rather than wait_timeout.

My current thought is that connect_timeout and interactive_timeout (wait_timeout) are the most important. This already assumes that these other settings have been set (to high values) in my.ini:

```
[client]
local_infile=1
max_allowed_packet=1073741824
net_read_timeout=360000
net_write_timeout=360000
wait_timeout=99999
net_buffer_length=1048576

[mysqldump]
max_allowed_packet=1073741824
net_read_timeout=360000
net_write_timeout=360000
net_buffer_length=1048576

[mysql]
local-infile=1
max_allowed_packet=1073741824
net_read_timeout=360000
net_write_timeout=360000
```
[7 Feb 2021 21:21]
Robert Roland
One more thing to note: I found I could make it further through the loading of chunks if the server had just been restarted (and I then immediately set the global timeout settings as previously noted).
[8 Feb 2021 19:36]
Robert Roland
I am happy to report that after 7 days of effort I have over 95% of my instance recovered. I continue to have an intermittent util.loadDump thread that loses connection, but it is less frequent using the settings I described earlier. Since util.loadDump is able to resume where it left off after failing, you can eventually get the load to complete if you continually monitor the progress and resume upon failure.
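The monitor-and-resume cycle described above can be scripted. Here is a rough sketch: the loop simply re-invokes a command until it exits cleanly, relying on util.loadDump picking up from its load-progress file on each re-run; the mysqlsh command line at the bottom is a placeholder taken from this report, not a verified invocation.

```shell
# Rough sketch: re-run a command until it succeeds, mimicking the manual
# "resume upon failure" workflow. util.loadDump resumes from its
# load-progress file, so re-invoking it continues where it stopped.
retry_until_success() {
  attempts=1
  until "$@"; do
    echo "attempt $attempts failed; retrying in 30s..." >&2
    attempts=$((attempts + 1))
    sleep 30
  done
  echo "load completed after $attempts attempt(s)"
}

# Placeholder invocation (credentials and dump path from this report):
# retry_until_success mysqlsh --mc root@localhost:3306 -e \
#   'util.loadDump("e:/dumps/schemaname", {threads: 5, skipBinlog: true})'
```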
[9 Feb 2021 19:25]
Robert Roland
All of the instance data has been restored except for 1 large table which still has the intermittent MySQL Error 2013 (HY000): Lost connection to MySQL server during query: LOAD DATA LOCAL INFILE. Sometimes as many as 10 or 12 chunks get processed before the error occurs, and sometimes it is almost immediate as soon as LOAD DATA is invoked. All timeouts and packet size settings are as previously described. Here is the verbose shell log:

```
Copyright (c) 2018, 2021, Oracle and/or its affiliates. All rights reserved.
2021-02-09 19:16:58: Debug2: Exit code: 0
2021-02-09 19:16:58: Info: Using credential store helper: c:\program files\mysql\mysql shell 8.0\bin\mysql-secret-store-windows-credential.exe
2021-02-09 19:16:58: Info: Setting Python home to 'c:\program files\mysql\mysql shell 8.0\lib\Python3.7'
2021-02-09 19:16:58: Info: Loading startup files...
2021-02-09 19:16:58: Debug: Searching for plugins at 'c:\program files\mysql\mysql shell 8.0\lib\mysqlsh\plugins'
2021-02-09 19:16:58: Info: Loading plugins...
2021-02-09 19:16:58: Debug: - c:\program files\mysql\mysql shell 8.0\lib\mysqlsh\plugins\plugins_plugin\init.py
2021-02-09 19:16:59: Debug: The 'plugins' extension object has been registered as a global object.
2021-02-09 19:16:59: Debug: The 'info' function has been registered into the 'plugins' extension object.
2021-02-09 19:16:59: Debug: The 'version' function has been registered into the 'plugins' extension object.
2021-02-09 19:16:59: Debug: The 'about' function has been registered into the 'plugins' extension object.
2021-02-09 19:16:59: Debug: The 'list' function has been registered into the 'plugins' extension object.
2021-02-09 19:16:59: Debug: The 'install' function has been registered into the 'plugins' extension object.
2021-02-09 19:16:59: Debug: The 'uninstall' function has been registered into the 'plugins' extension object.
2021-02-09 19:16:59: Debug: The 'update' function has been registered into the 'plugins' extension object.
2021-02-09 19:16:59: Debug: The 'details' function has been registered into the 'plugins' extension object.
2021-02-09 19:16:59: Debug: Using color mode 3
2021-02-09 19:16:59: Debug: Using prompt theme file c:\program files\mysql\mysql shell 8.0\share\mysqlsh\prompt\prompt_256.json
2021-02-09 19:16:59: Debug2: Invoking helper
2021-02-09 19:16:59: Debug2: Command line: c:\program files\mysql\mysql shell 8.0\bin\mysql-secret-store-windows-credential.exe get
2021-02-09 19:16:59: Debug2: Input: {"SecretType":"password","ServerURL":"root@localhost:3306"}
2021-02-09 19:16:59: Debug2: Output: {"SecretType":"password","ServerURL":"root@localhost:3306","Secret":"****"}
2021-02-09 19:16:59: Debug2: Exit code: 0
2021-02-09 19:16:59: Debug: Loading DDL and Data from 'e:/dumps/MyDBase' using 4 threads.
2021-02-09 19:16:59: Debug: Opening dump...
2021-02-09 19:16:59: Info: Dump \\?\e:\dumps\MyDBase is complete
2021-02-09 19:16:59: Debug: Target is MySQL 8.0.23. Dump was produced from MySQL 8.0.23
2021-02-09 19:16:59: Info: Load progress file detected. Load will be resumed from where it was left, assuming no external updates were made.
2021-02-09 19:16:59: Debug: You may enable the 'resetProgress' option to discard progress for this MySQL instance and force it to be completely reloaded.
2021-02-09 19:16:59: Info: Resuming load, last loaded 0 bytes
2021-02-09 19:16:59: Debug: Scanning contents of schema 'MyDBase'
2021-02-09 19:16:59: Debug: MyDBase has 1 tables
2021-02-09 19:16:59: Debug: MyDBase has 0 views
2021-02-09 19:16:59: Debug: All metadata for schema `MyDBase` was scanned
2021-02-09 19:16:59: Debug: All metadata for dump was scanned
2021-02-09 19:16:59: Debug: Executing common preamble SQL
2021-02-09 19:16:59: Debug: Begin loading table DDL
2021-02-09 19:16:59: Debug2: Got event READY from worker 2
2021-02-09 19:16:59: Debug: Schema DDL for 'MyDBase' (DONE)
2021-02-09 19:16:59: Debug: Processing table DDL for `MyDBase`.`mytable` (placeholder=0)
2021-02-09 19:16:59: Debug2: Got event READY from worker 3
2021-02-09 19:16:59: Debug2: Got event READY from worker 0
2021-02-09 19:16:59: Debug2: Got event READY from worker 1
2021-02-09 19:16:59: Debug: worker2 will load DDL file \\?\e:\dumps\MyDBase\MyDBase@mytable.sql for table `MyDBase`.`mytable`
2021-02-09 19:16:59: Debug2: Got event TABLE_DDL_START from worker 2
2021-02-09 19:16:59: Debug: worker2 done
2021-02-09 19:16:59: Debug2: Got event TABLE_DDL_END from worker 2
2021-02-09 19:16:59: Debug: End loading table DDL
2021-02-09 19:16:59: Debug: Begin loading view DDL
2021-02-09 19:16:59: Debug2: Got event READY from worker 3
2021-02-09 19:16:59: Debug2: Got event READY from worker 0
2021-02-09 19:16:59: Debug2: Got event READY from worker 1
2021-02-09 19:16:59: Debug2: Got event READY from worker 2
2021-02-09 19:16:59: Debug: End loading view DDL
2021-02-09 19:16:59: Debug2: Got event READY from worker 3
2021-02-09 19:16:59: Debug: Table data for 'MyDBase'.'mytable'/-1 (INTERRUPTED)
2021-02-09 19:16:59: Debug: Scheduling chunk for table MyDBase.mytable (\\?\e:\dumps\MyDBase\MyDBase@mytable.tsv.zst) - worker3
2021-02-09 19:16:59: Debug: Loading data for `MyDBase`.`mytable` (chunk -1)
2021-02-09 19:16:59: Debug2: Got event READY from worker 0
2021-02-09 19:16:59: Debug: Analyzing table `MyDBase`.`mytable`
2021-02-09 19:16:59: Debug: worker3 will load chunk \\?\e:\dumps\MyDBase\MyDBase@mytable.tsv.zst for table `MyDBase`.`mytable`
2021-02-09 19:16:59: Debug2: Got event READY from worker 1
2021-02-09 19:16:59: Debug: worker0 will analyze table `MyDBase`.`mytable`
2021-02-09 19:16:59: Debug2: Got event READY from worker 2
2021-02-09 19:16:59: Debug2: Got event LOAD_START from worker 3
2021-02-09 19:16:59: Debug: Analyzing table `MyDBase`.`mytable`
2021-02-09 19:16:59: Debug2: Got event ANALYZE_START from worker 0
2021-02-09 19:16:59: Debug: worker0 done
2021-02-09 19:16:59: Debug2: Got event ANALYZE_END from worker 0
2021-02-09 19:17:02: Error: [Worker003] MyDBase@mytable.tsv.zst: MySQL Error 2013 (HY000): Lost connection to MySQL server during query: LOAD DATA LOCAL INFILE '\\\\?\\e:\\dumps\\MyDBase\\MyDBase@mytable.tsv.zst' REPLACE INTO TABLE `MyDBase`.`mytable` CHARACTER SET 'utf8mb4' FIELDS TERMINATED BY ' ' ESCAPED BY '\\' LINES STARTING BY '' TERMINATED BY '\n' (`ID`, `FULLNAME`, `wcount`, `num`, `num2`, `num3`, `num4`, `num5`, `num6`)
2021-02-09 19:17:02: Debug2: Got event FATAL_ERROR from worker 3
2021-02-09 19:17:02: Error: Aborting load...
2021-02-09 19:17:02: Debug: Import done
2021-02-09 19:17:02: Debug: Waiting on worker threads...
2021-02-09 19:17:02: Debug: All worker threads stopped
2021-02-09 19:17:02: Debug: There was no remaining data left to be loaded.
2021-02-09 19:17:02: Debug: 1 errors and 0 warnings messages were reported during the load.
2021-02-09 19:17:02: Debug: Util.loadDump: Error loading dump (RuntimeError)
```
[9 Feb 2021 21:29]
Robert Roland
As previously mentioned, 1 last table has not been able to complete restoring all chunks using util.loadDump and has been exiting fairly quickly after processing 0 to 3 chunks (rarely up to 10). I tried changing the shell connect option connect-timeout to 0:

```
\connect root@localhost:3306?connect-timeout=0
```

I completed 108 chunks (65M rows) before again receiving ERROR 2013 (Lost connection during query: LOAD DATA). It was an unexpectedly large amount of progress, which I thought important to report.
[9 Feb 2021 21:31]
Robert Roland
adjusted tags to include error 2013
[11 Feb 2021 19:03]
Robert Roland
I am still unable to fully restore 1 table due to the shell aborting the connection (Error 2013) after just a few seconds regardless of any settings previously mentioned. I have spent a good deal of time looking through much of the python code the shell uses without any obvious place where a hidden setting overrides the server or command line settings. It looks like the shell may be aborting rather than the server closing the connection. If anyone has any insight as to how to proceed (even 1 chunk at a time) I would be grateful for a recommendation that works.
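One chunk-at-a-time fallback worth trying is loading individual dump chunk files with util.importTable instead of util.loadDump, so a lost connection only costs a single chunk. This is a hypothetical sketch: the chunk file name follows the pattern seen in the logs above, and whether this shell version accepts the zstd-compressed chunk directly is an assumption on my part.

```shell
# Hypothetical fallback: load one dump chunk at a time with util.importTable
# so a lost connection only costs a single chunk. The path and names come
# from this report; zstd support in importTable is an assumption -- if it
# is not accepted, decompress the chunk first:
# zstd -d 'e:/dumps/MyDBase/MyDBase@mytable@@0.tsv.zst'
# mysqlsh --mc root@localhost:3306 -e \
#   'util.importTable("e:/dumps/MyDBase/MyDBase@mytable@@0.tsv",
#      {schema: "MyDBase", table: "mytable"})'
msg="sketch only - run against a live server"
echo "$msg"
```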
[14 Feb 2021 1:40]
Robert Roland
When I put the server under load by executing other update queries on a different connection, I was able to get over 320 more chunks added without the error. I then tried loading more chunks after the server became idle, and the loadDump failed right away. I then put the server under load again and was able to get over 150 more chunks before I got the error. I tried this back and forth several times yesterday, and the loadDump made it much, much farther if the server was under load. Maybe this points to a buffer/flow-rate issue.

The hardware for this server: Windows 10 x64, i9 9900K (16 logical cores at 5GHz), 64G memory (buffer pool 54G / 64 pool instances), 2 TB NVMe SSD RAID0. The iblog, undo logs, and ibdata are all located on 3 separate local SSDs, the connection for the shell is via localhost using the classic MySQL protocol, and the binlog is disabled. This should be adequate to digest a single-threaded LOAD DATA INFILE.
[27 Feb 2021 4:38]
MySQL Verification Team
Hi,

Just to inform you on the progress: I'm failing to reproduce this on a normal Linux server. I am testing on Windows now and I'm not seeing the problem here either, but I might be doing something wrong. I assume you are loading the dump you created with util.dumpSchema in bug 102485? So a single table, 10M rows? Can you share the table structure?

all best
Bogdan
[14 Mar 2021 20:39]
MySQL Verification Team
Hi,

I cannot reproduce this.