Bug #4382 | Replicated LOAD DATA INFILE blocks queries in other dbs on slave | ||
---|---|---|---|
Submitted: | 2 Jul 2004 15:20 | Modified: | 30 Aug 2004 21:20 |
Reporter: | [ name withheld ] | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 4.0.20 | OS: | Linux (RedHat Linux (MySQL rpms)) |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[2 Jul 2004 15:20]
[ name withheld ]
[2 Jul 2004 15:35]
[ name withheld ]
Sorry, ID 6 was not another query from the CGI. It was an update from the backend process to tables in the nagios db.
[6 Jul 2004 19:26]
[ name withheld ]
FWIW this problem also exists in version 4.0.16.
[7 Jul 2004 18:03]
Guilhem Bichot
Hi. To better study this bug (which I have not been able to reproduce yet), I would like to see the blocked query untruncated; you can get that with SHOW FULL PROCESSLIST. Could you please copy here the output of SHOW FULL PROCESSLIST at the moment the slave experiences the "system lock"? If this is confidential data, you can simply email it to guilhem@mysql.com. Thanks, Guilhem
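For example, from the shell on the slave, something along these lines should do (just a sketch; connection options such as -u/-p/-h are omitted):

  # Vertical output (\G) keeps the long Info values readable
  mysql -e "SHOW FULL PROCESSLIST\G"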
[7 Jul 2004 18:23]
[ name withheld ]
Ok. Since the LDIs in question are large and run overnight, I'll set something up to get the full processlist output from each of the affected servers when the LDIs run and get back to you.
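Roughly, I have something like this in mind (just a sketch; the connection options and the output path are placeholders, not necessarily what I'll actually use):

  #!/bin/sh
  # Dump the full process list once a minute while the overnight LDI runs.
  while true; do
      date
      mysql -e "SHOW FULL PROCESSLIST"
      sleep 60
  done >> /tmp/processlist.log 2>&1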
[7 Jul 2004 18:33]
[ name withheld ]
One other point that may be relevant: the table the LDI loads into is a very large MyISAM table with ~83.7 million rows. The .MYD file is 2.4GB and the .MYI is 8GB.
[8 Jul 2004 15:31]
[ name withheld ]
Below is a set of SHOW FULL PROCESSLIST outputs (taken once a minute) from one of the servers that experiences this. 8206 is the LDI. I suppose this shows that 50028 and 50036 are blocking on the stalled update of 6, but then the question becomes why 6 is stuck in an "end" state for so long while the LDI is occurring.

Thu Jul 8 05:42:00 EDT 2004
Id User Host db Command Time State Info
6 nagios park:55000 nagios Sleep 4 NULL
404 system user NULL Connect 1192697 Waiting for master to send event NULL
8206 system user mailer Connect 6220 NULL NULL
50021 root localhost NULL Query 0 NULL show full processlist

Thu Jul 8 05:43:00 EDT 2004
Id User Host db Command Time State Info
6 nagios park:55000 nagios Query 4 end UPDATE programstatus SET last_update=FROM_UNIXTIME(1089279776),program_start=FROM_UNIXTIME(1087647491),nagios_pid='2352',daemon_mode='1',last_command_check=FROM_UNIXTIME(1089279776),last_log_rotation=FROM_UNIXTIME(1089259200),enable_notifications='1',execute_service_checks='1',accept_passive_service_checks='1',enable_event_handlers='1',obsess_over_services='0',enable_flap_detection='0',enable_failure_prediction='1',process_performance_data='0'
404 system user NULL Connect 1192757 Waiting for master to send event NULL
8206 system user mailer Connect 6280 NULL NULL
50027 root localhost NULL Query 0 NULL show full processlist

Thu Jul 8 05:44:00 EDT 2004
Id User Host db Command Time State Info
6 nagios park:55000 nagios Query 64 end UPDATE programstatus SET last_update=FROM_UNIXTIME(1089279776),program_start=FROM_UNIXTIME(1087647491),nagios_pid='2352',daemon_mode='1',last_command_check=FROM_UNIXTIME(1089279776),last_log_rotation=FROM_UNIXTIME(1089259200),enable_notifications='1',execute_service_checks='1',accept_passive_service_checks='1',enable_event_handlers='1',obsess_over_services='0',enable_flap_detection='0',enable_failure_prediction='1',process_performance_data='0'
404 system user NULL Connect 1192817 Waiting for master to send event NULL
8206 system user mailer Connect 6340 NULL NULL
50028 nagios park:47964 nagios Query 28 System lock SELECT UNIX_TIMESTAMP(last_update) AS last_update, UNIX_TIMESTAMP(program_start) AS program_start, nagios_pid, daemon_mode, UNIX_TIMESTAMP(last_command_check) AS last_command_check, UNIX_TIMESTAMP(last_log_rotation) AS last_log_rotation, enable_notifications, execute_service_checks, accept_passive_service_checks, enable_event_handlers, obsess_over_services, enable_flap_detection, enable_failure_prediction, process_performance_data FROM programstatus
50030 root localhost NULL Query 0 NULL show full processlist

[snip]

Thu Jul 8 05:48:00 EDT 2004
Id User Host db Command Time State Info
6 nagios park:55000 nagios Query 304 end UPDATE programstatus SET last_update=FROM_UNIXTIME(1089279776),program_start=FROM_UNIXTIME(1087647491),nagios_pid='2352',daemon_mode='1',last_command_check=FROM_UNIXTIME(1089279776),last_log_rotation=FROM_UNIXTIME(1089259200),enable_notifications='1',execute_service_checks='1',accept_passive_service_checks='1',enable_event_handlers='1',obsess_over_services='0',enable_flap_detection='0',enable_failure_prediction='1',process_performance_data='0'
404 system user NULL Connect 1193057 Waiting for master to send event NULL
8206 system user mailer Connect 6580 NULL NULL
50028 nagios park:47964 nagios Query 268 System lock SELECT UNIX_TIMESTAMP(last_update) AS last_update, UNIX_TIMESTAMP(program_start) AS program_start, nagios_pid, daemon_mode, UNIX_TIMESTAMP(last_command_check) AS last_command_check, UNIX_TIMESTAMP(last_log_rotation) AS last_log_rotation, enable_notifications, execute_service_checks, accept_passive_service_checks, enable_event_handlers, obsess_over_services, enable_flap_detection, enable_failure_prediction, process_performance_data FROM programstatus
50036 nagios park:47965 nagios Query 16 System lock SELECT UNIX_TIMESTAMP(last_update) AS last_update, UNIX_TIMESTAMP(program_start) AS program_start, nagios_pid, daemon_mode, UNIX_TIMESTAMP(last_command_check) AS last_command_check, UNIX_TIMESTAMP(last_log_rotation) AS last_log_rotation, enable_notifications, execute_service_checks, accept_passive_service_checks, enable_event_handlers, obsess_over_services, enable_flap_detection, enable_failure_prediction, process_performance_data FROM programstatus
50037 root localhost NULL Query 0 NULL show full processlist

[this continues on for five more minutes with more selects backing up until]

Thu Jul 8 05:53:00 EDT 2004
Id User Host db Command Time State Info
6 nagios park:55000 nagios Sleep 4 NULL
404 system user NULL Connect 1193357 Waiting for master to send event NULL
8206 system user mailer Connect 6258 NULL NULL
50053 root localhost NULL Query 0 NULL show full processlist

This suggests that the LDI has changed from copying the temp file to loading, or some such; I'll know for sure once I've installed 4.0.21 with the patch you made for me in another bug report. :) At this point the blocked update completes, along with the selects that had queued up behind it. This is a fairly decent machine (2x1.8GHz Xeon with HyperThreading on, 2GB RAM, RedHat 7.3), so I wouldn't expect the LDI to bog the system down so much that updates on another database would block on it. Please let me know if I can provide any additional info. Thanks,
[8 Jul 2004 22:51]
Guilhem Bichot
Thanks for all this interesting info :) I'm looking at what could cause 'end' to last so long.
[8 Jul 2004 23:09]
Guilhem Bichot
Question: does the lock happen only on the slave, or do you see the same problem on the master? This question is only relevant if the master also has to process the LDI and an UPDATE on another table at the same time. Does the slave run with --log-bin and --log-slave-updates? Thank you.
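(If checking the startup options is inconvenient, something like this on the slave should show whichever of those settings the server exposes as variables; just a sketch, connection options omitted:)

  # List the logging-related server variables (log_bin, log_slave_updates, ...)
  mysql -e "SHOW VARIABLES LIKE 'log%'"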
[8 Jul 2004 23:30]
Guilhem Bichot
More questions. As it's not clear what the slave SQL thread is doing when it is blocking others and when it's not (due to the deficiency in SHOW PROCESSLIST which you already reported separately), I would need the same listing (SHOW FULL PROCESSLIST) as you just provided, but together with the output of SHOW SLAVE STATUS taken at the same moments. Then I may ask you to tell me which event in the master's binlog corresponds to the Exec_master_log_pos positions we'll see in SHOW SLAVE STATUS (you'll be able to tell by using mysqlbinlog -j<the_position> <the_master's_binlog> | head -n20). That way I will be able to know whether, when the LDI was blocking others, it was in Append_block or in Exec_load, which will give a hint. Thank you.
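Concretely, the sequence I have in mind is roughly the following (a sketch only; <the_position> and <the_master's_binlog> are placeholders for the values seen in SHOW SLAVE STATUS, and connection options are omitted):

  # On the slave, at the moment of the "system lock":
  mysql -e "SHOW FULL PROCESSLIST"
  mysql -e "SHOW SLAVE STATUS\G"
  # On the master, using the Exec_master_log_pos and binlog file name
  # reported by the SHOW SLAVE STATUS above:
  mysqlbinlog -j<the_position> <the_master's_binlog> | head -n20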
[8 Jul 2004 23:49]
[ name withheld ]
I'm not sure about the master; I'll set up the same processlist logging there to check the overnight LDI. It's possible that some updates are occurring on other dbs at the time, but it's not as regular as elsewhere.

It's probably a good idea to go into more detail on my environment at this point. I was hoping I wouldn't have to deal with trying to explain it in this ticket. :) In short:

mattapan: 4.0.16 running on 2x2.4GHz Xeon with HyperThreading, 4GB RAM, RedHat 9
central: 4.0.16 running on 2x2.4GHz Xeon with HyperThreading, 4GB RAM, RedHat 9
park: 4.0.20 running on 2x1.8GHz Xeon with HyperThreading, 2GB RAM, RedHat 7.3

mattapan is master for 2 dbs (including the db where the LDI in question happens) and slaves 4 dbs from central. It has log-bin on but log-slave-updates off.
central is master for 23 databases and slaves the 2 dbs from mattapan. It has both log-bin and log-slave-updates on.
park is master for 1 database (the nagios db) and slaves 23 dbs from central. It has log-bin on and log-slave-updates off.

I've updated the logging on central and park to include SHOW SLAVE STATUS with each corresponding SHOW FULL PROCESSLIST. Based on the next overnight occurrence of this problem, I'll include the mysqlbinlog output you requested in another update.

I should note that I think I'm also seeing this behavior on central, but it's been harder to pin down. We have occasional periods where reads time out in the same way, which we believe correspond to its replication of the LDI into the large db in question from mattapan. However, it's not clear that an update is occurring at the same time that might be stuck in an "end" state. Hopefully I can find one of those as well. As soon as I have the additional info I'll add another comment. Thanks
[30 Jul 2004 21:20]
Guilhem Bichot
Waiting for more info (SHOW FULL PROCESSLIST + SHOW SLAVE STATUS + binlogs at the moment of hang).
[14 Feb 2005 22:54]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".