| Bug #93708 | Page Cleaner will sleep for long time if clock changes | ||
|---|---|---|---|
| Submitted: | 20 Dec 2018 20:10 | Modified: | 27 May 2019 13:25 |
| Reporter: | Marcelo Altmann (OCA) | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: InnoDB storage engine | Severity: | S3 (Non-critical) |
| Version: | 5.7.24, 8.0.13 | OS: | Linux |
| Assigned to: | CPU Architecture: | Any | |
[21 Dec 2018 1:25]
Marcelo Altmann
On further analysis, looks like os_event_wait_time_low->os_event::wait_time_low->ut_gettimeofday is returning previous saved time (in the future). This affects page cleaner in general, as it won't flush any pages while coordinator is in os_event::timed_wait->pthread_cond_timedwait() When talking about os_event_wait_time_low, looks like there are other 10 places where it can be potentially affected by the same issue: 1 buf0flu.cc pc_sleep_if_needed 2689 return(os_event_wait_time_low(buf_flush_event, 2 os0event.h os_event_wait_time 133 #define os_event_wait_time(e, t) os_event_wait_time_low((e), (t), 0) 3 lock0wait.cc DECLARE_THREAD 501 os_event_wait_time_low(event, 1000000, sig_count); 4 os0event.cc os_event_wait_time_low 507 os_event_wait_time_low( 5 row0merge.cc row_merge_read_clustered_index 2574 os_event_wait_time_low(fts_parallel_sort_event, 6 row0merge.cc row_merge_build_indexes 4473 os_event_wait_time_low( 7 srv0srv.cc DECLARE_THREAD 1585 os_event_wait_time_low(srv_monitor_event, 5000000, sig_count); 8 srv0srv.cc DECLARE_THREAD 1751 os_event_wait_time_low(srv_error_event, 1000000, sig_count); 9 srv0srv.cc srv_purge_coordinator_suspend 2679 ret = os_event_wait_time_low( a ut0wqueue.cc ib_wqueue_timedwait 158 error = os_event_wait_time_low(wq->event,
[21 Dec 2018 13:14]
MySQL Verification Team
Hi, I have managed to repeat this behaviour. The only missing step in the How-To-Repeat paragraph is that massive DML have to be run during changing of the system clock. Actually, there are 6 (six) steps needed to definitely repeat the behaviour: 1. Start MySQL 2. Run massive number of DMLs on some large table 3. During those DMLs change time: date $(date +%m%d%H%M%Y.%S -d '50 min ago') 4. After those DMLs, also change time: date $(date +%m%d%H%M%Y.%S -d '50 min ago') 5. Stop MySQL 6. Read the error log However, this is not a bug, because changing the system time should not be done during running of the server. However, this needs to be documented, hence I change this to the "Documentation" bug.
[21 Dec 2018 13:19]
Laurynas Biveinis
This is a bug - cleaner coordinator code should use monotonic clock
[21 Dec 2018 13:23]
MySQL Verification Team
Laurinas, I agree with you. Cleaner coordinator should use a monotonic clock. But, this makes it into a feature request, so I set it to the feature request.
[21 Dec 2018 13:58]
Marcelo Altmann
This can be repeated with the 3 steps I mentioned. Time passed to os_event::timed_wait is based on before the time change, doesn't matter if you have pages to flush or not. > However, this is not a bug, because changing the system time should not be done during running of the server. With all due respect, I don't agree with the above sentence. What about daylight saving and NTP?
[21 Dec 2018 14:08]
Jeff Leuschner
We have upgraded to 5.7.24 and have numerous customers and are hitting this at multiple sites. This is critical to us and our customers, so looking for a fix or a work-around in short order if possible.
[21 Dec 2018 17:49]
MySQL Verification Team
Hello Mr. Leschner, If you wish to escalate this problem, you or anyone of your customers should create the appropriate ticket in our support system.
[21 Dec 2018 18:07]
Marcelo Altmann
This issue doesn't seem like a feature request. It's rather a BUG. clock can drift for various reasons and a simple correction made by NTP can freeze InnoDB page_cleaner for as many seconds/minutes the clock was adjusted.
[25 Dec 2018 18:34]
MySQL Verification Team
Back to original severity reported.
[27 May 2019 13:25]
Daniel Price
Posted by developer: Fixed as of the upcoming 5.7.27, 8.0.17 release, and here's the changelog entry: Manually changing the system time while the MySQL server was running caused page cleaner thread delays. Thank you for the bug report.
[27 May 2019 13:34]
MySQL Verification Team
Thanks for the feedback, Daniel ........

Description: Page cleaner will sleep for a long time in case you adjust your clock backward, for example, if ntp adjusts the clock or when you exit summer time. Error log: 2018-12-20T15:26:28.981196Z 0 [Note] InnoDB: Dumping buffer pool(s) to /var/lib/mysql/ib_buffer_pool 2018-12-20T15:26:28.981951Z 0 [Note] InnoDB: Buffer pool(s) dump completed at 181220 15:26:28 2018-12-20T15:26:29.181297Z 0 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool 2018-12-20T15:27:29.412452Z 0 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool 2018-12-20T15:28:29.636237Z 0 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool 2018-12-20T15:29:31.497178Z 0 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool 2018-12-20T15:30:34.856757Z 0 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool 2018-12-20T15:31:38.337117Z 0 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool 2018-12-20T15:32:41.654854Z 0 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool 2018-12-20T15:33:45.085653Z 0 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool 2018-12-20T15:34:48.594154Z 0 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool 2018-12-20T15:35:52.294568Z 0 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool 2018-12-20T15:36:55.932855Z 0 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool 2018-12-20T15:37:59.590748Z 0 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool 2018-12-20T15:39:03.387422Z 0 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool 2018-12-20T15:40:07.109075Z 0 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool 2018-12-20T15:41:10.742600Z 0 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool 2018-12-20T15:42:14.363348Z 0 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool 2018-12-20T15:43:18.191026Z 0 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool 2018-12-20T15:44:21.817929Z 0 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool 2018-12-20T15:45:25.487459Z 0 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool Thread stack: [Switching to thread 3 (Thread 0x7f1dd2897700 (LWP 3570))] #0 0x00007f1decbb8d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 (gdb) bt #0 0x00007f1decbb8d42 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x0000000000fce162 in os_event::timed_wait (this=this@entry=0x393e528, abstime=abstime@entry=0x7f1dd2896b70) at /export/home/pb2/build/sb_0-30854123-1538633024.42/rpm/BUILD/mysql-5.7.24/mysql-5.7.24/storage/innobase/os/os0event.cc:285 #2 0x0000000000fce391 in os_event::wait_time_low (this=0x393e528, time_in_usec=<optimized out>, reset_sig_count=<optimized out>, reset_sig_count@entry=2) at /export/home/pb2/build/sb_0-30854123-1538633024.42/rpm/BUILD/mysql-5.7.24/mysql-5.7.24/storage/innobase/os/os0event.cc:412 #3 0x0000000000fce5fa in os_event_wait_time_low (event=<optimized out>, time_in_usec=<optimized out>, reset_sig_count=reset_sig_count@entry=2) at /export/home/pb2/build/sb_0-30854123-1538633024.42/rpm/BUILD/mysql-5.7.24/mysql-5.7.24/storage/innobase/os/os0event.cc:517 #4 0x000000000113b19f in pc_sleep_if_needed (sig_count=2, next_loop_time=1545320288739) at /export/home/pb2/build/sb_0-30854123-1538633024.42/rpm/BUILD/mysql-5.7.24/mysql-5.7.24/storage/innobase/buf/buf0flu.cc:2690 #5 buf_flush_page_cleaner_coordinator (arg=<optimized out>) at /export/home/pb2/build/sb_0-30854123-1538633024.42/rpm/BUILD/mysql-5.7.24/mysql-5.7.24/storage/innobase/buf/buf0flu.cc:3191 #6 0x00007f1decbb4e25 in start_thread () from /lib64/libpthread.so.0 #7 0x00007f1deb66ebad in clone () from /lib64/libc.so.6 If clock changed after buf0flu.cc where we set next_loop_time = curr_time + 1000 the time passed to pc_sleep_if_needed will be too far in the future, which will cause a long delay on shutdown. How to repeat: 1. Start MySQL 2. Change time: date $(date +%m%d%H%M%Y.%S -d '50 min ago') 3. Stop MySQL Suggested fix: As we always set next_loop_time to be 1 second, on buf0flu.cc at function pc_sleep_if_needed we should also check if the diff between (next_loop_time - cur_time) is not greater than 1 second.