Bug #103034: new undo log purge strategy purges records too slowly on a single table
Submitted: 18 Mar 2021 9:29    Modified: 18 Mar 2021 13:20
Reporter: Zongzhi Chen (OCA)   Email Updates:
Status: Verified               Impact on me: None
Category: MySQL Server: InnoDB storage engine   Severity: S5 (Performance)
Version: 8.0                   OS: Any
Assigned to:                   CPU Architecture: Any

[18 Mar 2021 9:29] Zongzhi Chen
Description:
Hello, guys

We found that the new undo log purge strategy purges records too slowly when the workload touches only a single table.

In MySQL 8.0, the undo log purge coordinator thread reads undo log records from the history list and assigns each record to a purge thread according to its table id. As a result, even with 8 purge threads, a workload that modifies only one table keeps just a single purge thread busy with the purge work.

In MySQL 5.6, by contrast, the purge coordinator thread distributed the records across the purge threads round-robin.
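
To make the difference concrete, here is a minimal, self-contained sketch of the two dispatch policies. The names (UndoRec, kPurgeThreads, the dispatch functions) are illustrative stand-ins, not the actual InnoDB symbols; the real dispatch logic lives in the purge coordinator (trx_purge in the stack below).

```
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Illustrative stand-ins for InnoDB's purge machinery; these are not
// the actual MySQL symbols.
struct UndoRec {
  std::uint64_t table_id;  // table the undo record belongs to
};

constexpr std::size_t kPurgeThreads = 8;

// MySQL 8.0-style dispatch: every record of a given table lands on the
// same worker, so a single-table workload keeps only one worker busy.
std::size_t dispatch_by_table_id(const UndoRec &rec) {
  return rec.table_id % kPurgeThreads;
}

// MySQL 5.6-style dispatch: records are spread round-robin, so all
// workers stay busy no matter how many tables are modified.
std::size_t dispatch_round_robin() {
  static std::size_t next = 0;
  return next++ % kPurgeThreads;
}

int main() {
  // Simulate a purge batch in which every record touches table 42.
  std::vector<UndoRec> batch(16, UndoRec{42});

  std::vector<int> by_table(kPurgeThreads, 0);
  std::vector<int> round_robin(kPurgeThreads, 0);
  for (const UndoRec &rec : batch) {
    ++by_table[dispatch_by_table_id(rec)];
    ++round_robin[dispatch_round_robin()];
  }

  for (std::size_t i = 0; i < kPurgeThreads; ++i)
    std::printf("worker %zu: by-table=%d round-robin=%d\n", i,
                by_table[i], round_robin[i]);
}
```

With the table-id policy all 16 records of the simulated single-table batch land on one worker, while round-robin gives each of the 8 workers two.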

I know that the new purge strategy performs better when the workload modifies multiple tables, because routing all records for a given table to one purge thread avoids lock contention. In the single-table case, however, purge performance is worse.
I suggest adding a new variable to control the strategy.

This is the stack trace; only one purge thread is working, even though innodb_purge_threads=4:

```
      1 nanosleep(libpthread.so.0),sleep_for<long,std::ratio<1,1000000>>(thread:373),trx_purge_wait_for_workers_to_complete(thread:373),trx_purge(thread:373),srv_do_purge(srv0srv.cc:3378),srv_purge_coordinator_thread(srv0srv.cc:3378),void(srv0srv.cc:3378),__invoke<void(srv0srv.cc:3378),__call<void>(srv0srv.cc:3378),operator()<>(srv0srv.cc:3378),operator()<void(srv0srv.cc:3378),__invoke_impl<void,(srv0srv.cc:3378),__invoke<Runnable,(srv0srv.cc:3378),_M_invoke<0,(srv0srv.cc:3378),operator()(srv0srv.cc:3378),std::thread::_State_impl<std::thread::_Invoker<std::tuple<Runnable,(srv0srv.cc:3378),execute_native_thread_routine,start_thread(libpthread.so.0),clone(libc.so.6)

      1 do_futex_wait(libpthread.so.0),__new_sem_wait_slow(libpthread.so.0),sem_timedwait(libpthread.so.0),pfs_iochnl_cli_wait(pfs_iochnl.cc:1126),pfs_polardev_wait_io,pfs_dev_wait_io,pfs_io_wait,pfsdev_do_io,pfs_blkio_execute,pfs_file_read,pfs_file_xpread,_pfs_pread,pfs_pread,SyncFileIO::execute(os0file.cc:2225),os_file_io(os0file.cc:5381),os_file_pread(os0file.cc:5552),os_file_read_page(os0file.cc:5552),os_file_read_func(os0file.cc:6049),os_aio_func(os0file.cc:7911),pfs_os_aio_func(os0file.ic:224),Fil_shard::do_io(os0file.ic:224),fil_io(fil0fil.cc:8832),buf_read_page_low(buf0rea.cc:143),buf_read_page(buf0rea.cc:351),buf_page_get_gen(buf0buf.cc:3601),btr_cur_search_to_nth_level(btr0cur.cc:982),btr_pcur_open_low(btr0pcur.ic:405),row_search_index_entry(btr0pcur.ic:405),row_purge_remove_sec_if_poss_leaf(row0purge.cc:454),row_purge_remove_sec_if_poss(row0purge.cc:555),row_purge_del_mark(row0purge.cc:555),row_purge_record_func(row0purge.cc:555),row_purge(row0purge.cc:555),row_purge_step(row0purge.cc:555),que_thr_step(que0que.cc:937),que_run_threads_low(que0que.cc:937),que_run_threads(que0que.cc:937),srv_task_execute(srv0srv.cc:3250),srv_worker_thread(srv0srv.cc:3250),void(srv0srv.cc:3250),__invoke<void(srv0srv.cc:3250),__call<void>(srv0srv.cc:3250),operator()<>(srv0srv.cc:3250),operator()<void(srv0srv.cc:3250),__invoke_impl<void,(srv0srv.cc:3250),__invoke<Runnable,(srv0srv.cc:3250),_M_invoke<0,(srv0srv.cc:3250),operator()(srv0srv.cc:3250),std::thread::_State_impl<std::thread::_Invoker<std::tuple<Runnable,(srv0srv.cc:3250),execute_native_thread_routine,start_thread(libpthread.so.0),clone(libc.so.6)
```

How to repeat:
Read the purge dispatch code, then run a sysbench write workload against a single table and observe (e.g. with a stack dump as above) that only one purge thread is active even with innodb_purge_threads > 1.

Suggested fix:
I suggest adding a new variable to control the dispatch strategy.
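
For illustration, the suggested variable could simply select between the two policies from the sketch in the description; the variable name below is hypothetical, not a real InnoDB setting.

```
// Hypothetical toggle, reusing UndoRec and the dispatch helpers from
// the sketch in the description; the name is invented for illustration.
static bool srv_purge_dispatch_round_robin = false;

std::size_t dispatch(const UndoRec &rec) {
  return srv_purge_dispatch_round_robin ? dispatch_round_robin()
                                        : dispatch_by_table_id(rec);
}
```
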
[18 Mar 2021 13:20] MySQL Verification Team
Hi Mr. Zongzhi,

Thank you very much for your performance improvement request.

We have carefully analysed your report and concluded that you are correct. This is a useful feature request that would improve performance.

We disagree on only one point: instead of adding a system variable, this can be solved with a simple condition test.
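
One possible reading of such a condition test, again reusing the illustrative helpers from the sketch in the description (this is an assumption, not the verification team's actual design):

```
// Hedged sketch of a "simple condition test": when the batch touches
// fewer distinct tables than there are purge threads, hashing by table
// id would leave workers idle, so fall back to round-robin.
std::size_t dispatch_adaptive(const UndoRec &rec,
                              std::size_t distinct_tables_in_batch) {
  if (distinct_tables_in_batch < kPurgeThreads)
    return dispatch_round_robin();   // keep all workers busy
  return dispatch_by_table_id(rec);  // avoid cross-thread contention
}
```
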

Verified as reported.