Description:
1. The InnoDB buffer pool is almost entirely filled with 4KB compressed pages
2. Rows are being inserted into 16KB (uncompressed/compact) tables
When both conditions are met, pages_free for the 8KB page size increases dramatically.
mysql> select * from innodb_cmpmem;
+-----------+----------------------+------------+------------+----------------+-----------------+
| page_size | buffer_pool_instance | pages_used | pages_free | relocation_ops | relocation_time |
+-----------+----------------------+------------+------------+----------------+-----------------+
| 1024 | 0 | 0 | 0 | 0 | 0 |
| 2048 | 0 | 0 | 0 | 0 | 0 |
| 4096 | 0 | 453427 | 1 | 606502 | 0 |
| 8192 | 0 | 0 | 62170 | 0 | 0 |
| 16384 | 0 | 0 | 0 | 0 | 0 |
....
This is a serious problem. A very long zip_free list causes both stalls and slower inserts. When the problem occurred, one user thread spent a long time in buf_buddy_free_low(), and buffer_pool_mutex was held for the whole call. This blocks almost all other operations for a long time, including transaction commit (which locks log_sys and buffer_pool_mutex). CPU utilization also dropped to about 5%.
Here is an example stack trace.
buf_buddy_free_low:buf=0x7f13ea86a000,,buf_buddy_free,buf_LRU_block_remove_hashed_page:zip=1),buf_LRU_free_block,buf_flush_LRU_list_batch,buf_do_LRU_batch:out>,,buf_flush_batch:flush_type<optimized,page_cleaner_flush_LRU_tail,buf_flush_page_cleaner_thread:out>),start_thread,clone
#0 buf_buddy_free_low (buf_pool=0x1404530, buf=0x7f13ea86a000, i=3) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/storage/innobase/buf/buf0buddy.cc:482
#1 0x0000000000a6d773 in buf_buddy_free (size=<optimized out>, buf=0xffffff00, buf_pool=<optimized out>) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/storage/innobase/include/buf0buddy.ic:137
#2 buf_LRU_block_remove_hashed_page (bpage=0x7f076e07aa50, zip=1) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/storage/innobase/buf/buf0lru.cc:2283
#3 0x0000000000a6ed0b in buf_LRU_free_block (bpage=0x7f076e07aa50, zip=1) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/storage/innobase/buf/buf0lru.cc:1855
#4 0x0000000000a6a78d in buf_flush_LRU_list_batch (max=100, buf_pool=<optimized out>) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/storage/innobase/buf/buf0flu.cc:1453
#5 buf_do_LRU_batch (max=<optimized out>, buf_pool=<optimized out>) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/storage/innobase/buf/buf0flu.cc:1514
#6 buf_flush_batch (buf_pool=0x1404530, flush_type=<optimized out>, min_n=100, lsn_limit=0) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/storage/innobase/buf/buf0flu.cc:1667
#7 0x0000000000a6bd07 in page_cleaner_flush_LRU_tail () at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/storage/innobase/buf/buf0flu.cc:1830
#8 buf_flush_page_cleaner_thread (arg=<optimized out>) at /export/home/pb2/build/sb_0-7655600-1353595193.21/mysql-5.6.9-rc/storage/innobase/buf/buf0flu.cc:2372
#9 0x0000003eef0062f7 in start_thread () from /lib64/libpthread.so.0
#10 0x0000003eee4d1e3d in clone () from /lib64/libc.so.6
How to repeat:
innodb options:
innodb_file_format=Barracuda
innodb_file_per_table=1
innodb_log_compressed_pages=0
innodb_flush_neighbors=0
innodb_buffer_pool_size=54G
innodb_log_file_size=2000M
innodb_flush_method=O_DIRECT
innodb_thread_concurrency=256
thread_cache_size=2000
innodb_flush_log_at_trx_commit=0
Fill the buffer pool with 4KB compressed pages:
- Create databases db1 .. db50
- For each database, create a 4KB compressed table and insert many rows
(until LRU len : unzip_LRU len == 10:1)
- Stop inserting into the 4KB compressed tables, then do the same thing for 16KB compact tables.
- Run select * from information_schema.innodb_cmpmem and watch pages_free increase. Also check that the insert rate (innodb_rows_inserted) drops.
Suggested fix:
a) I do not understand why pages_free for 8KB increased even though I did not use 8KB pages at all. If the 8KB free list stayed short, this problem would not happen.
b) Around buf0buddy.cc:482:
---
for (bpage = UT_LIST_GET_FIRST(buf_pool->zip_free[i]); bpage; ) {
...
bpage = UT_LIST_GET_NEXT(list, bpage);
}
---
This loop is O(N) in the length of the free list. Using a tree (O(log N)) instead of a list would mitigate the problem.
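To illustrate suggestion b), here is a minimal sketch of a free-block index that finds a block's buddy with an ordered-set lookup instead of a list scan. It is not InnoDB code: the class BuddyFreeIndex, its methods, and the use of plain offsets instead of buf_page_t pointers are all hypothetical. The size classes are assumed to run from 1KB (i=0) to 16KB (i=4), so that i=3 is the 8KB class as in the stack trace above.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for buf_pool->zip_free[]: one ordered set of free
// block offsets per size class. Class i holds blocks of MIN_SIZE << i bytes.
class BuddyFreeIndex {
public:
    static constexpr std::size_t MIN_SIZE = 1024;  // assumed smallest block (1KB)
    static constexpr std::size_t N_CLASSES = 5;    // 1KB, 2KB, 4KB, 8KB, 16KB

    BuddyFreeIndex() : free_(N_CLASSES) {}

    // Free the block at `offset` (aligned to its size) in size class `i`.
    // While the buddy is also free, merge with it and move up one class.
    // Returns the size class the block finally lands in.
    std::size_t free_block(std::uintptr_t offset, std::size_t i) {
        while (i + 1 < N_CLASSES) {
            const std::uintptr_t size = MIN_SIZE << i;
            const std::uintptr_t buddy = offset ^ size;  // buddy differs in one bit
            auto it = free_[i].find(buddy);              // O(log N) lookup, no list scan
            if (it == free_[i].end()) {
                break;                                   // buddy is in use: stop merging
            }
            free_[i].erase(it);
            offset &= ~size;                             // merged block starts at the lower half
            ++i;
        }
        free_[i].insert(offset);
        return i;
    }

    // Number of free blocks in size class `i` (compare pages_free in innodb_cmpmem).
    std::size_t pages_free(std::size_t i) const { return free_[i].size(); }

private:
    std::vector<std::set<std::uintptr_t>> free_;
};
```

With this layout, each buddy lookup costs O(log N) in the number of free blocks of that size, so a free list with tens of thousands of entries would no longer dominate the time spent under the mutex. A hash set would give expected O(1) lookups; the ordered set is used here only because the suggestion above names a tree.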