Bug #30738 | Concurrent disk sorts from InnoDB tables are much slower in MySQL 5 | | |
---|---|---|---|
Submitted: | 31 Aug 2007 3:36 | Modified: | 19 Nov 2011 9:50 |
Reporter: | Mark Callaghan | Email Updates: | |
Status: | No Feedback | Impact on me: | |
Category: | MySQL Server: InnoDB storage engine | Severity: | S5 (Performance) |
Version: | 5.0.37 | OS: | Any |
Assigned to: | Assigned Account | CPU Architecture: | Any |
Tags: | concurrent, disk, innodb, regression, slow, sort |
[31 Aug 2007 3:36]
Mark Callaghan
[31 Aug 2007 3:38]
Mark Callaghan
vmstat output for MySQL 4.0.26. Note that CPU utilization remains at 80%. I cut off the output at the time at which the on-disk temp table is full and the sort starts.

```
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b swpd  free  buff    cache   si so  bi     bo    in    cs us sy id wa
 9  0   56 47160 97520 14167996    0  0 497   1501    68    25 43  8 46  3
 5  0   56 45048 74804 14114144    0  0 174     16  1032 13045 84 11  4  0
 9  0   56 47440 69016 14110616    0  0  92    628  1045 14410 86  9  4  0
 9  3   56 45304 55888 14108036    0  0 110 188246  2018 14476 82 12  5  1
 7  4   56 45040 52832 14106808    0  0  82 198704  1887 12767 81 12  2  5
 8  2   56 43928 50800 14111356    0  0  30 160129  1975 14323 82 11  3  4
 8  1   56 46496 47868 14104768    0  0  86 209886  2213 16641 82 12  4  2
 4  1   56 42032 33940 14121212    0  0  76 203277  2153 18678 81 12  4  3
 6  2   56 45952 28588 14122144    0  0 149 135128  1974 19209 81 12  4  4
 6  2   56 41520 24656 14129680    0  0  88 146142  1933 17340 80 12  4  4
14  1   56 45656 21004 14130476    0  0 100 154403  1935 13303 82 11  3  4
 8  0   56 42176 21572 14134260    0  0  84  49999  1508 13391 85 10  4  1
 9  0   56 42824 22200 14132884    0  0  82 159710  2125 17685 82 12  5  1
 8  0   56 42880 22676 14132476    0  0 121 160562  2085 15818 82 12  5  1
 5  2   56 44872 22092 14130544    0  0 112 182780  2202 14865 81 12  4  4
16  2   56 39000 22048 14135824    0  0 186 153669  2017 14228 80 11  3  6
 8  2   56 41664 21756 14134416    0  0 296 141492  2006 16417 82 11  2  4
 7  1   56 41616 21816 14134628    0  0 160 150432  2024 15184 82 11  5  2
 4  1   56 45456 21748 14130616    0  0  84 163190  2082 14450 83 11  5  1
 3  1   56 45392 22024 14130136    0  0 114 154883  2002 15815 80 11  5  4
 7  3   56 46984 22564 14127964    0  0 134 145262  1960 16131 80 11  3  5
10  0   56 41816 22676 14131116    0  0  49 194023  2165 15921 81 12  3  3
 8  1   56 47408 23024 14126280    0  0  73 175178  2107 13564 82 12  4  2
12  1   56 46632 22876 14125816    0  0 143 196120  2065 13970 82 12  4  3
10  1   56 42768 22832 14130756    0  0 137 155012  2080 14929 82 12  4  2
```
[31 Aug 2007 3:39]
Mark Callaghan
vmstat output for MySQL 5.0.37. Note that CPU utilization quickly falls off.

```
vmstat 10 100
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b swpd     free  buff    cache   si so    bi     bo   in   cs us sy id wa
 0  2   56 14207884 48436  1224388    0  0   182   1264  107   44 17  1 79  3
10  1   56 13639396 52024  1177960    0  0  5608   1462 1674 6518 24  6 50 20
10  5   56  9704476 55816  5022356    0  0  1343 104943 5092 2181 83 16  0  0
11  6   56  5873204 59484  8768208    0  0   264 171564 2470 2203 83 16  0  1
14  3   56  1984940 63188 12562576    0  0   224  95187 2525 3110 84 15  0  0
19  5   56    47396 35604 14478452    0  0   345  94204 2496 3673 82 17  0  0
 8  9   56    35396 37768 14487984    0  0   291  91910 2635 3708 79 21  0  1
 0 14   56    36228  6024 14514764    0  0   436 105907 2690 6018 30 14  0 56
 0 14   56    45788  6744 14504796    0  0   554 110822 2698 7916 21  9  0 70
 3 13   56    35372  7560 14509488    0  0   619 128301 2467 6258 24  9  1 66
13  2   56    45788  8824 14495780    0  0   351 202230 2269 4301 35 13  0 52
17  6   56    38108 10084 14504244    0  0   362 188190 2233 2419 35 13  1 51
 0 16   60    35420 11420 14510248    0  0   698 162536 2356 3342 31 12  0 56
 7 12   60    35716 12980 14508008    0  0   572 110507 2554 5145 24 10  0 65
 1 14   60    35292 14280 14502968    0  0   528 105590 2504 4805 22  9  0 68
 2 13   60    35228 15484 14498364    0  0   955 110892 2552 4757 20  8  0 71
 0 13   60    35420 16644 14491696    0  0   869  97495 2375 4620 19  8  1 73
 9  6   60    43868 17860 14477696    0  0   452 115229 2612 5205 23  9  1 66
18  7   60    45860 10264 14479308    0  0   388 113643 2690 3681 60 17  1 22
 0 17   60    35316  8252 14459156    0  0  3402 133678 2710 2617 33 11  0 56
 0 18   60    35452  8864 14429916    0  0 36264 130934 2679 3868 19  9  0 72
```
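The two dumps above can be compared mechanically. A minimal sketch (mine, not part of the report) that parses raw vmstat rows and averages the CPU columns; the column names follow vmstat's own header:

```python
# Parse vmstat 10-second samples like the dumps above and average CPU
# columns, to quantify "CPU utilization quickly falls off".
# Column order matches vmstat's header:
# r b swpd free buff cache si so bi bo in cs us sy id wa
VMSTAT_COLS = ["r", "b", "swpd", "free", "buff", "cache",
               "si", "so", "bi", "bo", "in", "cs",
               "us", "sy", "id", "wa"]

def parse_vmstat(lines):
    """Return one dict per sample row, skipping vmstat's repeated headers."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) == len(VMSTAT_COLS) and fields[0].isdigit():
            rows.append(dict(zip(VMSTAT_COLS, map(int, fields))))
    return rows

def mean_cpu(rows, col):
    """Average one CPU column ("us", "sy", "id", or "wa") over the samples."""
    return sum(r[col] for r in rows) / len(rows)
```

Feeding each dump through this shows user time holding near 80% for 4.0.26 while collapsing (with idle and iowait rising) for 5.0.37.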
[31 Aug 2007 3:45]
Mark Callaghan
We don't normally run 40 concurrent queries on a 4-core server with innodb_thread_concurrency=8. But query pileups do happen because of query plans that go bad -- usually because of changes in stats sampled by InnoDB. The problem is that performance for MySQL5 is much worse than for MySQL4 in this case. The problem with my dataset, which I cannot share, is much worse than with the testcase above: CPU utilization drops to 15%, IO/second drops to 50, and 30+ queries are active for MySQL5, while MySQL4 does not slow down. And when I change the testcase to keep all temp tables in memory, the problem is reversed -- MySQL5 is several times faster (the sync array changes in InnoDB might explain this). I have used OProfile and Google Perftools to get profiles, but a profile shows where time is spent, not where threads are blocking. I need to know where threads are blocking. InnoDB has some code to report on wait events for InnoDB mutexes. Does MySQL have anything similar for any blocking that might occur while writing to an on-disk temp table?
[31 Aug 2007 4:04]
Mark Callaghan
The testcase might not work for you. For now I will be happy if my questions get answered as I will try to debug this.
[31 Aug 2007 4:12]
Mark Callaghan
On my test server, 20 concurrent queries will reproduce the problem. The server has 4 CPU cores and 12 disks.
[31 Aug 2007 4:16]
Valeriy Kravchuk
Mark, please send/upload the content of t1s.sql.
[31 Aug 2007 4:19]
Mark Callaghan
My test servers use Linux 2.6 with NPTL although MySQL 4.0.26 is also faster than MySQL5 when it uses LinuxThreads.
[31 Aug 2007 14:18]
Valeriy Kravchuk
Mark, please send the exact statements you used in your t1s.sql script. Send also the results of show global variables like 'char%'; from your 5.0.37 testing environment.
[31 Aug 2007 14:30]
Mark Callaghan
The contents of t1s.sql:

```sql
select straight_join *
from t2048 a, t2048 b, t2048 c, t2048 d, t2048 e, t2048 f, t2048 g, t2048 h, t2048 i
where a.j < 6 and b.j < 6 and c.j < 6 and d.j < 6 and e.j < 6
  and f.j < 6 and g.j < 6 and h.j < 6 and i.j < 6
  and length(a.c) = 50 and length(b.c) = 50 and length(c.c) = 50
  and length(d.c) = 50 and length(e.c) = 50 and length(f.c) = 50
  and length(g.c) = 50 and length(h.c) = 50 and length(i.c) = 50
order by a.c, a.i
limit 1;
```
[31 Aug 2007 14:33]
Mark Callaghan
For MySQL 4.0.26:

```
mysql -e 'show global variables like "char%"'
Variable_name    Value
character_set    latin1
character_sets   latin1 big5 czech euc_kr gb2312 gbk latin1_de sjis tis620 ujis dec8 dos german1 hp8 koi8_ru latin2 swe7 usa7 cp1251 danish hebrew win1251 estonia hungarian koi8_ukr win1251ukr greek win1250 croat cp1257 latin5
```

For MySQL 5.0.37:

```
Variable_name              Value
character_set_client       latin1
character_set_connection   latin1
character_set_database     latin1
character_set_filesystem   binary
character_set_results      latin1
character_set_server       latin1
character_set_system       utf8
character_sets_dir         /usr/share/mysql/charsets/
```
[31 Aug 2007 16:55]
Heikki Tuuri
Mark, what does SHOW INNODB STATUS\G print during the slow operation? The temp tables are MyISAM type. Something may have changed also in MySQL/MyISAM. Regards, Heikki
[31 Aug 2007 17:25]
Mark Callaghan
I am running my original testcase again as the one I posted is flaky. I temporarily turned off mutex stats to see if the extra cache activity from updating shared variables was part of the problem. From memory, there was more mutex activity per second on MySQL4 (more usage, more OS waits, more rounds), and this is reasonable because queries were running faster on MySQL4. Thread states are much more likely to be *waiting in InnoDB queue* for MySQL5 and *sleeping before joining InnoDB queue* for MySQL4. This is vmstat output for MySQL5 when the slowdown occurs. Note that the CPU is pinned for some time (while queries are writing to in-memory temp tables) and the drop in CPU utilization occurs when queries begin to spill to disk. Is ha_release_temporary_latches called?

* when an in-memory temp table is spilled to disk (this can run long)
* when rows are written to a disk-based temp table

```
 9 0 56 11914172 75324 2155484 0 0 134 174 1120  1482 99 0  1 0
 6 0 56 11911804 75348 2156548 0 0 108  74 1043  1375 99 0  1 0
10 0 56 11909740 75372 2157612 0 0  89  63 1058  1505 99 0  1 0
12 0 56 11905644 75404 2159824 0 0 163 185 1136  1591 99 0  1 0
 9 0 56 11903052 75432 2161428 0 0 114  81 1161  1685 99 0  1 0
11 0 56 11901004 75460 2162488 0 0 105  72 1041  1372 99 0  1 0
 9 1 56 11898324 75520 2164060 0 0 148 170 1051  1405 99 0  1 0
14 0 56 11887540 75568 2165644 0 0 109 377 1052  1446 98 1  1 0
 9 0 56 11886764 75600 2167516 0 0  88 376 1050  1448 99 0  1 0
23 1 56 11881740 75668 2169012 0 0 143 175 1097  1536 99 0  1 0
 0 0 56 11847804 75816 2172060 0 0 268  84 1147 11069 81 2 17 0
 0 0 56 11820052 75860 2173376 0 0 114  54 1082  8882 36 1 62 0
 9 0 56 11800732 75968 2175308 0 0 169 178 1118  6806 22 1 77 1
 0 0 56 11770588 76024 2176952 0 0 132  53 1116 13083 36 2 62 1
 0 0 56 11752220 76080 2178392 0 0 121  10 1130  8352 22 1 77 1
 0 0 56 11723900 76156 2181104 0 0 234 485 1133 12710 31 2 67 0
```
[31 Aug 2007 17:27]
Mark Callaghan
Top 20 functions in MySQL5 from OProfile once the slowdown has started:

```
samples  %       app name             symbol name
2639899  9.9375  libpthread-2.3.5.so  pthread_mutex_trylock
2221655  8.3631  libpthread-2.3.5.so  pthread_mutex_unlock
1954730  7.3583  mysqld               row_search_for_mysql
1665278  6.2687  mysqld               btr_search_guess_on_hash
1508263  5.6777  libpthread-2.3.5.so  __pthread_mutex_unlock_usercnt
1492409  5.6180  mysqld               cmp_dtuple_rec_with_match
1455102  5.4775  mysqld               rec_get_offsets_func
 825131  3.1061  mysqld               btr_cur_search_to_nth_level
 810887  3.0525  libc-2.3.5.so        memcpy
 744389  2.8022  mysqld               buf_page_get_gen
 597741  2.2501  mysqld               ut_delay
 516241  1.9433  mysqld               mutex_spin_wait
 484341  1.8232  mysqld               row_sel_store_mysql_rec
 413573  1.5568  mysqld               page_cur_search_with_match
 404812  1.5239  mysqld               ha_innobase::general_fetch(char*, unsigned int, unsigned int)
 380421  1.4320  mysqld               btr_search_info_update_slow
 347627  1.3086  mysqld               mtr_memo_slot_release
```

Results for MySQL4 from a similar point in the benchmark:

```
samples   %        app name             symbol name
26002487  39.5086  libpthread-2.3.5.so  pthread_mutex_trylock
 8627212  13.1083  mysqld               mutex_spin_wait
 7681806  11.6719  libpthread-2.3.5.so  __pthread_mutex_unlock_usercnt
 2041599   3.1020  mysqld               .plt
 1564911   2.3777  mysqld               mtr_memo_slot_release
 1378054   2.0938  mysqld               btr_search_guess_on_hash
 1340988   2.0375  mysqld               row_search_for_mysql
 1293789   1.9658  libpthread-2.3.5.so  pthread_mutex_unlock
 1278852   1.9431  mysqld               cmp_dtuple_rec_with_match
 1087260   1.6520  mysqld               ha_innobase::general_fetch(char*, unsigned int, unsigned int)
  878946   1.3355  mysqld               buf_page_optimistic_get_func
  792475   1.2041  mysqld               mutex_exit
  744995   1.1320  mysqld               page_dir_find_owner_slot
  731574   1.1116  mysqld               buf_page_get_gen
  701585   1.0660  mysqld               btr_cur_search_to_nth_level
  632754   0.9614  mysqld               row_sel_store_mysql_rec
  595511   0.9048  libc-2.3.5.so        memcpy
```
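A hedged sketch for comparing the two profiles above: parse oprofile's `samples % image symbol` rows (symbol names can contain spaces, so only the first three fields are split) and total the share of mutex- and spin-related symbols:

```python
# Parse oprofile summary rows like the tables above and sum the
# percentage attributed to symbols matching given keywords.
def parse_oprofile(lines):
    """Return (samples, pct, image, symbol) tuples; skip header lines."""
    rows = []
    for line in lines:
        parts = line.split(None, 3)  # symbol (4th field) may contain spaces
        if len(parts) == 4 and parts[0].isdigit():
            samples, pct, image, symbol = parts
            rows.append((int(samples), float(pct), image, symbol))
    return rows

def pct_matching(rows, keywords):
    """Total %-share of rows whose symbol contains any keyword."""
    return sum(pct for _, pct, _, sym in rows
               if any(k in sym for k in keywords))
```

Run against both tables with keywords like `["mutex", "spin"]`, this makes the point in the comments below concrete: the 4.0 profile has a far larger mutex share, yet 4.0 is the fast version.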
[31 Aug 2007 17:30]
Mark Callaghan
One more point about the vmstat output above: the regression comes from an increase in idle time, not in wait time. Is something sleeping more than desired?
[31 Aug 2007 20:00]
Mark Callaghan
This problem continues to baffle me. Eventually the workload pins the CPU again, but there is a stretch in the middle of the test where the system is much more idle than expected. From SHOW PROCESSLIST output, the slowdown occurs immediately before thread status changes from 'Copying to tmp table' to 'Converting HEAP to MyISAM'. Once status changes to 'Copy to tmp table on disk', the CPU is pinned again with a lot of system time. vmstat output on 10 second intervals:

```
19 3 56 2181060 128716 11658268 0 0 712   135 1060   2364 98  1  1 0
10 0 56 2167876 128724 11659348 0 0  95    15 1041   2259 98  0  1 0
10 0 56 2153668 128728 11659752 0 0  31    54 1026   2317 98  1  1 0
14 0 56 2142796 128756 11661900 0 0 179   144 1058   2204 98  1  1 0
14 0 56 2136460 128760 11665840 0 0 364    57 1044   2184 98  0  1 0
 7 0 56 2128892 128780 11666228 0 0  45    44 1027   2215 99  0  1 0
13 0 56 2124492 128800 11667704 0 0 120   133 1053   2176 98  0  1 0
 1 0 56 2107404 128808 11668580 0 0  69   373 1036   5990 74  1 25 0
 0 0 56 2078476 128816 11669320 0 0  47   378 1048  12748 32  1 65 1
 0 0 56 2061828 128836 11671136 0 0 154   291 1058   5705 12  1 85 2
 0 0 56 2044284 128860 11671860 0 0  74    47 1035  10162 20  1 78 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r b swpd  free    buff   cache   si so bi     bo   in     cs us sy id wa
 0 0 56 2029276 128872 11673480 0 0 124   379 1068   6768 14  1 84 2
 1 0 56 2015572 128888 11674484 0 0  88   134 1058   5928 13  1 85 1
 0 0 56 1989804 128892 11678492 0 0 386   378 1054  11086 28  1 69 1
 0 0 56 1964932 128904 11679840 0 0  48   559 1055  12258 29  1 69 1
 0 0 56 1942948 128920 11681048 0 0  64   490 1066  11646 28  1 69 1
 0 0 56 1914988 128940 11681708 0 0  52   456 1045  14129 36  2 61 1
 0 0 56 1892316 128952 11682784 0 0  76   389 1063  10528 25  1 73 1
 0 0 56 1866228 128960 11686516 0 0 333   496 1066  10848 25  1 73 0
 7 0 56 1838132 128968 11687324 0 0  57    22 1044  14129 33  2 66 0
 0 0 56 1822324 128996 11688316 0 0  91   366 1057   9242 19  1 78 1
 0 0 56 1806124 129016 11689248 0 0  59   466 1062   7877 17  1 82 1
 0 0 56 1785084 129064 11691716 0 0 221   558 1069   9390 19  1 79 1
 6 0 56 1702084 129132 11745164 0 0  78   544 1039  14984 31  5 64 1
 4 0 56 1556884 129288 11882572 0 0  75 16748 1113 360092 13 50 34 2
 5 0 56 1418420 129488 12016944 0 0 536    29 1168 347147 18 45 36 0
 5 0 56 1288724 129652 12145164 0 0  83    54 1029 373713 11 52 37 1
 8 1 56 1170868 129804 12260408 0 0 160 39867 1140 284897 30 42 26 2
 7 0 56 1063804 129964 12364696 0 0  66   467 1051 200680 49 33 18 0
```
[1 Sep 2007 16:07]
Heikki Tuuri
Mark, both oprofile outputs show lots of InnoDB mutex activity, much more in 4.1. But 4.1 is the fast version! Concerning the question about ha_release_temporary_latches(), I do not think it gets called when MySQL decides to convert a HEAP temp table to a MyISAM temp table. Thus, it might happen that threads flock to wait for the adaptive hash latch to be released. But then you should see SHOW INNODB STATUS\G showing lots of threads waiting for it. What does it show?
[2 Sep 2007 1:44]
Mark Callaghan
ha_release_temporary_latches is not called when converting a temp table from heap to MyISAM. I experimented by adding a callback so it would be called when my_write was called for the IO_CACHE used by the MyISAM temp table, but that did not help.
[2 Sep 2007 2:02]
Mark Callaghan
My best description of the performance problem using data that I cannot give to you. The testcase is ~20 concurrent queries, each doing an order by that spills to disk (session status changes from 'Copying to tmp table' to 'Copy to tmp table on disk').

0: MySQL4 finishes the queries in ~220 minutes vs ~346 for MySQL5.

1: MySQL5.0 drops from 90% CPU to 25% CPU on a 4-core server. The state of the queries remains 'Copying to tmp table' for several minutes after the drop, so the problem appears to start before converting the tmp table from heap to MyISAM and before doing IO for MyISAM. Sample MySQL5 vmstat output before the drop:

```
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r b swpd    free   buff    cache  si so  bi  bo   in   cs us sy id wa
11 0   56 1230348 53536 12576444   0  0 293  18 1033 2269 98  1  1  0
```

And after the drop; note that user CPU is down while idle CPU and context switches are up:

```
 1 1   56 1067740 54320 12603948   0  0 113 432 1038 11848 38 2 60  1
```

2: When user CPU drops, idle CPU goes up.

3: MySQL5 does IO for the MyISAM tmp tables at half the rate of MySQL4. I will do more tests to confirm that the disk throughput is similar independent of MySQL on the test machines. Sample vmstat output showing the typical IO rate (bo column) for MySQL4:

```
 8 0   56   44680 67144 10012020   0  0  87 3621 1055 75087 87 7  6  0
```

And for MySQL5; note that the IO rate is 2037 vs 3621:

```
 0 0   56   42900 45900 12410680   0  0  29 2037 1046 10535 24 1 74  1
```

4: The number of runnable procs reported by vmstat for MySQL4 is usually between 5 and 8. For MySQL5 it is usually 0. This explains the high percentage of idle time mentioned above. Both MySQL4 and MySQL5 use NPTL. I assume that a runnable thread counts as a runnable proc.

5: MySQL4 has many more context switches. I assume some of this is a result of the fixes for the sync array.
[2 Sep 2007 2:04]
Mark Callaghan
SHOW INNODB STATUS for MySQL4 at the end of the testcase:

```
*************************** 1. row ***************************
Status:
=====================================
070901 18:37:28 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 5 seconds
----------
BACKGROUND THREAD
----------
srv_master_thread loops: 62599 1_second, 62598 sleeps, 6131 10_second, 3456 background, 3156 flush
srv_master_thread log flush: 62650 sync, 45271 async
fsync callers: 26284 buffer pool, 0 other, 152 checkpoint, 10864 log aio, 56549 log sync, 0 archive
----------
SEMAPHORES
----------
Lock wait timeouts 0
OS WAIT ARRAY INFO: reservation count 238728266, signal count 140825445
Mutex spin waits 12029782620, rounds 35419816117, OS waits 64317016
RW-shared spins 385401401, OS waits 96964144; RW-excl spins 319291740, OS waits 40482911
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread) reads 0 writes 0 requests 0 io secs 0.000000 io msecs/request 0.000000 max_io_wait 0.000000
I/O thread 1 state: waiting for i/o request (log thread) reads 0 writes 8310 requests 8310 io secs 0.208978 io msecs/request 0.025148 max_io_wait 114.045000
I/O thread 2 state: waiting for i/o request (read thread) reads 70741 writes 0 requests 39193 io secs 389.418743 io msecs/request 9.935926 max_io_wait 3366.653000
I/O thread 3 state: waiting for i/o request (write thread) reads 0 writes 736836 requests 464850 io secs 569.769490 io msecs/request 1.225706 max_io_wait 4040.025000
Pending normal aio reads: 0, aio writes: 0, ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
279712 OS file reads, 1406249 OS file writes, 93849 OS fsyncs
0.80 reads/s, 16384 avg bytes/read, 21.20 writes/s, 2.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf for space 0: size 1, free list len 1128, seg size 1130,
303440 inserts, 303440 merged recs, 114391 merges
Hash table size 10375991, used cells 7940167, node heap has 22180 buffer(s)
96.98 hash searches/s, 18.20 non-hash searches/s
---
LOG
---
Log sequence number 128 1718444680
Log flushed up to   128 1718441148
Last checkpoint at  128 1718333760
0 pending log writes, 0 pending chkp writes
922687 log i/o's done, 4.60 log i/o's/second, 56543 syncs, 9149 checkpoints
log sync callers: 736836 buffer pool, background 62686 sync and 45271 async, 16 internal, checkpoint 9149 sync and 0 async, 0 archive, commit 0 sync and 840789 async
log sync syncers: 2722 buffer pool, background 53820 sync and 0 async, 0 internal, checkpoint 1 sync and 0 async, 0 archive, commit 0 sync and 0 async
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 5927978378; in additional pool allocated 10121472
Buffer pool size   320000
Free buffers       0
Database pages     297820
Modified db pages  117
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages read 311122, created 29028, written 736836
0.80 reads/s, 0.00 creates/s, 24.80 writes/s
Buffer pool hit rate 999 / 1000
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
Main thread process no. 19047, id 1132489056, state: sleeping
Number of rows inserted 1891870, updated 2125081, deleted 632148, read 2080952285
0.80 inserts/s, 54.39 updates/s, 0.00 deletes/s, 54.39 reads/s
------------
TRANSACTIONS
------------
Trx id counter 0 2059683696
Purge done for trx's n:o < 0 2059683692 undo n:o < 0 0
Total number of lock structs in row lock hash table 0
```
[2 Sep 2007 2:05]
Mark Callaghan
SHOW INNODB STATUS for MySQL5 at the end of the testcase:

```
*************************** 1. row ***************************
Status:
=====================================
070901 18:37:32 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 32 seconds
----------
BACKGROUND THREAD
----------
srv_master_thread loops: 18782 1_second, 18781 sleeps, 1829 10_second, 1148 background, 1036 flush
srv_master_thread log flush: 18892 sync, 15826 async
srv_wait_thread_mics 233042029178 microseconds, 233042.0 seconds
spinlock delay for 5 delay 20 rounds is 4 mics
fsync callers: 23779 buffer pool, 23 other, 29 checkpoint, 2497 log aio, 16731 log sync, 0 archive
----------
SEMAPHORES
----------
Lock wait timeouts 0
OS WAIT ARRAY INFO: reservation count 99493060, signal count 68066543
Mutex spin waits 1804993118, rounds 8500902893, OS waits 8381279
RW-shared spins 252062961, OS waits 47106255; RW-excl spins 234269355, OS waits 30858427
Spin rounds per wait: 4.71 mutex, 6.16 RW-shared, 3.25 RW-excl
--------
FILE I/O
--------
I/O thread 0 state: waiting for i/o request (insert buffer thread) reads 14 writes 0 requests 7 io secs 0.032885 io msecs/request 4.697857 max_io_wait 23.025000
I/O thread 1 state: waiting for i/o request (log thread) reads 0 writes 2497 requests 2497 io secs 1.661534 io msecs/request 0.665412 max_io_wait 331.285000
I/O thread 2 state: waiting for i/o request (read thread) reads 10436 writes 0 requests 6314 io secs 235.918854 io msecs/request 37.364405 max_io_wait 1286.202000
I/O thread 3 state: waiting for i/o request (read thread) reads 4987 writes 0 requests 2756 io secs 104.830433 io msecs/request 38.037167 max_io_wait 1508.804000
I/O thread 4 state: waiting for i/o request (read thread) reads 4000 writes 0 requests 1904 io secs 61.204302 io msecs/request 32.145117 max_io_wait 1438.879000
I/O thread 5 state: waiting for i/o request (read thread) reads 3765 writes 0 requests 2421 io secs 63.081140 io msecs/request 26.055820 max_io_wait 1406.591000
I/O thread 6 state: waiting for i/o request (write thread) reads 0 writes 79623 requests 59754 io secs 89.803908 io msecs/request 1.502894 max_io_wait 2252.062000
I/O thread 7 state: waiting for i/o request (write thread) reads 0 writes 47713 requests 36961 io secs 87.605921 io msecs/request 2.370226 max_io_wait 2034.358000
I/O thread 8 state: waiting for i/o request (write thread) reads 0 writes 43826 requests 34597 io secs 87.361416 io msecs/request 2.525115 max_io_wait 2738.745000
I/O thread 9 state: waiting for i/o request (write thread) reads 0 writes 40770 requests 32503 io secs 87.283270 io msecs/request 2.685391 max_io_wait 1805.893000
Pending normal aio reads: 0, aio writes: 0, ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
127533 OS file reads, 284055 OS file writes, 43385 OS fsyncs
0.94 reads/s, 16384 avg bytes/read, 4.81 writes/s, 1.44 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 56642, seg size 56644,
52316 inserts, 101064 merged recs, 42700 merges
Hash table size 10375991, used cells 7218277, node heap has 18217 buffer(s)
48.00 hash searches/s, 24.91 non-hash searches/s
---
LOG
---
Log sequence number 57 3848610184
Log flushed up to   57 3848609713
Last checkpoint at  57 3816843045
0 pending log writes, 0 pending chkp writes
114762 log i/o's done, 3.56 log i/o's/second, 16731 syncs, 2842 checkpoints
log sync callers: 211932 buffer pool, background 18902 sync and 15826 async, 3 internal, checkpoint 2842 sync and 0 async, 0 archive, commit 0 sync and 90080 async
log sync syncers: 676 buffer pool, background 16054 sync and 0 async, 0 internal, checkpoint 1 sync and 0 async, 0 archive, commit 0 sync and 0 async
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 6046890062; in additional pool allocated 17983488
Buffer pool size   320000
Free buffers       156566
Database pages     145217
Modified db pages  19378
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages read 137213, created 8042, written 211932
0.94 reads/s, 0.25 creates/s, 14.12 writes/s
Buffer pool hit rate 997 / 1000
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread process no. 23724, id 1191237984, state: sleeping
Number of rows inserted 338412, updated 500472, deleted 246240, read 2085938366
8.66 inserts/s, 24.06 updates/s, 0.72 deletes/s, 25.16 reads/s
------------
TRANSACTIONS
------------
Trx id counter 0 39672366
Purge done for trx's n:o < 0 39672350 undo n:o < 0 0
History list length 12
Total number of lock structs in row lock hash table 0
```
[2 Sep 2007 2:06]
Mark Callaghan
Top N waits from SHOW MUTEX STATUS for MySQL5 at the end of the testcase (columns: source file, line, waits):

```
btr0sea.c    139  6887692
buf0buf.c    545   806330
dict0dict.c 3689   231175
buf0buf.c    494   160150
buf0buf.c    497   112489
buf0buf.c    494    97965
mem0pool.c   205    47795
dict0dict.c 3689     6509
buf0buf.c    494     5761
srv0srv.c    969     3864
ibuf0ibuf.c  399     1781
```
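The duplicate creation sites in this list (buf0buf.c line 494 appears three times, once per mutex created at that line) can be aggregated before ranking. A small sketch, assuming the three-column file/line/waits layout shown above:

```python
# Aggregate SHOW MUTEX STATUS-style rows ("file line waits") by creation
# site and rank by total waits, so repeated sites are summed together.
from collections import defaultdict

def total_waits(lines):
    """Return [((file, line), waits), ...] sorted by waits descending."""
    totals = defaultdict(int)
    for line in lines:
        fname, lineno, waits = line.split()
        totals[(fname, int(lineno))] += int(waits)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Applied to the table above it confirms that btr0sea.c line 139 (the adaptive hash index latch, btr_search_latch) dominates by almost an order of magnitude.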
[2 Sep 2007 3:47]
Mark Callaghan
The problem is btr_search_latch. I have simplified my testcase to 4 concurrent queries. In this case, user CPU utilization is at 50% for MySQL5 and 99% for MySQL4. I then recompiled MySQL5 with srv_use_adaptive_hash_indexes=FALSE and the dropoff does not occur. Does processing a select statement ever get an X lock on btr_search_latch? Or is that only done for insert/update/delete? I will post numbers tomorrow. What has changed with the adaptive hash index between MySQL4 and MySQL5 other than the changes to the sync array?
[2 Sep 2007 4:01]
Mark Callaghan
Disabling the adaptive hash index fixes the problem. Why is this so much worse with MySQL5? Time for 4 concurrent queries:

* MySQL4 with adaptive hash index -> ~29 minutes each
* MySQL5 with adaptive hash index -> ~38 minutes each
* MySQL5 without adaptive hash index -> ~17 minutes each
[2 Sep 2007 14:47]
Mark Callaghan
Time for ~20 concurrent queries:

* MySQL4 with adaptive hash indexes --> ~220 minutes
* MySQL5 with adaptive hash indexes --> ~346 minutes
* MySQL5 without adaptive hash indexes --> ~95 minutes
[2 Sep 2007 14:49]
Mark Callaghan
See http://bugs.mysql.com/bug.php?id=26442
[3 Sep 2007 13:57]
Heikki Tuuri
Mark, InnoDB does want an X-latch on the adaptive hash index also in a SELECT, when it builds the hash index for a page. It has to modify the adaptive hash => that is why an X-latch is needed. The difference between 4.0 and 5.0 may be some random effect. The general principle for the adaptive hash latch is that it should not be held while a query does extensive processing inside MySQL, outside of InnoDB. The conversion HEAP -> MyISAM is a typical operation during which the adaptive hash latch should not be held. Fix options:

1) InnoDB could avoid keeping the adaptive hash latch if the MySQL query involves a sort (how do we know this?); or
2) we could call release_temporary_latches() when MySQL decides to convert a HEAP temp table to MyISAM (where does it do that?).

I am assigning this to Inaam. Regards, Heikki
[3 Sep 2007 15:04]
Mark Callaghan
Heikki, The slowdown begins before converting the tmp table from heap to MyISAM and continues long after the conversion is done.
[3 Sep 2007 15:29]
Heikki Tuuri
Mark, ok, then 1) might be the better solution until we have http://bugs.mysql.com/bug.php?id=26442 fixed. Best regards, Heikki
[4 Sep 2007 17:24]
Mark Callaghan
When the adaptive hash index is enabled, MySQL 5.0.37 does IO for the tmp table at half the rate of MySQL 4.0.26. The numbers below are for ~20 concurrent queries that all do an external sort. As Valeriy suggested, some of this is from the use of the new external sort in MySQL5 (sorts keys and all data) versus the old sort in MySQL4 (sorts only keys and row pointers). I reran the tests with max_length_for_sort_data=2 in MySQL5 so that the old sort would be used; the IO rate was higher, but still less than that for MySQL4.

```
binary              io_rate
mysql4              ~3.6 MB
mysql5 & new sort   ~2.0 MB
mysql5 & old sort   ~2.9 MB
```

Sample vmstat output showing the typical IO rate (bo column) for MySQL4:

```
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
8 0 56   44680  67144 10012020 0 0 87 3621 1055 75087 87 7  6 0
```

And for MySQL5; note that the IO rate is 2037 vs 3621:

```
0 0 56   42900  45900 12410680 0 0 29 2037 1046 10535 24 1 74 0
```

And for MySQL5 with max_length_for_sort_data=2:

```
0 0 56 2806804 217044 10597132 0 0  6 2929 1050 12708 36 2 62 0
```
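A back-of-envelope check of the io_rate figures above. The assumption (mine, not stated in the report) is that vmstat's bo column is in KiB/s and the report rounds 1000 KiB to ~1 MB:

```python
# Convert a vmstat "bo" sample (KiB/s on Linux 2.6) to the approximate
# MB/s figure quoted in the report, which rounds 1000 KiB to ~1 MB.
def bo_to_mb(bo_kib_per_sec):
    """vmstat bo sample -> approximate MB/s as quoted above."""
    return bo_kib_per_sec / 1000.0
```

Under that reading, the three vmstat samples (bo = 3621, 2037, 2929) reproduce the three io_rate rows: ~3.6, ~2.0, and ~2.9 MB.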
[4 Sep 2007 21:47]
Mark Callaghan
Much of the regression between MySQL4 and MySQL5 is from the use of the new external sort algorithm, as Valeriy suggested might be the case. I reran the test for MySQL5 with the adaptive hash index enabled and forced the old external sort algorithm by setting max_length_for_sort_data=2. Time for ~20 concurrent queries:

* MySQL4 with adaptive hash index --> ~220 minutes
* MySQL5 with adaptive hash index --> ~346 minutes
* MySQL5 without adaptive hash index --> ~95 minutes
* MySQL5 with adaptive hash index and old sort --> ~253 minutes
[18 Sep 2007 1:41]
James Day
For other readers, the new sorting algorithm that Mark mentioned is described at http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html . It avoids lots of random row reads after sorting, but stores all fields returned by the query instead of just the fields being sorted plus keys to look up the rest of the records. It halves the number of row reads, but it can mean that a larger temporary table size or sort buffer size is helpful for InnoDB in 4.1 compared to 4.0, or a smaller max_length_for_sort_data than the default. Queries that are particularly affected are those with large numbers of rows matching WHERE but a much smaller LIMIT, or any query whose returned data is much larger than the fields being sorted by. For Mark's case this raises the time from 253 to 346 minutes. But the big result he saw is that some 72% of his query time in 5.0 is caused by the adaptive hash index, so there is considerable speed improvement possible by improving it. Because the new sort puts more data on disk, it seems that the existing problem of holding the adaptive hash index lock while the sorting happens is made worse in 5.0. Heikki's fix 1 might reduce the 5.0 time from 346 to 253 minutes, but would still leave the potential gain of a reduction to 95 minutes available. How much of that remaining potential gain comes from reducing lock contention rather than from the other costs of updating the hash index is unclear. Running tests with different numbers of concurrent queries might help to establish this.
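The arithmetic behind this summary can be made explicit. A small helper (hypothetical name) computing what fraction of the 5.0 runtime each change removes, using the minutes reported in this thread:

```python
# Fraction of total runtime eliminated by a change, from the timings in
# this thread: 346 min (5.0 baseline), 253 min (old sort forced),
# 95 min (adaptive hash index disabled).
def share_of_total(total_min, reduced_min):
    """Fraction of total_min removed when runtime drops to reduced_min."""
    return (total_min - reduced_min) / total_min
```

share_of_total(346, 95) is about 0.725, matching the "some 72%" attributed to the adaptive hash index, while share_of_total(346, 253) is about 0.27 for the sort change alone.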
[24 Oct 2007 5:08]
Inaam Rana
Mark, I tried the t1s.sql test and it quickly became IO bound. The vmstat output was more like the one shown by you at the top of this report, i.e. an increase in iowait time, not idle time. A few questions for you:

1) The idle time increase that you mentioned: does it happen on your internal workload or on t1s.sql?
2) I see that the 5.0 binary is using four reader/writer InnoDB threads. Is it a customized build? If yes, what else has been changed in it?
3) When you ran with the adaptive hash index disabled, did you see an increase in iowait time and/or user time? Was the idle time reduced to what it was in 4.0?
4) Can you help us test some tentative fixes? I am asking because the workload that triggers this particular utilization is not available to us for testing. Any help in this regard is appreciated.
[24 Oct 2007 5:10]
Inaam Rana
patch to release btr_search_latch if the current THD has built temp tables on disk.
Attachment: ahi_5.0_v0.patch (text/x-diff), 532 bytes.
[24 Oct 2007 5:12]
Inaam Rana
patch based on Yasufumi's idea of fixing a missed os_event_set() call in sync0arr.
Attachment: os_event_5.0_v2.patch (text/x-diff), 17.22 KiB.
[28 Oct 2007 18:22]
Mark Callaghan
> I tried the tls.sql test and it quickly became IO bound. The vmstat output was more like the one shown by you at the top of this report, i.e. an increase in iowait time, not idle time.

We have someone working on this full time. I hope to have a better testcase ready soon. The testcase that I shared is not as good as the testcase run on data we can't share.

> 1) The idle time increase that you mentioned: does it happen on your internal workload or on tls.sql?

It happens on our internal workload. tls.sql is not a good testcase.

> 2) I see that the 5.0 binary is using four reader/writer InnoDB threads. Is it a customized build? If yes, what else has been changed in it?

There are many changes. Most are described at http://code.google.com/p/google-mysql-tools/w/list. We use MySQL 4.0.26 and 5.0.37, and both versions have the same changes by us. Performance for 5.0.37 is much worse than for 4.0.26 with this bug.

> 3) When you ran with the adaptive hash index disabled, did you see an increase in iowait time and/or user time? Was the idle time reduced to what it was in 4.0?

I will make sure that the person working on this provides an answer.

> 4) Can you help us in testing some tentative fixes?

We will do our best to get a testcase to you. I don't think the patch to release the btr_search_latch when spilling temp tables to disk does much for this problem. I say that because I tried such a change when first debugging this.

I am curious why a thread can hold an S lock on btr_search_latch for a long time, for example when leaving InnoDB. Given that there can be frequent requests for an X lock on it, shouldn't the code minimize the duration for which locks on btr_search_latch are held? We will try out the other patch for the missing os_event_set call.
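Mark's point about S-latch hold times can be illustrated with a toy model. This is my own sketch, not derived from InnoDB code, and the timings are invented: if a reader holds btr_search_latch across a long disk sort, every X-latch request issued during that window queues up behind it, so total writer wait time grows with both the hold duration and the writer arrival rate.

```python
# Toy model of latch contention: one reader holds a shared latch for
# `hold_ms` milliseconds while writers request an exclusive latch every
# `writer_interval_ms`. Each writer arriving during the hold must wait
# until the reader releases. All numbers are illustrative only.

def total_writer_wait_ms(hold_ms, writer_interval_ms):
    """Sum of wait times for writers that arrive while the latch is held.
    Writer i arrives at t = i * writer_interval_ms and waits until the
    release time, hold_ms."""
    wait = 0
    t = writer_interval_ms
    while t < hold_ms:
        wait += hold_ms - t
        t += writer_interval_ms
    return wait

# Holding the latch across a 60 s disk sort vs. releasing it beforehand
# (hypothetical 5 ms hold), with a writer arriving every 100 ms:
held_through_sort = total_writer_wait_ms(60_000, 100)
released_early = total_writer_wait_ms(5, 100)
print(f"held through sort: {held_through_sort} ms of writer waiting")
print(f"released early:    {released_early} ms of writer waiting")
```

The model is deliberately crude (it ignores queueing of multiple readers and latch fairness), but it shows why shortening the hold, as the attached patch attempts, attacks the contention directly rather than the cost of maintaining the hash index.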
[9 Nov 2007 17:36]
Heikki Tuuri
Inaam, do we need to change the plugin interface in 5.1?
[5 Dec 2007 4:12]
Mark Callaghan
We have begun testing the fix for http://bugs.mysql.com/bug.php?id=29560 and the results look great on servers with 4 CPU cores and 20 concurrent queries.
[16 Dec 2007 18:53]
Inaam Rana
We are waiting on (hopefully good) results from Mark and his team on this.
[17 Jan 2008 0:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[19 Aug 2010 19:25]
James Day
For cases where bug #29560 isn't the solution, the work done in bug #32149 to release the adaptive hash index latch during filesorts should solve the problem.
[1 Oct 2011 23:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[20 Nov 2011 7:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".