Bug #57489 | performance regression in 5.5.6-rc for IO bound sysbench | |
---|---|---|---
Submitted: | 15 Oct 2010 18:05 | Modified: | 26 Dec 2010 18:46 |
Reporter: | Mark Callaghan | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server: InnoDB Plugin storage engine | Severity: | S5 (Performance) |
Version: | 5.5.6-rc | OS: | Any |
Assigned to: | Inaam Rana | CPU Architecture: | Any |
Tags: | innodb, performance, regression |
[15 Oct 2010 18:05]
Mark Callaghan
[15 Oct 2010 18:17]
Mark Callaghan
With this test, the first sysbench run (8 total threads, 1 thread per sysbench instance) might have skewed results because the server is restarted immediately before it. The runs that follow are started one after another without restarting mysqld.
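For reference, a minimal sketch of how such a stepped-concurrency run could be driven from the shell. This is an assumption about the harness, not the exact scripts used for these results; the table names, run time, and option values are illustrative.

```sh
#!/bin/sh
# Hypothetical driver: 8 sysbench instances against one mysqld, stepping
# threads per instance from 1 to 128 (8 to 1024 total client threads).
# Only the first step follows a mysqld restart; later steps run back to back.
for TPI in 1 2 4 8 16 32 64 128; do
  for i in 1 2 3 4 5 6 7 8; do
    sysbench --test=oltp --mysql-host=127.0.0.1 --mysql-user=root \
             --oltp-table-name=sbtest$i --num-threads=$TPI \
             --max-time=1200 --max-requests=0 run > sb.$TPI.$i.log &
  done
  wait   # let all 8 instances finish before the next concurrency step
done
```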
[15 Oct 2010 18:24]
Mark Callaghan
Sample vmstat output from the highest concurrency test (128 threads per sysbench instance, 1024 total threads). Note that 5.1.50 is doing IO faster per the bi and bo columns: ~20,000 versus ~15,000 for 5.5.6-rc.

5.5.6-rc
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b swpd     free     buff   cache  si so    bi    bo   in    cs us sy id wa st
 3 63  212 58393104 1367716 7838392   0  0 14318 13932 2367 12726  3  1 64 31  0
 0 66  212 58408920 1367716 7838408   0  0 17453 14446 2549 15041  2  1 57 39  0
 0 65  212 58413156 1367716 7838444   0  0 15510 15376 2498 13726  2  1 62 35  0
 2 64  212 58450244 1367716 7838508   0  0 13931 14569 2435 13084  4  1 63 32  0

5.1.50
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b swpd     free     buff   cache  si so    bi    bo   in    cs us sy id wa st
 0 64  212 58423536 1368152 7869328   0  0 20982 21532 3040 18522  4  2 51 44  0
 1 63  212 58428276 1368152 7869348   0  0 19448 19953 2853 17252  2  1 55 41  0
 0  1  212 58428892 1368152 7869368   0  0 19650 20252 2881 17397  2  1 54 42  0
 1 64  212 58428976 1368152 7869380   0  0 20733 21491 2974 18263  3  1 53 43  0
[15 Oct 2010 18:29]
Mark Callaghan
Data from iostat. This is from the highest concurrency test. Again, 5.1.50 is able to do IO at a higher rate, which explains the higher performance. What limits performance for 5.5.6-rc?
* mutex contention?
* something else?

Last two samples from 5.5.6-rc
Device:    rrqm/s wrqm/s     r/s    w/s   rsec/s   wsec/s avgrq-sz avgqu-sz await svctm %util
cciss/c0d0   0.00  16.70  903.50 544.60 28912.00 29902.90    40.62    32.87 22.68  0.68 98.96
cciss/c0d0   0.00  13.20  863.50 547.30 27632.00 29857.90    40.75    32.66 23.19  0.70 98.88

Last two samples from 5.1.50
Device:    rrqm/s wrqm/s     r/s    w/s   rsec/s   wsec/s avgrq-sz avgqu-sz await svctm %util
cciss/c0d0   0.00   6.40 1365.20 787.50 43686.40 44665.40    41.04    39.91 18.54  0.46 99.99
cciss/c0d0   0.00  18.38 1359.04 780.92 43489.31 44602.90    41.17    40.12 18.77  0.47 99.90
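For anyone reproducing this, a minimal sketch of how the vmstat/iostat samples above could be collected alongside a run; the 10-second interval and file names are assumptions, not the settings used here.

```sh
# Illustrative monitoring: sample system-wide and per-device IO for the whole run.
vmstat 10 > vmstat.log &
VMSTAT_PID=$!
iostat -x 10 > iostat.log &
IOSTAT_PID=$!
# ... run the sysbench workload here ...
kill $VMSTAT_PID $IOSTAT_PID
```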
[15 Oct 2010 18:38]
Mark Callaghan
Performance per 60 seconds. This is the performance per interval. Normal sysbench has an option to report cumulative performance every N seconds; I changed that to report per-interval performance every N seconds. Note that TPS for 5.5.6-rc quickly drops to ~65 but is steady at ~85 for 5.1.50.

for 5.5.6-rc
[1285921215] min: 0.1746 avg: 1.2644 max: 3.1057 percentile: 2.2962 throughput: 99.6500 requests: 5979 interval_secs: 60 sum_secs: 7560 threads: 128
[1285921275] min: 0.1746 avg: 1.8964 max: 3.1309 percentile: 1.3903 throughput: 67.0167 requests: 4021 interval_secs: 60 sum_secs: 7625 threads: 128
[1285921335] min: 0.1746 avg: 1.9073 max: 3.4103 percentile: 1.3341 throughput: 67.4667 requests: 4048 interval_secs: 60 sum_secs: 7721 threads: 128
[1285921395] min: 0.1746 avg: 1.9746 max: 3.4547 percentile: 1.2901 throughput: 64.7667 requests: 3886 interval_secs: 60 sum_secs: 7673 threads: 128
[1285921455] min: 0.1746 avg: 1.6937 max: 3.4547 percentile: 1.2169 throughput: 76.0333 requests: 4562 interval_secs: 60 sum_secs: 7727 threads: 128
[1285921515] min: 0.1746 avg: 2.2509 max: 3.9495 percentile: 1.0601 throughput: 56.7333 requests: 3404 interval_secs: 60 sum_secs: 7662 threads: 128
[1285921575] min: 0.1746 avg: 1.9835 max: 3.9495 percentile: 1.1035 throughput: 64.5167 requests: 3871 interval_secs: 60 sum_secs: 7678 threads: 128
[1285921635] min: 0.1746 avg: 2.0271 max: 3.9495 percentile: 1.0923 throughput: 63.2000 requests: 3792 interval_secs: 60 sum_secs: 7687 threads: 128
[1285921695] min: 0.1746 avg: 1.9205 max: 3.9495 percentile: 1.1118 throughput: 66.6167 requests: 3997 interval_secs: 60 sum_secs: 7676 threads: 128
[1285921755] min: 0.1746 avg: 1.8471 max: 3.9495 percentile: 1.1201 throughput: 69.3333 requests: 4160 interval_secs: 60 sum_secs: 7684 threads: 128
[1285921815] min: 0.1746 avg: 1.8634 max: 3.9495 percentile: 1.1148 throughput: 68.9667 requests: 4138 interval_secs: 60 sum_secs: 7711 threads: 128
[1285921875] min: 0.1746 avg: 1.8908 max: 3.9827 percentile: 1.0913 throughput: 66.8167 requests: 4009 interval_secs: 60 sum_secs: 7580 threads: 128
[1285921935] min: 0.1746 avg: 2.0538 max: 3.9827 percentile: 1.0703 throughput: 62.8333 requests: 3770 interval_secs: 60 sum_secs: 7743 threads: 128
[1285921995] min: 0.1746 avg: 1.9612 max: 3.9827 percentile: 1.0786 throughput: 64.6167 requests: 3877 interval_secs: 60 sum_secs: 7604 threads: 128
[1285922055] min: 0.1746 avg: 1.8996 max: 3.9827 percentile: 1.0910 throughput: 68.1000 requests: 4086 interval_secs: 60 sum_secs: 7762 threads: 128
[1285922115] min: 0.1746 avg: 1.9592 max: 3.9827 percentile: 1.0767 throughput: 65.5000 requests: 3930 interval_secs: 60 sum_secs: 7699 threads: 128
[1285922175] min: 0.1746 avg: 1.9960 max: 3.9827 percentile: 1.0716 throughput: 64.1333 requests: 3848 interval_secs: 60 sum_secs: 7681 threads: 128
[1285922235] min: 0.1746 avg: 1.9835 max: 3.9827 percentile: 1.0722 throughput: 64.5000 requests: 3870 interval_secs: 60 sum_secs: 7676 threads: 128
[1285922295] min: 0.1746 avg: 1.8981 max: 3.9827 percentile: 1.0838 throughput: 67.2500 requests: 4035 interval_secs: 60 sum_secs: 7659 threads: 128

for 5.1.50
[1285960689] min: 0.7676 avg: 1.4803 max: 3.6626 percentile: 2.8836 throughput: 85.6333 requests: 5138 interval_secs: 60 sum_secs: 7606 threads: 128
[1285960749] min: 0.7676 avg: 1.5318 max: 3.6626 percentile: 1.2213 throughput: 83.0500 requests: 4983 interval_secs: 60 sum_secs: 7633 threads: 128
[1285960809] min: 0.7676 avg: 1.5136 max: 3.6626 percentile: 1.0923 throughput: 85.1333 requests: 5108 interval_secs: 60 sum_secs: 7731 threads: 128
[1285960869] min: 0.7676 avg: 1.5704 max: 3.8074 percentile: 1.0421 throughput: 80.8500 requests: 4851 interval_secs: 60 sum_secs: 7618 threads: 128
[1285960929] min: 0.7676 avg: 1.5745 max: 4.0609 percentile: 1.0166 throughput: 81.8000 requests: 4908 interval_secs: 60 sum_secs: 7728 threads: 128
[1285960989] min: 0.7676 avg: 1.5282 max: 4.0609 percentile: 0.9997 throughput: 83.8167 requests: 5029 interval_secs: 60 sum_secs: 7685 threads: 128
[1285961049] min: 0.7676 avg: 1.5584 max: 4.0901 percentile: 0.9845 throughput: 82.3833 requests: 4943 interval_secs: 60 sum_secs: 7703 threads: 128
[1285961109] min: 0.7676 avg: 1.5292 max: 4.0901 percentile: 0.9731 throughput: 83.6500 requests: 5019 interval_secs: 60 sum_secs: 7675 threads: 128
[1285961169] min: 0.7676 avg: 1.5641 max: 4.0901 percentile: 0.9644 throughput: 81.3833 requests: 4883 interval_secs: 60 sum_secs: 7637 threads: 128
[1285961229] min: 0.7676 avg: 1.5612 max: 4.5812 percentile: 0.9584 throughput: 82.2500 requests: 4935 interval_secs: 60 sum_secs: 7704 threads: 128
[1285961289] min: 0.7676 avg: 1.5422 max: 4.5812 percentile: 0.9529 throughput: 83.0667 requests: 4984 interval_secs: 60 sum_secs: 7686 threads: 128
[1285961349] min: 0.7676 avg: 1.5420 max: 4.5812 percentile: 0.9461 throughput: 83.1500 requests: 4989 interval_secs: 60 sum_secs: 7693 threads: 128
[1285961409] min: 0.7676 avg: 1.5553 max: 4.5812 percentile: 0.9436 throughput: 82.4833 requests: 4949 interval_secs: 60 sum_secs: 7697 threads: 128
[1285961469] min: 0.7676 avg: 1.5648 max: 4.5812 percentile: 0.9374 throughput: 80.3000 requests: 4818 interval_secs: 60 sum_secs: 7539 threads: 128
[1285961529] min: 0.7676 avg: 1.5588 max: 4.5812 percentile: 0.9346 throughput: 82.9500 requests: 4977 interval_secs: 60 sum_secs: 7758 threads: 128
[1285961589] min: 0.7676 avg: 1.6296 max: 4.5812 percentile: 0.9301 throughput: 79.2333 requests: 4754 interval_secs: 60 sum_secs: 7747 threads: 128
[1285961649] min: 0.7676 avg: 1.5128 max: 4.5812 percentile: 0.9295 throughput: 82.9000 requests: 4974 interval_secs: 60 sum_secs: 7525 threads: 128
[1285961709] min: 0.7676 avg: 1.5867 max: 4.5812 percentile: 0.9270 throughput: 81.5333 requests: 4892 interval_secs: 60 sum_secs: 7762 threads: 128
[1285961769] min: 0.7676 avg: 1.5422 max: 4.5812 percentile: 0.9270 throughput: 83.7833 requests: 5027 interval_secs: 60 sum_secs: 7752 threads: 128
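Since stock sysbench at the time only printed cumulative numbers, per-interval throughput can also be derived after the fact. The sketch below is illustrative only: it assumes a hypothetical log of "unix_timestamp cumulative_requests" pairs and is not the patch that produced the output above.

```sh
# Hypothetical post-processing: derive per-interval throughput from cumulative counts.
prev_ts=""
prev_req=0
while read ts req; do
  if [ -n "$prev_ts" ]; then
    interval=$((ts - prev_ts))
    delta=$((req - prev_req))
    # integer TPS is good enough to see the trend
    echo "[$ts] requests: $delta tps: $((delta / interval))"
  fi
  prev_ts=$ts
  prev_req=$req
done < cumulative.log
```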
[15 Oct 2010 18:45]
Mark Callaghan
This is not a problem for a read-only workload. These are QPS for sysbench configured to fetch by primary key, for 8, 16, 32, ..., 1024 concurrent clients, using the same database:

  8    16    32    64   128   256   512  1024  concurrent clients
961  1373  1825  2316  2336  2345  2337  2373  5.1.50 (orig)
962  1375  1825  2314  2332  2348  2354  2373  5.5.6-rc
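A hedged example of how a point-select, read-only sysbench oltp run can be configured; the exact option values behind these numbers are not given in the report, so treat these as assumptions.

```sh
# Illustrative read-only setup: primary-key point selects only.
sysbench --test=oltp --oltp-read-only=on --oltp-point-selects=1 \
         --oltp-simple-ranges=0 --oltp-sum-ranges=0 \
         --oltp-order-ranges=0 --oltp-distinct-ranges=0 \
         --mysql-host=127.0.0.1 --mysql-user=root \
         --num-threads=8 --max-time=600 --max-requests=0 run
```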
[15 Oct 2010 19:30]
Davi Arnaut
There were a few performance-related regressions in 5.5.6 due to MDL, which are hopefully now fixed. See Bug#56715, Bug#56405, and Bug#56585. It would be interesting to repeat this test with 5.5.7.
[15 Oct 2010 22:38]
Mark Callaghan
I don't think those have any impact on this. One is a Windows bug with a regression on a read-only benchmark; I use Linux and have no regression on read-only loads. The other two are more interesting, but I am not doing anything that should require an exclusive MDL lock.
[15 Oct 2010 22:40]
Mark Callaghan
I think this is a regression in InnoDB. Maybe I will repeat the tests with MyISAM, but concurrent read-write tests on IO-bound MyISAM frequently show that MyISAM with a RW-lock per table isn't good (maybe MongoDB users will notice that one day).
[19 Oct 2010 21:32]
Inaam Rana
Mark, I am wondering if you have 'show engine innodb status' and 'show engine innodb mutex' listings for both runs, ideally multiple samples taken at regular intervals.
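A minimal sketch of how such listings could be collected at regular intervals during a run; the 60-second interval, credentials, and file naming are assumptions.

```sh
# Illustrative collector: dump InnoDB status and mutex listings every 60 seconds.
while true; do
  ts=$(date +%s)
  mysql -uroot -e 'SHOW ENGINE INNODB STATUS\G' > innodb_status.$ts
  mysql -uroot -e 'SHOW ENGINE INNODB MUTEX'    > innodb_mutex.$ts
  sleep 60
done
```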
[19 Oct 2010 21:46]
Mark Callaghan
I don't have that yet. I think this may be caused by support issue 51606, and I am repeating tests to attempt to reproduce that issue. Until today I was not recording the size of the ibdata1 file during my tests, so I don't know when things went bad. Note for those who cannot read that issue: the ibdata1 file gets very large, and most of the growth is from pages used for the undo log. For the results previously reported on this bug, the ibdata1 file eventually reached ~190GB for 5.5.6-rc. I don't know when/how/why it got that big, so all results for 5.5.6-rc are suspect. Repeating the tests with reasonably sized ibdata1 files (~24MB versus ~190GB), performance looks OK so far.

Throughput in TPS for 8 -> 1024 concurrent clients, using innodb_io_capacity=1000 and innodb_thread_concurrency=64:

  8   16   32   64  128  256  512 1024  concurrent clients
513  621  781  916  912  911  914  915  5.1.50
386  680  769  911  909  914  927  938  5.5.6-rc
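Since the key suspect here is unbounded ibdata1 growth from undo pages, a hedged sketch of how one might track that during a run; the data directory path, interval, and log file name are assumptions.

```sh
# Illustrative check: record ibdata1 size and the purge backlog
# ("History list length" from SHOW ENGINE INNODB STATUS) once a minute.
DATADIR=/var/lib/mysql   # assumed data directory
while true; do
  date +%s
  ls -l $DATADIR/ibdata1
  mysql -uroot -e 'SHOW ENGINE INNODB STATUS\G' | grep 'History list length'
  sleep 60
done >> ibdata1_growth.log
```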
[20 Oct 2010 18:57]
Mikhail Izioumtchenko
Bug#56933 may be related; I'm not confident it went into 5.5.6-rc.
[20 Oct 2010 19:03]
Calvin Sun
The fix of bug#56933 is in 5.5.7-rc.
[7 Dec 2010 18:21]
Mark Callaghan
Can we put this on hold until I repeat my tests after getting the fix for http://bugs.mysql.com/bug.php?id=56933?
[23 Dec 2010 18:29]
Mark Callaghan
I cannot reproduce this with 5.5.8. My current results are:

483  615  781  781  784  784  784  785  5.1.47 (orig)
508  629  799  802  801  802  800  802  5.1.52 (Facebook patch)
487  607  768  772  770  769  770  769  5.5.8 (orig)
[26 Dec 2010 18:46]
Valeriy Kravchuk
Not repeatable with 5.5.8