Bug #37424 | Considerable performance degradation in Falcon from 6.0.4 to 6.0.5 | ||
---|---|---|---|
Submitted: | 16 Jun 2008 8:20 | Modified: | 26 May 2010 17:50 |
Reporter: | Philip Stoev | Email Updates: | |
Status: | Unsupported | Impact on me: | |
Category: | MySQL Server: Falcon storage engine | Severity: | S2 (Serious) |
Version: | 6.0.5 | OS: | Any |
Assigned to: | Kelly Long | CPU Architecture: | Any |
Tags: | F_PERFORMANCE |
[16 Jun 2008 8:20]
Philip Stoev
[16 Jun 2008 12:26]
Philip Stoev
720-second sysbench test with 10 users, falcon_lock_wait_timeout=0: build-team 6.0.4 debug read/write requests: 3570408 (4958.49 per sec.) build-team 6.0.4 nondebug read/write requests: 3540522 (4917.05 per sec.) build-team 6.0.5 debug read/write requests: 688736 (956.21 per sec.) build-team 6.0.5 nondebug read/write requests: 714368 (991.55 per sec.) self-compiled 6.0.5 debug read/write requests: 690032 (957.96 per sec.) self-compiled 6.0.5 nondebug read/write requests: 681630 (946.31 per sec.) self-compiled 6.0-falcon nondbg read/write requests: 713676 (990.77 per sec.) The -03 flag appears to kick in properly for all nondebug builds.
[19 Jun 2008 7:49]
Mats Kindahl
From looking at the description of the bug, there are a few things that I find strange (and hints that it is not a compiler problem). - The two functions SyncObject::lock() and ::unlock() have risen significantly. Now, I don't know what the sync object does, but I assume that it is some form of mutex. The fact that they have risen, hints at some kind of congestion on the locks, which can be caused by a number of factors, including at least: - Wrong phase of the moon. Just bad luck with this run, is it repeatable? Given enough time, even unlikely events happen. - Competing for memory or disk. Now, it is hard to tell since I don't know if the figures are real time, or CPU time. It would be nice to have the real time, but also the user time and system time since that helps identify if this is an OS issue (e.g., thrashing on disk) or other issues. - Bad use of locks that prevents the system from leaving the pathological state once it's there. - The ::addRef() and ::lock() and ::unlock() has risen together. Since I cannot envision that the addRef() does anything complicated that can cause an increase in CPU usage, I draw the conclusion that the the figures are based on the real time, not CPU time. This hints at some form of memory (or disk) convoying/caravaning where the same memory is being accessed by several threads. The relatively small increase suggests that it is memory that is the accessed resource, not disk. - The fact that some record manipulation functions has entered the arena hints at memory (or disk) convoying/caravaning as well. - This is not related to the problem at hand, but what on earth are the Bitmap manipulation functions doing in the list? They should be inlined because they are usually small and executed often. Are the definitions available in the header files so that the compiler has a chance to inline them? - Also note that the index scanning function have appeared at the list. If it is a compiler problem, you are likely to see a repeatable degradation. Compilers are pretty predictable in this aspect: they will follow the same rules when generating the code, so there is no randomness involved. Considering that you have supplied -O3, I would rule out code generation issues as a performance degradation of this level.
[1 Jul 2008 19:16]
Kelly Long
---- summary ----- I did some comparisons on this to find the changeset that caused the slow down. Vlad will look at this next. The change set causing this slow down: revno: 2597.22.1 revision-id: sp1r-vvaintroub@wva.-20080328064007-14330 parent: sp1r-hakank/hakan@lu0011.(none)-20080322192653-14335 committer: vvaintroub@wva. timestamp: Fri 2008-03-28 07:40:07 +0100 message: WL#3763: Supernodes - If insert is done in decreasing order of values, no supernodes were created. Now fixed - if index node is inserted at the start of the page, next node can become supernode, if the gap between the start of the page and first supernode is too large. - Allow supernode on duplicate keys. - Do not always delete supernode if previous node is deleted, but check whether saving it makes sense. ---- detail ----- Below you'll find a list of revision ids followed by one of the results from the sysbench test. All results are similar to the one I choose to list. For the revisions I tested there is a line below the revision with the results. There is also a degradation issue as the test continues. You can see this in the locations I ran the test 3 times witout restarting the database. The results below a "no supernodes" comment are a slightly modifed code base where I turned off the supernodes (always return false in the checkAddSupernode() routine). cd mysql-6.0 bzr log --show-ids | egrep 'revision-id:|tags:' > /tmp/bzr-6.0-logs-show-ids.2 revision-id: sp1r-cbell/Chuck@mysql_cab_desk.-20080411171935-14416 # transactions: 31029 (43.08 per sec.) revision-id: sp1r-hakank/hakan@lu0011.(none)-20080410190122-10487 --- cut --- revision-id: sp1r-hakank/hakan@lu0011.(none)-20080330122626-54363 revision-id: sp1r-hakank/hakan@lu0011.(none)-20080330095349-54376 # transactions: 30751 (42.69 per sec.) revision-id: sp1r-hakank/hakan@lu0011.(none)-20080330094706-53538 revision-id: sp1r-hakank/hakan@lu0011.(none)-20080330090700-53665 revision-id: sp1r-hakank/hakan@lu0011.(none)-20080330095259-57282 revision-id: sp1r-cpowers@xeno.mysql.com-20080330072429-57288 revision-id: sp1r-cpowers@xeno.mysql.com-20080329182218-57524 revision-id: sp1r-cpowers@xeno.mysql.com-20080328232654-57519 revision-id: sp1r-cpowers@xeno.mysql.com-20080328224644-57519 revision-id: sp1r-cpowers@xeno.mysql.com-20080328224436-22681 revision-id: sp1r-hakank/hakan@lu0011.(none)-20080328201039-56444 # transactions: 96533 (134.06 per sec.) # transactions: 61482 (85.39 per sec.) # transactions: 45112 (62.63 per sec.) ###### no supernodes # transactions: 128358 (178.27 per sec.) # transactions: 128162 (178.00 per sec.) # transactions: 129308 (179.59 per sec.) revision-id: sp1r-hakank/hakan@lu0011.(none)-20080328183802-56447 revision-id: sp1r-hakank/hakan@lu0011.(none)-20080328182226-61590 revision-id: sp1r-hakank/hakan@lu0011.(none)-20080328175251-56251 revision-id: sp1r-hakank/hakan@lu0011.(none)-20080328174917-56243 revision-id: sp1r-hakank/hakan@lu0011.(none)-20080328171351-61576 revision-id: sp1r-hakank/hakan@lu0011.(none)-20080328164232-03126 revision-id: sp1r-klewis@klewis-mysql.-20080328153527-21606 # transactions: 78300 (108.74 per sec.) # transactions: 41569 (57.73 per sec.) # transactions: 35202 (48.88 per sec.) ###### no supernodes 20080328153527 # transactions: 128828 (178.92 per sec.) # transactions: 128891 (179.01 per sec.) # transactions: 129209 (179.44 per sec.) revision-id: sp1r-vvaintroub@wva.-20080328065431-25981 revision-id: sp1r-vvaintroub@wva.-20080328064007-14330 revision-id: sp1r-klewis@klewis-mysql.-20080328042148-22090 # transactions: 142006 (197.21 per sec.) revision-id: sp1r-hakank/hakan@lu0011.(none)-20080327203449-26939 revision-id: sp1r-hakank/hakan@lu0011.(none)-20080327114539-60369 revision-id: sp1r-hakank/hakan@lu0011.(none)-20080326193327-64165 --- cut --- revision-id: sp1r-hakank/hakan@lu0011.(none)-20080313141037-15204 revision-id: sp1r-cbell/Chuck@mysql_cab_desk.-20080407153214-28468 # transactions: 152292 (211.50 per sec.)
[21 Jul 2008 15:12]
Philip Stoev
Here are some more results with a freshly cloned mysql-6.0-falcon: no supernodes: transactions: 18326 (61.04 per sec.) deadlocks: 7969 (26.54 per sec.) read/write requests: 475194 (1582.86 per sec.) other operations: 44621 (148.63 per sec.) with supernodes: transactions: 18259 (60.81 per sec.) deadlocks: 8123 (27.05 per sec.) read/write requests: 476286 (1586.17 per sec.) other operations: 44641 (148.67 per sec.) Notice there is no difference in the results and that they are somewhere between the "good" and the "bad" values reported initially in this report. All tests were done on the same machine.
[21 Jul 2008 17:07]
Kevin Lewis
Philip, Please try --falcon_user_supernodes=0 instead of --falcon_user_supernodes=off.
[21 Jul 2008 17:27]
Philip Stoev
Result with --falcon_use_supernodes=0 --mysqld=--skip-falcon-use-supernodes: transactions: 18390 (61.26 per sec.) deadlocks: 7997 (26.64 per sec.) read/write requests: 476912 (1588.58 per sec.) other operations: 44777 (149.15 per sec.)
[21 Jul 2008 17:34]
Philip Stoev
Same results were observed when running with just --falcon_use_supernodes=0
[21 Jul 2008 17:38]
Kevin Lewis
Kelly, Please take another look.
[21 Jul 2008 17:57]
Philip Stoev
A 720 second test with --falcon_use_supernodes=0 (previous tests were 600-second): transactions: 28323 (39.32 per sec.) deadlocks: 12303 (17.08 per sec.) read/write requests: 734253 (1019.45 per sec.) other operations: 68949 (95.73 per sec.)
[14 Nov 2008 11:19]
Philip Stoev
Sysbench "classic" 0.4.8 was used for this bug, as downloadable from http://sourceforge.net/projects/sysbench , test name OLTP.
[14 Nov 2008 12:18]
Philip Stoev
Command lines: ./mysql-test-run.pl \ --start-and-exit \ --mysqld=--log-output=file \ --mysqld-falcon_lock_wait_timeout=0 ./sysbench \ --test=oltp \ --mysql-host=localhost\ --mysql-port=9306 \ --mysql-user=root \ --mysql-db=test \ --mysql-table-engine=Falcon \ --mysql-engine-trx=yes \ --oltp-table-size=100000 \ prepare ./sysbench \ --test=oltp \ --mysql-host=localhost \ --mysql-port=9306 \ --mysql-user=root \ --mysql-db=test \ --mysql-table-engine=Falcon \ --mysql-engine-trx=yes \ --max-requests=0 \ --max-time=720 \ --num-threads=10 \ run