Bug #39253 | Large query cache still freezes server after fix for bug #21074 | ||
---|---|---|---|
Submitted: | 4 Sep 2008 18:47 | Modified: | 12 Mar 2010 17:49 |
Reporter: | Sveta Smirnova | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Query Cache | Severity: | S2 (Serious) |
Version: | 5.1 bzr | OS: | Any |
Assigned to: | Kristofer Pettersson | CPU Architecture: | Any |
Tags: | performance |
[4 Sep 2008 18:47]
Sveta Smirnova
[4 Sep 2008 18:47]
Sveta Smirnova
Link to the original test: http://bugs.mysql.com/file.php?id=7219 Shane posted it privately for some reason.
[23 Apr 2009 16:48]
Konstantin Osipov
Kristofer, can we discuss on IRC what's causing this? Assigning to you so that you just see this message.
[28 Apr 2009 10:24]
Kristofer Pettersson
Progress notes I've ran the test a few times. The last time I also created an artificial wait of 6.5s on each UPDATE induced query cache invalidation. The resulting graph can be viewed in an attachment. I don't see any evidence of a systematic error like the one in 21074, ie there is no hang among the select threads. The SELECT intensity drops but that is probably CPU-bound on my computer rather than a result of any threads being stuck. I was expecting to see a 6+s window in the select count graph but there isn't any indication of that.
[28 Apr 2009 10:24]
Kristofer Pettersson
Queries in qcache
Attachment: queries_in_qcache.png (image/png, text), 14.04 KiB.
[28 Apr 2009 10:25]
Kristofer Pettersson
SELECT intensity
Attachment: SELECT-intensity-qcache.png (image/png, text), 27.31 KiB.
[28 Apr 2009 14:39]
Kristofer Pettersson
Simple dtracing produces an average 0.02s execution time on mysql_parse with only a very few 0.7s and 1.3s worst case scenarios (less than 10 in 240M). #!/usr/sbin/dtrace -qws pid$target::*mysql_parse*:entry { self->ts= timestamp; } pid$target::*mysql_parse*:return /self->ts != 0/ { printf("%u\n", timestamp - self->ts); self->t = 0; }
[28 May 2009 23:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[29 May 2009 5:10]
Sveta Smirnova
Kristofer, which feedback do you need?
[29 Jun 2009 23:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[3 Jul 2009 0:13]
James Day
Kristofer, please say why you have changed this bug from verified to can't repeat. Do you need Sveta to help you find a test case that demonstrates the problem?
[6 Jul 2009 12:46]
Kristofer Pettersson
Yes please, submit a test case I can use.
[7 Jul 2009 6:05]
Sveta Smirnova
tests used
Attachment: sources.tar (application/x-tar, text), 70.00 KiB.
[7 Jul 2009 6:20]
Sveta Smirnova
Tests used uploaded. Measured blocking time INSERT and UPDATE. Results will be uploaded. See second column "selects since last printout" Sometimes it decreases to 0: 1992.389771,0, 401017, ... 1995.390991,0, 519168, ...
[7 Jul 2009 6:51]
Sveta Smirnova
test result with 50mb query cache
Attachment: out_witqc2g.tar.gz (application/x-gzip, text), 161.76 KiB.
[7 Jul 2009 6:52]
Sveta Smirnova
Sorry, 500 MB query cache
[7 Jul 2009 7:47]
Kristofer Pettersson
Locked select threads
Attachment: bild.png (image/png, text), 21.04 KiB.
[7 Jul 2009 7:49]
Kristofer Pettersson
Non-linear invalidation graph
Attachment: bild(2).png (image/png, text), 13.28 KiB.
[7 Jul 2009 8:26]
Kristofer Pettersson
Query cache free memory graph with spikes
Attachment: bild(3).png (image/png, text), 15.39 KiB.
[7 Jul 2009 8:48]
Kristofer Pettersson
select intesity in corralation to free cache memory
Attachment: bild(5).png (image/png, text), 30.64 KiB.
[14 Jul 2009 16:26]
Sveta Smirnova
results for 1G qc with 5.1-bugteam
Attachment: 20090714_log_1GB.txt (text/plain), 288.21 KiB.
[14 Jul 2009 16:26]
Sveta Smirnova
results for 500MB qc with 5.1-bugteam
Attachment: 20090714_log_500MB.txt (text/plain), 287.99 KiB.
[14 Jul 2009 16:28]
Sveta Smirnova
Just tested with 5.1-bugteam tree. Works much better, although has such periods in case of update: 2797.455322,15728, ... 2800.457275,0, ... 2803.459473,14884, ...
[15 Jul 2009 7:39]
Sveta Smirnova
chart with inserts and qc=500 MB
Attachment: inserts_500MB.png (image/png, text), 14.25 KiB.
[15 Jul 2009 7:40]
Sveta Smirnova
chart with updates and qc=500 MB
Attachment: updates_500MB.png (image/png, text), 14.74 KiB.
[15 Jul 2009 7:41]
Sveta Smirnova
chart with inserts and qc=1 GB
Attachment: inserts_1GB.png (image/png, text), 14.22 KiB.
[15 Jul 2009 7:42]
Sveta Smirnova
chart with updates and qc=1 GB
Attachment: updates_1GB.png (image/png, text), 14.25 KiB.
[15 Jul 2009 7:46]
Sveta Smirnova
Charts are at: 500MB updates: http://spreadsheets.google.com/pub?key=tfVT1Ym3da9JHiYDFogMWIQ&oid=1&output=image 500MB inserts: http://spreadsheets.google.com/pub?key=tcSrFLCG-XKY6kC0Ym2FQ_A&oid=1&output=image 1GB updates: http://spreadsheets.google.com/pub?key=ti5iYuXGDzpI29eXPQ_4NMQ&oid=1&output=image 1GB inserts: http://spreadsheets.google.com/pub?key=tAXboRCwk5urcZRYXyUyXaQ&oid=1&output=image
[15 Jul 2009 7:55]
Sveta Smirnova
chart with updates and qc=500 MB - better format
Attachment: updates_500MB.png (image/png, text), 19.98 KiB.
[15 Jul 2009 7:56]
Sveta Smirnova
chart with updates and qc=1 GB - better format
Attachment: updates_1GB.png (image/png, text), 19.72 KiB.
[15 Jul 2009 7:59]
Sveta Smirnova
Problem is some period with 0 selects during 3 seconds.
[15 Jul 2009 7:59]
Sveta Smirnova
chart with inserts and qc=500 MB - better format
Attachment: inserts_500MB.png (image/png, text), 20.32 KiB.
[15 Jul 2009 8:00]
Sveta Smirnova
chart with inserts and qc=1 GB - better format
Attachment: inserts_1GB.png (image/png, text), 19.65 KiB.
[29 Jul 2009 21:46]
Sveta Smirnova
opcontrol output for 500MB query cache
Attachment: 20090729_opgrof_500MB.out.txt (text/plain), 153.42 KiB.
[29 Jul 2009 21:46]
Sveta Smirnova
opcontrol output for 1G query cache (+500MB I think)
Attachment: 20090729_opgrof_1G.out.txt (text/plain), 157.95 KiB.
[29 Jul 2009 21:59]
Sveta Smirnova
To get oprofile output tested mysql-5.1-bugteam compiled with debug=full. Results are worse than before for mysql-5.1-bugteam. New results will be uploaded shortly.
[29 Jul 2009 22:00]
Sveta Smirnova
results
Attachment: 20090729.results.tar.gz (application/x-gzip, text), 175.80 KiB.
[29 Jul 2009 22:01]
Sveta Smirnova
chart with inserts and qc=1 GB
Attachment: 20090729_1G.inserts.png (image/png, text), 19.75 KiB.
[29 Jul 2009 22:02]
Sveta Smirnova
chart with updates and qc=1 GB
Attachment: 20090729_1G.updates.png (image/png, text), 20.49 KiB.
[29 Jul 2009 22:02]
Sveta Smirnova
chart with inserts and qc=500 MB
Attachment: 20090729_500MB.inserts.png (image/png, text), 20.01 KiB.
[29 Jul 2009 22:03]
Sveta Smirnova
chart with updates and qc=500 MB
Attachment: 20090729_500MB.updates.png (image/png, text), 20.06 KiB.
[14 Aug 2009 15:26]
Kristofer Pettersson
Progress report. The invalidation performance is two fold: 1. The invalidation execution path seems slower than it needs to be. 2. As the QC invalidate a subset of queries, all other threads need to wait. This means that any performance issue in (1) will be multiplied by the number of waiting threads. The wait is a regression compared to the patch in 21074 as a result of fixing bug43758 which enabled us to refactor the lock API to use a consistent approach to locking the QC. To find hotspots in the invalidation path I've used dtrace sampling on a test case invalidating 1 GB of data from a single table. From this I see that my_hash_sort_bin() and my_hash_delete() are potential bottle necks. my_hash_sort_bin() is used when calculating the hash for a binary string and the function used comes from the strings/ directory. A close inspection on my_hash_delete() using google perf tools doesn't give away too many clues though (see attachment). It also seems likely that the linear search performed within each memory bin can be inefficient with certain load patterns. I wasn't able to show this with my test case though. Unfortunately I turns out that gprof doesn't work in macosx and for some reason and Shark refuses to sample a running mysqld process. I will have to switch to ubuntu to continue with some alternative profiling.
[14 Aug 2009 15:28]
Kristofer Pettersson
google-perftool-on-hash-test
Attachment: my_hash_delete-profile-release-build.pdf (application/pdf, text), 22.79 KiB.
[9 Sep 2009 8:43]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/82773 3048 Kristofer Pettersson 2009-09-09 Bug#39253 Large query cache still freezes server after fix for bug #21074 During a cache invalidation the server will be blocked on a mutex waiting for the query cache to finish. This patch reduces the time for this wait by making hash delete operations more efficient. @ sql/qc_hash.cc * Implemented new hash table based on chained hash entries and Jenkins hash function. - The hash entry is stored in the Query cache blocks to avoid internal memory allocations. - Delete is now a simple pointer operation instead of a linear search. - Hash search reorders hash entries by attempting to move the most accessed entry first in the chain. @ sql/qc_hash.h * Implemented new hash table based on chained hash entries and Jenkins hash function @ sql/sql_cache.cc * Implemented new hash table based on chained hash entries and Jenkins hash function - Replaced free_query() with dettach() and free_query_internal() @ sql/sql_cache.h * Implemented new hash table based on chained hash entries and Jenkins hash function @ sql/sql_plist.h * back port of double linked list from alazea tree.
[13 Nov 2009 19:18]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/90392 3192 Kristofer Pettersson 2009-11-13 Bug#39253 Large query cache still freezes server after fix for bug #21074 When the query cache is fragmented, the size of the free block lists in the memory bins grow which causes the query cache invalidation to become slow. Even if this is countered by defragmentation the hash_delete() operation still creates a bottle necks which can cause the server to hang for a short period of time depending on the size of the cache. This patch introduce a limit on the time the qc can block with a lock on SELECTs. Other operations which causes a change in the table data will still be blocked. @ sql/sql_cache.cc * Introduced a timeout value for the qc lock when entering send_result_to_client() and store_query() methods. @ sql/sql_cache.h * New signature for Query_cache::try_lock()
[20 Nov 2009 12:50]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/91071 3191 Kristofer Pettersson 2009-11-20 Bug#39253 Large query cache still freezes server after fix for bug #21074 This patch introduce a limit on the time the query cache can block with a lock on SELECTs. Other operations which causes a change in the table data will still be blocked. @ sql/sql_cache.cc * Introduced a timeout value for the qc lock when entering send_result_to_client() and store_query() methods. @ sql/sql_cache.h * New signature for Query_cache::try_lock()
[2 Dec 2009 8:08]
Bugs System
Pushed into 5.1.42 (revid:joro@sun.com-20091202080033-mndu4sxwx19lz2zs) (version source revid:davi.arnaut@sun.com-20091125130912-d7hrln14ef7y5d7i) (merge vers: 5.1.42) (pib:13)
[3 Dec 2009 2:34]
Paul DuBois
Noted in 5.1.42 changelog. When the query cache is fragmented, the size of the free block lists in the memory bins grows, which causes query cache invalidation to become slow. Setting report to NDI pending push into 5.6.x+.
[3 Dec 2009 16:34]
Paul DuBois
Revised changelog entry: When the query cache is fragmented, the size of the free block lists in the memory bins grows, which causes query cache invalidation to become slow. There is now a 50ms timeout for a SELECT statement waiting for the query cache lock. If the timeout expires, the statement executes without using the query cache.
[16 Dec 2009 8:39]
Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091216083311-xorsasf5kopjxshf) (version source revid:alik@sun.com-20091214191830-wznm8245ku8xo702) (merge vers: 6.0.14-alpha) (pib:14)
[16 Dec 2009 8:46]
Bugs System
Pushed into 5.5.0-beta (revid:alik@sun.com-20091216082430-s0gtzibcgkv4pqul) (version source revid:alexey.kopytov@sun.com-20091124083136-iqm136jm31sfdwg3) (merge vers: 5.5.0-beta) (pib:14)
[16 Dec 2009 8:52]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20091216083231-rp8ecpnvkkbhtb27) (version source revid:alik@sun.com-20091212203859-fx4rx5uab47wwuzd) (merge vers: 5.6.0-beta) (pib:14)
[16 Dec 2009 15:29]
Paul DuBois
Noted in 5.5.1, 6.0.14 changelogs.
[12 Mar 2010 14:14]
Bugs System
Pushed into 5.1.44-ndb-7.0.14 (revid:jonas@mysql.com-20100312135944-t0z8s1da2orvl66x) (version source revid:jonas@mysql.com-20100312115609-woou0te4a6s4ae9y) (merge vers: 5.1.44-ndb-7.0.14) (pib:16)
[12 Mar 2010 14:30]
Bugs System
Pushed into 5.1.44-ndb-6.2.19 (revid:jonas@mysql.com-20100312134846-tuqhd9w3tv4xgl3d) (version source revid:jonas@mysql.com-20100312060623-mx6407w2vx76h3by) (merge vers: 5.1.44-ndb-6.2.19) (pib:16)
[12 Mar 2010 14:46]
Bugs System
Pushed into 5.1.44-ndb-6.3.33 (revid:jonas@mysql.com-20100312135724-xcw8vw2lu3mijrhn) (version source revid:jonas@mysql.com-20100312103652-snkltsd197l7q2yg) (merge vers: 5.1.44-ndb-6.3.33) (pib:16)
[10 May 2010 19:43]
James Day
A crashing server with this bug as likely cause was fixed by upgrading from 5.1.40sp1 to 5.1.46. Two stack traces from the core files generated by the crashes were: [1] 0xfffffd7ffed7c94a(0x10b, 0xb, 0xffffff15f6085580, 0x0, 0x0, 0x0), at 0xfffffd7ffed7c94a =>[2] my_write_core(sig = ???) (optimized), at 0x9d3f19 (line ~310) in "stacktrace.c" [3] my_write_core(sig = ???) (optimized), at 0x9d3f19 (line ~310) in "stacktrace.c" [4] handle_segfault(sig = ???) (optimized), at 0x70ee0e (line ~2570) in "mysqld.cc" [5] collect_a_seq(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffed77126 [6] set_lock_byte64(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffed6ba22 [7] queue_alloc(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffed6bc3e [8] 0xffffffffffffffff(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xffffffffffffffff [9] Query_cache::pack_cache(this = ???) (optimized), at 0x855b95 (line ~3578) in "sql_cache.cc" [10] Query_cache::pack(this = ???, join_limit = ???, iteration_limit = ???) (optimized), at 0x853f58 (line ~1890) in "sql_cache.cc" [11] reload_acl_and_cache(thd = ???, options = ???, tables = ???, write_to_binlog = ???) (optimized), at 0x725b3f (line ~6857) in "sql_parse.cc" [12] mysql_execute_command(thd = ???) (optimized), at 0x71ef55 (line ~3974) in "sql_parse.cc" [13] mysql_parse(thd = ???, inBuf = ???, length = ???, found_semicolon = ???) (optimized), at 0x7241c0 (line ~5963) in "sql_parse.cc" [14] dispatch_command(command = ???, thd = ???, packet = ???, packet_length = ???) (optimized), at 0x71aa58 (line ~1224) in "sql_parse.cc" [15] do_command(thd = ???) (optimized), at 0x719fd2 (line ~865) in "sql_parse.cc" [16] handle_one_connection(arg = ???) (optimized), at 0x718d51 (line ~1127) in "sql_connect.cc" [1] 0xfffffd7ffed7c94a(0x29, 0xb, 0xfffffeae5e2d9460, 0x0, 0x0, 0x0), at 0xfffffd7ffed7c94a =>[2] my_write_core(sig = ???) (optimized), at 0x9d3f19 (line ~310) in "stacktrace.c" [3] my_write_core(sig = ???) (optimized), at 0x9d3f19 (line ~310) in "stacktrace.c" [4] handle_segfault(sig = ???) (optimized), at 0x70ee0e (line ~2570) in "mysqld.cc" [5] collect_a_seq(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffed77126 [6] set_lock_byte64(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffed6ba22 [7] queue_alloc(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffed6bc3e [8] 0xffffffffffffffff(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xffffffffffffffff [9] Query_cache::free_query_internal(this = ???, query_block = ???) (optimized), at 0x85471f (line ~2264) in "sql_cache.cc" [10] Query_cache::free_query(this = ???, query_block = ???) (optimized), at 0x854755 (line ~2321) in "sql_cache.cc" [11] Query_cache::free_old_query(this = ???) (optimized), at 0x8545fb (line ~2216) in "sql_cache.cc" [12] Query_cache::allocate_block(this = ???, len = ???, not_less = ???, min = ???) (optimized), at 0x855220 (line ~2991) in "sql_cache.cc" [13] Query_cache::allocate_data_chain(this = ???, result_block = ???, data_len = ???, query_block = ???, first_block_arg = ???) (optimized), at 0x854aa0 (line ~2548) in "sql_cache.cc" [14] Query_cache::write_result_data(this = ???, result_block = ???, data_len = ???, data = ???, query_block = ???, type = ???) (optimized), at 0x854978 (line ~2454) in "sql_cache.cc" [15] Query_cache::append_result_data(this = ???, current_block = ???, data_len = ???, data = ???, query_block = ???) (optimized), at 0x854927 (line ~2378) in "sql_cache.cc" [16] query_cache_insert(net = ???, packet = ???, length = ???) (optimized), at 0x85291e (line ~869) in "sql_cache.cc" [17] net_real_write(net = ???, packet = ???, len = ???) (optimized), at 0x705968 (line ~567) in "net_serv.cc" [18] net_flush(net = ???) (optimized), at 0x7054f9 (line ~344) in "net_serv.cc" [19] net_send_eof(thd = ???, server_status = ???, total_warn_count = ???) (optimized), at 0x706686 (line ~230) in "protocol.cc" [20] net_end_statement(thd = ???) (optimized), at 0x70697d (line ~449) in "protocol.cc" [21] dispatch_command(command = ???, thd = ???, packet = ???, packet_length = ???) (optimized), at 0x71b2f0 (line ~1614) in "sql_parse.cc" [22] do_command(thd = ???) (optimized), at 0x719fd2 (line ~865) in "sql_parse.cc" [23] handle_one_connection(arg = ???) (optimized), at 0x718d51 (line ~1127) in "sql_connect.cc" Before upgrading, turning off the query cache was a successful workaround.