Bug #37844 Fix SMP performance for the query cache
Submitted: 3 Jul 2008 17:58    Modified: 7 Jul 2017 9:44
Reporter: Mark Callaghan
Status: Won't fix
Category: MySQL Server: Query Cache    Severity: S5 (Performance)
Version: 5.0.62, 5.1.25    OS: Any
Assigned to: Assigned Account    CPU Architecture: Any
Tags: cache, multicore, performance, query, SMP

[3 Jul 2008 17:58] Mark Callaghan
Description:
Only 1 thread can search the query cache at a time. A pthread mutex is held while it is searched. This will limit performance on multi-core servers.

The problems include:
* The search key is initialized after the mutex is locked. This increases contention.
* For 5.1.25, pthread_mutex_lock is used and many threads will sleep when there is contention.
* For 5.0.62 an attempt was made to use a spin lock, but it has several problems: the code is not encapsulated, it calls my_clock() which ignores error return values from times(), and it isn't really a spin lock because there is no spinning -- sleep() is called immediately after each failure to get the lock (see the sketch below).
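As a minimal sketch (not the actual server code; it uses C++11 atomics rather than the server's portability wrappers, and the spin limit is an arbitrary value), an encapsulated spin lock that really spins before backing off, plus a clock wrapper that does not ignore the error return of times(), might look like this:

#include <atomic>
#include <thread>
#include <sys/times.h>

class Spin_lock {
 public:
  void lock() {
    int spins = 0;
    // Spin for a while before giving up the CPU; the 5.0.62 code
    // called sleep() immediately after the first failed attempt.
    while (flag_.test_and_set(std::memory_order_acquire)) {
      if (++spins >= kSpinLimit) {
        std::this_thread::yield();
        spins = 0;
      }
    }
  }
  void unlock() { flag_.clear(std::memory_order_release); }

 private:
  static const int kSpinLimit = 100;  // illustrative value
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// times() can fail; check its return value instead of silently using it.
static clock_t checked_clock() {
  struct tms t;
  clock_t now = times(&t);
  if (now == static_cast<clock_t>(-1)) {
    // report or handle the error rather than ignoring it
    return 0;
  }
  return now;
}

The point of the bounded spin is that under short contention the lock is usually released before the waiting thread has to give up its time slice.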

How to repeat:
Read Query_cache::send_result_to_client

Suggested fix:
Partition the query cache into N pieces with one mutex per piece (a sketch follows the list below).
Use a proper spin lock with an encapsulated implementation.
Don't ignore error return values from system calls.
Limit the duration for which the mutex is locked.
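A rough illustration of the first suggestion -- not MySQL code; the partition count, key handling and value types are simplified for the sketch. The query key is hashed before any lock is taken, and only the partition that owns the key is locked:

#include <array>
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>

class Partitioned_query_cache {
 public:
  // Returns true and fills *result if the query is cached.
  bool lookup(const std::string &query_key, std::string *result) {
    Partition &p = partition_for(query_key);     // key hashed before locking
    std::lock_guard<std::mutex> guard(p.mutex);  // only this partition is locked
    auto it = p.entries.find(query_key);
    if (it == p.entries.end()) return false;
    *result = it->second;
    return true;
  }

  void store(const std::string &query_key, const std::string &result) {
    Partition &p = partition_for(query_key);
    std::lock_guard<std::mutex> guard(p.mutex);
    p.entries[query_key] = result;
  }

 private:
  static const std::size_t kPartitions = 16;  // illustrative value

  struct Partition {
    std::mutex mutex;
    std::unordered_map<std::string, std::string> entries;
  };

  Partition &partition_for(const std::string &key) {
    return partitions_[std::hash<std::string>()(key) % kPartitions];
  }

  std::array<Partition, kPartitions> partitions_;
};

In the real server the lookup key would also have to include the current database, client character set and the other flags that affect the cached result, and pruning/invalidation would need to be handled per partition as well.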
[3 Jul 2008 18:16] Davi Arnaut
WL#1468
[3 Jul 2008 22:10] MySQL Verification Team
Thank you for the bug report.
[24 Aug 2008 9:36] Bryce Nesbitt
It would be great if two or more threads could be returning different LIMIT/OFFSETs off the same cached query. See Bug #18707
[20 May 2010 7:07] Kristofer Pettersson
Some notes:
A plugin framework has already been created by Mattias J and Mikael R which could be the foundation for new implementations.

One of the biggest performance issues with the QC happens when a lot of memory prunes are executed. This leads to heavy fragmentation, which in turn leads to a lot of linear searches in the memory bins. The hash delete operation is not cheap either.

We can increase the performance of hash delete, introduce automatic de-fragmentation, remove the linear search in the memory allocation layer and ensure that similar cacheable queries can queue up instead of being executed concurrently in the SE (a sketch of the last idea follows below).
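A hypothetical sketch of that last point -- letting identical cache-miss queries queue up behind the first execution instead of all running in the storage engine. The class and its names are invented for illustration and error handling is omitted:

#include <functional>
#include <future>
#include <mutex>
#include <string>
#include <unordered_map>

class Miss_coalescer {
 public:
  // The first thread that misses on `key` runs the query; identical
  // concurrent requests wait for that result instead of executing the
  // same statement in the storage engine again.
  std::string get(const std::string &key, std::function<std::string()> run) {
    std::promise<std::string> prom;
    std::shared_future<std::string> fut;
    bool is_runner = false;
    {
      std::lock_guard<std::mutex> guard(mutex_);
      auto it = in_flight_.find(key);
      if (it == in_flight_.end()) {
        fut = prom.get_future().share();
        in_flight_.emplace(key, fut);
        is_runner = true;
      } else {
        fut = it->second;          // join the request already in flight
      }
    }
    if (is_runner) {
      prom.set_value(run());       // execute once in the storage engine
      std::lock_guard<std::mutex> guard(mutex_);
      in_flight_.erase(key);       // later misses will execute again
    }
    return fut.get();
  }

 private:
  std::mutex mutex_;
  std::unordered_map<std::string, std::shared_future<std::string>> in_flight_;
};

std::shared_future is just a convenient way to express the hand-off in a sketch; the server would use its own synchronization primitives.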

To increase throughput on high-concurrency servers we would probably need a new design that allows more threads to serve queries from the cache concurrently. Possibly the old design could be kept if we just created more QC instances and used a consistent hash to select which one to use for a specific query (see the sketch below).
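As an illustration of that selection step only (the instance and virtual-node counts are made-up values, and this says nothing about the cache itself), a consistent hash ring mapping a normalized query text to one of several QC instances could look like:

#include <cstddef>
#include <map>
#include <string>

class Instance_ring {
 public:
  explicit Instance_ring(std::size_t instances, std::size_t vnodes_per_instance = 8) {
    // Place several virtual points per instance on the ring so queries
    // spread fairly evenly across instances.
    for (std::size_t i = 0; i < instances; ++i)
      for (std::size_t v = 0; v < vnodes_per_instance; ++v)
        ring_[hash_(std::to_string(i) + "#" + std::to_string(v))] = i;
  }

  // Map a normalized query text to one QC instance (assumes at least one).
  std::size_t instance_for(const std::string &query) const {
    auto it = ring_.lower_bound(hash_(query));
    if (it == ring_.end()) it = ring_.begin();  // wrap around the ring
    return it->second;
  }

 private:
  std::hash<std::string> hash_;
  std::map<std::size_t, std::size_t> ring_;  // hash point -> instance id
};

// Example: Instance_ring ring(4); std::size_t qc = ring.instance_for(query_text);

If the number of instances is fixed at startup, a plain hash modulo the instance count would do just as well; consistent hashing mainly helps if instances can be added or removed at runtime.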
[20 May 2010 9:01] James Day
Kristofer, thanks. Yes, there are several different issues.

Today the walking of the deleted block list seems to be causing fewer issues than the general cost of using the qc that Mark was looking at. This doesn't mean that it's not a problem, just that CPU core counts are changing and other problems are becoming more prominent. On the support side we deal with the deleted block list issue by cutting the qc RAM allocation and/or suggesting flushing it to defragment the deleted block list. Might be an idea to automate the flushing operation? The fixes you've done mostly eliminate the big problems (server freezing) we had in the past with long deleted block lists, leaving just performance issues.

Increased CPU counts and the bad performance interaction with slave RBR (it seems you have to choose RBR or the qc, not both, if you do a lot of updates) are the issues that worry me most at the moment when it comes to the qc. The RBR issue would probably take working with the replication team to figure out what's happening and how to fix it.

It'll be interesting to see what comes out of a really thorough analysis and architecture review and testing of possible solutions for the qc.
[20 May 2010 10:56] James Day
Thinking more about using more qc instances, that does seem like a good idea. Just assign a connection to a qc. The qc benefits typically come mostly in the first 10-20 megabytes, so it'd be easy to allocate that a few times. It's not uncommon for the support team to see a 200M qc with 150M free, and it's easy enough to allocate a few 50M qc instances if 20M isn't enough. Lots of duplicate query storage, but that's OK.

If anyone doesn't understand why duplicate storage is OK but speed is very important: the qc check is done before other query work, and its top priority is to check quickly whether it has a query result, or else get out of the way so it doesn't slow down the full query processing work. Its only reason for existing is to be very fast at returning a result. Though it also keeps some work away from other bottlenecks in the server or storage engines, which is one reason why just getting rid of it isn't a good idea if it can stay fast enough.
[27 May 2010 6:33] James Day
Possibly of interest is this benchmarking with dbstress, where these two query cache synchronization objects sometimes show up as hot even though the query cache is disabled:

wait/synch/cond/sql/Query_cache::COND_cache_status_changed
wait/synch/mutex/sql/Query_cache::structure_guard_mutex

http://dimitrik.free.fr/blog/archives/2010/05/mysql-performance-using-performance-schema.h...
[1 Jun 2010 20:11] James Day
query_cache_type wasn't set to 0 in that test. You should always set query_cache_type to 0 in 5.4 and later if there's no need for the query cache.
[9 Dec 2010 1:02] Davi Arnaut
Bug#47529 has been closed as a duplicate of this one.
[10 May 2012 18:17] Sveta Smirnova
See also bug #64924
[7 Jul 2017 9:44] Erlend Dahl
MySQL will no longer invest in the query cache, see:

http://mysqlserverteam.com/mysql-8-0-retiring-support-for-the-query-cache/