MySQL Bugs: #60696: invalidating query cache entries (table) hangs replication

Bug #60696	invalidating query cache entries (table) hangs replication
Submitted:	30 Mar 2011 8:33	Modified:	7 Jul 2017 9:08
Reporter:	Alan Hollis	Email Updates:
Status:	Won't fix	Impact on me:	None
Category:	MySQL Server: Query Cache	Severity:	S2 (Serious)
Version:	5.5.8, 5.5.20	OS:	Any (Windows server 2008r2, Linux)
Assigned to:		CPU Architecture:	Any
Tags:	query cache, replication

Description:
Show process list shows.

mysql> show processlist \G
*************************** 2. row ***************************
     Id: 7929352
   User: system user
   Host:
     db: NULL
Command: Connect
   Time: 69899
  State: invalidating query cache entries (table)
   Info: NULL

replication shows

 Seconds_Behind_Master: 70429

Once the invalidating query cache entries (table) process has finished the slave catches up.

How to repeat:
I believe this might have something to do with row based replication of large tables.

Rows: ~49,917,839	InnoDB	latin1_swedish_ci	Size: 28.7 GiB

I've now yet been able to repeat with 100% accuracy.
Any help as to how best to test my theory would be greatly appreciated.

Suggested fix:
I've switched to statement based replication now to see if that resolves the issue.

Please, send the output of:

show variables like 'query_%';
show status like 'qcache%';

As requested. The process list is still showing the invalidating query cache at this time too.

mysql> show variables like 'query_%';
+------------------------------+---------+
| Variable_name                | Value   |
+------------------------------+---------+
| query_alloc_block_size       | 8192    |
| query_cache_limit            | 1048576 |
| query_cache_min_res_unit     | 4096    |
| query_cache_size             | 0       |
| query_cache_type             | OFF     |
| query_cache_wlock_invalidate | OFF     |
| query_prealloc_size          | 8192    |
+------------------------------+---------+
7 rows in set (0.01 sec)

mysql> show status like 'qcache%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Qcache_free_blocks      | 0     |
| Qcache_free_memory      | 0     |
| Qcache_hits             | 0     |
| Qcache_inserts          | 0     |
| Qcache_lowmem_prunes    | 0     |
| Qcache_not_cached       | 0     |
| Qcache_queries_in_cache | 0     |
| Qcache_total_blocks     | 0     |
+-------------------------+-------+
8 rows in set (0.03 sec)

Sorry, missed GLOBAL keyword in the statements. Are the results any different with SHOW GLOBAL..?

No problem.

mysql> show global status like 'qcache%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Qcache_free_blocks      | 0     |
| Qcache_free_memory      | 0     |
| Qcache_hits             | 0     |
| Qcache_inserts          | 0     |
| Qcache_lowmem_prunes    | 0     |
| Qcache_not_cached       | 0     |
| Qcache_queries_in_cache | 0     |
| Qcache_total_blocks     | 0     |
+-------------------------+-------+
8 rows in set (0.01 sec)

mysql> show global variables like 'query_%';
+------------------------------+---------+
| Variable_name                | Value   |
+------------------------------+---------+
| query_alloc_block_size       | 8192    |
| query_cache_limit            | 1048576 |
| query_cache_min_res_unit     | 4096    |
| query_cache_size             | 0       |
| query_cache_type             | OFF     |
| query_cache_wlock_invalidate | OFF     |
| query_prealloc_size          | 8192    |
+------------------------------+---------+
7 rows in set (0.02 sec)

Do you use ROW or MIXED mode replication? See bug #38551 (must be fixed in 5.5.8, but anyway).

Also, check INNODB STATUS for thread with this status reported. See Bug #59899. Finally there is not yet solved bug #53375.

It was ROW based replication when the problem was occuring. I've now set this to mixed to see if this stops the problem as a workaround.

I'm discounting #38551 as this happened more than once since after turning the query cache off.

Unfortunately the thread finally ended so I can't checkout whether #59899 is related or not.

#53375 seems to suggest this being a consequence of primary keys. Everyone of the tables in the database has a primary key.

Before it died the query had been going for a very long time, the last time I did a process list showed.

 7929352 | system user |                                | NULL | Connect     |   65039 | invalidating query cache entries (table)

Please let me know if I can be any more help. I'm more than happy to change back to row based replication and wait for this to happen again, as I've taken other work around to make sure this isn't an issue.

If you'll ever see this again, please, get INNODB STATUS while thread is hanging in this state.

Also, please, send SHOW CREATE TABLE for all tables used in the query that caused this hang, if possible.

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

Okay, it's taken a long time to happen again. But it's happened again.

=====================================
110621  9:44:07 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 49 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 98256 1_second, 98253 sleeps, 8937 10_second, 38504 background, 38480 flush
srv_master_thread log flush and writes: 102851
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 99157, signal count 98161
Mutex spin waits 55615, rounds 1687416, OS waits 2257
RW-shared spins 48751, rounds 1513582, OS waits 47860
RW-excl spins 43794, rounds 1471200, OS waits 46372
Spin rounds per wait: 30.34 mutex, 31.05 RW-shared, 33.59 RW-excl
------------
TRANSACTIONS
------------
Trx id counter B3ACF9AD
Purge done for trx's n:o < B3ACF8CF undo n:o < 0
History list length 2694
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 1477, query id 66550116 localhost ::1 root
SHOW ENGINE INNODB STATUS
---TRANSACTION B3ACF8CD, not started
MySQL thread id 2, query id 66550116 invalidating query cache entries (table)
--------
FILE I/O
--------
I/O thread 0 state: wait Windows aio (insert buffer thread)
I/O thread 1 state: wait Windows aio (log thread)
I/O thread 2 state: wait Windows aio (read thread)
I/O thread 3 state: wait Windows aio (read thread)
I/O thread 4 state: wait Windows aio (read thread)
I/O thread 5 state: wait Windows aio (read thread)
I/O thread 6 state: wait Windows aio (write thread)
I/O thread 7 state: wait Windows aio (write thread)
I/O thread 8 state: wait Windows aio (write thread)
I/O thread 9 state: wait Windows aio (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
167027 OS file reads, 31194752 OS file writes, 294653 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 35765, seg size 35767, 104991 merges
merged operations:
 insert 4327531, delete mark 2559019, delete 1087032
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 10624987, node heap has 89579 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 1004414414280
Log flushed up to   1004414414280
Last checkpoint at  1004414414280
0 pending log writes, 0 pending chkp writes
25275000 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 5494538240; in additional pool allocated 0
Dictionary memory allocated 2076651
Buffer pool size   327680
Free buffers       0
Database pages     238101
Old database pages 87872
Modified db pages  0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 41414, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 167016, created 197710, written 5792316
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s
LRU len: 238101, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread id 1900, state: waiting for server activity
Number of rows inserted 5042373, updated 8790786, deleted 5577, read 26550813
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

I'm not sure how I find out which query is causing this? Help would be greatly appreciated!

Just to add, I've upgraded this server to 5.5.13-log.

It also hangs in invalidating on gcov built/linux: http://pb2.norway.sun.com/?template=mysql_show_test_failure&test_failure_id=3603326.

So I've found something interesting. If I do a mysql dump on one database in particular the invalidating query cache gets put into a waiting for table level lock.

So I'm assuming it must be some query I'm running on a table in that database? I'll investigate further.

Thank you for the feedback.

Please update us with results of your investigation or, at least, send us dump (maybe mysqldump --no-data) of involved tables, so we can try to recreate this on our side too.

Hello,
   I'm getting the same error.  When you run a show processlist it's always in the "invalidating query cache entries (table)" state.  

I've actually turned off the inserts to the master such that the database should be static, yet the master-bin log file continues to increase in size.
The slave continues to move forward on the position, but not fast enough to catch-up to the master.

   My assumption is that each time it reads a position or several positions from the binlog it has to go and invalidate the query cache entry table due to updates, and that process is super-expensive.

   Is there a way to just clear the cache table entries for all updates from a binlog file if it's behind in time?

slave info:
os version: redhat linux 
mysql v5.5.15-log
binlog format= mixed
file: master-bin.000426   pos: 353,906,762

id    | user        | host | db    | Command   |  Time  | State                                     | Info
18796 | system user |      |  NULL |  Connect  |  802   | Waiting for master to send event          | NULL
18797 | system user |      |  NULL |  Connect  |  6265  | invalidating query cache entries (table)  | NULL

master info:
os version: redhat linux 
mysql v5.5.15-log.
binlog format= mixed
file:  master-bin.000426  pos: 610,803,498

id    | user        | host      | db    | Command     |  Time  | State                                                                  | Info
5566  | repl        | localhost | NULL  | Binlog dump |  1336  | Master has sent all binlog to slave; waiting for binlog to be updated  | NULL

Please, check if the same problem still happens with a newer version, 5.5.20.

Yes, I see the same behavior on my newly upgraded 5.5.20.

Bug #64266 was marked as a duplicate of this one.

I have the same problem on my system and have the following pt-pmp output:

pt-pmp -p $(pidof mysqld); mysql -uroot -e "show processlist;" 
Wed Jun 13 00:04:07 PDT 2012

warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff915fd000
      9 pthread_cond_wait,one_thread_per_connection_end,handle_one_connection,start_thread,clone
      4 pthread_cond_wait,os_event_wait_low,os_aio_simulated_handle,fil_aio_wait,::??,start_thread,clone
      1 select,os_thread_sleep,srv_monitor_thread,start_thread,clone
      1 select,os_thread_sleep,srv_master_thread,start_thread,clone
      1 select,os_thread_sleep,srv_lock_timeout_thread,start_thread,clone
      1 select,os_thread_sleep,srv_error_monitor_thread,start_thread,clone
      1 select,handle_connections_sockets,main
      1 read,vio_read_buff,::??,my_net_read,cli_safe_read,handle_slave_io,start_thread,clone
      1 read,::??,my_net_read,do_command,handle_one_connection,start_thread,clone
      1 pread64,::??,os_file_read,fil_io,::??,buf_read_page,buf_page_get_gen,btr_cur_search_to_nth_level,::??,row_search_for_mysql,ha_innobase::general_fetch,Rows_log_event::find_row,Delete_rows_log_event::do_exec_row,Rows_log_event::do_apply_event,apply_event_and_update_pos,handle_slave_sql,start_thread,clone
      1 do_sigwait,sigwait,signal_hand,start_thread,clone
+-----+-------------+-------------------+------------+---------+--------+------------------------------------------+------------------+
| Id  | User        | Host              | db         | Command | Time   | State                                    | Info             |
+-----+-------------+-------------------+------------+---------+--------+------------------------------------------+------------------+
|   1 | system user |                   | NULL       | Connect |    600 | Waiting for master to send event         | NULL             |
|   2 | system user |                   | NULL       | Connect | 106673 | invalidating query cache entries (table) | NULL             |
|  78 | somedatabase         | x.x.x.x:33934 | sa1reports | Sleep   |    306 |                                          | NULL             |
| 183 | root        | localhost         | NULL       | Query   |      0 | NULL                                     | show processlist |
+-----+-------------+-------------------+------------+---------+--------+------------------------------------------+------------------+

Not sure if this will help but I am on version 5.1.60-log x86_64.

Thank you for the feedback.

"Invalidating query cache" status can be separate bug only if query_cache_size and query_cache_type is 0. If it is ON and queries are really in cache it should be cleaned.

I tend to think this bug has same source of issues as bug #65336: I could only repeat call which leads to status described when I started slave with option --query_cache-type=1, then changed it to OFF online. In other cases slave did not enter this option and this bug could not occur.

Please check if you either have query_cache_type ON or 1. If SHOW GLOBAL VARIABLES shows query_cache_type is OFF, please ensure configuration file contains option query_cache_type=0.

Please inform us about results of your checking.

To test this I created MTR test case like below:

--source include/master-slave.inc
--source include/have_innodb.inc

connection master;
set binlog_format='row';
create table t1(f1 text) engine=innodb;
insert into t1 values(md5(1));
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;

sleep 1;
connection slave;

select * from t1;

connection master;
insert into t1 select * from t1;

sleep 1;
connection slave;
show variables like 'query_cache%';

Then attached a debugger to slave and added following breakpoints:

Rows_log_event::do_apply_event
Query_cache::invalidate_locked_for_write 
Query_cache::invalidate_table
sql_cache.cc:1892

With statement-based format invalidate_locked_for_write, which sets problem status, was not called.

After some thinking considered to verify it now, because:

1. Problem is only repeatable with row-based replication
2. According to http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_query_cache_wlo...: "Normally, when one client acquires a WRITE lock on a MyISAM table, other clients are not blocked from issuing statements that read from the table if the query results are present in the query cache. Setting this variable to 1 causes acquisition of a WRITE lock for a table to invalidate any queries in the query cache that refer to the table. This forces other clients that attempt to access the table to wait while the lock is in effect. " lock should happen only for MyISAM tables and only if option query_cache_wlock_invalidate set to ON. But never for InnoDB tables and if query_cache_wlock_invalidate is OFF (default).
3. Why query cache is involved at all is same issue as bug #65336

Today we hit such issue again, caused the slave large lag.

I looked into the source again, and not puzzled the "invalidating query cache entries (table)" any more. Current implement of QC has flaws, which can refer to:
http://bugs.mysql.com/bug.php?id=65336
https://bugs.launchpad.net/percona-server/5.5/+bug/1021131

For the high workload slaves, it's more obvious to catch such issue.

MySQL will no longer invest in the query cache, see:

http://mysqlserverteam.com/mysql-8-0-retiring-support-for-the-query-cache/