Bug #83938: MEM purge of the statements by server table fails, causing it to continue growing
Submitted: 23 Nov 2016 22:27    Modified: 24 Nov 2016 11:49
Reporter: Andrew Gothman        Status: Not a Bug
Category: MySQL Enterprise Monitor    Severity: S3 (Non-critical)
Version: 3.2.4                  OS: Red Hat
CPU Architecture: Any

[23 Nov 2016 22:27] Andrew Gothman
Description:
We keep hitting significant bugs in every MEM release we have tried. In this case, the Query Analyzer page stopped functioning. I checked the table sizes to see if anything looked unusual and found that the mem324__quan.normalized_statements_by_server_by_schema_data table had reached 30,109 MB (23 GB of data and about 7 GB of indexes) with 130,409,819 rows, even though we monitor only about 20 target databases.

It seems crazy that this purge cannot keep up with 20 target databases when the documentation states that a medium MEM installation supports up to 100 MySQL target servers. Is this a known bug, or is there an undocumented MEM configuration setting that needs to be applied to prevent this delete from failing?

Here's an example of the errors that were written to the log every minute:

TRANSACTION 1493165401, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
2 lock struct(s), heap size 360, 1 row lock(s)
MySQL thread id 848978, OS thread handle 0x7ff521d6a700, query id 226147306 localhost 127.0.0.1 service_manager updating
/* mem dbpool.default */ DELETE FROM `mem324__quan`.`normalized_statements_by_server_by_schema_data` WHERE `timestamp` < 1476855149116 ORDER BY `timestamp` LIMIT 1000

InnoDB: Submit a detailed bug report to http://bugs.mysql.com
2016-11-09 00:15:11 7ff521d6a700  InnoDB: error clustered record for sec rec not found
InnoDB: index `timestamp` of table `mem324__quan`.`normalized_statements_by_server_by_schema_data`
InnoDB: sec index record PHYSICAL RECORD: n_fields 3; compact format; info bits 0
0: len 8; hex 800001576f46d6c0; asc    WoF  ;;
1: len 8; hex 80000000000071cb; asc       q ;;
2: len 16; hex 85f3494071ef3030b51b8f3d57e4cc55; asc   I@q 00   =W  U;;

InnoDB: clust index record PHYSICAL RECORD: n_fields 33; compact format; info bits 0
0: len 8; hex 80000000000071cb; asc       q ;;
1: len 16; hex 85e89a64b7c33797adca6d6fc2068c47; asc    d  7   mo   G;;
2: len 6; hex 000034eda583; asc   4   ;;
3: len 7; hex 28000003cd00b6; asc (      ;;
4: SQL NULL;
5: SQL NULL;
6: SQL NULL;
7: len 8; hex 800000000000ea60; asc        `;;
8: len 8; hex 8000000000000000; asc         ;;
9: len 8; hex 8000000000000004; asc         ;;
10: len 8; hex 8000000000000000; asc         ;;
11: len 8; hex 8000000000000004; asc         ;;
12: len 8; hex 8000000000049c74; asc        t;;
13: len 8; hex 800000000000005d; asc        ];;
14: len 8; hex 80000000000002ce; asc         ;;
15: len 8; hex 8000000000000169; asc        i;;
16: len 8; hex 8000000000000000; asc         ;;
17: len 8; hex 8000000000000004; asc         ;;
18: len 8; hex 8000000000000004; asc         ;;
19: SQL NULL;
20: SQL NULL;
21: len 8; hex 8000000000000004; asc         ;;
22: len 8; hex 8000000000000000; asc         ;;
23: len 8; hex 8000000000000000; asc         ;;
24: len 8; hex 8000000000000000; asc         ;;
25: len 8; hex 8000000000000000; asc         ;;
26: len 8; hex 8000000000000004; asc         ;;
27: len 8; hex 8000000000000000; asc         ;;
28: len 8; hex 8000000000000000; asc         ;;
29: len 8; hex 8000000000000000; asc         ;;
30: len 8; hex 8000000000000000; asc         ;;
31: len 8; hex 80000157db6c10b8; asc    W l  ;;
32: len 8; hex 8000000000000000; asc         ;;

How to repeat:
If this isn't caused by an odd bug in MEM that holds locks on the table, then simply configure 20-28 targets and wait for the purge to fail and fill the ../monitor/mysql/runtime/mysqld.log file with the error above every minute.

Suggested fix:
Either optimize the purge or determine whether a bug is holding locks on this heavily active table. As a temporary workaround, I stopped Tomcat and then the MySQL database, restarted MySQL, and ran the following delete repeatedly to purge the data down to two weeks ago.

DELETE FROM `mem324__quan`.`normalized_statements_by_server_by_schema_data`
WHERE `timestamp` < 1479053410000
LIMIT 1000000;

Purging the table by repeating the delete above took around 6 hours. I then ran OPTIMIZE TABLE on the table, which took another 4-5 hours. By the end, the table was down to 10,839 MB (8,195 MB data and 2,644 MB indexes) and 70,077,176 rows.
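The repeated batched delete above can be sketched as a single stored procedure that loops until no rows remain. This is only an illustration: the procedure name, cutoff value, and batch size are hypothetical, and passing a routine parameter to LIMIT requires MySQL 5.5.6 or later.

```sql
-- Hypothetical helper: repeat a small batched DELETE until ROW_COUNT()
-- reports nothing left to remove. Small batches keep each transaction
-- short and avoid holding row locks for long periods.
DELIMITER //
CREATE PROCEDURE purge_quan_data(IN cutoff BIGINT, IN batch INT)
BEGIN
  REPEAT
    DELETE FROM `mem324__quan`.`normalized_statements_by_server_by_schema_data`
    WHERE `timestamp` < cutoff
    LIMIT batch;
  UNTIL ROW_COUNT() = 0 END REPEAT;
END //
DELIMITER ;

-- Example: purge everything older than the cutoff, 100k rows at a time.
CALL purge_quan_data(1479053410000, 100000);
```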

I'm going to continue monitoring the size of this table and the error log to see if this starts back up again in the next few weeks.
[24 Nov 2016 11:49] Mark Leith
Hi Andrew,

This doesn't appear to be a problem with MEM (note that we already use a purge loop like the one you suggest); it instead appears to be corruption of the table:

2016-11-09 00:15:11 7ff521d6a700  InnoDB: error clustered record for sec rec not found
InnoDB: index `timestamp` of table `mem324__quan`.`normalized_statements_by_server_by_schema_data`

i.e. there's a secondary index record with no corresponding primary key entry.
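A mismatch like this can be confirmed with CHECK TABLE using the EXTENDED option, which verifies every secondary index entry against the clustered index. Note that it does a full table scan, so on a table of 70M+ rows it can take hours; run it in a maintenance window.

```sql
-- Verify consistency between the clustered index and all secondary
-- indexes; errors here confirm the "sec rec not found" condition.
CHECK TABLE `mem324__quan`.`normalized_statements_by_server_by_schema_data` EXTENDED;
```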

This is not really something that MEM can automatically work around; it's something where we would expect an administrator to determine the cause of the corruption and fix it appropriately.

If this continues to occur, please work with our Support department instead, as this would either be a bug in InnoDB, or a problem with your hardware (causing corruption in the data files).
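One common repair path, assuming the damage is confined to the secondary index rather than the clustered data, is to rebuild the table, which recreates every secondary index from the row data. This is a sketch, not an official recommendation; take a backup first.

```sql
-- Null rebuild: for InnoDB this rewrites the clustered index and
-- recreates all secondary indexes from the row data.
ALTER TABLE `mem324__quan`.`normalized_statements_by_server_by_schema_data` ENGINE=InnoDB;

-- OPTIMIZE TABLE is mapped to the same rebuild for InnoDB tables:
OPTIMIZE TABLE `mem324__quan`.`normalized_statements_by_server_by_schema_data`;
```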

Regards,

Mark