Description:
We keep running into significant bugs with every MEM release we have tried. In this case, the Query Analyzer page stopped functioning. I checked the table sizes to see if anything looked unusual and found that the mem324__quan.normalized_statements_by_server_by_schema_data table had reached 30,109MB (about 23G of data and 7G of indexes) with a row count of 130,409,819 records, despite monitoring only 20-some target databases.
It seems unreasonable that this purge cannot keep up with 20 target databases when the documentation states that a medium MEM installation supports up to 100 MySQL target servers. Is this a known bug, or is there a MEM configuration option not mentioned in the documentation that must be set to prevent this delete from failing?
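For reference, the table-size figures above can be reproduced with a query against information_schema (the schema name is specific to this installation; adjust it to match yours):

```sql
-- Report row count plus data and index size, in MB, for the quan data table.
SELECT table_rows,
       ROUND(data_length  / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb
FROM information_schema.TABLES
WHERE table_schema = 'mem324__quan'
  AND table_name   = 'normalized_statements_by_server_by_schema_data';
```

Note that table_rows is an estimate for InnoDB tables; the exact row counts quoted in this report came from SELECT COUNT(*).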
Here's an example of the errors that were written to the log every minute:
TRANSACTION 1493165401, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
2 lock struct(s), heap size 360, 1 row lock(s)
MySQL thread id 848978, OS thread handle 0x7ff521d6a700, query id 226147306 localhost 127.0.0.1 service_manager updating
/* mem dbpool.default */ DELETE FROM `mem324__quan`.`normalized_statements_by_server_by_schema_data` WHERE `timestamp` < 1476855149116 ORDER BY `timestamp` LIMIT 1000
InnoDB: Submit a detailed bug report to http://bugs.mysql.com
2016-11-09 00:15:11 7ff521d6a700 InnoDB: error clustered record for sec rec not found
InnoDB: index `timestamp` of table `mem324__quan`.`normalized_statements_by_server_by_schema_data`
InnoDB: sec index record PHYSICAL RECORD: n_fields 3; compact format; info bits 0
0: len 8; hex 800001576f46d6c0; asc WoF ;;
1: len 8; hex 80000000000071cb; asc q ;;
2: len 16; hex 85f3494071ef3030b51b8f3d57e4cc55; asc I@q 00 =W U;;
InnoDB: clust index record PHYSICAL RECORD: n_fields 33; compact format; info bits 0
0: len 8; hex 80000000000071cb; asc q ;;
1: len 16; hex 85e89a64b7c33797adca6d6fc2068c47; asc d 7 mo G;;
2: len 6; hex 000034eda583; asc 4 ;;
3: len 7; hex 28000003cd00b6; asc ( ;;
4: SQL NULL;
5: SQL NULL;
6: SQL NULL;
7: len 8; hex 800000000000ea60; asc `;;
8: len 8; hex 8000000000000000; asc ;;
9: len 8; hex 8000000000000004; asc ;;
10: len 8; hex 8000000000000000; asc ;;
11: len 8; hex 8000000000000004; asc ;;
12: len 8; hex 8000000000049c74; asc t;;
13: len 8; hex 800000000000005d; asc ];;
14: len 8; hex 80000000000002ce; asc ;;
15: len 8; hex 8000000000000169; asc i;;
16: len 8; hex 8000000000000000; asc ;;
17: len 8; hex 8000000000000004; asc ;;
18: len 8; hex 8000000000000004; asc ;;
19: SQL NULL;
20: SQL NULL;
21: len 8; hex 8000000000000004; asc ;;
22: len 8; hex 8000000000000000; asc ;;
23: len 8; hex 8000000000000000; asc ;;
24: len 8; hex 8000000000000000; asc ;;
25: len 8; hex 8000000000000000; asc ;;
26: len 8; hex 8000000000000004; asc ;;
27: len 8; hex 8000000000000000; asc ;;
28: len 8; hex 8000000000000000; asc ;;
29: len 8; hex 8000000000000000; asc ;;
30: len 8; hex 8000000000000000; asc ;;
31: len 8; hex 80000157db6c10b8; asc W l ;;
32: len 8; hex 8000000000000000; asc ;;
How to repeat:
If this isn't caused by a bug in MEM that holds locks on the table, then simply configure 20-28 targets and wait for the purge to fail and fill the ../monitor/mysql/runtime/mysqld.log log file with the error above every minute.
Suggested fix:
Either optimize the purge, or determine whether a bug is holding locks on this heavily active table. To resolve this for the time being, I stopped Tomcat, then the MySQL repository database, restarted MySQL, and began running the following delete repeatedly to purge the data down to two weeks of history:
DELETE FROM `mem324__quan`.`normalized_statements_by_server_by_schema_data`
WHERE `timestamp` < 1479053410000
LIMIT 1000000;
Purging the table down by repeating the delete above took around 6 hours. I then ran OPTIMIZE TABLE on the table, which took another 4-5 hours. At the end of the purge the table was down to 10,839MB (8,195M data and 2,644M indexes) with a row count of 70,077,176 records.
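The repeated-delete workaround can be scripted so it does not have to be re-run by hand. A sketch follows; the cutoff value (two weeks ago, and the timestamp column appears to hold milliseconds since the epoch) and the batch size come from the statements above, while the procedure name and loop structure are my own:

```sql
-- Hypothetical helper: repeat the batched delete until no rows older
-- than the cutoff remain, then rebuild the table to reclaim space.
DELIMITER //
CREATE PROCEDURE purge_quan_data(IN cutoff BIGINT)
BEGIN
  REPEAT
    DELETE FROM `mem324__quan`.`normalized_statements_by_server_by_schema_data`
    WHERE `timestamp` < cutoff
    LIMIT 1000000;
  UNTIL ROW_COUNT() = 0 END REPEAT;
END //
DELIMITER ;

-- Same cutoff used in the manual purge above.
CALL purge_quan_data(1479053410000);
OPTIMIZE TABLE `mem324__quan`.`normalized_statements_by_server_by_schema_data`;
```

Each LIMITed delete commits separately (under autocommit), which keeps individual transactions and their lock footprints small, the same reason MEM's own purge deletes in batches of 1000.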
I'm going to continue monitoring the size of this table and the error log to see if this starts back up again in the next few weeks.