Bug #55453 | Agent resends the same example query over and over every 2 minutes | ||
---|---|---|---|
Submitted: | 21 Jul 2010 18:01 | Modified: | 17 Aug 2010 10:45 |
Reporter: | Diego Medina | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Enterprise Monitor: Agent | Severity: | S1 (Critical) |
Version: | 2.2.2.1729 | OS: | Any |
Assigned to: | Jan Kneschke | CPU Architecture: | Any |
[21 Jul 2010 18:01]
Diego Medina
[21 Jul 2010 19:58]
Enterprise Tools JIRA Robot
Diego Medina writes: The original steps are missing at least one step, you need to keep sending "other" queries through the proxy port to trigger the bug.
[23 Jul 2010 4:18]
Enterprise Tools JIRA Robot
Diego Medina writes: The first agent to show the problem is: 2.2.0.1705 Version 2.2.0.1686 does not have the problem 2.1.x also do not have this bug
[28 Jul 2010 18:40]
Enterprise Tools JIRA Robot
Diego Medina writes: 2.3.0.2017 is also affected
[4 Aug 2010 14:34]
Enterprise Tools JIRA Robot
Jan Kneschke writes: a first set of patches is pushed and adds support to dump the content of the item-hash into a file: {noformat} $ kill -INFO `cat agent.pid` $ ls -l /tmp/mysql* -rw-r--r-- 1 jan wheel 78572 30 Jul 17:43 /tmp/mysql-monitor-agent-items.dump-20100730-174336.txt -rw-r--r-- 1 jan wheel 80736 30 Jul 17:43 /tmp/mysql-monitor-agent-items.dump-20100730-174343.txt ... {noformat}
[4 Aug 2010 15:33]
Enterprise Tools JIRA Robot
Jan Kneschke writes: Running the item-hash dump once a minute without any QUAN query results in: {noformat} ... -rw-r--r-- 1 jan wheel 78307 4 Aug 17:30 /tmp/mysql-monitor-agent-items.dump-20100804-173015.txt -rw-r--r-- 1 jan wheel 78306 4 Aug 17:31 /tmp/mysql-monitor-agent-items.dump-20100804-173115.txt -rw-r--r-- 1 jan wheel 78307 4 Aug 17:32 /tmp/mysql-monitor-agent-items.dump-20100804-173215.txt ... {noformat} With ONE SELECT query through the QUAN-proxy we see how the memory usage is higher and is basicly a zickzack. {noformat} $ while true; do kill -INFO `cat ../trunk/quan.pid`; sleep 1; ls -l /tmp/mysql*; sleep 59; done ... -rw-r--r-- 1 jan wheel 83761 4 Aug 17:19 /tmp/mysql-monitor-agent-items.dump-20100804-171943.txt -rw-r--r-- 1 jan wheel 82984 4 Aug 17:20 /tmp/mysql-monitor-agent-items.dump-20100804-172043.txt -rw-r--r-- 1 jan wheel 84756 4 Aug 17:21 /tmp/mysql-monitor-agent-items.dump-20100804-172143.txt -rw-r--r-- 1 jan wheel 83979 4 Aug 17:22 /tmp/mysql-monitor-agent-items.dump-20100804-172244.txt -rw-r--r-- 1 jan wheel 83762 4 Aug 17:23 /tmp/mysql-monitor-agent-items.dump-20100804-172344.txt -rw-r--r-- 1 jan wheel 82985 4 Aug 17:24 /tmp/mysql-monitor-agent-items.dump-20100804-172444.txt -rw-r--r-- 1 jan wheel 84759 4 Aug 17:25 /tmp/mysql-monitor-agent-items.dump-20100804-172544.txt -rw-r--r-- 1 jan wheel 83982 4 Aug 17:26 /tmp/mysql-monitor-agent-items.dump-20100804-172644.txt -rw-r--r-- 1 jan wheel 83765 4 Aug 17:27 /tmp/mysql-monitor-agent-items.dump-20100804-172744.txt ... {noformat}
[4 Aug 2010 16:27]
Enterprise Tools JIRA Robot
Jan Kneschke writes: based on the "versions-affected" from above we get: {noformat} 2.2.0.1686 - 2010-04-07 2.2.0.1705 - 2010-04-28 {noformat} Feeding this into bzr and filtering out stuff that is unrelated: {noformat} $ bzr log -r date:2010-04-05..date:2010-04-28 -v ... revno: 1831 fixes bug(s): http://bugs.mysql.com/49699 committer: jan@mysql.com branch nick: trunk timestamp: Wed 2010-04-28 20:06:40 +0200 message: merged revno 1530 + 1531 from rel-2.1 over by hand: * do not quote instance names even if they contain a '.' character. quoting will mess up scheduling and finding the instance again (fixes #49699/EM-3865) * remove code that checks if instance names need to be quoted (some unused code is still left). * change instance name matching in quan.lua to allow for '.' chars in database names, make the rest of the matching a bit stricter to allow for unquoted '.' chars * replace exception messages with something useful (no more "Hmm, something wasn't ok"). modified: items/quan.lua src/agent_item.c src/job_collect_lua.c src/job_collect_mysql.c src/job_collect_os.c revno: 1826 committer: Kay Roepke <kay@sun.com> branch nick: trunk timestamp: Mon 2010-04-12 15:06:09 +0200 message: EM-4276: introduction of per-attribute versioning broke multi-instance support for certain mysql datacollections. see EM-3972. per-attribute versions are now stored per instance, to allow multiple instances to be monitored. affected mysql::status among others modified: src/agent_item.c src/agent_item.h src/job_collect.c src/job_collect_mysql.c src/job_collect_mysql_innodb.c src/job_collect_mysql_quanconfig.c revno: 1798 committer: jan@mysql.com branch nick: trunk timestamp: Tue 2010-04-06 20:56:21 +0200 message: fixed signed/unsigned comparision and format-string warnings modified: src/job_collect_lua.c ... {noformat}
[6 Aug 2010 12:29]
Enterprise Tools JIRA Robot
Jan Kneschke writes: Pushed to trunk. revno: 1916 fixes bug(s): http://bugs.mysql.com/55453 committer: jan@mysql.com branch nick: trunk timestamp: Fri 2010-08-06 14:20:47 +0200 message: fixed splitting of the instance keys (fixes #55453/EM-4689) the normalized instances were stuck in the lib/quan.lua script as the :remove_if_last() never called for all the data items it contained as the instances for bytes and exec_time where not found. In the log-file you saw messages like: .../items/quan.lua:401) (histogram_collect) can't find norm_query_inst: 0e982db4-2577-4fc0-b4d2-d1721748bfbf..ed0f217a759d003a52432507d41e4cb4.b .../items/quan.lua:401) (histogram_collect) can't find norm_query_inst: 0e982db4-2577-4fc0-b4d2-d1721748bfbf..ed0f217a759d003a52432507d41e4cb4.exec The agent cleaned up its side of the storage in the C-layer, but it never was actually removed from the Lua side of it as the remove_if_last() could never remove it all.
[6 Aug 2010 14:40]
Enterprise Tools JIRA Robot
Jan Kneschke writes: and to 2.2: {noformat} revno: 1904 fixes bug(s): http://bugs.mysql.com/55453 committer: jan@mysql.com branch nick: rel-2.2 timestamp: Fri 2010-08-06 16:33:45 +0200 message: fixed splitting of the instance keys (fixes #55453/EM-4689) the normalized instances were stuck in the lib/quan.lua script as the :remove_if_last() never called for all the data items it contained as the instances for bytes and exec_time where not found. In the log-file you saw messages like: .../items/quan.lua:401) (histogram_collect) can't find norm_query_inst: 0e982db4-2577-4fc0-b4d2-d1721748bfbf..ed0f217a759d003a52432507d41e4cb4.b .../items/quan.lua:401) (histogram_collect) can't find norm_query_inst: 0e982db4-2577-4fc0-b4d2-d1721748bfbf..ed0f217a759d003a52432507d41e4cb4.exec The agent cleaned up its side of the storage in the C-layer, but it never was actually removed from the Lua side of it as the remove_if_last() could never remove it all. {noformat}
[13 Aug 2010 18:57]
Enterprise Tools JIRA Robot
Diego Medina writes: Verified fixed on 2.2.3.1736
[17 Aug 2010 10:45]
MC Brown
A note has been added to the 2.2.3 changelog: The same example query could be consistently resent to &merlin_server;, even though the example query had already been reported.