| Bug #55453 | Agent resends the same example query over and over every 2 minutes | ||
|---|---|---|---|
| Submitted: | 21 Jul 2010 18:01 | Modified: | 17 Aug 2010 10:45 |
| Reporter: | Diego Medina | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Enterprise Monitor: Agent | Severity: | S1 (Critical) |
| Version: | 2.2.2.1729 | OS: | Any |
| Assigned to: | Jan Kneschke | CPU Architecture: | Any |
[21 Jul 2010 18:01]
Diego Medina
[21 Jul 2010 19:58]
Enterprise Tools JIRA Robot
Diego Medina writes: The original steps are missing at least one step: you need to keep sending "other" queries through the proxy port to trigger the bug.
[23 Jul 2010 4:18]
Enterprise Tools JIRA Robot
Diego Medina writes: The first agent version to show the problem is 2.2.0.1705. Version 2.2.0.1686 does not have the problem, and 2.1.x does not have this bug either.
[28 Jul 2010 18:40]
Enterprise Tools JIRA Robot
Diego Medina writes: 2.3.0.2017 is also affected
[4 Aug 2010 14:34]
Enterprise Tools JIRA Robot
Jan Kneschke writes:
A first set of patches has been pushed. It adds support for dumping the content of the item-hash into a file:
{noformat}
$ kill -INFO `cat agent.pid`
$ ls -l /tmp/mysql*
-rw-r--r-- 1 jan wheel 78572 30 Jul 17:43 /tmp/mysql-monitor-agent-items.dump-20100730-174336.txt
-rw-r--r-- 1 jan wheel 80736 30 Jul 17:43 /tmp/mysql-monitor-agent-items.dump-20100730-174343.txt
...
{noformat}
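For readers unfamiliar with this kind of debugging hook, here is a minimal Python sketch of the idea: a signal handler that writes an in-memory table to a timestamped file. Only the file-name pattern is taken from the listing above. The function names, the tab-separated dump format, and the fallback to SIGUSR1 on systems without SIGINFO are assumptions made for illustration; this is not the agent's actual C implementation.

```python
import os
import signal
from datetime import datetime

# SIGINFO only exists on BSD-like systems (including macOS). Falling back to
# SIGUSR1 elsewhere is this sketch's choice, not something the agent does.
DUMP_SIGNAL = getattr(signal, "SIGINFO", signal.SIGUSR1)


def dump_path(directory, now):
    """Build a file name following the pattern seen in the listing above."""
    name = now.strftime("mysql-monitor-agent-items.dump-%Y%m%d-%H%M%S.txt")
    return os.path.join(directory, name)


def dump_items(items, directory, now=None):
    """Write one 'key<TAB>value' line per item (hypothetical format)."""
    path = dump_path(directory, now or datetime.now())
    with open(path, "w") as fh:
        for key in sorted(items):
            fh.write(f"{key}\t{items[key]}\n")
    return path


def install_dump_handler(items, directory="/tmp"):
    """Dump the table whenever the process receives DUMP_SIGNAL."""
    signal.signal(DUMP_SIGNAL, lambda signum, frame: dump_items(items, directory))
```

After `install_dump_handler(table)` the process can be poked with `kill`, as in the shell snippet above, and each signal produces a new file whose size can be compared over time.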
[4 Aug 2010 15:33]
Enterprise Tools JIRA Robot
Jan Kneschke writes:
Running the item-hash dump once a minute without any QUAN query results in:
{noformat}
...
-rw-r--r-- 1 jan wheel 78307 4 Aug 17:30 /tmp/mysql-monitor-agent-items.dump-20100804-173015.txt
-rw-r--r-- 1 jan wheel 78306 4 Aug 17:31 /tmp/mysql-monitor-agent-items.dump-20100804-173115.txt
-rw-r--r-- 1 jan wheel 78307 4 Aug 17:32 /tmp/mysql-monitor-agent-items.dump-20100804-173215.txt
...
{noformat}
With ONE SELECT query sent through the QUAN-proxy, the memory usage is higher and basically zigzags:
{noformat}
$ while true; do kill -INFO `cat ../trunk/quan.pid`; sleep 1; ls -l /tmp/mysql*; sleep 59; done
...
-rw-r--r-- 1 jan wheel 83761 4 Aug 17:19 /tmp/mysql-monitor-agent-items.dump-20100804-171943.txt
-rw-r--r-- 1 jan wheel 82984 4 Aug 17:20 /tmp/mysql-monitor-agent-items.dump-20100804-172043.txt
-rw-r--r-- 1 jan wheel 84756 4 Aug 17:21 /tmp/mysql-monitor-agent-items.dump-20100804-172143.txt
-rw-r--r-- 1 jan wheel 83979 4 Aug 17:22 /tmp/mysql-monitor-agent-items.dump-20100804-172244.txt
-rw-r--r-- 1 jan wheel 83762 4 Aug 17:23 /tmp/mysql-monitor-agent-items.dump-20100804-172344.txt
-rw-r--r-- 1 jan wheel 82985 4 Aug 17:24 /tmp/mysql-monitor-agent-items.dump-20100804-172444.txt
-rw-r--r-- 1 jan wheel 84759 4 Aug 17:25 /tmp/mysql-monitor-agent-items.dump-20100804-172544.txt
-rw-r--r-- 1 jan wheel 83982 4 Aug 17:26 /tmp/mysql-monitor-agent-items.dump-20100804-172644.txt
-rw-r--r-- 1 jan wheel 83765 4 Aug 17:27 /tmp/mysql-monitor-agent-items.dump-20100804-172744.txt
...
{noformat}
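One way to make the zigzag easier to see is to pull the size column out of the listing and look at the differences between samples. The sketch below does that for the nine lines shown above; the helper names `dump_sizes` and `deltas` are made up for this example, and the parsing assumes the nine-field `ls -l` layout seen in the listing.

```python
LISTING = """\
-rw-r--r-- 1 jan wheel 83761 4 Aug 17:19 /tmp/mysql-monitor-agent-items.dump-20100804-171943.txt
-rw-r--r-- 1 jan wheel 82984 4 Aug 17:20 /tmp/mysql-monitor-agent-items.dump-20100804-172043.txt
-rw-r--r-- 1 jan wheel 84756 4 Aug 17:21 /tmp/mysql-monitor-agent-items.dump-20100804-172143.txt
-rw-r--r-- 1 jan wheel 83979 4 Aug 17:22 /tmp/mysql-monitor-agent-items.dump-20100804-172244.txt
-rw-r--r-- 1 jan wheel 83762 4 Aug 17:23 /tmp/mysql-monitor-agent-items.dump-20100804-172344.txt
-rw-r--r-- 1 jan wheel 82985 4 Aug 17:24 /tmp/mysql-monitor-agent-items.dump-20100804-172444.txt
-rw-r--r-- 1 jan wheel 84759 4 Aug 17:25 /tmp/mysql-monitor-agent-items.dump-20100804-172544.txt
-rw-r--r-- 1 jan wheel 83982 4 Aug 17:26 /tmp/mysql-monitor-agent-items.dump-20100804-172644.txt
-rw-r--r-- 1 jan wheel 83765 4 Aug 17:27 /tmp/mysql-monitor-agent-items.dump-20100804-172744.txt
"""


def dump_sizes(listing):
    """Return the size (5th field) of every item-dump line in `ls -l` output."""
    sizes = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 9 and "mysql-monitor-agent-items.dump" in fields[-1]:
            sizes.append(int(fields[4]))
    return sizes


def deltas(values, lag=1):
    """Difference between each value and the one `lag` samples earlier."""
    return [later - earlier for earlier, later in zip(values, values[lag:])]


sizes = dump_sizes(LISTING)
print(deltas(sizes))         # minute-to-minute changes
print(deltas(sizes, lag=4))  # change against the sample four minutes earlier
```

For this listing the minute-to-minute changes are -777, +1772, -777, -217, -777, +1774, -777, -217, and each sample is 1 to 3 bytes larger than the one four minutes before it. Nine samples are of course too few to draw conclusions from on their own.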
[4 Aug 2010 16:27]
Enterprise Tools JIRA Robot
Jan Kneschke writes:
Based on the affected versions reported above, we get:
{noformat}
2.2.0.1686 - 2010-04-07
2.2.0.1705 - 2010-04-28
{noformat}
Feeding this into bzr and filtering out stuff that is unrelated:
{noformat}
$ bzr log -r date:2010-04-05..date:2010-04-28 -v
...
revno: 1831
fixes bug(s): http://bugs.mysql.com/49699
committer: jan@mysql.com
branch nick: trunk
timestamp: Wed 2010-04-28 20:06:40 +0200
message:
merged revno 1530 + 1531 from rel-2.1 over by hand:
* do not quote instance names even if they contain a '.' character. quoting will mess up scheduling and finding the instance again (fixes #49699/EM-3865)
* remove code that checks if instance names need to be quoted (some unused code is still left).
* change instance name matching in quan.lua to allow for '.' chars in database names, make the rest of the matching a bit stricter to allow for unquoted '.' chars
* replace exception messages with something useful (no more "Hmm, something wasn't ok").
modified:
items/quan.lua
src/agent_item.c
src/job_collect_lua.c
src/job_collect_mysql.c
src/job_collect_os.c
revno: 1826
committer: Kay Roepke <kay@sun.com>
branch nick: trunk
timestamp: Mon 2010-04-12 15:06:09 +0200
message:
EM-4276: introduction of per-attribute versioning broke multi-instance support for certain mysql datacollections. see EM-3972. per-attribute versions are now stored per instance, to allow multiple instances to be monitored. affected mysql::status among others
modified:
src/agent_item.c
src/agent_item.h
src/job_collect.c
src/job_collect_mysql.c
src/job_collect_mysql_innodb.c
src/job_collect_mysql_quanconfig.c
revno: 1798
committer: jan@mysql.com
branch nick: trunk
timestamp: Tue 2010-04-06 20:56:21 +0200
message:
fixed signed/unsigned comparision and format-string warnings
modified:
src/job_collect_lua.c
...
{noformat}
[6 Aug 2010 12:29]
Enterprise Tools JIRA Robot
Jan Kneschke writes:
Pushed to trunk.
{noformat}
revno: 1916
fixes bug(s): http://bugs.mysql.com/55453
committer: jan@mysql.com
branch nick: trunk
timestamp: Fri 2010-08-06 14:20:47 +0200
message:
fixed splitting of the instance keys (fixes #55453/EM-4689)
the normalized instances were stuck in the lib/quan.lua script as the :remove_if_last()
never called for all the data items it contained as the instances for bytes and exec_time
where not found. In the log-file you saw messages like:
.../items/quan.lua:401) (histogram_collect) can't find norm_query_inst: 0e982db4-2577-4fc0-b4d2-d1721748bfbf..ed0f217a759d003a52432507d41e4cb4.b
.../items/quan.lua:401) (histogram_collect) can't find norm_query_inst: 0e982db4-2577-4fc0-b4d2-d1721748bfbf..ed0f217a759d003a52432507d41e4cb4.exec
The agent cleaned up its side of the storage in the C-layer, but it never was actually removed from the Lua
side of it as the remove_if_last() could never remove it all.
{noformat}
[6 Aug 2010 14:40]
Enterprise Tools JIRA Robot
Jan Kneschke writes:
The same fix was also pushed to rel-2.2:
{noformat}
revno: 1904
fixes bug(s): http://bugs.mysql.com/55453
committer: jan@mysql.com
branch nick: rel-2.2
timestamp: Fri 2010-08-06 16:33:45 +0200
message:
fixed splitting of the instance keys (fixes #55453/EM-4689)
the normalized instances were stuck in the lib/quan.lua script as the :remove_if_last()
never called for all the data items it contained as the instances for bytes and exec_time
where not found. In the log-file you saw messages like:
.../items/quan.lua:401) (histogram_collect) can't find norm_query_inst: 0e982db4-2577-4fc0-b4d2-d1721748bfbf..ed0f217a759d003a52432507d41e4cb4.b
.../items/quan.lua:401) (histogram_collect) can't find norm_query_inst: 0e982db4-2577-4fc0-b4d2-d1721748bfbf..ed0f217a759d003a52432507d41e4cb4.exec
The agent cleaned up its side of the storage in the C-layer, but it never was actually removed from the Lua
side of it as the remove_if_last() could never remove it all.
{noformat}
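The failure mode described in the commit message can be illustrated with a toy model: an entry is only dropped once every data item belonging to it has been released, so if the instance key is split incorrectly, the lookup misses and the entry stays forever. Everything in the sketch below is hypothetical: the `NormalizedQueries` class, the `srv..q1` keys, the `count` item, and both split functions. It is not the code from lib/quan.lua, and the first-dot split is only a stand-in for whatever the real mis-split was.

```python
class NormalizedQueries:
    """Toy store: instance name -> set of data items still attached to it."""

    def __init__(self):
        self._pending = {}

    def add(self, inst, item):
        self._pending.setdefault(inst, set()).add(item)

    def __contains__(self, inst):
        return inst in self._pending

    def __len__(self):
        return len(self._pending)

    def remove_if_last(self, inst, item):
        """Detach one item; drop the whole entry once no items are left."""
        items = self._pending[inst]
        items.discard(item)
        if not items:
            del self._pending[inst]


def split_last_dot(key):
    """Split 'instance.item' at the LAST dot, so dots in the instance survive."""
    inst, _, item = key.rpartition(".")
    return inst, item


def split_first_dot(key):
    """Hypothetical wrong split: cuts at the FIRST dot inside the instance."""
    inst, _, item = key.partition(".")
    return inst, item


def release_all(store, keys, split):
    """Release every key; return the keys whose instance was not found."""
    missing = []
    for key in keys:
        inst, item = split(key)
        if inst in store:
            store.remove_if_last(inst, item)
        else:
            missing.append(key)  # comparable to "can't find norm_query_inst"
    return missing


ITEMS = ["count", "bytes", "exec_time"]
KEYS = ["srv..q1." + item for item in ITEMS]


def filled_store():
    store = NormalizedQueries()
    for item in ITEMS:
        store.add("srv..q1", item)
    return store
```

With `split_last_dot`, releasing all three keys empties the store. With `split_first_dot`, every lookup misses and the `srv..q1` entry is never removed, which mirrors the "stuck" entries the commit message describes.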
[13 Aug 2010 18:57]
Enterprise Tools JIRA Robot
Diego Medina writes: Verified fixed on 2.2.3.1736
[17 Aug 2010 10:45]
MC Brown
A note has been added to the 2.2.3 changelog:
The same example query could be consistently resent to
&merlin_server;, even though the example query had already been reported.
