Bug #55453 Agent resends the same example query over and over every 2 minutes
Submitted: 21 Jul 2010 18:01 Modified: 17 Aug 2010 10:45
Reporter: Diego Medina Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Enterprise Monitor: Agent Severity:S1 (Critical)
Version:2.2.2.1729 OS:Any
Assigned to: Jan Kneschke CPU Architecture:Any

[21 Jul 2010 18:01] Diego Medina
Description:
I run one query through the query analyzer, it shows on the dashboard as run once, I then run this query against the repository:

select  from_unixtime(`timestamp` / 1000) , text from statement_examples  where text like "insert into bug%" \g

and every two minutes, there is a new entry, so far I have:

mysql> select  from_unixtime(`timestamp` / 1000) , text from statement_examples  where text like "insert into bug%" \g
+-----------------------------------+-----------------------------------------------------------------------------------------------------+
| from_unixtime(`timestamp` / 1000) | text                                                                                                |
+-----------------------------------+-----------------------------------------------------------------------------------------------------+
| 2010-07-21 13:28:18               | insert into bug (select count(*) as cnt , text from statement_examples group by text order by null) | 
| 2010-07-21 13:30:18               | insert into bug (select count(*) as cnt , text from statement_examples group by text order by null) | 
| 2010-07-21 13:32:18               | insert into bug (select count(*) as cnt , text from statement_examples group by text order by null) | 
| 2010-07-21 13:34:18               | insert into bug (select count(*) as cnt , text from statement_examples group by text order by null) | 
| 2010-07-21 13:36:18               | insert into bug (select count(*) as cnt , text from statement_examples group by text order by null) | 
| 2010-07-21 13:38:23               | insert into bug (select count(*) as cnt , text from statement_examples group by text order by null) | 
| 2010-07-21 13:40:17               | insert into bug (select count(*) as cnt , text from statement_examples group by text order by null) | 
| 2010-07-21 13:42:17               | insert into bug (select count(*) as cnt , text from statement_examples group by text order by null) | 
| 2010-07-21 13:44:17               | insert into bug (select count(*) as cnt , text from statement_examples group by text order by null) | 
| 2010-07-21 13:46:17               | insert into bug (select count(*) as cnt , text from statement_examples group by text order by null) | 
| 2010-07-21 13:48:17               | insert into bug (select count(*) as cnt , text from statement_examples group by text order by null) | 
| 2010-07-21 13:50:17               | insert into bug (select count(*) as cnt , text from statement_examples group by text order by null) | 
| 2010-07-21 13:52:17               | insert into bug (select count(*) as cnt , text from statement_examples group by text order by null) | 
| 2010-07-21 13:54:17               | insert into bug (select count(*) as cnt , text from statement_examples group by text order by null) | 
| 2010-07-21 13:56:17               | insert into bug (select count(*) as cnt , text from statement_examples group by text order by null) | 
+-----------------------------------+-----------------------------------------------------------------------------------------------------+

and the number of rows keeps going up

How to repeat:
1- Install and start the service manager and agent
2- Enable the query analyzer
3- send one query though the proxy port
4- wait until it shows up on the dashboard
5- run this query on the repository:

 select  from_unixtime(`timestamp` / 1000) , text from statement_examples  where text like "insert into bug%" \g

replace the query text for your query text

6- notice that every two minutes, there will be one more row
[21 Jul 2010 19:58] Enterprise Tools JIRA Robot
Diego Medina writes: 
The original steps are missing at least one step,  you need to keep sending "other" queries through the proxy port to trigger the bug.
[23 Jul 2010 4:18] Enterprise Tools JIRA Robot
Diego Medina writes: 
The first agent to show the problem is: 2.2.0.1705

Version 2.2.0.1686 does not have the problem
2.1.x also do not have this bug
[28 Jul 2010 18:40] Enterprise Tools JIRA Robot
Diego Medina writes: 
2.3.0.2017 is also affected
[4 Aug 2010 14:34] Enterprise Tools JIRA Robot
Jan Kneschke writes: 
a first set of patches is pushed and adds support to dump the content of the item-hash into a file:

{noformat}
$ kill -INFO `cat agent.pid`
$ ls -l /tmp/mysql*
-rw-r--r--  1 jan     wheel  78572 30 Jul 17:43 /tmp/mysql-monitor-agent-items.dump-20100730-174336.txt
-rw-r--r--  1 jan     wheel  80736 30 Jul 17:43 /tmp/mysql-monitor-agent-items.dump-20100730-174343.txt
...
{noformat}
[4 Aug 2010 15:33] Enterprise Tools JIRA Robot
Jan Kneschke writes: 
Running the item-hash dump once a minute without any QUAN query results in:

{noformat}
...
-rw-r--r--  1 jan     wheel  78307  4 Aug 17:30 /tmp/mysql-monitor-agent-items.dump-20100804-173015.txt
-rw-r--r--  1 jan     wheel  78306  4 Aug 17:31 /tmp/mysql-monitor-agent-items.dump-20100804-173115.txt
-rw-r--r--  1 jan     wheel  78307  4 Aug 17:32 /tmp/mysql-monitor-agent-items.dump-20100804-173215.txt
...
{noformat}

With ONE SELECT query through the QUAN-proxy we see how the memory usage is higher and is basicly a zickzack.

{noformat}
$ while true; do kill -INFO `cat ../trunk/quan.pid`; sleep 1; ls -l /tmp/mysql*; sleep 59; done
...
-rw-r--r--  1 jan     wheel  83761  4 Aug 17:19 /tmp/mysql-monitor-agent-items.dump-20100804-171943.txt
-rw-r--r--  1 jan     wheel  82984  4 Aug 17:20 /tmp/mysql-monitor-agent-items.dump-20100804-172043.txt
-rw-r--r--  1 jan     wheel  84756  4 Aug 17:21 /tmp/mysql-monitor-agent-items.dump-20100804-172143.txt
-rw-r--r--  1 jan     wheel  83979  4 Aug 17:22 /tmp/mysql-monitor-agent-items.dump-20100804-172244.txt
-rw-r--r--  1 jan     wheel  83762  4 Aug 17:23 /tmp/mysql-monitor-agent-items.dump-20100804-172344.txt
-rw-r--r--  1 jan     wheel  82985  4 Aug 17:24 /tmp/mysql-monitor-agent-items.dump-20100804-172444.txt
-rw-r--r--  1 jan     wheel  84759  4 Aug 17:25 /tmp/mysql-monitor-agent-items.dump-20100804-172544.txt
-rw-r--r--  1 jan     wheel  83982  4 Aug 17:26 /tmp/mysql-monitor-agent-items.dump-20100804-172644.txt
-rw-r--r--  1 jan     wheel  83765  4 Aug 17:27 /tmp/mysql-monitor-agent-items.dump-20100804-172744.txt
...
{noformat}
[4 Aug 2010 16:27] Enterprise Tools JIRA Robot
Jan Kneschke writes: 
based on the "versions-affected" from above we get:

{noformat}
2.2.0.1686 - 2010-04-07
2.2.0.1705 - 2010-04-28
{noformat}

Feeding this into bzr and filtering out stuff that is unrelated:

{noformat}
$ bzr log -r date:2010-04-05..date:2010-04-28 -v
...
revno: 1831
fixes bug(s): http://bugs.mysql.com/49699
committer: jan@mysql.com
branch nick: trunk
timestamp: Wed 2010-04-28 20:06:40 +0200
message:
    merged revno 1530 + 1531 from rel-2.1 over by hand:
  
     * do not quote instance names even if they contain a '.' character. quoting will mess up scheduling and finding the instance again (fixes #49699/EM-3865)
       * remove code that checks if instance names need to be quoted (some unused code is still left).
       * change instance name matching in quan.lua to allow for '.' chars in database names, make the rest of the matching a bit stricter to allow for unquoted '.' chars
     * replace exception messages with something useful (no more "Hmm, something wasn't ok").
modified:
  items/quan.lua
  src/agent_item.c
  src/job_collect_lua.c
  src/job_collect_mysql.c
  src/job_collect_os.c

revno: 1826
committer: Kay Roepke <kay@sun.com>
branch nick: trunk
timestamp: Mon 2010-04-12 15:06:09 +0200
message:
  EM-4276: introduction of per-attribute versioning broke multi-instance support for certain mysql datacollections. see EM-3972. per-attribute versions are now stored per instance, to allow multiple instances to be monitored. affected mysql::status among others
modified:
  src/agent_item.c
  src/agent_item.h
  src/job_collect.c
  src/job_collect_mysql.c
  src/job_collect_mysql_innodb.c
  src/job_collect_mysql_quanconfig.c

revno: 1798
committer: jan@mysql.com
branch nick: trunk
timestamp: Tue 2010-04-06 20:56:21 +0200
message:
  fixed signed/unsigned comparision and format-string warnings
modified:
  src/job_collect_lua.c
...
{noformat}
[6 Aug 2010 12:29] Enterprise Tools JIRA Robot
Jan Kneschke writes: 
Pushed to trunk.

revno: 1916
fixes bug(s): http://bugs.mysql.com/55453
committer: jan@mysql.com
branch nick: trunk
timestamp: Fri 2010-08-06 14:20:47 +0200
message:
  fixed splitting of the instance keys (fixes #55453/EM-4689)
  
  the normalized instances were stuck in the lib/quan.lua script as the :remove_if_last()
  never called for all the data items it contained as the instances for bytes and exec_time
  where not found. In the log-file you saw messages like:
  
    .../items/quan.lua:401) (histogram_collect) can't find norm_query_inst: 0e982db4-2577-4fc0-b4d2-d1721748bfbf..ed0f217a759d003a52432507d41e4cb4.b
    .../items/quan.lua:401) (histogram_collect) can't find norm_query_inst: 0e982db4-2577-4fc0-b4d2-d1721748bfbf..ed0f217a759d003a52432507d41e4cb4.exec
  
  The agent cleaned up its side of the storage in the C-layer, but it never was actually removed from the Lua
  side of it as the remove_if_last() could never remove it all.
[6 Aug 2010 14:40] Enterprise Tools JIRA Robot
Jan Kneschke writes: 
and to 2.2:

{noformat}
revno: 1904
fixes bug(s): http://bugs.mysql.com/55453
committer: jan@mysql.com
branch nick: rel-2.2
timestamp: Fri 2010-08-06 16:33:45 +0200
message:
  fixed splitting of the instance keys (fixes #55453/EM-4689)
  
  the normalized instances were stuck in the lib/quan.lua script as the :remove_if_last()
  never called for all the data items it contained as the instances for bytes and exec_time
  where not found. In the log-file you saw messages like:
  
    .../items/quan.lua:401) (histogram_collect) can't find norm_query_inst: 0e982db4-2577-4fc0-b4d2-d1721748bfbf..ed0f217a759d003a52432507d41e4cb4.b
    .../items/quan.lua:401) (histogram_collect) can't find norm_query_inst: 0e982db4-2577-4fc0-b4d2-d1721748bfbf..ed0f217a759d003a52432507d41e4cb4.exec
  
  The agent cleaned up its side of the storage in the C-layer, but it never was actually removed from the Lua
  side of it as the remove_if_last() could never remove it all.
{noformat}
[13 Aug 2010 18:57] Enterprise Tools JIRA Robot
Diego Medina writes: 
Verified fixed on 2.2.3.1736
[17 Aug 2010 10:45] MC Brown
A note has been added to the 2.2.3 changelog: 

        The same example query could be consistently resent to                                                                                             
        &merlin_server;, even though the example query had already been reported.