Bug #40729 Graph data lost during merlin restart
Submitted: 14 Nov 2008 8:26
Modified: 8 Feb 2010 15:54
Reporter: Simon Mudd (OCA)
Status: Closed
Category: MySQL Enterprise Monitor: Server
Severity: S1 (Critical)
Version: 2.0.0.7088
OS: Any
Assigned to: Kay Roepke
CPU Architecture: Any
Tags: leith_discuss_me, mem_20_maint, mem_discuss_me, windmill

[14 Nov 2008 8:26] Simon Mudd
Description:
We restarted the Tomcat server due to problems logging in. After doing so, it took about 10 minutes before we could log in again.

I see that the graph data is cut off during this time (no data), yet I was under the impression that the agent spools/stores this data for a short period even if the server is down, so that it can be sent later.

This doesn't seem to be happening.

How to repeat:
Stop Tomcat.
Start it.
Check for gaps in the graphs.
It may be necessary to have several MySQL instances monitored, as otherwise the Tomcat restart may not take very long. Our installation had over 100.

I'll see if I can get some graphs and server logs to show this info.

Suggested fix:
Ensure that while the server is down the agent stores the data for a certain amount of time.
Is this time/space configurable? It might be handy to make it so.
When the server comes back, allow the agent to send the stored data to the server so we can see it later.
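
The suggested behaviour could be sketched roughly as below. This is only an illustration of a bounded spool with a configurable age and size limit, not the agent's actual code; the class name AgentSpool, its parameters, and the send callback are all hypothetical.

```python
import time
from collections import deque


class AgentSpool:
    """Hypothetical bounded buffer for samples the agent could not deliver."""

    def __init__(self, max_age_seconds=600.0, max_items=10000, clock=time.time):
        self.max_age_seconds = max_age_seconds  # assumed "time" limit
        self.clock = clock
        # Assumed "space" limit: a full deque drops its oldest entry on append.
        self._items = deque(maxlen=max_items)

    def store(self, sample):
        """Keep a sample, stamped with the time it was collected."""
        self._items.append((self.clock(), sample))

    def _expire(self):
        """Discard samples older than the configured maximum age."""
        cutoff = self.clock() - self.max_age_seconds
        while self._items and self._items[0][0] < cutoff:
            self._items.popleft()

    def flush(self, send):
        """Send spooled samples oldest first; stop at the first failure.

        `send` is a hypothetical callable taking (collected_at, sample)
        and returning True when the server accepted the sample.
        """
        self._expire()
        sent = 0
        while self._items:
            collected_at, sample = self._items[0]
            if not send(collected_at, sample):
                break  # server still unreachable; keep the rest for later
            self._items.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._items)
```

In this sketch the agent would call store() whenever delivery fails and flush() once the server answers again. Keeping the collection time with each sample is what would let the server plot the backlog at the right point on the graph.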
[14 Nov 2008 8:30] Simon Mudd
Screen showing gap after restarting merlin.

Attachment: 40729.png (image/png, text), 42.12 KiB.

[17 Nov 2008 16:59] Gary Whizin
Will look at various ways to cache and then send up the backlog of data. Meanwhile, looking at the slow server restart.
[26 May 2009 16:48] Gary Whizin
Currently overhauling/enhancing logging to track this down.
[1 Oct 2009 19:25] Enterprise Tools JIRA Robot
Mark Matthews writes: 
Could we add something to the graphs that would show when the service manager is not running (similar to the concept of showing when mysqld wasn't running)?
[2 Dec 2009 22:12] Enterprise Tools JIRA Robot
Keith Russell writes: 
Patch installed in versions >= 2.2.0.1563.
[15 Dec 2009 4:13] Enterprise Tools JIRA Robot
Diego Medina writes: 
Still see gaps using agent 2.2.0.1586.
[15 Dec 2009 20:09] Enterprise Tools JIRA Robot
Carsten Segieth writes: 
Attached the debug logs and screenshots; both servers show gaps in the graphs at the time the logs were created. Agent #13 is RH4_x86, #34 is WinXP.
[4 Feb 2010 20:50] Enterprise Tools JIRA Robot
Darren Oldag writes: 
revision-id: oldag@mysql.com-20100204204414-brry701sackbz5p5
parent: oldag@mysql.com-20100203214700-87z6w8vqxhzra7ks
committer: Darren L. Oldag <oldag@mysql.com>
branch nick: Trunk
timestamp: Thu 2010-02-04 14:44:14 -0600
message:
  https://repoman.mysql.com/jira/browse/EM-3197
  
  The service manager would only take the last value for an instance attribute
  per heartbeat from the agent. We now recognize when there are multiple
  values and group them by timestamp, so that each effective 'snapshot'
  can be saved.
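
The difference described in the commit message can be illustrated with a small sketch. It is not the service manager's code: the (attribute, timestamp, value) tuple layout and both function names are assumptions made for the example.

```python
from collections import defaultdict


def last_value_only(readings):
    """Old behaviour as described: later readings overwrite earlier ones."""
    latest = {}
    for attribute, timestamp, value in readings:
        latest[attribute] = (timestamp, value)
    return latest


def group_by_timestamp(readings):
    """New behaviour as described: one snapshot per distinct timestamp."""
    snapshots = defaultdict(dict)
    for attribute, timestamp, value in readings:
        snapshots[timestamp][attribute] = value
    # Return snapshots in time order so each one can be saved separately.
    return [(ts, snapshots[ts]) for ts in sorted(snapshots)]
```

With a heartbeat carrying a backlog of several collection times, last_value_only keeps a single point per attribute, which would leave a gap in the graph, while group_by_timestamp keeps every collected point.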
[4 Feb 2010 21:06] Enterprise Tools JIRA Robot
Keith Russell writes: 
Patch installed in versions >= 2.2.0.1614.
[5 Feb 2010 19:46] Enterprise Tools JIRA Robot
Diego Medina writes: 
You will also need agent 2.2.0.1614, as the fix was partly in the agent and partly in the dashboard.
[8 Feb 2010 15:54] MC Brown
An entry has been added to the 2.2.2 changelog: 

        In the event of a failure by &merlin_agent; to communicate
        with the &merlin_server;, for example when the server has been
        restarted, it was possible for there to be gaps in the
        reported information. This may have shown up as gaps in the
        graph output.