Bug #42581 | Agent won't reconnect to monitored DB if started when monitored DB is down | ||
---|---|---|---|
Submitted: | 4 Feb 2009 7:24 | Modified: | 14 Jan 2010 14:58 |
Reporter: | Andrii Nikitin | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Enterprise Monitor: Agent | Severity: | S2 (Serious) |
Version: | 2.0.3.7134, 2.0.2.7131,2.1.* | OS: | Any |
Assigned to: | Kay Roepke | CPU Architecture: | Any |
Tags: | a2memj, mem_20_maint, mem_discuss_me, regression, up_for_grabs, windmill |
[4 Feb 2009 7:24]
Andrii Nikitin
[18 Feb 2009 18:52]
Diego Medina
"We think the solution is: the agent never shuts down;"

There is another use case where this should really be fixed:
1- The complete server (the box) restarts
2- The agent starts up
3- Then mysqld starts up

Because the agent started first, tried to connect to mysqld and failed, it will not try to connect to the database again, and it reports mysqld as down while it is up. See http://bugs.mysql.com/bug.php?id=41634 for more info.
[27 Feb 2009 13:09]
Jan Kneschke
These are two bugs in one:
* agent-side: no auto-report of new items/attributes/values when the mysql-server comes up
* server-side: os-data displayed without a mysql-server reported

It should be moved to the next release to implement properly.
[27 Feb 2009 13:11]
Jan Kneschke
* server-side: os-data is _not_ displayed without the initial connect to the mysql-server
[27 Feb 2009 13:14]
Jan Kneschke
We should split this bug into 2 bugs (agent/server) and close this one.
[22 Jun 2009 15:18]
Jan Kneschke
The problem is that the mem-server only asks for the LKDI (list_known_data_items) at startup and never again afterwards. Most DIs are known at startup and don't change. Because a mysql-server might be down at startup, and the LKDI isn't executed again later for the "unknown" items, we have to keep running the LKDI as long as the mysql-server is down and send back its result as soon as the server is reachable.

The LKDI should return the KDIs of:
* mysql::server
* mysql::status
* mysql::variables
* mysql::innodbstatus
* ... and the other mysql::* classes of the mysql-collector.

That should trigger the LI (list-instances) on the mem-server side automatically and lead to a working late discovery of the mysql-server.
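The late-discovery idea in the comment above can be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: `discovery_state_t`, `discovery_tick`, `probe_fn`, and `flag_probe` are hypothetical names, and the probe stands in for whatever connection check the agent really performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe: returns true once the monitored mysqld accepts connections. */
typedef bool (*probe_fn)(void *ctx);

typedef struct {
    bool mysql_items_sent; /* have the mysql::* data items been reported yet? */
    unsigned attempts;     /* number of probes made so far */
} discovery_state_t;

/* Classes named in the comment above; the list is deliberately partial. */
static const char *const mysql_classes[] = {
    "mysql::server", "mysql::status", "mysql::variables", "mysql::innodbstatus",
};
#define N_MYSQL_CLASSES (sizeof(mysql_classes) / sizeof(mysql_classes[0]))

/* Name of the i-th class to report, or NULL when i is out of range. */
static const char *discovery_class(size_t i)
{
    return i < N_MYSQL_CLASSES ? mysql_classes[i] : NULL;
}

/*
 * Run one discovery tick. Returns how many classes should have their known
 * data items sent to the mem-server on this tick: 0 while the server is
 * still down, and 0 again once the items have already been sent.
 */
static size_t discovery_tick(discovery_state_t *st, probe_fn probe, void *ctx)
{
    assert(st != NULL && probe != NULL);
    if (st->mysql_items_sent)
        return 0;               /* nothing left to discover */
    st->attempts++;
    if (!probe(ctx))
        return 0;               /* still down: retry on the next tick */
    st->mysql_items_sent = true;
    return N_MYSQL_CLASSES;
}

/* Demonstration probe: ctx points to a bool "server is up" flag. */
static bool flag_probe(void *ctx)
{
    return *(bool *)ctx;
}
```

A caller would invoke `discovery_tick` on a timer and, when it returns a non-zero count, send the items for each `discovery_class(i)`. Keeping the "already sent" flag in the state means the retry stops by itself after the first successful probe.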
[22 Jun 2009 15:20]
Jan Kneschke
=== modified file 'plugins/agent/network-io.c'
--- plugins/agent/network-io.c 2009-06-17 21:13:17 +0000
+++ plugins/agent/network-io.c 2009-06-22 15:20:31 +0000
@@ -871,6 +871,25 @@
       }
     } else if (0 == strcmp(job_resp->command, "resynchronize")) {
       g_hash_table_remove_all(tracked_uuids); /* flush the table of tracked UUIDs as we got resynced */
+    } else if (0 == strcmp(job_resp->command, "list_known_data_items")) {
+      /* check if we send mysql::server items up to the server
+       *
+       * see #42581
+       *
+       * if not, start a internal task that tries to get list_known_data_items for mysql::server
+       * every 30sec. If that succeeds, send the data up and start list_known_data_items for
+       *
+       * - mysql::status
+       * - mysql::variables
+       * - mysql::innodbstatus
+       * - mysql::...
+       *
+       * and send its result back too
+       *
+       * the mem-server should start the list-instances for those data-items right away
+       */
+
+
     }
   }
[23 Jun 2009 10:31]
Jan Kneschke
After investigation:
* the agent sends a response to LKDI(mysql::server); its items are static (server.reachable, ...)
* but it does not respond to LI(mysql::server) as expected

We need some infrastructure work to return the instances automatically when they appear or change.
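One way to picture "return the instances automatically when they appear" is to compare the instance list of the previous poll with the current one and report only the newcomers. The sketch below assumes instances are identified by plain strings; `new_instances` and `contains` are hypothetical helpers, not functions from the agent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if id occurs in the first n entries of set. */
static bool contains(const char *const *set, size_t n, const char *id)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(set[i], id) == 0)
            return true;
    }
    return false;
}

/*
 * Copy into out[] every id from cur[] that is absent from prev[] and
 * return how many were copied. out must have room for n_cur entries.
 */
static size_t new_instances(const char *const *prev, size_t n_prev,
                            const char *const *cur, size_t n_cur,
                            const char **out)
{
    assert(cur != NULL && out != NULL);
    size_t n_new = 0;
    for (size_t i = 0; i < n_cur; i++) {
        if (!contains(prev, n_prev, cur[i]))
            out[n_new++] = cur[i];
    }
    return n_new;
}
```

With a previous poll that saw only an OS instance and a current poll that also sees a mysql-server instance, `new_instances` returns 1 and places the mysql-server id in `out[0]`. The quadratic scan is acceptable for the handful of instances a single agent monitors.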
[2 Jul 2009 13:36]
Jan Kneschke
revno: 1402
committer: jan@mysql.com
branch nick: trunk
timestamp: Thu 2009-07-02 15:10:56 +0200
message:
  added an internal task that checks unknown mysql::server instances
  * moved the network-io internal structures into network_io_state_t
  * added a _before_send() function to intercept the result of internal tasks
  * start internal list-instances(mysql::server) if no mysql::server instances are reported on startup
------------------------------------------------------------
revno: 1401
committer: jan@mysql.com
branch nick: trunk
timestamp: Thu 2009-07-02 13:18:10 +0200
message:
  moved the job_task_t structure into the job_response_t to see which task the response is for
  * at the time the task arrives in network-io, the corresponding agent_task might be gone
  * the job_task is the current instance of that agent_task
  * we need it to see if a list-instances() call could have included a mysql::server instance or not, to start an internal task for it
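The reasoning in revno 1401 (carry a copy of the task inside the response so the response can still be interpreted after the originating task is gone) might look roughly like this. Only the type names `job_task_t` and `job_response_t` come from the commit message; the fields, the `"list_instances"` command string, and the helper functions are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: the real structures are not shown in this report. */
typedef struct {
    char command[32];    /* e.g. "list_instances" (assumed spelling) */
    char item_class[32]; /* e.g. "mysql::server" */
} job_task_t;

typedef struct {
    job_task_t task;     /* copy by value, so it outlives the original task */
    size_t n_instances;  /* how many instances the response reported */
} job_response_t;

/* Build a response that embeds its own copy of the task it answers. */
static job_response_t make_response(const job_task_t *task, size_t n_instances)
{
    assert(task != NULL);
    job_response_t resp;
    resp.task = *task;
    resp.n_instances = n_instances;
    return resp;
}

/*
 * Decide, from the response alone, whether an internal discovery task
 * should be started: the task asked for mysql::server instances and
 * none were reported.
 */
static bool needs_internal_discovery(const job_response_t *resp)
{
    assert(resp != NULL);
    return strcmp(resp->task.command, "list_instances") == 0
        && strcmp(resp->task.item_class, "mysql::server") == 0
        && resp->n_instances == 0;
}
```

Because the task is copied rather than referenced, the check still works if the caller has already freed or reused the original task by the time the response reaches the network layer.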
[2 Jul 2009 13:37]
Jan Kneschke
oops, I should have set it to patch queued.
[6 Jul 2009 19:44]
Enterprise Tools JIRA Robot
Darren Oldag writes: the fix appears sufficient for the single-server monitored case.
[6 Jul 2009 19:47]
Enterprise Tools JIRA Robot
Darren Oldag writes: patch was pushed prior to review, which is "no big deal" to me.
[6 Jul 2009 22:27]
Enterprise Tools JIRA Robot
Keith Russell writes: Patch applied in versions => 2.1.0.1074.
[7 Jul 2009 18:11]
Enterprise Tools JIRA Robot
Diego Medina writes: Verified fixed on 2.1.0.1074
[20 Jul 2009 15:43]
Tony Bedford
An entry was added to the 2.1.0 changelog:

The Agent would not reconnect to a monitored database if it was started when the monitored server was down. The agent log contained the following error:

Can't connect to MySQL server on '127.0.0.1' (0) (mysql-errno = 2003)

The agent only sent OS data to the Dashboard. Further, when the monitored server was later started, no attempts to reconnect were logged. The problem could be worked around by restarting the agent when the monitored server was running again.
[16 Dec 2009 19:31]
Enterprise Tools JIRA Robot
Keith Russell writes: Patch installed in versions => 2.2.0.1560.
[18 Dec 2009 18:02]
Enterprise Tools JIRA Robot
Diego Medina writes: Agent 2.2.0.1588 still has the problem.
[12 Jan 2010 19:53]
Enterprise Tools JIRA Robot
Keith Russell writes: Patch installed in versions => 2.2.0.1605.
[13 Jan 2010 17:04]
Enterprise Tools JIRA Robot
Carsten Segieth writes: checked fixed in 2.2.0.1605
[14 Jan 2010 14:58]
MC Brown
Entry has been added to the 2.2.0 changelog