Bug #45591 Agent crashes when monitoring a slave server using ssl for replication
Submitted: 18 Jun 2009 15:35
Modified: 19 Jun 2009 17:31
Reporter: Diego Medina
Status: Closed
Category: MySQL Enterprise Monitor: Agent
Severity: S2 (Serious)
Version: 2.1.0.1054
OS: Any
Assigned to: Diego Medina
CPU Architecture: Any

[18 Jun 2009 15:35] Diego Medina
Description:
If your slave uses SSL for replication and you try to monitor it, the agent crashes. The angel process then restarts it, and as soon as the agent plugin is loaded again, it crashes again.
This makes it impossible to monitor a slave that uses SSL for replication.

Agent 2.1.0.1051 (the previous build) can monitor a slave that uses SSL for replication without crashing.

How to repeat:
1- Set up replication and make sure the slaves use SSL for it
2- Install and start the agent and service manager
3- Look at the agent log

At the default log level you see:

(critical) agent_mysqld.c:641: successfully connected to database at 127.0.0.1:22548 as user msandbox (with password: YES)
(critical) network-io.c:254: successfully reconnected to dashboard at https://agent:mysql@127.0.0.1:48443/heartbeat
(critical) exception received from server: E1402: DuplicateAgentUuidException: [373a1777-1f3b-43ff-8735-14c4a2d9ea3b, 1245338810491.63, diego-medinas-macbook-pro.local, -1, diego-medinas-macbook-pro.local]

At the message log level you see:

(message) mysql-query (mysql): SHOW DATABASES
(message) mysql-query (mysql): SHOW SLAVE STATUS
(message) chassis.c:304: [angel] PID=6498 died on signal=10 (it used 3648 kBytes max) ... waiting 3min before restart
(message) chassis.c:259: [angel] we try to keep PID=6518 alive
(message) mysql-proxy 0.7.0 started
(message) MySQL Monitor Agent 2.1.0.1054 started.

It always crashes right after the SHOW SLAVE STATUS query.

I will attach gdb and valgrind output soon.
[18 Jun 2009 15:39] Diego Medina
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000338
[Switching to process 6617 thread 0x2a03]
0x00527151 in mysql_ssl_set ()
(gdb) thread apply all bt

Thread 8 (process 6617 thread 0x2b03):
#0  0x96f232e6 in semaphore_timedwait_signal_trap ()
#1  0x96f552af in _pthread_cond_wait ()
#2  0x96f9faab in pthread_cond_timedwait ()
#3  0x0001463b in g_cond_timed_wait_posix_impl ()
#4  0x000e15d1 in g_async_queue_pop_intern_unlocked ()
#5  0x000e18a1 in g_async_queue_timed_pop ()
#6  0x00357145 in scheduler_thread ()
#7  0x0012fc98 in g_thread_create_proxy ()
#8  0x96f54155 in _pthread_start ()
#9  0x96f54012 in thread_start ()

Thread 7 (process 6617 thread 0x2a03):
#0  0x00527151 in mysql_ssl_set ()
#1  0x0036202e in agent_item_class_update_mysqld_from_fieldnames ()
#2  0x000f9d66 in g_hash_table_foreach ()
#3  0x00362119 in agent_item_class_update_all_from_fieldnames ()
#4  0x003578ea in knownitems_class_iter ()
#5  0x000f9d66 in g_hash_table_foreach ()
#6  0x000f9d66 in g_hash_table_foreach ()
#7  0x00357a37 in job_list_knowndataitems_get_items ()
#8  0x00357b0f in job_list_knowndataitems_thread ()
#9  0x0012fc98 in g_thread_create_proxy ()
#10 0x96f54155 in _pthread_start ()
#11 0x96f54012 in thread_start ()

Thread 6 (process 6617 thread 0x2903):
#0  0x96f232e6 in semaphore_timedwait_signal_trap ()
#1  0x96f552af in _pthread_cond_wait ()
#2  0x96f9faab in pthread_cond_timedwait ()
#3  0x0001463b in g_cond_timed_wait_posix_impl ()
#4  0x000e15d1 in g_async_queue_pop_intern_unlocked ()
#5  0x000e18a1 in g_async_queue_timed_pop ()
#6  0x0035812d in job_list_instances_thread ()
#7  0x0012fc98 in g_thread_create_proxy ()
#8  0x96f54155 in _pthread_start ()
#9  0x96f54012 in thread_start ()

Thread 5 (process 6617 thread 0x2803):
#0  0x96f232e6 in semaphore_timedwait_signal_trap ()
#1  0x96f552af in _pthread_cond_wait ()
#2  0x96f9faab in pthread_cond_timedwait ()
#3  0x0001463b in g_cond_timed_wait_posix_impl ()
#4  0x000e15d1 in g_async_queue_pop_intern_unlocked ()
#5  0x000e18a1 in g_async_queue_timed_pop ()
#6  0x0035f900 in job_collect_lua_thread ()
#7  0x0012fc98 in g_thread_create_proxy ()
#8  0x96f54155 in _pthread_start ()
#9  0x96f54012 in thread_start ()

Thread 4 (process 6617 thread 0x2703):
#0  0x96f232e6 in semaphore_timedwait_signal_trap ()
#1  0x96f552af in _pthread_cond_wait ()
#2  0x96f9faab in pthread_cond_timedwait ()
#3  0x0001463b in g_cond_timed_wait_posix_impl ()
#4  0x000e15d1 in g_async_queue_pop_intern_unlocked ()
#5  0x000e18a1 in g_async_queue_timed_pop ()
#6  0x0035cfa0 in job_collect_os_thread ()
#7  0x0012fc98 in g_thread_create_proxy ()
#8  0x96f54155 in _pthread_start ()
#9  0x96f54012 in thread_start ()

Thread 3 (process 6617 thread 0x1403):
#0  0x96f232e6 in semaphore_timedwait_signal_trap ()
#1  0x96f552af in _pthread_cond_wait ()
#2  0x96f9faab in pthread_cond_timedwait ()
#3  0x0001463b in g_cond_timed_wait_posix_impl ()
#4  0x000e15d1 in g_async_queue_pop_intern_unlocked ()
#5  0x000e18a1 in g_async_queue_timed_pop ()
#6  0x00363830 in job_collect_mysql_thread ()
#7  0x0012fc98 in g_thread_create_proxy ()
#8  0x96f54155 in _pthread_start ()
#9  0x96f54012 in thread_start ()

Thread 2 (process 6617 thread 0x1103):
#0  0x96f232e6 in semaphore_timedwait_signal_trap ()
#1  0x96f552af in _pthread_cond_wait ()
#2  0x96f9faab in pthread_cond_timedwait ()
#3  0x0001463b in g_cond_timed_wait_posix_impl ()
#4  0x000e15d1 in g_async_queue_pop_intern_unlocked ()
#5  0x000e18a1 in g_async_queue_timed_pop ()
#6  0x0036b514 in network_io_thread ()
#7  0x0012fc98 in g_thread_create_proxy ()
#8  0x96f54155 in _pthread_start ()
#9  0x96f54012 in thread_start ()

Thread 1 (process 6617 local thread 0x2d03):
#0  0x96f539c6 in kevent ()
#1  0x00046c96 in kq_dispatch ()
#2  0x0003a2b0 in event_base_loop ()
#3  0x0003a639 in event_base_dispatch ()
#4  0x0000bfbe in chassis_mainloop ()
#5  0x000031a6 in main_cmdline ()
#6  0x000019cb in _start ()
#7  0x000018f9 in start ()
[18 Jun 2009 16:03] Diego Medina
revno: 1365
revision-id: diego.medina@sun.com-20090618155951-npelchq2g3d2p1g9
parent: merlin@dl380-g5-a.mysql.com-20090617222042-k3eqrv4063edjzmr
committer: Diego Medina <diego.medina@sun.com>
branch nick: local-trunk
timestamp: Thu 2009-06-18 11:59:51 -0400
message:
  Moved master = mysql_init(NULL) before mysql_ssl_set().
  Fixes Bug#45591
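
The ordering change in the commit message above can be illustrated with a small, self-contained sketch. This is not the agent's code: fake_conn, fake_init(), fake_ssl_set(), setup_before_fix(), setup_after_fix() and the *.pem names are hypothetical stand-ins for the MYSQL handle, mysql_init(NULL) and mysql_ssl_set(). The backtrace suggests that the real mysql_ssl_set() writes through the handle it is given, so a handle that has not been initialised yet crashes the process. The stand-in returns -1 instead so that both orderings can be compared safely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the client library's connection handle. */
typedef struct {
    const char *ssl_key;
    const char *ssl_cert;
    const char *ssl_ca;
    int use_ssl;
} fake_conn;

/* Stand-in for mysql_init(NULL): allocate a zeroed handle. */
fake_conn *fake_init(void) {
    return calloc(1, sizeof(fake_conn));
}

/* Stand-in for mysql_ssl_set(): store the SSL options on the handle.
 * Assumption: the real call does not check the handle. This version
 * returns -1 for a NULL handle instead of crashing. */
int fake_ssl_set(fake_conn *c, const char *key, const char *cert,
                 const char *ca) {
    if (c == NULL) {
        return -1;
    }
    c->ssl_key = key;
    c->ssl_cert = cert;
    c->ssl_ca = ca;
    c->use_ssl = 1;
    return 0;
}

/* Ordering before the fix: SSL options are applied to a handle that
 * has not been created yet, and the handle is initialised afterwards. */
int setup_before_fix(fake_conn **out) {
    fake_conn *master = NULL;
    int rc = fake_ssl_set(master, "key.pem", "cert.pem", "ca.pem");
    master = fake_init(); /* too late: the options were never stored */
    *out = master;
    return rc;
}

/* Ordering after the fix: create the handle first, then apply SSL. */
int setup_after_fix(fake_conn **out) {
    fake_conn *master = fake_init();
    *out = master;
    if (master == NULL) {
        return -1;
    }
    return fake_ssl_set(master, "key.pem", "cert.pem", "ca.pem");
}

/* Small self-check comparing the two orderings. */
void ordering_self_check(void) {
    fake_conn *before = NULL;
    fake_conn *after = NULL;
    assert(setup_before_fix(&before) == -1);
    assert(setup_after_fix(&after) == 0);
    assert(after->use_ssl == 1);
    free(before);
    free(after);
}
```

Calling ordering_self_check() from a test program exercises both paths. With the real client library the first ordering would not return an error code; it would fault inside mysql_ssl_set(), as thread 7 of the backtrace shows.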
[19 Jun 2009 3:36] Keith Russell
Patch installed in versions >= 2.1.0.1067.
[19 Jun 2009 17:31] Diego Medina
Verified fixed on 2.1.0.1067.

In the agent log, you now see:

(message) job_collect_mysql.c:540: [mysql->127.0.0.1:22547] (master-uuid) mysql-query(SELECT value FROM mysql.inventory WHERE name='uuid')
(warning) job_collect_mysql.c:838: [mysql] master-uuid = 1a8ebf4e-850e-4e11-81fb-84fbec9abfcc