MySQL Bugs: #46222: Agent memory leak (sigar related)

Bug #46222	Agent memory leak (sigar related)
Submitted:	16 Jul 2009 13:57	Modified:	29 Jul 2009 9:04
Reporter:	Diego Medina
Status:	Closed
Category:	MySQL Enterprise Monitor: Agent	Severity:	S1 (Critical)
Version:	2.0.5.x 2.1.0.1067	OS:	Linux (Mac OS X, *BSD)
Assigned to:	Jan Kneschke	CPU Architecture:	Any

Description:
The agent's RAM usage increases by 35MB every 5 minutes (on the clock); that is, it jumps at each 5-minute mark.

How to repeat:
We are still getting information from the customer, but it seems related to the following:

* It happens if the "device" that is referenced in "mount" isn't stat()able.
* The call tracks "local" mount points and exits without free()ing the file system list if the mount point can't be stat()ed.

Valgrind output:

==18712== 
==18712== 59,421,120 bytes in 331 blocks are possibly lost in loss record 93 of 94
==18712==    at 0x4C22741: realloc (vg_replace_malloc.c:429)
==18712==    by 0x7477871: sigar_file_system_list_grow (sigar.c:383)
==18712==    by 0x747DDDE: sigar_file_system_list_get (linux_sigar.c:1133)
==18712==    by 0x7479CCE: sigar_iodev_get (sigar_util.c:405)
==18712==    by 0x747E093: get_iostat_proc_dstat (linux_sigar.c:1214)
==18712==    by 0x747E725: sigar_disk_usage_get (linux_sigar.c:1391)
==18712==    by 0x747EAF3: sigar_file_system_usage_get (linux_sigar.c:1488)
==18712==    by 0x6A9BD29: agent_dc_os_fs_update_values (job_collect_os.c:756)
==18712==    by 0x6A9A2B2: job_collect_get_value (job_collect.c:81)
==18712==    by 0x6A9DD6C: job_collect_os_thread (job_collect_os.c:1532)
==18712==    by 0x60B82A7: g_thread_create_proxy (gthread.c:635)
==18712==    by 0x4F30FC6: start_thread (in /lib/libpthread-2.7.so)
==18712== 
==18712== 
==18712== 91,555,200 bytes in 510 blocks are definitely lost in loss record 94 of 94
==18712==    at 0x4C22741: realloc (vg_replace_malloc.c:429)
==18712==    by 0x7477871: sigar_file_system_list_grow (sigar.c:383)
==18712==    by 0x747DDDE: sigar_file_system_list_get (linux_sigar.c:1133)
==18712==    by 0x7479CCE: sigar_iodev_get (sigar_util.c:405)
==18712==    by 0x747E093: get_iostat_proc_dstat (linux_sigar.c:1214)
==18712==    by 0x747E725: sigar_disk_usage_get (linux_sigar.c:1391)
==18712==    by 0x747EAF3: sigar_file_system_usage_get (linux_sigar.c:1488)
==18712==    by 0x6A9BD29: agent_dc_os_fs_update_values (job_collect_os.c:756)
==18712==    by 0x6A9A2B2: job_collect_get_value (job_collect.c:81)
==18712==    by 0x6A9DD6C: job_collect_os_thread (job_collect_os.c:1532)
==18712==    by 0x60B82A7: g_thread_create_proxy (gthread.c:635)
==18712==    by 0x4F30FC6: start_thread (in /lib/libpthread-2.7.so)
==18712== 
==18712== LEAK SUMMARY:
==18712==    definitely lost: 91,555,200 bytes in 510 blocks.
==18712==      possibly lost: 59,421,120 bytes in 331 blocks.
==18712==    still reachable: 83,433 bytes in 354 blocks.
==18712==         suppressed: 0 bytes in 0 blocks.
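
A quick check on the two loss records above: dividing the byte count by the block count gives the same size per block in both records, which is consistent with both coming from the same allocation site (the realloc in sigar_file_system_list_grow). A minimal C sketch of that arithmetic follows; the helper name `bytes_per_block` is made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Average size of one leaked block in a Valgrind loss record.
 * Hypothetical helper, only for checking the numbers in the report. */
static unsigned long bytes_per_block(unsigned long bytes, unsigned long blocks)
{
    assert(blocks > 0);
    return bytes / blocks;
}

/* Print the per-block size of both loss records from the report. */
static void print_block_sizes(void)
{
    /* "definitely lost": 91,555,200 bytes in 510 blocks -> 179520 */
    printf("%lu\n", bytes_per_block(91555200UL, 510UL));
    /* "possibly lost": 59,421,120 bytes in 331 blocks -> 179520 */
    printf("%lu\n", bytes_per_block(59421120UL, 331UL));
}
```

Both divisions are exact, so every leaked block in these two records is 179,520 bytes.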

Jan Kneschke writes: 
==18712== 91,555,200 bytes in 510 blocks are definitely lost in loss record 94 of 94
==18712==    at 0x4C22741: realloc (vg_replace_malloc.c:429)
==18712==    by 0x7477871: sigar_file_system_list_grow (sigar.c:383)
==18712==    by 0x747DDDE: sigar_file_system_list_get (linux_sigar.c:1133)
==18712==    by 0x7479CCE: sigar_iodev_get (sigar_util.c:405)
==18712==    by 0x747E093: get_iostat_proc_dstat (linux_sigar.c:1214)
==18712==    by 0x747E725: sigar_disk_usage_get (linux_sigar.c:1391)
==18712==    by 0x747EAF3: sigar_file_system_usage_get (linux_sigar.c:1488)
==18712==    by 0x6A9BD29: agent_dc_os_fs_update_values (job_collect_os.c:756)
==18712==    by 0x6A9A2B2: job_collect_get_value (job_collect.c:81)
==18712==    by 0x6A9DD6C: job_collect_os_thread (job_collect_os.c:1532)
==18712==    by 0x60B82A7: g_thread_create_proxy (gthread.c:635)
==18712==    by 0x4F30FC6: start_thread (in /lib/libpthread-2.7.so)

Jan Kneschke writes: 
Mail sent to Hyperic:

Hi Doug,

On Linux we have a mem-leak for one of our customers:

==18712== 91,555,200 bytes in 510 blocks are definitely lost in loss record 94 of 94
==18712==    at 0x4C22741: realloc (vg_replace_malloc.c:429)
==18712==    by 0x7477871: sigar_file_system_list_grow (sigar.c:383)
==18712==    by 0x747DDDE: sigar_file_system_list_get (linux_sigar.c:1133)
==18712==    by 0x7479CCE: sigar_iodev_get (sigar_util.c:405)
==18712==    by 0x747E093: get_iostat_proc_dstat (linux_sigar.c:1214)
==18712==    by 0x747E725: sigar_disk_usage_get (linux_sigar.c:1391)
==18712==    by 0x747EAF3: sigar_file_system_usage_get (linux_sigar.c:1488)
...

in sigar_iodev_get (sigar_util.c:405)

   status = sigar_file_system_list_get(sigar, &fslist);

   if (status != SIGAR_OK) {
       sigar_log_printf(sigar, SIGAR_LOG_DEBUG,
                        "[iodev] file_system_list failed: %s",
                        sigar_strerror(sigar, status));
       return NULL;
   }

   for (i=0; i<fslist.number; i++) {
       sigar_file_system_t *fsp = &fslist.data[i];

       if (fsp->type == SIGAR_FSTYPE_LOCAL_DISK) {
           int retval = stat(fsp->dir_name, &sb);
           sigar_cache_entry_t *ent;

           if (retval < 0) {
               if (debug) {
                   sigar_log_printf(sigar, SIGAR_LOG_DEBUG,
                                    "[iodev] inode stat(%s) failed",
                                    fsp->dir_name);
               }

>>> the list filled by sigar_file_system_list_get() isn't destroyed here

               return NULL; /* cant cache w/o inode */
           }

Test case that should demonstrate the leak:

mount something /test/mnt
chmod 0000 /test

You will see in the dashboard:
/test/mnt null (null free)

And the agent's size increases rapidly.
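
The pattern described in the mail above, and one way to avoid it, can be sketched in C. This is only an illustration under assumptions: `fs_list_t`, `fs_list_get`, `fs_list_destroy`, `stat_all_mount_points` and the `live_lists` counter are hypothetical stand-ins, not SIGAR's actual types or functions, and this is not necessarily the change that went into the fixed agent build.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical stand-in for a file system list; not SIGAR's real type. */
typedef struct {
    unsigned long number;    /* entries in use */
    const char  **dir_names; /* mount point paths */
} fs_list_t;

/* Lists allocated and not yet destroyed; only here to make leaks visible. */
static int live_lists = 0;

/* Allocate the list and fill it with the given mount points. */
static int fs_list_get(fs_list_t *list, const char **dirs, unsigned long n)
{
    unsigned long i;

    list->dir_names = malloc((n ? n : 1) * sizeof(*list->dir_names));
    if (list->dir_names == NULL) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        list->dir_names[i] = dirs[i];
    }
    list->number = n;
    live_lists++;
    return 0;
}

/* Release the list's storage. */
static void fs_list_destroy(fs_list_t *list)
{
    assert(live_lists > 0);
    free(list->dir_names);
    list->dir_names = NULL;
    list->number = 0;
    live_lists--;
}

/* Return 0 if every mount point can be stat()ed, -1 otherwise.
 * The list is destroyed on every return path after it was allocated. */
static int stat_all_mount_points(const char **dirs, unsigned long n)
{
    fs_list_t list;
    struct stat sb;
    unsigned long i;

    if (fs_list_get(&list, dirs, n) != 0) {
        return -1; /* nothing was allocated, nothing to destroy */
    }
    for (i = 0; i < list.number; i++) {
        if (stat(list.dir_names[i], &sb) < 0) {
            fs_list_destroy(&list); /* the step the excerpt's early return skips */
            return -1;
        }
    }
    fs_list_destroy(&list);
    return 0;
}
```

Calling `stat_all_mount_points` with a path that cannot be stat()ed returns -1 and leaves `live_lists` at 0. Removing the `fs_list_destroy` call in the error branch would leave one list allocated after each failed call, which is the shape of the leak reported here.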

Andy Bang writes: 
Should be in agent build 2.1.0.1079.

Diego Medina writes: 
This has been fixed in 2.1.0.1079.

An entry was added to the 2.1.0 changelog:

The Agent had a memory leak. The memory consumption increased by 35MB every five minutes.