Bug #25039 Memory stats not collected properly on Solaris 8/9/10 SPARC 64
Submitted: 13 Dec 2006 12:28 Modified: 4 May 2007 14:13
Reporter: Mark Leith
Status: Closed
Category:MySQL Enterprise Monitor: Agent Severity:S2 (Serious)
Version:1.0.0 - 1.1.1.5336 OS:Solaris (Sol[8|9|10] SPARC64)
Assigned to: Kent Boortz CPU Architecture:Any
Tags: Agent, Memory, sparc64

[13 Dec 2006 12:28] Mark Leith
Description:
Memory usage statistics are not being collected properly on Solaris 10 SPARC 64. Total RAM (mem.ram) is reported correctly, but all other memory stats are reported incorrectly as 0:

./sigar-test-all
agent/src/sigar-test-all.c.38 (unknown):
  mem.ram = 8192,
  mem.total = 0,
  mem.used = 0,
  mem.free = 0,
  mem.actual_free = 0,
  mem.actual_used = 0

"Advice

Use whatever system tools are available to you on mvdbprd02_8425:8425 (e.g. vmstat, perfmon, top, Task Manager, etc.) to investigate how and why memory is being used, so you can determine the appropriate action to take to improve the situation. The amount of free memory on xxxxxxxx:xxxx is getting low. Only 0 bytes of memory are free out of a total of 34359738368 bytes." 

How to repeat:
Run an agent on Solaris SPARC64 and wait for the heat chart rules to fire incorrectly, as shown above.
[6 Jan 2007 15:37] Mark Leith
I think this has to do with:

a) the types we are declaring for pagesize or i (or both)
b) the format specifier we are using for the values (within sigar-test-all, at least)

Here's what we have now (condensed) in code:

#include <unistd.h>
#include <stdio.h>
#include <sys/stat.h>

typedef unsigned long long my_uint64_t;

int main(void)
{

  int i;
  int pagesize;
  my_uint64_t total;

  /* logic used in solaris_sigar.c */

  pagesize = 0;
  i = sysconf(_SC_PAGESIZE);
  while ((i >>= 1) > 0) {
    pagesize++;
  }

  /* again same logic in solaris_sigar.c */

  total = sysconf(_SC_PHYS_PAGES);
  total <<= pagesize;

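  /* NOTE: pagesize is an int but is printed with %lld. This format
     mismatch is undefined behavior and garbles the output below. */
  printf("  pagesize      = %lld,\n"
         "  total         = %lld,\n",
         pagesize,
         total);
  return 0;
}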

-bash-3.00$ /usr/sfw/bin/gcc testmem.c -o testmem
-bash-3.00$ ./testmem 
  pagesize      = 55834574850,
  total         = 2,

Obviously wrong. But after changing pagesize and i to my_uint64_t (sigar_uint64_t in SIGAR terms), and changing the format specifier from %Ld to %lld (%Ld seems to be a problem on Solaris), we get the "right" output (for total, at least):

#include <unistd.h>
#include <stdio.h>
#include <sys/stat.h>

typedef unsigned long long my_uint64_t;

int main(void)
{

  my_uint64_t i;
  my_uint64_t pagesize;
  my_uint64_t total;

  /* logic used in solaris_sigar.c */

  pagesize = 0;
  i = sysconf(_SC_PAGESIZE);
  while ((i >>= 1) > 0) {
    pagesize++;
  }

  /* again same logic in solaris_sigar.c */

  total = sysconf(_SC_PHYS_PAGES);
  total <<= pagesize;

  printf("  pagesize      = %lld,\n"
         "  total         = %lld,\n",
         pagesize,
         total);
  return 0;
}

-bash-3.00$ /usr/sfw/bin/gcc testmem.c -o testmem
-bash-3.00$ ./testmem 
  pagesize      = 13,
  total         = 8589934592,

-bash-3.00$ /usr/platform/$(uname -i)/sbin/prtdiag |grep  Memory
Memory size: 8GB
[6 Jan 2007 15:50] Mark Leith
OK, it looks like the format specifier in the agent is fine:

switch (id) {
case DC_MEM_TOTAL:
  g_string_printf(value, "%lld", mem.total);
  break;
case DC_MEM_UNUSED:
  g_string_printf(value, "%lld", mem.actual_free);
  break;
default:
  break;
}

So is this down to the datatypes for pagesize etc.?
[8 Feb 2007 21:52] Jan Kneschke
The patch has been committed to SVN and is currently being tested.
[21 Feb 2007 14:39] Mark Leith
Verified fixed on 1.1.0.4785a
[15 Mar 2007 14:12] MySQL Verification Team
Re-opened because the memory usage problem was reproduced again with version 1.1.0.4876.

From agent log:
<![CDATA[merlin:os://pronssi/mem/mem?attrib=ram_unused]]></target><utc>2007-03-15T13:12:28.371Z</utc><value>0</value></datum></data></task>

Yet there is about 10 GB of free memory (16 GB total).
[29 Mar 2007 16:47] Jan Kneschke
The issue has been tracked down to a build issue: -DSIZEOF_SIZE_T was != 8 on Solaris 10 SPARC64.
[19 Apr 2007 17:59] Jan Kneschke
A fix has been committed to SVN [5192].
[20 Apr 2007 15:47] Carsten Segieth
Not solved with 1.1.1.5214.
From the log of a Solaris 10 SPARC 64-bit agent, it looks like 'ram_unused' is still missing (the graph only shows the 'total' value):

<![CDATA[merlin:os://1.1.1.5214_23_solaris10-sparc-64bit_sol10-sparc-b_56/mem/mem?attrib=ram_unused]]></target><value><![CDATA[]]></value><frequency><![CDATA[00:01:00]]></frequency><utc><![CDATA[]]></utc></datum></data></task>

Log file is: /users/csegieth/mysql/network/agent/1.1.1.5214/solaris10-sparc-64bit/sol10-sparc-b/log/10.100.1.224.win2003a-x86.3351.56.log
[1 May 2007 21:55] Keith Russell
Corrected in versions >= 1.1.1.5336.
[4 May 2007 0:13] Kent Boortz
The configure check now sets SIGAR_64BIT on 64-bit platforms, which is safer than the simplistic test in the header file.
[4 May 2007 0:37] Bill Weber
needs to be tested :)
[4 May 2007 10:11] Carsten Segieth
Tested OK with 1.1.1.5370 on solaris[8|9|10]-sparc64.
[4 May 2007 14:13] Peter Lavin
Added to the changelog.