Bug #52338 agent crashes getting systemversion via Gestalt on osx server 10.6.2
Submitted: 24 Mar 2010 16:16 Modified: 9 Jan 2015 10:32
Reporter: Shannon Wade
Status: Closed
Category:MySQL Enterprise Monitor: Agent Severity:S1 (Critical)
Version:2.1.1.1144 OS:MacOS (10.6.2 server)
Assigned to: Jan Kneschke CPU Architecture:Any

[24 Mar 2010 16:16] Shannon Wade
Description:
The OS X Server crash dump is always identical:

Exception Type:  EXC_BAD_ACCESS (SIGBUS)
Exception Codes: KERN_PROTECTION_FAILURE at 0x0000000000000104
Crashed Thread:  3

Thread 3 Crashed:
0   libSystem.B.dylib             	0x93c2b38d _dispatch_wakeup + 91
1   libSystem.B.dylib             	0x93c2ae96 _dispatch_source_create2 + 194
2   libSystem.B.dylib             	0x93c2d72b dispatch_source_machport_create + 60
3   ...ple.CoreServices.CarbonCore	0x94fdb21f _initSharedCache() + 131
4   libSystem.B.dylib             	0x93c25090 pthread_once + 82
5   ...ple.CoreServices.CarbonCore	0x94fbe83c scCreateSystemServiceVersion + 82
6   ...ple.CoreServices.CarbonCore	0x94fdb005 FileIDTreeGetCachedPort + 299
7   ...ple.CoreServices.CarbonCore	0x94fbf89c FSNodeStorageGetAndLockCurrentUniverse + 39
8   ...ple.CoreServices.CarbonCore	0x94fcddb9 FileIDTreeGetVRefNumForDevice + 29
9   ...ple.CoreServices.CarbonCore	0x94fcdd32 FSMount::FSMount(unsigned int, FSMountNumberType, short*) + 62
10  ...ple.CoreServices.CarbonCore	0x94fcc1b4 PathGetObjectInfo(char const*, unsigned long, unsigned long, short*, unsigned long*, unsigned long*, char*, unsigned long*, unsigned char*) + 314
11  ...ple.CoreServices.CarbonCore	0x94fcbfb4 FSPathMakeRefInternal(unsigned char const*, unsigned long, unsigned long, FSRef*, unsigned char*) + 134
12  ...ple.CoreServices.CarbonCore	0x94fcbf2c FSPathMakeRef + 47
13  com.apple.CoreFoundation      	0x91cd7e74 __CFCarbonCore_FSPathMakeRef + 68
14  com.apple.CoreFoundation      	0x91cd7bbf _CFGetFSRefFromURL + 767
15  com.apple.CoreFoundation      	0x91cd78b2 CFURLGetFSRef + 34
16  com.apple.CoreFoundation      	0x91cd76eb _CFBundleCopyInfoDictionaryInResourceForkWithAllocator + 75
17  com.apple.CoreFoundation      	0x91cd7694 _CFBundleCopyInfoDictionaryInResourceFork + 36
18  ...ple.CoreServices.CarbonCore	0x94fd18df GetBugsForOurBundleIDFromCoreservicesd + 568
19  ...ple.CoreServices.CarbonCore	0x94fd164c _CSCheckFix + 20
20  ...ple.CoreServices.CarbonCore	0x94ff6e72 _Gestalt_SystemVersion + 781
21  ...ple.CoreServices.CarbonCore	0x94ff6753 Gestalt + 162
22  libsigar.0.dylib              	0x0233e4ba sigar_os_sys_info_get + 154
23  libsigar.0.dylib              	0x0232eb29 sigar_sys_info_get + 57
24  libagent.dylib                	0x0034f30f agent_dc_os_os_update_values + 63
25  libagent.dylib                	0x0034c3b6 job_collect_get_value + 214
26  libagent.dylib                	0x003506c6 job_collect_os_thread + 246
27  libglib-2.0.0.1600.6.dylib    	0x000df8de g_thread_create_proxy + 174
28  libSystem.B.dylib             	0x93c33fbd _pthread_start + 345
29  libSystem.B.dylib             	0x93c33e42 thread_start + 34

Oddly, agent-run-os-tests calls the same sigar_sys_info_get() without any problem:

sigar-test-all.c.554 (test_sigar_sys_info_get): 
  sysinfo.names = MacOSX
  sysinfo.versions = 10.6.2
  sysinfo.archs = i386
  sysinfo.machines = i386
  sysinfo.descriptions = Mac OS X Snow Leopard
  sysinfo.patch_levels = unknown
  sysinfo.vendors = Apple
  sysinfo.vendor_versions = 10.6
  sysinfo.vendor_names = Mac OS X
  sysinfo.vendor_code_name = Snow Leopard
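One visible difference between the two runs is the calling thread. In the crash dump, sigar_sys_info_get() runs on a worker thread started through g_thread_create_proxy (frames 26-27), while the test presumably calls it from the main thread. Below is a minimal C sketch for isolating that one variable, assuming nothing about the agent's internals. It runs the same query once on the calling thread and once on a freshly created pthread. The helper names are hypothetical, and uname() stands in for sigar_sys_info_get() so the sketch builds without libsigar. On the affected server, the body of query_os_release() would be replaced with the real sigar call.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical stand-in for sigar_sys_info_get(): copies the kernel
 * release string into buf. Returns 0 on success, -1 on failure. */
int query_os_release(char *buf, size_t len)
{
    struct utsname u;
    if (uname(&u) < 0)
        return -1;
    snprintf(buf, len, "%s", u.release);
    return 0;
}

/* Arguments and result handed to the worker thread. */
struct query_job {
    char *buf;
    size_t len;
    int rc;
};

static void *query_worker(void *arg)
{
    struct query_job *job = arg;
    job->rc = query_os_release(job->buf, job->len);
    return NULL;
}

/* Runs the same query on a newly created thread and waits for it,
 * loosely mirroring the agent's collector-thread call path. */
int query_os_release_on_thread(char *buf, size_t len)
{
    pthread_t tid;
    struct query_job job = { buf, len, -1 };
    if (pthread_create(&tid, NULL, query_worker, &job) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return job.rc;
}

/* Returns 1 if the query succeeds on both threads and the results agree. */
int releases_match(void)
{
    char on_caller[256], on_worker[256];
    if (query_os_release(on_caller, sizeof on_caller) != 0)
        return 0;
    if (query_os_release_on_thread(on_worker, sizeof on_worker) != 0)
        return 0;
    return strcmp(on_caller, on_worker) == 0;
}
```

On a healthy system, releases_match() returns 1. On the failing server, a crash that happens only in the threaded call would point at the threading difference. A crash in both calls would rule it out.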

This is our 2.1.1.1144 agent build, which is 32-bit, running on OS X Server 10.6.2. The agent crashes and then receives a dupuuid from the Service Manager. Running with log-level=debug shows:

2010-02-26 12:56:43: (debug) job (task 33016 (collect_os)) executes again in 300 sec
2010-02-26 12:56:44: (debug) chassis.c:282: 3139 returned: 3139
2010-02-26 12:56:44: (message) chassis.c:304: [angel] PID=3139 died on signal=10 (it used 0 kBytes max) ... waiting 3min before restart
2010-02-26 12:56:46: (message) chassis.c:259: [angel] we try to keep PID=3146 alive

---
2010-02-26 12:59:13: (debug) scheduler.c.520: scheduling collect_os for os::net->netmask
2010-02-26 12:59:14: (debug) chassis.c:282: 3146 returned: 3146
2010-02-26 12:59:14: (message) chassis.c:304: [angel] PID=3146 died on signal=10 (it used 0 kBytes max) ... waiting 3min before restart
2010-02-26 12:59:16: (message) chassis.c:259: [angel] we try to keep PID=3183 alive
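The angel's `signal=10` is SIGBUS on OS X, which is the same signal named in the crash dump (EXC_BAD_ACCESS (SIGBUS)). On Linux, SIGBUS is 7, so the raw number is platform-specific. Below is a minimal sketch of how a supervising parent can recover that number with waitpid(), assuming the angel does something similar (the chassis.c code itself is not shown here). child_death_signal() is a hypothetical helper that forks a child, which then raises SIGBUS.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that dies from SIGBUS and returns the signal number the
 * parent observes via waitpid(), or -1 if the child did not die from a signal. */
int child_death_signal(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: suppress the core file, make sure SIGBUS is deliverable
         * with its default (fatal) action, then raise it. */
        struct rlimit no_core = { .rlim_cur = 0, .rlim_max = 0 };
        sigset_t set;
        setrlimit(RLIMIT_CORE, &no_core);
        sigemptyset(&set);
        sigaddset(&set, SIGBUS);
        sigprocmask(SIG_UNBLOCK, &set, NULL);
        signal(SIGBUS, SIG_DFL);
        raise(SIGBUS);
        _exit(0); /* only reached if SIGBUS did not terminate the child */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    /* WIFSIGNALED/WTERMSIG decode the status word, as a supervisor
     * would before logging "died on signal=N". */
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

On OS X, the returned value is 10, matching the log lines above. Comparing against the SIGBUS macro rather than the literal keeps the check portable.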

How to repeat:
Unable to repeat, because we don't have access to an OS X Server 10.6.2 machine. It does not reproduce on a regular OS X 10.6.2 client.

Suggested fix:
Needs additional testing specifically on OS X Server 10.6.2. Our builds are 32-bit 10.5 builds and were assumed to work there.
[3 May 2010 18:27] Jan Kneschke
Either:

  $ gdb --args ...mysql-monitor-agent --defaults-file=...
  (gdb) handle SIGPIPE nostop noprint pass
  (gdb) run
  ... on crash ...
  (gdb) thread apply all bt