Bug #32277 PCI message-signaled interrupts cause /proc/stat buffer overflows in agent
Submitted: 12 Nov 2007 11:58 Modified: 29 Feb 2008 16:26
Reporter: Domas Mituzas
Status: Closed
Category: MySQL Enterprise Monitor: Agent Severity: S3 (Non-critical)
Version: 1.2.0.7879 OS: Linux (2.6.21.2)
Assigned to: Jan Kneschke CPU Architecture: Any
Triage: D1 (Critical) / R2 (Low) / E2 (Low)

[12 Nov 2007 11:58] Domas Mituzas
Description:
Message Signaled Interrupts (MSI), as described in the PCI Local Bus
Specification Revision 2.3 or later, are an optional feature for PCI
devices and a required feature for PCI Express devices.

The Linux kernel exposes such devices with large interrupt numbers (e.g. 8409), and this makes the intr line in /proc/stat extremely long.

This crashes the SIGAR library:

(gdb) run
Starting program: /opt/mysql/enterprise/agent/lib/mysql-service-agent/mysql-service-agent -f /opt/mysql/enterprise/agent/etc/mysql-service-agent.ini -D
warning: Lowest section in system-supplied DSO at 0xffffe000 is .hash at ffffe0b4
[Thread debugging using libthread_db enabled]
[New Thread 4157900480 (LWP 23383)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 4157900480 (LWP 23383)]
0x080e33d6 in sigar_skip_token (p=0x0) at agent/src/sigar/src/sigar_util.c:59
59 agent/src/sigar/src/sigar_util.c: No such file or directory.
in agent/src/sigar/src/sigar_util.c
(gdb) bt full
#0 0x080e33d6 in sigar_skip_token (p=0x0)
at agent/src/sigar/src/sigar_util.c:59
No locals.
#1 0x080e3561 in sigar_os_open (sigar=0x81fb2e4)
at agent/src/sigar/src/os/linux/linux_sigar.c:143
buffer = "cpu 15576080 1863687 5219297 312565295 22964945 55369 328215 0\ncpu0 9157911 921495 2612893 154977560 11422172 28476 165786 0\ncpu1 6418168 942192 2606403 157587735 11542773 26892 162429 0\nintr 2235256"...
ptr = 0x0
i = 2687360
status = 0
sb = {st_dev = 0, __pad1 = 0, st_ino = 0, st_mode = 0, st_nlink = 0,
st_uid = 0, st_gid = 0, st_rdev = 0, __pad2 = 0, st_size = 0,
st_blksize = 0, st_blocks = 0, st_atim = {tv_sec = 0, tv_nsec = 0},
st_mtim = {tv_sec = 0, tv_nsec = 0}, st_ctim = {tv_sec = 0, tv_nsec = 0},
__unused4 = 0, __unused5 = 0}
#2 0x080dfd20 in sigar_open (sigar=0x81fb200)
at agent/src/sigar/src/sigar.c:35
status = 2687360
#3 0x080677d3 in main_cmdline (argc=1, argv=0xffaf9c64)
at agent/src/merlin-agent.c:1865
print_uuid = 0
print_version = 0
option_ctx = (GOptionContext *) 0x81fb2d0
gerr = (GError *) 0x0
config_file = (
gchar *) 0x81fbc50 "/opt/mysql/enterprise/agent/etc/mysql-service-agent.ini"
g = (global *) 0x81fb2d0
ret = 0
no_daemon = 1
entries = {{long_name = 0x80e8ecb "defaults-file",
short_name = 102 'f', flags = 0, arg = G_OPTION_ARG_FILENAME,
arg_data = 0xffaf9b1c,
description = 0x80e95e8 "location of the mysql-service-agent.ini",
arg_description = 0x80e8ed9 "<filename>"}, {
long_name = 0x80f9b84 "version", short_name = 86 'V', flags = 0,
arg = G_OPTION_ARG_NONE, arg_data = 0xffaf9b18,
description = 0x80e8ee4 "print version and exit", arg_description = 0x0}, {
long_name = 0x80e8efb "print-uuid", short_name = 117 'u', flags = 0,
arg = G_OPTION_ARG_NONE, arg_data = 0xffaf9b14,
description = 0x80e9610 "print the uuid entry for appending it to the configfile", arg_description = 0x0}, {long_name = 0x80e8f06 "no-daemon",
short_name = 68 'D', flags = 0, arg = G_OPTION_ARG_NONE,
arg_data = 0xffaf9b10,
description = 0x80e9648 "run agent in foreground for debugging",
arg_description = 0x0}, {long_name = 0x0, short_name = 0 '\0', flags = 0,
arg = G_OPTION_ARG_NONE, arg_data = 0x0, description = 0x0,
arg_description = 0x0}}
#4 0x08067930 in main (argc=4, argv=0xffaf9c64)
at agent/src/merlin-agent.c:1932
check_str = (const gchar *) 0xf7d48690 "\200\001)"
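
The backtrace shows sigar_os_open passing ptr = 0x0 to sigar_skip_token while buffer holds only the beginning of /proc/stat. One plausible reading (an assumption, since the SIGAR source is not quoted in this report) is that a field located after the intr line was searched for in a truncated buffer, so the search returned NULL. A minimal sketch of a lookup that makes the NULL case explicit; find_field is a hypothetical helper, not a SIGAR function:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not SIGAR's code: return a pointer to the value that
 * follows "\n<key> " in buf, or NULL if no line starts with key (for example
 * because buf holds only the beginning of /proc/stat). */
const char *find_field(const char *buf, const char *key)
{
    size_t klen = strlen(key);
    const char *p = buf;

    while ((p = strchr(p, '\n')) != NULL) {
        p++;                               /* first character of the next line */
        if (strncmp(p, key, klen) == 0 && p[klen] == ' ')
            return p + klen + 1;           /* start of the value */
    }
    return NULL;                           /* caller must check before dereferencing */
}
```

With a buffer that ends inside the intr line, find_field(buffer, "btime") returns NULL, and any caller that skips tokens from that pointer without checking it would fault in the way gdb reports.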

How to repeat:
Use a PCI Express card that is assigned a high interrupt number, then observe a /proc/stat like this:

cpu  15796624 1863687 5256913 313572511 23841171 55943 337632 0
cpu0 9269901 921495 2632803 155473539 11864899 28769 170525 0
cpu1 6526722 942192 2624109 158098972 11976271 27174 167107 0
intr 2253301706 1817966488 3481 0 13 11 0 5 0 0 0 0 0 4 0 63408846 0 58221846 12724222 0 0 [thousands of 0s] 0 0 0 0 0 300976790 0 0 0 0 0 0
ctxt 605988946
btime 1192808591
processes 414102
procs_running 1
procs_blocked 1

And the device would be seen in /proc/interrupts as:

 8409: 167154295 167134445 PCI-MSI-edge eth0
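
To get a feel for how long the intr line becomes, here is a rough lower-bound estimate. It assumes the kernel prints one counter for every IRQ number from 0 up to the highest one in use, and that each counter takes at least one digit plus a separator; both are assumptions consistent with the output above, not statements about a specific kernel version. The function name is made up for this illustration.

```c
#include <stddef.h>

/* Lower bound on the length of the "intr" line in /proc/stat, assuming one
 * counter per IRQ number from 0 to highest_irq and at least two bytes
 * (one digit plus a space or newline) per counter. */
size_t intr_line_min_bytes(unsigned highest_irq)
{
    size_t prefix  = sizeof("intr ") - 1;  /* the "intr " label */
    size_t total   = 2;                    /* overall count: >= 1 digit + ' ' */
    size_t per_irq = 2;                    /* "0 " for an idle IRQ */

    return prefix + total + per_irq * ((size_t)highest_irq + 1);
}
```

Under these assumptions intr_line_min_bytes(8409) is 16827, so the intr line alone needs more than 16 KB, far beyond what a small fixed-size read buffer would hold.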

Suggested fix:
Handle large buffers when reading /proc/stat.
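
One way to do this is to read the file into a buffer that grows on demand instead of a fixed-size array. The following is only a sketch of that idea; it is not the upstream SIGAR patch, which is not shown in this report, and read_whole_file is a hypothetical name.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read the whole file at path into a NUL-terminated heap buffer, doubling
 * the buffer whenever it fills up. Returns NULL on error; the caller frees
 * the result. If len_out is not NULL it receives the number of bytes read. */
char *read_whole_file(const char *path, size_t *len_out)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return NULL;

    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);

    while (buf != NULL) {
        /* Always leave one byte free for the terminating NUL. */
        len += fread(buf + len, 1, cap - 1 - len, fp);
        if (len < cap - 1)
            break;                         /* short read: EOF or error */
        cap *= 2;
        char *bigger = realloc(buf, cap);
        if (bigger == NULL)
            free(buf);
        buf = bigger;
    }

    if (buf != NULL && ferror(fp)) {
        free(buf);
        buf = NULL;
    }
    fclose(fp);

    if (buf != NULL) {
        buf[len] = '\0';
        if (len_out != NULL)
            *len_out = len;
    }
    return buf;
}
```

A caller would use read_whole_file("/proc/stat", &len), check the result for NULL, parse it, and free it. Reading until a short read rather than trusting the size from stat() matters here because the backtrace shows st_size = 0 for the file.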
[12 Nov 2007 22:14] Jan Kneschke
This is fixed upstream, and we'll merge it for the next release.
[7 Dec 2007 23:41] Jan Kneschke
The patch has been added to trunk/ in [8504] and [8505]
[26 Feb 2008 16:45] Sloan Childers
r8935 - merged revisions 8504 and 8505 from trunk to development-1.2.1 (SIGAR update)
[28 Feb 2008 12:14] Carsten Segieth
- mysqlserviceagent-1.3.0.8933-linux-rhas3-x86_64-installer.bin has been tested successfully by a customer
[28 Feb 2008 21:19] Carsten Segieth
tested OK with 1.3.0.8939
[29 Feb 2008 16:26] Peter Lavin
Thank you for your bug report. This issue has been addressed in the documentation. The updated documentation will appear on our website shortly, and will be included in the next release of the relevant products.
Added to the changelog for version 1.3.