Bug #32844 | some started agents do not appear on dashboard until Tomcat is restarted | ||
---|---|---|---|
Submitted: | 29 Nov 2007 14:07 | Modified: | 7 Jul 2008 12:21 |
Reporter: | Carsten Segieth | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Enterprise Monitor: Server | Severity: | S2 (Serious) |
Version: | 1.3.0.8384 | OS: | Any |
Assigned to: | Eric Herman | CPU Architecture: | Any |
[29 Nov 2007 14:07]
Carsten Segieth
[5 Dec 2007 13:26]
Eric Herman
This has proven to be a _very_ difficult problem to reproduce reliably in development. However, there are a few findings:
(0) This is a "known" issue: quite some time ago we determined that we had similar issues and were able to "solve" them by introducing a delay of 1 second between agent startups in scripts that start many agents at once. This hard-to-reproduce situation seems to occur only if, after a fresh start of the MEM dashboard, many agents all ping in at the same time. This scenario isn't likely in the "real world" and is much more likely in testing situations. Perhaps this can be viewed as a documentation issue?
(1) One contributing problem seems to be contention around getting a database connection from the connection pool when the number of agents exceeds the maximum number of database connections. Normally this is not a problem, since incoming requests will wait until a thread becomes available; however, upon initial startup, the first listInventory requests are relatively long-running processes, and several simultaneous ones all stack up behind some mutexes.
(2) Another contributing problem seems to be contention around the ItemsCache initialization. We may be able to reduce contention by replacing generic synchronization with a ReentrantReadWriteLock.
(3) Currently we retry 3 times on MySQLTransactionRollbackException; we might wish to tune that somewhat.
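To make finding (2) concrete, here is a minimal sketch of what replacing generic synchronization with a ReentrantReadWriteLock could look like. It is not the actual ItemsCache code: the class name `ItemsCacheSketch`, the `loader` function, and the map-based storage are all assumptions made for illustration. The idea is that already-initialized entries are served under the shared read lock, so only first-time initialization serializes callers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

/** Hypothetical stand-in for the ItemsCache; not the real MEM class. */
class ItemsCacheSketch<K, V> {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<K, V> items = new HashMap<>();
    private final Function<K, V> loader; // assumed: computes a missing entry

    ItemsCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // Fast path: many readers may hold the read lock at the same time.
        lock.readLock().lock();
        try {
            V cached = items.get(key);
            if (cached != null) {
                return cached;
            }
        } finally {
            lock.readLock().unlock();
        }

        // Slow path: a read lock cannot be upgraded to a write lock, so the
        // read lock is released first and the entry is re-checked after the
        // write lock is acquired, in case another thread already loaded it.
        lock.writeLock().lock();
        try {
            V cached = items.get(key);
            if (cached == null) {
                cached = loader.apply(key);
                items.put(key, cached);
            }
            return cached;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Note that in this sketch the loader still runs while the write lock is held, so a slow initialization blocks every other caller. Whether this actually reduces the contention seen at dashboard startup would need to be measured.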
[6 Dec 2007 17:31]
Sloan Childers
This has been an intermittent problem since the 1.0 release. There is too much risk in trying to fix this for the 1.3 release. Since no customers have reported this issue, I'm going to move it forward to the 2.0 release for re-testing. Sloan
[7 Jul 2008 12:21]
Carsten Segieth
The problem never occurred in any of my 2.0 tests. I'll re-open the problem if it does occur.