| Bug #46320 | CPCD: checking if processes are running is fragile, and depending on scheduling | ||
|---|---|---|---|
| Submitted: | 21 Jul 2009 11:59 | Modified: | 10 Feb 2010 4:17 |
| Reporter: | Jørgen Austvik | Email Updates: | |
| Status: | Won't fix | Impact on me: | |
| Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S3 (Non-critical) |
| Version: | OS: | Any | |
| Assigned to: | Assigned Account | CPU Architecture: | Any |
[28 Sep 2009 4:40]
jack andrews
i noticed this while porting cpcd to windows. as i can reproduce this easily, i'll see if there's an easy fix.
[1 Feb 2010 10:44]
jack andrews
what do you think of something like this?
=== modified file 'storage/ndb/src/cw/cpcd/CPCD.hpp'
--- storage/ndb/src/cw/cpcd/CPCD.hpp 2010-01-29 14:42:03 +0000
+++ storage/ndb/src/cw/cpcd/CPCD.hpp 2010-02-01 10:33:39 +0000
@@ -56,7 +56,8 @@
STOPPED = 0,
STARTING = 1,
RUNNING = 2,
- STOPPING = 3
+ STOPPING = 3,
+ STARTED = 4
};
enum ProcessType {
=== modified file 'storage/ndb/src/cw/cpcd/Process.cpp'
--- storage/ndb/src/cw/cpcd/Process.cpp 2010-01-29 14:42:03 +0000
+++ storage/ndb/src/cw/cpcd/Process.cpp 2010-02-01 10:33:39 +0000
@@ -659,6 +659,13 @@
m_status = RUNNING;
return 0;
}
+
+ if(m_status == STARTED)
+ {
+ m_status = STOPPED;
+ return 0
+ }
+
m_status = STOPPED;
return -1;
[3 Feb 2010 5:40]
jack andrews
returning -1 means process starting failed.
but a process can start and finish before
getting to this code.
--- storage/ndb/src/cw/cpcd/Process.cpp 2010-01-29 14:42:03 +0000
+++ storage/ndb/src/cw/cpcd/Process.cpp 2010-02-03 04:40:07 +0000
@@ -657,11 +657,14 @@ CPCD::Process::start() {
if(isRunning()){
m_status = RUNNING;
- return 0;
}
- m_status = STOPPED;
+ else
+ {
+ logger.debug("process %d (pid %d) ran for a short time", m_id, m_pid);
+ m_status = STOPPED;
+ }
- return -1;
+ return 0;
}
void
[3 Feb 2010 10:44]
Magnus Blåudd
Looks promising but how do we know if the process failed to start? I.e was not short lived.
[10 Feb 2010 4:17]
jack andrews
it appears that this 'feature' [short running processes
are considered failed processes] does not affect the
running of autotest.
if the feature appears later in autotest or in other
uses of cpcd, we can look at this again.
i had a prelimiary patch. essentially:
. on linux, if exec returns in the child process,
write a file named "[pid or id].failed".
in CPCD::Process::start(), if isRunning()
is false, check for existence of .failed
file, returning -1, otherwise STOPPED.
. on windows, fix is trivial.
[13 Mar 2014 13:37]
Omer Barnir
This bug is not scheduled to be fixed at this time.

Description: In CPCD::Process::start(): switch(m_processType){ case TEMPORARY:{ .. do_exec(); .. if(isRunning()){ m_status = RUNNING; return 0; } m_status = STOPPED; return -1; ATRT log: 2009-07-20 19:12:25 [ndb_atrt] DEBUG -- system(atrt-setup.sh tyr16 /export/home2/tmp/ndb/cr_autotest/run-2-dn-mysql-5.1-telco-6.3/run/ndb_api.1/ /export/home2/tmp/ndb/cr_autotest/run-2-dn-mysql-5.1-telco-6.3/run/ndb_api.1) 2009-07-20 19:12:26 [ndb_atrt] ERROR -- Unable to start process: Failed to start CPCD log: 2009-07-20 19:12:26 [ndb_cpcd] INFO -- Starting 2906: 4-ndb_api 2009-07-20 19:12:26 [ndb_cpcd] DEBUG -- Started temporary 2906 : pid=343 2009-07-20 19:12:27 [ndb_cpcd] DEBUG -- kill pid: -343 : No such process 2009-07-20 19:12:27 [ndb_cpcd] INFO -- Process list saved as 'ndb_cpcd.db' 2009-07-20 19:12:27 [ndb_cpcd] DEBUG -- Sent SIGTERM to pid -326 2009-07-20 19:12:27 [ndb_cpcd] INFO -- Process list saved as 'ndb_cpcd.db' 2009-07-20 19:12:27 [ndb_cpcd] CRITICAL -- Stopping process with bogus pid: -1 id: 7403 What we see is that the process is started and get PID 343. It then exits before we execute isRunning(). The isRunning( called around Process.cpp:436 returns false-1 (the process is not running any more), and CPCD::startProcess() returns "Failed to start" to the client. How to repeat: Use cpcd to start short lived process Suggested fix: I think calling isRunning() (at least for temporary processes) is wrong, since the process could be scheduled in and exit before the check. Furthermore I think it would be good if CPCD returned a better error message, along the lines of "Failed to start '%s' in directory: '%s'. Exit status: %d"