Linux – Failed to start process from bash script


I have a central server and periodically launch a script (from cron) to check the remote servers. The checks are performed one after another: first one server, then the next, and so on.

This script (on the central server) starts another script on the remote machine, and that remote script does something like this:

processID=`pgrep "processName"` 
kill $processID

The process is killed and then started again in another script:


pidof "processName"

if [ ! $? -eq 0 ]; then
    nohup "processName" "processArgs" >> "processLog" &
    pidof "processName"
    if [ ! $? -eq 0 ]; then
        echo "Error: failed to start process"
    fi
fi

The scripts and the binaries of the processes they start are on NFS, mounted from the central server.

Sometimes the process I try to start does not start, and I get the error. The strange thing is that it is random: sometimes the process starts on one machine but not on another. I'm checking about 300 servers and the error always appears at random.

One more thing: the remote servers are in 3 different geographical locations (2 in the US and 1 in Europe), and the central server is in Europe. So far, I've found that servers in the US fail more often than those in Europe.

At first I thought the error was related to kill, so I added a sleep between the kill and the start, but that didn't make any difference.

Also, it seems that the process either did not start at all or something went wrong at startup, because there is no output in the log file, and there should be.

So, I’m here for help

Has anyone ever encountered this kind of problem, or knows what went wrong?

Thanks for your help


(Sorry, my original answer was rather wrong. Here is the correction.)

Using $? to get the exit status of a background process in the start script gives a wrong result. man bash states:

Special Parameters
?      Expands to the status of the most recently executed foreground pipeline.

As you mentioned in your comment, the correct way to get the exit status of a background process is to use the wait built-in. But for this, bash has to handle the SIGCHLD signal.
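
The difference between the two approaches can be seen in a minimal sketch. It assumes bash; the subshell that exits with status 3 is a hypothetical stand-in for a real background process, and the variable names are made up for the example:

```shell
#!/bin/bash
# Hypothetical stand-in for a background job that fails:
# it sleeps briefly and then exits with status 3.
( sleep 1; exit 3 ) &
launch_status=$?   # status of launching the job, not of the job itself
bg_pid=$!          # PID of the most recently started background job

wait "$bg_pid"     # blocks until the job ends and returns its exit status
job_status=$?

echo "status right after '&': $launch_status"
echo "status from wait: $job_status"
```

Right after the &, $? only says that the job was launched; the real exit status becomes available once wait returns for that PID.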

I made a small test environment for this to show how it works:

This is a script run as a background process:

[ "$1" == -x ] && exit 1
cnt=$1   # number of iterations, taken from the first argument
while ((++c<=cnt)); do echo "SLEEPING [$$]: $c/$cnt"; sleep 5; done

If the argument is -x, it exits with status 1 to simulate an error. If the argument is a number num, it runs for num*5 seconds, printing SLEEPING [<PID>]: <counter>/<max_counter> to standard output.

The second is the launcher script. It starts 3 instances of the first script in the background and prints their exit statuses:


handle_chld() {
    local tmp=()
    for i in ${!pids[@]}; do
        # A child whose /proc entry is gone has stopped
        if [ ! -d /proc/${pids[i]} ]; then
            wait ${pids[i]}
            echo "Stopped ${pids[i]}; exit code: $?"
            unset pids[i]
        fi
    done
}

set -o monitor
trap "handle_chld" CHLD

# Start background processes and record each PID in the pids array
./ 3 &
pids+=($!)
./ 2 &
pids+=($!)
./ -x &
pids+=($!)

# Wait until all background processes are stopped
while [ ${#pids[@]} -gt 0 ]; do echo "WAITING FOR: ${pids[@]}"; sleep 2; done

The handle_chld function handles the SIGCHLD signal. Setting the monitor option enables a non-interactive script to receive SIGCHLD. Then the trap is set for the SIGCHLD signal.

Then the background processes are started. All of their PIDs are stored in the pids array. When SIGCHLD is received, the handler checks the /proc/ directory to see which child process has stopped (the missing one); this could also be checked with the kill -0 <PID> built-in. After wait, the exit status of the background process is stored in the famous $? pseudo variable.
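
The kill -0 alternative mentioned above could look roughly like this. It is only an illustration, assuming bash: the is_running helper and the two stand-in children (a sleep and a subshell exiting with status 7) are hypothetical and not part of the scripts in this answer:

```shell
#!/bin/bash
# Hypothetical helper: kill -0 sends no signal, it only reports
# whether a process with that PID exists and can be signalled.
is_running() {
    kill -0 "$1" 2>/dev/null
}

sleep 30 &          # stand-in for a long-running child
long_pid=$!
( exit 7 ) &        # stand-in for a child that fails immediately
short_pid=$!

sleep 1             # give the short-lived child time to finish

if is_running "$long_pid"; then long_state=running; else long_state=stopped; fi
if is_running "$short_pid"; then short_state=running; else short_state=stopped; fi

wait "$short_pid"   # the exit status of the finished child is still available
short_status=$?

echo "long: $long_state, short: $short_state (exit code $short_status)"

kill "$long_pid"    # clean up the stand-in
```

Unlike the /proc check, kill -0 does not depend on a mounted /proc file system, so it may be the more portable choice.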

The main script waits until all the PIDs have stopped (otherwise it could not get the exit statuses of its child processes) and then stops itself.

Sample output:

WAITING FOR: 13102 13103 13104
SLEEPING [13103]: 1/2
SLEEPING [13102]: 1/3
Stopped 13104; exit code: 1
WAITING FOR: 13102 13103
WAITING FOR: 13102 13103
SLEEPING [13103]: 2/2
SLEEPING [13102]: 2/3
WAITING FOR: 13102 13103
WAITING FOR: 13102 13103
SLEEPING [13102]: 3/3
Stopped 13103; exit code: 0
Stopped 13102; exit code: 0

It can be seen that the exit codes are correct.

Hope this helps!
