Python subprocess.communicate hangs when the parent leaves a zombie

I’m trying to create a child process A using Popen, plus a thread that communicates with it using Popen.communicate. The main process uses Thread.join with a timeout to wait for the thread, and kills A once the timeout expires, which should also cause the thread to finish.

However, this doesn’t seem to work when A itself spawns more child processes B, C, and D, which are in a different process group than A and refuse to die. Even after A is dead and marked as defunct, and even after the main process reaps A with os.waitpid() so that it no longer exists, the thread refuses to join the main thread.

Popen.communicate returns only after all of the children B, C, and D have been killed.

Is this behavior actually expected of the module? Recursive waiting can be useful in some cases, but it is certainly not appropriate as the default behavior of Popen.communicate. If this is the expected behavior, is there any way to override it?

Here’s a very simple example:

from subprocess import PIPE, Popen
from threading import Thread
import os
import time
import signal

DEVNULL = open(os.devnull, 'w')

proc = Popen(["/bin/bash"], stdin=PIPE, stdout=PIPE,
             stderr=DEVNULL, start_new_session=True)

def thread_function():
    print("Entering thread")
    return proc.communicate(input=b"nohup sleep 100 &\nexit\n")

thread = Thread(target=thread_function)
thread.start()
time.sleep(1)
proc.kill()
while True:
    thread.join(timeout=5)
    if not thread.is_alive():
        break
    print("Thread still alive")

This is on Linux.

Solution

I think this comes from a fairly natural way of writing the Popen.communicate method on Linux. proc.communicate() reads the child's stdout pipe until it returns EOF, which normally happens when the process ends. It then waits to get the exit code of the process.

In your example, the sleep process inherits the standard output file descriptor from the bash process. Therefore, when the bash process terminates, Popen.communicate does not get EOF on the stdout pipe because sleep still has it open. The easiest way to resolve this is to change the communicate line to:

return proc.communicate(input=b"nohup sleep 100 >/dev/null&\nexit\n")

This causes your thread to finish as soon as bash terminates (because of the exit, not your proc.kill, in this case). However, whether bash ends through the exit statement or the proc.kill call, sleep is still running afterwards. If you want to kill the sleep too, I would use

os.killpg(proc.pid,15)

instead of proc.kill(). The more general problem of killing B, C, and D when they change process groups is a more complex one.
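As a rough sketch of the process-group approach, the snippet below replaces the helper thread with the timeout argument of communicate (an assumption on my part, not something the question uses) and signals the whole group when the timeout expires. It assumes Linux, /bin/bash, and that the background sleep stays in bash's process group, which is the case for a non-interactive bash.

```python
import os
import signal
import subprocess

proc = subprocess.Popen(
    ["/bin/bash"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL,
    start_new_session=True,  # bash leads a new session and process group
)

try:
    # sleep keeps bash's stdout pipe open, so this call times out.
    out, _ = proc.communicate(input=b"sleep 100 &\nexit\n", timeout=1)
    timed_out = False
except subprocess.TimeoutExpired:
    timed_out = True
    # The group id equals proc.pid because bash is the group leader.
    os.killpg(proc.pid, signal.SIGTERM)
    # Once the last writer is gone the pipe reaches EOF and this returns.
    out, _ = proc.communicate()

print("timed out:", timed_out, "exit code:", proc.returncode)
```

This does not help with children that have moved to a different process group, since os.killpg only reaches members of the group it is given.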

Additional data:
I couldn’t find any official documentation on how proc.communicate works at first, but I had forgotten the most obvious place 🙂. The docs for communicate say:

Interact with process: Send data to stdin. Read data from stdout and stderr, until end-of-file is reached. Wait for process to terminate.

You are stuck in step 2, reading until end-of-file, because sleep keeps the pipe open.
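One way to see the effect of the inherited pipe is to time communicate with and without the redirect. This is a minimal sketch, assuming Linux and /bin/bash. The helper name time_communicate is made up for illustration, and the sleep is shortened to 2 seconds so the script finishes quickly.

```python
import subprocess
import time


def time_communicate(script: bytes) -> float:
    """Feed `script` to bash and return how long communicate() blocks."""
    proc = subprocess.Popen(
        ["/bin/bash"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )
    start = time.monotonic()
    proc.communicate(input=script)
    return time.monotonic() - start


# sleep inherits bash's stdout pipe, so EOF arrives only when sleep exits.
inherited = time_communicate(b"sleep 2 &\nexit\n")

# sleep writes to /dev/null instead, so EOF arrives as soon as bash exits.
redirected = time_communicate(b"sleep 2 >/dev/null &\nexit\n")

print(f"inherited pipe: {inherited:.1f}s, redirected: {redirected:.1f}s")
```

The first call should block for roughly the lifetime of sleep, while the second should return almost immediately.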
