Python – Why can Python’s Event.wait() be signaled on some systems but not on others?


Consider the following Python script:

from datetime import datetime
from threading import Event

event = Event()
start = datetime.now()

try:
    event.wait(5)
except KeyboardInterrupt:
    print("caught Ctrl+C after %s" % (datetime.now() - start))

When I run it on Debian (specifically Docker’s python:3.6.5-stretch) and quickly press Ctrl+C, the wait is interrupted immediately:

# python mptest.py
^Ccaught Ctrl+C after 0:00:00.684854
# 

But when I run it on Alpine (specifically Docker’s python:3.6.5-alpine3.7) and quickly press Ctrl+C, it lets the whole wait finish:

/ # python mptest.py 
^Ccaught Ctrl+C after 0:00:05.000314
/ # 

What is the reason for this discrepancy? Is one of the systems incorrect?

Solution

Short version:

Python assumes that sem_timedwait fails with EINTR when a signal arrives during the wait. glibc (Debian’s libc) does this, but POSIX makes that behavior optional, and musl (Alpine’s libc) does not do it.

Long version:

Python’s Event is built around Condition internally, which is itself built around Lock. Because of this, the following program exhibits the same behavior using Lock alone:

from datetime import datetime
from threading import Lock

lock = Lock()
lock.acquire()
start = datetime.now()

try:
    lock.acquire(True, 5)
except KeyboardInterrupt:
    print("caught Ctrl+C after %s" % (datetime.now() - start))
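
The layering described above can be sketched roughly as follows. This is a simplified stand-in, not CPython's actual source: the class name SketchEvent is hypothetical, and it leaves out clear(), is_set() and the other details of the real threading.Event. It only shows why a timed Event.wait() ends up in a timed lock acquire.

```python
from threading import Condition, Lock


class SketchEvent:
    """Simplified illustration of an event layered on Condition and Lock."""

    def __init__(self):
        # The condition variable is itself backed by a plain Lock.
        self._cond = Condition(Lock())
        self._flag = False

    def set(self):
        with self._cond:
            self._flag = True
            self._cond.notify_all()

    def wait(self, timeout=None):
        with self._cond:
            if not self._flag:
                # Condition.wait() blocks by acquiring an internal waiter
                # lock with a timeout, so this is where a timed lock
                # acquire (and therefore the libc wait) happens.
                self._cond.wait(timeout)
            return self._flag
```

Calling SketchEvent().wait(5) therefore blocks in the same kind of timed lock acquire as the Lock-only program above.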

From Python’s documentation:

Lock acquires can now be interrupted by signals on POSIX.

Assuming this documentation is correct, this means that the behavior is correct on Debian and incorrect on Alpine.

Python’s acquire is built around sem_timedwait (assuming it exists, which it does on both Debian and Alpine; if it doesn’t exist, acquire is built around pthread_cond_timedwait instead).

The following C program demonstrates the inconsistencies of sem_timedwait when built on each system:

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <semaphore.h>
#include <time.h>
#include <errno.h>
#include <signal.h>

void handler(int sig) {
    puts("in signal handler");
}

int main() {
    /* Install a SIGALRM handler without SA_RESTART. */
    struct sigaction sa;
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGALRM, &sa, NULL);

    /* Deliver SIGALRM after 1 second. */
    alarm(1);

    /* Wait on the semaphore for up to 2 seconds. */
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += 2;

    sem_t sem;
    sem_init(&sem, 0, 0);

    /* errno is only meaningful when the call has failed. */
    if (sem_timedwait(&sem, &ts) == -1) {
        if (errno == EINTR) {
            puts("Got interrupted by signal");
        } else if (errno == ETIMEDOUT) {
            puts("Timed out");
        }
    }
    return 0;
}

On Debian, it exits after 1 second with “Got interrupted by signal”. On Alpine, it exits after 2 seconds with “Timed out”.
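
A similar probe can be sketched from the Python side. This is only an illustration under stated assumptions: the function name measure_handler_delay and its parameters are hypothetical, it must run in the main thread on a POSIX system, and it uses shorter delays than the C program. Instead of looking at errno, it records when the Python-level signal handler actually runs relative to the start of a timed Lock.acquire().

```python
import signal
import time
from threading import Lock


def measure_handler_delay(alarm_after=0.2, timeout=0.5):
    """Return how long after the start of a timed acquire the SIGALRM
    handler ran, or None if it did not run before this function returned."""
    lock = Lock()
    lock.acquire()  # hold the lock so the timed acquire below must block
    handled_at = []

    def on_alarm(signum, frame):
        handled_at.append(time.monotonic())

    previous = signal.signal(signal.SIGALRM, on_alarm)
    start = time.monotonic()
    signal.setitimer(signal.ITIMER_REAL, alarm_after)
    try:
        lock.acquire(True, timeout)
    finally:
        # Cancel the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)
    return handled_at[0] - start if handled_at else None


if __name__ == "__main__":
    print("handler ran after", measure_handler_delay(), "seconds")
```

Given the behavior described above, one would expect a value close to alarm_after where the wait is interrupted (glibc) and a value close to timeout where it is not (musl).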

sem_timedwait is a libc function specified by POSIX. In particular, POSIX says that it “may” fail with EINTR, not that it “shall”. This means that neither glibc (Debian’s) nor musl (Alpine’s) is incorrect.

For historical reasons related to bugs in old kernels, musl made the conscious decision not to support EINTR where it doesn’t have to.

In my opinion, the fault here lies with Python for relying on an optional feature of POSIX. As it turns out, Python has been bitten by a similar issue before, in the case where pthread_cond_timedwait was used because semaphores were unavailable. Additionally, this issue causes one of Python’s own self-tests to fail when built against musl. I opened Python bug #34004 about this.
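
One possible workaround, which is not part of the analysis above, is to split a long wait into short slices. The example on Alpine shows that a pending KeyboardInterrupt is raised once the blocking call returns, so a loop of short waits should react to Ctrl+C within roughly one slice even where the underlying wait is not interrupted. The helper name wait_interruptibly and the poll_interval parameter are hypothetical; this is a minimal sketch rather than a recommended fix.

```python
import time
from threading import Event


def wait_interruptibly(event, timeout, poll_interval=0.1):
    """Wait up to `timeout` seconds for `event`, in short slices.

    Returns True if the event was set, False if the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return event.is_set()
        # Each slice returns quickly, giving the interpreter a chance to
        # run pending signal handlers between slices.
        if event.wait(min(poll_interval, remaining)):
            return True
```

In the first script, event.wait(5) would become wait_interruptibly(event, 5). The trade-off is a small amount of periodic wake-up work in exchange for bounded Ctrl+C latency.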
