When do I need Queue.join()?
The Python 3 documentation shows an example of a worker thread using queues ( https://docs.python.org/3/library/queue.html):
def worker():
while True:
item = q.get()
if item is None:
break
do_work(item)
q.task_done()
q = queue. Queue()
threads = []
for i in range(num_worker_threads):
t = threading. Thread(target=worker)
t.start()
threads.append(t)
for item in source():
q.put(item)
# block until all tasks are done
q.join()
# stop workers
for i in range(num_worker_threads):
q.put(None)
for t in threads:
t.join()
In this example, why do you need q.join()
? Do subsequent q.put(None) and t.join()
operations complete the same operation that blocks the main thread until the worker thread completes?
Solution
That’s how I understand this example.
Each worker loops indefinitely, always looking for something new from the queue. If the item it gets is None
, it breaks and returns control to main.
So, first we leave the program waiting queue empty. Each call to q.task_done()
marks a new project as completed. The code is stuck below, so we make sure each item is marked as complete.
# block until all tasks are done
q.join()
Then, below, we add None
items with the same number of workers to the queue (so we make sure that each worker gets one.) )
for i in range(num_worker_threads):
q.put(None)
Next, we join all threads. Because we assign a None
entry to each worker through the queue, they all break. We want to get stuck here before they all crash and regain control.
for t in threads:
t.join()
In doing so, we help avoid orphaned processes by ensuring that every item in the queue is processed, interrupting every worker thread when the queue is empty, and shutting down each worker thread before we continue with our code.