An array of shared memory strings with multiprocessing
I’m trying to multi-process some existing code and I’m finding that using Pool to pickle/unpickle data into the processes is too slow. I think a Manager would have the same issue for my case, because it does the same pickling behind the scenes.
To solve this, I tried moving to a shared-memory array. For that I need an array of strings. multiprocessing.Array seems to support ctypes.c_char_p, but I’m having a hard time extending that into an array of strings. Here are some of the many things I’ve tried.
#!/usr/bin/python
import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes as mpsc
import numpy

# Tested possible solutions
ver = 1
if 1 == ver:
    strings = mpsc.RawArray(ctypes.c_char_p, (' '*10, ' '*10, ' '*10, ' '*10))
elif 2 == ver:
    tmp_strings = [mpsc.RawValue(ctypes.c_char_p, ' '*10) for i in xrange(4)]
    strings = mpsc.RawArray(ctypes.c_char_p, tmp_strings)
elif 3 == ver:
    strings = []
    for i in xrange(4):
        strings.append(mpsc.RawValue(ctypes.c_char_p, 10))

def worker(args):
    snum, lenarg = args
    string = '%s' % snum
    string *= lenarg
    strings[snum] = string
    return string

# Main program
data = [(i, numpy.random.randint(1, 10)) for i in xrange(3)]
print 'Testing version ', ver
print

print 'Single process'
for x in map(worker, data):
    print '%10s : %s' % (x, list(strings))
print

print 'Multi-process'
pool = mp.Pool(3)
for x in pool.map(worker, data):
    print '%10s : %s' % (x, list(strings))
print '  ', [isinstance(s, str) for s in strings]
Note that I’m using multiprocessing.sharedctypes because I don’t need locking; it should be completely interchangeable with multiprocessing.Array.
The problem with the above code is that the resulting strings object ends up holding regular Python strings rather than shared-memory strings from the mpsc.RawArray constructor. With versions 1 and 2 you can see how the data gets scrambled when working out-of-process (as expected). To me, version 3 initially looks like it works, but the assignment simply rebinds the element to a regular string, which works for short-lived testing but creates problems in a larger program.
It seems there should be a way to create an array of shared pointers where each pointer points to a string in shared-memory space. The c_void_p type complains if you try to initialize it with c_char_p values, and I haven’t succeeded in manipulating the underlying address pointers directly.
Any help would be appreciated.
Solution
First, your third solution doesn’t actually work: strings isn’t being changed by the multiprocessing part at all; it’s being modified by the single-process part. You can check this by commenting out your single-process section.
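To make the failure mode concrete, here is a minimal single-process sketch (Python 3 syntax, names are illustrative) contrasting rebinding a list slot, which is what version 3 does, with writing through .value, which is what actually touches the shared buffer:

```python
import ctypes
import multiprocessing.sharedctypes as mpsc

# Two fixed-size shared char buffers
strings = [mpsc.RawArray(ctypes.c_char, 10) for _ in range(2)]

shared = strings[0]
strings[0] = b"plain"            # rebinds the list slot to a regular bytes object
assert shared.value == b""       # the shared buffer itself was never touched

strings[1].value = b"written"    # writes into the shared buffer
assert strings[1].value == b"written"
```

In a child process, the rebinding happens in the child's copy of the list and is simply lost; only the .value write lands somewhere the parent can see.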
Secondly, this one will work:
import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes as mpsc
import numpy

strings = [mpsc.RawArray(ctypes.c_char, 10) for _ in xrange(4)]

def worker(args):
    snum, lenarg = args
    string = '%s' % snum
    string *= lenarg
    strings[snum].value = string
    return string

# Main program
data = [(i, numpy.random.randint(1, 10)) for i in xrange(4)]
print 'Multi-process'
print "Before: %s" % [item.value for item in strings]
pool = mp.Pool(4)
pool.map(worker, data)
print 'After : %s' % [item.value for item in strings]
Output:
Multi-process
Before: ['', '', '', '']
After : ['0000000', '111111', '222', '3333']
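For readers on Python 3, the same approach carries over with two adjustments: the c_char buffers hold bytes rather than str, and it's safest to request the fork start method explicitly so the children inherit the shared buffers (a sketch under that assumption; spawn-based platforms such as Windows would need the arrays handed to the workers via the Pool initializer instead):

```python
import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes as mpsc

# Four shared 10-byte buffers; c_char arrays expose bytes in Python 3
strings = [mpsc.RawArray(ctypes.c_char, 10) for _ in range(4)]

def worker(args):
    snum, lenarg = args
    # Write into the shared buffer; the value must be bytes
    strings[snum].value = (str(snum) * lenarg).encode()

ctx = mp.get_context('fork')  # forked children see the same buffers
with ctx.Pool(4) as pool:
    pool.map(worker, [(i, i + 1) for i in range(4)])

print([item.value for item in strings])  # [b'0', b'11', b'222', b'3333']
```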