Python – An array of shared memory strings with multiprocessing

I’m trying to multi-process some existing code and I’m finding that using Pool is too slow because of the pickling/unpickling of data into the worker processes. I think in my case a Manager would have the same issue, since it does the same pickling behind the scenes.

To get around this, I tried to move to a shared-memory array. To do so, I need an array of strings. multiprocessing.Array seems to support ctypes.c_char_p, but I’m having a hard time extending that into an array of strings. Here are some of the many things I’ve tried.

#!/usr/bin/python
import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes as mpsc
import numpy

# Tested possible solutions
ver = 1
if 1==ver:
    strings = mpsc.RawArray(ctypes.c_char_p, (' '*10, ' '*10, ' '*10, ' '*10))
elif 2==ver:
    tmp_strings = [mpsc.RawValue(ctypes.c_char_p, ' '*10) for i in xrange(4)]
    strings = mpsc.RawArray(ctypes.c_char_p, tmp_strings)
elif 3==ver:
    strings = []
    for i in xrange(4):
        strings.append( mpsc.RawValue(ctypes.c_char_p, 10) )

def worker(args):
    snum, lenarg = args
    string = '%s' % snum
    string *= lenarg
    strings[snum] = string
    return string

# Main program
data = [(i, numpy.random.randint(1,10)) for i in xrange(3)]
print 'Testing version ', ver
print
print 'Single process'
for x in map(worker, data):
    print '%10s : %s' % (x, list(strings))
print

print 'Multi-process'
pool = mp.Pool(3)
for x in pool.map(worker, data):
    print '%10s : %s' % (x, list(strings))
    print '            ', [isinstance(s, str) for s in strings]

Note that I’m using multiprocessing.sharedctypes because I don’t need locking; it should work the same with multiprocessing.Array, as the two are essentially interchangeable here.

The problem with the above code is that the resulting strings object contains regular Python strings, not shared-memory strings from the mpsc.RawArray constructor. With versions 1 and 2 you can see how the data gets scrambled when modified out-of-process (as expected). To me, version 3 initially looked like it worked, but the = is just rebinding each list element to a regular string; that holds up in short-term testing, but it creates problems in a larger program.

It seems that there should be a way to create an array of shared pointers where each pointer points to a string in shared memory space. Trying to initialize such an array with the c_char_p or c_void_p types raises errors, and I haven’t had any success manipulating the underlying address pointers directly.
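The version-3 failure mode can be seen without any multiprocessing at all. A minimal sketch (written for Python 3, so the initializer is bytes rather than the question’s Python 2 str):

```python
import ctypes
import multiprocessing.sharedctypes as mpsc

# A list holding one shared c_char_p value, as in version 3 above.
strings = [mpsc.RawValue(ctypes.c_char_p, b' ' * 10)]

# Plain assignment rebinds the list slot to an ordinary str object;
# the shared ctypes value is simply dropped, not written through.
strings[0] = '0000'
print(type(strings[0]))  # no longer a shared ctypes object
```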

Any help would be appreciated.

Solution

First, your third version only appears to work: the strings were changed by the single-process part, not by the multiprocessing part. You can check this by commenting out your single-process section.

Secondly, this one will work:

import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes as mpsc
import numpy

strings = [mpsc.RawArray(ctypes.c_char, 10) for _ in xrange(4)]

def worker(args):
    snum, lenarg = args
    string = '%s' % snum
    string *= lenarg
    strings[snum].value = string
    return string

# Main program
data = [(i, numpy.random.randint(1,10)) for i in xrange(4)]

print 'Multi-process'
print "Before: %s" % [item.value for item in strings]
pool = mp.Pool(4)
pool.map(worker, data)
print 'After : %s' % [item.value for item in strings]

Output:

Multi-process
Before: ['', '', '', '']
After : ['0000000', '111111', '222', '3333']
