In a discussion about sharing memory between Python processes in a scientific computing context, several people expressed interest in a short blog post:
The GIL stops threading from being effective for CPU-bound tasks in Python, so people often fall back on multiprocessing. Unfortunately, that approach heavily restricts the ease of programming (for instance, lambdas, closures and other non-picklable objects cannot be passed between processes), and the overhead of duplicating data (e.g. through a message queue) can be prohibitive for large datasets.
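To make the pickling restriction concrete, here is a quick sketch (not from the original discussion; the function names are illustrative). Anything handed to a child process must survive `pickle`, which module-level functions do and lambdas do not:

```python
import pickle

def top_level(x):
    return x * 2

# Module-level functions pickle fine, so they can be handed to mp.Process...
pickle.dumps(top_level)

# ...but lambdas (and closures, locally defined functions, etc.) cannot,
# which is what usually bites in practice.
try:
    pickle.dumps(lambda x: x * 2)
    lambda_picklable = True
except (pickle.PicklingError, AttributeError, TypeError):
    lambda_picklable = False

print(lambda_picklable)  # False
```

This is why multiprocessing code tends to push all worker logic into named top-level functions.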
As part of my dissertation on procedural texture generation, I implemented parallelism in an extremely simple way using a shared image array in NumPy. In many cases one would need better semantics, but for write-only, idempotent, or otherwise easily controlled access patterns, this trick is pretty simple.
```python
import multiprocessing as mp
import ctypes
import numpy as np

def work(buf_m, istart, iend):
    # Some work
    import random
    buf = np.frombuffer(buf_m.get_obj(), dtype=np.float32)
    # buf is always going to be flat now, so use np.reshape if you have to
    for i in range(istart, iend):
        buf[i] = random.normalvariate(0, 2)

def dispatch(length, n_processes):
    div = float(length) / n_processes
    assert div == int(div)
    div = int(div)
    array = mp.Array(ctypes.c_float, length)  # ctypes.c_float matches np.float32
    processes = []
    istart = 0
    iend = div
    for i in range(n_processes):
        p = mp.Process(target=work, args=(array, istart, iend))
        istart += div
        iend += div
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
    print("Finished!")

if __name__ == "__main__":
    import sys
    dispatch(int(1E8), int(sys.argv[1]))
```
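As the comment in `work` notes, the buffer always comes back flat. If the work is over an image, an `np.reshape` gives a 2D view over the same shared memory, with no copy. A minimal sketch (the 100×100 shape is illustrative, not from the original code):

```python
import multiprocessing as mp
import ctypes
import numpy as np

h, w = 100, 100
shared = mp.Array(ctypes.c_float, h * w)

# frombuffer gives a flat float32 view over the shared memory;
# reshape returns a 2D view of the same buffer without copying.
img = np.frombuffer(shared.get_obj(), dtype=np.float32).reshape((h, w))

img[50, 25] = 1.0  # writes land directly in the shared buffer
print(shared[50 * w + 25])  # 1.0
```

Because `img` is a view, any process holding the same `mp.Array` sees the write.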
With timings:
```
giles@MACHINE:~$ time python test.py 1
Finished!

real    2m9.711s
user    2m8.193s
sys     0m1.458s

giles@MACHINE:~$ time python test.py 10
Finished!

real    1m33.350s
user    5m49.876s
sys     0m2.706s
```