In a discussion about sharing memory between Python processes in a scientific computing context, several people expressed interest in a short blog post:
The GIL stops threading from being effective for CPU-bound tasks in Python, so people often fall back on multiprocessing. Unfortunately, that approach heavily restricts the ease of programming (for instance, lambdas, closures and other non-picklable objects cannot be passed between processes), and the overhead of duplicating data (e.g. through a message queue) can be prohibitive for large datasets.
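To make the pickling restriction concrete, here is a quick sketch (not from the original discussion; the function names are illustrative). Anything handed to a child process must survive `pickle`, which module-level functions do and lambdas do not:

```python
import pickle

def top_level(x):
    return x * 2

# Module-level functions pickle fine, so they can be handed to mp.Process...
pickle.dumps(top_level)

# ...but lambdas (and closures, locally defined functions, etc.) cannot,
# which is what usually bites in practice.
try:
    pickle.dumps(lambda x: x * 2)
    lambda_picklable = True
except (pickle.PicklingError, AttributeError, TypeError):
    lambda_picklable = False

print(lambda_picklable)  # False
```

This is why multiprocessing code tends to push all worker logic into named top-level functions.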
As part of my dissertation on procedural texture generation, I implemented parallelism in an extremely simple way using a shared image array in NumPy. In many cases one would need better semantics, but for write-only, idempotent, or otherwise easily controlled access patterns, this trick is pretty simple.
```python
import multiprocessing as mp
import ctypes
import numpy as np

def work(buf_m, istart, iend):
    # Some work
    import random
    buf = np.frombuffer(buf_m.get_obj(), dtype=np.float32)
    # buf is always going to be flat now, so use np.reshape if you have to
    for i in range(istart, iend):
        buf[i] = random.normalvariate(0, 2)

def dispatch(length, n_processes):
    div = float(length) / n_processes
    assert div == int(div)
    div = int(div)
    array = mp.Array(ctypes.c_float, length)  # ctypes.c_float matches np.float32
    processes = []
    istart = 0
    iend = div
    for i in range(n_processes):
        p = mp.Process(target=work, args=(array, istart, iend))
        istart += div
        iend += div
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
    print("Finished!")

if __name__ == "__main__":
    import sys
    dispatch(int(1E8), int(sys.argv[1]))
```
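As the comment in `work` notes, the buffer always comes back flat. If the work is over an image, an `np.reshape` gives a 2D view over the same shared memory, with no copy. A minimal sketch (the 100×100 shape is illustrative, not from the original code):

```python
import multiprocessing as mp
import ctypes
import numpy as np

h, w = 100, 100
shared = mp.Array(ctypes.c_float, h * w)

# frombuffer gives a flat float32 view over the shared memory;
# reshape returns a 2D view of the same buffer without copying.
img = np.frombuffer(shared.get_obj(), dtype=np.float32).reshape((h, w))

img[50, 25] = 1.0  # writes land directly in the shared buffer
print(shared[50 * w + 25])  # 1.0
```

Because `img` is a view, any process holding the same `mp.Array` sees the write.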
With timings:
```
giles@MACHINE:~$ time python test.py 1
Finished!

real    2m9.711s
user    2m8.193s
sys     0m1.458s

giles@MACHINE:~$ time python test.py 10
Finished!

real    1m33.350s
user    5m49.876s
sys     0m2.706s
```