## 3D distance vectorization

I need help vectorizing this code. Now, N=100, it takes a minute or so to run. I want to speed things up. I’ve done something like this for double loops, but never used 3D loops, and I’m having a hard time.

```
import numpy as np
N = 100
n = 12
r = np.sqrt(2)
x = np.arange(-N,N+1)
y = np.arange(-N,N+1)
z = np.arange(-N,N+1)
C = 0
for i in x:
for j in y:
for k in z:
if (i+j+k)%2==0 and (i*i+j*j+k*k!=0):
p = np.sqrt(i*i+j*j+k*k)
p = p/r
q = (1/p)**n
C += q
print '\n'
print C
```

### Solution

The meshgrid/where/indexing solution is already very fast. I made it about 65% faster. It’s not excessive, but I’ll explain it step by step :

I’m easiest to solve this problem because all the 3D vectors in the grid are columns in a large 2D `3 x M`

array. The meshgrid is the right tool to create all combinations (note that numpy version >= 1.7 is required for 3D `meshgrid`

), `vstack`

+ `reshape`

to transform the data into the desired form. Example:

```
>>> np.vstack(np.meshgrid(*[np.arange(0, 2)]*3)).reshape(3,-1)
array([[0, 0, 1, 1, 0, 0, 1, 1],
[0, 0, 0, 0, 1, 1, 1, 1],
[0, 1, 0, 1, 0, 1, 0, 1]])
```

Each column is a 3D vector. Each of these eight vectors represents a corner of a `1x1x1`

cube (a 3D mesh with a length of 1 in step 1 and length 1 in all dimensions).

Let’s call this array `vectors`

(which contains all the 3D vectors that represent points in the mesh). Then, prepare a bool mask to select those vectors that meet your mod2 criteria:

```
mod2bool = np.sum(vectors, axis=0) % 2 == 0
```

`np.sum(vectors, axis=0)`

creates a `1 x M`

array containing the sum of elements for each column vector. Thus, `mod2bool`

is a `1 x M`

array with a bool value for each column vector. Now use this bool mask:

```
vectorsubset = vectors[:,mod2bool]
```

This selects all rows (:) And use bool indexes to filter columns, both are quick operations in numpy. Calculate the length of the remaining vector using the native numpy method:

```
lengths = np.sqrt(np.sum(vectorsubset**2, axis=0))
```

This is fairly fast — however, `scipy.stats.ss`

and `bottleneck.ss`

can perform sum-squared calculations faster than that.

Convert the length using your instructions:

```
with np.errstate(divide='ignore'):
p = (r/lengths)**n
```

This involves dividing a finite number by zero, resulting in an `INF`

in the output array. That’s totally fine. We use numpy’s `errstate`

context manager to ensure that these zero divisions do not throw exceptions or runtime warnings.

Now sum the finite elements (ignoring infs) and return the sum:

```
return np.sum(p[np.isfinite(p)])
```

I’ve implemented this method twice below. Once exactly as just explained, once involving the `ss`

and `nansum`

functions of Bottleneck. I’ve also added your comparison method, as well as skipping the modified version of the `np.where((x*x+y*y+z*z)!=0)`

index, but instead creating `an inf`

and finally summarizing `the isfinite`

way.

```
import sys
import numpy as np
import bottleneck as bn
N = 100
n = 12
r = np.sqrt(2)
x,y,z = np.meshgrid(*[np.arange(-N, N+1)]*3)
gridvectors = np.vstack((x,y,z)).reshape(3, -1)
def measure_time(func):
import time
def modified_func(*args, **kwargs):
t0 = time.time()
result = func(*args, **kwargs)
duration = time.time() - t0
print("%s duration: %.3f s" % (func.__name__, duration))
return result
return modified_func
@measure_time
def method_columnvecs(vectors):
mod2bool = np.sum(vectors, axis=0) % 2 == 0
vectorsubset = vectors[:,mod2bool]
lengths = np.sqrt(np.sum(vectorsubset**2, axis=0))
with np.errstate(divide='ignore'):
p = (r/lengths)**n
return np.sum(p[np.isfinite(p)])
@measure_time
def method_columnvecs_opt(vectors):
# On my system, bn.nansum is even slightly faster than np.sum.
mod2bool = bn.nansum(vectors, axis=0) % 2 == 0
# Use ss from bottleneck or scipy.stats (axis=0 is default).
lengths = np.sqrt(bn.ss(vectors[:,mod2bool]))
with np.errstate(divide='ignore'):
p = (r/lengths)**n
return bn.nansum(p[np.isfinite(p)])
@measure_time
def method_original(x,y,z):
ind = np.where((x+y+z)%2==0)
x = x[ind]
y = y[ind]
z = z[ind]
ind = np.where((x*x+y*y+z*z)!=0)
x = x[ind]
y = y[ind]
z = z[ind]
p=np.sqrt(x*x+y*y+z*z)/r
return np.sum((1/p)**n)
@measure_time
def method_original_finitesum(x,y,z):
ind = np.where((x+y+z)%2==0)
x = x[ind]
y = y[ind]
z = z[ind]
lengths = np.sqrt(x*x+y*y+z*z)
with np.errstate(divide='ignore'):
p = (r/lengths)**n
return np.sum(p[np.isfinite(p)])
print method_columnvecs(gridvectors)
print method_columnvecs_opt(gridvectors)
print method_original(x,y,z)
print method_original_finitesum(x,y,z)
```

Here is the output:

```
$ python test.py
method_columnvecs duration: 1.295 s
12.1318801965
method_columnvecs_opt duration: 1.162 s
12.1318801965
method_original duration: 1.936 s
12.1318801965
method_original_finitesum duration: 1.714 s
12.1318801965
```

All methods produce the same result. When performing `isfinite`

style summation, your method becomes a little faster. My method is faster, but I would say that this is an exercise of an academic nature, not an important improvement 🙂

I have another question: you mean that for N=3, the calculation should be 12. Even yours won’t do it. For N=3, all of the above methods generate 12.1317530867. Is this expected?