# Python – Large dense matrix dot product with mixed dtypes (float x boolean)


## Large dense matrix dot product with mixed dtypes (float x boolean)

I'm multiplying two matrices, `A.dot(B)`, where:

- `A`: 1 x n matrix, dtype `float`
- `B`: n x n matrix, dtype `bool`

I'm doing this calculation for large n, and memory runs out quickly (it fails at around n = 14000). Both A and B are dense.

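It seems that NumPy converts B to a float dtype before performing the matrix multiplication, and that this conversion is what incurs the huge memory cost: a `float64` copy needs 8 bytes per element instead of the 1 byte used by `bool`, so at n = 14000 it takes about 1.6 GB instead of about 0.2 GB. In fact, `%timeit` shows that converting B to float takes longer than the multiplication itself.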
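A minimal sketch of that size difference. It assumes the converted copy is NumPy's default `float64`; `B.astype(np.float64)` only stands in for whatever temporary `A.dot(B)` creates internally, and that it makes exactly this copy is an assumption based on the observation above:

``````python
import numpy as np

# Compare the footprint of a bool matrix with a float64 copy of it.
# NumPy stores bool as 1 byte per element and float64 as 8 bytes.
n = 2000  # small illustrative size; the question's n is much larger
B = np.random.randint(0, 2, (n, n), dtype=bool)
B_float = B.astype(np.float64)  # stand-in for the converted temporary

print(B.nbytes)        # 4000000
print(B_float.nbytes)  # 32000000
``````

The ratio is 8x for every n, so the conversion alone costs 8n² bytes on top of the original matrix.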

Is there a way around this? I'm mainly interested in reducing the memory spike and the float conversion, while still being able to use common matrix operations (addition and multiplication).

Here is reproducible data for benchmarking solutions:

``````python
import numpy as np

np.random.seed(999)
n = 30000
A = np.random.random(n)
B = np.where(np.random.random((n, n)) > 0.5, True, False)
``````
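Note that at n = 30000 the last line of this snippet itself builds an n x n `float64` array (about 7.2 GB) before comparing it with 0.5. The sketch below draws the booleans directly, borrowing `np.random.randint(..., dtype=bool)` from the answer's `setup_data`. `make_data` is a hypothetical helper name, and `B` will not match the original snippet's data bit for bit because the random stream is consumed differently:

``````python
import numpy as np

def make_data(n, seed=999):
    # Hypothetical helper: same shapes and dtypes as the snippet above,
    # but B is drawn directly as bool, skipping the n x n float64 array.
    np.random.seed(seed)
    A = np.random.random(n)
    B = np.random.randint(0, 2, (n, n), dtype=bool)
    return A, B
``````

Call it as `A, B = make_data(30000)` to reproduce the benchmark sizes.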

### Solution

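You can use `np.packbits` to compress the bool array into bitfields (8 columns per `uint8` byte), and then use `np.bincount`, weighted by `A`, on each column of packed bytes to compute a block of 8 scalar products at once. `np.bincount` sums `A` over all rows sharing the same byte value, and a 256 x 8 lookup of the bits of every byte value turns those 256 sums into the 8 dot products. This saves both space and time.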
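To see what one iteration of the loop computes, here is a small sketch with exactly 8 columns, so `np.packbits` yields one byte per row. The `decode` table is the same expression the answer uses as a default argument: row `k` holds bit `k` of every byte value, most significant first, which matches `packbits`' default big-endian bit order. The sizes and seed are illustrative assumptions:

``````python
import numpy as np

rng = np.random.default_rng(0)
M = 5
A = rng.random(M)
B = rng.random((M, 8)) > 0.5  # exactly 8 columns -> one packed byte per row

packed = np.packbits(B, axis=1)[:, 0]  # shape (M,), one uint8 per row

# sums[v] = total of A[i] over all rows i whose packed byte equals v
sums = np.bincount(packed, weights=A, minlength=256)

# decode[k, v] = bit k of v, counting from the most significant bit
decode = np.array(np.unravel_index(np.arange(256), 8 * (2,)))

result = decode @ sums  # the 8 dot products A @ B[:, k]
print(np.allclose(result, A @ B))  # True
``````

`decode @ sums` is equivalent to the answer's `(decode * sums).sum(axis=1)`.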

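``````python
import types
from timeit import timeit

import numpy as np


def setup_data(M, N):
    # Random bool matrix B (M x N) and float vector A (length M)
    return {'B': np.random.randint(0, 2, (M, N), dtype=bool),
            'A': np.random.random((M,))}


def f_vecmat_mult(A, B, decode=np.array(np.unravel_index(np.arange(256), 8*(2,)))):
    # decode[k, v] is bit k (most significant first) of the byte value v
    M, N = B.shape
    # Pack 8 columns of B into one uint8 per row, then loop over the packed
    # columns: bincount sums A over rows sharing a byte value, and decode
    # turns those 256 sums into 8 dot products.
    out = [(decode * np.bincount(row, A, minlength=256)).sum(axis=1)
           for row in np.packbits(B, axis=1).T]
    if N & 7:
        # packbits zero-pads the last byte; drop the padding columns
        out[-1] = out[-1][:N & 7]
    return np.concatenate(out)


def f_direct(A, B):
    return A @ B


for M, N in [(99, 80), (999, 777), (9999, 7777), (30000, 30000)]:
    data = setup_data(M, N)
    ref = f_vecmat_mult(**data)
    print(f'M, N = {M}, {N}')
    # Benchmark every function whose name starts with 'f_'
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        try:
            assert np.allclose(ref, func(**data))
            # total seconds for 100 calls * 10 == milliseconds per call
            print("{:16s}{:16.8f} ms".format(name[2:], timeit(
                'f(**data)', globals={'f': func, 'data': data}, number=100)*10))
        except Exception:
            print("{:16s} apparently failed".format(name[2:]))
``````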
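The `if N & 7` branch handles widths that are not a multiple of 8: `np.packbits` zero-pads the last byte of each row, so the final block produces 8 results of which only `N & 7` are real. A tiny check, with an arbitrary example shape:

``````python
import numpy as np

B = np.ones((4, 10), dtype=bool)  # 10 columns: not a multiple of 8
packed = np.packbits(B, axis=1)

print(packed.shape)             # (4, 2): ceil(10 / 8) bytes per row
print(packed[0])                # [255 192]: 8 ones, then 2 ones + 6 padding zeros
print(10 & 7)                   # 2: the number of real columns in the last byte
print(B.nbytes, packed.nbytes)  # 40 8
``````

The last line also shows the space saving: packed bits take one eighth of the `bool` array and one sixty-fourth of a `float64` copy.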

Sample output (milliseconds per call, averaged over 100 calls):

``````
M, N = 99, 80
vecmat_mult           0.12248290 ms
direct                0.03647798 ms
M, N = 999, 777
vecmat_mult           1.67854790 ms
direct                5.68286091 ms
M, N = 9999, 7777
vecmat_mult          68.74523309 ms
direct              571.34140913 ms
M, N = 30000, 30000
vecmat_mult        1345.18991556 ms
direct           apparently failed
``````
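`direct` fails at 30000 x 30000, presumably because a full `float64` copy of `B` (about 7.2 GB) does not fit in memory. If you would rather keep the BLAS-backed `@`, a different and simpler technique than the bit-packing above is to convert `B` one block of rows at a time, which bounds the float temporary to `chunk_rows * N * 8` bytes. A sketch; `chunked_vecmat` and `chunk_rows` are hypothetical names and 1024 is an arbitrary default:

``````python
import numpy as np

def chunked_vecmat(A, B, chunk_rows=1024):
    # Hypothetical helper: computes A @ B, but converts only chunk_rows
    # rows of the bool matrix B to float64 at a time.
    out = np.zeros(B.shape[1], dtype=np.float64)
    for start in range(0, B.shape[0], chunk_rows):
        stop = start + chunk_rows
        # Partial product of this row block; slicing past the end is safe.
        out += A[start:stop] @ B[start:stop].astype(np.float64)
    return out

# Usage: agrees with the direct product on a small example.
rng = np.random.default_rng(1)
A = rng.random(2500)
B = rng.random((2500, 300)) > 0.5
print(np.allclose(chunked_vecmat(A, B), A @ B))  # True
``````

This avoids the full conversion but still does the float work for every element, so it addresses the memory spike rather than the conversion time.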