Python – How to efficiently calculate sparse values of matrix product memory in Python?

How to efficiently calculate sparse values of matrix product memory in Python?… here is a solution to the problem.

How to efficiently calculate sparse values of matrix product memory in Python?

I want to calculate some specific values of the matrix product in a way that is efficient in terms of memory usage and calculation time. The problem is that the intermediate matrix has two very large dimensions and may not be able to be stored.

Dimension with example value:

N = 7  # very large
K = 3
M = 10 # very large
L = 8  # very very large

‘a’ is a matrix
of shape (N,K).
‘b’ is a matrix of shape (K,N).

a = np.arange(N*K).reshape(N,K)
b = np.arange(K*M).reshape(K,M)

rows is an indexed array whose values are in the range (N) and length L
cols is an indexed array whose values are within the range (M) and length L

rows = [0,0,1,2,3,3,4,6]
cols = [0,9,5,8,2,8,3,6]

I

need the following, but due to its size, I can’t calculate a matrix of shape (MxN) (a @ b) as an intermediate result:

values = (a @ b)[rows, cols]

Alternative implementations may involve:
slice a[rows] and b[:,cols], creating matrices of shapes (L,K) and (K,L),
But those are also too big.
Numpy copies values while doing fancy slicing

values = np.einsum("ij,ji->i", a[rows], b[:,cols])

Thanks in advance

Solution

One possibility is to calculate the results directly. Maybe there are some other tricks to use BLAS routines without huge temporary arrays, but that also works.

Example

import numpy as np
import numba as nb
import time

@nb.njit(fastmath=True,parallel=True)
def sparse_mult(a,b_Trans,inds):
  res=np.empty(inds.shape[0],dtype=a.dtype)

for x in nb.prange(inds.shape[0]):
    i=inds[x,0]
    j=inds[x,1]
    sum=0.
    for k in range(a.shape[1]):
      sum+=a[i,k]*b_Trans[j,k]
    res[x]=sum
  return res

#-------------------------------------------------
K=int(1e3)
N=int(1e5)
M=int(1e5)
L=int(1e7)

a = np.arange(N*K).reshape(N,K).astype(np.float64)
b = np.arange(K*M).reshape(K,M).astype(np.float64)

inds=np.empty((L,2),dtype=np.uint64)
inds[:,0] = np.random.randint(low=0,high=N,size=L) #rows
inds[:,1] = np.random.randint(low=0,high=M,size=L) #cols

#prepare
#-----------------------------------------------
#sort inds for better cache usage
inds=inds[np.argsort(inds[:,1]),:]

# transpose b for easy SIMD-usage
# we wan't a real transpose here not a view
b_T=np.copy(np.transpose(b))

#calculate results
values=sparse_mult(a,b_T,inds)

The calculation steps, including preparation (sorting, B-matrix transposing), should run within 60 seconds.

Related Problems and Solutions