Python – NumPy two-dimensional boolean array: count the sizes of contiguous True patches

I’m interested in finding the individual sizes of the contiguous “True” patches in a boolean array. For example, given the boolean matrix:

[[1, 0, 0, 0],
 [0, 1, 1, 0],
 [0, 1, 0, 0],
 [0, 1, 0, 0]]

The desired output is:

[[1, 0, 0, 0],
 [0, 4, 4, 0],
 [0, 4, 0, 0],
 [0, 4, 0, 0]]

I know I can do this recursively, but I suspect that element-by-element Python array operations are expensive at scale. Are there library functions available for this?

Solution

Here is a quick and complete solution:

import numpy as np
# scipy.ndimage.measurements is deprecated; the same functions live in scipy.ndimage
import scipy.ndimage as mnts

A = np.array([
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0]
])

# labeled is a version of A with labeled clusters:
#
# [[1 0 0 0]
#  [0 2 2 0]
#  [0 2 0 0]
#  [0 2 0 0]]
#
# clusters holds the number of different clusters: 2
labeled, clusters = mnts.label(A)

# sizes is an array of cluster sizes: [0, 1, 4]
sizes = mnts.sum(A, labeled, index=range(clusters + 1))

# mnts.sum always outputs a float array, so we'll convert sizes to int
sizes = sizes.astype(int)

# get an array with the same shape as labeled and the 
# appropriate values from sizes by indexing one array 
# with the other. See the `numpy` indexing docs for details
labeledBySize = sizes[labeled]

print(labeledBySize)

Output:

[[1 0 0 0]
 [0 4 4 0]
 [0 4 0 0]
 [0 4 0 0]]

The trickiest line above is the “peculiar” NumPy indexing:

labeledBySize = sizes[labeled]

One of the arrays is used to index the other. See the NumPy indexing docs (section “Index arrays”) for more information about how it works.
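As a small illustration of that indexing step on its own, here is a sketch that hard-codes the `sizes` and `labeled` values from the example above instead of computing them, so only the lookup is shown:

```python
import numpy as np

# sizes[k] is the size of the cluster labeled k (label 0 is the background)
sizes = np.array([0, 1, 4])

# labeled holds one cluster label per cell, as produced by the labeling step
labeled = np.array([
    [1, 0, 0, 0],
    [0, 2, 2, 0],
    [0, 2, 0, 0],
    [0, 2, 0, 0],
])

# Indexing sizes with labeled looks up sizes[label] for every cell, so the
# result has the shape of labeled and takes its values from sizes
labeledBySize = sizes[labeled]
print(labeledBySize)
```

Every cell with label 2 is replaced by `sizes[2]`, every cell with label 1 by `sizes[1]`, and the background stays 0 because `sizes[0]` is 0.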

I also wrote a version of the above code as a single compact function that you can try out yourself online. It includes a test case based on a random array.
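The linked function is not reproduced here. As a rough sketch of what a compact version could look like, the following wraps the same steps in one function. The name `label_by_size` and the optional `structure` argument are assumptions made for illustration, and `np.bincount` is swapped in for the `sum` call above because it counts the cells per label directly as integers.

```python
import numpy as np
from scipy import ndimage


def label_by_size(mask, structure=None):
    """Replace each True cell with the size of the patch it belongs to.

    Hypothetical helper, not the original author's function.
    """
    mask = np.asarray(mask, dtype=bool)

    # Label the connected patches; structure=None keeps SciPy's default
    # connectivity, in which diagonal neighbours do not count as connected
    labeled, _ = ndimage.label(mask, structure=structure)

    # Count the cells carrying each label; index 0 is the background
    sizes = np.bincount(labeled.ravel())
    sizes[0] = 0

    # Map every label back to the size of its patch
    return sizes[labeled]


# Test case on a random boolean array (seeded so the run is repeatable)
rng = np.random.default_rng(0)
random_mask = rng.random((6, 6)) > 0.5
result = label_by_size(random_mask)
print(random_mask.astype(int))
print(result)
```

With the default connectivity, the top-left cell of the example matrix counts as its own patch of size 1. Passing `structure=np.ones((3, 3), dtype=int)` also treats diagonal neighbours as connected, so all five cells would form a single patch of size 5.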
