Python – Grid on probability vectors

Grid on probability vectors… here is a solution to the problem.

Grid on probability vectors

I’m trying to get a “grid” of n-dimensional probability vectors – where each entry is between 0 and 1, and all entries add up to a vector of 1. I want the coordinates in every possible vector to take any of v evenly distributed values between 0 and 1.

To illustrate this, here is a very inefficient implementation, for n = 3 and v = 3:

from itertools import product
grid_redundant = product([0, .5, 1], repeat=3)
grid = [point for point in grid_redundant if sum(point)==1]

Now the grid contains [(0, 0, 1), (0, 0.5, 0.5), (0, 1, 0), (0.5, 0, 0.5), ( 0.5,

0.5, 0), (1, 0, 0)]

This “implementation” is very bad for higher dimensional and more fine-grained meshes. Is there a good way to do this, maybe by using numpy?


Perhaps I can add a little motivation: I would be very happy if just sampling from a random distribution gave me enough extreme points, but that’s not the case. See also this question. The grid “is not random, but systematically sweeps over the space of simplex (probability vectors. )

Solution

This is a recursive solution. It doesn’t use NumPy and isn’t super efficient, although it should be faster than the published code snippet:

import math
from itertools import permutations

def probability_grid(values, n):
    values = set(values)
    # Check if we can extend the probability distribution with zeros
    with_zero = 0. in values
    values.discard(0.)
    if not values:
        raise StopIteration
    values = list(values)
    for p in _probability_grid_rec(values, n, [], 0.):
        if with_zero:
            # Add necessary zeros
            p += (0.,) * (n - len(p))
        if len(p) == n:
            yield from set(permutations(p))  # faster: more_itertools.distinct_permutations(p)

def _probability_grid_rec(values, n, current, current_sum, eps=1e-10):
    if not values or n <= 0:
        if abs(current_sum - 1.) <= eps:
            yield tuple(current)
    else:
        value, *values = values
        inv = 1. / value
        # Skip this value
        yield from _probability_grid_rec(
            values, n, current, current_sum, eps)
        # Add copies of this value
        precision = round(-math.log10(eps))
        adds = int(round((1. - current_sum) / value, precision))
        for i in range(adds):
            current.append(value)
            current_sum += value
            n -= 1
            yield from _probability_grid_rec(
                values, n, current, current_sum, eps)
        # Remove copies of this value
        if adds > 0:
            del current[-adds:]

print(list(probability_grid([0, 0.5, 1.], 3)))

Output:

[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.5, 0.5, 0.0), (0.0, 0.5, 0.5), (0.5, 0.0, 0.5)]

Quick comparison with published methods:

from itertools import product

def probability_grid_basic(values, n):
    grid_redundant = product(values, repeat=n)
    return [point for point in grid_redundant if sum(point)==1]

values = [0, 0.25, 1./3., .5, 1]
n = 6
%timeit list(probability_grid(values, n))
1.61 ms ± 20.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit probability_grid_basic(values, n)
6.27 ms ± 186 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Related Problems and Solutions