Grid on probability vectors
I’m trying to get a “grid” of n-dimensional probability vectors – vectors where each entry is between 0 and 1, and all entries sum to 1. I want the coordinates of every possible vector to take any of v evenly spaced values between 0 and 1.
To illustrate this, here is a very inefficient implementation, for n = 3 and v = 3:
from itertools import product
grid_redundant = product([0, .5, 1], repeat=3)
grid = [point for point in grid_redundant if sum(point)==1]
Now the grid contains [(0, 0, 1), (0, 0.5, 0.5), (0, 1, 0), (0.5, 0, 0.5), (0.5, 0.5, 0), (1, 0, 0)].
This “implementation” is hopeless for higher dimensions and more fine-grained meshes. Is there a good way to do this, maybe using NumPy?
Perhaps I can add a little motivation: I would be very happy if just sampling from a random distribution gave me enough extreme points, but that’s not the case. See also this question. The grid is not random, but systematically sweeps over the simplex (the space of probability vectors).
Solution
This is a recursive solution. It doesn’t use NumPy and isn’t super efficient, although it should be faster than the code snippet in the question:
import math
from itertools import permutations
def probability_grid(values, n):
    values = set(values)
    # Check if we can extend the probability distribution with zeros
    with_zero = 0. in values
    values.discard(0.)
    if not values:
        return  # raising StopIteration inside a generator is an error since PEP 479
    values = list(values)
    for p in _probability_grid_rec(values, n, [], 0.):
        if with_zero:
            # Pad with the necessary zeros
            p += (0.,) * (n - len(p))
        if len(p) == n:
            yield from set(permutations(p))  # faster: more_itertools.distinct_permutations(p)
def _probability_grid_rec(values, n, current, current_sum, eps=1e-10):
    if not values or n <= 0:
        if abs(current_sum - 1.) <= eps:
            yield tuple(current)
    else:
        value, *values = values
        # Skip this value
        yield from _probability_grid_rec(
            values, n, current, current_sum, eps)
        # Add as many copies of this value as still fit
        precision = round(-math.log10(eps))
        adds = int(round((1. - current_sum) / value, precision))
        for i in range(adds):
            current.append(value)
            current_sum += value
            n -= 1
            yield from _probability_grid_rec(
                values, n, current, current_sum, eps)
        # Remove the copies added above
        if adds > 0:
            del current[-adds:]
print(list(probability_grid([0, 0.5, 1.], 3)))
Output:
[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.5, 0.5, 0.0), (0.0, 0.5, 0.5), (0.5, 0.0, 0.5)]
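As a sanity check (not part of the original answer): for v evenly spaced values, the grid size follows the “stars and bars” count C(v + n − 2, n − 1), which a small brute-force sketch can confirm (`count_grid_points` is a name I’ve made up here):

```python
from itertools import product
from math import comb

def count_grid_points(n, v):
    # Brute-force count of n-vectors over v evenly spaced values summing to 1.
    # A tolerance is used because values like 1/3 don't sum exactly in floats.
    values = [k / (v - 1) for k in range(v)]
    return sum(1 for p in product(values, repeat=n) if abs(sum(p) - 1.) < 1e-9)

assert count_grid_points(3, 3) == comb(4, 2)  # 6 points, matching the output above
```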
Quick comparison with the method from the question:
from itertools import product
def probability_grid_basic(values, n):
    grid_redundant = product(values, repeat=n)
    return [point for point in grid_redundant if sum(point) == 1]
values = [0, 0.25, 1./3., .5, 1]
n = 6
%timeit list(probability_grid(values, n))
1.61 ms ± 20.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit probability_grid_basic(values, n)
6.27 ms ± 186 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)