Use PYMC3 to get the posterior distribution of the difference between two variables
Now suppose we are looking at the daily price of two stocks, A and B. The priori is simple: prices are all normally distributed, both mu_A and mu_B are evenly distributed on [10,100], and sigma_A and sigma_B are evenly distributed on [10,100] [1,10]. (I know these are some naïve/false assumptions – just to make the issue clearer.) )
Now suppose I’ve been observing these two stocks for a month and collecting price data. I can get the posterior distribution of A and B separately, but I don’t know how to get the posterior distribution of the difference between two stocks?
prices_A = [25,20,26,23,30,25]
prices_B = [45,49,52,58,45,48]
basic_model = pm. Model()
with basic_model:
mu_A = pm. Uniform('mu_A', lower=10, upper=100)
sigma_A = pm. Uniform('sigma_A', lower=1, upper=10)
mu_B = pm. Uniform('mu_B', lower=10, upper=100)
sigma_B = pm. Uniform('sigma_B', lower=1, upper=10)
A = pm. Normal('Y_1', mu=mu_A, sd=sigma_A, observed=prices_A)
B = pm. Normal('Y_2', mu=mu_B, sd=sigma_B, observed=prices_B)
dif = pm. Deterministic('dif', A-B)
map_estimate = pm.find_MAP(model=basic_model)
map_estimate
However, the result estimate did not give me the distribution of the dif… Am I confusing the concept of posterior distribution?
Solution
Subtract two variables and you can do it after sampling:
C = trace['A'] - trace['B']
Or you can use deterministic variables as part of the model:
C = pm. Deterministic('C', A - B)
Update:
Now that you’ve published your model, I’ll make the following recommendations
prices_A = [25,20,26,23,30,25]
prices_B = [45,49,52,58,45,48]
basic_model = pm. Model()
with basic_model:
mu_A = pm. Uniform('mu_A', lower=10, upper=100)
sigma_A = pm. Uniform('sigma_A', lower=1, upper=10)
mu_B = pm. Uniform('mu_B', lower=10, upper=100)
sigma_B = pm. Uniform('sigma_B', lower=1, upper=10)
A = pm. Normal('Y_1', mu=mu_A, sd=sigma_A, observed=prices_A)
B = pm. Normal('Y_2', mu=mu_B, sd=sigma_B, observed=prices_B)
dif = pm. Deterministic('dif', mu_A-mu_B) # diff of the means
trace = pm.sample()
pm.summary(trace)
Basically my advice is not to use find_MAP(),
but to sample from a posterior sample and then calculate what you want from the sample (in trace
). For example, summary
will give you the mean, standard deviation, and other quantities calculated from a posterior sample.
You might also want to use sample_ppc
to get a “posterior prediction sample.”
ppc = pm.sample_ppc(trace, 1000, basic_model)
dif_ppc = ppc['Y_1'] - ppc['Y_2']
dif_ppc
represents the difference in stocks you expect to see, including uncertainty about the mean and standard deviation of stocks.
As a side note, perhaps you want to replace your uniform distribution with others, such as the normal distribution of the mean and the seminormal distribution of Sigma.