Python – Use PYMC3 to get the posterior distribution of the difference between two variables

Use PYMC3 to get the posterior distribution of the difference between two variables… here is a solution to the problem.

Use PYMC3 to get the posterior distribution of the difference between two variables

Now suppose we are looking at the daily price of two stocks, A and B. The priori is simple: prices are all normally distributed, both mu_A and mu_B are evenly distributed on [10,100], and sigma_A and sigma_B are evenly distributed on [10,100] [1,10]. (I know these are some naïve/false assumptions – just to make the issue clearer.) )

Now suppose I’ve been observing these two stocks for a month and collecting price data. I can get the posterior distribution of A and B separately, but I don’t know how to get the posterior distribution of the difference between two stocks?

prices_A = [25,20,26,23,30,25]
prices_B = [45,49,52,58,45,48]
basic_model = pm. Model()
with basic_model: 
    mu_A = pm. Uniform('mu_A', lower=10, upper=100)
    sigma_A = pm. Uniform('sigma_A', lower=1, upper=10)
    mu_B = pm. Uniform('mu_B', lower=10, upper=100)
    sigma_B = pm. Uniform('sigma_B', lower=1, upper=10)
    A = pm. Normal('Y_1', mu=mu_A, sd=sigma_A, observed=prices_A)
    B = pm. Normal('Y_2', mu=mu_B, sd=sigma_B, observed=prices_B)
    dif = pm. Deterministic('dif', A-B)
map_estimate = pm.find_MAP(model=basic_model)
map_estimate

However, the result estimate did not give me the distribution of the dif… Am I confusing the concept of posterior distribution?

Solution

Subtract two variables and you can do it after sampling:

C = trace['A'] - trace['B']

Or you can use deterministic variables as part of the model:

C = pm. Deterministic('C', A - B)

Update:

Now that you’ve published your model, I’ll make the following recommendations

prices_A = [25,20,26,23,30,25]
prices_B = [45,49,52,58,45,48]
basic_model = pm. Model()
with basic_model: 
    mu_A = pm. Uniform('mu_A', lower=10, upper=100)
    sigma_A = pm. Uniform('sigma_A', lower=1, upper=10)
    mu_B = pm. Uniform('mu_B', lower=10, upper=100)
    sigma_B = pm. Uniform('sigma_B', lower=1, upper=10)
    A = pm. Normal('Y_1', mu=mu_A, sd=sigma_A, observed=prices_A)
    B = pm. Normal('Y_2', mu=mu_B, sd=sigma_B, observed=prices_B)
    dif = pm. Deterministic('dif', mu_A-mu_B) # diff of the means
    trace = pm.sample()

pm.summary(trace)

Basically my advice is not to use find_MAP(), but to sample from a posterior sample and then calculate what you want from the sample (in trace). For example, summary will give you the mean, standard deviation, and other quantities calculated from a posterior sample.

You might also want to use sample_ppc to get a “posterior prediction sample.”

ppc = pm.sample_ppc(trace, 1000, basic_model)
dif_ppc = ppc['Y_1'] - ppc['Y_2']

dif_ppc represents the difference in stocks you expect to see, including uncertainty about the mean and standard deviation of stocks.

As a side note, perhaps you want to replace your uniform distribution with others, such as the normal distribution of the mean and the seminormal distribution of Sigma.

Related Problems and Solutions