Python - Use PYMC3 to get the posterior distribution of the difference between two variables

Use PYMC3 to get the posterior distribution of the difference between two variables… here is a solution to the problem.

Use PYMC3 to get the posterior distribution of the difference between two variables

Now suppose we are looking at the daily price of two stocks, A and B. The priori is simple: prices are all normally distributed, both mu_A and mu_B are evenly distributed on [10,100], and sigma_A and sigma_B are evenly distributed on [10,100] [1,10]. (I know these are some naïve/false assumptions – just to make the issue clearer.) )

Now suppose I’ve been observing these two stocks for a month and collecting price data. I can get the posterior distribution of A and B separately, but I don’t know how to get the posterior distribution of the difference between two stocks?

prices_A = [25,20,26,23,30,25]
prices_B = [45,49,52,58,45,48]
basic_model = pm. Model()
with basic_model: 
    mu_A = pm. Uniform('mu_A', lower=10, upper=100)
    sigma_A = pm. Uniform('sigma_A', lower=1, upper=10)
    mu_B = pm. Uniform('mu_B', lower=10, upper=100)
    sigma_B = pm. Uniform('sigma_B', lower=1, upper=10)
    A = pm. Normal('Y_1', mu=mu_A, sd=sigma_A, observed=prices_A)
    B = pm. Normal('Y_2', mu=mu_B, sd=sigma_B, observed=prices_B)
    dif = pm. Deterministic('dif', A-B)
map_estimate = pm.find_MAP(model=basic_model)
map_estimate

However, the result estimate did not give me the distribution of the dif… Am I confusing the concept of posterior distribution?

Solution

Subtract two variables and you can do it after sampling:

C = trace['A'] - trace['B']

Or you can use deterministic variables as part of the model:

C = pm. Deterministic('C', A - B)

Update:

Now that you’ve published your model, I’ll make the following recommendations

prices_A = [25,20,26,23,30,25]
prices_B = [45,49,52,58,45,48]
basic_model = pm. Model()
with basic_model: 
    mu_A = pm. Uniform('mu_A', lower=10, upper=100)
    sigma_A = pm. Uniform('sigma_A', lower=1, upper=10)
    mu_B = pm. Uniform('mu_B', lower=10, upper=100)
    sigma_B = pm. Uniform('sigma_B', lower=1, upper=10)
    A = pm. Normal('Y_1', mu=mu_A, sd=sigma_A, observed=prices_A)
    B = pm. Normal('Y_2', mu=mu_B, sd=sigma_B, observed=prices_B)
    dif = pm. Deterministic('dif', mu_A-mu_B) # diff of the means
    trace = pm.sample()

pm.summary(trace)

Basically my advice is not to use find_MAP(), but to sample from a posterior sample and then calculate what you want from the sample (in trace). For example, summary will give you the mean, standard deviation, and other quantities calculated from a posterior sample.

You might also want to use sample_ppc to get a “posterior prediction sample.”

ppc = pm.sample_ppc(trace, 1000, basic_model)
dif_ppc = ppc['Y_1'] - ppc['Y_2']

dif_ppc represents the difference in stocks you expect to see, including uncertainty about the mean and standard deviation of stocks.

As a side note, perhaps you want to replace your uniform distribution with others, such as the normal distribution of the mean and the seminormal distribution of Sigma.

Python – Use PYMC3 to get the posterior distribution of the difference between two variables

Use PYMC3 to get the posterior distribution of the difference between two variables

Solution

Related Problems and Solutions