Python – Simple t-test in Python with a CI for the difference

Simple t-test in Python with a CI for the difference

What is the most straightforward way to perform a t-test in Python and include a CI for the difference between the means? I've seen various posts, but they all do it differently, and when I try to calculate the CI myself it comes out slightly wrong. Here is my attempt:

import numpy as np
from scipy import stats

g1 = np.array([48.7107107, 36.8587287, 67.7129929, 39.5538852, 35.8622661])
g2 = np.array([62.4993857, 49.7434833, 67.7516511, 54.3585559, 71.0933957])

m1, m2 = np.mean(g1), np.mean(g2)

# degrees of freedom for the two-sample t-test
dof = (len(g1) - 1) + (len(g2) - 1)

# pooled mean squared error
MSE = (np.var(g1) + np.var(g2)) / 2

# standard error of the difference between the two means
stderr_diffs = np.sqrt((2 * MSE) / len(g1))

# two-sided 95% critical t value
tcl = stats.t.ppf([.975], dof)

lower_limit = (m1 - m2) - tcl * stderr_diffs
upper_limit = (m1 - m2) + tcl * stderr_diffs

print(lower_limit, upper_limit)

This returns:

[-30.12845447] [-0.57070077]

However, when I run the same test in SPSS, I get the same t and p values, but the CI is -31.87286 to 1.17371, and R agrees with SPSS. I can't seem to find the right way to calculate this and would appreciate some help.
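
For reference, the t and p values I'm comparing against can be reproduced with scipy.stats.ttest_ind, which performs the pooled (equal-variance) two-sample test by default:

# pooled two-sample t-test; equal_var=True is the default
t_stat, p_value = stats.ttest_ind(g1, g2)
print(t_stat, p_value)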

Solution

You subtract 1 when calculating the degrees of freedom, but you are not using the sample variance when calculating MSE:

MSE = (np.var(g1) + np.var(g2)) / 2

It should be

MSE = (np.var(g1, ddof=1) + np.var(g2, ddof=1)) / 2

This gave me

[-31.87286426] [ 1.17370902]
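
For completeness, here is the question's calculation with that one change applied end to end (same data and variable names as above); it reproduces the SPSS/R interval:

import numpy as np
from scipy import stats

g1 = np.array([48.7107107, 36.8587287, 67.7129929, 39.5538852, 35.8622661])
g2 = np.array([62.4993857, 49.7434833, 67.7516511, 54.3585559, 71.0933957])

m1, m2 = np.mean(g1), np.mean(g2)
dof = (len(g1) - 1) + (len(g2) - 1)

# pooled MSE from the *sample* variances (ddof=1)
MSE = (np.var(g1, ddof=1) + np.var(g2, ddof=1)) / 2
stderr_diffs = np.sqrt((2 * MSE) / len(g1))

# two-sided 95% critical t value
tcl = stats.t.ppf(0.975, dof)
print((m1 - m2) - tcl * stderr_diffs, (m1 - m2) + tcl * stderr_diffs)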

That said, I would use statsmodels' CompareMeans rather than implementing this by hand:

In [105]: import statsmodels.stats.api as sms

In [106]: r = sms.CompareMeans(sms.DescrStatsW(g1), sms.DescrStatsW(g2))

In [107]: r.tconfint_diff()
Out[107]: (-31.872864255548553, 1.1737090155485568)

(Actually, we should be using a DataFrame here rather than an ndarray, but I'm lazy.)
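
If you need a different confidence level, tconfint_diff also takes an alpha argument (the default is alpha=0.05, giving a 95% interval), e.g.:

r.tconfint_diff(alpha=0.10)  # 90% CI for the difference in means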

Remember, though, that you have to think about what assumption you want to make about the variances:

In [110]: r.tconfint_diff(usevar='pooled')
Out[110]: (-31.872864255548553, 1.1737090155485568)

In [111]: r.tconfint_diff(usevar='unequal')
Out[111]: (-32.28794665832114, 1.5887914183211436)

If your g1 and g2 are representative of your data, the equal-variance assumption may not be a good one.
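
To see where the 'unequal' interval comes from, here is a minimal sketch of the Welch (unpooled) calculation on the g1/g2 arrays from the question: an unpooled standard error combined with Welch–Satterthwaite degrees of freedom. It should reproduce the usevar='unequal' result above:

import numpy as np
from scipy import stats

# sample variances and group sizes
v1, v2 = np.var(g1, ddof=1), np.var(g2, ddof=1)
n1, n2 = len(g1), len(g2)

# unpooled standard error of the difference between the means
se_unequal = np.sqrt(v1 / n1 + v2 / n2)

# Welch–Satterthwaite degrees of freedom
dof_w = (v1 / n1 + v2 / n2) ** 2 / (
    (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
)

diff = np.mean(g1) - np.mean(g2)
tcl_w = stats.t.ppf(0.975, dof_w)
print(diff - tcl_w * se_unequal, diff + tcl_w * se_unequal)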
