R chi-square goodness-of-fit test code cannot be converted to equivalent Python
UCLA has this awesome statistical testing website.
But the code is all in R. I'm trying to convert the code to Python equivalents, but that is not straightforward for something like the chi-square goodness-of-fit test. This is the R version:
hsb2 <- within(read.csv("https://stats.idre.ucla.edu/stat/data/hsb2.csv"), {
    race <- as.factor(race)
    schtyp <- as.factor(schtyp)
    prog <- as.factor(prog)
})
chisq.test(table(hsb2$race), p = c(10, 10, 10, 70)/100)
My Python attempt is like this:
import numpy as np
import pandas as pd
from scipy import stats
df = pd.read_csv("https://stats.idre.ucla.edu/stat/data/hsb2.csv")
# convert to category
df["race"] = df["race"].astype("category")
t_race = pd.crosstab(df.race, columns = 'race')
p_tests = np.array((10, 10, 10, 70))
p_tests = p_tests / 100
# tried this
stats.chisquare(t_race, p_tests)
# and this
stats.chisquare(t_race.T, p_tests)
But none of the stats.chisquare outputs come close to the R version. Can anyone point me in the right direction?
Solution
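chisq.test takes a vector of probabilities, whereas stats.chisquare takes the expected frequencies (see the docs), so the probabilities have to be multiplied by the total number of observations before they are passed as f_exp.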
> results = chisq.test(c(24, 11, 20, 145), p=c(0.1, 0.1, 0.1, 0.7))
> results
Chi-squared test for given probabilities
data: c(24, 11, 20, 145)
X-squared = 5.028571429, df = 3, p-value = 0.169716919
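As a sanity check, the R result can be reproduced by hand. The sketch below assumes only the observed counts and probabilities shown in the R call above; the variable names are illustrative. It scales the probabilities into expected counts, sums (observed - expected)^2 / expected, and reads the p-value from the chi-square distribution with k - 1 degrees of freedom.

```python
import numpy as np
from scipy import stats

# Observed counts and hypothesized probabilities from the R call above
obs = np.array([24, 11, 20, 145])
prob = np.array([0.1, 0.1, 0.1, 0.7])

# Expected counts are the probabilities scaled by the sample size
expected = obs.sum() * prob

# Pearson chi-square statistic: sum of (O - E)^2 / E
statistic = ((obs - expected) ** 2 / expected).sum()

# Degrees of freedom: number of categories minus one
dof = len(obs) - 1

# Upper-tail probability of the chi-square distribution
pvalue = stats.chi2.sf(statistic, dof)

print(statistic, dof, pvalue)
```

The printed statistic and p-value should agree with the X-squared and p-value that R reports.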
Compare with the equivalent stats.chisquare call:
In [49]: obs = np.array([24, 11, 20, 145])
In [50]: prob = np.array([0.1, 0.1, 0.1, 0.7])
In [51]: stats.chisquare(obs, f_exp=obs.sum() * prob)
Out[51]: Power_divergenceResult(statistic=5.0285714285714285, pvalue=0.16971691923343338)
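To apply this to a DataFrame column as in the question, a small helper along these lines might work. This is a sketch, not a definitive implementation: chisq_gof is a hypothetical name, and the race Series here is a stand-in built from the counts in the answer rather than the real hsb2.csv download. With the real data, df["race"] would take its place.

```python
import numpy as np
import pandas as pd
from scipy import stats


def chisq_gof(observed, probs):
    """Goodness-of-fit test from hypothesized probabilities,
    in the spirit of R's chisq.test(x, p=...). Hypothetical helper."""
    observed = np.asarray(observed, dtype=float)
    probs = np.asarray(probs, dtype=float)
    if not np.isclose(probs.sum(), 1.0):
        raise ValueError("probs must sum to 1")
    # stats.chisquare wants expected counts, not probabilities
    expected = observed.sum() * probs
    return stats.chisquare(observed, f_exp=expected)


# Stand-in for df["race"] (assumption: categories coded 1-4 with these counts)
race = pd.Series(np.repeat([1, 2, 3, 4], [24, 11, 20, 145]))

# Equivalent of R's table(): counts per category, in category order
counts = race.value_counts().sort_index()

result = chisq_gof(counts.to_numpy(), [0.1, 0.1, 0.1, 0.7])
print(result.statistic, result.pvalue)
```

Using value_counts().sort_index() yields a flat 1-D vector of counts in category order, which avoids the 2-D shape that pd.crosstab returns.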