Python – R and linear regression in Python – different results for the same problem

I’m practicing in Python the data skills I learned in R, and I have a question about a simple linear regression.

Climate change data:
[link here]

Python scripts

import pandas as pd
import statsmodels.api as sm

# df is assumed to already hold the climate change data linked above
train = df[df.Year >= 2006]

X = train[['MEI', 'CO2', 'CH4', 'N2O', 'CFC.11', 'CFC.12', 'TSI', 'Aerosols']]
y = train[['Temp']]
model = sm.OLS(y, X).fit()
predictions = model.predict(X)
model.summary()

Python results

Dep. Variable:      Temp               R-squared:            0.972
Model:              OLS                Adj. R-squared:       0.964
Method:             Least Squares      F-statistic:          123.1
Date:               Mon, 01 Oct 2018   Prob (F-statistic):   9.54e-20
Time:               14:52:53           Log-Likelihood:       46.898
No. Observations:   36                 AIC:                  -77.80
Df Residuals:       28                 BIC:                  -65.13
Df Model:           8
Covariance Type:    nonrobust

               coef
MEI          0.0361
CO2          0.0046
CH4         -0.0023
N2O         -0.0141
CFC-11      -0.0312
CFC-12       0.0358
TSI         -0.0033
Aerosols    69.9680

Omnibus:          8.397    Durbin-Watson:       1.484
Prob(Omnibus):    0.015    Jarque-Bera (JB):    10.511
Skew:            -0.546    Prob(JB):            0.00522
Kurtosis:         5.412    Cond. No.            6.35e+06

R scripts

train <- climate_change[climate_change$Year>=2006,]
prev <- lm(Temp ~ ., data = train[,3:NCOL(train)])
summary(prev)

R result

Residuals:
Min 1Q Median 3Q Max
-0.221684 -0.032846 0.002042 0.037158 0.167887

Coefficients:
MEI 0.036056
CO2 0.004817
CH4 -0.002366
N2O -0.013007
CFC-11 -0.033194
CFC-12 0.037775
TSI 0.009100
Aerosols 70.463329
Residual standard error: 0.07594 on 27 degrees of freedom
Multiple R-squared: 0.5346, Adjusted R-squared: 0.3967
F-statistic: 3.877 on 8 and 27 DF, p-value: 0.003721

Question

The R-squared values of the two fits are very different, and the coefficients of the independent variables also differ slightly. Can someone explain why?

Solution

The key difference: the least squares fit in statsmodels (sm.OLS) does not include a constant (intercept) term by default, whereas R's lm does. If we remove the intercept from the R fit, we get results very similar to the Python ones; conversely, if we add a constant to the statsmodels fit, we get the same results as R.

Remove the intercept in R's lm call:

summary(lm(Temp ~ . - 1, data = train[,3:NCOL(train)]))

Call:
lm(formula = Temp ~ . - 1, data = train[, 3:NCOL(train)])

Residuals:
      Min        1Q    Median        3Q       Max 
-0.221940 -0.032347  0.002071  0.037048  0.167294 

Coefficients:
          Estimate Std. Error t value Pr(>|t|)  
MEI       0.036076   0.027983   1.289   0.2079  
CO2       0.004640   0.008945   0.519   0.6080  
CH4      -0.002328   0.002132  -1.092   0.2843  
N2O      -0.014115   0.079452  -0.178   0.8603  
`CFC-11` -0.031232   0.096693  -0.323   0.7491  
`CFC-12`  0.035760   0.103574   0.345   0.7325  
TSI      -0.003283   0.036861  -0.089   0.9297  
Aerosols 69.968040  33.093275   2.114   0.0435 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.07457 on 28 degrees of freedom
Multiple R-squared:  0.9724,    Adjusted R-squared:  0.9645 
F-statistic: 123.1 on 8 and 28 DF,  p-value: < 2.2e-16

Now let’s add a constant to the statsmodels call:

X_with_constant = sm.add_constant(X)
model = sm.OLS(y, X_with_constant).fit()
model.summary()

This gives the same result as R's default fit:

OLS Regression Results
Dep. Variable:      Temp               R-squared:            0.535
Model:              OLS                Adj. R-squared:       0.397
Method:             Least Squares      F-statistic:          3.877
Date:               Tue, 02 Oct 2018   Prob (F-statistic):   0.00372
Time:               10:14:03           Log-Likelihood:       46.899
No. Observations:   36                 AIC:                  -75.80
Df Residuals:       27                 BIC:                  -61.55
Df Model:           8
Covariance Type:    nonrobust

               coef    std err        t    P>|t|      [0.025     0.975]
const      -17.8663    563.008   -0.032    0.975   -1173.064   1137.332
MEI          0.0361      0.029    1.265    0.217      -0.022      0.095
CO2          0.0048      0.011    0.451    0.656      -0.017      0.027
CH4         -0.0024      0.002   -0.950    0.351      -0.007      0.003
N2O         -0.0130      0.088   -0.148    0.884      -0.194      0.168
CFC-11      -0.0332      0.116   -0.285    0.777      -0.272      0.205
CFC-12       0.0378      0.123    0.307    0.761      -0.215      0.290
TSI          0.0091      0.392    0.023    0.982      -0.795      0.813
Aerosols    70.4633     37.139    1.897    0.069      -5.739    146.666

Omnibus:          8.316    Durbin-Watson:       1.488
Prob(Omnibus):    0.016    Jarque-Bera (JB):    10.432
Skew:            -0.535    Prob(JB):            0.00543
Kurtosis:         5.410    Cond. No.            1.06e+08
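
A note on why R-squared moves so much (0.972 vs 0.535) while the coefficients and the log-likelihood (46.898 vs 46.899) barely change: when a model has no intercept, both statsmodels and R compute R-squared against the uncentered total sum of squares (the sum of y squared rather than the sum of squared deviations from the mean), so the two R-squared values are not directly comparable. Below is a minimal pure-Python sketch of that effect with a single predictor. The data and the helper names (fit_with_intercept, fit_through_origin, sum_sq_resid, r_squared) are made up for illustration and come from neither the climate data nor either library.

```python
# Toy data (invented for illustration): x sits far from zero and varies little,
# so a line through the origin fits almost as well as a line with an intercept.
x = [9.8, 9.9, 10.0, 10.1, 10.2]
y = [10.1, 9.7, 10.2, 9.9, 10.3]


def fit_with_intercept(x, y):
    """Least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_mean - b * x_mean, b


def fit_through_origin(x, y):
    """Least squares for y = b*x (no constant term); returns b."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def sum_sq_resid(y, fitted):
    """Sum of squared residuals (SSR)."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def r_squared(y, fitted, centered):
    """1 - SSR/TSS, with TSS taken around the mean (centered) or around zero."""
    if centered:
        y_mean = sum(y) / len(y)
        tss = sum((yi - y_mean) ** 2 for yi in y)
    else:
        tss = sum(yi * yi for yi in y)
    return 1 - sum_sq_resid(y, fitted) / tss


a, b = fit_with_intercept(x, y)
fitted_intercept = [a + b * xi for xi in x]
b0 = fit_through_origin(x, y)
fitted_origin = [b0 * xi for xi in x]

ssr_intercept = sum_sq_resid(y, fitted_intercept)
ssr_origin = sum_sq_resid(y, fitted_origin)

# Reported for a model with an intercept: centered R-squared.
r2_intercept = r_squared(y, fitted_intercept, centered=True)
# Reported for a model without an intercept: uncentered R-squared.
r2_origin_uncentered = r_squared(y, fitted_origin, centered=False)
# The same no-intercept fit scored against the centered total, for comparison.
r2_origin_centered = r_squared(y, fitted_origin, centered=True)

print(f"SSR with intercept:             {ssr_intercept:.4f}")
print(f"SSR through origin:             {ssr_origin:.4f}")
print(f"R2 with intercept (centered):   {r2_intercept:.4f}")
print(f"R2 through origin (uncentered): {r2_origin_uncentered:.4f}")
print(f"R2 through origin (centered):   {r2_origin_centered:.4f}")
```

On this toy data the two fits leave almost the same residuals, yet the no-intercept R-squared comes out close to 1 while the with-intercept one stays low. The high value reflects the different denominator, not a better fit, which is the same pattern as in the two summaries above.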
