Python – Cross-validation in linear regression

I’m trying to do cross-validation in linear regression using Python’s sklearn library, but I’m unsure of the appropriate way to perform cross-validation on a given dataset.

Two APIs that confuse me a bit are cross_val_score() and the regularized estimators with built-in cross-validation, such as LassoCV().

As I understand it, cross_val_score is used to get scores based on cross-validation, and it can be combined with Lasso() to obtain cross-validated scores for a regularized model (example: here).

In contrast, LassoCV(), as its documentation describes, performs Lasso for a given range of the tuning parameter (alpha or lambda).

Now, my questions are:

  • Which method is better: cross_val_score with Lasso, or just LassoCV?
  • What is the correct way to perform cross-validation for linear
    regression (or other algorithms such as logistic regression, NN, etc.)?

Thank you.

Solution

To confuse you even more: consider using GridSearchCV, which performs cross-validation and tunes the hyperparameters at the same time.

Demo:

import numpy as np
from sklearn.linear_model import Lasso, Ridge, SGDRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# X (features) and y (target) are assumed to be loaded already.
# Hold out a test set that the grid search never sees.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('regr', Lasso())
])

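# Each dict is a separate grid; the 'regr' entry swaps the pipeline's
# final estimator, so several model types are compared in one search.
param_grid = [
    {
        'regr': [Lasso(), Ridge()],
        'regr__alpha': np.logspace(-4, 1, 6),
    },
    {
        'regr': [SGDRegressor()],
        'regr__alpha': np.logspace(-5, 0, 6),
        'regr__max_iter': [500, 1000],
    },
]

# 3-fold cross-validation over every combination, using all CPU cores.
grid = GridSearchCV(pipe, param_grid=param_grid, cv=3, n_jobs=-1, verbose=2)
grid.fit(X_train, y_train)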

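# predict() takes only the features; passing y_test raises a TypeError.
predicted = grid.predict(X_test)

# Score (R^2 for regressors) of the best estimator on the held-out test set.
print('Score:\t{}'.format(grid.score(X_test, y_test)))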
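
As for the two APIs in the question, they answer different questions: cross_val_score estimates how well one fixed model performs, while LassoCV picks alpha by cross-validation and then refits on all the data it was given. Here is a minimal sketch of the difference. The make_regression data, the fixed alpha=0.1 and the alpha range are assumptions made only so the example runs on its own; substitute your own X and y.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LassoCV
from sklearn.model_selection import cross_val_score

# Synthetic data, used only so this sketch is self-contained.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0,
                       random_state=0)

# cross_val_score: evaluates ONE fixed model (alpha is not tuned here).
scores = cross_val_score(Lasso(alpha=0.1), X, y, cv=5, scoring='r2')
print('Lasso(alpha=0.1) mean R^2:\t{:.3f}'.format(scores.mean()))

# LassoCV: tries every alpha in the list with 5-fold cross-validation,
# keeps the best one, and refits the model on the full data.
alphas = np.logspace(-4, 1, 6)
lasso_cv = LassoCV(alphas=alphas, cv=5).fit(X, y)
print('alpha chosen by LassoCV:\t{}'.format(lasso_cv.alpha_))
```

So neither is "better": use LassoCV (or GridSearchCV) to choose alpha, and keep a held-out test set, as in the demo above, to judge the final model.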
