Python – Do I need to add a column to the training data for multiple linear regression in scikit?



I’ve been going through various MOOCs on the web, and one of them mentions adding a column of ones to the training data for linear regression.

Let’s say I have the following training dataset:

investment    loan
    300000   12000
    431000    3000
    900000    4000
    320000    2000

Before I fit scikit-learn's LinearRegression model in Python, do I need to add a column of ones, like this?

ones    investment    loan
   1        300000   12000
   1        431000    3000
   1        900000    4000
   1        320000    2000

Thanks for any help.

Solution

From the docs for LinearRegression:

fit_intercept : boolean, optional, default True

whether to calculate the intercept for this model. If set to False, no
intercept will be used in calculations (e.g. data is expected to be
already centered).

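The intercept is the coefficient associated with the column of ones. Therefore, if this parameter is set to True (the default), you do not need to add the ones column yourself; the model estimates the intercept for you and exposes it as intercept_.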
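As a minimal sketch of that equivalence, the snippet below fits the question's two features both ways: once with the default fit_intercept=True, and once with a manually added ones column and fit_intercept=False. The question does not give a target column, so the y values here are hypothetical numbers made up purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Features from the question: investment and loan.
X = np.array([
    [300000, 12000],
    [431000, 3000],
    [900000, 4000],
    [320000, 2000],
], dtype=float)

# Hypothetical target values (not in the question), invented for this sketch.
y = np.array([35000.0, 48000.0, 95000.0, 36000.0])

# Default behaviour: fit_intercept=True, so no ones column is needed.
model = LinearRegression()
model.fit(X, y)

# Manual alternative: prepend a ones column and switch the built-in intercept off.
X_ones = np.column_stack([np.ones(len(X)), X])
manual = LinearRegression(fit_intercept=False)
manual.fit(X_ones, y)

print("built-in intercept:", model.intercept_, "coefficients:", model.coef_)
print("manual coefficients (ones column first):", manual.coef_)
```

In the second model the coefficient on the ones column plays the role of the intercept, and both models should give the same predictions. If you add the ones column and also leave fit_intercept=True, the extra column is redundant, so it is best to pick one approach or the other.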
