Python – Scale features used for prediction in Scikit Learn


I've been working on machine learning models and am currently using pipelines with GridSearchCV. My data is scaled with MinMaxScaler, and I am using SVR with an RBF kernel. My question: now that my model is fitted and has a decent evaluation score, do I still need to use MinMaxScaler to scale new data for prediction, or can I use the data as-is? I've read three books about Scikit Learn, but they all focus on feature engineering and fitting. They don't really cover the prediction step beyond calling the predict method.

Here is the code:

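from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Scaling is a pipeline step, so it is re-fitted inside every CV split
pipe = Pipeline([('scaler', MinMaxScaler()), ('clf', SVR())])
time_split = TimeSeriesSplit(n_splits=5)

param_grid = {'clf__kernel': ['rbf'],
              'clf__C': [0.0001, 0.001],
              'clf__gamma': [0.0001, 0.001]}

grid = GridSearchCV(pipe, param_grid, cv=time_split,
                    scoring='neg_mean_squared_error', n_jobs=-1)
grid.fit(X_train, y_train)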

Solution

Yes. If you get new (unprocessed) data, you need to apply the same preparation steps you used when training the model. For example, MinMaxScaler with its default range rescales each feature to [0, 1] using the minimum and maximum seen in the training data, so the model expects inputs on that scale. If you do not preprocess new data the same way, the model will not produce accurate results.
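To make the scaling step concrete, here is a minimal sketch of how a fitted MinMaxScaler treats new data. The toy arrays are made up for illustration. With default settings the scaler maps each training feature to [0, 1] and then reuses the training minimum and maximum for anything it transforms later, so new values can land outside that range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data, invented for illustration: one feature with training range 10..30
X_train = np.array([[10.0], [20.0], [30.0]])
X_new = np.array([[15.0], [40.0]])

scaler = MinMaxScaler()

# fit_transform learns min=10 and max=30 from the training data
X_train_scaled = scaler.fit_transform(X_train)   # [[0.0], [0.5], [1.0]]

# transform reuses the training min/max; it does NOT refit on the new data
X_new_scaled = scaler.transform(X_new)           # [[0.25], [1.5]]

print(X_train_scaled.ravel())
print(X_new_scaled.ravel())
```

Calling fit_transform on the new data instead would rescale it by its own minimum and maximum, which no longer matches what the model saw during training.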

Remember to use the exact same MinMaxScaler object that was fitted on the training data, and call transform (not fit_transform) on the new data. Therefore, if you save the model to a file, save the fitted preprocessing objects as well.
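In the setup from the question the scaler is already a step of the pipeline, so it should be enough to pass raw new data to grid.predict: GridSearchCV refits the best pipeline on the full training set by default, and the pipeline applies its fitted scaler before the SVR. The sketch below checks this against scaling by hand and then saves the whole fitted pipeline with joblib so the scaler travels with the model. The synthetic data, the X_new name, and the temporary file path are assumptions for illustration only.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption); replace with your own X_train / y_train
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 100, size=(60, 3))
y_train = X_train @ np.array([0.5, -0.2, 0.1]) + rng.normal(0, 0.1, size=60)
X_new = rng.uniform(0, 100, size=(5, 3))  # raw, unscaled new data

pipe = Pipeline([('scaler', MinMaxScaler()), ('clf', SVR())])
param_grid = {'clf__kernel': ['rbf'],
              'clf__C': [0.0001, 0.001],
              'clf__gamma': [0.0001, 0.001]}
grid = GridSearchCV(pipe, param_grid, cv=TimeSeriesSplit(n_splits=5),
                    scoring='neg_mean_squared_error')
grid.fit(X_train, y_train)

# Option 1: pass raw data; the refitted pipeline scales it internally
pred_pipeline = grid.predict(X_new)

# Option 2: the same thing by hand, using the scaler fitted during training
best = grid.best_estimator_
scaler = best.named_steps['scaler']
svr = best.named_steps['clf']
pred_manual = svr.predict(scaler.transform(X_new))
same = np.allclose(pred_pipeline, pred_manual)

# Persist the whole fitted pipeline so scaler and model are stored together
path = os.path.join(tempfile.mkdtemp(), 'model.joblib')  # hypothetical location
joblib.dump(best, path)
restored = joblib.load(path)
pred_restored = restored.predict(X_new)
```

Saving the pipeline rather than the bare SVR means there is no separate scaler file to keep in sync with the model.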
