Python – Using sklearn load_files() to load PNG images as data



I’m currently building a simple image recognizer using sklearn.

I need to use load_files('./directory/') to load images from the subfolders of that directory.

It gets the target values right, but the data attribute does not contain pixel values. I assume I need to set the encoding parameter to handle image files, but I can't find out exactly what it is for.

Solution

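The encoding parameter is only used to decode the raw bytes of each file's content as text in that encoding (for example, UTF-8), so it is meant for text documents, not images. Without it, load_files returns each file's content as raw bytes.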
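A minimal sketch, assuming scikit-learn, NumPy and Pillow are installed, that builds a throwaway two-class folder of PNGs (the folder names cats and dogs are made up) and shows what load_files actually returns, and what happens if you pass an encoding:

```python
import os
import tempfile

import numpy as np
from PIL import Image
from sklearn.datasets import load_files

# Build a tiny throwaway dataset: one subfolder per class, one PNG per folder.
root = tempfile.mkdtemp()
for label in ("cats", "dogs"):
    os.makedirs(os.path.join(root, label))
    img = Image.fromarray(np.zeros((4, 4), dtype=np.uint8))
    img.save(os.path.join(root, label, "img0.png"))

dataset = load_files(root)  # encoding=None by default, so content stays as bytes
first = dataset.data[0]
print(type(first))  # <class 'bytes'>
print(first[:8])    # the 8-byte PNG file signature, not pixel values

# Passing an encoding only tells load_files to decode those bytes as text.
# With the default decode_error="strict", binary PNG data fails to decode.
try:
    load_files(root, encoding="utf-8")
except UnicodeDecodeError as err:
    print("cannot decode PNG as UTF-8:", err.reason)
```

So the data attribute holds the compressed file bytes; turning them into pixels is a separate step that load_files does not do.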

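For image files, you need to iterate over the filenames attribute yourself and load each file with something like scipy.misc.imread. (You'll also need to install the PIL or Pillow package.) Note that scipy.misc.imread has since been removed from SciPy (in version 1.2); in current code, imageio.imread or PIL.Image.open do the same job.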
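A minimal sketch of that approach, using Pillow instead of scipy.misc.imread. It passes load_content=False so load_files only collects filenames and labels, then reads each image itself. The helper name images_from_load_files, the grayscale conversion and the fixed 32x32 size are illustrative assumptions; a fixed size is needed because sklearn estimators expect every sample to have the same number of features.

```python
import os
import tempfile

import numpy as np
from PIL import Image
from sklearn.datasets import load_files


def images_from_load_files(container_path, size=(32, 32)):
    """Hypothetical helper: return (X, y, target_names), one flattened image per row."""
    # load_content=False: collect filenames and labels without reading file bytes.
    bunch = load_files(container_path, load_content=False, shuffle=False)
    rows = []
    for path in bunch.filenames:
        with Image.open(path) as img:
            gray = img.convert("L").resize(size)  # grayscale, common size
        # Scale 0..255 pixel values to 0..1 and flatten to a 1-D feature vector.
        rows.append(np.asarray(gray, dtype=np.float64).ravel() / 255.0)
    X = np.vstack(rows)
    return X, bunch.target, bunch.target_names


# Usage on a made-up dataset: three dark and three light 40x50 images.
root = tempfile.mkdtemp()
for label, value in (("dark", 10), ("light", 240)):
    os.makedirs(os.path.join(root, label))
    for i in range(3):
        Image.fromarray(np.full((40, 50), value, dtype=np.uint8)).save(
            os.path.join(root, label, f"img{i}.png"))

X, y, names = images_from_load_files(root)
print(X.shape)  # (6, 1024)
```

The resulting X and y can be passed directly to any sklearn classifier's fit method.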

This is the utility function scikit-learn uses to load the JPEG files of the Labeled Faces in the Wild (LFW) dataset as NumPy arrays:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/lfw.py#L108

You can use it to learn how to write your own custom dataset loader.
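A minimal, hypothetical custom loader along those lines, assuming the same one-subfolder-per-class layout that load_files expects. It is only loosely modelled on what the LFW helper does (read each file, convert to grayscale, resize to a common shape, stack into one array) and is not a copy of it; the function name load_image_folder, the default size, the extension filter and the returned fields are illustrative choices. It returns an sklearn.utils.Bunch with both 2-D images and flattened data, similar to built-in datasets such as load_digits.

```python
import tempfile
from pathlib import Path

import numpy as np
from PIL import Image
from sklearn.utils import Bunch


def load_image_folder(container_path, size=(32, 32), extensions=(".png",)):
    """Hypothetical loader: one subfolder per class, one image per file."""
    root = Path(container_path)
    class_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    images, target, filenames = [], [], []
    for label, class_dir in enumerate(class_dirs):
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() not in extensions:
                continue  # skip files that are not images
            with Image.open(path) as img:
                gray = img.convert("L").resize(size)  # size is (width, height)
            images.append(np.asarray(gray, dtype=np.float32) / 255.0)
            target.append(label)
            filenames.append(str(path))
    images = np.stack(images)  # shape (n_samples, height, width)
    return Bunch(
        images=images,
        data=images.reshape(len(images), -1),  # flattened for estimators
        target=np.array(target),
        target_names=[d.name for d in class_dirs],
        filenames=np.array(filenames),
    )


# Usage on a made-up dataset, including a stray text file that gets skipped.
root = Path(tempfile.mkdtemp())
for name, value in (("circle", 0), ("square", 255)):
    (root / name).mkdir()
    for i in range(2):
        Image.fromarray(np.full((20, 30), value, dtype=np.uint8)).save(root / name / f"{i}.png")
(root / "circle" / "notes.txt").write_text("not an image")

ds = load_image_folder(root, size=(16, 16))
print(ds.images.shape, ds.data.shape)  # (4, 16, 16) (4, 256)
```

Keeping the 2-D images alongside the flattened data makes it easy to plot samples while still feeding data and target to an estimator.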
