Python – How to control the number of processes in GaussianMixture


GaussianMixture has no n_jobs parameter, yet whenever I fit the model

from sklearn.mixture import GaussianMixture as GMM
gmm = GMM(n_components=4,
          init_params='random',
          covariance_type='full',
          tol=1e-2,
          max_iter=100,
          n_init=1)
gmm.fit(X)  # y is ignored by GaussianMixture.fit

it spawns 16 processes and uses the full CPU power of my 16-CPU machine. I don’t want it to do that.

In contrast, KMeans has an n_jobs parameter that controls multiprocessing when there are multiple initializations (n_init > 1). That is where its parallelism comes from.
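The idea behind that kind of n_jobs parallelism can be sketched with the standard library alone: run the independent random restarts in separate processes and keep the best one. This is an illustration of the general pattern, not scikit-learn’s actual implementation, and the function names are hypothetical:

```python
# Sketch: run several random initializations in parallel, keep the best.
# Mirrors the idea behind KMeans' n_jobs for n_init > 1; NOT sklearn code.
from concurrent.futures import ProcessPoolExecutor
import random

def one_init(seed):
    """Pretend 'fit' for one random initialization: return (score, seed)."""
    rng = random.Random(seed)
    score = rng.random()  # stand-in for a model's log-likelihood
    return score, seed

def fit_with_restarts(n_init=4, n_jobs=2):
    """Run n_init independent initializations across n_jobs processes."""
    with ProcessPoolExecutor(max_workers=n_jobs) as pool:
        results = list(pool.map(one_init, range(n_init)))
    return max(results)  # keep the best-scoring initialization

if __name__ == "__main__":
    best_score, best_seed = fit_with_restarts()
    print(best_seed)
```

Note that GaussianMixture with n_init=1, as in the question, has nothing to parallelize at this level, which is why its CPU usage must come from somewhere else.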

My question is: where does GaussianMixture’s parallelism come from, and how can I control it?

Solution

What you are observing is parallel processing inside the basic linear-algebra operations: NumPy/scikit-learn delegate them to an accelerated BLAS/LAPACK library, and it is that library which spawns the threads.

Controlling it is not as simple as setting an n_jobs parameter; it depends on which implementation your build uses!

Common candidates are ATLAS, OpenBLAS, and Intel’s MKL.

I recommend first checking which one is in use, then acting accordingly:

import numpy as np
np.__config__.show()

Sadly, these settings can get tricky. For MKL, for example (source):

export MKL_NUM_THREADS="2"
export MKL_DOMAIN_NUM_THREADS="MKL_BLAS=2"
export OMP_NUM_THREADS="1"
export MKL_DYNAMIC="FALSE"
export OMP_DYNAMIC="FALSE"
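These variables are read once, when the BLAS library is loaded, so they must be in place before NumPy is imported. One way to guarantee that from Python itself is to set them at the very top of the script; which of the names below your build honors depends on the BLAS in use:

```python
# Cap BLAS/OpenMP thread pools from inside the script.
# Must run BEFORE `import numpy` / `import sklearn`, because the
# libraries read these variables once, at load time.
import os

os.environ["MKL_NUM_THREADS"] = "2"       # Intel MKL
os.environ["OMP_NUM_THREADS"] = "2"       # OpenMP runtime
os.environ["OPENBLAS_NUM_THREADS"] = "2"  # runtime-configurable OpenBLAS builds

# import numpy as np   # import only after the limits are in place
```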

For ATLAS, the thread count appears to be fixed at compile time.

And according to this answer, the same applies to OpenBLAS.

Based on the OP’s testing, setting the OpenMP environment variable was enough, and it even changed the behavior of the open-source candidates ATLAS and OpenBLAS (for which compile-time limits are the alternative):

export OMP_NUM_THREADS="4";
