Python package installation: pip vs yum, or both?

I’ve just started managing Hadoop clusters. We use Bright Cluster Manager at the O/S level (CentOS 7.1) and then use Ambari with Hortonworks HDP 2.3 for Hadoop.

I keep getting requests to install new Python modules. We installed some modules with yum during the initial setup, and others have been installed with pip as the cluster has evolved.

What is the “right” way to do this? Always use yum and be unable to provide the latest and greatest modules? Always use pip and lose the single source of truth (yum) that shows which packages are installed? Or is it okay to use both pip and yum?

I’m just worried that I’m filling the system with garbage and too many versions of Python modules. Any suggestions?

Solution

Packages that are part of your distribution should be preferred because they have been tested to work on your system. These packages are installed system-wide.

However, if a suitable RPM package is not available, go ahead and install the module from, for example, PyPI or GitHub using pip, but deploy it into a Python virtual environment whenever possible. With a virtual environment, you do not have to install third-party packages system-wide. Instead, you end up with several smaller sets of packages, each of which is easier to manage as a group.
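
As a rough illustration of the virtual environment approach, here is a minimal sketch using the standard-library venv module. It assumes a Python 3 interpreter is available on the node; the stock Python 2.7 on CentOS 7 does not include venv, and there the third-party virtualenv tool plays the same role. The helper names (create_env, install, list_installed) are made up for this example and are not part of any existing tool.

```python
import subprocess
import sys
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create an isolated environment at env_dir and return the path to its interpreter.

    venv.create() defaults to system_site_packages=False, so modules installed
    system-wide (for example via yum) are not visible inside the environment.
    """
    env_dir = Path(env_dir)
    venv.create(env_dir, with_pip=with_pip)
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install(env_python, *packages):
    """Install packages with the environment's own pip, leaving the system Python untouched."""
    subprocess.run([str(env_python), "-m", "pip", "install", *packages], check=True)


def list_installed(env_python):
    """Return the environment's packages as 'name==version' strings."""
    result = subprocess.run(
        [str(env_python), "-m", "pip", "freeze"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

For example, calling create_env("/opt/envs/analytics") and then install(py, "requests") on the returned interpreter path would keep that module out of the system site-packages (the path and package name here are placeholders). Because each environment can be listed with list_installed, or simply deleted as a directory, yum remains the source of truth for everything installed system-wide.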
