What is the role of MPI-related ‘cluster’ software?
I’m a bit confused about how cluster implementations (“Beowulf clusters”) relate to communication protocols such as MPI. What software components are required to set up a “cluster” using tools such as OpenMPI?
As you know, a cluster is a group of computers that are networked together. When you have such a configuration, you will usually install and use the following:
- MPI, which is used for interprocess communication
- NFS, which makes shared network disks visible to all nodes
- NTP, which synchronizes the clocks of the nodes so that you can compare log events and timestamps
- BOOTP, which boots nodes over the network from a remote server, so that every node comes up with the same known-good configuration each time it restarts
- A set of cluster utilities to make your life easier, such as distributed ssh, which can execute the same commands on all nodes at the same time.
- A job scheduler or queue manager, such as Condor, LSF, or others, which lets you prioritize job submissions and, eventually, meter usage for throttling or billing.
- A watchdog, which automatically restarts a node when it gets stuck.
- Software control of the UPS (to automatically shut down in the event of a prolonged power outage).
And many more. All of these are add-ons, entirely separate from MPI. MPI is simply a communication channel between processes; MPI itself does not “form a cluster”.
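To make the "distributed ssh" item from the list a bit more concrete, here is a minimal Python sketch of what such a utility does: it runs the same command on every node in parallel and collects the results. This is only an illustration, not how any particular cluster tool is implemented. The function names (`ssh_argv`, `run_one`, `run_on_all`) are made up for this example, and it assumes key-based, passwordless ssh is already set up between the nodes.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_argv(host, command):
    """Build the argument list for running `command` on `host` via ssh."""
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # so a node without key-based login cannot hang the whole run.
    return ["ssh", "-o", "BatchMode=yes", host, command]


def run_one(host, command, timeout=30):
    """Run the command on one node; return (host, exit_code, stdout)."""
    try:
        result = subprocess.run(
            ssh_argv(host, command),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Report a stuck node as exit code None rather than raising.
        return host, None, ""
    return host, result.returncode, result.stdout.strip()


def run_on_all(hosts, command, runner=run_one):
    """Run the same command on every node concurrently.

    Results come back in the same order as `hosts`. The `runner`
    parameter is a hypothetical hook so the ssh call can be swapped out.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        return list(pool.map(lambda h: runner(h, command), hosts))
```

With hypothetical hostnames, `run_on_all(["node01", "node02", "node03"], "uptime")` would return one `(host, exit_code, output)` tuple per node. Note that nothing here involves MPI: it is plain ssh, which is the point of the list above.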