Second tutorial: Parallel execution

epyc’s main utility comes from being able to run experiments, like those we defined in the first tutorial and ran on a single machine, on multicore machines and clusters of machines. In this tutorial we’ll explain how epyc manages parallel machines.

(If you know about parallel computing, then it’ll be enough for you to know that epyc creates a task farm of experiments across multiple cores. If this didn’t make sense, then you should first read Parallel processing concepts.)

Two ways to get parallelism

epyc arranges the execution of experiments around the Lab class, which handles the execution of experiments across a parameter space. The default Lab executes experiments sequentially.

But what if you have more than one core, as is very common on modern workstations? Or if you have access to a cluster of machines? Then epyc can make use of these resources with no change to your experiment code.

If you have a multicore machine, the easiest way to use it with epyc is to replace the Lab managing the experiments with a ParallelLab to get local parallelism. This will execute experiments in parallel using the available cores. (You can limit the number of cores used if you want to.) For example:

from epyc import ParallelLab, HDF5LabNotebook

nb = HDF5LabNotebook('mydata.h5', create=True)
lab = ParallelLab(nb, cores=-1)                 # leave one core free

e = MyExperiment()
lab['first'] = range(1, 1000)
lab.runExperiment(e)

On a machine with, say, 16 cores, this will use 15 of the cores to run experiments and return when they’re all finished.

If you have a cluster, things are a little more complicated, as you need to set up some extra software to manage the cluster for you. Once that’s done, though, accessing the cluster from epyc is largely identical to accessing local parallelism.

Setting up a compute cluster

epyc doesn’t actually implement parallel computing itself: instead it builds on top of existing Python infrastructure for this purpose. The underlying library epyc uses is called ipyparallel, which provides portable parallel processing on both multicore machines and collections of machines.

Warning

Confusingly, there’s also a system called PyParallel which is a completely different beast to ipyparallel.

epyc wraps up ipyparallel within the framework of experiments, labs, and notebooks, so that, when using epyc, there’s no need to interact directly with ``ipyparallel``. However, before we get to that stage we do need to set up the parallel compute cluster that epyc will use, and (at present) this does require interacting to some degree with ipyparallel’s commands.

Setting up a cluster depends on what kind of cluster you have, and we’ll describe each one individually. It’s probably easiest to start with the simplest system to which you have access, and then – if and when you need more performance – move onto the more advanced systems.

Running experiments on a cluster

Having set up a cluster of whatever kind, we can now let epyc run experiments on it. This involves using a ClusterLab, which is simply a Lab that runs experiments remotely on a cluster rather than locally on the machine hosting the lab.

We create a ClusterLab in the same way as “ordinary” labs:

clab = epyc.ClusterLab(profile = 'cluster',
                       notebook = epyc.JSONLabNotebook('our-work.json'))

The lab thus created will connect to the cluster described in the cluster profile (which must already have been created and started).

A ClusterLab behaves like a Lab in most respects: we can set a parameter space, run a set of experiments at each point in the space, and so forth. But they differ in one important respect. Running an experiment in a Lab is a synchronous process: when you call Lab.runExperiment() you wait until the experiments finish before regaining control. That’s fine for small cases, but what if you want to run a huge computation – many repetitions of experiments across a large parameter space? That, after all, is the reason we want to do parallel computing: to support large computations. It would be inconvenient, to say the least, if performing such experiments locked up a computer for a long period.

ClusterLab differs from Lab by being asynchronous. When you call ClusterLab.runExperiment(), the experiments are submitted to the cluster in one go and control returns to your program: the computation happens “in the background” on the cluster.
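This submit-now, collect-later pattern isn’t unique to epyc: it’s the same asynchronous style offered by Python’s standard concurrent.futures module. Purely as an illustration of the idea (this is not epyc code, and the experiment function here is a made-up stand-in):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def experiment(x):
    """Stand-in for a long-running experiment at parameter point x."""
    time.sleep(0.01)
    return x * x

ex = ThreadPoolExecutor(max_workers=4)

# submit() returns immediately, handing back a future for each experiment
futures = [ex.submit(experiment, x) for x in range(8)]

# ...we're free to get on with other work while the experiments run...

# collect the results when we actually need them
results = [f.result() for f in futures]
ex.shutdown()
```

The futures play the same role as epyc’s pending results: placeholders that are filled in as the background computation completes.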

So suppose we go back to our example of computing a curve. This wasn’t a great example for a sequential lab, and it’s monumentally unrealistic for parallel computation except as an example. We can set up the parameter space and run the experiments in parallel using the same syntax as before:

clab['x'] = numpy.linspace(-2 * numpy.pi, 2 * numpy.pi)
clab['y'] = numpy.linspace(-2 * numpy.pi, 2 * numpy.pi)

clab.runExperiment(CurveExperiment())

Control will return immediately, as the computation is spun up on the cluster.

How can we tell when we’re finished? There are three ways. The first is to make the whole computation synchronous by waiting for it to finish:

clab.wait()

This will lock up your computer waiting for all the experiments to finish. That’s not very flexible. We can instead test whether the computations have finished:

clab.ready()

which will return True when everything has finished. But that might take a long time, and we might want to get results as they become available – for example, to plot them partially. We can see what fraction of the experiments have finished using:

clab.readyFraction()

which returns a number between 0 and 1 indicating how far along we are.
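A typical pattern is to poll the lab periodically and do something useful (such as updating a plot) with whatever fraction is done. A hedged sketch – poll_progress and its interval parameter are our own inventions, not part of epyc’s API, and it works with any object offering a readyFraction() method:

```python
import time

def poll_progress(lab, interval=10.0):
    """Poll lab.readyFraction() until all experiments have finished,
    returning the sequence of fractions observed along the way."""
    seen = []
    while True:
        f = lab.readyFraction()     # also syncs results into the notebook
        seen.append(f)
        if f >= 1.0:
            return seen
        time.sleep(interval)        # don't hammer the cluster controller
```

Calling poll_progress(clab) would then block only in short sleeps, leaving room to act on partial results between polls.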

As results come in, they’re stored in the lab’s notebook and can be retrieved as normal: as a list of dicts, as a DataFrame, and so forth. As long as ClusterLab.ready() is returning False (and ClusterLab.readyFraction() is therefore returning less than 1), there are still “pending” results that will be filled in later. Each call to one of these “query” methods synchronises the notebook with the results computed on the cluster.

In fact ClusterLab has an additional trick up its sleeve, allowing completely disconnected operation. But that’s another topic.

Common problems with clusters

ipyparallel is a fairly basic cluster management system, but one that’s adequate for a lot of straightforward experiments. That means it sometimes needs tweaking to work effectively, in ways that rely on you (the user) rather than being automated as might be the case in a more advanced system.

The most common problem is one of overloading. This can occur for both multicore and multi-machine set-ups, and happens when the machine spends so long doing your experiments that it stops being able to do other work. While this may sound like a good thing – an efficient use of resources – some of that “other” work includes communicating with the cluster controller. It’s possible for too many engines to crowd out something essential, which often manifests itself in one of two ways:

  1. You can’t log in to the machine or run simple processes; or
  2. You can’t retrieve results.

The solution is actually quite straightforward: don’t run as much work! This can easily be done by, for example, always leaving one or two cores free on each machine you use: so an eight-core machine would run six engines, leaving two free for other things.
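The arithmetic is simple enough to automate. Here is a sketch of a helper – our own hypothetical function, not part of epyc or ipyparallel – that picks an engine count for the current machine while holding some cores back:

```python
import multiprocessing

def engines_for_machine(reserve=2):
    """Return how many ipyparallel engines to start on this machine,
    leaving `reserve` cores free for the OS and the cluster controller."""
    cores = multiprocessing.cpu_count()
    return max(1, cores - reserve)
```

On an eight-core machine the default reserve of two gives the six engines suggested above.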

Clusters versus local parallelism

You probably noticed that, if you have a single multicore workstation, there are two ways to let epyc use it: with a ParallelLab, or with a ClusterLab whose cluster runs entirely on that one machine.

There are pros and cons to each approach. For the ParallelLab we have:

  • it’s very simple to start, requiring no extra software to manage; but
  • you only get (at most) as many cores as you have on your local machine; and
  • experiments run synchronously, meaning the program that runs them is locked out until they complete (this is especially inconvenient when using Jupyter).

For the ClusterLab:

  • you need to set up the cluster outside epyc; but
  • experiments run asynchronously, meaning you can get on with other things; and
  • you can use all the cores of all the machines you can get access to.

As a rule of thumb, a suite of experiments likely to take hours or days will be better run on a cluster; shorter campaigns can use local parallelism to get a useful speed-up.