Second tutorial: Parallel execution¶
epyc's main utility comes from being able to run experiments, like those we defined in
the first tutorial and ran on a single machine, on multicore machines and clusters of machines.
In this tutorial we'll explain how epyc manages parallel machines.
(If you know about parallel computing, then it'll be enough for you to know that epyc creates
a task farm of experiments across multiple cores. If this didn't make sense, then you
should first read Parallel processing concepts.)
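The task-farm idea can be sketched with Python's standard library alone. This is purely illustrative, not epyc's implementation: a pool of workers runs a stand-in "experiment" function at each point of a small parameter space.

```python
from concurrent.futures import ThreadPoolExecutor

def experiment(x):
    # stand-in for a real experiment: compute one result for one
    # point in the parameter space
    return {'x': x, 'result': x * x}

points = range(10)                        # the parameter space
with ThreadPoolExecutor(max_workers=4) as farm:
    # the pool farms the points out across the workers
    results = list(farm.map(experiment, points))

print(results[3])   # {'x': 3, 'result': 9}
```

Each worker repeatedly picks up a point, runs the experiment, and reports the result back: this is exactly the pattern epyc automates over cores and machines.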
Two ways to get parallelism¶
epyc arranges the execution of experiments around the Lab class, which handles
the execution of experiments across a parameter space. The default Lab executes
experiments sequentially.
But what if you have more than one core? – very common on modern workstations. Or if
you have access to a cluster of machines? Then epyc can make use of these resources
with no change to your experiment code.
If you have a multicore machine, the easiest way to use it with epyc is to replace
the Lab managing the experiments with a ParallelLab to get local parallelism.
This will execute experiments in parallel using the available cores. (You can limit
the number of cores used if you want to.) For example:
from epyc import ParallelLab, HDF5LabNotebook
nb = HDF5LabNotebook('mydata.h5', create=True)
lab = ParallelLab(nb, cores=-1) # leave one core free
e = MyExperiment()
lab['first'] = range(1, 1000)
lab.runExperiment(e)
On a machine with, say, 16 cores, this will use 15 of the cores to run experiments and return when they’re all finished.
If you have a cluster, things are a little more complicated, as you need to set up
some extra software to manage the cluster for you. Once that's done, though, accessing
the cluster from epyc is largely identical to accessing local parallelism.
Setting up a compute cluster¶
epyc doesn't actually implement parallel computing itself: instead it builds on top of
existing Python infrastructure for this purpose. The underlying library epyc uses is
called ipyparallel, which provides portable parallel processing on both multicore
machines and collections of machines.
Warning
Confusingly, there's also a system called PyParallel, which is a completely
different beast to ipyparallel.
epyc wraps up ipyparallel within the framework of experiments, labs, and notebooks,
so that, when using epyc, there's no need to interact directly with ipyparallel.
However, before we get to that stage we do need to set up the parallel compute cluster that
epyc will use, and (at present) this does require interacting to some degree with
ipyparallel's commands.
Setting up a cluster depends on what kind of cluster you have, and we’ll describe each one individually. It’s probably easiest to start with the simplest system to which you have access, and then – if and when you need more performance – move onto the more advanced systems.
Running experiments on a cluster¶
Having set up the cluster, of whatever kind, we can now let epyc run experiments on it.
This involves using a ClusterLab, which is simply a Lab that runs
experiments remotely on a cluster rather than locally on the machine with the lab.
We create a ClusterLab in the same way as "ordinary" labs:
clab = epyc.ClusterLab(profile = 'cluster',
notebook = epyc.JSONLabNotebook('our-work.json'))
The lab thus created will connect to the cluster described in the cluster profile
(which must already have been created and started).
A ClusterLab behaves like a Lab in most respects: we can set a
parameter space, run a set of experiments at each point in the space, and so forth.
But they differ in one important respect. Running an experiment in a Lab
is a synchronous process: when you call Lab.runExperiment() you wait until the
experiments finish before regaining control. That's fine for small cases, but what if
you want to run a huge computation? – many repetitions of experiments across
a large parameter space? That, after all, is the reason we want to do parallel computing:
to support large computations. It would be inconvenient, to say the least, if performing
such experiments locked up a computer for a long period.
ClusterLab differs from Lab by being asynchronous. When you
call ClusterLab.runExperiment(), the experiments are submitted to the cluster in one
go and control returns to your program: the computation happens "in the background"
on the cluster.
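The asynchronous pattern can be illustrated with standard-library futures. This is a sketch of the idea, not epyc's actual machinery: submission returns immediately, and the results are collected later.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_experiment(x):
    # simulate a long-running experiment
    time.sleep(0.1)
    return x * x

pool = ThreadPoolExecutor(max_workers=2)

# submitting returns immediately: the work happens "in the background"
futures = [pool.submit(slow_experiment, x) for x in range(4)]

# ...control is back with us here, free to do other things...

pool.shutdown(wait=True)                 # later: wait for everything to finish
print([f.result() for f in futures])     # [0, 1, 4, 9]
```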
So suppose we go back to our example of computing a curve. This wasn't a great example for a sequential lab, and it's monumentally unrealistic for parallel computation except as an example. We can set up the parameter space and run the experiments in parallel using the same syntax as before:
clab['x'] = numpy.linspace(-2 * numpy.pi, 2 * numpy.pi)
clab['y'] = numpy.linspace(-2 * numpy.pi, 2 * numpy.pi)
clab.runExperiment(CurveExperiment())
Control will return immediately, as the computation is spun up on the cluster.
How can we tell when we're finished? There are three ways. The first is to make the whole computation synchronous by waiting for it to finish:
clab.wait()
This will lock up your computer waiting for all the experiments to finish. That's not very flexible. We can instead test whether the computations have finished:
clab.ready()
which will return True when everything has finished. But that might take a long time,
and we might want to get results as they become available – for example to plot them
partially. We can see what fraction of experiments are finished using:
clab.readyFraction()
which returns a number between 0 and 1 indicating how far along we are.
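Counting completed jobs is all such a readiness fraction needs. A sketch of the idea over standard-library futures (`ready_fraction` is a hypothetical helper, not part of epyc):

```python
from concurrent.futures import ThreadPoolExecutor

def ready_fraction(futures):
    # fraction of submitted jobs that have completed, between 0 and 1
    done = sum(1 for f in futures if f.done())
    return done / len(futures)

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(lambda x: x + 1, x) for x in range(8)]
    # leaving the with-block waits for all the jobs to complete

print(ready_fraction(futures))   # 1.0 once all jobs have finished
```

Polling this fraction in a loop is how a program can, for example, redraw a partial plot as results trickle in.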
As results come in, they're stored in the lab's notebook and can be retrieved
as normal: as a list of dicts, as a DataFrame, and
so forth. As long as ClusterLab.ready() is returning False (and
ClusterLab.readyFraction() is therefore returning less than 1), there are
still "pending" results that will be filled in later. Each call to one of these
"query" methods synchronises the notebook with the results computed on the
cluster.
In fact ClusterLab has an additional trick up its sleeve, allowing
completely disconnected operation. But that's another topic.
Common problems with clusters¶
ipyparallel is a fairly basic cluster management system, but one that's adequate
for a lot of straightforward experiments. That means it sometimes needs tweaking to
work effectively, in ways that rely on you (the user) rather than being automated
as might be the case in a more advanced system.
The most common problem is one of overloading. This can occur for both multicore and multi-machine set-ups, and happens when the machine spends so long doing your experiments that it stops being able to do other work. While this may sound like a good thing – an efficient use of resources – some of that "other" work includes communicating with the cluster controller. It's possible for too many engines to crowd out something essential, which often manifests itself in one of two ways:
- You can't log in to the machine or run simple processes; or
- You can’t retrieve results.
The solution is actually quite straightforward: don’t run as much work! This can easily be done by, for example, always leaving one or two cores free on each machine you use: so an eight-core machine would run six engines, leaving two free for other things.
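The "leave a couple of cores free" rule is easy to encode when deciding how many engines to start. A sketch (the helper and its default are illustrative, not part of epyc or ipyparallel):

```python
import os

def engines_to_run(cores_free=2):
    # run one engine per core, minus a few cores left free for the
    # operating system and the cluster controller
    cores = os.cpu_count() or 1
    return max(1, cores - cores_free)

# on an eight-core machine this suggests running six engines
print(engines_to_run())
```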
Clusters versus local parallelism¶
You probably noticed that, if you have a single multicore workstation, there are two ways
to let epyc use it:
- a ParallelLab; or
- a ClusterLab that happens to only run engines locally.
There are pros and cons to each approach. For the ParallelLab we have:
- it’s very simple to start, requiring no extra software to manage; but
- you only get (at most) as many cores as you have on your local machine; and
- experiments run synchronously, meaning the program that runs them is locked out until they complete (this is especially inconvenient when using Jupyter).
For the ClusterLab:
- you need to set up the cluster outside epyc; but
- experiments run asynchronously, meaning you can get on with other things; and
- you can use all the cores of all the machines you can get access to.
As a rule of thumb, a suite of experiments likely to take hours or days will be better run on a cluster; shorter campaigns can use local parallelism to get a useful speed-up.