Lab: An environment for running experiments

class epyc.Lab(notebook: epyc.labnotebook.LabNotebook = None, design: epyc.design.Design = None)

A laboratory for computational experiments.

A Lab conducts an experiment at different points in a multi-dimensional parameter space. The default performs all the experiments locally; sub-classes exist to perform remote parallel experiments.

A Lab stores its results in a notebook, an instance of LabNotebook. By default the base Lab class uses an in-memory notebook, essentially just a dict; sub-classes use persistent notebooks to manage larger sets of experiments.

Each lab has an associated Design that turns a set of parameter ranges into a set of individual “points” of the parameter space at which to perform actual experiments. The default is to use a FactorialDesign that performs an experiment for every combination of parameter values. This might be a lot of experiments, and other designs can be used to reduce or modify the space.

Parameters:
  • notebook – the notebook used to store results (defaults to an empty LabNotebook)
  • design – the experimental design to use (defaults to a FactorialDesign)

Lab creation and management

Lab.__init__(notebook: epyc.labnotebook.LabNotebook = None, design: epyc.design.Design = None)

Create a lab with the given notebook and experimental design. Both are optional and take the defaults described above.

Lab.open()

Open a lab for business. Sub-classes might insist that they are opened and closed explicitly when experiments are being performed. The default does nothing.

Lab.close()

Shut down a lab. Sub-classes might insist that they are opened and closed explicitly when experiments are being performed. The default does nothing.

Lab.updateResults()

Update the lab’s results. This method is called by all other methods that return results in some sense, and may be overridden to let the results “catch up” with external processing. The default does nothing.

Parameter management

A Lab is equipped with a multi-dimensional parameter space over which to run experiments, one experiment per point. The dimensions of the space can be defined by single values, lists, or iterators that give the points along that dimension. Strings are considered to be single values, even though they’re technically iterable in Python. Experiments are then conducted on the cross product of the dimensions.
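The way ranges turn into points can be pictured with a small standard-library sketch. This is not epyc's implementation: `as_range` and `cross_product` are hypothetical helper names, and the parameter names are made up for illustration.

```python
from itertools import product
from typing import Any, Dict, List


def as_range(r: Any) -> List[Any]:
    """Normalise a parameter range to a list of values (hypothetical helper)."""
    # Strings are iterable, but are treated as single values.
    if isinstance(r, str):
        return [r]
    try:
        return list(r)
    except TypeError:
        # Not iterable, so treat it as a single value.
        return [r]


def cross_product(space: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one point (a dict of name -> value) per combination of values."""
    names = list(space)
    ranges = [as_range(space[n]) for n in names]
    return [dict(zip(names, values)) for values in product(*ranges)]


space = {"beta": [0.1, 0.2, 0.3], "trials": range(2), "model": "SIR"}
points = cross_product(space)
print(len(points))   # 3 * 2 * 1 = 6
print(points[0])     # {'beta': 0.1, 'trials': 0, 'model': 'SIR'}
```

The number of points is the product of the sizes of the dimensions, so adding one long range multiplies the number of experiments.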

Lab.addParameter(k: str, r: Any)

Add a parameter to the experiment’s parameter space. k is the parameter name, and r is its range. The range can be a single value or a list, or any other iterable. (Strings are counted as single values.)

Parameters:
  • k – parameter name
  • r – parameter range

Lab.parameters() → List[str]

Return a list of parameter names.

Returns: a list of parameter names

Lab.__len__() → int

The length of an experiment is the total number of data points that will be explored. This is the length of the experimental configuration returned by experiments().

Returns: the number of experimental runs

Lab.__getitem__(k: str) → Any

Access a parameter range using array notation.

Parameters: k – parameter name
Returns: the parameter range

Lab.__setitem__(k: str, r: Any)

Add a parameter using array notation.

Parameters:
  • k – the parameter name
  • r – the parameter range

Parameters can be dropped, either individually or en masse, to prepare the lab for another experiment. This will often accompany creating or selecting a new result set in the LabNotebook.

Lab.__delitem__(k: str)

Delete a parameter using array notation.

Parameters: k – the key

Lab.deleteParameter(k: str)

Delete a parameter from the parameter space. If the parameter doesn’t exist then this is a no-op.

Parameters: k – the parameter name

Lab.deleteAllParameters()

Delete all parameters from the parameter space.
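The relationship between the named methods and array notation can be sketched with a toy class. `ParameterSpace` is a hypothetical stand-in, not epyc code, and it assumes that array notation simply delegates to the named methods, so that `del` on a missing parameter is also a no-op.

```python
from typing import Any, Dict, List


class ParameterSpace:
    """Toy stand-in for the parameter-handling part of a lab (hypothetical)."""

    def __init__(self) -> None:
        self._parameters: Dict[str, Any] = {}

    def addParameter(self, k: str, r: Any) -> None:
        self._parameters[k] = r

    def parameters(self) -> List[str]:
        return list(self._parameters)

    def deleteParameter(self, k: str) -> None:
        # Deleting a missing parameter is a no-op rather than an error.
        self._parameters.pop(k, None)

    def deleteAllParameters(self) -> None:
        self._parameters.clear()

    # Array notation delegates to the named methods (an assumption here).
    def __getitem__(self, k: str) -> Any:
        return self._parameters[k]

    def __setitem__(self, k: str, r: Any) -> None:
        self.addParameter(k, r)

    def __delitem__(self, k: str) -> None:
        self.deleteParameter(k)


space = ParameterSpace()
space["a"] = [1, 2, 3]            # same as space.addParameter("a", [1, 2, 3])
space["b"] = "fixed"
del space["a"]                    # same as space.deleteParameter("a")
space.deleteParameter("missing")  # no-op
print(space.parameters())         # ['b']
```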

Building the parameter space

The parameter ranges defined above need to be translated into “points” in the parameter space at which to conduct experiments. This function is delegated to the experimental design, an instance of Design, which turns ranges into points. The design is provided at construction time: by default a FactorialDesign is used, and this will be adequate for most use cases.

Lab.design() → epyc.design.Design

Return the experimental design this lab uses.

Returns: the design

Lab.experiments(e: epyc.experiment.Experiment) → List[Tuple[epyc.experiment.Experiment, Dict[str, Any]]]

Return the experimental configuration, a list consisting of experiments and the points at which they should be run. The structure of the experimental space is defined by the lab’s experimental design, which may also change the experiment to be run.

Parameters: e – the experiment
Returns: an experimental configuration
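The shape of an experimental configuration, and the effect of swapping the design, can be sketched as follows. Everything here is hypothetical and independent of epyc: `pointwise` is an illustrative design that pairs up corresponding values instead of taking every combination, and `configuration` stands in for the pairing that experiments() returns.

```python
from typing import Any, Callable, Dict, List, Tuple

Point = Dict[str, Any]
Design = Callable[[Dict[str, List[Any]]], List[Point]]


def pointwise(ranges: Dict[str, List[Any]]) -> List[Point]:
    """Hypothetical design: combine the i-th value of every range."""
    lengths = {len(r) for r in ranges.values()}
    if len(lengths) > 1:
        raise ValueError("pointwise design needs ranges of equal length")
    names = list(ranges)
    return [dict(zip(names, values)) for values in zip(*ranges.values())]


def configuration(e: Any, ranges: Dict[str, List[Any]],
                  design: Design) -> List[Tuple[Any, Point]]:
    """Pair the experiment with each point chosen by the design."""
    return [(e, p) for p in design(ranges)]


experiment = object()   # placeholder for an experiment instance
config = configuration(experiment, {"x": [1, 2, 3], "y": [10, 20, 30]}, pointwise)
print(len(config))      # 3 points, rather than the 9 of a full cross product
print(config[1][1])     # {'x': 2, 'y': 20}
```

This sketch pairs the same experiment object with every point; a real design may substitute a different experiment, as noted above.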

Running experiments

Running experiments involves providing an Experiment object, which is executed by setting its parameter point (using Experiment.set()) and then running it (by calling Experiment.run()). The Lab co-ordinates the running of the experiment at all the points chosen by the design.
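The set-then-run cycle can be sketched with a toy experiment. `ToyExperiment` and `run_all` are hypothetical and only mimic the two calls named above; the layout of the returned dict (separate "parameters" and "results" sections) is an assumption made for illustration.

```python
from typing import Any, Dict, List


class ToyExperiment:
    """Hypothetical experiment exposing set() and run()."""

    def __init__(self) -> None:
        self._params: Dict[str, Any] = {}

    def set(self, params: Dict[str, Any]) -> None:
        # Take a copy so that each run keeps its own parameter point.
        self._params = dict(params)

    def run(self) -> Dict[str, Dict[str, Any]]:
        total = self._params["x"] + self._params["y"]
        # Assumed layout: parameters and results in separate sub-dicts.
        return {"parameters": self._params, "results": {"total": total}}


def run_all(e: ToyExperiment, points: List[Dict[str, Any]]) -> List[Dict[str, Dict[str, Any]]]:
    """Run the experiment at every point, collecting the results in order."""
    notebook = []
    for p in points:
        e.set(p)
        notebook.append(e.run())
    return notebook


notebook = run_all(ToyExperiment(), [{"x": 1, "y": 2}, {"x": 3, "y": 4}])
print([r["results"]["total"] for r in notebook])   # [3, 7]
```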

Lab.runExperiment(e: epyc.experiment.Experiment)

Run an experiment over all the points in the parameter space. The results will be stored in the notebook.

Parameters: e – the experiment

Lab.ready(tag: str = None) → bool

Test whether all the results are ready in the tagged result set – that is, none are pending.

Parameters: tag – (optional) the result set to check (default is the current result set)
Returns: True if the results are in

Lab.readyFraction(tag: str = None) → float

Return the fraction of results available (not pending) in the tagged result set after first updating the results.

Parameters: tag – (optional) the result set to check (default is the current result set)
Returns: the ready fraction
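The relationship between the two tests can be sketched in a few lines. The functions below are hypothetical and work on plain lists of completed and pending results rather than on a notebook; treating an empty result set as fully ready is an assumption of this sketch, not documented behaviour.

```python
from typing import Any, List


def ready_fraction(completed: List[Any], pending: List[Any]) -> float:
    """Fraction of results that are available (not pending)."""
    total = len(completed) + len(pending)
    if total == 0:
        return 1.0   # assumption: nothing outstanding counts as ready
    return len(completed) / total


def ready(completed: List[Any], pending: List[Any]) -> bool:
    """All results are in exactly when nothing is pending."""
    return len(pending) == 0


print(ready_fraction(["r1", "r2", "r3"], ["r4"]))   # 0.75
print(ready(["r1", "r2", "r3"], ["r4"]))            # False
```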

Conditional experiments

Sometimes it is useful to run experiments conditionally, for example to create a result set only if it doesn’t already exist. Lab can do this by providing a function to execute in order to populate a result set.

Note

This technique works especially well with Jupyter notebooks, to avoid re-computing some cells. See Avoiding repeated computation.

Lab.createWith(tag: str, f: Callable[[Lab], bool], description: str = None, propagate: bool = True, delete: bool = True, finish: bool = False, deleteAllParameters: bool = True)

Use a function to create a result set.

If the result set already exists in the lab’s notebook, it is selected; if it doesn’t, it is created, selected, and the creation function is called. The creation function is passed a reference to the lab it is populating.

By default any exception in the creation function will cause the incomplete result set to be deleted and the previously current result set to be re-selected: this can be inhibited by setting delete=False. Any raised exception is propagated by default: this can be inhibited by setting propagate=False. The result set can be locked after creation by setting finish=True, as long as the creation was successful: poorly-created result sets aren’t locked.

By default the lab has its parameters cleared before calling the creation function, so that creation starts “clean”. Set deleteAllParameters=False to inhibit this.

Parameters:
  • tag – the result set tag
  • f – the creation function (taking Lab as argument)
  • description – (optional) description if a result set is created
  • propagate – (optional) propagate any exception (defaults to True)
  • delete – (optional) delete on exception (default is True)
  • finish – (optional) lock the result set after creation (defaults to False)
  • deleteAllParameters – (optional) delete all lab parameters before creation (defaults to True)
Returns: True if the result set exists already or was properly created
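The control flow described above can be sketched with a toy lab that keeps its result sets in a dict. `ToyLab` and its attributes are hypothetical, the description argument is omitted, and the return value of the creation function is ignored; none of this is taken from epyc's source.

```python
from typing import Any, Callable, Dict, List, Set


class ToyLab:
    """Hypothetical lab holding result sets in a dict keyed by tag."""

    def __init__(self) -> None:
        self.result_sets: Dict[str, List[Any]] = {"default": []}
        self.locked: Set[str] = set()
        self.current = "default"
        self.parameters: Dict[str, Any] = {}

    def createWith(self, tag: str, f: Callable[["ToyLab"], Any],
                   propagate: bool = True, delete: bool = True,
                   finish: bool = False, deleteAllParameters: bool = True) -> bool:
        if tag in self.result_sets:
            self.current = tag            # already exists: just select it
            return True
        previous = self.current
        self.result_sets[tag] = []        # create and select the new set
        self.current = tag
        try:
            if deleteAllParameters:
                self.parameters.clear()   # start from a clean parameter space
            f(self)                       # return value ignored in this sketch
        except Exception:
            if delete:
                del self.result_sets[tag]
                self.current = previous   # re-select the previous set
            if propagate:
                raise
            return False
        if finish:
            self.locked.add(tag)          # lock only after a successful creation
        return True


def populate(lab: ToyLab) -> None:
    lab.parameters["x"] = [1, 2, 3]
    lab.result_sets[lab.current].append({"x": 1})


def broken(lab: ToyLab) -> None:
    raise RuntimeError("creation failed")


lab = ToyLab()
first = lab.createWith("sweep", populate, finish=True)
again = lab.createWith("sweep", broken)                  # exists, so broken() is never called
failed = lab.createWith("bad", broken, propagate=False)
print(first, again, failed)      # True True False
print(lab.current)               # sweep
print(sorted(lab.result_sets))   # ['default', 'sweep']
```

Because an existing tag short-circuits the call, re-running the same cell or script selects the stored result set instead of repeating the computation.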

Accessing results

Results of experiments can be accessed directly, via the lab’s underlying LabNotebook, or as a DataFrame from the pandas analysis package.

Lab.notebook() → epyc.labnotebook.LabNotebook

Return the notebook being used by this lab.

Returns: the notebook

Lab.results() → List[Dict[str, Dict[str, Any]]]

Return the current results as a list of results dicts after resolving any pending results that have completed. This makes use of the underlying notebook’s current result set. For finer control, access the notebook’s LabNotebook.results() or LabNotebook.resultsFor() methods directly.

Note that this approach to acquiring results is a lot slower and more memory-hungry than using dataframe(), but may be useful for small sets of results that benefit from a more Pythonic interface.

Lab.dataframe(only_successful: bool = True) → pandas.core.frame.DataFrame

Return the current results as a pandas DataFrame after resolving any pending results that have completed. This makes use of the underlying notebook’s current result set. For finer control, access the notebook’s LabNotebook.dataframe() or LabNotebook.dataframeFor() methods directly.

Parameters: only_successful – only return successful results
Returns: the resulting dataset as a DataFrame
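The difference between the two views of the results can be pictured as a flattening step: nested results dicts become flat rows, one per experiment, optionally keeping only the successful ones. The sketch below uses only the standard library. The section names ("parameters", "metadata", "results") and the "status" flag are assumptions made for illustration, and `flatten` is a hypothetical helper, not an epyc function.

```python
from typing import Any, Dict, List

ResultsDict = Dict[str, Dict[str, Any]]


def flatten(results: List[ResultsDict],
            only_successful: bool = True) -> List[Dict[str, Any]]:
    """Turn nested results dicts into flat rows, one per experiment."""
    rows = []
    for rc in results:
        # Assumed convention: metadata carries a boolean success flag.
        if only_successful and not rc["metadata"].get("status", False):
            continue
        row: Dict[str, Any] = {}
        for section in ("parameters", "metadata", "results"):
            row.update(rc.get(section, {}))
        rows.append(row)
    return rows


results = [
    {"parameters": {"x": 1}, "metadata": {"status": True}, "results": {"y": 2}},
    {"parameters": {"x": 2}, "metadata": {"status": False}, "results": {}},
]
rows = flatten(results)
print(rows)                                         # [{'x': 1, 'status': True, 'y': 2}]
print(len(flatten(results, only_successful=False)))  # 2
```

Rows of this shape, one flat dict per experiment, correspond to the rows of a table with one column per key.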