LabNotebook: A persistent store for results

class epyc.LabNotebook(name: str = '', description: str = None)

A “laboratory notebook” collecting together the results obtained from different sets of experiments. A notebook is composed of ResultSet objects, which are homogeneous collections of results of experiments performed at different values for the same set of parameters. Each result set is tagged for access, with the notebook using one result set as “current” at any time.

The notebook collects together pending results from all result sets so that they can be accessed uniformly. This is used by labs to resolve pending results if there are multiple sets of experiments running simultaneously.

Result sets are immutable: they can be added to and deleted from notebooks freely, but their contents cannot be changed.

Parameters:
  • name – (optional) the notebook name (may be meaningful for sub-classes)
  • description – (optional) a free text description

Metadata access

LabNotebook.name() → str

Return the name of the notebook. If the notebook is persistent, this likely relates to its storage in some way (for example a file name).

Returns:the notebook name or None
LabNotebook.description() → str

Return the free text description of the notebook.

Returns:the notebook description
LabNotebook.setDescription(d: str)

Set the free text description of the notebook.

Parameters:d – the description

Persistence

Notebooks may be persistent, storing results and metadata to disc. The default implementation is simply in-memory and volatile. Committing a notebook ensures its data is written-through to persistent storage (where applicable).

LabNotebook.isPersistent() → bool

By default notebooks are not persistent.

Returns:False
LabNotebook.commit()

Commit to persistent storage. By default does nothing. This should be called periodically to save intermediate results: it may happen automatically in some sub-classes, depending on their implementation.

With blocks

Notebooks support with blocks, like files. For persistent notebooks this will ensure that the notebook is committed. (For the default in-memory notebook this does nothing.)

LabNotebook.open()

Open and close the notebook using a with block. For persistent notebooks this will cause the notebook to be committed to persistent storage in a robust manner.

(See JSON file access and HDF5 file access for examples of this method in use.)

The with block approach is slightly more robust than the explicit use of LabNotebook.commit(), as the notebook will be committed even if exceptions are thrown while it is open, ensuring no changes are lost accidentally. However, notebooks are often held open for a long time while experiments are run and/or analysed, so the explicit commit can be more natural.
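The robustness argument can be illustrated without epyc itself. The following is a minimal sketch, assuming a hypothetical stand-in class SketchNotebook (not part of epyc) whose open() is a context manager that commits in a finally clause. It shows the behaviour described above, not how LabNotebook.open() is actually implemented.

```python
from contextlib import contextmanager


class SketchNotebook:
    """Hypothetical stand-in for a persistent notebook (not epyc code)."""

    def __init__(self):
        self.results = []   # in-memory results
        self.saved = []     # what has reached "persistent storage"

    def commit(self):
        # Write-through: copy the in-memory state to the backing store.
        self.saved = list(self.results)

    @contextmanager
    def open(self):
        try:
            yield self
        finally:
            # Runs whether the block exits normally or by exception.
            self.commit()


nb = SketchNotebook()
try:
    with nb.open():
        nb.results.append({'x': 1})
        raise RuntimeError('experiment failed part-way')
except RuntimeError:
    pass

# The result added before the failure was still committed.
print(nb.saved)
```

With an explicit commit() placed after the failing line, the result would have stayed in memory only.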

Result sets

Results are stored as ResultSet objects, each with a unique tag. The notebook allows them to be created, and to be selected to receive results. They can also be deleted altogether.

LabNotebook.addResultSet(tag: str, description: str = None) → epyc.resultset.ResultSet

Start a new experiment. This creates a new result set to hold the results, which will receive any results and notes.

Parameters:
  • tag – unique tag for this result set
  • description – (optional) free text description of the result set
Returns:the result set
LabNotebook.deleteResultSet(rs: Union[str, epyc.resultset.ResultSet])

Delete a result set. The default result set can’t be deleted: this ensures that a notebook always has at least one result set.

Parameters:rs – the result set or its tag
LabNotebook.resultSet(tag: str) → epyc.resultset.ResultSet

Return the tagged result set.

Parameters:tag – the tag
Returns:the result set
LabNotebook.resultSets() → List[str]

Return the tags for all the result sets in this notebook.

Returns:a list of result set tags
LabNotebook.keys() → List[str]

Return the result set tags in this notebook. The same as resultSets().

Returns:the result set tags
LabNotebook.numberOfResultSets() → int

Return the number of result sets in this notebook.

Returns:the number of result sets
LabNotebook.__len__() → int

Return the number of result sets in this notebook. Same as numberOfResultSets().

Returns:the number of result sets
LabNotebook.__contains__(tag: str) → bool

Tests if the given result set is contained in this notebook.

Parameters:tag – the result set tag
Returns:True if the result set exists
LabNotebook.resultSetTag(rs: epyc.resultset.ResultSet) → str

Return the tag associated with the given result set.

Parameters:rs – the result set
Returns:the tag
LabNotebook.current() → epyc.resultset.ResultSet

Return the current result set.

Returns:the result set
LabNotebook.currentTag() → str

Return the tag of the current result set.

Returns:the tag
LabNotebook.select(tag: str) → epyc.resultset.ResultSet

Select the given result set as current. Sub-classes may use this to manage memory, for example by swapping-out non-current result sets.

Parameters:tag – the tag
Returns:the result set

Conditional creation of result sets

Sometimes it’s useful to create a result set in an “all or nothing” fashion: if it already exists then do nothing.

LabNotebook.already(tag: str, description: str = None) → bool

Check whether a result set exists. If it does, select it and return True; if it doesn’t, add it and return False. This is a single-call combination of __contains__(), select(), and addResultSet() that’s useful for avoiding repeated computation.

Parameters:
  • tag – the result set tag
  • description – (optional) description if a result set is created
Returns:True if the set existed

Note

See the Lab.createWith() method for a more convenient way to use this function.
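The check-then-select-or-add behaviour can be sketched as follows. This is an illustration only: SketchNotebook and expensive_computation() are hypothetical stand-ins rather than epyc code, and the sketch assumes that a newly added result set becomes the current one.

```python
class SketchNotebook:
    """Hypothetical stand-in that tracks result sets by tag (not epyc code)."""

    def __init__(self):
        self.sets = {}        # tag -> list of results
        self.current = None   # tag of the current result set

    def already(self, tag):
        if tag in self.sets:
            self.current = tag   # exists: select it
            return True
        self.sets[tag] = []      # missing: add it...
        self.current = tag       # ...and (assumed) make it current
        return False


def expensive_computation():
    # Placeholder for running a real set of experiments.
    return [{'x': 1, 'y': 2}]


nb = SketchNotebook()
runs = 0
for _ in range(2):
    # Only compute when the result set did not exist beforehand.
    if not nb.already('sweep-1'):
        nb.sets[nb.current].extend(expensive_computation())
        runs += 1

print(runs)
```

Running the same script twice against a persistent notebook would follow the same pattern: the second run finds the tag and skips the computation.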

Result storage and access

Results are stored using the results dict structure of parameters, experimental results, and metadata. There may be many results dicts associated with each parameter point.

LabNotebook.addResult(results: Union[Dict[str, Dict[str, Any]], List[Dict[str, Dict[str, Any]]]], tag: str = None)

Add one or more results dicts to a result set (by default the current one). Each should be a results dict as returned from an instance of Experiment, containing metadata, parameters, and results.

The results may include one or more nested results dicts, for example as returned by RepeatedExperiment, whose results are a list of results at the same point in the parameter space. In this case the embedded results will themselves be unpacked and added.

One may also add a list of results dicts, in which case they will be added individually.

Any structure of results dicts that can’t be handled will raise a ResultsStructureException.

Parameters:
  • results – a results dict or collection of them
  • tag – (optional) result set to add to (defaults to the current result set)
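The unpacking rules above can be sketched in plain Python. This is a sketch under stated assumptions, not epyc's implementation: the key names 'metadata', 'parameters' and 'results' are assumed, flatten() and make_rc() are hypothetical helpers, and ValueError stands in for ResultsStructureException.

```python
# Assumed key names for the three parts of a results dict.
METADATA, PARAMETERS, RESULTS = 'metadata', 'parameters', 'results'


def flatten(rc):
    """Return the individual results dicts contained in rc."""
    if isinstance(rc, list):
        # A list of results dicts: handle each one individually.
        flat = []
        for r in rc:
            flat.extend(flatten(r))
        return flat
    if not (isinstance(rc, dict)
            and all(k in rc for k in (METADATA, PARAMETERS, RESULTS))):
        # Stand-in for ResultsStructureException.
        raise ValueError('not a results dict: {!r}'.format(rc))
    if isinstance(rc[RESULTS], list):
        # Nested results, e.g. repeated runs at one parameter point.
        return flatten(rc[RESULTS])
    return [rc]


def make_rc(params, results):
    # Hypothetical helper building a results dict with empty metadata.
    return {METADATA: {}, PARAMETERS: params, RESULTS: results}


single = make_rc({'a': 1}, {'y': 10})
repeated = make_rc({'a': 2}, [make_rc({'a': 2}, {'y': 20}),
                              make_rc({'a': 2}, {'y': 21})])

flat = flatten([single, repeated])
print([r[RESULTS]['y'] for r in flat])
```

The single result passes through unchanged, while the repeated one contributes its two embedded results.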

Results can be accessed in a number of ways: all together; as a pandas.DataFrame object for easier analysis; or as a list corresponding to a particular parameter point.

LabNotebook.numberOfResults(tag: str = None) → int

Return the number of results in the tagged dataset.

Parameters:tag – (optional) the result set tag (defaults to the current set)
Returns:the number of results
LabNotebook.results(tag: str = None) → List[Dict[str, Dict[str, Any]]]

Return results as a list of results dicts. If no tag is provided, use the current result set. This is a lot slower and more memory-hungry than using dataframe() (which is therefore to be preferred), but may be useful for small sets of results that need a more Pythonic interface than that provided by DataFrames. You can pre-filter the results dicts to those matching only some parameter combinations using resultsFor().

Parameters:tag – (optional) the tag of the result set (defaults to the currently selected result set)
Returns:the results dicts
LabNotebook.resultsFor(params: Dict[str, Any], tag: str = None) → List[Dict[str, Dict[str, Any]]]

Return results for the given parameter values as a list of results dicts. If no tag is provided, use the current result set. This is a lot slower and more memory-hungry than using dataframeFor() (which is therefore to be preferred), but may be useful for small sets of results that need a more Pythonic interface than that provided by DataFrames.

Parameters:
  • params – the experimental parameters
  • tag – (optional) the tag of the result set (defaults to the currently selected result set)
Returns:results dicts
LabNotebook.dataframe(tag: str = None, only_successful: bool = True) → pandas.core.frame.DataFrame

Return results as a pandas.DataFrame. If no tag is provided, use the current result set.

If the only_successful flag is set (the default), then the DataFrame will only include results that completed without an exception; if it is set to False, the DataFrame will include all results and also the exception details.

If you are only interested in results corresponding to some sets of parameters you can pre-filter the dataframe using dataframeFor().

Parameters:
  • tag – (optional) the tag of the result set (defaults to the currently selected result set)
  • only_successful – include only successful experiments (defaults to True)
Returns:the parameters, results, and metadata in a DataFrame
LabNotebook.dataframeFor(params: Dict[str, Any], tag: str = None, only_successful: bool = True) → pandas.core.frame.DataFrame

Return results for the given parameter values as a pandas.DataFrame. If no tag is provided, the current result set is queried. If the only_successful flag is set (the default), then the DataFrame will only include results that completed without an exception; if it is set to False, the DataFrame will include all results and also the exception details.

Parameters:
  • params – the experimental parameters
  • tag – (optional) the tag of the result set (defaults to the currently selected result set)
  • only_successful – include only successful experiments (defaults to True)
Returns:the parameters, results, and metadata in a DataFrame
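The parameter filtering offered by resultsFor() and dataframeFor() can be pictured with plain Python. The sketch below is an illustration rather than epyc's code: results_for() is a hypothetical helper, the 'parameters' key name is assumed, and it assumes that params may name only some of the parameters, in which case every result agreeing on those values is returned.

```python
def results_for(results, params):
    """Return the results dicts whose parameters match every value in params."""
    matching = []
    for rc in results:
        point = rc['parameters']   # assumed key name
        if all(k in point and point[k] == v for k, v in params.items()):
            matching.append(rc)
    return matching


results = [
    {'parameters': {'a': 1, 'b': 1}, 'results': {'y': 2}, 'metadata': {}},
    {'parameters': {'a': 1, 'b': 2}, 'results': {'y': 3}, 'metadata': {}},
    {'parameters': {'a': 2, 'b': 1}, 'results': {'y': 3}, 'metadata': {}},
]

at_a1 = results_for(results, {'a': 1})               # partial specification
at_point = results_for(results, {'a': 1, 'b': 2})    # one parameter point
print(len(at_a1), len(at_point))
```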

Pending results

Pending results allow a notebook to keep track of on-going experiments, and are used by some Lab sub-classes (for example ClusterLab) to manage submissions to a compute cluster. A pending result is identified by some unique identifier, typically a job id. Pending results can be resolved (have their results filled in) using LabNotebook.addResult(), or can be cancelled, which removes the record from the notebook but not from the lab managing the underlying job.

Since a notebook can have multiple result sets, the pending results interface is split into three parts. Firstly there are the operations on the currently-selected result set.

LabNotebook.addPendingResult(params: Dict[str, Any], jobid: str, tag: str = None)

Add a pending result for the given point in the parameter space under the given job identifier to the current result set. The identifier will generally be meaningful to the lab that submitted the request, and must be unique.

Parameters:
  • params – the experimental parameters
  • jobid – the job id
  • tag – (optional) the tag of the result set receiving the pending result (defaults to the current result set)
LabNotebook.numberOfPendingResults(tag: str = None) → int

Return the number of results pending in the tagged dataset.

Parameters:tag – (optional) the result set tag (defaults to the current set)
Returns:the number of results
LabNotebook.pendingResults(tag: str = None) → List[str]

Return the identifiers of the results pending in the tagged dataset.

Parameters:tag – (optional) the result set tag (defaults to the current set)
Returns:a list of job identifiers

Secondly, there are operations that work on any result set. You can resolve or cancel a pending result simply by knowing its job id and regardless of which is the currently selected result set.

LabNotebook.resolvePendingResult(rc: Dict[str, Dict[str, Any]], jobid: str)

Resolve the pending result with the given job id using the given results dict. The experimental parameters of the result are sanity-checked against what the result set expected for that job.

The result may not be pending within the current result set, but can be within any result set in the notebook. This will not affect the result set that is selected as current.

Parameters:
  • rc – the results dict
  • jobid – the job id
LabNotebook.cancelPendingResult(jobid: str)

Cancel the given pending result.

The result may not be pending within the current result set, but can be within any result set in the notebook. This will not affect the result set that is selected as current.

Parameters:jobid – the job id

You can also check whether there are pending results remaining in any result set, which defaults to the currently selected result set.

LabNotebook.ready(tag: str = None) → bool

Test whether the result set has no pending results remaining.

Parameters:tag – (optional) the result set tag (defaults to the current set)
Returns:True if all pending results have been resolved (or cancelled)
LabNotebook.readyFraction(tag: str = None) → float

Test what fraction of results are available in the tagged result set.

Parameters:tag – (optional) the result set tag (defaults to the current set)
Returns:the fraction of available results

Thirdly, there are operations that work on all result sets.

LabNotebook.allPendingResults() → Set[str]

Return the identifiers for all pending results in all result sets.

Returns:a set of job identifiers
LabNotebook.numberOfAllPendingResults() → int

Return the number of results pending in all result sets.

Returns:the total number of pending results
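The three groups of operations can be pictured with a small bookkeeping sketch. SketchNotebook and its method names below are hypothetical stand-ins, not epyc's implementation, and the 'parameters' key name is assumed. The point illustrated is that resolving a job finds the result set that owns it without changing which set is current.

```python
class SketchNotebook:
    """Hypothetical stand-in for pending-result bookkeeping (not epyc code)."""

    def __init__(self, tags):
        self.results = {t: [] for t in tags}   # tag -> resolved results dicts
        self.pending = {t: {} for t in tags}   # tag -> {jobid: params}
        self.current = tags[0]

    def add_pending(self, params, jobid, tag=None):
        self.pending[tag or self.current][jobid] = params

    def _owner(self, jobid):
        # Find the result set holding this job, whichever one is current.
        for tag, jobs in self.pending.items():
            if jobid in jobs:
                return tag
        raise KeyError(jobid)

    def resolve(self, rc, jobid):
        tag = self._owner(jobid)
        # Sanity-check the parameters before accepting the result.
        if rc['parameters'] != self.pending[tag][jobid]:
            raise ValueError('parameters do not match job ' + jobid)
        del self.pending[tag][jobid]
        self.results[tag].append(rc)

    def cancel(self, jobid):
        del self.pending[self._owner(jobid)][jobid]

    def ready(self, tag=None):
        return len(self.pending[tag or self.current]) == 0

    def all_pending(self):
        return {j for jobs in self.pending.values() for j in jobs}


nb = SketchNotebook(['first', 'second'])
nb.add_pending({'a': 1}, 'job-1')                 # goes to the current set
nb.add_pending({'a': 2}, 'job-2', tag='second')   # goes to a named set

# Resolve by job id alone; 'first' remains the current result set.
nb.resolve({'metadata': {}, 'parameters': {'a': 2}, 'results': {'y': 4}}, 'job-2')

print(nb.current, nb.ready(), nb.ready('second'), nb.all_pending())
```

Here the 'second' result set is ready once job-2 is resolved, while 'first' still waits on job-1.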

Locking the notebook

Locking a notebook prevents further updates: result sets cannot be added, all pending results are cancelled, and all individual result sets are locked. Locking is preserved for persistent notebooks, so once locked a notebook is locked forever.

LabNotebook.finish(commit: bool = True)

Mark the entire notebook as finished, closing and locking all result sets against further changes. Finishing a persistent notebook commits it.

By default the finished notebook is committed as such. In certain cases it may be desirable to finish the notebook but not commit it, i.e., to stop updates in memory without changing the backing file. Setting commit=False will accomplish this.

Parameters:commit – (optional) commit the notebook (defaults to True)
LabNotebook.isLocked() → bool

Returns true if the notebook is locked.

Returns:True if the notebook is locked