LabNotebook: A persistent store for results
-
class
epyc.
LabNotebook
(name: str = '', description: str = None)¶ A “laboratory notebook” collecting together the results obtained from different sets of experiments. A notebook is composed of
ResultSet
objects, which are homogeneous collections of results of experiments performed at different values for the same set of parameters. Each result set is tagged for access, with the notebook using one result set as “current” at any time.
The notebook collects together pending results from all result sets so that they can be accessed uniformly. This is used by labs to resolve pending results if there are multiple sets of experiments running simultaneously.
Result sets can be added to and deleted from notebooks freely, but they are immutable: their contents cannot be changed.
Parameters: - name – (optional) the notebook name (may be meaningful for sub-classes)
- description – (optional) a free text description
Metadata access
-
LabNotebook.
name
() → str¶ Return the name of the notebook. If the notebook is persistent, this likely relates to its storage in some way (for example a file name).
Returns: the notebook name or None
-
LabNotebook.
description
() → str¶ Return the free text description of the notebook.
Returns: the notebook description
-
LabNotebook.
setDescription
(d: str)¶ Set the free text description of the notebook.
Parameters: d – the description
Persistence
Notebooks may be persistent, storing results and metadata to disc. The default implementation is simply in-memory and volatile. Committing a notebook ensures its data is written-through to persistent storage (where applicable).
-
LabNotebook.
isPersistent
() → bool¶ By default notebooks are not persistent.
Returns: False
-
LabNotebook.
commit
()¶ Commit to persistent storage. By default does nothing. This should be called periodically to save intermediate results: it may happen automatically in some sub-classes, depending on their implementation.
With blocks
Notebooks support with
blocks, like files. For persistent notebooks
this will ensure that the notebook is committed. (For the default in-memory
notebook this does nothing.)
-
LabNotebook.
open
()¶ Open and close the notebook using a
with
block. For persistent notebooks this will cause the notebook to be committed to persistent storage in a robust manner.
(See JSON file access and HDF5 file access for examples of this method in use.)
The with
block approach is slightly more robust than the explicit
use of LabNotebook.commit()
as the notebook will be committed
even if exceptions are thrown while it is open, ensuring no changes
are lost accidentally. However notebooks are often held open for a
long time while experiments are run and/or analysed, so the explicit
commit can be more natural.
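The commit-on-exit behaviour described above can be sketched with a toy stand-in. `ToyNotebook` and its `commits` counter are hypothetical and are not epyc's implementation; only the method names `open()` and `commit()` are taken from the text above.

```python
from contextlib import contextmanager


class ToyNotebook:
    """Hypothetical stand-in for a persistent notebook (not epyc's code)."""

    def __init__(self):
        self.commits = 0  # counts how often commit() has run

    def commit(self):
        # A real persistent notebook would write to disc here.
        self.commits += 1

    @contextmanager
    def open(self):
        try:
            yield self
        finally:
            # Runs on normal exit and when an exception escapes the block.
            self.commit()


nb = ToyNotebook()
try:
    with nb.open():
        raise RuntimeError("experiment failed")
except RuntimeError:
    pass  # the notebook was still committed on the way out
```

The `try`/`finally` is what makes the `with` form more robust than a trailing explicit `commit()` call, which would be skipped when the exception propagates.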
Result sets
Results are stored as ResultSet
objects, each with a unique tag.
The notebook allows them to be created, and to be selected to receive
results.
They can also be deleted altogether.
-
LabNotebook.
addResultSet
(tag: str, description: str = None) → epyc.resultset.ResultSet¶ Start a new experiment. This creates a new result set to hold the results, which will receive any results and notes.
Parameters: - tag – unique tag for this result set
- description – (optional) free text description of the result set
Returns: the result set
-
LabNotebook.
deleteResultSet
(rs: Union[str, epyc.resultset.ResultSet])¶ Delete a result set. The default result set can’t be deleted: this ensures that a notebook always has at least one result set.
Parameters: rs – the result set or its tag
-
LabNotebook.
resultSet
(tag: str) → epyc.resultset.ResultSet¶ Return the tagged result set.
Parameters: tag – the tag Returns: the result set
-
LabNotebook.
resultSets
() → List[str]¶ Return the tags for all the result sets in this notebook.
Returns: a list of result set tags
-
LabNotebook.
keys
() → List[str]¶ Return the result set tags in this notebook. The same as resultSets().
Returns: the result set tags
-
LabNotebook.
numberOfResultSets
() → int¶ Return the number of result sets in this notebook.
Returns: the number of result sets
-
LabNotebook.
__len__
() → int¶ Return the number of result sets in this notebook. Same as numberOfResultSets().
Returns: the number of result sets
-
LabNotebook.
__contains__
(tag: str) → bool¶ Tests if the given result set is contained in this notebook.
Parameters: tag – the result set tag Returns: True if the result set exists
-
LabNotebook.
resultSetTag
(rs: epyc.resultset.ResultSet) → str¶ Return the tag associated with the given result set.
Parameters: rs – the result set Returns: the tag
-
LabNotebook.
current
() → epyc.resultset.ResultSet¶ Return the current result set.
Returns: the result set
-
LabNotebook.
currentTag
() → str¶ Return the tag of the current result set.
Returns: the tag
-
LabNotebook.
select
(tag: str) → epyc.resultset.ResultSet¶ Select the given result set as current. Sub-classes may use this to manage memory, for example by swapping-out non-current result sets.
Parameters: tag – the tag Returns: the result set
Conditional creation of result sets
Sometimes it’s useful to create a result set in an “all or nothing” fashion: if it already exists then do nothing.
-
LabNotebook.
already
(tag: str, description: str = None) → bool¶ Check whether a result set exists. If it does, select it and return True; if it doesn’t, add it and return False. This is a single-call combination of __contains__() and select() that’s useful for avoiding repeated computation.
Parameters: - tag – the result set tag
- description – (optional) description if a result set is created
Returns: True if the set existed
Note
See the Lab.createWith()
method for a more convenient way to
use this function.
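The "all or nothing" pattern can be sketched as follows. `ToyNotebook` is a hypothetical stand-in that models only tags and a current selection; it assumes, as the description of addResultSet() suggests, that a newly added result set becomes current. It is not epyc's implementation.

```python
class ToyNotebook:
    """Hypothetical stand-in with only tags and a current selection."""

    def __init__(self):
        self._sets = {}       # tag -> list of results
        self._current = None  # tag of the current result set

    def __contains__(self, tag):
        return tag in self._sets

    def select(self, tag):
        self._current = tag

    def addResultSet(self, tag):
        self._sets[tag] = []
        self._current = tag  # assumption: a new set becomes current

    def already(self, tag):
        # Select and report True if present; otherwise create and report False.
        if tag in self:
            self.select(tag)
            return True
        self.addResultSet(tag)
        return False


nb = ToyNotebook()
computed = 0
for _ in range(2):
    if not nb.already("sweep-1"):
        computed += 1  # expensive experiments would run here, once only
```

On the second pass the result set already exists, so the guarded work is skipped.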
Result storage and access
Results are stored using the results dict structure of parameters, experimental results, and metadata. There may be many results dicts associated with each parameter point.
-
LabNotebook.
addResult
(results: Union[Dict[str, Dict[str, Any]], List[Dict[str, Dict[str, Any]]]], tag: str = None)¶ Add one or more results dicts to the current result set. Each should be a results dict as returned from an instance of Experiment, containing metadata, parameters, and results.
The results may include one or more nested results dicts, for example as returned by RepeatedExperiment, whose results are a list of results at the same point in the parameter space. In this case the embedded results will themselves be unpacked and added.
One may also add a list of results dicts, in which case they will be added individually.
Any structure of results dicts that can’t be handled will raise a ResultsStructureException.
Parameters: - results – a results dict or collection of them
- tag – (optional) result set to add to (defaults to the current result set)
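The unpacking of lists and nested results dicts described above might look roughly like this sketch. The key names `"parameters"`, `"results"`, and `"metadata"`, the `flatten()` helper, and the `StructureError` class are assumptions made for illustration; they are not taken from epyc's source.

```python
from typing import Any, Dict, List, Union

# Assumed key names for the three parts of a results dict.
PARAMETERS, RESULTS, METADATA = "parameters", "results", "metadata"

ResultsDict = Dict[str, Any]


class StructureError(Exception):
    """Hypothetical stand-in for ResultsStructureException."""


def is_results_dict(rc: Any) -> bool:
    return isinstance(rc, dict) and all(k in rc for k in (PARAMETERS, RESULTS, METADATA))


def flatten(rc: Union[ResultsDict, List[Any]]) -> List[ResultsDict]:
    """Unpack lists and nested results dicts into a flat list."""
    if isinstance(rc, list):
        flat: List[ResultsDict] = []
        for r in rc:
            flat.extend(flatten(r))
        return flat
    if not is_results_dict(rc):
        raise StructureError(f"not a results dict: {rc!r}")
    inner = rc[RESULTS]
    if isinstance(inner, list) and inner and all(is_results_dict(r) for r in inner):
        # Nested case: the results are themselves results dicts.
        return flatten(inner)
    return [rc]


def point(k: int, total: int) -> ResultsDict:
    """Build a single (non-nested) results dict for the example."""
    return {PARAMETERS: {"k": k}, RESULTS: {"total": total}, METADATA: {}}


# A repeated experiment at k=1 with two embedded results, plus one plain result.
repeated = {PARAMETERS: {"k": 1}, RESULTS: [point(1, 10), point(1, 12)], METADATA: {}}
flat = flatten([repeated, point(2, 7)])
```

Here the two embedded results and the single plain result end up as three individual entries, and anything that is not a results dict is rejected.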
Results can be accessed in a number of ways: all together; as a
pandas.DataFrame
object for easier analysis; or as a list
corresponding to a particular parameter point.
-
LabNotebook.
numberOfResults
(tag: str = None) → int¶ Return the number of results in the tagged dataset.
Parameters: tag – (optional) the result set tag (defaults to the current set) Returns: the number of results
-
LabNotebook.
results
(tag: str = None) → List[Dict[str, Dict[str, Any]]]¶ Return results as a list of results dicts. If no tag is provided, use the current result set. This is a lot slower and more memory-hungry than using dataframe() (which is therefore to be preferred), but may be useful for small sets of results that need a more Pythonic interface than that provided by DataFrames. You can pre-filter the results dicts to those matching only some parameter combinations using resultsFor().
Parameters: tag – (optional) the tag of the result set (defaults to the currently selected result set) Returns: the results dicts
-
LabNotebook.
resultsFor
(params: Dict[str, Any], tag: str = None) → List[Dict[str, Dict[str, Any]]]¶ Return results for the given parameter values as a list of results dicts. If no tag is provided, use the current result set. This is a lot slower and more memory-hungry than using dataframeFor() (which is therefore to be preferred), but may be useful for small sets of results that need a more Pythonic interface than that provided by DataFrames.
Parameters: - params – the experimental parameters
- tag – (optional) the tag of the result set (defaults to the current result set)
Returns: results dicts
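As an illustration of filtering by parameter values, the sketch below works on a plain list of results dicts. The `results_for()` helper and the `"parameters"` key name are assumptions, as is the choice to match on a subset of the parameters; this is not epyc's implementation.

```python
from typing import Any, Dict, List

ResultsDict = Dict[str, Dict[str, Any]]


def results_for(results: List[ResultsDict], params: Dict[str, Any]) -> List[ResultsDict]:
    """Keep the results dicts whose parameters match every given value."""
    return [
        rc for rc in results
        if all(k in rc["parameters"] and rc["parameters"][k] == v
               for k, v in params.items())
    ]


results = [
    {"parameters": {"n": 10, "p": 0.1}, "results": {"mean": 1.2}, "metadata": {}},
    {"parameters": {"n": 10, "p": 0.2}, "results": {"mean": 2.3}, "metadata": {}},
    {"parameters": {"n": 20, "p": 0.1}, "results": {"mean": 1.1}, "metadata": {}},
]

# Only some parameters need to be given; the rest are left unconstrained.
matching = results_for(results, {"n": 10})
```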
-
LabNotebook.
dataframe
(tag: str = None, only_successful: bool = True) → pandas.core.frame.DataFrame¶ Return results as a pandas.DataFrame. If no tag is provided, use the current result set.
If the only_successful flag is set (the default), then the DataFrame will only include results that completed without an exception; if it is set to False, the DataFrame will include all results and also the exception details.
If you are only interested in results corresponding to some sets of parameters you can pre-filter the dataframe using dataframeFor().
Parameters: - tag – (optional) the tag of the result set (defaults to the currently selected result set)
- only_successful – include only successful experiments (defaults to True)
Returns: the parameters, results, and metadata in a DataFrame
-
LabNotebook.
dataframeFor
(params: Dict[str, Any], tag: str = None, only_successful: bool = True) → pandas.core.frame.DataFrame¶ Return results for the given parameter values as a pandas.DataFrame. If no tag is provided, the current result set is queried. If the only_successful flag is set (the default), then the DataFrame will only include results that completed without an exception; if it is set to False, the DataFrame will include all results and also the exception details.
Parameters: - params – the experimental parameters
- tag – (optional) the tag of the result set (defaults to the currently selected result set)
- only_successful – include only successful experiments (defaults to True)
Returns: the parameters, results, and metadata in a DataFrame
Pending results
Pending results allow a notebook to keep track of on-going
experiments, and are used by some Lab
sub-classes (for
example ClusterLab
) to manage submissions to a compute
cluster. A pending result is identified by some unique identifier,
typically a job id. Pending results can be resolved (have their
results filled in) using LabNotebook.addResult()
, or can be
cancelled, which removes the record from the notebook but not from
the lab managing the underlying job.
Since a notebook can have multiple result sets, the pending results interface is split into three parts. Firstly there are the operations on the currently-selected result set.
-
LabNotebook.
addPendingResult
(params: Dict[str, Any], jobid: str, tag: str = None)¶ Add a pending result for the given point in the parameter space under the given job identifier to the current result set. The identifier will generally be meaningful to the lab that submitted the request, and must be unique.
Parameters: - params – the experimental parameters
- jobid – the job id
- tag – (optional) the tag of the result set receiving the pending result (defaults to the current result set)
-
LabNotebook.
numberOfPendingResults
(tag: str = None) → int¶ Return the number of results pending in the tagged dataset.
Parameters: tag – (optional) the result set tag (defaults to the current set) Returns: the number of pending results
-
LabNotebook.
pendingResults
(tag: str = None) → List[str]¶ Return the identifiers of the results pending in the tagged dataset.
Parameters: tag – (optional) the result set tag (defaults to the current set) Returns: a list of job identifiers
Secondly, there are operations that work on any result set. You can resolve or cancel a pending result simply by knowing its job id and regardless of which is the currently selected result set.
-
LabNotebook.
resolvePendingResult
(rc: Dict[str, Dict[str, Any]], jobid: str)¶ Resolve the pending result with the given job id with the given results dict. The experimental parameters of the result are sanity-checked against what the result set expected for that job.
The result may not be pending within the current result set, but can be within any result set in the notebook. This will not affect the result set that is selected as current.
Parameters: - rc – the results dict
- jobid – the job id
-
LabNotebook.
cancelPendingResult
(jobid: str)¶ Cancel the given pending result.
The result may not be pending within the current result set, but can be within any result set in the notebook. This will not affect the result set that is selected as current.
Parameters: jobid – the job id
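Resolving or cancelling by job id, independently of the current result set, can be sketched with a toy model. `ToyNotebook`, its attributes, and the exception types raised are hypothetical; only the method names mirror the ones documented above.

```python
class ToyNotebook:
    """Hypothetical stand-in tracking pending job ids per result set."""

    def __init__(self, tags):
        self.pending = {tag: {} for tag in tags}  # tag -> {jobid: params}
        self.results = {tag: [] for tag in tags}  # tag -> results dicts
        self.current = tags[0]

    def addPendingResult(self, params, jobid, tag=None):
        self.pending[tag or self.current][jobid] = params

    def _find(self, jobid):
        # Search every result set, not just the current one.
        for tag, jobs in self.pending.items():
            if jobid in jobs:
                return tag
        raise KeyError(jobid)

    def resolvePendingResult(self, rc, jobid):
        tag = self._find(jobid)
        if rc["parameters"] != self.pending[tag][jobid]:
            # Sanity check against the parameters recorded for the job.
            raise ValueError(f"unexpected parameters for job {jobid}")
        del self.pending[tag][jobid]
        self.results[tag].append(rc)

    def cancelPendingResult(self, jobid):
        del self.pending[self._find(jobid)][jobid]


nb = ToyNotebook(["first", "second"])
nb.addPendingResult({"n": 10}, "job-1", tag="second")
nb.addPendingResult({"n": 20}, "job-2", tag="second")
nb.resolvePendingResult(
    {"parameters": {"n": 10}, "results": {"mean": 1.2}, "metadata": {}},
    "job-1",
)
nb.cancelPendingResult("job-2")
```

Both jobs were pending in the "second" result set, yet neither call needed it to be selected, and the current result set is unchanged afterwards.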
You can also check whether there are pending results remaining in a result set, which defaults to the currently selected result set.
-
LabNotebook.
ready
(tag: str = None) → bool¶ Test whether the result set has no pending results remaining.
Parameters: tag – (optional) the result set tag (defaults to the current set) Returns: True if all pending results have been resolved (or cancelled)
-
LabNotebook.
readyFraction
(tag: str = None) → float¶ Test what fraction of results are available in the tagged result set.
Parameters: tag – (optional) the result set tag (defaults to the current set) Returns: the fraction of available results
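One plausible reading of these two tests is sketched below. The definition used here (available results divided by available plus pending) and the value returned for an empty result set are assumptions, not statements about epyc's actual arithmetic; both helper functions are hypothetical.

```python
def ready(pending: int) -> bool:
    """True when nothing is left pending."""
    return pending == 0


def ready_fraction(available: int, pending: int) -> float:
    """Fraction of results available, under the assumed definition."""
    total = available + pending
    if total == 0:
        return 0.0  # assumption: an arbitrary choice for an empty result set
    return available / total


# Three results are in and one job is still outstanding.
fraction = ready_fraction(3, 1)
done = ready(1)
```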
Thirdly, there are operations that work on all result sets.
-
LabNotebook.
allPendingResults
() → Set[str]¶ Return the identifiers for all pending results in all result sets.
Returns: a set of job identifiers
-
LabNotebook.
numberOfAllPendingResults
() → int¶ Return the number of results pending in all result sets.
Returns: the total number of pending results
Locking the notebook
Locking a notebook prevents further updates: result sets cannot be added, all pending results are cancelled, and all individual result sets are locked. Locking is preserved for persistent notebooks, so once locked a notebook is locked forever.
-
LabNotebook.
finish
(commit: bool = True)¶ Mark the entire notebook as finished, closing and locking all result sets against further changes.
By default, finishing a persistent notebook also commits it. In certain cases it may be desirable to finish the notebook but not commit it, i.e., to stop updates in memory without changing the backing file. Setting commit=False will accomplish this.
Parameters: commit – (optional) commit the notebook (defaults to True)
-
LabNotebook.
isLocked
() → bool¶ Returns true if the notebook is locked.
Returns: True if the notebook is locked