ResultSet: A homogeneous collection of results from experiments
class epyc.ResultSet(description: str = None)
A “page” in a lab notebook for the results of a particular set of experiments. This will consist of metadata, notes, and a data table resulting from the execution of the experiment. Each experiment runs with a specific set of parameters: the parameter names are fixed once set initially, with the specific values being stored alongside each result. There may be multiple results for the same parameters, to allow for repetition of experiments at a data point. Results committed to result sets are immutable: once entered, a result can’t be deleted or changed.
Result sets also record “pending” results, allowing us to record experiments in progress. A pending result can be finalised by providing it with a value, or can be cancelled.
A result set can be used very Pythonically using a results dict holding the metadata, parameters, and results of experiments. For larger experiment sets the results are automatically typed using numpy’s dtype system, which both provides more checking and works well with more archival storage formats like HDF5 (see HDF5LabNotebook).
Parameters:
- nb – notebook this result set is part of
- description – (optional) description for the result set (defaults to a datestamp)
Important
Most interactions with results should go through a LabNotebook to allow for management of persistence and so on.
Adding results
Results can be added one at a time to the result set. Since result sets are persistent there are no other operations.
ResultSet.addSingleResult(rc: Dict[str, Dict[str, Any]])
Add a single result. This should be a single results dict as returned from an instance of Experiment, that contains metadata, parameters, and result.
The results dict may add metadata, parameters, or results to the result set, and these will be assumed to be present from then on. Missing values in previously-saved results will receive default values.
Parameters: rc – a results dict
LabNotebook.addResult() has a much more flexible approach to addition that handles adding lists of results at one time.
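As a rough illustration of the shape of a results dict, the sketch below builds one by hand. The top-level key names ("metadata", "parameters", "results") and the individual entries are assumptions made for this example. In practice the dict would be returned by an Experiment, and the keys should be taken from the constants on that class rather than typed as strings.

```python
from typing import Any, Dict

# Assumed key names, standing in for the constants defined on Experiment.
METADATA = "metadata"
PARAMETERS = "parameters"
RESULTS = "results"


def make_results_dict(params: Dict[str, Any],
                      results: Dict[str, Any],
                      status: bool = True) -> Dict[str, Dict[str, Any]]:
    """Build a hand-rolled results dict with three sub-dicts."""
    return {
        METADATA: {"status": status},   # hypothetical metadata entry
        PARAMETERS: dict(params),       # the point in the parameter space
        RESULTS: dict(results),         # the values measured at that point
    }


rc = make_results_dict({"k": 3, "p": 0.1}, {"mean": 4.2})
print(sorted(rc.keys()))
```

A dict of this shape is what addSingleResult() expects as its rc argument.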
Retrieving results
A result set offers two distinct ways to access results: as results dicts, or as a pandas.DataFrame. The former is often easier on small scales, the latter for large scales.
ResultSet.numberOfResults() → int
Return the number of results in the results set, including any repetitions at the same parameter point.
Returns: the total number of results
ResultSet.__len__() → int
Return the number of results in the results set, including any repetitions at the same parameter point. Equivalent to numberOfResults().
Returns: the number of results
ResultSet.results() → List[Dict[str, Dict[str, Any]]]
Return all the results as a list of results dicts. This is useful for avoiding the use of pandas and having a more Pythonic interface – which is also a lot less efficient and more memory-hungry.
Returns: a list of results dicts
ResultSet.resultsFor(params: Dict[str, Any]) → List[Dict[str, Dict[str, Any]]]
Return all the results for the given parameters as a list of results dicts. This is useful for avoiding the use of pandas and having a more Pythonic interface – which is also a lot less efficient and more memory-hungry. The parameters are interpreted as for dataframeFor(), with lists or other iterators being converted into disjunctions of values.
Parameters: params – the parameters
Returns: a list of results dicts
ResultSet.dataframe(only_successful: bool = False) → pandas.core.frame.DataFrame
Return all the available results. The results are returned as a pandas DataFrame object, which is detached from the results held in the result set, thereby keeping the result set itself immutable.
You can pre-filter the contents of the dataframe to only include results for specific parameter values using dataframeFor(). You can also discard any unsuccessful results using the only_successful flag.
Parameters: only_successful – (optional) filter out any failed results (defaults to False)
Returns: a dataframe of results
ResultSet.dataframeFor(params: Dict[str, Any], only_successful: bool = False) → pandas.core.frame.DataFrame
Extract a dataframe of the results for only the given set of parameters. These need not be all the parameters for the experiments, so it’s possible to project out all results for a sub-set of the parameters. If a parameter is mapped to an iterator or list then these are treated as disjunctions and select all results with any of these values for that parameter.
An empty set of parameters filters out nothing and so returns all the results. This is far less efficient than calling dataframe().
The results are returned as a pandas DataFrame object, which is detached from the results held in the result set, thereby keeping the result set itself immutable.
You can also discard any unsuccessful results using the only_successful flag.
Parameters:
- params – a dict of parameters and values
- only_successful – (optional) filter out any failed results (defaults to False)
Returns: a dataframe containing results matching the parameter constraints
Important
The results dict access methods return all experiments, or all that have the specified parameters, regardless of whether they were successful or not. The dataframe access methods can pre-filter to extract only the successful experiments.
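The matching rule used by resultsFor() and dataframeFor() (a single value must match exactly, a list matches any of its members, and an empty dict matches everything) can be sketched in plain Python. This is an illustration of the described semantics under assumed names (matches, results_for, and the "parameters" key), not the library’s implementation.

```python
from typing import Any, Dict, List

ResultsDict = Dict[str, Dict[str, Any]]


def matches(rc: ResultsDict, params: Dict[str, Any]) -> bool:
    """True if the parameters in rc satisfy every constraint in params."""
    for name, wanted in params.items():
        actual = rc["parameters"].get(name)
        if isinstance(wanted, (list, tuple, set, range)):
            # A collection is a disjunction: any one of its values will do.
            if actual not in wanted:
                return False
        elif actual != wanted:
            return False
    return True


def results_for(all_results: List[ResultsDict],
                params: Dict[str, Any]) -> List[ResultsDict]:
    """Select the results dicts whose parameters match the constraints."""
    return [rc for rc in all_results if matches(rc, params)]


all_results = [
    {"parameters": {"k": 1, "p": 0.1}, "results": {"mean": 1.0}},
    {"parameters": {"k": 2, "p": 0.1}, "results": {"mean": 2.0}},
    {"parameters": {"k": 3, "p": 0.5}, "results": {"mean": 3.0}},
]
selected = results_for(all_results, {"k": [1, 3]})
print([rc["parameters"]["k"] for rc in selected])   # prints [1, 3]
```

Constraints on different parameters combine conjunctively in this sketch, so {"k": [1, 3], "p": 0.5} selects only the third result.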
Parameter ranges
A result set can hold results for a range of parameter values. These are all returned as part of the results dicts or dataframes, but it can be useful to access them alone as well, independently of specific results. The ranges returned by these methods refer only to real results.
ResultSet.parameterRange(param: str) → Set[Any]
Return all the values for this parameter for which we have results.
Parameters: param – the parameter name
Returns: a collection of values for which we have data
ResultSet.parameterSpace() → Dict[str, Any]
Return a dict mapping parameter names to all their values, which is the space of all possible parameter points at which results could have been collected. This does not guarantee that all combinations of values have results associated with them: that function is provided by parameterCombinations().
Returns: a dict mapping parameter names to their ranges
ResultSet.parameterCombinations() → List[Dict[str, Any]]
Return a list of all combinations of parameters for which we have results, as a list of dicts. This means that there are results (possibly more than one set) associated with the combination of parameters in each dict. The ranges of the parameters can be found using parameterSpace().
Returns: a list of dicts
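The difference between the parameter space and the parameter combinations is easiest to see on a small example. The sketch below computes both from a list of parameter points. The function names and the sample points are hypothetical, and the code only mirrors the behaviour described above.

```python
from typing import Any, Dict, List, Set


def parameter_space(points: List[Dict[str, Any]]) -> Dict[str, Set[Any]]:
    """Map each parameter name to the set of values seen for it."""
    space: Dict[str, Set[Any]] = {}
    for point in points:
        for name, value in point.items():
            space.setdefault(name, set()).add(value)
    return space


def parameter_combinations(points: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the distinct parameter points, dropping repetitions."""
    combos: List[Dict[str, Any]] = []
    for point in points:
        if point not in combos:
            combos.append(point)
    return combos


# Two repetitions at (k=1, p=0.1) and one result at (k=2, p=0.5).
points = [{"k": 1, "p": 0.1}, {"k": 1, "p": 0.1}, {"k": 2, "p": 0.5}]
space = parameter_space(points)
combos = parameter_combinations(points)
print(space)
print(combos)
```

Here the space spans four possible points (two values of k times two values of p), but only two of them actually have results.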
Managing pending results
Pending results are those that are in the process of being computed based on a set of experimental parameters.
ResultSet.pendingResults() → List[str]
Return the job identifiers of all pending results.
Returns: a list of pending job identifiers
ResultSet.numberOfPendingResults() → int
Return the number of pending results.
Returns: the number of pending results
ResultSet.pendingResultsFor(params: Dict[str, Any]) → List[str]
Return the ids of all pending results with the given parameters. Not all parameters have to be provided, allowing for partial matching.
Parameters: params – the experimental parameters
Returns: a list of job ids
ResultSet.pendingResultParameters(jobid: str) → Dict[str, Any]
Return a dict of the parameters for the given pending result.
Parameters: jobid – the job id
Returns: a dict of parameter values
ResultSet.ready() → bool
Test whether any pending results are still outstanding.
Returns: True if all pending results have been either resolved or cancelled
Three methods within the interface are used by LabNotebook to manage pending results. They shouldn’t be needed from user code.
ResultSet.addSinglePendingResult(params: Dict[str, Any], jobid: str)
Add a pending result for the given point in the parameter space under the given job identifier. The identifier will generally be meaningful to the lab that submitted the request. Identifiers must be unique.
Parameters:
- params – the experimental parameters
- jobid – the job id
ResultSet.cancelSinglePendingResult(jobid: str)
Cancel a pending job. This records the cancellation using a CancelledException, storing a traceback to show where the cancellation was triggered from. User code should call LabNotebook.cancelPendingResult() rather than using this method directly.
Cancelling a result generates a message to standard output.
Parameters: jobid – the job id
ResultSet.resolveSinglePendingResult(jobid: str)
Resolve the given pending result. This drops the job from the pending results table. User code should call LabNotebook.resolvePendingResult() rather than using this method directly, since this method doesn’t actually store the completed pending result, it just manages its non-pending-ness.
Parameters: jobid – the job id
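To make the bookkeeping concrete, here is a toy model of a pending-results table: jobs are recorded against their parameters under unique ids, can be looked up by partial parameter match, and are dropped when resolved. The class and method names are invented for this sketch and the error type is an assumption. It does not record cancellations the way the real method does.

```python
from typing import Any, Dict, List


class PendingTable:
    """Toy stand-in for the pending-results bookkeeping of a result set."""

    def __init__(self) -> None:
        self._pending: Dict[str, Dict[str, Any]] = {}

    def add(self, params: Dict[str, Any], jobid: str) -> None:
        # Job identifiers must be unique within the table.
        if jobid in self._pending:
            raise ValueError(f"Duplicate job id {jobid}")
        self._pending[jobid] = dict(params)

    def resolve(self, jobid: str) -> None:
        # Only drops the job: storing the completed result is done elsewhere.
        del self._pending[jobid]

    def pending_for(self, params: Dict[str, Any]) -> List[str]:
        # Partial matching: only the parameters supplied are compared.
        return [jobid for jobid, p in self._pending.items()
                if all(p.get(k) == v for k, v in params.items())]

    def ready(self) -> bool:
        return len(self._pending) == 0


table = PendingTable()
table.add({"k": 1, "p": 0.1}, "job-1")
table.add({"k": 2, "p": 0.1}, "job-2")
print(table.pending_for({"k": 1}))   # prints ['job-1']
table.resolve("job-1")
table.resolve("job-2")
print(table.ready())                 # prints True
```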
Metadata access
The result set gives access to its description and the names of the various elements it stores. These names may change over time, if for example you add a results dict that has more results than those you added earlier.
ResultSet.description() → str
Return the free text description of the result set.
Returns: the description
ResultSet.setDescription(d: str)
Set the free text description of the result set.
Parameters: d – the description
Important
You can change the description of a result set after it’s been created – but you can’t change any results that’ve been added to it.
ResultSet.names() → Dict[str, Optional[List[str]]]
Return a dict of sets of names, corresponding to the entries in the results dicts for this result set. If only pending results have so far been added the Experiment.METADATA and Experiment.RESULTS sets will be empty.
Returns: the dict of parameter names
ResultSet.metadataNames() → List[str]
Return the set of metadata names associated with this result set. If no results have been submitted, this set will be empty.
Returns: the set of experimental metadata names
ResultSet.parameterNames() → List[str]
Return the set of parameter names associated with this result set. If no results (pending or real) have been submitted, this set will be empty.
Returns: the set of experimental parameter names
ResultSet.resultNames() → List[str]
Return the set of result names associated with this result set. If no results have been submitted, this set will be empty.
Returns: the set of experimental result names
The result set can also have attributes set, which can be accessed either using methods or by treating the result set as a dict.
ResultSet.setAttribute(k: str, v: Any)
Set the given attribute.
Parameters:
- k – the key
- v – the attribute value
ResultSet.getAttribute(k: str) → Any
Retrieve the given attribute. A KeyException will be raised if the attribute doesn’t exist.
Parameters: k – the attribute
Returns: the attribute value
ResultSet.keys() → Set[str]
Return the set of attributes.
Returns: the attribute keys
ResultSet.__contains__(k: str)
True if there is an attribute with the given name.
Parameters: k – the attribute
Returns: True if that attribute exists
ResultSet.__setitem__(k: str, v: Any)
Set the given attribute. The dict-like form of setAttribute().
Parameters:
- k – the key
- v – the attribute value
ResultSet.__getitem__(k: str) → Any
Retrieve the given attribute. The dict-like form of getAttribute().
Parameters: k – the attribute
Returns: the attribute value
ResultSet.__delitem__(k: str)
Delete the named attribute. This method is invoked by the del operator. A KeyException will be raised if the attribute doesn’t exist.
Parameters: k – the attribute
There are various uses for these attributes: see Making data archives for one common use case.
Important
The length of a result set (ResultSet.__len__()) refers to the number of results, not to the number of attributes (as would be the case for a dict).
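The split between dict-style attribute access and the meaning of len() can be seen in a minimal stand-in class. TinyResultSet below is invented for illustration and copies only the method names documented above. It stores results in a list and attributes in a separate dict, which is an assumption about structure rather than a description of the real class.

```python
from typing import Any, Dict, List, Set


class TinyResultSet:
    """Stand-in showing attributes as dict items and results as length."""

    def __init__(self) -> None:
        self._results: List[Dict[str, Any]] = []
        self._attributes: Dict[str, Any] = {}

    def addSingleResult(self, rc: Dict[str, Any]) -> None:
        self._results.append(rc)

    def __len__(self) -> int:
        # Counts results, not attributes.
        return len(self._results)

    def __contains__(self, k: str) -> bool:
        return k in self._attributes

    def __setitem__(self, k: str, v: Any) -> None:
        self._attributes[k] = v

    def __getitem__(self, k: str) -> Any:
        return self._attributes[k]

    def __delitem__(self, k: str) -> None:
        del self._attributes[k]

    def keys(self) -> Set[str]:
        return set(self._attributes.keys())


rs = TinyResultSet()
rs["author"] = "A. Researcher"
print(len(rs), "author" in rs)   # prints 0 True
rs.addSingleResult({"results": {"mean": 4.2}})
del rs["author"]
print(len(rs), rs.keys())        # prints 1 set()
```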
Locking
Once the set of experiments to be held in a result set is finished, it’s probably sensible to prevent any further updates. This is accomplished by “finishing” the result set, leaving it locked against any further updates.
ResultSet.finish()
Finish and lock this result set. This cancels any pending results and locks the result set against future additions. This is useful to tidy up after experiments are finished, and protects against accidentally re-using a result set for something else.
One can check the lock in two ways, either by polling or as an assertion that raises a ResultSetLockedException when called on a locked result set. This is mainly used to protect update methods.
ResultSet.isLocked() → bool
Returns true if the result set is locked.
Returns: True if the result set is locked
ResultSet.assertUnlocked()
Tests whether the result set is locked, and raises a ResultSetLockedException if so. This is used to protect update methods, since locked result sets are never updated.
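The guard pattern described here, where every update method first asserts that the set is unlocked, might look like the following sketch. LockableSet and LockedError are hypothetical stand-ins (the real exception is ResultSetLockedException), and pending jobs are simply discarded on finish() rather than recorded as cancelled.

```python
from typing import Any, Dict, List


class LockedError(Exception):
    """Stand-in for ResultSetLockedException."""


class LockableSet:
    """Toy result set whose update methods are protected by a lock check."""

    def __init__(self) -> None:
        self._locked = False
        self._results: List[Dict[str, Any]] = []
        self._pending: Dict[str, Dict[str, Any]] = {}

    def isLocked(self) -> bool:
        return self._locked

    def assertUnlocked(self) -> None:
        if self._locked:
            raise LockedError("result set is locked")

    def add(self, rc: Dict[str, Any]) -> None:
        self.assertUnlocked()        # guard every update method
        self._results.append(rc)

    def finish(self) -> None:
        self._pending.clear()        # simplification: pending jobs are dropped
        self._locked = True


rs = LockableSet()
rs.add({"results": {"mean": 1.0}})
rs.finish()
try:
    rs.add({"results": {"mean": 2.0}})
except LockedError as e:
    print("rejected:", e)            # prints rejected: result set is locked
```

Polling with isLocked() suits code that wants to branch, while assertUnlocked() suits update paths that should fail loudly.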
Dirtiness
Adding results or pending results to a result set makes it dirty, in need of storing if being used with a persistent notebook. This is used to avoid unnecessary writing of unchanged data.
ResultSet.dirty(f: bool = True)
Mark the result set as dirty (the default) or clean.
Parameters: f – True if the result set is dirty
ResultSet.isDirty() → bool
Test whether the result set is dirty, i.e., if its contents need persisting (if the containing notebook is persistent).
Returns: True if the result set is dirty
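A persistent notebook could use the dirty flag along the following lines: write only when the flag is set, then mark the set clean. DirtyTracker and save_if_dirty are names made up for this sketch, and the list of writes stands in for real storage.

```python
from typing import List


class DirtyTracker:
    """Stand-in exposing only the dirty()/isDirty() pair."""

    def __init__(self) -> None:
        self._dirty = False

    def dirty(self, f: bool = True) -> None:
        self._dirty = f

    def isDirty(self) -> bool:
        return self._dirty


def save_if_dirty(rs: DirtyTracker, writes: List[str]) -> bool:
    """Persist the set only if it has changed; return True if written."""
    if not rs.isDirty():
        return False
    writes.append("write")   # stands in for writing to backing storage
    rs.dirty(False)          # the stored copy is now up to date
    return True


rs = DirtyTracker()
writes: List[str] = []
rs.dirty()                   # e.g. after adding a result
save_if_dirty(rs, writes)
save_if_dirty(rs, writes)    # second call is a no-op
print(len(writes))           # prints 1
```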
Type mapping and inference
A result set types all the elements within a results dict using numpy’s “dtype” data type system.
Note
This approach is transparent to user code, and is explained here purely for the curious.
There are actually two types involved: the dtype of results dicts formed from the metadata, parameters, and experimental results added to the result set; and the dtype of pending results which includes just the parameters.
ResultSet.dtype() → numpy.dtype
Return the dtype of the results, combining the metadata, parameters, and results elements.
Returns: the dtype
ResultSet.pendingdtype() → numpy.dtype
Return the dtype of pending results, using just parameter elements.
Returns: the dtype
The default type mapping maps each Python type we expect to see to a corresponding dtype. The type mapping can be changed on a per-result set basis if required.
ResultSet.TypeMapping
Default type mapping from Python types to numpy dtypes.
There is also a mapping from numpy type kinds to appropriate default values, used to initialise missing fields.
ResultSet.zero(dtype: numpy.dtype) → Any
Return the appropriate “zero” for the given simple dtype.
Parameters: dtype – the dtype
Returns: “zero”
The type mapping is used to generate a dtype for each Python type, but preserving any numpy types used.
ResultSet.typeToDtype(t: type) → numpy.dtype
Return the dtype of the given Python type. An exception is thrown if there is no appropriate mapping.
Parameters: t – the (Python) type
Returns: the dtype of the value
ResultSet.valueToDtype(v: Any) → numpy.dtype
Return the dtype of a Python value. An exception is thrown if there is no appropriate mapping.
Parameters: v – the value
Returns: the dtype
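The idea of mapping Python types to dtypes while passing numpy types through unchanged can be sketched as follows. The mapping table here is deliberately small and hypothetical (it is not the library’s TypeMapping), the function names are invented, and raising TypeError for unmapped types is an assumption, since the text above only says that an exception is thrown.

```python
from typing import Any, Dict

import numpy

# Hypothetical mapping for illustration; not the library's actual table.
TYPE_MAPPING: Dict[type, numpy.dtype] = {
    int: numpy.dtype(numpy.int64),
    float: numpy.dtype(numpy.float64),
    complex: numpy.dtype(numpy.complex128),
    bool: numpy.dtype(numpy.bool_),
}


def type_to_dtype(t: type) -> numpy.dtype:
    """Map a Python type to a dtype, preserving numpy scalar types."""
    if issubclass(t, numpy.generic):
        return numpy.dtype(t)          # already a numpy type: keep it
    if t not in TYPE_MAPPING:
        raise TypeError(f"No dtype mapping for {t.__name__}")
    return TYPE_MAPPING[t]


def value_to_dtype(v: Any) -> numpy.dtype:
    """Map a value to a dtype via its exact type."""
    return type_to_dtype(type(v))


def record_dtype(values: Dict[str, Any]) -> numpy.dtype:
    """Build a structured dtype with one field per named value."""
    return numpy.dtype([(name, value_to_dtype(v)) for name, v in values.items()])


dt = record_dtype({"k": 3, "p": 0.1, "ok": True})
print(dt.names)
```

A structured dtype of this kind gives each named element a fixed type, which is what makes typed storage formats such as HDF5 straightforward to target.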
The result set infers the numpy-level types automatically as results (and pending results) are added.
ResultSet.inferDtype(rc: Dict[str, Dict[str, Any]])
Infer the dtype of the given result dict. This will include all the standard and exceptional metadata defined for an Experiment, plus the parameters and results (if present) for the results dict.
If more elements are provided than have previously been seen, the underlying results dataframe will be extended with new columns.
This method will be called automatically if no explicit dtype has been provided for the result set by a call to setDtype().
Returns: the dtype
ResultSet.inferPendingResultDtype(params: Dict[str, Any])
Infer the dtype of the pending results for the given dict of experimental parameters. This is essentially the same operation as inferDtype() but restricted to experimental parameters and including a string job identifier.
Parameters: params – the experimental parameters
Returns: the pending results dtype
This behaviour can be sidestepped by explicitly setting the dtypes (with care!).
ResultSet.setDtype(dtype) → numpy.dtype
Set the dtype for the results. This should be done with care, ensuring that the element names all match. It does however allow precise control over the way data is stored (if required).
Parameters: dtype – the dtype
ResultSet.setPendingResultDtype(dtype) → numpy.dtype
Set the dtype for pending results. This should be done with care, ensuring that the element names all match.
Parameters: dtype – the dtype
The progressive nature of typing a result set means that the type may change as new results are added. This “type-level dirtiness” is controlled by two methods:
ResultSet.typechanged(f: bool = True)
Mark the result set as having changed type (the default) or not.
Parameters: f – True if the result set has changed type
ResultSet.isTypeChanged() → bool
Test whether the result set has changed its metadata, parameters, or results. This is used by persistent notebooks to re-construct the backing storage.
Returns: True if the result set has changed type
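Progressive typing can be pictured with a small table that grows columns as new names appear, back-fills earlier rows with a default, and raises a type-changed flag. GrowingTable, its attributes, and the ZEROS defaults are all assumptions made for this sketch, which uses plain Python types rather than numpy dtypes.

```python
from typing import Any, Dict, List

# Assumed default ("zero") values per Python type, for illustration only.
ZEROS: Dict[type, Any] = {int: 0, float: 0.0, bool: False, str: ""}


class GrowingTable:
    """Toy table whose set of columns grows as rows are added."""

    def __init__(self) -> None:
        self.columns: Dict[str, type] = {}
        self.rows: List[Dict[str, Any]] = []
        self.type_changed = False

    def add(self, row: Dict[str, Any]) -> None:
        for name, value in row.items():
            if name not in self.columns:
                # New column: record its type and back-fill earlier rows.
                self.columns[name] = type(value)
                self.type_changed = True
                for old in self.rows:
                    old[name] = ZEROS[type(value)]
        # Start from defaults so columns missing from this row are filled.
        full = {name: ZEROS[t] for name, t in self.columns.items()}
        full.update(row)
        self.rows.append(full)


table = GrowingTable()
table.add({"mean": 1.5})
table.type_changed = False            # e.g. after storage has been rebuilt
table.add({"mean": 2.5, "count": 7})  # introduces a new column
print(table.rows[0])                  # prints {'mean': 1.5, 'count': 0}
print(table.type_changed)             # prints True
```

A persistent store would check the flag before writing and rebuild its schema only when it is set.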