ResultSet: A homogeneous collection of results from experiments

class epyc.ResultSet(description: str = None)

A “page” in a lab notebook for the results of a particular set of experiments. This will consist of metadata, notes, and a data table resulting from the execution of the experiment. Each experiment runs with a specific set of parameters: the parameter names are fixed once set initially, with the specific values being stored alongside each result. There may be multiple results for the same parameters, to allow for repetition of experiments at a data point. Results committed to result sets are immutable: once entered, a result can’t be deleted or changed.

Result sets also record “pending” results, allowing us to record experiments in progress. A pending result can be finalised by providing it with a value, or can be cancelled.

A result set can be used very Pythonically using a results dict holding the metadata, parameters, and results of experiments. For larger experiment sets the results are automatically typed using numpy’s dtype system, which both provides more checking and works well with more archival storage formats like HDF5 (see HDF5LabNotebook).

Parameters:
  • description – (optional) description for the result set (defaults to a datestamp)

Important

Most interactions with results should go through a LabNotebook to allow for management of persistence and so on.
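To make the results dict structure concrete, here is a hedged sketch of what one looks like. The top-level key names are written as plain strings for illustration (epyc defines them as constants on the Experiment class), and the individual field names are hypothetical:

```python
# A sketch of the three-part results dict structure described above.
# The field names inside each section are illustrative only.
rc = {
    'metadata': {                 # information about the run itself
        'elapsed_time': 0.02,
        'status': True,           # whether the experiment succeeded
    },
    'parameters': {               # the point in parameter space
        'x': 1.0,
        'repetition': 3,
    },
    'results': {                  # values computed by the experiment
        'y': 2.718,
    },
}

# All three sections are ordinary nested dicts and can be
# accessed Pythonically.
print(rc['parameters']['x'])   # 1.0
```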

Adding results

Results can be added one at a time to the result set. Since result sets are persistent there are no other operations.

ResultSet.addSingleResult(rc: Dict[str, Dict[str, Any]])

Add a single result. This should be a single results dict as returned from an instance of Experiment, containing metadata, parameters, and experimental results.

The results dict may add metadata, parameters, or results to the result set, and these will be assumed to be present from then on. Missing values in previously-saved results will receive default values.

Parameters:rc – a results dict

LabNotebook.addResult() provides a much more flexible approach to addition, including adding lists of results at one time.
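The append-only and default-filling behaviour described above can be sketched with a minimal stand-in class. This is not epyc's implementation (epyc stores typed rows and uses type-appropriate defaults rather than None); TinyResultSet and its flat dicts are hypothetical and only illustrate the semantics:

```python
class TinyResultSet:
    """A minimal sketch of an append-only result store: results can
    be added but never changed, and earlier results are padded with
    a default when a later result introduces a new field."""

    def __init__(self):
        self._results = []
        self._fields = set()

    def addSingleResult(self, rc):
        new_fields = set(rc) - self._fields
        # Pad earlier results with a default (None here; epyc uses
        # type-appropriate "zero" values) for any newly-seen fields.
        for old in self._results:
            for f in new_fields:
                old.setdefault(f, None)
        self._fields |= set(rc)
        # Pad the new result for any fields it lacks, then store a
        # copy so the caller can't mutate what's been committed.
        self._results.append({f: rc.get(f) for f in self._fields})

    def results(self):
        return [dict(r) for r in self._results]   # detached copies

rs = TinyResultSet()
rs.addSingleResult({'x': 1, 'y': 2})
rs.addSingleResult({'x': 3, 'y': 4, 'z': 5})
print(rs.results()[0] == {'x': 1, 'y': 2, 'z': None})   # True
```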

Retrieving results

A result set offers two distinct ways to access results: as results dicts, or as a pandas.DataFrame. The former is often easier at small scales, the latter at large scales.

ResultSet.numberOfResults() → int

Return the number of results in the results set, including any repetitions at the same parameter point.

Returns:the total number of results
ResultSet.__len__() → int

Return the number of results in the results set, including any repetitions at the same parameter point. Equivalent to numberOfResults().

Returns:the number of results
ResultSet.results() → List[Dict[str, Dict[str, Any]]]

Return all the results as a list of results dicts. This is useful for avoiding the use of pandas and having a more Pythonic interface – which is also a lot less efficient and more memory-hungry.

Returns:a list of results dicts
ResultSet.resultsFor(params: Dict[str, Any]) → List[Dict[str, Dict[str, Any]]]

Return all the results for the given parameters as a list of results dicts. This is useful for avoiding the use of pandas and having a more Pythonic interface – which is also a lot less efficient and more memory-hungry. The parameters are interpreted as for dataframeFor(), with lists or other iterators being converted into disjunctions of values.

Parameters:params – the parameters
Returns:a list of results dicts
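The disjunction interpretation of list-valued parameters can be sketched in plain Python. The results_for helper below is a hypothetical stand-in, not epyc's implementation:

```python
def results_for(results, params):
    """Sketch of the parameter matching described above: each given
    parameter must match, and a list of values is treated as a
    disjunction (any one of the values matches)."""
    def matches(rc):
        for k, v in params.items():
            wanted = v if isinstance(v, list) else [v]
            if rc['parameters'].get(k) not in wanted:
                return False
        return True
    return [rc for rc in results if matches(rc)]

results = [
    {'parameters': {'x': 1, 'y': 'a'}, 'results': {'v': 10}},
    {'parameters': {'x': 2, 'y': 'a'}, 'results': {'v': 20}},
    {'parameters': {'x': 3, 'y': 'b'}, 'results': {'v': 30}},
]

# x in {1, 3} is a disjunction; y must equal 'a' exactly
print(results_for(results, {'x': [1, 3], 'y': 'a'}))
```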
ResultSet.dataframe(only_successful: bool = False) → pandas.core.frame.DataFrame

Return all the available results. The results are returned as a pandas DataFrame object, which is detached from the results held in the result set, thereby keeping the result set itself immutable.

You can pre-filter the contents of the dataframe to only include results for specific parameter values using dataframeFor(). You can also discard any unsuccessful results using the only_successful flag.

Parameters:only_successful – (optional) filter out any failed results (defaults to False)
Returns:a dataframe of results
ResultSet.dataframeFor(params: Dict[str, Any], only_successful: bool = False) → pandas.core.frame.DataFrame

Extract a dataframe of the results for only the given set of parameters. These need not be all the parameters for the experiments, so it’s possible to project out all results for a sub-set of the parameters. If a parameter is mapped to an iterator or list then these are treated as disjunctions and select all results with any of these values for that parameter.

An empty set of parameters filters out nothing and so returns all the results. This is far less efficient than calling dataframe().

The results are returned as a pandas DataFrame object, which is detached from the results held in the result set, thereby keeping the result set itself immutable.

You can also discard any unsuccessful results using the only_successful flag.

Parameters:
  • params – a dict of parameters and values
  • only_successful – (optional) filter out any failed results (defaults to False)
Returns:

a dataframe containing results matching the parameter constraints
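A hedged sketch of this kind of pre-filtering over a pandas dataframe, assuming a boolean status column marks success; dataframe_for and the column names are hypothetical, not epyc's implementation:

```python
import pandas as pd

def dataframe_for(df, params, only_successful=False):
    """Sketch of dataframe pre-filtering as described above: scalar
    parameter values select exact matches, lists select any of their
    values, and only_successful keeps rows whose (assumed) status
    column is True."""
    mask = pd.Series(True, index=df.index)
    for k, v in params.items():
        vs = v if isinstance(v, list) else [v]
        mask &= df[k].isin(vs)
    if only_successful:
        mask &= df['status']
    return df[mask].copy()      # detached from the original frame

df = pd.DataFrame({
    'x': [1, 2, 3],
    'status': [True, True, False],
    'y': [10.0, 20.0, 30.0],
})
print(dataframe_for(df, {'x': [1, 3]}, only_successful=True))
# the single successful row with x == 1
```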

Important

The results dict access methods return all experiments, or all that have the specified parameters, regardless of whether they were successful or not. The dataframe access methods can pre-filter to extract only the successful experiments.

Parameter ranges

A result set can hold results for a range of parameter values. These are all returned as part of the results dicts or dataframes, but it can be useful to access them alone as well, independently of specific results. The ranges returned by these methods refer only to real results.

ResultSet.parameterRange(param: str) → Set[Any]

Return all the values for this parameter for which we have results.

Parameters:param – the parameter name
Returns:a collection of values for which we have data
ResultSet.parameterSpace() → Dict[str, Any]

Return a dict mapping parameter names to all their values, which is the space of all possible parameter points at which results could have been collected. This does not guarantee that all combinations of values have results associated with them: that function is provided by parameterCombinations().

Returns:a dict mapping parameter names to their ranges
ResultSet.parameterCombinations() → List[Dict[str, Any]]

Return a list of all combinations of parameters for which we have results, as a list of dicts. This means that there are results (possibly more than one set) associated with the combination of parameters in each dict. The ranges of the parameters can be found using parameterSpace().

Returns:a list of dicts
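The relationship between the parameter space and the combinations actually populated can be sketched over plain results dicts; parameter_space and parameter_combinations below are hypothetical stand-ins for the methods above:

```python
def parameter_space(results):
    """Map each parameter name to the set of values seen in results."""
    space = {}
    for rc in results:
        for k, v in rc['parameters'].items():
            space.setdefault(k, set()).add(v)
    return space

def parameter_combinations(results):
    """Return the distinct parameter combinations that actually have
    results, as a list of dicts (a sketch of the behaviour above)."""
    seen, combos = set(), []
    for rc in results:
        key = tuple(sorted(rc['parameters'].items()))
        if key not in seen:
            seen.add(key)
            combos.append(dict(rc['parameters']))
    return combos

results = [
    {'parameters': {'x': 1, 'y': 'a'}},
    {'parameters': {'x': 1, 'y': 'a'}},   # repetition at the same point
    {'parameters': {'x': 2, 'y': 'b'}},
]
print(sorted(parameter_space(results)['x']))    # [1, 2]
print(len(parameter_combinations(results)))     # 2
```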

Managing pending results

Pending results are those that are in the process of being computed based on a set of experimental parameters.

ResultSet.pendingResults() → List[str]

Return the job identifiers of all pending results.

Returns:a list of pending job identifiers
ResultSet.numberOfPendingResults() → int

Return the number of pending results.

Returns:the number of pending results
ResultSet.pendingResultsFor(params: Dict[str, Any]) → List[str]

Return the ids of all pending results with the given parameters. Not all parameters have to be provided, allowing for partial matching.

Parameters:params – the experimental parameters
Returns:a list of job ids
ResultSet.pendingResultParameters(jobid: str) → Dict[str, Any]

Return a dict of the parameters for the given pending result.

Parameters:jobid – the job id
Returns:a dict of parameter values
ResultSet.ready() → bool

Test whether there are pending results.

Returns:True if all pending results have been either resolved or cancelled
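Treating the pending-results table as a mapping from job ids to parameter dicts, partial matching and the ready() test can be sketched as follows; pending_results_for is a hypothetical stand-in, not epyc's implementation:

```python
def pending_results_for(pending, params):
    """Sketch of partial matching for pending results: 'pending' maps
    job ids to their parameter dicts, and a job matches if it agrees
    on every parameter given (parameters not given are ignored)."""
    return [jobid for jobid, p in pending.items()
            if all(p.get(k) == v for k, v in params.items())]

pending = {
    'job-1': {'x': 1, 'y': 'a'},
    'job-2': {'x': 2, 'y': 'a'},
    'job-3': {'x': 1, 'y': 'b'},
}
print(pending_results_for(pending, {'x': 1}))   # ['job-1', 'job-3']

# ready() semantics: true only when no results remain pending
print(len(pending) == 0)                        # False
```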

Three methods within the interface are used by LabNotebook to manage pending results. They shouldn’t be needed from user code.

ResultSet.addSinglePendingResult(params: Dict[str, Any], jobid: str)

Add a pending result for the given point in the parameter space under the given job identifier. The identifier will generally be meaningful to the lab that submitted the request, and must be unique.

Parameters:
  • params – the experimental parameters
  • jobid – the job id
ResultSet.cancelSinglePendingResult(jobid: str)

Cancel a pending job. This records the cancellation using a CancelledException, storing a traceback to show where the cancellation was triggered from. User code should call LabNotebook.cancelPendingResult() rather than using this method directly.

Cancelling a result generates a message to standard output.

Parameters:jobid – the job id
ResultSet.resolveSinglePendingResult(jobid: str)

Resolve the given pending result. This drops the job from the pending results table. User code should call LabNotebook.resolvePendingResult() rather than using this method directly, since this method doesn’t actually store the completed pending result, it just manages its non-pending-ness.

Parameters:jobid – the job id

Metadata access

The result set gives access to its description and the names of the various elements it stores. These names may change over time, if for example you add a results dict that has more results than those you added earlier.

ResultSet.description() → str

Return the free text description of the result set.

Returns:the description
ResultSet.setDescription(d: str)

Set the free text description of the result set.

Parameters:d – the description

Important

You can change the description of a result set after it’s been created – but you can’t change any results that’ve been added to it.

ResultSet.names() → Dict[str, Optional[List[str]]]

Return a dict of lists of names, corresponding to the entries in the results dicts for this result set. If only pending results have so far been added the Experiment.METADATA and Experiment.RESULTS sets will be empty.

Returns:the dict of names
ResultSet.metadataNames() → List[str]

Return the set of metadata names associated with this result set. If no results have been submitted, this set will be empty.

Returns:the set of experimental metadata names
ResultSet.parameterNames() → List[str]

Return the set of parameter names associated with this result set. If no results (pending or real) have been submitted, this set will be empty.

Returns:the set of experimental parameter names
ResultSet.resultNames() → List[str]

Return the set of result names associated with this result set. If no results have been submitted, this set will be empty.

Returns:the set of experimental result names

The result set can also have attributes set, which can be accessed either using methods or by treating the result set as a dict.

ResultSet.setAttribute(k: str, v: Any)

Set the given attribute.

Parameters:
  • k – the key
  • v – the attribute value
ResultSet.getAttribute(k: str) → Any

Retrieve the given attribute. A KeyError will be raised if the attribute doesn’t exist.

Parameters:k – the attribute
Returns:the attribute value
ResultSet.keys() → Set[str]

Return the set of attributes.

Returns:the attribute keys
ResultSet.__contains__(k: str)

True if there is an attribute with the given name.

Parameters:k – the attribute
Returns:True if that attribute exists
ResultSet.__setitem__(k: str, v: Any)

Set the given attribute. The dict-like form of setAttribute().

Parameters:
  • k – the key
  • v – the attribute value
ResultSet.__getitem__(k: str) → Any

Retrieve the given attribute. The dict-like form of getAttribute().

Parameters:k – the attribute
Returns:the attribute value
ResultSet.__delitem__(k: str)

Delete the named attribute. This method is invoked by the del operator. A KeyError will be raised if the attribute doesn’t exist.

Parameters:k – the attribute

There are various uses for these attributes: see Making data archives for one common use case.

Important

The length of a result set (ResultSet.__len__()) refers to the number of results, not to the number of attributes (as would be the case for a dict).

Locking

Once the set of experiments to be held in a result set is finished, it’s probably sensible to prevent any further changes. This is accomplished by “finishing” the result set, leaving it locked against any further updates.

ResultSet.finish()

Finish and lock this result set. This cancels any pending results and locks the result set against future additions. This is useful to tidy up after experiments are finished, and protects against accidentally re-using a result set for something else.

One can check the lock in two ways, either by polling or as an assertion that raises a ResultSetLockedException when called on a locked result set. This is mainly used to protect update methods.

ResultSet.isLocked() → bool

Test whether the result set is locked.

Returns:True if the result set is locked
ResultSet.assertUnlocked()

Tests whether the result set is locked, and raises a ResultSetLockedException if so. This is used to protect update methods, since locked result sets are never updated.

Dirtiness

Adding results or pending results to a result set makes it dirty, in need of storing if being used with a persistent notebook. This is used to avoid unnecessary writing of unchanged data.

ResultSet.dirty(f: bool = True)

Mark the result set as dirty (the default) or clean.

Parameters:f – True if the result set is dirty
ResultSet.isDirty() → bool

Test whether the result set is dirty, i.e., if its contents need persisting (if the containing notebook is persistent).

Returns:True if the result set is dirty

Type mapping and inference

A result set types all the elements within a results dict using numpy’s “dtype” data type system.

Note

This approach is transparent to user code, and is explained here purely for the curious.

There are actually two types involved: the dtype of results dicts formed from the metadata, parameters, and experimental results added to the result set; and the dtype of pending results which includes just the parameters.

ResultSet.dtype() → numpy.dtype

Return the dtype of the results, combining the metadata, parameters, and results elements.

Returns:the dtype
ResultSet.pendingdtype() → numpy.dtype

Return the dtype of pending results, using just parameter elements.

Returns:the dtype

The default type mapping maps each Python type we expect to see to a corresponding dtype. The type mapping can be changed on a per-result set basis if required.

ResultSet.TypeMapping

Default type mapping from Python types to numpy dtypes.

There is also a mapping from numpy type kinds to appropriate default values, used to initialise missing fields.

ResultSet.zero(dtype: numpy.dtype) → Any

Return the appropriate “zero” for the given simple dtype.

Parameters:dtype – the dtype
Returns:“zero”

The type mapping is used to generate a dtype for each Python type, but preserving any numpy types used.

ResultSet.typeToDtype(t: type) → numpy.dtype

Return the dtype of the given Python type. An exception is thrown if there is no appropriate mapping.

Parameters:t – the (Python) type
Returns:the dtype of the value
ResultSet.valueToDtype(v: Any) → numpy.dtype

Return the dtype of a Python value. An exception is thrown if there is no appropriate mapping.

Parameters:v – the value
Returns:the dtype
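A sketch of how such a mapping might work, preserving numpy types in the way valueToDtype describes. The type_mapping dict and value_to_dtype helper are illustrative; epyc's actual TypeMapping may make different choices:

```python
import numpy as np

# An illustrative mapping from Python types to numpy dtypes.
type_mapping = {
    int: np.dtype(np.int64),
    float: np.dtype(np.float64),
    complex: np.dtype(np.complex128),
    bool: np.dtype(bool),
}

def value_to_dtype(v):
    """Map a Python value to a dtype, preserving the dtype of any
    numpy scalar rather than re-mapping it."""
    if isinstance(v, np.generic):
        return v.dtype            # already a numpy type: keep it
    t = type(v)
    if t in type_mapping:
        return type_mapping[t]
    raise TypeError(f'No dtype mapping for {t}')

print(value_to_dtype(3.0))           # float64
print(value_to_dtype(np.int32(7)))   # int32 (preserved, not int64)
```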

The result set infers the numpy-level types automatically as results (and pending results) are added.

ResultSet.inferDtype(rc: Dict[str, Dict[str, Any]])

Infer the dtype of the given result dict. This will include all the standard and exceptional metadata defined for an Experiment, plus the parameters and results (if present) for the results dict.

If more elements are provided than have previously been seen, the underlying results dataframe will be extended with new columns.

This method will be called automatically if no explicit dtype has been provided for the result set by a call to setDtype().

Returns:the dtype
ResultSet.inferPendingResultDtype(params: Dict[str, Any])

Infer the dtype of the pending results of given dict of experimental parameters. This is essentially the same operation as inferDtype() but restricted to experimental parameters and including a string job identifier.

Parameters:params – the experimental parameters
Returns:the pending results dtype
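Inference of a structured dtype from a results dict can be sketched with numpy directly; infer_dtype below is a hypothetical simplification that flattens all three sections into one structured dtype:

```python
import numpy as np

def infer_dtype(rc):
    """Sketch of dtype inference for a results dict: flatten the
    metadata, parameters, and results sections into one structured
    dtype, with one field per element."""
    fields = []
    for section in ('metadata', 'parameters', 'results'):
        for k, v in rc.get(section, {}).items():
            fields.append((k, np.asarray(v).dtype))
    return np.dtype(fields)

rc = {
    'metadata': {'elapsed_time': 0.5},
    'parameters': {'x': 2},
    'results': {'y': 4.0},
}
dt = infer_dtype(rc)
print(dt.names)   # ('elapsed_time', 'x', 'y')
```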

This behaviour can be sidestepped by explicitly setting the dtypes (with care!).

ResultSet.setDtype(dtype) → numpy.dtype

Set the dtype for the results. This should be done with care, ensuring that the element names all match. It does however allow precise control over the way data is stored (if required).

Parameters:dtype – the dtype
ResultSet.setPendingResultDtype(dtype) → numpy.dtype

Set the dtype for pending results. This should be done with care, ensuring that the element names all match.

Parameters:dtype – the dtype

The progressive nature of typing a result set means that the type may change as new results are added. This “type-level dirtiness” is controlled by two methods:

ResultSet.typechanged(f: bool = True)

Mark the result set as having changed type (the default) or not.

Parameters:f – True if the result set has changed type
ResultSet.isTypeChanged() → bool

Test whether the result set has changed its metadata, parameters, or results. This is used by persistent notebooks to re-construct the backing storage.

Returns:True if the result set has changed type