1 Overview
The lsst.verify package provides a framework for characterizing the LSST Science Pipelines through specific metrics, which are configured by the verify_metrics package.
However, lsst.verify does not yet specify how to organize the code that measures and stores the metrics.
This document proposes an extension of lsst.verify that interacts with the Task framework to make it easy to write new metrics and apply them to the Science Pipelines.
The proposed design, shown below, is similar to the “make measurements from output datasets” option proposed in DMTN-057.
Each metric will have an associated lsst.pipe.base.Task class that is responsible for measuring it based on data previously written to a Butler repository.
These tasks will be grouped together for execution, first as plugins to a central metrics-managing task, and later as components of a lsst.pipe.base.Pipeline.
The central task or pipeline will handle the details of directing the potentially large number of input datasets to the measurement tasks that analyze them.
This design proposal strives to be consistent with the recommendations on metrics generation and usage provided by DMTN-085, without assuming them more than necessary.
However, the design does require that QAWG-REC-34 (“Metric values should have Butler dataIds”) be adopted; otherwise, the output dataset type of the proposed measurement task would be ill-defined.
2 Design Goals
The goals of this design are based on those presented in DMTN-057. In particular, the system must be easy to extend, must support a variety of metrics, and must be agnostic to how data processing is divided among task instances. It must not add extra configuration requirements to task users who are not interested in metrics, and it must be possible to disable metrics selectively during commissioning and operations.
DMTN-085 makes several recommendations that, if adopted, would impose additional requirements on a measurement creation framework. Specifically:
- QAWG-REC-31 recommends that the computation and aggregation of measurements be separated.
- QAWG-REC-32 recommends that measurements be stored at the finest granularity at which they can be reasonably defined.
- QAWG-REC-34 and QAWG-REC-35 recommend that measurements be Butler datasets with data IDs.
- QAWG-REC-41 recommends that metrics can be submitted to SQuaSH (and, presumably, measured first) from arbitrary execution environments.
DMTN-085 Section 4.11.1 also informally proposes that the granularity of a metric be part of its definition, and notes that some metrics may need to be measured during data processing pipeline execution, rather than as a separate step. I note in the appropriate sections how these capabilities can be accommodated.
The capabilities and requirements of PipelineTask are much clearer now than they were when DMTN-057 was written (note: it refers to PipelineTask by its previous name, SuperTask).
Since reusability outside the Tasks framework is not a concern, the design proposed here can be tightly coupled to the existing lsst.pipe.base.PipelineTask API.
3 Primary Components
The framework creates lsst.pipe.base.PipelineTask subclasses responsible for measuring metrics and constructing lsst.verify.Measurement objects.
Metrics-measuring tasks (hereafter MetricTasks) will be added to data processing pipelines, and the PipelineTask framework will be responsible for scheduling metrics computation and collecting the results.
It is expected that PipelineTask will provide some mechanism for grouping tasks together (e.g., sub-pipelines), which will make it easier to enable and configure large groups of metrics.
PipelineTask is not available for general use at the time of writing, so initial implementations may need to avoid referring to its API directly (see the Butler Gen 2 MetricTask).
Because MetricTasks are handled separately from data processing tasks, the latter can be run without needing to know about or configure metrics.
Metrics that must be calculated while the pipeline is running may be integrated into pipeline tasks as subtasks, with the measurement(s) being added to the list of pipeline task outputs, but doing so greatly reduces the flexibility of the framework and is not recommended.
While this proposal places MetricTask and its supporting classes in lsst.verify (see Figure 1), its subclasses can go in any package that can depend on both lsst.verify and lsst.pipe.base.
For example, subclasses of MetricTask may be defined in the packages of the task they instrument, in plugin packages similar to meas_extensions_*, or in a dedicated pipeline verification package.
The framework is therefore compatible with any future policy decisions concerning metric implementations.
3.1 MetricTask
The code to compute any metric shall be a subclass of MetricTask, a PipelineTask specialized for metrics.
Each MetricTask shall read the necessary data from a repository, and produce a lsst.verify.Measurement of the corresponding metric.
Measurements may be associated with particular quanta or data IDs, or they may be repository-wide.
Because metric measurers may read a variety of datasets, PipelineTask’s ability to automatically manage dataset types is essential to keeping the framework easy to extend.
3.1.1 Abstract Members
run(undefined) : lsst.pipe.base.Struct
- Subclasses may provide a run method, which should take multiple datasets of a given type. Its return value must contain a field, measurement, mapping to the resulting lsst.verify.Measurement.
  MetricTask shall do nothing (returning None in place of a Measurement) if the data it needs are not available. Behavior when the data are available for some quanta but not others is TBD.
  Supporting processing of multiple datasets together lets metrics be defined with a different granularity from the Science Pipelines processing, and allows for the aggregation (or lack thereof) of the metric to be controlled by the task configuration with no code changes. Note that if QAWG-REC-32 is implemented, then the input data will typically be a list of one item.
getInputDatasetTypes(config: cls.ConfigClass) : dict from str to DatasetTypeDescriptor [initially str to str]
- While required by the PipelineTask API, this method will also be used by pre-PipelineTask code to identify the (Butler Gen 2) inputs to the MetricTask.
getOutputMetric(config: cls.ConfigClass) : lsst.verify.Name
- A class method returning the metric calculated by this object. May be configurable to allow one implementation class to calculate families of related metrics.
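The interface above can be illustrated with a minimal sketch. It does not use the real lsst.pipe.base or lsst.verify classes: MetricTaskSketch, NumSourcesMetricTask, the metric name example_package.NumSources, and the use of a (name, value) tuple in place of lsst.verify.Measurement are all hypothetical stand-ins, and types.SimpleNamespace plays the role of lsst.pipe.base.Struct. Partial availability of data is deliberately not handled, since that behavior is TBD.

```python
from abc import ABC, abstractmethod
from types import SimpleNamespace


class MetricTaskSketch(ABC):
    """Stand-in for the proposed MetricTask (not the real lsst.verify API)."""

    def __init__(self, config=None):
        self.config = config

    @classmethod
    @abstractmethod
    def getInputDatasetTypes(cls, config):
        """Return a dict from run() argument names to dataset type names."""

    @classmethod
    @abstractmethod
    def getOutputMetric(cls, config):
        """Return the name of the metric measured by this task."""

    @abstractmethod
    def run(self, **datasets):
        """Return a struct whose measurement field is a measurement or None."""


class NumSourcesMetricTask(MetricTaskSketch):
    """Hypothetical metric: total number of sources in the input catalogs."""

    @classmethod
    def getInputDatasetTypes(cls, config):
        return {"catalogs": "src"}  # illustrative dataset type name

    @classmethod
    def getOutputMetric(cls, config):
        return "example_package.NumSources"  # hypothetical metric name

    def run(self, catalogs):
        # Missing input data is not an error: report "no measurement".
        if not catalogs:
            return SimpleNamespace(measurement=None)
        total = sum(len(catalog) for catalog in catalogs)
        # A (metric name, value) tuple stands in for lsst.verify.Measurement.
        metric = self.getOutputMetric(self.config)
        return SimpleNamespace(measurement=(metric, total))
```

Because run accepts a list of catalogs, the same class could serve a per-CCD or a per-visit metric; only the caller's choice of inputs changes.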
3.1.2 Concrete Members
getOutputDatasetTypes(config: cls.ConfigClass) : dict from str to DatasetTypeDescriptor
- This method may need to be overridden to reflect Butler persistence of lsst.verify.Measurement objects, if individual objects are not supported as a persistable dataset.
saveStruct(lsst.pipe.base.Struct, outputDataRefs: dict, butler: lsst.daf.butler.Butler)
- This method may need to be overridden to support Butler persistence of lsst.verify.Measurement objects, if individual objects are not supported as a persistable dataset.
3.2 SingleMetadataMetricTask
This class shall simplify implementations of metrics that are calculated from a single key in the pipeline’s output metadata. The class shall provide the code needed to map a metadata key (possibly across multiple quanta) to a single metric.
Based on the examples implemented in lsst.ap.verify.measurements, the process of calculating a metric from multiple metadata keys is considerably more complex.
It is better for such metrics to inherit from MetricTask directly than to try to provide generic support through a single class.
3.2.1 Abstract Members
getInputMetadataKey(config: cls.ConfigClass) : str
- Shall name the key containing the metric information, with optional task prefixes following the conventions of lsst.pipe.base.Task.getFullMetadata(). The name may be an incomplete key in order to match an arbitrary top-level task or an unnecessarily detailed key name. May be configurable to allow one implementation class to calculate families of related metrics.
makeMeasurement(values: iterable of any) : lsst.verify.Measurement
- A workhorse method that accepts the metadata values extracted from the metadata passed to run.
3.2.2 Concrete Members
run(metadata: iterable of lsst.daf.base.PropertySet) : lsst.pipe.base.Struct
- This method shall take multiple metadata objects (possibly all of them, depending on the granularity of the metric). It shall look up keys partially matching getInputMetadataKey and make a single call to makeMeasurement with the values of the keys. Behavior when keys are present in some metadata objects but not others is TBD.
getInputDatasetTypes(config: cls.ConfigClass) : dict from str to DatasetTypeDescriptor
- This method shall return a single mapping from "metadata" to the dataset type of the top-level data processing task’s metadata. The identity of the top-level task shall be extracted from the MetricTask’s config.
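A minimal sketch of the division of labor between run and makeMeasurement follows. It makes several assumptions that are not part of the proposal: plain dicts stand in for lsst.daf.base.PropertySet, "partial match" is taken to mean that the full key contains the configured key as a substring, metadata objects lacking the key are simply skipped (one possible answer to the TBD above, chosen only for illustration), and the class names, key, and metric name are hypothetical.

```python
from types import SimpleNamespace


class SingleMetadataMetricSketch:
    """Stand-in for the proposed SingleMetadataMetricTask."""

    def getInputMetadataKey(self):
        raise NotImplementedError("subclasses name the metadata key here")

    def makeMeasurement(self, values):
        raise NotImplementedError("subclasses convert values to a measurement")

    def run(self, metadata):
        partialKey = self.getInputMetadataKey()
        values = []
        for md in metadata:
            # Assumption: a key matches if it contains partialKey.
            # Assumption: metadata objects without the key are skipped.
            values.extend(v for k, v in md.items() if partialKey in k)
        if not values:
            return SimpleNamespace(measurement=None)
        # One call to makeMeasurement, regardless of how many objects matched.
        return SimpleNamespace(measurement=self.makeMeasurement(values))


class CalibrateTimeMetric(SingleMetadataMetricSketch):
    """Hypothetical metric: total time recorded under one metadata key."""

    def getInputMetadataKey(self):
        return "calibrate.runtime"  # hypothetical key, without task prefix

    def makeMeasurement(self, values):
        # A (metric name, value) tuple stands in for lsst.verify.Measurement.
        return ("example_package.CalibrateTime", sum(values))
```

A subclass only supplies the key and the conversion; the iteration over metadata objects stays in the base class.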
3.3 PpdbMetricTask
This class shall simplify implementations of metrics that are calculated from a prompt products database.
PpdbMetricTask has a potential forward-compatibility problem: at present, the most expedient way to get a Ppdb that points to the correct database is by loading it from the data processing pipeline’s config. However, the Butler is later expected to support database access directly, and we should adopt the new system when it is ready.
The problem can be solved by making use of the PipelineTask framework’s existing support for configurable input dataset types, and by delegating the process of constructing a Ppdb object to a replaceable subtask.
The cost of this solution is an extra configuration line for every instance of PpdbMetricTask included in a metrics calculation, at least until we can adopt the new system as a default.
3.3.1 Abstract Members
makeMeasurement(handle: lsst.dax.ppdb.Ppdb, outputDataId: DataId) : lsst.verify.Measurement
- A workhorse method that takes a database handle and computes a metric using the Ppdb API. outputDataId is used to identify a specific metric for subclasses that support fine-grained metrics (see discussion of adaptArgsAndRun, below).
dbLoader : lsst.pipe.base.Task
- A subtask responsible for creating a Ppdb object from the dataset type. Its run method must accept a dataset of the same type as indicated by PpdbMetricTask.getInputDatasetTypes.
  Until plans for Butler database support are finalized, config writers should explicitly retarget this task instead of assuming a default. It may be possible to enforce this practice by not providing a default implementation and clearly documenting the supported option(s).
3.3.2 Concrete Members
adaptArgsAndRun(dbInfo: dict from str to any, inputDataIds: unused, outputDataId: dict from str to DataId) : lsst.pipe.base.Struct
- This method shall load the database using dbLoader before calling makeMeasurement. PpdbMetricTask overrides adaptArgsAndRun in order to support fine-grained metrics: while a repository should have only one prompt products database, metrics may wish to examine subsets grouped by visit, CCD, etc., and if so these details must be passed to makeMeasurement.
  This method is not necessary in the initial implementation, which will not support fine-grained metrics.
run(dbInfo: any) : lsst.pipe.base.Struct
- This method shall be a simplified version of adaptArgsAndRun for use before PipelineTask is ready. Its behavior shall be equivalent to adaptArgsAndRun called with empty data IDs.
getInputDatasetTypes(config: cls.ConfigClass) : dict from str to DatasetTypeDescriptor
- This method shall return a single mapping from "dbInfo" to a suitable dataset type: either the type of the top-level data processing task’s config, or some future type specifically designed for database support.
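The delegation to a replaceable loader can be sketched as follows. Nothing here touches the real lsst.dax.ppdb API: a dict stands in for the pipeline config, a list of rows stands in for the Ppdb handle, and the class names, the ppdb field on the loader's result, and the metric name are assumptions made for the example.

```python
from types import SimpleNamespace


class ConfigPpdbLoaderSketch:
    """Hypothetical dbLoader: builds a database handle from a pipeline config."""

    def run(self, dbInfo):
        # Stand-ins: a dict plays the config, a list of rows plays the Ppdb.
        return SimpleNamespace(ppdb=list(dbInfo["records"]))


class PpdbMetricSketch:
    """Stand-in for the proposed PpdbMetricTask."""

    def __init__(self, dbLoader):
        self.dbLoader = dbLoader  # replaceable, like a retargetable subtask

    def makeMeasurement(self, handle, outputDataId):
        raise NotImplementedError("subclasses compute the metric here")

    def adaptArgsAndRun(self, dbInfo, inputDataIds, outputDataId):
        # Load the database first, then hand it to the workhorse method
        # together with the data ID that selects the subset of interest.
        handle = self.dbLoader.run(dbInfo["dbInfo"]).ppdb
        measurement = self.makeMeasurement(handle, outputDataId["measurement"])
        return SimpleNamespace(measurement=measurement)

    def run(self, dbInfo):
        # Equivalent to adaptArgsAndRun called with empty data IDs.
        return self.adaptArgsAndRun({"dbInfo": dbInfo}, {}, {"measurement": {}})


class NumRecordsMetric(PpdbMetricSketch):
    """Hypothetical metric: number of rows matching the output data ID."""

    def makeMeasurement(self, handle, outputDataId):
        matching = [row for row in handle
                    if all(row.get(k) == v for k, v in outputDataId.items())]
        # A (metric name, value) tuple stands in for lsst.verify.Measurement.
        return ("example_package.NumRecords", len(matching))
```

Switching to Butler-managed database access would then mean supplying a different loader object, leaving NumRecordsMetric untouched.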
3.4 MetricComputationError
This subclass of RuntimeError may be raised by MetricTask to indicate that a metric could not be computed due to algorithmic or system issues.
It is provided to let higher-level code distinguish failures in the metrics framework from failures in the pipeline code.
Note that being unable to compute a metric due to insufficient input data is not considered a failure, and in such a case MetricTask should return None instead of raising an exception.
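The distinction between "no data" and "failure" might look like the sketch below. The exception class mirrors the proposed name, but measureMean and runAndClassify are hypothetical helpers written for this example, and treating NaN input as an algorithmic failure is an assumption.

```python
import math


class MetricComputationError(RuntimeError):
    """Stand-in for the proposed exception class."""


def measureMean(values):
    """Hypothetical metric computation over a list of floats."""
    if not values:
        # Insufficient input data: not a failure, so no exception.
        return None
    if any(math.isnan(v) for v in values):
        # Assumption: unusable values count as an algorithmic failure.
        raise MetricComputationError("cannot average NaN values")
    return sum(values) / len(values)


def runAndClassify(values):
    """Show how higher-level code can tell the three outcomes apart."""
    try:
        measurement = measureMean(values)
    except MetricComputationError as e:
        return ("metric failure", str(e))
    if measurement is None:
        return ("no data", None)
    return ("measured", measurement)
```

Any exception other than MetricComputationError would propagate out of runAndClassify, which is how a caller would recognize a problem outside the metrics framework.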
4 Compatibility Components
We expect to deploy new metrics before PipelineTask is ready for general use.
Therefore, the initial framework will include extra classes that allow MetricTask to function without PipelineTask features.
By far the best way to simultaneously deal with the incompatible Butler 2 and Butler 3 APIs would be an adapter class that allows MetricTask classes initially written without PipelineTask support to serve as lsst.pipe.base.PipelineTask.
Unfortunately, the design of such an adapter is complicated by the strict requirements on PipelineTask constructor signatures and the use of configs as a Task’s primary API.
I suspect that both problems may be solved by applying a decorator to the appropriate type objects rather than using a conventional class or object adapter[1] for Task or Config objects, but the design of such a decorator is best addressed separately.
4.1 MetricTask
This MetricTask shall be a subclass of Task that has a PipelineTask-like interface but does not depend on any Butler Gen 3 components. Concrete MetricTasks will implement this interface before PipelineTask is available, and can be migrated individually afterward (possibly through a formal deprecation procedure, if MetricTask is used widely enough to make it necessary).
4.1.1 Abstract Members
run(undefined) : lsst.pipe.base.Struct
- Subclasses may provide a run method, which should take multiple datasets of a given type. Its return value must contain a field, measurement, mapping to the resulting lsst.verify.Measurement.
  MetricTask shall do nothing (returning None in place of a Measurement) if the data it needs are not available. Behavior when the data are available for some quanta but not others is TBD.
  Supporting processing of multiple datasets together lets metrics be defined with a different granularity from the Science Pipelines processing, and allows for the aggregation (or lack thereof) of the metric to be controlled by the task configuration with no code changes. Note that if QAWG-REC-32 is implemented, then the input data will typically be a list of one item.
adaptArgsAndRun(inputData: dict, inputDataIds: dict, outputDataId: dict) : lsst.pipe.base.Struct
- The default implementation of this method shall be equivalent to calling PipelineTask.adaptArgsAndRun, followed by calling addStandardMetadata on the result. Subclasses may override adaptArgsAndRun, but are then responsible for calling addStandardMetadata themselves.
  outputDataId shall contain a single mapping from "measurement" to exactly one data ID. The method’s return value must contain a field, measurement, mapping to the resulting lsst.verify.Measurement.
  Behavior requirements as for run.
getInputDatasetTypes(config: cls.ConfigClass) : dict from str to str
- This method shall identify the Butler Gen 2 inputs to the MetricTask.
getOutputMetric(config: cls.ConfigClass) : lsst.verify.Name
- A class method returning the metric calculated by this object. May be configurable to allow one implementation class to calculate families of related metrics.
4.1.2 Concrete Members
addStandardMetadata(measurement: lsst.verify.Measurement, outputDataId: dict)
- This method may add measurement-specific metadata agreed to be of universal use (both across metrics and across clients, including but not limited to SQuaSH), breaking the method API if necessary. This method shall not add common information such as the execution environment (which is the responsibility of the MetricTask’s caller) or information specific to a particular metric (which is the responsibility of the corresponding class).
  This is an unfortunately inflexible solution to the problem of adding client-mandated metadata keys. However, it is not clear whether any such keys will still be needed after the transition to Butler Gen 3 (see SQR-019 and DMTN-085), and any solution that controls the metadata using the task configuration would require independently configuring every single MetricTask.
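The relationship between run, adaptArgsAndRun, and addStandardMetadata can be sketched as below. The sketch assumes that the PipelineTask default for adaptArgsAndRun amounts to calling run with inputData unpacked as keyword arguments; it also uses SimpleNamespace objects in place of lsst.pipe.base.Struct and lsst.verify.Measurement. The class names and the "dataId" metadata key are hypothetical, since the proposal does not yet name any standard keys.

```python
from types import SimpleNamespace


class Gen2MetricTaskSketch:
    """Stand-in for the Butler Gen 2 compatible MetricTask."""

    def run(self, **inputData):
        raise NotImplementedError("subclasses compute the measurement here")

    def addStandardMetadata(self, measurement, outputDataId):
        # Default: no universally agreed keys yet, so nothing is added.
        pass

    def adaptArgsAndRun(self, inputData, inputDataIds, outputDataId):
        # Assumption: the PipelineTask default forwards inputData to run().
        result = self.run(**inputData)
        if result.measurement is not None:
            self.addStandardMetadata(result.measurement,
                                     outputDataId["measurement"])
        return result


class CountTask(Gen2MetricTaskSketch):
    """Hypothetical metric task that counts its input catalogs."""

    def run(self, catalogs):
        # value and notes stand in for the fields of lsst.verify.Measurement.
        measurement = SimpleNamespace(value=len(catalogs), notes={})
        return SimpleNamespace(measurement=measurement)

    def addStandardMetadata(self, measurement, outputDataId):
        measurement.notes["dataId"] = dict(outputDataId)  # hypothetical key
```

A subclass that overrides adaptArgsAndRun would need to repeat the addStandardMetadata call itself, as the member description above requires.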
4.2 MetricsControllerTask
This class shall execute a configurable set of metrics, handling Butler I/O and Measurement output internally in a manner similar to JointcalTask.
The MetricTask instances to be executed shall not be treated as subtasks, instead being managed using a multi-valued lsst.pex.config.RegistryField much like meas_base plugins.
MetricsControllerTask shall ignore any configuration in a MetricTask giving its metric a specific level of granularity; the granularity shall instead be inferred from MetricsControllerTask inputs.
In addition, MetricsControllerTask will not support metrics that depend on other metrics.
Some existing frameworks (i.e., lsst.ap.verify and lsst.jointcal) store metrics computed by a task as part of one or more lsst.verify.Job objects.
MetricsControllerTask will not be able to work with such jobs, but will not preempt them, either – they can continue to record metrics that are not managed by MetricsControllerTask.
4.2.1 Concrete Members
runDataRefs(datarefs: list of lsst.daf.persistence.ButlerDataRef) : lsst.pipe.base.Struct
- This method shall, for each configured MetricTask and each dataref, load the metric’s input dataset(s) and pass them to the task (via adaptArgsAndRun), collecting the resulting Measurement objects and persisting them to configuration-specified files. The return value shall contain a field, jobs, mapping to a list of lsst.verify.Job, one for each dataref, containing the measurements.
  The granularity of each dataref shall define the granularity of the corresponding measurement, and must be the same as or coarser than the granularity of each MetricTask's input data. The safest way to support metrics of different granularities is to handle each granularity with an independently configured MetricsControllerTask object.
  It is assumed that, since MetricsControllerTask is a placeholder, the implementation of runDataRefs will be something simple like a loop. However, it may use internal dataset caching or parallelism to speed things up if it proves necessary.
measurers : iterable of MetricTask
- This attribute contains all the metric measuring objects to be called by runDataRefs. It is initialized from a RegistryField in MetricsControllerConfig.
metadataAdder : lsst.pipe.base.Task
- A subtask responsible for adding Job-level metadata required by a particular client (e.g., SQuaSH). Its run method must accept a lsst.verify.Job object and return a lsst.pipe.base.Struct whose job field maps to a modified Job.
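The "simple loop" envisioned for runDataRefs might look like the following sketch. It is heavily simplified and rests on stated assumptions: FakeDataRef stands in for lsst.daf.persistence.ButlerDataRef and returns one in-memory dataset per dataset type, a plain list of measurements stands in for lsst.verify.Job, persistence and the metadataAdder subtask are omitted, and CountingTask and its metric name are hypothetical.

```python
from types import SimpleNamespace


class FakeDataRef:
    """Stand-in for ButlerDataRef: a data ID plus in-memory datasets."""

    def __init__(self, dataId, datasets):
        self.dataId = dataId
        self._datasets = datasets

    def get(self, datasetType):
        return self._datasets[datasetType]


class CountingTask:
    """Hypothetical MetricTask-like measurer used to exercise the loop."""

    config = None

    @classmethod
    def getInputDatasetTypes(cls, config):
        return {"catalog": "src"}  # illustrative dataset type name

    def adaptArgsAndRun(self, inputData, inputDataIds, outputDataId):
        catalog = inputData["catalog"]
        if catalog is None:
            return SimpleNamespace(measurement=None)
        return SimpleNamespace(
            measurement=("example_package.NumSources", len(catalog)))


class MetricsControllerSketch:
    """Stand-in for the proposed MetricsControllerTask."""

    def __init__(self, measurers):
        self.measurers = list(measurers)

    def runDataRefs(self, datarefs):
        jobs = []
        for dataref in datarefs:
            job = []  # a list of measurements stands in for lsst.verify.Job
            for task in self.measurers:
                inputTypes = task.getInputDatasetTypes(task.config)
                inputData = {arg: dataref.get(dsType)
                             for arg, dsType in inputTypes.items()}
                inputDataIds = {arg: dataref.dataId for arg in inputTypes}
                result = task.adaptArgsAndRun(
                    inputData, inputDataIds, {"measurement": dataref.dataId})
                # Tasks that lack data contribute nothing to the job.
                if result.measurement is not None:
                    job.append(result.measurement)
            jobs.append(job)
        return SimpleNamespace(jobs=jobs)
```

Each dataref yields exactly one job, so the granularity of the output follows the granularity of the datarefs passed in, as described above.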
4.3 MetricRegistry
This class shall expose a single instance of lsst.pex.config.Registry.
MetricsControllerConfig will depend on this class to create a valid RegistryField.
It can be easily removed once MetricsControllerTask is retired.
4.3.1 Concrete Members
registry : lsst.pex.config.Registry
- This registry will allow MetricsControllerConfig to handle all MetricTask classes decorated by register. It should not require a custom subclass of lsst.pex.config.Registry, but if the need arose, MetricRegistry could be easily turned into a singleton class.
4.4 register
register(name: str) : callable(MetricTask-type)
- This class decorator shall register the class with MetricRegistry.registry. If MetricRegistry does not exist, it shall have no effect.
  This decorator can be phased out once MetricsControllerTask is retired.
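A parameterized class decorator of this kind can be sketched as follows. The sketch uses a plain dict in place of lsst.pex.config.Registry, and MetricRegistrySketch, the registered name, and the decorated class are hypothetical. Catching NameError is just one way to get the "no effect if MetricRegistry does not exist" behavior and is an assumption of this example.

```python
class MetricRegistrySketch:
    """Stand-in for MetricRegistry: exposes one shared registry object."""

    registry = {}  # a dict stands in for lsst.pex.config.Registry


def register(name):
    """Return a class decorator that registers a class under ``name``."""
    def wrapper(taskClass):
        try:
            registry = MetricRegistrySketch.registry
        except NameError:
            # The registry class has been removed: leave the class untouched.
            return taskClass
        registry[name] = taskClass
        return taskClass  # the decorated class itself is returned unchanged
    return wrapper


@register("numSources")
class NumSourcesMetricTask:
    """Hypothetical MetricTask registered under a config-friendly name."""
```

Because the decorator returns the class unmodified, deleting the decorator lines later would not change the behavior of any registered task.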
References
[1] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Professional Computing Series, 1994.