1 Overview
The lsst.verify package provides a framework for characterizing the LSST Science Pipelines through specific metrics, which are configured by the verify_metrics package.
However, lsst.verify does not specify how to organize the code that measures and stores the metrics.
This document proposes a new framework that interacts with the lsst.verify and Task frameworks to make it easy to write new metrics and apply them to the Science Pipelines.
The proposed design, shown below, is similar to the “make measurements from output datasets” option proposed in DMTN-057.
Each metric will have an associated lsst.pipe.base.Task class that is responsible for measuring it based on data previously written to a Butler repository.
These tasks will be grouped together for execution, first as plugins to a central metrics-managing task, and later as components of a lsst.pipe.base.Pipeline.
The central task or pipeline will handle the details of directing the potentially large number of input datasets to the measurement tasks that analyze them.
This design proposal strives to be consistent with the recommendations on metrics generation and usage provided by DMTN-085, without assuming them more than necessary.
However, the design does require that QAWG-REC-34 (“Metric values should have Butler dataIds”) be adopted; otherwise, the output dataset type of the proposed measurement task would be ill-defined.
2 Design Goals
The goals of this design are based on those presented in DMTN-057. In particular, the system must be easy to extend, must support a variety of metrics, and must be agnostic to how data processing is divided among task instances. It must not add extra configuration requirements to task users who are not interested in metrics, and it must be possible to disable metrics selectively during commissioning and operations.
DMTN-085 makes several recommendations that, if adopted, would impose additional requirements on a measurement creation framework. Specifically:
- QAWG-REC-31 recommends that the computation and aggregation of measurements be separated.
- QAWG-REC-32 recommends that measurements be stored at the finest granularity at which they can be reasonably defined.
- QAWG-REC-34 and QAWG-REC-35 recommend that measurements be Butler datasets with data IDs.
- QAWG-REC-41 recommends that metrics can be submitted to SQuaSH (and, presumably, measured first) from arbitrary execution environments.
DMTN-085 Section 4.11.1 also informally proposes that the granularity of a metric be part of its definition, and notes that some metrics may need to be measured during data processing pipeline execution, rather than as a separate step. I note in the appropriate sections how these capabilities can be accommodated.
The capabilities and requirements of PipelineTask are much clearer now than they were when DMTN-057 was written (note: it refers to PipelineTask by its previous name, SuperTask).
Since reusability outside the Tasks framework is not a concern, the design proposed here can be tightly coupled to the existing lsst.pipe.base.PipelineTask API.
3 Primary Components
The framework creates lsst.pipe.base.PipelineTask subclasses responsible for measuring metrics and constructing lsst.verify.Measurement objects.
Metrics-measuring tasks (hereafter MetricTasks) will be added to data processing pipelines, and the PipelineTask framework will be responsible for scheduling metrics computation and collecting the results.
It is expected that PipelineTask will provide some mechanism for grouping tasks together (e.g., sub-pipelines), which will make it easier to enable and configure large groups of metrics.
Because MetricTasks are handled separately from data processing tasks, the latter can be run without needing to know about or configure metrics.
Metrics that must be calculated while the pipeline is running may be integrated into pipeline tasks as subtasks, with the measurement(s) being added to the list of pipeline task outputs, but doing so greatly reduces the flexibility of the framework and is not recommended.
PipelineTask is not available for general use at the time of writing, so initial implementations may need to avoid referring to its API directly.
Any such substitutions will be marked in square brackets with the word “initially”.
If a practical adapter cannot be developed (see 4 Compatibility Components), these substitutions may be implemented as differently-named methods in order to simultaneously support PipelineTask and non-PipelineTask execution in the same class.
For illustration all MetricTask classes are shown as members of a single verify_measurements package.
However, this is not required by the framework; subclasses of MetricTask may be defined in the packages of the task they instrument, or in plugin packages similar to meas_extensions_*.
The framework is therefore compatible with any future policy decisions concerning metric implementations.
3.1 MetricTask
The code to compute any metric shall be a subclass of MetricTask, a PipelineTask [initially Task] specialized for metrics.
Each MetricTask shall read the necessary data from a repository, and produce a lsst.verify.Measurement of the corresponding metric.
Measurements may be associated with particular quanta or data IDs, or they may be repository-wide.
Because metric measurers may read a variety of datasets, PipelineTask‘s ability to automatically manage dataset types is essential to keeping the framework easy to extend.
3.1.1 Abstract Members

run(undefined) : lsst.pipe.base.Struct
    Subclasses may provide a run method, which should take multiple datasets of a given type. Its return value must contain a field, measurement, mapping to the resulting lsst.verify.Measurement. MetricTask shall do nothing (returning None in place of a Measurement) if the data it needs are not available. Behavior when the data are available for some quanta but not others is TBD.

    Supporting processing of multiple datasets together lets metrics be defined with a different granularity from the Science Pipelines processing, and allows the aggregation (or lack thereof) of the metric to be controlled by the task configuration with no code changes. Note that if QAWG-REC-32 is implemented, then the input data will typically be a list of one item.

adaptArgsAndRun(inputData: dict, inputDataIds: dict, outputDataIds: dict) : lsst.pipe.base.Struct
    The default implementation of this method shall be equivalent to calling PipelineTask.adaptArgsAndRun, followed by calling addStandardMetadata on the result. Subclasses may override adaptArgsAndRun, but are then responsible for calling addStandardMetadata themselves. outputDataIds shall contain a single mapping from "measurement" to exactly one data ID. The method's return value must contain a field, measurement, mapping to the resulting lsst.verify.Measurement.

    Behavior requirements are as for run.

getInputDatasetTypes(config: self.ConfigClass) : dict from str to DatasetTypeDescriptor [initially str to str]
    While required by the PipelineTask API, this method will also be used by pre-PipelineTask code to identify the (Butler Gen 2) inputs to the MetricTask.

getOutputMetric(config: self.ConfigClass) : lsst.verify.MetricName
    A class method returning the metric calculated by this object. May be configurable to allow one implementation class to calculate families of related metrics.
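To make the interface concrete, the following sketch shows what a minimal MetricTask subclass might look like under the initial (pre-PipelineTask) variants of these members. The class name, dataset type, and metric name are placeholders, and MetricTask itself is the base class proposed above rather than existing code:

    import astropy.units as u

    import lsst.pex.config
    import lsst.pipe.base as pipeBase
    from lsst.verify import Measurement, Name


    class NumSourcesMetricTask(MetricTask):
        """Hypothetical metric: the total number of sources in the processed
        source catalogs. MetricTask is the base class proposed above.
        """

        ConfigClass = lsst.pex.config.Config   # no options needed for this sketch
        _DefaultName = "numSources"

        @classmethod
        def getInputDatasetTypes(cls, config):
            # Initial (Gen 2) form: map run() argument names to dataset type names.
            return {"sources": "src"}

        @classmethod
        def getOutputMetric(cls, config):
            # "example.NumSources" is a placeholder metric name.
            return Name("example.NumSources")

        def run(self, sources):
            """Count the sources in one or more catalogs."""
            if not sources:
                # Input data unavailable: return None in place of a Measurement.
                return pipeBase.Struct(measurement=None)
            nSources = sum(len(catalog) for catalog in sources)
            measurement = Measurement(self.getOutputMetric(self.config),
                                      nSources * u.count)
            return pipeBase.Struct(measurement=measurement)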
3.1.2 Concrete Members

addStandardMetadata(measurement: lsst.verify.Measurement, outputDataId: dict)
    This method shall add the output data ID to the Measurement's metadata under the key “dataId”, and may add other metadata agreed to be of universal use (both across metrics and across clients, including but not limited to SQuaSH), breaking the method API if necessary. This method shall not add common information such as the execution environment (which is the responsibility of the MetricTask's caller) or information specific to a particular metric (which is the responsibility of the corresponding class).

    This is an unfortunately inflexible solution to the problem of adding client-mandated metadata keys. However, it is not clear whether any such keys will still be needed after the transition to Butler Gen 3 (see SQR-019 and DMTN-085), and any solution that controls the metadata using the task configuration would require independently configuring every single MetricTask.

getOutputDatasetTypes(config: self.ConfigClass) : dict from str to DatasetTypeDescriptor
    This method may need to be overridden to reflect Butler persistence of lsst.verify.Measurement objects. It is not necessary in the initial implementation.

saveStruct(lsst.pipe.base.Struct, outputDataRefs: dict, butler: lsst.daf.butler.Butler)
    This method may need to be overridden to support Butler persistence of lsst.verify.Measurement objects. It is not necessary in the initial implementation.
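The behavior of addStandardMetadata might look roughly like the sketch below (shown as it would appear inside MetricTask). It assumes the Measurement's notes attribute is an acceptable place for the metadata, which is an implementation detail that could change:

    def addStandardMetadata(self, measurement, outputDataId):
        """Attach framework-mandated metadata to a measurement.

        This sketch stores the data ID in the measurement's notes; the real
        method may choose a different metadata mechanism, and may add further
        universally agreed-on keys.
        """
        # Store a plain dict so the data ID serializes with the measurement.
        measurement.notes["dataId"] = dict(outputDataId)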
3.2 SingleMetadataMetricTask
This class shall simplify implementations of metrics that are calculated from a single key in the pipeline’s output metadata. The class shall provide the code needed to map a metadata key (possibly across multiple quanta) to a single metric.
Based on the examples implemented in lsst.ap.verify.measurements, the process of calculating a metric from multiple metadata keys is considerably more complex.
It is better for such metrics to inherit from MetricTask directly than to try to provide generic support through a single class.
3.2.1 Abstract Members

getInputMetadataKey(config: self.ConfigClass) : str
    Shall name the key containing the metric information, with optional task prefixes following the conventions of lsst.pipe.base.Task.getFullMetadata(). The name may be an incomplete key in order to match an arbitrary top-level task or an unnecessarily detailed key name. May be configurable to allow one implementation class to calculate families of related metrics.

makeMeasurement(values: iterable of any) : lsst.verify.Measurement
    A workhorse method that accepts the metadata values extracted from the metadata passed to run.
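For example, a metric for the total time spent in a particular subtask, summed over all quanta, might be implemented as in the following sketch; the metadata key and metric name are placeholders, and SingleMetadataMetricTask is the class proposed above:

    import astropy.units as u

    import lsst.pex.config
    from lsst.verify import Measurement, Name


    class MyTaskTimeMetricTask(SingleMetadataMetricTask):
        """Hypothetical metric: total time spent in ``myTask``, summed over
        all metadata objects that report it.
        """

        ConfigClass = lsst.pex.config.Config   # could expose the key as an option
        _DefaultName = "myTaskTime"

        @classmethod
        def getInputMetadataKey(cls, config):
            # Incomplete key, intended to match the subtask's metadata
            # regardless of which top-level task produced it.
            return "myTask.runTime"

        @classmethod
        def getOutputMetric(cls, config):
            return Name("example.MyTaskTime")

        def makeMeasurement(self, values):
            # One value per metadata object that contained the key.
            return Measurement(self.getOutputMetric(self.config),
                               sum(values) * u.second)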
3.2.2 Concrete Members

run(metadata: iterable of lsst.daf.base.PropertySet) : lsst.pipe.base.Struct
    This method shall take multiple metadata objects (possibly all of them, depending on the granularity of the metric). It shall look up keys partially matching getInputMetadataKey and make a single call to makeMeasurement with the values of the keys. Behavior when keys are present in some metadata objects but not others is TBD.

getInputDatasetTypes(config: self.ConfigClass) : dict from str to DatasetTypeDescriptor [initially str to str]
    This method shall return a single mapping from "metadata" to the dataset type of the top-level data processing task's metadata. The identity of the top-level task shall be extracted from the MetricTask's config.
3.3 PpdbMetricTask
This class shall simplify implementations of metrics that are calculated from a prompt products database.
PpdbMetricTask has a potential forward-compatibility problem: at present, the most expedient way to get a Ppdb that points to the correct database is by loading it from the data processing pipeline’s config. However, the Butler is later expected to support database access directly, and we should adopt the new system when it is ready.
The problem can be solved by making use of the PipelineTask framework’s existing support for configurable input dataset types, and by delegating the process of constructing a Ppdb object to a replaceable subtask.
The cost of this solution is an extra configuration line for every instance of PpdbMetricTask included in a metrics calculation, at least until we can adopt the new system as a default.
3.3.1 Abstract Members

makeMeasurement(handle: lsst.dax.ppdb.Ppdb, outputDataId: DataId) : lsst.verify.Measurement
    A workhorse method that takes a database handle and computes a metric using the Ppdb API. outputDataId is used to identify a specific metric for subclasses that support fine-grained metrics (see the discussion of adaptArgsAndRun, below).

dbLoader : lsst.pipe.base.Task
    A subtask responsible for creating a Ppdb object from the dataset type. Its run method must accept a dataset of the same type as indicated by PpdbMetricTask.getInputDatasetTypes.

    Until plans for Butler database support are finalized, config writers should explicitly retarget this task instead of assuming a default. It may be possible to enforce this practice by not providing a default implementation and clearly documenting the supported option(s).
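Under this scheme, a metrics configuration would include an override along the following lines, assuming dbLoader is exposed as a ConfigurableField and that a config-based loader (ConfigPpdbLoader is a placeholder name and module) is among the supported options:

    # In a config override file for a PpdbMetricTask subclass, where the task
    # framework supplies the task's config as `config`:
    from lsst.example.metrics import ConfigPpdbLoader   # hypothetical module and class

    # Explicitly choose how the Ppdb handle is constructed; no default is assumed.
    config.dbLoader.retarget(ConfigPpdbLoader)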
3.3.2 Concrete Members

adaptArgsAndRun(dbInfo: dict from str to any, inputDataIds: unused, outputDataIds: dict from str to DataId) : lsst.pipe.base.Struct
    This method shall load the database using dbLoader before calling makeMeasurement. PpdbMetricTask overrides adaptArgsAndRun in order to support fine-grained metrics: while a repository should have only one prompt products database, metrics may wish to examine subsets grouped by visit, CCD, etc., and if so these details must be passed to makeMeasurement.

    This method is not necessary in the initial implementation, which will not support fine-grained metrics.

run(dbInfo: any) : lsst.pipe.base.Struct
    This method shall be a simplified version of adaptArgsAndRun for use before PipelineTask is ready. Its behavior shall be equivalent to adaptArgsAndRun called with empty data IDs.

getInputDatasetTypes(config: self.ConfigClass) : dict from str to DatasetTypeDescriptor [initially str to str]
    This method shall return a single mapping from "dbInfo" to a suitable dataset type: either the type of the top-level data processing task's config, or some future type specifically designed for database support.
4 Compatibility Components
We expect to deploy new metrics before PipelineTask is ready for general use.
Therefore, the initial framework will include extra classes that allow MetricTask to function without PipelineTask features.
By far the best way to simultaneously deal with the incompatible Gen 2 and Gen 3 Butler APIs would be an adapter class that allows MetricTask classes initially written without PipelineTask support to serve as lsst.pipe.base.PipelineTasks.
Unfortunately, the design of such an adapter is complicated by the strict requirements on PipelineTask constructor signatures and the use of configs as a Task‘s primary API.
I suspect that both problems may be solved by applying a decorator to the appropriate type objects rather than using a conventional class or object adapter for Task or Config objects, but the design of such a decorator is best addressed separately.
4.1 MetricsControllerTask
This class shall execute a configurable set of metrics, handling Butler I/O and Measurement output internally in a manner similar to JointcalTask.
The MetricTask instances to be executed shall not be treated as subtasks, instead being managed using a multi-valued lsst.pex.config.RegistryField much like meas_base plugins.
MetricsControllerTask shall ignore any configuration in a MetricTask giving its metric a specific level of granularity; the granularity shall instead be inferred from MetricsControllerTask inputs.
In addition, MetricsControllerTask will not support metrics that depend on other metrics.
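A sketch of how MetricsControllerConfig might expose these plugins, assuming the shared registry described in 4.2 MetricRegistry; the field name is illustrative:

    from lsst.pex.config import Config, RegistryField


    class MetricsControllerConfig(Config):
        """Sketch of the controller's config."""

        measurers = RegistryField(
            doc="MetricTask plugins to run; any class registered with "
                "MetricRegistry.registry can be selected here by name.",
            registry=MetricRegistry.registry,   # shared registry from 4.2 MetricRegistry
            multi=True,
        )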
4.1.1 Concrete Members

runDataRefs(datarefs: iterable of lsst.daf.persistence.ButlerDataRef) : lsst.pipe.base.Struct
    This method shall, for each configured MetricTask and each dataref, load the metric's input dataset(s) and pass them to the task (via adaptArgsAndRun), collecting the resulting Measurement objects and persisting them to a configuration-specified file. The return value shall contain a field, job, mapping to a lsst.verify.Job containing the measurements.

    The granularity of each dataref shall define the granularity of the corresponding measurement, and must be the same as or coarser than the granularity of each MetricTask's input data. The safest way to support metrics of different granularities is to handle each granularity with an independently configured MetricsControllerTask object.

    It is assumed that, since MetricsControllerTask is a placeholder, the implementation of runDataRefs will be something simple like a loop. However, it may use internal dataset caching or parallelism to speed things up if it proves necessary.

measurers : iterable of MetricTask
    This attribute contains all the metric-measuring objects to be called by runDataRefs. It is initialized from a RegistryField in MetricsControllerConfig.

metadataAdder : lsst.pipe.base.Task
    A subtask responsible for adding Job-level metadata required by a particular client (e.g., SQuaSH). Its run method must accept a lsst.verify.Job object and return a lsst.pipe.base.Struct whose job field maps to a modified Job.
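Putting the pieces together, a caller might drive the controller roughly as follows; the repository path, dataset type, and plugin name are placeholders:

    import lsst.daf.persistence as dafPersist

    # Gen 2 butler pointing at the processed repository (path is a placeholder).
    butler = dafPersist.Butler("/path/to/repo")
    # The granularity of these datarefs defines the granularity of the measurements.
    datarefs = list(butler.subset("calexp", dataId={"visit": 12345}))

    config = MetricsControllerConfig()
    config.measurers.names = ["numSources"]   # names registered via the register decorator

    task = MetricsControllerTask(config=config)
    result = task.runDataRefs(datarefs)
    job = result.job                           # lsst.verify.Job with all measurements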
4.2 MetricRegistry
This class shall expose a single instance of lsst.pex.config.Registry.
MetricsControllerConfig will depend on this class to create a valid RegistryField.
It can be easily removed once MetricsControllerTask is retired.
4.2.1 Concrete Members

registry : lsst.pex.config.Registry
    This registry will allow MetricsControllerConfig to handle all MetricTask classes decorated by register. It should not require a custom subclass of lsst.pex.config.Registry, but if the need arose, MetricRegistry could easily be turned into a singleton class.
4.3 register

register(name: str) : callable(MetricTask-type)
    This class decorator shall register the class with MetricRegistry.registry. If MetricRegistry does not exist, it shall have no effect.

    This decorator can be phased out once MetricsControllerTask is retired.
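A minimal sketch of MetricRegistry and register follows; it assumes that registering the MetricTask class itself (which, like any Task, carries a ConfigClass attribute) satisfies lsst.pex.config.Registry, and it omits the graceful no-op when MetricRegistry is absent:

    import lsst.pex.config


    class MetricRegistry:
        """Holder for the single shared registry of MetricTask classes."""

        registry = lsst.pex.config.Registry()


    def register(name):
        """Class decorator that registers a MetricTask under ``name`` so that
        MetricsControllerConfig can select it.
        """
        def decorator(taskClass):
            MetricRegistry.registry.register(name, taskClass)
            return taskClass
        return decorator

A MetricTask implementation would then be decorated with, for example, @register("numSources") to appear as a selectable name in MetricsControllerConfig.measurers.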