Promote Resource / Datum to first-class documents

Status

Discussion

Abstract

Currently, Resource and Datum documents are inserted directly into the AssetRegistry by ophyd. This breaks the document abstraction by making one specific consumer ‘special’.

Detailed description

An odd asymmetry in how databroker works is that the documents for HeaderSource and EventSource are emitted by the RunEngine and can be subscribed to by one or more consumers. Each consumer is notionally independent: each receives all of the documents and does not need to coordinate with the others in any way (or even be aware of their existence). In contrast, the Resource and Datum documents are inserted directly into an AssetRegistry by the ophyd objects. This breaks the separation between the data collection process / hardware, the generation of the documents, and the consumers of those documents, and leads to several unfortunate situations:

  • ophyd objects hold an instance of an AssetRegistry

  • we need to keep track of which AssetRegistry things were inserted into

  • consumers that want access to the asset documents also need a handle to the database that the ophyd objects are inserting into

The proposed solution is to promote Resource and Datum documents to be peer documents with Start, Stop, Descriptor and Event. They will appear in the document stream and be inserted into DataBroker via db.insert. This eliminates the ‘special’ side-band communication and puts all consumers back on the same footing. This will require coordinated changes to event-model, databroker, bluesky, and ophyd.

Implementation

Currently, ophyd is responsible for collecting all of the values for the Resource and Datum documents except for the uids. The uids are generated by calls to reg.register_* and the datum uids are subsequently returned to the RunEngine via obj.read. The proposed change is:

  1. ophyd objects would be responsible for generating the full Resource and Datum documents and providing them to the RunEngine to be emitted. ophyd may provide some helpers to make generating compliant documents easy.

    1. Similar to the current documents, a Resource must be emitted before any Datum that refers to it. A Datum can only refer to a Resource that has been emitted after the most recent Start and before the Stop for that Start.

    2. An identical (including uid) Resource or Datum may be emitted more than once; consumers will need to handle this.

    3. Datum documents must be yielded only in the first call to collect_asset_docs for which their uid is in the output of read.

    4. Resource documents must be yielded only in the first call to collect_asset_docs that includes a Datum referring to them.

    5. Calls to read and collect_asset_docs must be idempotent.

    Allowing identical Resource and Datum documents to be re-emitted supports a single Resource that spans many runs, such as background images, while still ensuring that within the scope of a Start / Stop pair a consumer sees all of the documents it requires.

  2. In save, before the Event document is emitted, the RunEngine will acquire and emit any AssetRegistry documents.

    1. In save, the RunEngine knows which objects are in the bundle and calls the collect_asset_docs method on each of them

      from typing import Any, Dict, Iterator, Tuple

      def collect_asset_docs(self) -> Iterator[Tuple[str, Dict[str, Any]]]:
          ...

      which will yield the (name, doc) pairs for anything that was just read.

    2. these documents will be emitted before the Event

  3. consumers will now have access to all relevant documents and can do whatever they want with them (insert them into an asset registry, process / display them live, copy files elsewhere)

event-model

  1. add schemas for Resource and Datum

  2. enforce that datum_id is of the form {resource_id}/{N}. This is needed to support columnar stores in which the Datum documents are grouped by Resource id.
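A minimal sketch of how the proposed datum_id form could be produced and checked. The helper names are hypothetical, and the proposal does not say what N is; here it is assumed to be a non-negative decimal integer, with resource_id taken to be everything before the last "/".

```python
import re
from typing import Any, Dict, Tuple

# Assumption: N is a non-negative decimal integer and resource_id is
# everything before the last "/".
_DATUM_ID = re.compile(r'(?P<resource>.+)/(?P<n>\d+)')


def make_datum_id(resource_id: str, n: int) -> str:
    """Build a datum_id of the form {resource_id}/{N} (hypothetical helper)."""
    if n < 0:
        raise ValueError(f"N must be non-negative, got {n}")
    return f"{resource_id}/{n}"


def split_datum_id(datum_id: str) -> Tuple[str, int]:
    """Return (resource_id, N); raise ValueError if datum_id is malformed."""
    m = _DATUM_ID.fullmatch(datum_id)
    if m is None:
        raise ValueError(f"{datum_id!r} is not of the form {{resource_id}}/{{N}}")
    return m.group('resource'), int(m.group('n'))


def check_datum(datum: Dict[str, Any]) -> None:
    """Check that a Datum's datum_id agrees with its 'resource' field."""
    resource_id, _ = split_datum_id(datum['datum_id'])
    if resource_id != datum['resource']:
        raise ValueError(
            f"datum_id {datum['datum_id']!r} does not start with "
            f"resource {datum['resource']!r}")
```

Because the Resource id is recoverable from every datum_id, a columnar store can group Datum rows by Resource without looking up each Datum document first.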

databroker

  1. teach insert how to deal with the additional documents.

  2. revert the API changes that made register_* generate the uids.

  3. helper tools for generating Resource and Datum documents (maybe in ophyd?)
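The consumer-side obligations from the Implementation section (tolerating re-emitted documents, and a Datum referring only to a Resource emitted since the most recent Start) might look like this in an insert method. `AssetDocCollector` is a hypothetical in-memory stand-in, not databroker's implementation. It treats a repeated document with the same uid and identical content as a no-op and rejects a reused uid with different content, which is one possible policy among several.

```python
from typing import Any, Dict, Set

Doc = Dict[str, Any]


class AssetDocCollector:
    """Hypothetical consumer that stores Resource and Datum documents."""

    def __init__(self) -> None:
        self.resources: Dict[str, Doc] = {}
        self.datums: Dict[str, Doc] = {}
        self._resources_in_run: Set[str] = set()

    def insert(self, name: str, doc: Doc) -> None:
        if name == 'start':
            # A Datum may only refer to Resources emitted after this Start.
            self._resources_in_run = set()
        elif name == 'resource':
            self._store(self.resources, doc['uid'], doc)
            self._resources_in_run.add(doc['uid'])
        elif name == 'datum':
            if doc['resource'] not in self._resources_in_run:
                raise ValueError(
                    f"Datum {doc['datum_id']!r} refers to a Resource "
                    f"not emitted since the most recent Start")
            self._store(self.datums, doc['datum_id'], doc)
        # Other document types are ignored in this sketch.

    @staticmethod
    def _store(table: Dict[str, Doc], key: str, doc: Doc) -> None:
        existing = table.get(key)
        if existing is None:
            table[key] = dict(doc)
        elif existing != doc:
            raise ValueError(f"conflicting documents share uid {key!r}")
        # Otherwise it is an identical re-emission: nothing to do.
```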

ophyd

  1. implement new document generation methods on all devices that have external data.
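A device following the collect_asset_docs rules from the Implementation section could be structured roughly as below. `FileDetectorSketch` is plain Python, not an ophyd class; its constructor arguments, the `start_new_run` hook, and the placeholder spec name are all hypothetical. The Resource/Datum field names (`spec`, `root`, `resource_path`, `resource_kwargs`, `datum_id`, `resource`, `datum_kwargs`) are assumed here for illustration.

```python
import time
import uuid
from typing import Any, Dict, Iterator, Optional, Tuple

Doc = Dict[str, Any]


class FileDetectorSketch:
    """Hypothetical detector that writes one frame per trigger into one file."""

    def __init__(self, name: str, root: str, resource_path: str) -> None:
        self.name = name
        self._resource: Doc = {
            'uid': str(uuid.uuid4()),
            'spec': 'HYPOTHETICAL_SPEC',  # placeholder spec name
            'root': root,
            'resource_path': resource_path,
            'resource_kwargs': {},
        }
        self._resource_emitted = False
        self._frame = -1
        self._datum: Optional[Doc] = None
        self._datum_emitted = True  # nothing pending before the first trigger
        self._timestamp = 0.0

    def start_new_run(self) -> None:
        """Hypothetical hook: re-emit the same Resource (same uid) in a new run."""
        self._resource_emitted = False

    def trigger(self) -> None:
        """Acquire one frame; the only method that creates a new Datum."""
        self._frame += 1
        self._timestamp = time.time()
        self._datum = {
            'datum_id': f"{self._resource['uid']}/{self._frame}",
            'resource': self._resource['uid'],
            'datum_kwargs': {'frame': self._frame},
        }
        self._datum_emitted = False

    def read(self) -> Dict[str, Dict[str, Any]]:
        """Report the current datum_id; repeated calls return the same result."""
        value = self._datum['datum_id'] if self._datum else None
        return {self.name: {'value': value, 'timestamp': self._timestamp}}

    def collect_asset_docs(self) -> Iterator[Tuple[str, Doc]]:
        """Yield asset documents not yet yielded for the current reading."""
        if self._datum is None or self._datum_emitted:
            return
        if not self._resource_emitted:
            # Rule 4: the Resource travels with the first Datum that refers to it.
            self._resource_emitted = True
            yield 'resource', dict(self._resource)
        # Rule 3: each Datum is yielded once; this sketch assumes read() and
        # collect_asset_docs() are called in the same save cycle.
        self._datum_emitted = True
        yield 'datum', dict(self._datum)
```

Keeping trigger as the only state-changing acquisition step is what lets repeated read calls return the same reading.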

bluesky

  1. implement above logic in RunEngine._save

Backward Compatibility

This will break all of the devices that currently use AssetRegistry; however, it will not change anything on the retrieval side. The constraints on datum_id cannot be applied retroactively, but they can be applied to all future data.

This excludes the option of having IOCs directly insert Resource and Datum documents and expose datum_id values to the EPICS layer. We have only one experimental use of this (the GeRM caproto IOC). This level of flexibility is not worth the resulting non-uniformity at the document level. If we want the IOC to generate all of the values (including the uids), it should expose those values to EPICS, and the ophyd object will only be responsible for marshaling them.

Alternatives

Eliminate Resource and Datum as stand alone documents

An alternative considered was to eliminate the Resource and Datum documents altogether by merging Resource into Descriptor and Datum into Event. However, this would break several long-standing design principles:

  • all values in ev['data'] are unstructured (scalar, strings, arrays)

  • Descriptors are immutable

In addition to breaking the insert side, this would be a major change on the retrieval side and would require either maintaining two implementations forever or migrating all existing data.

This would also require ophyd objects to have a way to notify the RunEngine that their configuration / resource was stale so that the Descriptor cache could be invalidated (this is probably a good idea anyway).

Despite being superficially simpler, the fallout from this alternative would be far greater.