Data Storage and Management#

The flyscanning plans developed for PandA rely on the streaming capabilities of Bluesky enabled in the recent versions of EventModel and make use of StreamResource and StreamDatum documents. The legacy approach for managing collected Bluesky data with databroker does not support these types of documents. Hence, to improve the conveneince of data access, it is recommended that the flyscanning solution be deployed along with the Tiled data management service backed by an SQL catalog. See the official documentation of Tiled for details on how to deploy and configure it.

Once the Tiled service is set up, the writing of flyscanning data collected with Bluesky plans is achieved via a dedicated TiledWriter callback instantiated with a Tiled client and subscribed to the RunEngine. Please ensure that the client has appropriate writing permissions to the desired Tiled catalog. Assuming that the Tiled server is started locally with tiled serve catalog --temp --api-key=secret (or using a more detaild configuration as descibed here), the TiledWriter can be initialized as shown below.

from bluesky import RunEngine
from bluesky.callbacks.tiled_writer import TiledWriter
from tiled.client import from_uri

RE = RunEngine()
tiled_client = from_uri("http://localhost:8000", api_key="secret")
tw = TiledWriter(tiled_client)
RE.subscribe(tw)

tiled_client

Tiled allows the user to work with their data using the familiar array or dataframe API without the need to consider specifics of how the data is stored. Under the hood, however, Tiled distinguishes between two types of data based on their origin. First, the data that originates from the Bluesky Event documents and is transmitted in the stream itself (so called “slow” data: scalar measurements, motor positions, or device configurations) are saved by Tiled itself in its internal writable storage. Second, the “fast” data are typically produced by area detectors at much higher rates and saved directly to the file system; Tiled then uses references to the saved location from the StreamResource documents and registers the file in the catalog of external assets. Typoically, datasets produced by the PandA controller would also be saved externally as HDF5 files and registered similarly.

TiledWriter handles both types of data and places them in subcontainers according the stream name (configured in the Descriptor document). More detail about the layout of the resulting container can be found here.

In this tutorial, to demonstrate these capabilities, we mimic the Bluesky document stream using simulated ophyd devices that write internal and external data.

from ophyd.sim import det
from ophyd.sim import hw
import bluesky.plans as bp
from pathlib import Path

# Collect and write "internal" (Event) data
uid, = RE(bp.count([det], 10))

print(f"Finished collecting data for Bluesky run with uid = {uid}")

tiled_client[uid]

# Read the acquired data as an xarray dataset
tiled_client[uid]['streams/primary'].read()

# Collect and write external data
Path('/tmp/tiled_data').mkdir(exist_ok=True)
uid, = RE(bp.count([hw(save_path='/tmp/tiled_data').img], 10))

print(f"Finished collecting data for Bluesky run with uid = {uid}")

Note: to be able to retrieve the registered external data from Tiled, please ensure that the server has read access to the storgae location, e.g. start it using

tiled serve catalog --temp --api-key=secret -r /tmp/tiled_data

tiled_client[uid]['streams/primary'].read()