Navigate Metadata in a Run

In this tutorial we will access secondary measurements and metadata including:

  • Hardware configuration readings (e.g. exposure time)

  • User-provided context like sample information

  • Whether the Run completed with an error (and if so what error)

  • Hardware-level timestamps for each measurement

Set up for Tutorial

Before you begin, install databroker and databroker-pack, following the Installation Tutorial.

Start your favorite interactive Python environment, such as ipython or jupyter lab.

For this tutorial, we’ll use a catalog of publicly available, openly licensed sample data. Specifically, it is high-quality transmission XAS data from across the periodic table.

This utility downloads it and makes it discoverable to Databroker.

In [1]: import databroker.tutorial_utils

In [2]: databroker.tutorial_utils.fetch_BMM_example()
Out[2]: bluesky-tutorial-BMM:
  args:
    name: bluesky-tutorial-BMM
    paths:
    - /home/runner/.local/share/bluesky_tutorial_data/bluesky-tutorial-BMM/documents/*.msgpack
    root_map: {}
  description: ''
  driver: databroker._drivers.msgpack.BlueskyMsgpackCatalog
  metadata:
    catalog_dir: /home/runner/.local/share/intake/
    generated_by:
      library: databroker_pack
      version: 0.3.0
    relative_paths:
    - ./documents/*.msgpack

Access the catalog and assign it to a variable for convenience.

In [3]: import databroker

In [4]: catalog = databroker.catalog['bluesky-tutorial-BMM']

Let’s take a Run from this Catalog.

In [5]: run = catalog[23463]

(Hardware) Configuration

The Run may include configuration readings necessary for interpreting the data. These are typically things that change slowly or not at all during the Run, such as detector exposure time, detector gain settings, or the configured maximum motor velocity.

First, let’s look at the I0 readings in the primary stream. What are the configuration readings that might be necessary to interpret this data or compare it with other data?

In [6]: da = run.primary.read()["I0"]

In [7]: da.head()
Out[7]: 
<xarray.DataArray 'I0' (time: 5)>
array([135.16644077, 134.97916279, 134.78557224, 134.63573771,
       134.70294218])
Coordinates:
  * time     (time) float64 1.584e+09 1.584e+09 1.584e+09 1.584e+09 1.584e+09
Attributes:
    object:   quadem1

The Attributes section at the bottom of that summary

Attributes:
    object:   quadem1

shows that I0 was measured by the device quadem1. We can also access it programmatically:

In [8]: da.attrs.get("object")
Out[8]: 'quadem1'
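
The object attribute is present on every column, so we can, for example, build a map from each column in this stream to the device that produced it. This is a quick sketch, not part of the original tutorial:

# Map each column in the primary stream to its source device.
ds = run.primary.read()
devices = {name: arr.attrs.get("object") for name, arr in ds.data_vars.items()}
# e.g. {'I0': 'quadem1', 'It': 'quadem1', 'dcm_energy': 'dcm_energy', ...}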

We can then look up all the configuration readings associated with quadem1 in this stream.

In [9]: run.primary.config["quadem1"].read()
Out[9]: 
<xarray.Dataset>
Dimensions:                   (time: 411)
Coordinates:
  * time                      (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
Data variables:
    quadem1_integration_time  (time) float64 0.0004 0.0004 ... 0.0004 0.0004
    quadem1_averaging_time    (time) float64 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
    quadem1_em_range          (time) <U6 '350 pC' '350 pC' ... '350 pC' '350 pC'
    quadem1_values_per_read   (time) int64 5 5 5 5 5 5 5 5 5 ... 5 5 5 5 5 5 5 5
    quadem1_num_averaged      (time) int64 250 250 250 250 ... 250 250 250 250

If another Run ran the quadem1 detector with a different integration time, we could use this information to normalize the readings and compare them accurately.

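For instance, here is a minimal sketch of what such a normalization could look like. This is not part of the original tutorial, and it assumes that dividing by the integration time is the physically appropriate correction for this detector:

# Scale the I0 readings by the integration time recorded alongside them,
# so Runs acquired with different settings can be compared.
i0 = run.primary.read()["I0"]
config = run.primary.config["quadem1"].read()
# xarray aligns the two objects on their shared 'time' coordinate.
i0_per_second = i0 / config["quadem1_integration_time"]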

Let’s look at some other readings in this stream. The It readings also come from quadem1, so those same configuration readings apply.

In [10]: run.primary.read()["It"].attrs
Out[10]: {'object': 'quadem1'}

The dcm_energy readings, on the other hand, come from a different device, which happens to also be named dcm_energy.

In [11]: run.primary.read()["dcm_energy"].attrs
Out[11]: {'object': 'dcm_energy'}

We can see that no configuration was recorded for that device.

In [12]: run.primary.config["dcm_energy"].read()
Out[12]: 
<xarray.Dataset>
Dimensions:  ()
Data variables:
    *empty*
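
If downstream code expects configuration values that may be absent, it is worth guarding against this empty case. A small sketch:

# An xarray Dataset exposes its columns via .data_vars, so an empty
# configuration Dataset can be detected before use.
config = run.primary.config["dcm_energy"].read()
if not config.data_vars:
    print("No configuration recorded for dcm_energy in this stream")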

How It Started

There are many useful pieces of metadata that we know at the start, before we begin acquiring data or running data processing/analysis. This includes what we intend to do (i.e. which scan type or which data processing routine), who is doing it, and any additional context like sample information.

The only fields guaranteed by Databroker to be present are uid (a globally unique identifier for the Run) and time (when it started), but there is often a great deal more.

>>> run.metadata["start"]
Start({
'XDI': {'Beamline': {'collimation': 'paraboloid mirror, 5 nm Rh on 30 nm Pt',
                   'focusing': 'not in use',
                   'harmonic_rejection': 'flat mirror, Pt stripe, pitch = '
                                         '7.0 mrad relative to beam',
                   'name': 'BMM (06BM) -- Beamline for Materials '
                           'Measurement',
                   'xray_source': 'NSLS-II three-pole wiggler'},
      'Column': {},
      'Detector': {'I0': '10 cm N2', 'Ir': '25 cm N2', 'It': '25 cm N2'},
      'Element': {'edge': 'K', 'symbol': 'Ni'},
      'Facility': {'GUP': 305832,
                   'SAF': 305669,
                   'current': '399.6',
                   'cycle': '2020-1',
                   'energy': '3.0',
                   'mode': 'top-off',
                   'name': 'NSLS-II'},
      'Mono': {'angle_offset': 16.058109,
               'd_spacing': '3.1353241',
               'direction': 'forward',
               'encoder_resolution': 5e-06,
               'name': 'Si(111)',
               'scan_mode': 'fixed exit',
               'scan_type': 'step'},
      'Sample': {'name': 'Ni', 'prep': 'Ni foil in ref'},
      'Scan': {'edge_energy': 8332.800000000001,
               'experimenters': 'Neil Hyatt, Martin Stennett, Dan Austin, '
                                'Seb Lawson'},

 ...  # snipped for brevity
 }
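
Because only uid and time are guaranteed, it is safest to treat everything else as optional, for example with dict-style .get() access. A short sketch, using field names from the example above:

start = run.metadata["start"]
print(start["uid"], start["time"])  # guaranteed to be present
# Experiment-specific context may or may not have been recorded.
sample = start.get("XDI", {}).get("Sample", {})
print(sample.get("name", "unknown sample"))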

How It Ended

There are other things we can only know at the stop (end) of an experiment, including when and how it finished and how many events (rows) of data were collected in each stream.

In [13]: run.metadata["stop"]
Out[13]: 
Stop({'exit_status': 'success',
 'num_events': {'baseline': 2, 'primary': 411},
 'run_start': '4393404b-8986-4c75-9a64-d7f6949a9344',
 'time': 1583577680.5466747,
 'uid': '1548ec1e-1a01-4df2-9a2b-e17927971c58'})

We can use this to print the unique IDs of any experiments that failed

In [14]: for uid, run in catalog.items():
   ....:     if run.metadata["stop"]["exit_status"] != "success":
   ....:         print(f"Run {uid} failed!")
   ....: 

or, getting a bit fancier, to tally the number of failures.

In [15]: from collections import Counter

In [16]: counter = Counter()

In [17]: for _, run in catalog.items():
   ....:     counter.update({run.metadata["stop"]["exit_status"]: 1})
   ....: 

In [18]: counter
Out[18]: Counter({'success': 123})

TO DO: Obtain an example catalog that has some failures in it so that this example is not so trivial.
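
One caveat: a Run that was interrupted uncleanly may have no Stop document at all, in which case run.metadata["stop"] can be None rather than a dict, and the loops above would raise a TypeError. A more defensive sketch:

for uid, run in catalog.items():
    stop = run.metadata["stop"]
    if stop is None:
        print(f"Run {uid} has no Stop document (interrupted?)")
    elif stop["exit_status"] != "success":
        print(f"Run {uid} ended with exit_status {stop['exit_status']!r}")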

Low-level Hardware Timestamps

Note

Any precise timing measurements should be recorded in the data itself, not taken from this supplemental hardware timestamp metadata. These timestamps should generally be considered good to roughly 0.1 second for alignment purposes.

Control systems provide us with individual timestamps for every reading. They are intended for debugging and troubleshooting, and should generally not be used for data analysis; any timing readings needed for analysis should be recorded as data, as a column in some stream.

The timestamps associated with the readings in run.primary.read() are available as

In [19]: run.primary.timestamps.read()
Out[19]: 
<xarray.Dataset>
Dimensions:                   (time: 411)
Coordinates:
  * time                      (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
Data variables:
    I0                        (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
    It                        (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
    Ir                        (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
    dwti_dwell_time           (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
    dwti_dwell_time_setpoint  (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
    dcm_energy                (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
    dcm_energy_setpoint       (time) float64 1.584e+09 1.584e+09 ... 1.584e+09

Configuration readings also come with timestamps. The timestamps associated with the configuration readings in run.primary.config["quadem1"].read() are available as

In [20]: run.primary.config_timestamps["quadem1"].read()
Out[20]: 
<xarray.Dataset>
Dimensions:                   (time: 411)
Coordinates:
  * time                      (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
Data variables:
    quadem1_integration_time  (time) float64 1.578e+09 1.578e+09 ... 1.578e+09
    quadem1_averaging_time    (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
    quadem1_em_range          (time) float64 1.578e+09 1.578e+09 ... 1.578e+09
    quadem1_values_per_read   (time) float64 1.578e+09 1.578e+09 ... 1.578e+09
    quadem1_num_averaged      (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
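
Notice that some of these timestamps (about 1.578e9) predate the Run itself (about 1.584e9). That is presumably because those settings, such as quadem1_em_range, were last changed well before this Run started; the control system reports the time of the most recent update.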