Navigate Metadata in a Run¶
In this tutorial we will access secondary measurements and metadata including:
Hardware configuration readings (e.g. exposure time)
User-provided context like sample information
Whether the Run completed with an error (and if so what error)
Hardware-level timestamps for each measurement
Set up for Tutorial¶
Before you begin, install databroker and databroker-pack, following the Installation Tutorial.
Start your favorite interactive Python environment, such as ipython or jupyter lab.
For this tutorial, we’ll use a catalog of publicly available, openly licensed sample data. Specifically, it is high-quality transmission XAS data from all over the periodic table.
This utility downloads it and makes it discoverable to Databroker.
In [1]: import databroker.tutorial_utils
In [2]: databroker.tutorial_utils.fetch_BMM_example()
Out[2]: bluesky-tutorial-BMM:
name: bluesky-tutorial-BMM
- /home/runner/.local/share/bluesky_tutorial_data/bluesky-tutorial-BMM/documents/*.msgpack
root_map: {}
description: ''
driver: databroker._drivers.msgpack.BlueskyMsgpackCatalog
catalog_dir: /home/runner/.local/share/intake/
library: databroker_pack
version: 0.3.0
- ./documents/*.msgpack
Access the catalog and assign it to a variable for convenience.
In [3]: import databroker
In [4]: catalog = databroker.catalog['bluesky-tutorial-BMM']
Let’s take a Run from this Catalog.
In [5]: run = catalog[23463]
(Hardware) Configuration¶
The Run may include configurational readings necessary for interpreting the data. These are typically things that change slowly or not at all during the Run, like detector exposure time, detector gain settings, or the configured maximum motor velocity.
First, let’s look at the I0 readings in the primary stream. What are the configuration readings that might be necessary to interpret this data or compare it with other data?
In [6]: ds = run.primary.read(); da = ds["I0"]
In [7]: da.head()
<xarray.DataArray 'I0' (time: 5)>
array([135.16644077, 134.97916279, 134.78557224, 134.63573771,
* time (time) float64 1.584e+09 1.584e+09 1.584e+09 1.584e+09 1.584e+09
object: quadem1
The section at the bottom of that summary, object: quadem1, shows that I0 was measured by the device quadem1. We can also access that programmatically:
In [8]: da.attrs.get("object")
Out[8]: 'quadem1'
We can then look up all the configuration readings associated with quadem1 in this stream.
In [9]: run.primary.config["quadem1"].read()
Dimensions: (time: 411)
* time (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
Data variables:
quadem1_integration_time (time) float64 0.0004 0.0004 ... 0.0004 0.0004
quadem1_averaging_time (time) float64 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
quadem1_em_range (time) <U6 '350 pC' '350 pC' ... '350 pC' '350 pC'
quadem1_values_per_read (time) int64 5 5 5 5 5 5 5 5 5 ... 5 5 5 5 5 5 5 5
quadem1_num_averaged (time) int64 250 250 250 250 ... 250 250 250 250
If another Run used the quadem1 detector with a different integration time, we could use this information to normalize the readings and compare them.
TO DO: Get an example of that.
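As a rough illustration of the normalization idea, here is a minimal sketch. The numbers are made up, the helper normalize_by_integration_time is hypothetical (it is not part of Databroker), and it assumes the signal scales linearly with integration time, which may not hold for every detector.

```python
def normalize_by_integration_time(readings, integration_time):
    """Scale raw readings to a per-second rate.

    Assumes, for illustration only, that the signal grows linearly
    with integration time.
    """
    if integration_time <= 0:
        raise ValueError("integration_time must be positive")
    return [value / integration_time for value in readings]


# Made-up readings from two hypothetical Runs with different settings.
run_a = normalize_by_integration_time([2.0, 4.0], integration_time=0.5)
run_b = normalize_by_integration_time([1.0, 2.0], integration_time=0.25)

# After normalization the two Runs are on a common scale.
print(run_a)
print(run_b)
```

With real data, the integration time would come from the configuration readings rather than being typed in by hand.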
Let’s look at some other readings in the dataset. The It reading also comes from quadem1, so those same configuration readings apply.
In [10]: ds["It"].attrs
Out[10]: {'object': 'quadem1'}
The dcm_energy readings, on the other hand, come from a different device, which happens to also be named dcm_energy.
In [11]: ds["dcm_energy"].attrs
Out[11]: {'object': 'dcm_energy'}
We can see that no configuration was recorded for that device.
In [12]: run.primary.config["dcm_energy"].read()
Dimensions: ()
Data variables:
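To see at a glance which columns share a device (and therefore share configuration readings), the object attributes can be grouped. This is a minimal sketch that uses a plain dictionary as a stand-in for the attrs shown above; columns_by_device is a hypothetical helper, not a Databroker function.

```python
from collections import defaultdict

# Stand-in for {name: ds[name].attrs}, using the attrs shown above.
column_attrs = {
    "I0": {"object": "quadem1"},
    "It": {"object": "quadem1"},
    "dcm_energy": {"object": "dcm_energy"},
}


def columns_by_device(attrs_by_column):
    """Map each device name to the sorted list of columns it produced."""
    grouped = defaultdict(list)
    for column, attrs in attrs_by_column.items():
        # Columns without an "object" attribute are filed under None.
        grouped[attrs.get("object")].append(column)
    return {device: sorted(columns) for device, columns in grouped.items()}


devices = columns_by_device(column_attrs)
print(devices)
```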
How It Started¶
There are many useful pieces of metadata that we know at the start, before we begin acquiring data or running data processing/analysis. This includes what we intend to do (i.e. which scan type or which data processing routine), who is doing it, and any additional context like sample information.
The only fields guaranteed by Databroker to be present are uid (a globally unique identifier for the Run) and time (when it started), but there is often a great deal more.
>>> run.metadata["start"]
'XDI': {'Beamline': {'collimation': 'paraboloid mirror, 5 nm Rh on 30 nm Pt',
'focusing': 'not in use',
'harmonic_rejection': 'flat mirror, Pt stripe, pitch = '
'7.0 mrad relative to beam',
'name': 'BMM (06BM) -- Beamline for Materials '
'xray_source': 'NSLS-II three-pole wiggler'},
'Column': {},
'Detector': {'I0': '10 cm N2', 'Ir': '25 cm N2', 'It': '25 cm N2'},
'Element': {'edge': 'K', 'symbol': 'Ni'},
'Facility': {'GUP': 305832,
'SAF': 305669,
'current': '399.6',
'cycle': '2020-1',
'energy': '3.0',
'mode': 'top-off',
'name': 'NSLS-II'},
'Mono': {'angle_offset': 16.058109,
'd_spacing': '3.1353241',
'direction': 'forward',
'encoder_resolution': 5e-06,
'name': 'Si(111)',
'scan_mode': 'fixed exit',
'scan_type': 'step'},
'Sample': {'name': 'Ni', 'prep': 'Ni foil in ref'},
'Scan': {'edge_energy': 8332.800000000001,
'experimenters': 'Neil Hyatt, Martin Stennett, Dan Austin, '
'Seb Lawson'},
... # snipped for brevity
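Because only uid and time are guaranteed, code that digs into nested fields like these may want to tolerate missing keys. The following is a minimal sketch: get_nested is a hypothetical helper, and the start dictionary is a small hand-copied excerpt of the output above rather than a real start document.

```python
from collections.abc import Mapping


def get_nested(document, *keys, default=None):
    """Walk nested mappings, returning default if any key is absent."""
    current = document
    for key in keys:
        if not isinstance(current, Mapping) or key not in current:
            return default
        current = current[key]
    return current


# Hand-copied excerpt of the metadata shown above.
start = {
    "XDI": {
        "Element": {"edge": "K", "symbol": "Ni"},
        "Sample": {"name": "Ni", "prep": "Ni foil in ref"},
    },
}

symbol = get_nested(start, "XDI", "Element", "symbol")
# "temperature" is a made-up key used to show the fallback.
temperature = get_nested(start, "XDI", "Sample", "temperature", default="not recorded")
print(symbol, temperature)
```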
How It Ended¶
There are other things we can only know at the stop (end) of an experiment, including when and how it finished and how many events (rows) of data were collected in each stream.
In [13]: run.metadata["stop"]
Stop({'exit_status': 'success',
'num_events': {'baseline': 2, 'primary': 411},
'run_start': '4393404b-8986-4c75-9a64-d7f6949a9344',
'time': 1583577680.5466747,
'uid': '1548ec1e-1a01-4df2-9a2b-e17927971c58'})
We can use this to print the unique IDs of any experiments that failed:
In [14]: for uid, run in catalog.items():
....: if run.metadata["stop"]["exit_status"] != "success":
....: print(f"Run {uid} failed!")
or, getting a bit fancier, to tally the number of failures.
In [15]: from collections import Counter
In [16]: counter = Counter()
In [17]: for _, run in catalog.items():
....: counter.update({run.metadata["stop"]["exit_status"]: 1})
In [18]: counter
Out[18]: Counter({'success': 123})
TO DO: Obtain an example catalog that has some failures in it so that this example is not so trivial.
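The loops above assume every Run has a stop document. As a minimal sketch of a more defensive tally, the example below counts exit statuses over a list of made-up stop documents and files any missing one under "unknown". Two things here are assumptions: that a Run which never finished cleanly reports None in place of its stop document, and the helper name tally_exit_status, which is hypothetical.

```python
from collections import Counter


def tally_exit_status(stop_documents):
    """Count exit statuses, treating a missing stop document as 'unknown'."""
    counter = Counter()
    for stop in stop_documents:
        if stop is None:
            # Assumption: no stop document was recorded for this Run.
            status = "unknown"
        else:
            status = stop.get("exit_status", "unknown")
        counter[status] += 1
    return counter


# Made-up stop documents standing in for run.metadata["stop"] values.
stops = [
    {"exit_status": "success"},
    {"exit_status": "fail"},
    {"exit_status": "abort"},
    {"exit_status": "success"},
    None,
]

tally = tally_exit_status(stops)
print(tally)
```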
Low-level Hardware Timestamps¶
Any precise timing measurements should be in the data itself, not in this supplemental hardware timestamp metadata. These timestamps should generally be considered good for alignment to about 0.1 second precision.
Control systems provide us with individual timestamps for every reading. These should generally not be used for data analysis; any timing readings necessary for analysis should be recorded as data, as a column in some stream. The hardware timestamps are intended for debugging and troubleshooting.
The timestamps associated with the readings in the primary stream are available as
In [19]:
Dimensions: (time: 411)
* time (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
Data variables:
I0 (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
It (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
Ir (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
dwti_dwell_time (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
dwti_dwell_time_setpoint (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
dcm_energy (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
dcm_energy_setpoint (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
Configuration readings also come with timestamps. The timestamps associated with the configuration readings in run.primary.config["quadem1"].read() are available as
In [20]: run.primary.config_timestamps["quadem1"].read()
Dimensions: (time: 411)
* time (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
Data variables:
quadem1_integration_time (time) float64 1.578e+09 1.578e+09 ... 1.578e+09
quadem1_averaging_time (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
quadem1_em_range (time) float64 1.578e+09 1.578e+09 ... 1.578e+09
quadem1_values_per_read (time) float64 1.578e+09 1.578e+09 ... 1.578e+09
quadem1_num_averaged (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
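The float64 values in these outputs are seconds since the UNIX epoch, which is hard to read at a glance. As a minimal sketch for debugging, the standard library can convert one into a timezone-aware datetime. The value used here is the time field from the stop document shown earlier; to_utc is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_utc(epoch_seconds):
    """Convert seconds since the UNIX epoch to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# The "time" value from the stop document shown earlier.
stop_time = to_utc(1583577680.5466747)
print(stop_time.isoformat())
```

Converting to UTC explicitly avoids surprises when the analysis machine is in a different timezone than the beamline.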