Navigate Metadata in a Run
==========================

In this tutorial we will access secondary measurements and metadata including:

* Hardware configuration readings (e.g. exposure time)
* User-provided context like sample information
* Whether the Run completed with an error (and if so what error)
* Hardware-level timestamps for each measurement

Set up for Tutorial
-------------------

Before you begin, install ``databroker`` and ``databroker-pack``, following the
:doc:`install`.

Start your favorite interactive Python environment, such as ``ipython`` or
``jupyter lab``.

For this tutorial, we'll use a catalog of publicly available, openly licensed
sample data. Specifically, it is high-quality transmission XAS data from all
over the periodic table.

This utility downloads it and makes it discoverable to Databroker.

.. ipython:: python

   import databroker.tutorial_utils
   databroker.tutorial_utils.fetch_BMM_example()

Access the catalog and assign it to a variable for convenience.

.. ipython:: python

   import databroker
   catalog = databroker.catalog['bluesky-tutorial-BMM']

Let's take a Run from this Catalog.

.. ipython:: python

   run = catalog[23463]

(Hardware) Configuration
------------------------

The Run may include configurational readings necessary for interpreting the
data. These are typically things that change slowly or not at all during the
Run, like detector exposure time, detector gain settings, or the configured
maximum motor velocity.

First, let's look at the ``I0`` readings in the ``primary`` stream. What are
the configuration readings that might be necessary to interpret this data or
compare it with other data?

.. ipython:: python

   ds = run.primary.read()
   da = ds["I0"]
   da.head()

This section at the bottom of that summary

.. code::

   Attributes:
       object:   quadem1

shows that ``I0`` was measured by the device ``quadem1``. We can also access
that programmatically:

.. ipython:: python

   da.attrs.get("object")

We can then look up all the configuration readings associated with ``quadem1``
in this stream.

.. ipython:: python

   run.primary.config["quadem1"].read()

If another Run ran the ``quadem1`` detector with a *different* integration
time, we could use this information to normalize the readings and compare them
accurately.
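
As a rough sketch of that normalization, using made-up numbers rather than
values from this catalog, and assuming the signal scales linearly with
integration time (whether it does depends on how the detector reports its
readings):

.. code:: python

   # Hypothetical readings and integration times; none of these numbers
   # come from the example catalog.
   def normalize(readings, integration_time):
       """Convert integrated readings to a per-second rate."""
       if integration_time <= 0:
           raise ValueError("integration_time must be positive")
       return [value / integration_time for value in readings]

   run_a = normalize([10.0, 20.0, 30.0], integration_time=0.5)
   run_b = normalize([20.0, 40.0, 60.0], integration_time=1.0)
   # Both lists are now in the same units (signal per second) and can be
   # compared directly.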

TO DO: Get an example of that.

Let's look at some other readings in the dataset. The ``It`` readings also
come from ``quadem1``, so those same configuration readings apply.

.. ipython:: python

   ds["It"].attrs

The ``dcm_energy`` readings, on the other hand, come from a different device,
which happens to also be named ``dcm_energy``.

.. ipython:: python

   ds["dcm_energy"].attrs

We can see that no configuration was recorded for that device.

.. ipython:: python

   run.primary.config["dcm_energy"].read()
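
Rather than inspecting columns one at a time, we could group every column by
the device that produced it. The helper below is a minimal sketch, not part of
Databroker; ``columns_by_device`` is a hypothetical name, and it assumes only
that the dataset exposes ``data_vars`` and that each column carries an
``object`` attribute like the ones shown above.

.. code:: python

   from collections import defaultdict

   def columns_by_device(dataset):
       """Group column names by the device named in their ``object`` attribute."""
       groups = defaultdict(list)
       for name, column in dataset.data_vars.items():
           # Columns without an "object" attribute are grouped under None.
           groups[column.attrs.get("object")].append(name)
       return dict(groups)

Calling ``columns_by_device(run.primary.read())`` would then give a dictionary
whose keys can be looked up in ``run.primary.config``.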

How It Started
--------------

There are many useful pieces of metadata that we know at the **start**, before
we begin acquiring data or running data processing/analysis. This includes what
we intend to do (i.e. which scan type or which data processing routine), who is
doing it, and any additional context like sample information.

The only fields *guaranteed* by Databroker to be present are ``uid`` (a
globally unique identifier for the Run) and ``time`` (when it started) but
there is often a great deal more.

.. code:: python

   >>> run.metadata["start"]
   Start({
   'XDI': {'Beamline': {'collimation': 'paraboloid mirror, 5 nm Rh on 30 nm Pt',
                      'focusing': 'not in use',
                      'harmonic_rejection': 'flat mirror, Pt stripe, pitch = '
                                            '7.0 mrad relative to beam',
                      'name': 'BMM (06BM) -- Beamline for Materials '
                              'Measurement',
                      'xray_source': 'NSLS-II three-pole wiggler'},
         'Column': {},
         'Detector': {'I0': '10 cm N2', 'Ir': '25 cm N2', 'It': '25 cm N2'},
         'Element': {'edge': 'K', 'symbol': 'Ni'},
         'Facility': {'GUP': 305832,
                      'SAF': 305669,
                      'current': '399.6',
                      'cycle': '2020-1',
                      'energy': '3.0',
                      'mode': 'top-off',
                      'name': 'NSLS-II'},
         'Mono': {'angle_offset': 16.058109,
                  'd_spacing': '3.1353241',
                  'direction': 'forward',
                  'encoder_resolution': 5e-06,
                  'name': 'Si(111)',
                  'scan_mode': 'fixed exit',
                  'scan_type': 'step'},
         'Sample': {'name': 'Ni', 'prep': 'Ni foil in ref'},
         'Scan': {'edge_energy': 8332.800000000001,
                  'experimenters': 'Neil Hyatt, Martin Stennett, Dan Austin, '
                                   'Seb Lawson'},

    ...  # snipped for brevity
    }
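
Because only ``uid`` and ``time`` are guaranteed, code that reads any other
field should tolerate missing keys. A minimal sketch, assuming the nested
``XDI`` layout shown above (other facilities will organize their metadata
differently, and ``describe_start`` is a hypothetical helper):

.. code:: python

   def describe_start(start):
       """Summarize a Run Start document, tolerating missing optional keys."""
       # ``uid`` and ``time`` are always present, so index them directly.
       summary = {"uid": start["uid"], "time": start["time"]}
       # Everything else is optional. The "XDI" layout mirrors the example
       # above and is an assumption about this particular catalog.
       xdi = start.get("XDI", {})
       summary["sample"] = xdi.get("Sample", {}).get("name", "unknown")
       summary["element"] = xdi.get("Element", {}).get("symbol", "unknown")
       return summary

For the Run above, ``describe_start(run.metadata["start"])`` would report the
sample and element alongside the guaranteed fields.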

How It Ended
------------

There are other things we can only know at the **stop** (end) of an experiment,
including when and how it finished and how many events (rows) of data were
collected in each stream.

.. ipython:: python

   run.metadata["stop"]

We can use this to print the unique IDs of any experiments that failed

.. ipython:: python

   # Use a separate loop variable so that ``run`` keeps pointing at our Run.
   for uid, other_run in catalog.items():
       if other_run.metadata["stop"]["exit_status"] != "success":
           print(f"Run {uid} failed!")

or, getting a bit fancier, to tally the Runs by exit status.

.. ipython:: python

   from collections import Counter

   counter = Counter()
   # Use a separate loop variable so that ``run`` keeps pointing at our Run.
   for _, other_run in catalog.items():
       counter[other_run.metadata["stop"]["exit_status"]] += 1
   counter

TO DO: Obtain an example catalog that has some failures in it so that this
example is not so trivial.
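
The stop metadata can also tell us why a Run failed and how long it took. The
helper below is a sketch, not part of Databroker (``summarize_stop`` is a
hypothetical name). It assumes that ``reason`` and ``num_events`` may be
absent, and that an interrupted Run may have no stop metadata at all.

.. code:: python

   def summarize_stop(start, stop):
       """Report how a Run ended and how long it took."""
       if stop is None:
           # Assumption: a Run that was interrupted has no stop metadata.
           return {"exit_status": "unknown", "reason": "", "duration_s": None,
                   "num_events": {}}
       return {
           "exit_status": stop["exit_status"],
           # ``reason`` is optional; when present it describes the error.
           "reason": stop.get("reason", ""),
           # Both times are in seconds since the UNIX epoch.
           "duration_s": stop["time"] - start["time"],
           "num_events": dict(stop.get("num_events", {})),
       }

It could be called as
``summarize_stop(run.metadata["start"], run.metadata["stop"])``.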

Low-level Hardware Timestamps
-----------------------------

.. note::

   Any *precise* timing measurements should be in the data itself, not in this
   supplemental hardware timestamp metadata. Treat the hardware timestamps as
   good for aligning readings to within roughly 0.1 second.

Control systems provide us with an individual timestamp for every reading.
These hardware timestamps should generally *not* be used for data analysis; any
timing readings necessary for analysis should be recorded as data, as a column
in some stream. The hardware timestamps are intended for debugging and
troubleshooting.

The timestamps associated with the readings in ``run.primary.read()`` are
available as

.. ipython:: python

   run.primary.timestamps.read()

Configuration readings also come with timestamps. The timestamps associated
with the configuration readings in ``run.primary.config["quadem1"].read()`` are
available as

.. ipython:: python

   run.primary.config_timestamps["quadem1"].read()
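
When troubleshooting, it can help to view the timestamps as dates or as gaps
between consecutive readings. The sketch below assumes the timestamps are
floats in seconds since the UNIX epoch; ``to_datetimes`` and ``gaps`` are
hypothetical helpers, not part of Databroker.

.. code:: python

   from datetime import datetime, timezone

   def to_datetimes(timestamps):
       """Convert epoch-second floats to timezone-aware UTC datetimes."""
       return [datetime.fromtimestamp(ts, tz=timezone.utc) for ts in timestamps]

   def gaps(timestamps):
       """Seconds elapsed between each pair of consecutive readings."""
       return [later - earlier
               for earlier, later in zip(timestamps, timestamps[1:])]

Either helper accepts the values of a single column, for example the ``I0``
timestamps converted to a list. An unusually large gap points to a reading that
took longer than expected.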