Find Runs in a Catalog

In this tutorial we will:

  • Look up a specific Run by some identifier.

  • Look up a specific Run based on recency (e.g. “Show me the data I just took”).

  • Search for Runs using both simple and complex search queries.

Set up for Tutorial

Before you begin, install databroker and databroker-pack, following the Installation Tutorial.

Start your favorite interactive Python environment, such as ipython or jupyter lab.

For this tutorial, we’ll use a catalog of publicly available, openly licensed sample data. Specifically, it is high-quality transmission XAS data from all over the periodic table.

The following utility downloads the data and makes it discoverable to Databroker.

In [1]: import databroker.tutorial_utils

In [2]: databroker.tutorial_utils.fetch_BMM_example()
Out[2]: bluesky-tutorial-BMM:
  args:
    name: bluesky-tutorial-BMM
    paths:
    - /home/runner/.local/share/bluesky_tutorial_data/bluesky-tutorial-BMM/documents/*.msgpack
    root_map: {}
  description: ''
  driver: databroker._drivers.msgpack.BlueskyMsgpackCatalog
  metadata:
    catalog_dir: /home/runner/.local/share/intake/
    generated_by:
      library: databroker_pack
      version: 0.3.0
    relative_paths:
    - ./documents/*.msgpack

Access the catalog and assign it to a variable for convenience.

In [3]: import databroker

In [4]: catalog = databroker.catalog['bluesky-tutorial-BMM']

Look-up

In this section we will look up a Run by its

  • Globally unique identifier — unmemorable, but great for scripts

  • Counting-number “scan ID” — easier to remember, but not necessarily unique

  • Recency — e.g. “the data I just took”

If you know exactly which Run you are looking for, the surest way to get it is to look it up by its globally unique identifier, its “uid”. This is the recommended way to look up Runs in scripts, but it is not especially fluid for interactive use.

In [5]: catalog['c07e765b-ce5c-4c75-a16e-06f66546c1d4']
Out[5]: 
BlueskyRun
  uid='c07e765b-ce5c-4c75-a16e-06f66546c1d4'
  exit_status='success'
  2020-03-07 10:13:25.108 -- 2020-03-07 10:24:58.551
  Streams:
    * baseline
    * primary

The uid may be abbreviated. The first 7 or 8 characters are usually sufficient to uniquely identify an entry.

In [6]: catalog['c07e765']
Out[6]: 
BlueskyRun
  uid='c07e765b-ce5c-4c75-a16e-06f66546c1d4'
  exit_status='success'
  2020-03-07 10:13:25.108 -- 2020-03-07 10:24:58.551
  Streams:
    * baseline
    * primary

If the abbreviated uid is ambiguous—if it matches more than one Run—a ValueError is raised listing the matches. Try catalog['a'], which will match two Runs in this Catalog and raise that error.
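
The matching rule can be pictured with a small, self-contained sketch. The helper resolve_partial_uid is hypothetical and is not Databroker's own implementation. The last uid in the list is made up so that an ambiguous prefix exists, and raising KeyError when nothing matches is an assumption of this sketch.

```python
def resolve_partial_uid(uids, prefix):
    """Return the one uid that starts with ``prefix``.

    Hypothetical helper: it mimics the behaviour described in the text
    and is not Databroker's own implementation.
    """
    matches = [uid for uid in uids if uid.startswith(prefix)]
    if not matches:
        # Assumption of this sketch: no match is reported as a KeyError.
        raise KeyError(prefix)
    if len(matches) > 1:
        # More than one candidate: refuse to guess and list the matches.
        raise ValueError(
            f"Partial uid {prefix!r} is ambiguous; it matches: {matches}"
        )
    return matches[0]


uids = [
    "c07e765b-ce5c-4c75-a16e-06f66546c1d4",
    "4393404b-8986-4c75-9a64-d7f6949a9344",
    "12a63104-f8e1-4491-9f3e-e03a30575e33",
    "c0ffee00-0000-4000-8000-000000000000",  # made up for this example
]

# A 7-character prefix is enough to single out one uid here.
print(resolve_partial_uid(uids, "c07e765"))

# A 2-character prefix matches two uids, so the lookup fails loudly.
try:
    resolve_partial_uid(uids, "c0")
except ValueError as err:
    print(err)
```

Failing on ambiguity, rather than silently picking one match, is what makes abbreviated uids reasonably safe for interactive use.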

Runs typically also have a counting number identifier, dubbed scan_id. This is easier to remember. Keep in mind that scan_id is not necessarily unique, and Databroker will always give you the most recent match. Some users are in the habit of resetting scan_id to 1 at the beginning of a new experiment or operating cycle. This is why lookup based on the globally unique identifier is safest for scripts and Jupyter notebooks, especially long-lived ones.

In [7]: catalog[23463]
Out[7]: 
BlueskyRun
  uid='4393404b-8986-4c75-9a64-d7f6949a9344'
  exit_status='success'
  2020-03-07 10:29:49.483 -- 2020-03-07 10:41:20.546
  Streams:
    * baseline
    * primary
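
To see why a reset scan_id makes this kind of lookup risky in long-lived scripts, here is a minimal sketch of the "most recent match" rule. It works on plain dictionaries rather than a real catalog. The function lookup_scan_id and all records are made up for illustration, and the KeyError for no match is an assumption of the sketch.

```python
def lookup_scan_id(start_docs, scan_id):
    """Return the most recent record with the given scan_id.

    Hypothetical helper illustrating the rule described in the text;
    it is not Databroker's own implementation.
    """
    matches = [doc for doc in start_docs if doc["scan_id"] == scan_id]
    if not matches:
        # Assumption of this sketch: no match is reported as a KeyError.
        raise KeyError(scan_id)
    # Several Runs may share a scan_id; the latest start time wins.
    return max(matches, key=lambda doc: doc["time"])


# Made-up records: scan_id was reset to 1 between two operating cycles.
start_docs = [
    {"uid": "aaaa1111", "scan_id": 1, "time": 1_580_000_000.0},
    {"uid": "bbbb2222", "scan_id": 2, "time": 1_580_000_600.0},
    {"uid": "cccc3333", "scan_id": 1, "time": 1_590_000_000.0},
]

print(lookup_scan_id(start_docs, 1)["uid"])  # prints cccc3333
```

The older Run numbered 1 ("aaaa1111") can no longer be reached by scan_id alone, which is why a script that must keep pointing at the same Run should store the uid instead.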

Finally, it is often convenient to access data by recency, as in “the data that I just took”.

In [8]: catalog[-1]
Out[8]: 
BlueskyRun
  uid='12a63104-f8e1-4491-9f3e-e03a30575e33'
  exit_status='success'
  2020-03-09 00:44:03.191 -- 2020-03-09 00:54:38.510
  Streams:
    * baseline
    * primary

This syntax is meant to feel similar to accessing elements in a list or array in Python, where a[-N] means “N elements from the end of a”.
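
The list analogy can be made concrete with a short sketch. It orders three records by start time and then indexes from the end. The uids and start times are copied as text from the outputs shown above; the function nth_most_recent is hypothetical, and the real Catalog may contain other Runs in between, so only the ordering logic is illustrated.

```python
def nth_most_recent(runs, n):
    """Return the nth most recent record (n=1 is the latest).

    Hypothetical helper illustrating the list analogy; it is not
    Databroker's own implementation.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Timestamps in this fixed 'YYYY-MM-DD HH:MM:SS.fff' form sort
    # chronologically when compared as strings.
    ordered = sorted(runs, key=lambda run: run["start"])
    return ordered[-n]  # same meaning as a[-N] on a Python list


runs = [
    {"uid": "4393404b-8986-4c75-9a64-d7f6949a9344", "start": "2020-03-07 10:29:49.483"},
    {"uid": "12a63104-f8e1-4491-9f3e-e03a30575e33", "start": "2020-03-09 00:44:03.191"},
    {"uid": "c07e765b-ce5c-4c75-a16e-06f66546c1d4", "start": "2020-03-07 10:13:25.108"},
]

print(nth_most_recent(runs, 1)["uid"])  # latest of the three records
print(nth_most_recent(runs, 2)["uid"])  # the one before it
```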

In summary:

  catalog["..."]    Globally unique identifier (“uid”)

  catalog[N]        Counting number “scan ID” N (most recent match)

  catalog[-N]       Nth most recent Run in the Catalog

All of these always return one BlueskyRun or raise an exception.