Access Saved Data

Data Broker solves two problems:

  • Search for data based on time, a unique identifier, or some other query.

  • Load data into standard scientific Python data structures without worrying about file formats.

We have used Bluesky to acquire several Runs and made them available in this example Catalog.

[1]:
from bluesky_tutorial_utils import get_example_catalog

catalog = get_example_catalog()

What can you do with a Bluesky Catalog?

A Catalog has a length.

[2]:
len(catalog)
[2]:
17

Iterating over a Catalog gives the names of its entries.

[3]:
for name in catalog:
    ...

As with dict objects in Python, iterating over a Catalog’s items() gives (name, entry) pairs.

[4]:
for name, entry in catalog.items():
    ...

Catalogs support lookup by recency, scan_id, and globally unique ID.

catalog[-1]  # the most recent Run
catalog[-5]  # the fifth-most-recent Run
catalog[3]  # 'scan_id' == 3 (if ambiguous, returns the most recent match)
catalog["6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced"]  # a full globally unique ID...
catalog["6f3ee9"]  # ...or just enough characters to uniquely identify it (6-8 usually suffices)

The globally unique ID is best for use in scripts, but the others are nice for interactive use. All of these incantations return a BlueskyRun.
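To make the lookup rules above concrete, here is a minimal, self-contained sketch of how such a key could be resolved. It is not databroker's implementation: ToyCatalog is a hypothetical class, and the records are made-up dictionaries that carry only a uid, a scan_id, and a time.

```python
class ToyCatalog:
    """Hypothetical stand-in that mimics the lookup rules described above."""

    def __init__(self, records):
        # Keep records sorted from oldest to newest so that [-1] is the most recent.
        self._records = sorted(records, key=lambda r: r["time"])

    def __len__(self):
        return len(self._records)

    def __getitem__(self, key):
        if isinstance(key, int):
            if key < 0:
                # Negative integers count back from the most recent record.
                return self._records[key]
            # Other integers are treated as a scan_id; the newest match wins.
            matches = [r for r in self._records if r["scan_id"] == key]
            if not matches:
                raise KeyError(key)
            return matches[-1]
        # Strings are a full unique ID or an unambiguous prefix of one.
        matches = [r for r in self._records if r["uid"].startswith(key)]
        if len(matches) != 1:
            raise KeyError(key)
        return matches[0]


# Made-up records; scan_id 3 is deliberately reused to show the tie-break.
records = [
    {"uid": "aaaa1111-made-up", "scan_id": 3, "time": 100.0},
    {"uid": "bbbb2222-made-up", "scan_id": 4, "time": 200.0},
    {"uid": "cccc3333-made-up", "scan_id": 3, "time": 300.0},
]
toy = ToyCatalog(records)

print(toy[-1]["uid"])      # most recent record
print(toy[3]["uid"])       # scan_id 3 is ambiguous, so the newer record is returned
print(toy["bbbb"]["uid"])  # a prefix that identifies exactly one record
```

The reused scan_id shows why the unique ID is the safer choice in scripts: an integer key can silently start pointing at a newer Run once the scan_id is used again.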

[5]:
run = catalog[-1]
run
6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced:
  args:
    entry: !!python/object:databroker.core.Entry
      args: []
      cls: databroker.core.Entry
      kwargs:
        name: 6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced
        description: {}
        driver: databroker.core.BlueskyRunFromGenerator
        direct_access: forbid
        args:
          gen_args: !!python/tuple
          - /home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/bluesky_tutorial_utils/example_data/6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced.jsonl
          gen_func: &id003 !!python/name:databroker._drivers.jsonl.gen ''
          gen_kwargs: {}
          get_filler: &id004 !!python/object/apply:functools.partial
            args:
            - &id001 !!python/name:event_model.Filler ''
            state: !!python/tuple
            - *id001
            - !!python/tuple []
            - handler_registry: !!python/object:event_model.HandlerRegistryView
                _handler_registry:
                  NPY_SEQ: !!python/name:ophyd.sim.NumpySeqHandler ''
                  newton: !!python/name:bluesky_tutorial_utils._newton.NewtonHandler ''
                  npy: !!python/name:bluesky_tutorial_utils._old_handlers.NpyHandler ''
                  npy_FRAMEWISE: !!python/name:bluesky_tutorial_utils._old_handlers.NpyFrameWise ''
              inplace: false
              root_map: {}
            - null
          transforms:
            descriptor: &id002 !!python/name:databroker.core._no_op ''
            resource: *id002
            start: *id002
            stop: *id002
        cache: null
        parameters: []
        metadata:
          start:
            detectors:
            - ns
            hints:
              dimensions:
              - - - ns_gap
                - primary
            motors:
            - ns_gap
            num_intervals: 24
            num_points: 25
            operator: Dmitri
            plan_args:
              args:
              - Signal(name='ns_gap', parent='ns', value=4.0, timestamp=1589570582.2353418)
              - 0
              - 4
              detectors:
              - NewtonSimulator(prefix='', name='ns', read_attrs=['gap', 'image'],
                configuration_attrs=[])
              num: 25
              per_step: None
            plan_name: scan
            plan_pattern: inner_product
            plan_pattern_args:
              args:
              - Signal(name='ns_gap', parent='ns', value=4.0, timestamp=1589570582.2353418)
              - 0
              - 4
              num: 25
            plan_pattern_module: bluesky.plan_patterns
            plan_type: generator
            scan_id: 17
            time: 1580688000.0
            uid: 6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced
            versions:
              bluesky: 1.6.1
              ophyd: 1.5.1b1
          stop:
            exit_status: success
            num_events:
              baseline: 2
              primary: 25
            reason: ''
            run_start: 6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced
            time: 1580688000.0707304
            uid: 44c278f1-f318-44c4-8817-6ef77846b03e
        catalog_dir: null
        getenv: true
        getshell: true
        catalog:
          cls: databroker._drivers.jsonl.BlueskyJSONLCatalog
          args:
          - /home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/bluesky_tutorial_utils/example_data/*.jsonl
          kwargs: {}
    gen_args: !!python/tuple
    - /home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/bluesky_tutorial_utils/example_data/6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced.jsonl
    gen_func: *id003
    gen_kwargs: {}
    get_filler: *id004
    transforms:
      descriptor: *id002
      resource: *id002
      start: *id002
      stop: *id002
  description: ''
  driver: databroker.core.BlueskyRunFromGenerator
  metadata:
    catalog_dir: null
    start: !!python/object/new:databroker.core.Start
      dictitems:
        detectors: &id005
        - ns
        hints: &id006
          dimensions:
          - - - ns_gap
            - primary
        motors: &id007
        - ns_gap
        num_intervals: 24
        num_points: 25
        operator: Dmitri
        plan_args: &id008
          args:
          - Signal(name='ns_gap', parent='ns', value=4.0, timestamp=1589570582.2353418)
          - 0
          - 4
          detectors:
          - NewtonSimulator(prefix='', name='ns', read_attrs=['gap', 'image'], configuration_attrs=[])
          num: 25
          per_step: None
        plan_name: scan
        plan_pattern: inner_product
        plan_pattern_args: &id009
          args:
          - Signal(name='ns_gap', parent='ns', value=4.0, timestamp=1589570582.2353418)
          - 0
          - 4
          num: 25
        plan_pattern_module: bluesky.plan_patterns
        plan_type: generator
        scan_id: 17
        time: 1580688000.0
        uid: 6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced
        versions: &id010
          bluesky: 1.6.1
          ophyd: 1.5.1b1
      state:
        detectors: *id005
        hints: *id006
        motors: *id007
        num_intervals: 24
        num_points: 25
        operator: Dmitri
        plan_args: *id008
        plan_name: scan
        plan_pattern: inner_product
        plan_pattern_args: *id009
        plan_pattern_module: bluesky.plan_patterns
        plan_type: generator
        scan_id: 17
        time: 1580688000.0
        uid: 6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced
        versions: *id010
    stop: !!python/object/new:databroker.core.Stop
      dictitems:
        exit_status: success
        num_events: &id011
          baseline: 2
          primary: 25
        reason: ''
        run_start: 6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced
        time: 1580688000.0707304
        uid: 44c278f1-f318-44c4-8817-6ef77846b03e
      state:
        exit_status: success
        num_events: *id011
        reason: ''
        run_start: 6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced
        time: 1580688000.0707304
        uid: 44c278f1-f318-44c4-8817-6ef77846b03e

Catalogs also support search.

[6]:
results = catalog.search({"plan_name": "count"})
len(results)
[6]:
10

When you search on a Catalog, you get another Catalog with a subset of the entries. You can search on this in turn, progressively narrowing the results.

[7]:
from databroker.queries import TimeRange

jan_results = results.search(TimeRange(since="2020-01-01", until="2020-02-01", timezone="US/Eastern"))
len(jan_results)
[7]:
3

The syntax for these queries is that of MongoDB. It is powerful and flexible, but it takes some getting used to, so databroker is growing higher-level utilities like TimeRange to compose common queries in a user-friendly way. We can peek inside if we like to see the MongoDB query that it generates.

[8]:
dict(TimeRange(since="2020-01-01", until="2020-02-01", timezone="US/Eastern"))
[8]:
{'time': {'$gte': 1577854800.0, '$lt': 1580533200.0}}
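As a cross-check on the numbers above, the sketch below reproduces the two timestamps with the standard library alone and applies the resulting query to a few made-up records. It assumes a fixed UTC-5 offset, which is what US/Eastern observes on these two dates. The matches function is a hypothetical helper that understands only equality, $gte, and $lt, not the full MongoDB query language.

```python
from datetime import datetime, timedelta, timezone

# Assumption: US/Eastern is UTC-5 on both dates (no daylight saving time in winter).
EASTERN_STANDARD = timezone(timedelta(hours=-5))

since = datetime(2020, 1, 1, tzinfo=EASTERN_STANDARD).timestamp()
until = datetime(2020, 2, 1, tzinfo=EASTERN_STANDARD).timestamp()

# One dictionary holding both conditions: the plan name and the time window.
query = {"plan_name": "count", "time": {"$gte": since, "$lt": until}}


def matches(document, query):
    """Return True if the document satisfies every condition in the query."""
    for key, condition in query.items():
        value = document.get(key)
        if isinstance(condition, dict):
            if value is None:
                return False
            if "$gte" in condition and not value >= condition["$gte"]:
                return False
            if "$lt" in condition and not value < condition["$lt"]:
                return False
        elif value != condition:
            return False
    return True


# Made-up records that probe the edges of the time window.
documents = [
    {"plan_name": "count", "time": 1577854800.0},  # at the lower bound: included
    {"plan_name": "count", "time": 1580533200.0},  # at the upper bound: excluded
    {"plan_name": "scan", "time": 1579000000.0},   # in range, but a different plan
]
selected = [doc for doc in documents if matches(doc, query)]
print(since, until, len(selected))
```

In this sketch, putting both conditions in one dictionary has the same effect as the two chained search calls: a record is kept only if it satisfies every condition. Note that the window is half-open, so a Run stamped exactly at until is left out.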

Exercise

Build some TimeRange queries, filling in ... below. Notice that you can specify the time with more or less specificity: try just giving YYYY or YYYY-MM or adding a time. Notice that all of the parameters are optional.

[9]:
# catalog.search(TimeRange(...))
[10]:
# catalog.search(TimeRange(...))
[11]:
# catalog.search(TimeRange(...))
[12]:
# catalog.search(TimeRange(...))

What can you do with a BlueskyRun?

A BlueskyRun bundles together some metadata and several logical tables (“streams”) of data. First, the metadata. It always comes in two sections, "start" and "stop".

[13]:
run.metadata["start"]  # Everything we know before the measurements start.
[13]:
Start({'detectors': ['ns'],
 'hints': {'dimensions': [[['ns_gap'], 'primary']]},
 'motors': ['ns_gap'],
 'num_intervals': 24,
 'num_points': 25,
 'operator': 'Dmitri',
 'plan_args': {'args': ["Signal(name='ns_gap', parent='ns', value=4.0, "
                        'timestamp=1589570582.2353418)',
                        0,
                        4],
               'detectors': ["NewtonSimulator(prefix='', name='ns', "
                             "read_attrs=['gap', 'image'], "
                             'configuration_attrs=[])'],
               'num': 25,
               'per_step': 'None'},
 'plan_name': 'scan',
 'plan_pattern': 'inner_product',
 'plan_pattern_args': {'args': ["Signal(name='ns_gap', parent='ns', value=4.0, "
                                'timestamp=1589570582.2353418)',
                                0,
                                4],
                       'num': 25},
 'plan_pattern_module': 'bluesky.plan_patterns',
 'plan_type': 'generator',
 'scan_id': 17,
 'time': 1580688000.0,
 'uid': '6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced',
 'versions': {'bluesky': '1.6.1', 'ophyd': '1.5.1b1'}})

The above contains a mixture of things that bluesky automatically recorded (e.g. the time), things the bluesky plan reported (e.g. which motor(s) are scanned), and things the user told us (e.g. the name of the operator).

[14]:
run.metadata["stop"]  # Everything we only know after the measurements stop.
[14]:
Stop({'exit_status': 'success',
 'num_events': {'baseline': 2, 'primary': 25},
 'reason': '',
 'run_start': '6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced',
 'time': 1580688000.0707304,
 'uid': '44c278f1-f318-44c4-8817-6ef77846b03e'})

These Start and Stop objects are just dictionaries. You can dig into their contents in the usual way.

[15]:
run.metadata["start"]["num_points"]
[15]:
25
[16]:
run.metadata["stop"]["exit_status"] == "success"
[16]:
True
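Because the metadata is reachable with ordinary dictionary access, standard-library tools apply directly. The sketch below uses plain dictionaries holding the time values printed above, so it runs without a catalog. It turns the start time into a readable timestamp and computes how long the Run took. With a real run, the same values would come from run.metadata["start"]["time"] and run.metadata["stop"]["time"].

```python
from datetime import datetime, timezone

# Values copied from the Start and Stop documents shown above.
start = {"time": 1580688000.0, "num_points": 25}
stop = {"time": 1580688000.0707304, "exit_status": "success"}

# The time field is seconds since the Unix epoch; convert it to an aware datetime in UTC.
started_at = datetime.fromtimestamp(start["time"], tz=timezone.utc)

# Elapsed wall-clock time between the start and stop documents, in seconds.
duration = stop["time"] - start["time"]

print(started_at.isoformat())  # 2020-02-03T00:00:00+00:00
print(f"{duration:.3f} s")
```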

As we said, a Run bundles together any number of “streams” of data. Picture these as tables or spreadsheets. The stream names are shown when we print run.

[17]:
run
(Output identical to the representation of run shown under cell [5] above.)

We can also list them programmatically.

[18]:
list(run)
[18]:
['baseline', 'primary']

We can access a particular stream with run["primary"].read(). Dot access, as in run.primary.read(), also works if the stream name is a valid Python identifier and does not collide with any other attribute.

[19]:
ds = run["primary"].read()
ds
[19]:
xarray.Dataset
    • time: 25
    • x: 128
    • y: 128
    • time
      (time)
      float64
      1.581e+09 1.581e+09 ... 1.581e+09
      array([1.580688e+09, 1.580688e+09, 1.580688e+09, 1.580688e+09, 1.580688e+09,
             1.580688e+09, 1.580688e+09, 1.580688e+09, 1.580688e+09, 1.580688e+09,
             1.580688e+09, 1.580688e+09, 1.580688e+09, 1.580688e+09, 1.580688e+09,
             1.580688e+09, 1.580688e+09, 1.580688e+09, 1.580688e+09, 1.580688e+09,
             1.580688e+09, 1.580688e+09, 1.580688e+09, 1.580688e+09, 1.580688e+09])
    • ns_gap
      (time)
      float64
      0.0 0.1667 0.3333 ... 3.833 4.0
      array([0.        , 0.16666667, 0.33333333, 0.5       , 0.66666667,
             0.83333333, 1.        , 1.16666667, 1.33333333, 1.5       ,
             1.66666667, 1.83333333, 2.        , 2.16666667, 2.33333333,
             2.5       , 2.66666667, 2.83333333, 3.        , 3.16666667,
             3.33333333, 3.5       , 3.66666667, 3.83333333, 4.        ])
    • ns_image
      (time, x, y)
      float64
      1.258 1.959 1.752 ... 1.959 1.258
      array([[[1.25826974, 1.95925757, 1.75195039, ..., 1.75195039,
               1.95925757, 1.25826974],
              [1.95925757, 1.74115687, 0.83008752, ..., 0.83008752,
               1.74115687, 1.95925757],
              [1.75195039, 0.83008752, 0.07707731, ..., 0.07707731,
               0.83008752, 1.75195039],
              ...,
              [1.75195039, 0.83008752, 0.07707731, ..., 0.07707731,
               0.83008752, 1.75195039],
              [1.95925757, 1.74115687, 0.83008752, ..., 0.83008752,
               1.74115687, 1.95925757],
              [1.25826974, 1.95925757, 1.75195039, ..., 1.75195039,
               1.95925757, 1.25826974]],
      
             [[0.29249125, 1.23494802, 1.9468762 , ..., 1.9468762 ,
               1.23494802, 0.29249125],
              [1.23494802, 1.9519689 , 1.76847642, ..., 1.76847642,
               1.9519689 , 1.23494802],
              [1.9468762 , 1.76847642, 0.87194573, ..., 0.87194573,
               1.76847642, 1.9468762 ],
              ...,
              [1.9468762 , 1.76847642, 0.87194573, ..., 0.87194573,
               1.76847642, 1.9468762 ],
              [1.23494802, 1.9519689 , 1.76847642, ..., 1.76847642,
               1.9519689 , 1.23494802],
              [0.29249125, 1.23494802, 1.9468762 , ..., 1.9468762 ,
               1.23494802, 0.29249125]],
      
             [[0.0342215 , 0.27569044, 1.19492582, ..., 1.19492582,
               0.27569044, 0.0342215 ],
              [0.27569044, 1.21081203, 1.9383889 , ..., 1.9383889 ,
               1.21081203, 0.27569044],
              [1.19492582, 1.9383889 , 1.79486842, ..., 1.79486842,
               1.9383889 , 1.19492582],
              ...,
              [1.19492582, 1.9383889 , 1.79486842, ..., 1.79486842,
               1.9383889 , 1.19492582],
              [0.27569044, 1.21081203, 1.9383889 , ..., 1.9383889 ,
               1.21081203, 0.27569044],
              [0.0342215 , 0.27569044, 1.19492582, ..., 1.19492582,
               0.27569044, 0.0342215 ]],
      
             ...,
      
             [[1.70750875, 0.76505198, 0.0531238 , ..., 0.0531238 ,
               0.76505198, 1.70750875],
              [0.76505198, 0.0480311 , 0.23152358, ..., 0.23152358,
               0.0480311 , 0.76505198],
              [0.0531238 , 0.23152358, 1.12805427, ..., 1.12805427,
               0.23152358, 0.0531238 ],
              ...,
              [0.0531238 , 0.23152358, 1.12805427, ..., 1.12805427,
               0.23152358, 0.0531238 ],
              [0.76505198, 0.0480311 , 0.23152358, ..., 0.23152358,
               0.0480311 , 0.76505198],
              [1.70750875, 0.76505198, 0.0531238 , ..., 0.0531238 ,
               0.76505198, 1.70750875]],
      
             [[1.9657785 , 1.72430956, 0.80507418, ..., 0.80507418,
               1.72430956, 1.9657785 ],
              [1.72430956, 0.78918797, 0.0616111 , ..., 0.0616111 ,
               0.78918797, 1.72430956],
              [0.80507418, 0.0616111 , 0.20513158, ..., 0.20513158,
               0.0616111 , 0.80507418],
              ...,
              [0.80507418, 0.0616111 , 0.20513158, ..., 0.20513158,
               0.0616111 , 0.80507418],
              [1.72430956, 0.78918797, 0.0616111 , ..., 0.0616111 ,
               0.78918797, 1.72430956],
              [1.9657785 , 1.72430956, 0.80507418, ..., 0.80507418,
               1.72430956, 1.9657785 ]],
      
             [[1.25826974, 1.95925757, 1.75195039, ..., 1.75195039,
               1.95925757, 1.25826974],
              [1.95925757, 1.74115687, 0.83008752, ..., 0.83008752,
               1.74115687, 1.95925757],
              [1.75195039, 0.83008752, 0.07707731, ..., 0.07707731,
               0.83008752, 1.75195039],
              ...,
              [1.75195039, 0.83008752, 0.07707731, ..., 0.07707731,
               0.83008752, 1.75195039],
              [1.95925757, 1.74115687, 0.83008752, ..., 0.83008752,
               1.74115687, 1.95925757],
              [1.25826974, 1.95925757, 1.75195039, ..., 1.75195039,
               1.95925757, 1.25826974]]])
    • seq_num
      (time)
      int64
      1 2 3 4 5 6 7 ... 20 21 22 23 24 25
      array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
             18, 19, 20, 21, 22, 23, 24, 25])
    • uid
      (time)
      <U36
      '0a952c4c-6c5c-40de-935d-2c2f3554eafc' ... '2f15eee8-c260-432a-b5ef-dcf3a23bd9ad'
      array(['0a952c4c-6c5c-40de-935d-2c2f3554eafc',
             'd2d53fbc-1f9a-4396-b20f-c75514f2f2e0',
             'c2226562-c92d-495e-aa7c-2dfe16f89378',
             '25044beb-bfcb-448f-a67e-82b4cc48db1c',
             '39a882bf-3ec7-4e06-afdf-bf4e769fe303',
             '3886ce0d-71f4-4504-a331-22c34f9ba971',
             'bac5a5d0-edd5-426b-b886-a492bf5f0b89',
             'e5f2db2a-9740-44c8-852d-52476cbf4c90',
             '0776543a-44fe-4bd0-93b1-75defcc483ed',
             '86a5ec1d-ce49-4db1-856d-63dd87f387b4',
             'da719a14-0e48-4f76-8176-93eb06d0d83f',
             'e99f767f-48f7-408a-b8b1-579c0a79a9b6',
             'bdad9834-d0da-46e2-ab7f-4c53946ed22c',
             '147ecf7e-a77a-4799-9003-0f052b0219c1',
             '52a1a303-56a1-4870-aea5-a72373a6a635',
             '4db6b4d7-b8eb-4577-a846-538fdf342491',
             '32aff373-4ca0-492a-9826-463465ed5272',
             'e3736060-ffc8-4f65-b8e0-254be8f53f76',
             'd7ef3a85-c5c7-4966-8c9a-0efc82b463a4',
             '21fe6881-3681-4cb5-be05-7906d4601f64',
             '4065d276-9d4e-45ad-9095-768465a18870',
             '00effeee-440c-4148-9aa1-36fae34dc1f7',
             '76ab2f7a-aa0c-46cf-bde1-194281378d19',
             'ed77ba83-520d-4691-b6d7-f984d8230cee',
             '2f15eee8-c260-432a-b5ef-dcf3a23bd9ad'], dtype='<U36')

This is an xarray.Dataset. At this point Bluesky and Data Broker have served their purpose and handed us a useful, general-purpose scientific Python data structure with our data in it.

What can you do with an xarray.Dataset?

We can easily generate scatter plots of one dimension vs another.

[20]:
ds.plot.scatter(x="time", y="ns_gap")
[20]:
<matplotlib.collections.PathCollection at 0x7fd4e01b3160>
[Figure: scatter plot of ns_gap versus time]

We can pull out specific columns. (Each column in an xarray.Dataset is called an xarray.DataArray.)

[21]:
image = ds["ns_image"]
image
[21]:
xarray.DataArray
'ns_image'
  • time: 25
  • x: 128
  • y: 128
  • 1.258 1.959 1.752 0.8621 0.1039 ... 0.1039 0.8621 1.752 1.959 1.258
    array([[[1.25826974, 1.95925757, 1.75195039, ..., 1.75195039,
             1.95925757, 1.25826974],
            [1.95925757, 1.74115687, 0.83008752, ..., 0.83008752,
             1.74115687, 1.95925757],
            [1.75195039, 0.83008752, 0.07707731, ..., 0.07707731,
             0.83008752, 1.75195039],
            ...,
            [1.75195039, 0.83008752, 0.07707731, ..., 0.07707731,
             0.83008752, 1.75195039],
            [1.95925757, 1.74115687, 0.83008752, ..., 0.83008752,
             1.74115687, 1.95925757],
            [1.25826974, 1.95925757, 1.75195039, ..., 1.75195039,
             1.95925757, 1.25826974]],
    
           [[0.29249125, 1.23494802, 1.9468762 , ..., 1.9468762 ,
             1.23494802, 0.29249125],
            [1.23494802, 1.9519689 , 1.76847642, ..., 1.76847642,
             1.9519689 , 1.23494802],
            [1.9468762 , 1.76847642, 0.87194573, ..., 0.87194573,
             1.76847642, 1.9468762 ],
            ...,
            [1.9468762 , 1.76847642, 0.87194573, ..., 0.87194573,
             1.76847642, 1.9468762 ],
            [1.23494802, 1.9519689 , 1.76847642, ..., 1.76847642,
             1.9519689 , 1.23494802],
            [0.29249125, 1.23494802, 1.9468762 , ..., 1.9468762 ,
             1.23494802, 0.29249125]],
    
           [[0.0342215 , 0.27569044, 1.19492582, ..., 1.19492582,
             0.27569044, 0.0342215 ],
            [0.27569044, 1.21081203, 1.9383889 , ..., 1.9383889 ,
             1.21081203, 0.27569044],
            [1.19492582, 1.9383889 , 1.79486842, ..., 1.79486842,
             1.9383889 , 1.19492582],
            ...,
            [1.19492582, 1.9383889 , 1.79486842, ..., 1.79486842,
             1.9383889 , 1.19492582],
            [0.27569044, 1.21081203, 1.9383889 , ..., 1.9383889 ,
             1.21081203, 0.27569044],
            [0.0342215 , 0.27569044, 1.19492582, ..., 1.19492582,
             0.27569044, 0.0342215 ]],
    
           ...,
    
           [[1.70750875, 0.76505198, 0.0531238 , ..., 0.0531238 ,
             0.76505198, 1.70750875],
            [0.76505198, 0.0480311 , 0.23152358, ..., 0.23152358,
             0.0480311 , 0.76505198],
            [0.0531238 , 0.23152358, 1.12805427, ..., 1.12805427,
             0.23152358, 0.0531238 ],
            ...,
            [0.0531238 , 0.23152358, 1.12805427, ..., 1.12805427,
             0.23152358, 0.0531238 ],
            [0.76505198, 0.0480311 , 0.23152358, ..., 0.23152358,
             0.0480311 , 0.76505198],
            [1.70750875, 0.76505198, 0.0531238 , ..., 0.0531238 ,
             0.76505198, 1.70750875]],
    
           [[1.9657785 , 1.72430956, 0.80507418, ..., 0.80507418,
             1.72430956, 1.9657785 ],
            [1.72430956, 0.78918797, 0.0616111 , ..., 0.0616111 ,
             0.78918797, 1.72430956],
            [0.80507418, 0.0616111 , 0.20513158, ..., 0.20513158,
             0.0616111 , 0.80507418],
            ...,
            [0.80507418, 0.0616111 , 0.20513158, ..., 0.20513158,
             0.0616111 , 0.80507418],
            [1.72430956, 0.78918797, 0.0616111 , ..., 0.0616111 ,
             0.78918797, 1.72430956],
            [1.9657785 , 1.72430956, 0.80507418, ..., 0.80507418,
             1.72430956, 1.9657785 ]],
    
           [[1.25826974, 1.95925757, 1.75195039, ..., 1.75195039,
             1.95925757, 1.25826974],
            [1.95925757, 1.74115687, 0.83008752, ..., 0.83008752,
             1.74115687, 1.95925757],
            [1.75195039, 0.83008752, 0.07707731, ..., 0.07707731,
             0.83008752, 1.75195039],
            ...,
            [1.75195039, 0.83008752, 0.07707731, ..., 0.07707731,
             0.83008752, 1.75195039],
            [1.95925757, 1.74115687, 0.83008752, ..., 0.83008752,
             1.74115687, 1.95925757],
            [1.25826974, 1.95925757, 1.75195039, ..., 1.75195039,
             1.95925757, 1.25826974]]])
    • time
      (time)
      float64
      1.581e+09 1.581e+09 ... 1.581e+09
      array([1.580688e+09, 1.580688e+09, 1.580688e+09, 1.580688e+09, 1.580688e+09,
             1.580688e+09, 1.580688e+09, 1.580688e+09, 1.580688e+09, 1.580688e+09,
             1.580688e+09, 1.580688e+09, 1.580688e+09, 1.580688e+09, 1.580688e+09,
             1.580688e+09, 1.580688e+09, 1.580688e+09, 1.580688e+09, 1.580688e+09,
             1.580688e+09, 1.580688e+09, 1.580688e+09, 1.580688e+09, 1.580688e+09])

Inside this xarray.DataArray is a plain old numpy array.

[22]:
type(image.values)
[22]:
numpy.ndarray

The extra context provided by xarray is very useful. Notice that the dimensions have names, so we can perform aggregations over named axes without remembering the order of the dimensions.

[23]:
image.sum("time")  # With just plain numpy, this would be image.sum(0) and we'd have to keep track ourselves that 0 = "time".
[23]:
xarray.DataArray
'ns_image'
  • x: 128
  • y: 128
  • 25.26 25.96 25.75 24.86 24.1 24.12 ... 24.1 24.86 25.75 25.96 25.26
    array([[25.25826974, 25.95925757, 25.75195039, ..., 25.75195039,
            25.95925757, 25.25826974],
           [25.95925757, 25.74115687, 24.83008752, ..., 24.83008752,
            25.74115687, 25.95925757],
           [25.75195039, 24.83008752, 24.07707731, ..., 24.07707731,
            24.83008752, 25.75195039],
           ...,
           [25.75195039, 24.83008752, 24.07707731, ..., 24.07707731,
            24.83008752, 25.75195039],
           [25.95925757, 25.74115687, 24.83008752, ..., 24.83008752,
            25.74115687, 25.95925757],
           [25.25826974, 25.95925757, 25.75195039, ..., 25.75195039,
            25.95925757, 25.25826974]])
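To see what that bookkeeping amounts to, here is a small numpy-only sketch. The array is a made-up stand-in with the same axis order as ns_image (time, x, y) but a smaller shape, and AXIS is a hypothetical lookup table that does by hand what xarray's named dimensions do for us.

```python
import numpy as np

# Stand-in data: 25 frames of 4x4 "images" (the real ns_image is 25 x 128 x 128).
frames = np.ones((25, 4, 4))

# With plain numpy we have to remember which position each dimension occupies.
AXIS = {"time": 0, "x": 1, "y": 2}

# Equivalent to frames.sum(0), but the intent is spelled out by name.
summed_over_time = frames.sum(axis=AXIS["time"])

print(summed_over_time.shape)  # (4, 4)
```

If the array were ever transposed, the table would have to be updated by hand, whereas an xarray.DataArray carries its dimension names along with the data.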

The plot method on xarray.DataArray often just “does the right thing” based on the dimensionality of the data. It even labels our axes for us!

[24]:
image.sum("time").plot()
[24]:
<matplotlib.collections.QuadMesh at 0x7fd4e00a2080>
[Figure: plot of ns_image summed over time]

For a quick overview of xarray, see the xarray documentation. Also see these tutorials in particular for interesting usages of xarray:

Exercises

1. Coming back to our run

[25]:
run
    6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced:
      args:
        entry: !!python/object:databroker.core.Entry
          args: []
          cls: databroker.core.Entry
          kwargs:
            name: 6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced
            description: {}
            driver: databroker.core.BlueskyRunFromGenerator
            direct_access: forbid
            args:
              gen_args: !!python/tuple
              - /home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/bluesky_tutorial_utils/example_data/6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced.jsonl
              gen_func: &id003 !!python/name:databroker._drivers.jsonl.gen ''
              gen_kwargs: {}
              get_filler: &id004 !!python/object/apply:functools.partial
                args:
                - &id001 !!python/name:event_model.Filler ''
                state: !!python/tuple
                - *id001
                - !!python/tuple []
                - handler_registry: !!python/object:event_model.HandlerRegistryView
                    _handler_registry:
                      NPY_SEQ: !!python/name:ophyd.sim.NumpySeqHandler ''
                      newton: !!python/name:bluesky_tutorial_utils._newton.NewtonHandler ''
                      npy: !!python/name:bluesky_tutorial_utils._old_handlers.NpyHandler ''
                      npy_FRAMEWISE: !!python/name:bluesky_tutorial_utils._old_handlers.NpyFrameWise ''
                  inplace: false
                  root_map: {}
                - null
              transforms:
                descriptor: &id002 !!python/name:databroker.core._no_op ''
                resource: *id002
                start: *id002
                stop: *id002
            cache: null
            parameters: []
            metadata:
              start:
                detectors:
                - ns
                hints:
                  dimensions:
                  - - - ns_gap
                    - primary
                motors:
                - ns_gap
                num_intervals: 24
                num_points: 25
                operator: Dmitri
                plan_args:
                  args:
                  - Signal(name='ns_gap', parent='ns', value=4.0, timestamp=1589570582.2353418)
                  - 0
                  - 4
                  detectors:
                  - NewtonSimulator(prefix='', name='ns', read_attrs=['gap', 'image'],
                    configuration_attrs=[])
                  num: 25
                  per_step: None
                plan_name: scan
                plan_pattern: inner_product
                plan_pattern_args:
                  args:
                  - Signal(name='ns_gap', parent='ns', value=4.0, timestamp=1589570582.2353418)
                  - 0
                  - 4
                  num: 25
                plan_pattern_module: bluesky.plan_patterns
                plan_type: generator
                scan_id: 17
                time: 1580688000.0
                uid: 6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced
                versions:
                  bluesky: 1.6.1
                  ophyd: 1.5.1b1
              stop:
                exit_status: success
                num_events:
                  baseline: 2
                  primary: 25
                reason: ''
                run_start: 6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced
                time: 1580688000.0707304
                uid: 44c278f1-f318-44c4-8817-6ef77846b03e
            catalog_dir: null
            getenv: true
            getshell: true
            catalog:
              cls: databroker._drivers.jsonl.BlueskyJSONLCatalog
              args:
              - /home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/bluesky_tutorial_utils/example_data/*.jsonl
              kwargs: {}
        gen_args: !!python/tuple
        - /home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/bluesky_tutorial_utils/example_data/6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced.jsonl
        gen_func: *id003
        gen_kwargs: {}
        get_filler: *id004
        transforms:
          descriptor: *id002
          resource: *id002
          start: *id002
          stop: *id002
      description: ''
      driver: databroker.core.BlueskyRunFromGenerator
      metadata:
        catalog_dir: null
        start: !!python/object/new:databroker.core.Start
          dictitems:
            detectors: &id005
            - ns
            hints: &id006
              dimensions:
              - - - ns_gap
                - primary
            motors: &id007
            - ns_gap
            num_intervals: 24
            num_points: 25
            operator: Dmitri
            plan_args: &id008
              args:
              - Signal(name='ns_gap', parent='ns', value=4.0, timestamp=1589570582.2353418)
              - 0
              - 4
              detectors:
              - NewtonSimulator(prefix='', name='ns', read_attrs=['gap', 'image'], configuration_attrs=[])
              num: 25
              per_step: None
            plan_name: scan
            plan_pattern: inner_product
            plan_pattern_args: &id009
              args:
              - Signal(name='ns_gap', parent='ns', value=4.0, timestamp=1589570582.2353418)
              - 0
              - 4
              num: 25
            plan_pattern_module: bluesky.plan_patterns
            plan_type: generator
            scan_id: 17
            time: 1580688000.0
            uid: 6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced
            versions: &id010
              bluesky: 1.6.1
              ophyd: 1.5.1b1
          state:
            detectors: *id005
            hints: *id006
            motors: *id007
            num_intervals: 24
            num_points: 25
            operator: Dmitri
            plan_args: *id008
            plan_name: scan
            plan_pattern: inner_product
            plan_pattern_args: *id009
            plan_pattern_module: bluesky.plan_patterns
            plan_type: generator
            scan_id: 17
            time: 1580688000.0
            uid: 6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced
            versions: *id010
        stop: !!python/object/new:databroker.core.Stop
          dictitems:
            exit_status: success
            num_events: &id011
              baseline: 2
              primary: 25
            reason: ''
            run_start: 6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced
            time: 1580688000.0707304
            uid: 44c278f1-f318-44c4-8817-6ef77846b03e
          state:
            exit_status: success
            num_events: *id011
            reason: ''
            run_start: 6f3ee9a1-ff4b-47ba-a439-9027cd9e6ced
            time: 1580688000.0707304
            uid: 44c278f1-f318-44c4-8817-6ef77846b03e
    
    

    Read the “baseline” stream of this run. The baseline stream conventionally includes readings taken just before and just after a scan. They record all potentially relevant positions and temperatures, so you can tell whether any of them drifted during the scan.
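
Once a baseline stream has been read, one way to check for drift is to compare its first and last readings. The sketch below assumes the data arrives as an xarray Dataset with a "time" dimension, like the data read earlier in this notebook. It builds a small stand-in by hand, so the field names and values are hypothetical and it does not read from the Run.

```python
import numpy as np
import xarray as xr

# Hypothetical baseline-like data: one reading before the scan, one after.
# The field names and values are invented for illustration.
baseline = xr.Dataset(
    {
        "motor_x": ("time", np.array([1.00, 1.02])),
        "temperature": ("time", np.array([295.0, 295.5])),
    },
    coords={"time": [0.0, 60.0]},
)

# Drift is the last reading minus the first reading, field by field.
drift = baseline.isel(time=-1) - baseline.isel(time=0)

for name in drift.data_vars:
    print(name, float(drift[name]))
```

A value near zero suggests that field stayed put during the scan. How much drift is acceptable depends on the measurement.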

    [26]:
    
    # Try your solution here.
    
    [27]:
    
    %load solutions/access_baseline_data.py