Databroker

License: BSD 3-Clause

Databroker is a data access tool built around the Bluesky Data Model. The data it manages may come from ingested files, captured results of a Python-based data analysis, or experimental data acquired using the Bluesky Run Engine.

Its goals are to:

  • Provide a consistent programmatic interface to data, regardless of storage details like file format or storage medium.

  • Provide metadata and data in a coherent bundle, using standard, widely used Python and SciPy data structures.

  • Support fast, flexible search over metadata.

  • Enable software tools to operate seamlessly on a mixture of live-streaming data from the Bluesky Run Engine and saved data from Databroker.
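As a rough illustration of the last goal, the sketch below shows one consumer handling documents the same way whether they arrive live or are replayed from storage. It assumes the Bluesky convention of passing documents as (name, doc) pairs with names such as "start", "event", and "stop". TableBuilder, replay, and the sample documents are hypothetical names and values made up for this example; they are not part of Databroker.

```python
class TableBuilder:
    """Hypothetical consumer: collects event readings into columns."""

    def __init__(self):
        self.columns = {}
        self.exit_status = None

    def __call__(self, name, doc):
        # The same method handles a document no matter where it came from.
        if name == "event":
            for key, value in doc["data"].items():
                self.columns.setdefault(key, []).append(value)
        elif name == "stop":
            self.exit_status = doc["exit_status"]


def replay(documents, consumer):
    """Feed an iterable of (name, doc) pairs to a consumer, one at a time."""
    for name, doc in documents:
        consumer(name, doc)


# Illustrative, heavily trimmed documents. Real ones carry more keys.
saved_documents = [
    ("start", {"uid": "example-run"}),
    ("event", {"data": {"I0": 13.07, "It": 11.52}}),
    ("event", {"data": {"I0": 13.01, "It": 11.47}}),
    ("stop", {"exit_status": "success"}),
]

builder = TableBuilder()
replay(saved_documents, builder)
print(builder.columns)
```

Because the consumer is just a callable taking (name, doc), the same object could in principle be subscribed to a live Run Engine or fed documents read back from storage, without changing its code.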

Databroker is developed in concert with Suitcase: Suitcase handles writing data, and Databroker handles reading it. Databroker builds on Intake, a generic data access tool that is maintained outside the Bluesky Project.

PyPI

pip install databroker

Conda

conda install -c nsls2forge databroker

Source code

https://github.com/bluesky/databroker

Documentation

https://blueskyproject.io/databroker

For example, the bundle of metadata and data for one run looks like this.

>>> run
BlueskyRun
  uid='4a794c63-8223-4893-895e-d16e763188a8'
  exit_status='success'
  2020-03-07 09:17:40.436 -- 2020-03-07 09:28:53.173
  Streams:
    * primary
    * baseline

Additional user metadata beyond what is shown is stored in run.metadata. The bundle contains some number of logical tables of data (“streams”). They can be accessed by name and read into a standard data structure from xarray.

>>> run.primary.read()
<xarray.Dataset>
Dimensions:                   (time: 411)
Coordinates:
  * time                      (time) float64 1.584e+09 1.584e+09 ... 1.584e+09
Data variables:
    I0                        (time) float64 13.07 13.01 12.95 ... 9.862 9.845
    It                        (time) float64 11.52 11.47 11.44 ... 4.971 4.968
    Ir                        (time) float64 10.96 10.92 10.88 ... 4.761 4.763
    dwti_dwell_time           (time) float64 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
    dwti_dwell_time_setpoint  (time) float64 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
    dcm_energy                (time) float64 1.697e+04 1.698e+04 ... 1.791e+04
    dcm_energy_setpoint       (time) float64 1.697e+04 1.698e+04 ... 1.791e+04

Common search queries can be done with a high-level Python interface.

>>> from databroker.queries import TimeRange
>>> catalog.search(TimeRange(since="2020"))
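To give a feel for what a time-range search expresses, here is a minimal sketch that turns partial date strings into a MongoDB-style filter on a run's start time. It is not Databroker's implementation. The helpers to_epoch and time_range_query are hypothetical, the filter is assumed to apply to a "time" key holding epoch seconds, and UTC is used here only to keep the result deterministic.

```python
from datetime import datetime, timezone

# Accepted formats, from most to least specific (an assumption of this sketch).
_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M", "%Y-%m-%d", "%Y-%m", "%Y")


def to_epoch(text):
    """Parse a partial date such as '2020' or '2020-03-07' into epoch seconds (UTC)."""
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        return parsed.replace(tzinfo=timezone.utc).timestamp()
    raise ValueError(f"Unrecognized date string: {text!r}")


def time_range_query(since=None, until=None):
    """Build a MongoDB-style filter on the 'time' key (hypothetical helper)."""
    bounds = {}
    if since is not None:
        bounds["$gte"] = to_epoch(since)  # inclusive lower bound
    if until is not None:
        bounds["$lt"] = to_epoch(until)  # exclusive upper bound
    return {"time": bounds} if bounds else {}


print(time_range_query(since="2020"))
```

Here since="2020" is read as the first instant of 2020, so the filter selects every run that started on or after that moment.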

Custom queries can be done with the MongoDB query language.

>>> query = {
...    "motors": {"$in": ["x", "y"]},  # scanning either x or y
...    "temperature": {"$lt": 300},  # temperature less than 300
...    "sample.element": "Ni",
... }
>>> catalog.search(query)
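To make the semantics of such a query concrete, the sketch below evaluates it against plain dictionaries in pure Python. It is a simplified stand-in for what MongoDB does, covering only equality, $in, and $lt, plus dotted keys such as "sample.element" that reach into nested metadata. The matches and get_path helpers and the sample metadata are hypothetical and exist only for illustration.

```python
_MISSING = object()


def get_path(doc, dotted_key):
    """Follow a dotted key such as 'sample.element' into nested dicts."""
    current = doc
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches(doc, query):
    """Return True if doc satisfies every condition in query (simplified)."""
    for key, condition in query.items():
        value = get_path(doc, key)
        if value is _MISSING:
            return False
        # For a list-valued field, one matching element is enough.
        candidates = value if isinstance(value, list) else [value]
        if isinstance(condition, dict):
            for operator, operand in condition.items():
                if operator == "$in":
                    ok = any(item in operand for item in candidates)
                elif operator == "$lt":
                    ok = any(item < operand for item in candidates)
                else:
                    raise NotImplementedError(operator)
                if not ok:
                    return False
        elif condition not in candidates:
            return False
    return True


query = {
    "motors": {"$in": ["x", "y"]},
    "temperature": {"$lt": 300},
    "sample.element": "Ni",
}
# Made-up metadata for three runs.
nickel = {"motors": ["x"], "temperature": 295, "sample": {"element": "Ni"}}
too_hot = {"motors": ["x"], "temperature": 310, "sample": {"element": "Ni"}}
copper = {"motors": ["y"], "temperature": 280, "sample": {"element": "Cu"}}

print([matches(doc, query) for doc in (nickel, too_hot, copper)])
```

All conditions in the query must hold at once, so only the first run qualifies: the second fails the temperature bound and the third fails the element match.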

See the tutorials for more.

How the documentation is structured

Tutorials

Tutorials for installation and usage. New users start here.

How-to Guides

Practical step-by-step guides for the more experienced user.

Explanations

Explanation of how the library works and why it works that way.

Reference

Technically detailed API documentation.

About the documentation

Why is the documentation structured this way?