Documents
A primary design goal of bluesky is to enable better research by recording rich metadata alongside measured data for use in later analysis. Documents are how we do this.
A document is our term for a Python dictionary with a schema — that is, organized in a formally specified way — created by the RunEngine during plan execution. All of the metadata and data generated by executing the plan is organized into documents.
A later section describes how outside functions can “subscribe” to a stream of these documents, visualizing, processing, or saving them. This section outlines the documents themselves, aiming to convey their overall structure and to familiarize you with their most useful components.
Overview of a “Run”
Each document belongs to a run (loosely speaking, a dataset). Executing any of the built-in pre-assembled plans, like scan() and count(), creates one run.
Note
Fundamentally, the scope of a run is intentionally vague and flexible. One plan might generate many runs or one long run. It just depends on how you want to organize your data, both at collection time and analysis time.
The tutorial’s Capture Data section explores this.
The documents in each run are:
- A Run Start document, containing all of the metadata known at the start of the run. Highlights:
  - time: the start time
  - plan_name: e.g., 'scan' or 'count'
  - uid: unique ID that identifies this run
  - scan_id: human-friendly integer scan ID (not necessarily unique)
  - any other metadata captured at execution time from the plan or the user
- Event documents, containing the actual measurements. These are your data.
  - time: a timestamp for this group of readings
  - seq_num: sequence number, counting up from 1
  - data: a dictionary of readings like {'temperature': 5.0, 'position': 3.0}
  - timestamps: a dictionary of individual timestamps for each reading, from the hardware
- Event Descriptor documents, which provide a schema for the data in the Event documents. They list all of the keys in the Event’s data and give useful information about them, such as units and precision. They also contain information about the configuration of the hardware.
- A Run Stop document, containing metadata known only at the end of the run. Highlights:
  - time: the time when the run was completed
  - exit_status: "success", "abort", or "fail"
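To make this structure concrete, here is a minimal sketch that assembles a toy run by hand as a list of (name, document) pairs, which is the form in which bluesky hands documents to subscribers. The helper make_toy_run is hypothetical, not part of bluesky: it fills in only the highlighted fields listed above, and its descriptor is heavily simplified. Note that in the emitted stream a descriptor comes before the events that reference it.

```python
import time
import uuid


def new_uid():
    """Return a random unique ID string, standing in for a document 'uid'."""
    return str(uuid.uuid4())


def make_toy_run(readings, plan_name="count", scan_id=1, **metadata):
    """Hypothetical helper: build a simplified run as (name, doc) pairs.

    `readings` is a list of dicts such as {'temperature': 5.0}; any extra
    keyword arguments are stored as metadata in the start document.
    """
    start = {"time": time.time(), "uid": new_uid(), "plan_name": plan_name,
             "scan_id": scan_id, **metadata}
    # Simplified descriptor: real 'data_keys' entries carry more information.
    keys = readings[0] if readings else {}
    descriptor = {"time": time.time(), "uid": new_uid(),
                  "run_start": start["uid"],
                  "data_keys": {key: {"dtype": "number", "shape": []}
                                for key in keys}}
    docs = [("start", start), ("descriptor", descriptor)]
    for seq_num, data in enumerate(readings, start=1):  # counts up from 1
        now = time.time()
        docs.append(("event", {
            "time": now,
            "uid": new_uid(),
            "seq_num": seq_num,
            "descriptor": descriptor["uid"],
            "data": dict(data),
            # Toy timestamps; real ones come from the hardware.
            "timestamps": {key: now for key in data},
        }))
    docs.append(("stop", {"time": time.time(), "uid": new_uid(),
                          "run_start": start["uid"],
                          "exit_status": "success"}))
    return docs


run = make_toy_run([{"temperature": 5.0}, {"temperature": 5.1}],
                   purpose="demo")
print([name for name, _ in run])
# ['start', 'descriptor', 'event', 'event', 'stop']
```

Documents produced by the RunEngine carry many more keys than this; the sketch only shows how the pieces fit together.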
Every document has a time (its creation time) and a separate uid to identify it. The Event documents also have a descriptor field linking them to the Event Descriptor with their metadata. And the Event Descriptor and Run Stop documents have a run_start field linking them to their Run Start. Thus, all the documents in a run are linked back to the Run Start.
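These links can be checked mechanically. The sketch below uses a hypothetical helper, check_links (not part of bluesky), which takes a run as a list of (name, document) pairs and verifies that every event points to a known descriptor and that every descriptor and the stop document point back to the run's start. The toy documents are trimmed to their linking fields.

```python
def check_links(docs):
    """Return a list of link problems found in one run (empty if all is well)."""
    starts = [doc for name, doc in docs if name == "start"]
    if len(starts) != 1:
        return [f"expected exactly one start document, found {len(starts)}"]
    start_uid = starts[0]["uid"]
    problems = []
    descriptor_uids = set()
    for name, doc in docs:
        # Descriptors and the stop document must point back to the start.
        if name in ("descriptor", "stop") and doc["run_start"] != start_uid:
            problems.append(f"{name} {doc['uid']} does not point to the start")
        if name == "descriptor":
            descriptor_uids.add(doc["uid"])
        elif name == "event" and doc["descriptor"] not in descriptor_uids:
            # A descriptor is emitted before the events that reference it.
            problems.append(f"event {doc['uid']} has an unknown descriptor")
    return problems


run = [
    ("start", {"uid": "s1", "time": 0.0}),
    ("descriptor", {"uid": "d1", "run_start": "s1", "time": 0.1}),
    ("event", {"uid": "e1", "descriptor": "d1", "time": 0.2}),
    ("stop", {"uid": "x1", "run_start": "s1", "time": 0.3}),
]
print(check_links(run))  # []

broken = run[:2] + [("event", {"uid": "e2", "descriptor": "d9", "time": 0.2})]
print(check_links(broken))  # ['event e2 has an unknown descriptor']
```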
Documents in Detail
Run Start
Again, a ‘start’ document marks the beginning of the run. It comprises everything we know before we start taking data, including all metadata provided by the user and the plan. (More on this in the next section.)
All built-in plans provide some useful metadata like the names of the detector(s) and motor(s) used. (User-defined plans may also do this; see this section of the tutorial.)
The command:
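from bluesky import RunEngine
from bluesky.plans import scan
from ophyd.sim import det, motor  # simulated detector, motor

RE = RunEngine({})
# Scan 'motor' from -3 to 3 in 16 evenly spaced points, taking readings from 'det'.
# Extra keyword arguments are recorded as metadata in the 'start' document.
RE(scan([det], motor, -3, 3, 16), purpose='calibration',
   sample='kryptonite')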
generates a ‘start’ document like this:
# 'start' document
{'purpose': 'calibration',
'sample': 'kryptonite',
'detectors': ['det'],
'motors': ['motor'],
'plan_name': 'scan',
'plan_type': 'generator',
'plan_args': {'detectors': '[det]',
'motor': 'Mover(...)',
'num': '16',
'start': '-3',
'stop': '3'},
'scan_id': 282,
'time': 1442521005.6099606,
'uid': '<randomly-generated unique ID>',
}
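Because user-supplied keys such as purpose and sample land in the start document alongside the plan's own metadata, they can be used later to find runs. A minimal sketch, assuming the start documents have already been loaded into a plain Python list (how they are stored and retrieved is outside the scope of this section); find_runs is a hypothetical helper, and the two start documents are trimmed toy examples.

```python
def find_runs(start_docs, **criteria):
    """Return the start documents whose metadata matches every key=value given."""
    return [doc for doc in start_docs
            if all(doc.get(key) == value for key, value in criteria.items())]


start_docs = [
    {"uid": "a", "scan_id": 282, "plan_name": "scan",
     "purpose": "calibration", "sample": "kryptonite", "motors": ["motor"]},
    {"uid": "b", "scan_id": 283, "plan_name": "count",
     "purpose": "measurement", "sample": "kryptonite"},
]

matches = find_runs(start_docs, sample="kryptonite", purpose="calibration")
print([doc["scan_id"] for doc in matches])  # [282]
```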
Note
Time is given in UNIX time (seconds since 1970). Software for looking at the data would, of course, translate that into a more human-readable form.
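For example, Python's standard datetime module can perform this conversion. The sketch below renders the start document's time in UTC; a data viewer would more typically show local time.

```python
from datetime import datetime, timezone

start_time = 1442521005.6099606  # 'time' from the example start document
readable = datetime.fromtimestamp(start_time, tz=timezone.utc)
print(readable.strftime("%Y-%m-%d %H:%M:%S %Z"))  # 2015-09-17 20:16:45 UTC
```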
Event
An ‘event’ records one or more measurements with an associated time.
# 'event' document
{'data':
{'temperature': 5.0,
'x_setpoint': 3.0,
'x_readback': 3.05},
'timestamps':
{'temperature': 1442521007.9258342,
'x_setpoint': 1442521007.5029348,
'x_readback': 1442521007.5029348},
'time': 1442521007.3438923,
'seq_num': 1,
'uid': '<randomly-generated unique ID>',
'descriptor': '<reference to a descriptor document>'}
From a data analysis perspective, these readings are treated as simultaneous, but in actuality they occurred at separate times. The individual reading times are not thrown away (they are recorded in ‘timestamps’), but the overall event ‘time’ is often more useful.
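To see how far apart the readings in the example event actually were, the sketch below compares each entry in 'timestamps' with the event's overall 'time'. It reuses only the values from the example event above; the offset calculation is just an illustration of how the two kinds of time relate.

```python
event = {
    "data": {"temperature": 5.0, "x_setpoint": 3.0, "x_readback": 3.05},
    "timestamps": {"temperature": 1442521007.9258342,
                   "x_setpoint": 1442521007.5029348,
                   "x_readback": 1442521007.5029348},
    "time": 1442521007.3438923,
    "seq_num": 1,
}

# Offset of each individual reading from the event's overall 'time', in seconds.
offsets = {key: ts - event["time"] for key, ts in event["timestamps"].items()}

for key, value in event["data"].items():
    print(f"{key}: {value} (read {offsets[key]:+.3f} s from event time)")
```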
Run Stop
A ‘stop’ document marks the end of the run. It contains metadata that is not known until the run completes.
The most commonly useful fields here are ‘time’ and ‘exit_status’.
# 'stop' document
{'exit_status': 'success', # or 'fail' or 'abort'
'reason': '', # The RunEngine can provide reason for failure here.
'time': 1442521012.1021606,
'uid': '<randomly-generated unique ID>',
'run_start': '<reference to the start document>',
'num_events': {'primary': 16}
}
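A minimal sketch of reading the stop document together with its start document: the hypothetical summarize helper confirms the two belong to the same run, then reports the exit status, the run duration, and the total number of events. The times and num_events are copied from the example start and stop documents in this section; the uids are placeholders.

```python
start = {"uid": "<start uid>", "time": 1442521005.6099606}
stop = {"exit_status": "success", "reason": "", "time": 1442521012.1021606,
        "run_start": "<start uid>", "num_events": {"primary": 16}}


def summarize(start, stop):
    """Describe how a run ended, using fields from its start and stop documents."""
    if stop["run_start"] != start["uid"]:
        raise ValueError("stop document belongs to a different run")
    duration = stop["time"] - start["time"]
    total = sum(stop.get("num_events", {}).values())  # summed over all streams
    summary = f"{stop['exit_status']} after {duration:.2f} s, {total} events"
    if stop["exit_status"] != "success" and stop.get("reason"):
        summary += f" ({stop['reason']})"
    return summary


print(summarize(start, stop))  # success after 6.49 s, 16 events
```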
Event Descriptor
As stated above, a ‘descriptor’ document provides a schema for the data in the Event documents. It provides useful information about each key in the data and about the configuration of the hardware. The layout of a descriptor is detailed and takes some time to cover, so we defer it to a later section.