Data Model#

A primary design goal of bluesky is to enable better research by recording rich metadata alongside measured data for use in later analysis. Documents are how we do this.

A document is our term for a Python dictionary with a schema. The bluesky RunEngine emits documents during plan execution. All of the metadata and data generated by executing the plan is organized into documents. Bluesky’s document-based data model supports complex, asynchronous data collection and enables sophisticated live, prompt, streaming, and post-facto data analysis.

The bluesky documentation describes how outside functions can “subscribe” to a stream of these documents, visualizing, processing, or saving them. This section provides an outline of documents themselves, aiming to give a sense of the structure and familiarity with useful components.
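
As a rough illustration of what such a subscriber can look like, the sketch below tallies documents by type. It assumes the subscriber is called with a document name and the document dictionary; the `DocumentCounter` class and the hand-written stream are hypothetical stand-ins, not output from a real RunEngine.

```python
from collections import Counter


class DocumentCounter:
    """Hypothetical subscriber that tallies documents by name as they arrive."""

    def __init__(self):
        self.counts = Counter()

    def __call__(self, name, doc):
        # `name` identifies the document type; `doc` is the dictionary itself.
        self.counts[name] += 1


counter = DocumentCounter()

# Hand-fed stand-in for a document stream; the contents are illustrative only.
fake_stream = [
    ("start", {"uid": "run-1", "time": 0.0}),
    ("descriptor", {"uid": "desc-1", "run_start": "run-1"}),
    ("event", {"uid": "ev-1", "descriptor": "desc-1", "seq_num": 1}),
    ("event", {"uid": "ev-2", "descriptor": "desc-1", "seq_num": 2}),
    ("stop", {"uid": "stop-1", "run_start": "run-1"}),
]
for name, doc in fake_stream:
    counter(name, doc)

print(dict(counter.counts))
```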

Overview#

The data model is composed of eight types of Documents, which in Python are represented as dictionaries but could be represented as nested mappings (e.g. JSON) in any language. Each document class has a defined, but flexible, schema.

  • Run Start Document — Everything we know about an experiment or simulation before any data acquisition begins: the who / why / what and metadata such as sample information.

  • Event — A “row” of measurements with associated timestamps.

  • Event Descriptor — Metadata about a series of Events. Envision richly detailed column headings in a table, encompassing physical units, hardware configuration information, etc.

  • Resource — A pointer to an external file (or resource in general) that has predictable and fixed dimensionality.

  • Datum — A pointer to a specific slice of data within a Resource.

  • Stream Resource (Experimental) — A pointer to an external resource that contains a stream of data without restriction on the length of the stream. These are resources with a known ‘column’ dimension but without a known number of rows (e.g., time series, point detectors).

  • Stream Datum (Experimental) — A pointer to a specific slice of data within a Stream Resource.

  • Run Stop Document — Everything that we can only know at the very end, such as the time it ended and the exit status (succeeded, aborted, failed due to error).

Every document contains a unique identifier. Each Event document also has a descriptor field linking it to the Event Descriptor that holds its metadata, and the Event Descriptor and Run Stop documents each have a run_start field linking them to their Run Start. Thus, all the documents in a run are linked back to the Run Start.
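
The linkage described above can be followed in a few lines of Python. This is a minimal sketch: the helper name `find_run_start_uid` and the lookup table of descriptors are hypothetical, and the Run Stop uid is a made-up placeholder.

```python
def find_run_start_uid(doc, descriptors_by_uid):
    """Return the uid of the Run Start that `doc` ultimately links to."""
    if "run_start" in doc:
        # Event Descriptor and Run Stop documents link directly.
        return doc["run_start"]
    if "descriptor" in doc:
        # Event documents link through their Event Descriptor.
        return descriptors_by_uid[doc["descriptor"]]["run_start"]
    # Otherwise assume the document is the Run Start itself.
    return doc["uid"]


start = {"uid": "10bf6945-4afd-43ca-af36-6ad8f3540bcd", "time": 1550069716.5092213}
descriptor = {
    "uid": "d08d2ada-5f4e-495b-8e73-ff36186e7183",
    "run_start": start["uid"],
    "time": 1550070954.276659,
    "data_keys": {},
}
event = {
    "uid": "8eac2f83-2b3e-4d67-ae2c-1d3aaff29ff5",
    "descriptor": descriptor["uid"],
    "seq_num": 1,
}
stop = {"uid": "hypothetical-stop-uid", "run_start": start["uid"]}

descriptors_by_uid = {descriptor["uid"]: descriptor}
for doc in (start, descriptor, event, stop):
    print(find_run_start_uid(doc, descriptors_by_uid))
```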

Event documents may contain the literal values or pointers to values that are stored in some external file or networked resource, yet to be loaded. The Resource and Datum document types manage references to externally-stored data.

Example Runs#

[Figure: document generation timeline (document-generation-timeline.svg)]

Finally, Event and Datum documents can be represented in “paged” form, where multiple rows are contained in one structure for efficient transport and vectorized computation. The two representations contain equivalent information: an EventPage can always be transformed, without loss, into the individual Events it contains, and vice versa.
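
To make the equivalence concrete, here is a minimal pure-Python sketch of the round trip. It assumes a page stores one shared descriptor plus a list per row-wise field; the function names `pack_events` and `unpack_page` are hypothetical and this is not the library's own implementation.

```python
def pack_events(events):
    """Combine Events that share one descriptor into a column-oriented page."""
    descriptor = events[0]["descriptor"]
    if any(ev["descriptor"] != descriptor for ev in events):
        raise ValueError("all events in a page must share one descriptor")
    data_keys = list(events[0]["data"])
    filled_keys = list(events[0]["filled"])
    return {
        "descriptor": descriptor,
        "uid": [ev["uid"] for ev in events],
        "time": [ev["time"] for ev in events],
        "seq_num": [ev["seq_num"] for ev in events],
        "data": {k: [ev["data"][k] for ev in events] for k in data_keys},
        "timestamps": {k: [ev["timestamps"][k] for ev in events] for k in data_keys},
        "filled": {k: [ev["filled"][k] for ev in events] for k in filled_keys},
    }


def unpack_page(page):
    """Split a page back into one Event per row."""
    return [
        {
            "descriptor": page["descriptor"],
            "uid": uid,
            "time": page["time"][i],
            "seq_num": page["seq_num"][i],
            "data": {k: col[i] for k, col in page["data"].items()},
            "timestamps": {k: col[i] for k, col in page["timestamps"].items()},
            "filled": {k: col[i] for k, col in page["filled"].items()},
        }
        for i, uid in enumerate(page["uid"])
    ]


# Illustrative events; uids and readings are made up for this sketch.
events = [
    {"descriptor": "desc-1", "uid": f"ev-{n}", "time": 100.0 + n, "seq_num": n,
     "data": {"x": 0.5 * n}, "timestamps": {"x": 100.0 + n}, "filled": {}}
    for n in (1, 2, 3)
]
page = pack_events(events)
round_tripped = unpack_page(page)
```

Because every row-wise field becomes a list of equal length, the page's data columns can be handed directly to array-based code, while unpacking recovers the original Events unchanged.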

The scope of a “run” is left to the instrument team or individual scientist. It is quite generic: a set of Documents generated by following a given sequence of instructions. This might encompass a single short measurement (say, “a count”) or a multi-step procedure such as a raster scan. A run should represent a set of individual measurements that will be collected or processed together.

Document Types in Detail#

For each type, we will show:

  • the minimal nontrivial example that satisfies the schema

  • one or more “typical” examples generated by bluesky’s RunEngine during an experiment

  • the formal schema, encoded using JSON schema

Run Start Document#

Again, a ‘run start’ document marks the beginning of the run. It comprises everything we know before we start taking data, including all metadata provided by the user and the plan.

Minimal nontrivial valid example:

Documentation note: It might seem more natural to use json code-blocks here than python ones, but using python allows us to include comments in line.

# 'run start' document
{'time': 1550069716.5092213,  # UNIX epoch (seconds since 1 Jan 1970)
 'uid': '10bf6945-4afd-43ca-af36-6ad8f3540bcd'}  # globally unique ID
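
A dictionary of this shape can be assembled with the standard library alone. The following is only a sketch: `make_run_start` is a hypothetical helper, and in practice the RunEngine generates this document for you.

```python
import time
import uuid


def make_run_start(**metadata):
    """Build a dictionary with the two required keys plus optional metadata."""
    # The required keys are written last so that metadata cannot override them.
    return {**metadata, "time": time.time(), "uid": str(uuid.uuid4())}


doc = make_run_start(plan_name="scan", scan_id=2)
print(sorted(doc))
```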

Typical example:

# 'run start' document
{'data_session': 'vist54321',
 'data_groups': ['bl42', 'proposal12345'],
 'detectors': ['random_walk:x'],
 'hints': {'dimensions': [(['random_walk:dt'], 'primary')]},
 'motors': ('random_walk:dt',),
 'num_intervals': 2,
 'num_points': 3,
 'plan_args': {'args': ["EpicsSignal(read_pv='random_walk:dt', " "name='random_walk:dt', " 'value=1.0, ' 'timestamp=1550070001.828528, ' 'auto_monitor=False, ' 'string=False, ' "write_pv='random_walk:dt', " 'limits=False, ' 'put_complete=False)', -1, 1],
               'detectors': ["EpicsSignal(read_pv='random_walk:x', " "name='random_walk:x', " 'value=1.61472277847348, ' 'timestamp=1550070000.807677, ' 'auto_monitor=False, ' 'string=False, ' "write_pv='random_walk:x', " 'limits=False, ' 'put_complete=False)'],
               'num': 3,
               'per_step': 'None'},
 'plan_name': 'scan',
 'plan_pattern': 'inner_product',
 'plan_pattern_args': {'args': ["EpicsSignal(read_pv='random_walk:dt', " "name='random_walk:dt', " 'value=1.0, ' 'timestamp=1550070001.828528, ' 'auto_monitor=False, ' 'string=False, ' "write_pv='random_walk:dt', " 'limits=False, ' 'put_complete=False)', -1, 1],
                       'num': 3},
 'plan_pattern_module': 'bluesky.plan_patterns',
 'plan_type': 'generator',
 'scan_id': 2,
 'time': 1550070004.9850419,
 'uid': 'ba1f9076-7925-4af8-916e-0e1eaa1b3c47'}

Note

Time is given in UNIX time (seconds since 1970). Software for looking at the data would, of course, translate that into a more human-readable form.
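
For instance, the time value from the typical example above can be converted with Python's standard library; the choice of UTC and of the output format here is arbitrary.

```python
from datetime import datetime, timezone

start_time = 1550070004.9850419  # 'time' from the typical run start example

# Interpret the UNIX timestamp as a timezone-aware datetime in UTC.
moment = datetime.fromtimestamp(start_time, tz=timezone.utc)
readable = moment.strftime("%Y-%m-%d %H:%M:%S %Z")
print(readable)  # 2019-02-13 15:00:04 UTC
```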

Projections (Experimental)#

The Run Start document may include a projections field. A projection is intended as an aid to interacting with external systems using standardized vocabularies. Projections might be used in a variety of cases, such as providing run data to analysis tools or suitcases. A run may carry several projections, each describing a way to represent data from the run. Each key in the projection dictionary is a unique, externally identifiable string, and each value is an instruction for accessing data from the run. This feature is experimental and subject to backward-incompatible changes in future releases.

The run start document formal schema:

{
    "title": "run_start",
    "description": "Document created at the start of run. Provides a seach target and later documents link to it",
    "type": "object",
    "$defs": {
        "Calculation": {
            "title": "Calculation",
            "type": "object",
            "properties": {
                "args": {
                    "title": "Args",
                    "type": "array",
                    "items": {}
                },
                "callable": {
                    "title": "Callable",
                    "description": "callable function to perform calculation",
                    "type": "string"
                },
                "kwargs": {
                    "title": "Kwargs",
                    "description": "kwargs for calcalation callable",
                    "type": "object"
                }
            },
            "required": [
                "callable"
            ]
        },
        "DataType": {
            "title": "DataType",
            "patternProperties": {
                "^([^./]+)$": {
                    "$ref": "#/$defs/DataType"
                }
            },
            "additionalProperties": false
        },
        "Hints": {
            "title": "Hints",
            "description": "Start-level hints",
            "type": "object",
            "properties": {
                "dimensions": {
                    "title": "Dimensions",
                    "description": "The independent axes of the experiment. Ordered slow to fast",
                    "type": "array",
                    "items": {
                        "items": {
                            "anyOf": [
                                {
                                    "items": {
                                        "type": "string"
                                    },
                                    "type": "array"
                                },
                                {
                                    "type": "string"
                                }
                            ]
                        },
                        "type": "array"
                    }
                }
            }
        },
        "Projection": {
            "title": "Projection",
            "description": "Where to get the data from",
            "type": "object",
            "properties": {
                "calculation": {
                    "title": "calculation properties",
                    "description": "required fields if type is calculated",
                    "$ref": "#/$defs/Calculation"
                },
                "config_device": {
                    "title": "Config Device",
                    "type": "string"
                },
                "config_index": {
                    "title": "Config Index",
                    "type": "integer"
                },
                "field": {
                    "title": "Field",
                    "type": "string"
                },
                "location": {
                    "title": "Location",
                    "description": "start comes from metadata fields in the start document, event comes from event, configuration comes from configuration fields in the event_descriptor document",
                    "type": "string",
                    "enum": [
                        "start",
                        "event",
                        "configuration"
                    ]
                },
                "stream": {
                    "title": "Stream",
                    "type": "string"
                },
                "type": {
                    "title": "Type",
                    "description": "linked: a value linked from the data set, calculated: a value that requires calculation, static:  a value defined here in the projection ",
                    "type": "string",
                    "enum": [
                        "linked",
                        "calculated",
                        "static"
                    ]
                },
                "value": {
                    "title": "Value",
                    "description": "value explicitely defined in the projection when type==static."
                }
            },
            "allOf": [
                {
                    "if": {
                        "allOf": [
                            {
                                "properties": {
                                    "location": {
                                        "enum": [
                                            "configuration"
                                        ]
                                    }
                                }
                            },
                            {
                                "properties": {
                                    "type": {
                                        "enum": [
                                            "linked"
                                        ]
                                    }
                                }
                            }
                        ]
                    },
                    "then": {
                        "required": [
                            "type",
                            "location",
                            "config_index",
                            "config_device",
                            "field",
                            "stream"
                        ]
                    }
                },
                {
                    "if": {
                        "allOf": [
                            {
                                "properties": {
                                    "location": {
                                        "enum": [
                                            "event"
                                        ]
                                    }
                                }
                            },
                            {
                                "properties": {
                                    "type": {
                                        "enum": [
                                            "linked"
                                        ]
                                    }
                                }
                            }
                        ]
                    },
                    "then": {
                        "required": [
                            "type",
                            "location",
                            "field",
                            "stream"
                        ]
                    }
                },
                {
                    "if": {
                        "allOf": [
                            {
                                "properties": {
                                    "location": {
                                        "enum": [
                                            "event"
                                        ]
                                    }
                                }
                            },
                            {
                                "properties": {
                                    "type": {
                                        "enum": [
                                            "calculated"
                                        ]
                                    }
                                }
                            }
                        ]
                    },
                    "then": {
                        "required": [
                            "type",
                            "field",
                            "stream",
                            "calculation"
                        ]
                    }
                },
                {
                    "if": {
                        "properties": {
                            "type": {
                                "enum": [
                                    "static"
                                ]
                            }
                        }
                    },
                    "then": {
                        "required": [
                            "type",
                            "value"
                        ]
                    }
                }
            ]
        },
        "Projections": {
            "title": "Projections",
            "description": "Describe how to interperet this run as the given projection",
            "type": "object",
            "properties": {
                "configuration": {
                    "title": "Configuration",
                    "description": "Static information about projection",
                    "type": "object"
                },
                "name": {
                    "title": "Name",
                    "description": "The name of the projection",
                    "type": "string"
                },
                "projection": {
                    "title": "Projection",
                    "description": "",
                    "type": "object",
                    "additionalProperties": {
                        "$ref": "#/$defs/Projection"
                    }
                },
                "version": {
                    "title": "Version",
                    "description": "The version of the projection spec. Can specify the version of an external specification.",
                    "type": "string"
                }
            },
            "required": [
                "configuration",
                "projection",
                "version"
            ]
        }
    },
    "properties": {
        "data_groups": {
            "title": "Data Groups",
            "description": "An optional list of data access groups that have meaning to some external system. Examples might include facility, beamline, end stations, proposal, safety form.",
            "type": "array",
            "items": {
                "type": "string"
            }
        },
        "data_session": {
            "title": "Data Session",
            "description": "An optional field for grouping runs. The meaning is not mandated, but this is a data management grouping and not a scientific grouping. It is intended to group runs in a visit or set of trials.",
            "type": "string"
        },
        "data_type": {
            "description": "",
            "$ref": "#/$defs/DataType"
        },
        "group": {
            "title": "Group",
            "description": "Unix group to associate this data with",
            "type": "string"
        },
        "hints": {
            "$ref": "#/$defs/Hints",
            "additionalProperties": false,
            "patternProperties": {
                "^([^.]+)$": {
                    "$ref": "#/$defs/DataType"
                }
            }
        },
        "owner": {
            "title": "Owner",
            "description": "Unix owner to associate this data with",
            "type": "string"
        },
        "project": {
            "title": "Project",
            "description": "Name of project that this run is part of",
            "type": "string"
        },
        "projections": {
            "title": "Projections",
            "description": "",
            "type": "array",
            "items": {
                "$ref": "#/$defs/Projections"
            }
        },
        "sample": {
            "title": "Sample",
            "description": "Information about the sample, may be a UID to another collection",
            "anyOf": [
                {
                    "type": "object"
                },
                {
                    "type": "string"
                }
            ]
        },
        "scan_id": {
            "title": "Scan Id",
            "description": "Scan ID number, not globally unique",
            "type": "integer"
        },
        "time": {
            "title": "Time",
            "description": "Time the run started.  Unix epoch time",
            "type": "number"
        },
        "uid": {
            "title": "Uid",
            "description": "Globally unique ID for this run",
            "type": "string"
        }
    },
    "required": [
        "time",
        "uid"
    ],
    "patternProperties": {
        "^([^./]+)$": {
            "$ref": "#/$defs/DataType"
        }
    },
    "additionalProperties": false
}
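
The conditional requirements in the Projection definition above can be paraphrased in plain Python. This is a simplified sketch rather than a substitute for JSON schema validation: the helper `missing_projection_fields`, the projection name, and the keys and values in the example are all hypothetical.

```python
def missing_projection_fields(entry):
    """List the fields that the schema's if/then rules require but `entry` lacks."""
    kind = entry.get("type")
    location = entry.get("location")
    if kind == "static":
        required = ["type", "value"]
    elif kind == "linked" and location == "configuration":
        required = ["type", "location", "config_index", "config_device",
                    "field", "stream"]
    elif kind == "linked" and location == "event":
        required = ["type", "location", "field", "stream"]
    elif kind == "calculated" and location == "event":
        required = ["type", "field", "stream", "calculation"]
    else:
        required = []  # no conditional rule applies
    return [name for name in required if name not in entry]


# A hypothetical entry for the 'projections' list of a Run Start document.
example = {
    "name": "example_projection",
    "version": "0.1",
    "configuration": {},
    "projection": {
        "/entry/sample/name": {"type": "static", "value": "hypothetical sample"},
        "/entry/data/x": {"type": "linked", "location": "event",
                          "stream": "primary", "field": "random_walk:x"},
        "/entry/data/incomplete": {"type": "linked", "location": "event",
                                   "field": "random_walk:dt"},
    },
}

for key, entry in example["projection"].items():
    print(key, missing_projection_fields(entry))
```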

Event Descriptor#

As stated above, an ‘event descriptor’ document provides a schema for the data in the Event documents. It provides useful information about each key in the data and about the configuration of the hardware. The layout of a descriptor is detailed and takes some time to cover, so we defer it to another page.

Minimal nontrivial valid example:

# 'event descriptor' document
{'configuration': {},
 'data_keys': {'camera_image': {'dtype': 'number',
                                'shape': [512, 512],
                                'source': 'PV:...'}},
 'hints': {},
 'name': 'primary',
 'object_keys': {},
 'run_start': '10bf6945-4afd-43ca-af36-6ad8f3540bcd',  # foreign key
 'time': 1550070954.276659,
 'uid': 'd08d2ada-5f4e-495b-8e73-ff36186e7183'}

Typical example:

# 'event descriptor' document
{'configuration': {'random_walk:dt': {'data': {'random_walk:dt': -1.0},
                                      'data_keys': {'random_walk:dt': {'dtype': 'number',
                                                                       'lower_ctrl_limit': 0.0,
                                                                       'precision': 0,
                                                                       'shape': [],
                                                                       'source': 'PV:random_walk:dt',
                                                                       'units': '',
                                                                       'upper_ctrl_limit': 0.0}},
                                      'timestamps': {'random_walk:dt': 1550070004.994477}},
                   'random_walk:x': {'data': {'random_walk:x': 1.9221013521832928},
                                     'data_keys': {'random_walk:x': {'dtype': 'number',
                                                                     'lower_ctrl_limit': 0.0,
                                                                     'precision': 0,
                                                                     'shape': [],
                                                                     'source': 'PV:random_walk:x',
                                                                     'units': '',
                                                                     'upper_ctrl_limit': 0.0}},
                                     'timestamps': {'random_walk:x': 1550070004.812525}}},
 'data_keys': {'random_walk:dt': {'dtype': 'number',
                                  'lower_ctrl_limit': 0.0,
                                  'object_name': 'random_walk:dt',
                                  'precision': 0,
                                  'shape': [],
                                  'source': 'PV:random_walk:dt',
                                  'units': '',
                                  'upper_ctrl_limit': 0.0},
               'random_walk:x': {'dtype': 'number',
                                 'lower_ctrl_limit': 0.0,
                                 'object_name': 'random_walk:x',
                                 'precision': 0,
                                 'shape': [],
                                 'source': 'PV:random_walk:x',
                                 'units': '',
                                 'upper_ctrl_limit': 0.0}},
 'hints': {'random_walk:dt': {'fields': ['random_walk:dt']},
           'random_walk:x': {'fields': ['random_walk:x']}},
 'name': 'primary',
 'object_keys': {'random_walk:dt': ['random_walk:dt'],
                 'random_walk:x': ['random_walk:x']},
 'run_start': 'ba1f9076-7925-4af8-916e-0e1eaa1b3c47',
 'time': 1550070005.0109222,
 'uid': '0ad55d9e-1b31-4af2-865c-7ab7c8171303'}
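
To show how a consumer might use a descriptor as "column headings", the sketch below reads data_keys and hints from an abbreviated copy of the typical example. The helper names `column_headings` and `hinted_fields`, and the "unitless" label for empty units, are choices made for this illustration.

```python
# Abbreviated copy of the typical descriptor above (configuration omitted).
descriptor = {
    "data_keys": {
        "random_walk:dt": {"dtype": "number", "shape": [], "units": "",
                           "source": "PV:random_walk:dt"},
        "random_walk:x": {"dtype": "number", "shape": [], "units": "",
                          "source": "PV:random_walk:x"},
    },
    "hints": {"random_walk:dt": {"fields": ["random_walk:dt"]},
              "random_walk:x": {"fields": ["random_walk:x"]}},
    "name": "primary",
}


def column_headings(descriptor):
    """Build a human-readable heading for each key in data_keys."""
    headings = {}
    for key, info in descriptor["data_keys"].items():
        units = info.get("units") or "unitless"  # empty or missing units
        headings[key] = f"{key} [{units}] from {info['source']}"
    return headings


def hinted_fields(descriptor):
    """Collect the 'interesting' fields named in the per-object hints."""
    fields = []
    for hint in descriptor.get("hints", {}).values():
        fields.extend(hint.get("fields", []))
    return fields


print(column_headings(descriptor))
print(hinted_fields(descriptor))
```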

Formal schema:

{
    "title": "event_descriptor",
    "description": "Document to describe the data captured in the associated event documents",
    "type": "object",
    "$defs": {
        "Configuration": {
            "title": "Configuration",
            "type": "object",
            "properties": {
                "data": {
                    "title": "Data",
                    "description": "The actual measurement data",
                    "type": "object"
                },
                "data_keys": {
                    "title": "Data Keys",
                    "description": "This describes the data stored alongside it in this configuration object.",
                    "type": "object",
                    "additionalProperties": {
                        "$ref": "#/$defs/DataKey"
                    }
                },
                "timestamps": {
                    "title": "Timestamps",
                    "description": "The timestamps of the individual measurement data",
                    "type": "object"
                }
            }
        },
        "DataKey": {
            "title": "DataKey",
            "description": "Describes the objects in the data property of Event documents",
            "type": "object",
            "properties": {
                "choices": {
                    "title": "Choices",
                    "description": "Choices of enum value.",
                    "type": "array",
                    "items": {
                        "type": "string"
                    }
                },
                "dims": {
                    "title": "Dims",
                    "description": "The names for dimensions of the data. Null or empty list if scalar data",
                    "type": "array",
                    "items": {
                        "type": "string"
                    }
                },
                "dtype": {
                    "title": "Dtype",
                    "description": "The type of the data in the event, given as a broad JSON schema type.",
                    "type": "string",
                    "enum": [
                        "string",
                        "number",
                        "array",
                        "boolean",
                        "integer"
                    ]
                },
                "dtype_numpy": {
                    "title": "Dtype Numpy",
                    "description": "The type of the data in the event, given as a numpy dtype string (or, for structured dtypes, array).",
                    "anyOf": [
                        {
                            "description": "A numpy dtype e.g `<U9`, `<f16`",
                            "pattern": "[|<>][tbiufcmMOSUV][0-9]+",
                            "type": "string"
                        },
                        {
                            "items": {
                                "maxItems": 2,
                                "minItems": 2,
                                "prefixItems": [
                                    {
                                        "type": "string"
                                    },
                                    {
                                        "description": "A numpy dtype e.g `<U9`, `<f16`",
                                        "pattern": "[|<>][tbiufcmMOSUV][0-9]+",
                                        "type": "string"
                                    }
                                ],
                                "type": "array"
                            },
                            "type": "array"
                        }
                    ]
                },
                "external": {
                    "title": "External",
                    "description": "Where the data is stored if it is stored external to the events",
                    "type": "string",
                    "pattern": "^[A-Z]+:?"
                },
                "limits": {
                    "description": "Epics limits.",
                    "$ref": "#/$defs/Limits"
                },
                "object_name": {
                    "title": "Object Name",
                    "description": "The name of the object this key was pulled from.",
                    "type": "string"
                },
                "precision": {
                    "title": "Precision",
                    "description": "Number of digits after decimal place if a floating point number",
                    "anyOf": [
                        {
                            "type": "integer"
                        },
                        {
                            "type": "null"
                        }
                    ]
                },
                "shape": {
                    "title": "Shape",
                    "description": "The shape of the data.  Empty list indicates scalar data.",
                    "type": "array",
                    "items": {
                        "type": "integer"
                    }
                },
                "source": {
                    "title": "Source",
                    "description": "The source (ex piece of hardware) of the data.",
                    "type": "string"
                },
                "units": {
                    "title": "Units",
                    "description": "Engineering units of the value",
                    "anyOf": [
                        {
                            "type": "string"
                        },
                        {
                            "type": "null"
                        }
                    ]
                }
            },
            "required": [
                "dtype",
                "shape",
                "source"
            ]
        },
        "Limits": {
            "title": "Limits",
            "description": "Epics limits:\nsee 3.4.1 https://epics.anl.gov/base/R3-14/12-docs/AppDevGuide/node4.html",
            "type": "object",
            "properties": {
                "alarm": {
                    "description": "Alarm limits.",
                    "$ref": "#/$defs/LimitsRange"
                },
                "control": {
                    "description": "Control limits.",
                    "$ref": "#/$defs/LimitsRange"
                },
                "display": {
                    "description": "Display limits.",
                    "$ref": "#/$defs/LimitsRange"
                },
                "warning": {
                    "description": "Warning limits.",
                    "$ref": "#/$defs/LimitsRange"
                }
            }
        },
        "LimitsRange": {
            "title": "LimitsRange",
            "type": "object",
            "properties": {
                "high": {
                    "title": "High",
                    "anyOf": [
                        {
                            "type": "number"
                        },
                        {
                            "type": "null"
                        }
                    ]
                },
                "low": {
                    "title": "Low",
                    "anyOf": [
                        {
                            "type": "number"
                        },
                        {
                            "type": "null"
                        }
                    ]
                }
            },
            "required": [
                "high",
                "low"
            ]
        },
        "PerObjectHint": {
            "title": "PerObjectHint",
            "description": "The 'interesting' data keys for this device.",
            "type": "object",
            "properties": {
                "NX_class": {
                    "title": "Nx Class",
                    "description": "The NeXus class definition for this device.",
                    "type": "string",
                    "pattern": "^NX[A-Za-z_]+$"
                },
                "fields": {
                    "title": "Fields",
                    "description": "The 'interesting' data keys for this device.",
                    "type": "array",
                    "items": {
                        "type": "string"
                    }
                }
            }
        },
        "DataType": {
            "title": "DataType",
            "patternProperties": {
                "^([^./]+)$": {
                    "$ref": "#/$defs/DataType"
                }
            },
            "additionalProperties": false
        }
    },
    "properties": {
        "configuration": {
            "title": "Configuration",
            "description": "Readings of configurational fields necessary for interpreting data in the Events.",
            "type": "object",
            "additionalProperties": {
                "$ref": "#/$defs/Configuration"
            }
        },
        "data_keys": {
            "title": "data_keys",
            "description": "This describes the data in the Event Documents.",
            "type": "object",
            "additionalProperties": {
                "$ref": "#/$defs/DataKey"
            }
        },
        "hints": {
            "$ref": "#/$defs/PerObjectHint"
        },
        "name": {
            "title": "Name",
            "description": "A human-friendly name for this data stream, such as 'primary' or 'baseline'.",
            "type": "string"
        },
        "object_keys": {
            "title": "Object Keys",
            "description": "Maps a Device/Signal name to the names of the entries it produces in data_keys.",
            "type": "object"
        },
        "run_start": {
            "title": "Run Start",
            "description": "Globally unique ID of this run's 'start' document.",
            "type": "string"
        },
        "time": {
            "title": "Time",
            "description": "Creation time of the document as unix epoch time.",
            "type": "number"
        },
        "uid": {
            "title": "uid",
            "description": "Globally unique ID for this event descriptor.",
            "type": "string"
        }
    },
    "required": [
        "data_keys",
        "run_start",
        "time",
        "uid"
    ],
    "patternProperties": {
        "^([^./]+)$": {
            "$ref": "#/$defs/DataType"
        }
    },
    "additionalProperties": false
}

Event Document#

The Event document may contain data directly:

# 'event' document
{'data': {'camera_image': [[...512x512 array...]]},
 'descriptor': 'd08d2ada-5f4e-495b-8e73-ff36186e7183',  # foreign key
 'filled': {},
 'seq_num': 1,
 'time': 1550072091.2793343,
 'timestamps': {'camera_image': 1550072091.2793014},
 'uid': '8eac2f83-2b3e-4d67-ae2c-1d3aaff29ff5'}

or it may reference externally stored data via a datum_id from a Datum document:

# 'event' document
{'data': {'camera_image': '272132cf-564f-428f-bf6b-149ee4287024/1'},  # foreign key
 'descriptor': 'd08d2ada-5f4e-495b-8e73-ff36186e7183',  # foreign key
 'filled': {'camera_image': False},
 'seq_num': 1,
 'time': 1550072091.2793343,
 'timestamps': {'camera_image': 1550072091.2793014},
 'uid': '8eac2f83-2b3e-4d67-ae2c-1d3aaff29ff5'}

See External Assets for details on how external assets are handled.
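As a minimal sketch of how a consumer might use the 'filled' mapping, the helper below lists the keys whose external data has not been loaded yet. The function name `unfilled_keys` is hypothetical and not part of any library; the Event is the second example above.

```python
def unfilled_keys(event):
    """Return the data keys whose external data has not been loaded yet."""
    # 'filled' is not in the schema's required list, so a missing mapping
    # is treated as empty here.
    filled = event.get("filled", {})
    # Per the schema description, False means "not loaded"; any other value
    # (True, or a foreign key moved here from 'data') means it was loaded.
    return sorted(key for key, state in filled.items() if state is False)


event = {
    "data": {"camera_image": "272132cf-564f-428f-bf6b-149ee4287024/1"},
    "descriptor": "d08d2ada-5f4e-495b-8e73-ff36186e7183",
    "filled": {"camera_image": False},
    "seq_num": 1,
    "time": 1550072091.2793343,
    "timestamps": {"camera_image": 1550072091.2793014},
    "uid": "8eac2f83-2b3e-4d67-ae2c-1d3aaff29ff5",
}
print(unfilled_keys(event))
```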

Typical example:

# 'event' document
{'data': {'random_walk:dt': -1.0,
          'random_walk:x': 1.9221013521832928},
 'descriptor': '0ad55d9e-1b31-4af2-865c-7ab7c8171303',
 'filled': {},
 'seq_num': 1,
 'time': 1550070005.0189056,
 'timestamps': {'random_walk:dt': 1550070004.994477,
                'random_walk:x': 1550070004.812525},
 'uid': '7b5343fe-dfd7-4884-bc18-a0b571ff60b7'}

From a data analysis perspective, these readings were simultaneous, but in actuality they occurred at separate times. The separate times of the individual readings are not thrown away (they are recorded in ‘timestamps’), but the overall Event ‘time’ is often more useful.
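To make the relationship between 'time' and 'timestamps' concrete, this small sketch computes how far each reading's own timestamp lies from the overall Event time, using the typical example above. The helper name `reading_offsets` is hypothetical.

```python
def reading_offsets(event):
    """Seconds between the Event 'time' and each reading's own timestamp."""
    return {key: event["time"] - ts for key, ts in event["timestamps"].items()}


event = {
    "data": {"random_walk:dt": -1.0, "random_walk:x": 1.9221013521832928},
    "descriptor": "0ad55d9e-1b31-4af2-865c-7ab7c8171303",
    "filled": {},
    "seq_num": 1,
    "time": 1550070005.0189056,
    "timestamps": {"random_walk:dt": 1550070004.994477,
                   "random_walk:x": 1550070004.812525},
    "uid": "7b5343fe-dfd7-4884-bc18-a0b571ff60b7",
}
offsets = reading_offsets(event)
for key, offset in offsets.items():
    print(f"{key}: {offset:+.3f} s relative to the Event time")
```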

Formal schema:

{
    "title": "event",
    "description": "Document to record a quanta of collected data",
    "type": "object",
    "properties": {
        "data": {
            "title": "Data",
            "description": "The actual measurement data",
            "type": "object"
        },
        "descriptor": {
            "title": "Descriptor",
            "description": "UID of the EventDescriptor to which this Event belongs",
            "type": "string"
        },
        "filled": {
            "title": "Filled",
            "description": "Mapping each of the keys of externally-stored data to the boolean False, indicating that the data has not been loaded, or to foreign keys (moved here from 'data' when the data was loaded)",
            "type": "object",
            "additionalProperties": {
                "anyOf": [
                    {
                        "type": "boolean"
                    },
                    {
                        "type": "string"
                    }
                ]
            }
        },
        "seq_num": {
            "title": "Seq Num",
            "description": "Sequence number to identify the location of this Event in the Event stream",
            "type": "integer"
        },
        "time": {
            "title": "Time",
            "description": "The event time. This maybe different than the timestamps on each of the data entries.",
            "type": "number"
        },
        "timestamps": {
            "title": "Timestamps",
            "description": "The timestamps of the individual measurement data",
            "type": "object"
        },
        "uid": {
            "title": "Uid",
            "description": "Globally unique identifier for this Event",
            "type": "string"
        }
    },
    "required": [
        "data",
        "descriptor",
        "seq_num",
        "time",
        "timestamps",
        "uid"
    ],
    "additionalProperties": false
}

Event Page#

Event contents can also be represented in “paged” form, where multiple rows are contained in one structure for efficient transport and vectorized computation. The representations contain equivalent information: an EventPage can always be transformed, without loss, into an Event and vice versa. Here is the example Event above structured as an Event Page with a single row:

# 'event_page' document
{'data': {'random_walk:dt': [-1.0],
          'random_walk:x': [1.9221013521832928]},
 'descriptor': '0ad55d9e-1b31-4af2-865c-7ab7c8171303',
 'filled': {},
 'seq_num': [1],
 'time': [1550070005.0189056],
 'timestamps': {'random_walk:dt': [1550070004.994477],
                'random_walk:x': [1550070004.812525]},
 'uid': ['7b5343fe-dfd7-4884-bc18-a0b571ff60b7']}
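The lossless round trip between the two representations can be sketched in plain Python as follows. The function names `event_to_page` and `page_to_events` are hypothetical and written only for illustration; the event_model package ships its own conversion helpers, which should be preferred in practice.

```python
def event_to_page(event):
    """Wrap a single Event in a one-row Event Page."""
    return {
        "descriptor": event["descriptor"],  # shared by every row, so not a list
        "uid": [event["uid"]],
        "time": [event["time"]],
        "seq_num": [event["seq_num"]],
        "data": {k: [v] for k, v in event["data"].items()},
        "timestamps": {k: [v] for k, v in event["timestamps"].items()},
        "filled": {k: [v] for k, v in event.get("filled", {}).items()},
    }


def page_to_events(page):
    """Yield one Event per row of an Event Page."""
    for row, uid in enumerate(page["uid"]):
        yield {
            "descriptor": page["descriptor"],
            "uid": uid,
            "time": page["time"][row],
            "seq_num": page["seq_num"][row],
            "data": {k: col[row] for k, col in page["data"].items()},
            "timestamps": {k: col[row] for k, col in page["timestamps"].items()},
            "filled": {k: col[row] for k, col in page.get("filled", {}).items()},
        }


event = {
    "data": {"random_walk:dt": -1.0, "random_walk:x": 1.9221013521832928},
    "descriptor": "0ad55d9e-1b31-4af2-865c-7ab7c8171303",
    "filled": {},
    "seq_num": 1,
    "time": 1550070005.0189056,
    "timestamps": {"random_walk:dt": 1550070004.994477,
                   "random_walk:x": 1550070004.812525},
    "uid": "7b5343fe-dfd7-4884-bc18-a0b571ff60b7",
}
page = event_to_page(event)
round_tripped = list(page_to_events(page))
```

Note that 'descriptor' stays a plain string in the page because every row belongs to the same Event Descriptor, while every per-Event field becomes a list.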

Formal Event Page schema:

{
    "title": "event_page",
    "description": "Page of documents to record a quanta of collected data",
    "type": "object",
    "$defs": {
        "DataFrameForEventPage": {
            "title": "DataFrameForEventPage",
            "type": "object",
            "additionalProperties": {
                "items": {},
                "type": "array"
            }
        },
        "DataFrameForFilled": {
            "title": "DataFrameForFilled",
            "type": "object",
            "additionalProperties": {
                "items": {
                    "anyOf": [
                        {
                            "type": "boolean"
                        },
                        {
                            "type": "string"
                        }
                    ]
                },
                "type": "array"
            }
        }
    },
    "properties": {
        "data": {
            "description": "The actual measurement data",
            "$ref": "#/$defs/DataFrameForEventPage"
        },
        "descriptor": {
            "title": "Descriptor",
            "description": "The UID of the EventDescriptor to which all of the Events in this page belong",
            "type": "string"
        },
        "filled": {
            "description": "Mapping each of the keys of externally-stored data to an array containing the boolean False, indicating that the data has not been loaded, or to foreign keys (moved here from 'data' when the data was loaded)",
            "$ref": "#/$defs/DataFrameForFilled"
        },
        "seq_num": {
            "title": "Seq Num",
            "description": "Array of sequence numbers to identify the location of each Event in the Event stream",
            "type": "array",
            "items": {
                "type": "integer"
            }
        },
        "time": {
            "title": "Time",
            "description": "Array of Event times. This maybe different than the timestamps on each of the data entries",
            "type": "array",
            "items": {
                "type": "number"
            }
        },
        "timestamps": {
            "description": "The timestamps of the individual measurement data",
            "$ref": "#/$defs/DataFrameForEventPage"
        },
        "uid": {
            "title": "Uid",
            "description": "Array of globally unique identifiers for each Event",
            "type": "array",
            "items": {
                "type": "string"
            }
        }
    },
    "required": [
        "data",
        "descriptor",
        "seq_num",
        "time",
        "timestamps",
        "uid"
    ],
    "additionalProperties": false
}

It is intentional that the values in the “data” and “timestamps” dictionaries do not have structure. The values may be numeric, bool, null (None), or a homogeneous N-dimensional array of any of these. The values are never objects or dictionaries (to use the JSON and Python terminology respectively). This requirement allows document-consumers to make useful simplifying assumptions. As another justification for this design, consider that if we allowed one level of nesting in “data”, then it could lead to wanting those values to allow nesting and so on, which would lead us to accepting arbitrarily nested structured data. This in turn would make the Event Descriptors significantly more complex. Thus, we require that the values in “data” never be structured.
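A consumer could check the "no structure" rule with something like the sketch below. The names `is_unstructured` and `structured_keys` are hypothetical. Two simplifications are assumptions of this sketch: strings are accepted (Events may hold datum_id foreign keys in 'data', as shown earlier), and array homogeneity is not verified.

```python
def is_unstructured(value):
    """True if value is a scalar or an arbitrarily nested list of scalars."""
    if isinstance(value, dict):
        return False  # mappings are exactly what the data model forbids
    if isinstance(value, (list, tuple)):
        return all(is_unstructured(item) for item in value)
    # Assumption: strings are allowed so that datum_id references pass.
    return value is None or isinstance(value, (bool, int, float, str))


def structured_keys(event):
    """Return the 'data' / 'timestamps' entries that hold structured values."""
    bad = []
    for field in ("data", "timestamps"):
        for key, value in event[field].items():
            if not is_unstructured(value):
                bad.append(f"{field}.{key}")
    return bad


ok_event = {"data": {"x": 1.0, "image": [[1, 2], [3, 4]]},
            "timestamps": {"x": 10.0, "image": 11.0}}
bad_event = {"data": {"x": {"nested": 1}},
             "timestamps": {"x": 10.0}}
print(structured_keys(ok_event), structured_keys(bad_event))
```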

Run Stop Document#

A ‘run stop’ document marks the end of the run. It contains metadata that is not known until the run completes.

The most commonly useful fields here are ‘time’ and ‘exit_status’.

Minimal nontrivial valid example:

# 'run stop' document
{'uid': '546cc556-5f69-46b5-bf36-587d8cfe67a9',
 'time': 1550072737.175858,
 'run_start': '61bb1db8-c95c-4144-845b-e248c06d80e1',
 'exit_status': 'success',
 'reason': '',
 'num_events': {}}

Typical example:

# 'stop' document
{'run_start': 'ba1f9076-7925-4af8-916e-0e1eaa1b3c47',
 'time': 1580172029.3419003,
 'uid': '78c70c2c-2508-479e-9857-05553748022e',
 'exit_status': 'success',
 'reason': '',
 'num_events': {'primary': 10}}
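Since 'exit_status' and 'num_events' are usually what downstream code looks at first, here is a short sketch of reading them from the typical example. The helper name `summarize_run_stop` is hypothetical; 'num_events' and 'reason' are read with defaults because the schema does not list them as required.

```python
def summarize_run_stop(stop):
    """Return (succeeded, total_events, reason) for a 'run stop' document."""
    succeeded = stop["exit_status"] == "success"
    total_events = sum(stop.get("num_events", {}).values())
    return succeeded, total_events, stop.get("reason", "")


stop = {
    "run_start": "ba1f9076-7925-4af8-916e-0e1eaa1b3c47",
    "time": 1580172029.3419003,
    "uid": "78c70c2c-2508-479e-9857-05553748022e",
    "exit_status": "success",
    "reason": "",
    "num_events": {"primary": 10},
}
summary = summarize_run_stop(stop)
print(summary)
```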

Formal schema:

{
    "title": "run_stop",
    "description": "Document for the end of a run indicating the success/fail state of the run and the end time",
    "type": "object",
    "$defs": {
        "DataType": {
            "title": "DataType"
        }
    },
    "properties": {
        "data_type": {
            "description": "data_type",
            "$ref": "#/$defs/DataType"
        },
        "exit_status": {
            "title": "Exit Status",
            "description": "State of the run when it ended",
            "type": "string",
            "enum": [
                "success",
                "abort",
                "fail"
            ]
        },
        "num_events": {
            "title": "Num Events",
            "description": "Number of Events per named stream",
            "type": "object",
            "additionalProperties": {
                "type": "integer"
            }
        },
        "reason": {
            "title": "Reason",
            "description": "Long-form description of why the run ended",
            "type": "string"
        },
        "run_start": {
            "title": "Run Start",
            "description": "Reference back to the run_start document that this document is paired with.",
            "type": "string"
        },
        "time": {
            "title": "Time",
            "description": "The time the run ended. Unix epoch",
            "type": "number"
        },
        "uid": {
            "title": "Uid",
            "description": "Globally unique ID for this document",
            "type": "string"
        }
    },
    "required": [
        "exit_status",
        "run_start",
        "time",
        "uid"
    ],
    "patternProperties": {
        "^([^./]+)$": {
            "$ref": "#/$defs/DataType"
        }
    },
    "additionalProperties": false
}

Resource Document#

See External Assets for details on the role Resource documents play in referencing external assets, such as large array data written by detectors.

Minimal nontrivial valid example:

# 'resource' document
{'path_semantics': 'posix',
 'resource_kwargs': {},
 'resource_path': '/local/path/subdirectory/data_file',
 'root': '/local/path/',
 'run_start': '10bf6945-4afd-43ca-af36-6ad8f3540bcd',
 'spec': 'SOME_SPEC',
 'uid': '272132cf-564f-428f-bf6b-149ee4287024'}

Typical example:

# resource
{'spec': 'AD_HDF5',
 'root': '/GPFS/DATA/Andor/',
 'resource_path': '2020/01/03/8ff08ff9-a2bf-48c3-8ff3-dcac0f309d7d.h5',
 'resource_kwargs': {'frame_per_point': 10},
 'path_semantics': 'posix',
 'uid': '3b300e6f-b431-4750-a635-5630d15c81a8',
 'run_start': '10bf6945-4afd-43ca-af36-6ad8f3540bcd'}
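The way 'root', 'resource_path', and 'path_semantics' fit together can be sketched with the standard library's pure path classes. The helper name `full_resource_path` is hypothetical, and falling back to posix rules when 'path_semantics' is absent is an assumption of this sketch, not something the schema states.

```python
from pathlib import PurePosixPath, PureWindowsPath

_PATH_CLASSES = {"posix": PurePosixPath, "windows": PureWindowsPath}


def full_resource_path(resource):
    """Join 'root' and 'resource_path' using the named path semantics."""
    # Assumption: default to posix when 'path_semantics' is not given.
    path_class = _PATH_CLASSES[resource.get("path_semantics", "posix")]
    return str(path_class(resource["root"]) / resource["resource_path"])


resource = {
    "spec": "AD_HDF5",
    "root": "/GPFS/DATA/Andor/",
    "resource_path": "2020/01/03/8ff08ff9-a2bf-48c3-8ff3-dcac0f309d7d.h5",
    "resource_kwargs": {"frame_per_point": 10},
    "path_semantics": "posix",
    "uid": "3b300e6f-b431-4750-a635-5630d15c81a8",
    "run_start": "10bf6945-4afd-43ca-af36-6ad8f3540bcd",
}
print(full_resource_path(resource))
```

Pure paths never touch the filesystem, so the join works the same way on any operating system regardless of which semantics the document names.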

Formal schema:

{
    "title": "resource",
    "description": "Document to reference a collection (e.g. file or group of files) of externally-stored data",
    "type": "object",
    "properties": {
        "path_semantics": {
            "title": "Path Semantics",
            "description": "Rules for joining paths",
            "type": "string",
            "enum": [
                "posix",
                "windows"
            ]
        },
        "resource_kwargs": {
            "title": "Resource Kwargs",
            "description": "Additional argument to pass to the Handler to read a Resource",
            "type": "object"
        },
        "resource_path": {
            "title": "Resource Path",
            "description": "Filepath or URI for locating this resource",
            "type": "string"
        },
        "root": {
            "title": "Root",
            "description": "Subset of resource_path that is a local detail, not semantic.",
            "type": "string"
        },
        "run_start": {
            "title": "Run Start",
            "description": "Globally unique ID to the run_start document this resource is associated with.",
            "type": "string"
        },
        "spec": {
            "title": "Spec",
            "description": "String identifying the format/type of this Resource, used to identify a compatible Handler",
            "type": "string"
        },
        "uid": {
            "title": "Uid",
            "description": "Globally unique identifier for this Resource",
            "type": "string"
        }
    },
    "required": [
        "resource_kwargs",
        "resource_path",
        "root",
        "spec",
        "uid"
    ],
    "additionalProperties": false
}

Datum Document#

See External Assets for details on the role Datum documents play in referencing external assets, such as large array data written by detectors.

Minimal nontrivial valid example:

# 'datum' document
{'resource': '272132cf-564f-428f-bf6b-149ee4287024',  # foreign key
 'datum_kwargs': {},  # format-specific parameters
 'datum_id': '272132cf-564f-428f-bf6b-149ee4287024/1'}

Typical example:

# datum
{'resource': '3b300e6f-b431-4750-a635-5630d15c81a8',
 'datum_kwargs': {'index': 0},
 'datum_id': '3b300e6f-b431-4750-a635-5630d15c81a8/0'}

It is an implementation detail that datum_id is often formatted as {resource}/{counter} but this should not be considered part of the schema.
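Because the format of datum_id is not guaranteed, a consumer should look the Datum up by its identifier rather than parse the string. The sketch below follows the chain Event → Datum → Resource with plain dictionaries. The `locate_external_data` helper and the abbreviated Event are hypothetical, assembled from the typical examples above.

```python
def locate_external_data(event, key, datums, resources):
    """Follow Event -> Datum -> Resource and collect what a reader needs."""
    datum = datums[event["data"][key]]       # the Event value is a datum_id
    resource = resources[datum["resource"]]  # the Datum points at its Resource
    return {
        "spec": resource["spec"],
        "root": resource["root"],
        "resource_path": resource["resource_path"],
        "resource_kwargs": resource["resource_kwargs"],
        "datum_kwargs": datum["datum_kwargs"],
    }


resource = {
    "spec": "AD_HDF5",
    "root": "/GPFS/DATA/Andor/",
    "resource_path": "2020/01/03/8ff08ff9-a2bf-48c3-8ff3-dcac0f309d7d.h5",
    "resource_kwargs": {"frame_per_point": 10},
    "uid": "3b300e6f-b431-4750-a635-5630d15c81a8",
}
datum = {
    "resource": "3b300e6f-b431-4750-a635-5630d15c81a8",
    "datum_kwargs": {"index": 0},
    "datum_id": "3b300e6f-b431-4750-a635-5630d15c81a8/0",
}
# Hypothetical Event (abbreviated) that references the Datum above.
event = {"data": {"camera_image": datum["datum_id"]},
         "filled": {"camera_image": False}}

# Index the documents by their identifiers, then resolve the reference.
resources = {resource["uid"]: resource}
datums = {datum["datum_id"]: datum}
located = locate_external_data(event, "camera_image", datums, resources)
print(located)
```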

Formal schema:

{
    "title": "datum",
    "description": "Document to reference a quanta of externally-stored data",
    "type": "object",
    "properties": {
        "datum_id": {
            "title": "Datum Id",
            "description": "Globally unique identifier for this Datum (akin to 'uid' for other Document types), typically formatted as '<resource>/<integer>'",
            "type": "string"
        },
        "datum_kwargs": {
            "title": "Datum Kwargs",
            "description": "Arguments to pass to the Handler to retrieve one quanta of data",
            "type": "object"
        },
        "resource": {
            "title": "Resource",
            "description": "The UID of the Resource to which this Datum belongs",
            "type": "string"
        }
    },
    "required": [
        "datum_id",
        "datum_kwargs",
        "resource"
    ],
    "additionalProperties": false
}

Datum Page#

Like Events, Datum contents can also be represented in “paged” form, and the representations contain equivalent information. This is the Datum example above structured as a Datum Page with one row:

# 'datum_page' document
{'resource': '3b300e6f-b431-4750-a635-5630d15c81a8',
 'datum_kwargs': {'index': [0]},
 'datum_id': ['3b300e6f-b431-4750-a635-5630d15c81a8/0']}
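Unpacking a Datum Page back into individual Datum documents mirrors the Event Page case. This is a minimal sketch with a hypothetical helper name, `datum_page_to_datums`, applied to the one-row page above.

```python
def datum_page_to_datums(page):
    """Yield one Datum per row of a Datum Page."""
    for row, datum_id in enumerate(page["datum_id"]):
        yield {
            "resource": page["resource"],  # shared by every row
            "datum_id": datum_id,
            "datum_kwargs": {k: col[row] for k, col in page["datum_kwargs"].items()},
        }


datum_page = {
    "resource": "3b300e6f-b431-4750-a635-5630d15c81a8",
    "datum_kwargs": {"index": [0]},
    "datum_id": ["3b300e6f-b431-4750-a635-5630d15c81a8/0"],
}
datums = list(datum_page_to_datums(datum_page))
print(datums)
```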

Formal Datum Page schema:

{
    "title": "datum_page",
    "description": "Page of documents to reference a quanta of externally-stored data",
    "type": "object",
    "$defs": {
        "DataFrameForDatumPage": {
            "title": "DataFrameForDatumPage",
            "type": "array",
            "items": {
                "type": "string"
            }
        }
    },
    "properties": {
        "datum_id": {
            "description": "Array unique identifiers for each Datum (akin to 'uid' for other Document types), typically formatted as '<resource>/<integer>'",
            "$ref": "#/$defs/DataFrameForDatumPage"
        },
        "datum_kwargs": {
            "title": "Datum Kwargs",
            "description": "Array of arguments to pass to the Handler to retrieve one quanta of data",
            "type": "object",
            "additionalProperties": {
                "items": {},
                "type": "array"
            }
        },
        "resource": {
            "title": "Resource",
            "description": "The UID of the Resource to which all Datums in the page belong",
            "type": "string"
        }
    },
    "required": [
        "datum_id",
        "datum_kwargs",
        "resource"
    ],
    "additionalProperties": false
}

Stream Resource Document (Experimental)#

See External Assets for details on the role Stream Resource documents play in referencing external assets that are natively ragged, such as data from single-photon detectors, or assets that consist of many relatively small data sets (e.g. scanned fluorescence data).

Typical example:

# 'Stream Resource' document
{'data_key': 'detector_1',
 'mimetype': 'application/x-hdf5',
 'uri': 'file://localhost/GPFS/DATA/Andor/01/03/8ff08ff9-a2bf-48c3-8ff3-dcac0f309d7d.h5',
 'parameters': {'frame_per_point': 1},
 'uid': '3b300e6f-b431-4750-a635-5630d15c81a8',
 'run_start': '10bf6945-4afd-43ca-af36-6ad8f3540bcd'}
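Unlike a Resource, a Stream Resource locates its data with a single 'uri'. As a small illustration, the standard library can split that URI into its parts. The helper name `describe_stream_resource` is hypothetical, and how a real reader maps 'mimetype' to a Handler is not shown here.

```python
from urllib.parse import urlparse


def describe_stream_resource(stream_resource):
    """Split the 'uri' of a Stream Resource into scheme, host, and path."""
    parts = urlparse(stream_resource["uri"])
    return {
        "scheme": parts.scheme,
        "host": parts.netloc,
        "path": parts.path,
        "mimetype": stream_resource["mimetype"],
    }


stream_resource = {
    "data_key": "detector_1",
    "mimetype": "application/x-hdf5",
    "uri": "file://localhost/GPFS/DATA/Andor/01/03/8ff08ff9-a2bf-48c3-8ff3-dcac0f309d7d.h5",
    "parameters": {"frame_per_point": 1},
    "uid": "3b300e6f-b431-4750-a635-5630d15c81a8",
    "run_start": "10bf6945-4afd-43ca-af36-6ad8f3540bcd",
}
info = describe_stream_resource(stream_resource)
print(info)
```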

Formal schema:

{
    "title": "stream_resource",
    "description": "Document to reference a collection (e.g. file or group of files) of externally-stored data streams",
    "type": "object",
    "properties": {
        "data_key": {
            "title": "Data Key",
            "description": "A string to show which data_key of the Descriptor are being streamed",
            "type": "string"
        },
        "mimetype": {
            "title": "Mimetype",
            "description": "String identifying the format/type of this Stream Resource, used to identify a compatible Handler",
            "type": "string"
        },
        "parameters": {
            "title": "Parameters",
            "description": "Additional keyword arguments to pass to the Handler to read a Stream Resource",
            "type": "object"
        },
        "run_start": {
            "title": "Run Start",
            "description": "Globally unique ID to the run_start document this Stream Resource is associated with.",
            "type": "string"
        },
        "uid": {
            "title": "Uid",
            "description": "Globally unique identifier for this Stream Resource",
            "type": "string"
        },
        "uri": {
            "title": "Uri",
            "description": "URI for locating this resource",
            "type": "string"
        }
    },
    "required": [
        "data_key",
        "mimetype",
        "parameters",
        "uid",
        "uri"
    ],
    "additionalProperties": false
}

Stream Datum Document (Experimental)#

See External Assets for details on the role Stream Datum documents play in referencing external assets that are natively ragged, such as data from single-photon detectors, or assets that consist of many relatively small data sets (e.g. scanned fluorescence data).

Typical example:

# 'Stream Datum' document
{'uid': '86340942-9865-47f9-9a8d-bdaaab1bfce2',
 'descriptor': '8c70b8c2-df32-40e3-9f50-29cda8142fa0',
 'stream_resource': '272132cf-564f-428f-bf6b-149ee4287024',  # foreign key
 'indices': {'start': 0, 'stop': 1},
 'seq_nums': {'start': 1, 'stop': 2},
 }

Formal schema:

{
    "title": "stream_datum",
    "description": "Document to reference a quanta of an externally-stored stream of data.",
    "type": "object",
    "$defs": {
        "StreamRange": {
            "title": "StreamRange",
            "description": "The parameters required to describe a sequence of incrementing integers",
            "type": "object",
            "properties": {
                "start": {
                    "title": "Start",
                    "description": "First number in the range",
                    "type": "integer"
                },
                "stop": {
                    "title": "Stop",
                    "description": "Last number in the range is less than this number",
                    "type": "integer"
                }
            },
            "required": [
                "start",
                "stop"
            ]
        }
    },
    "properties": {
        "descriptor": {
            "title": "Descriptor",
            "description": "UID of the EventDescriptor to which this Datum belongs",
            "type": "string"
        },
        "indices": {
            "description": "A slice object passed to the StreamResource handler so it can hand back data and timestamps",
            "$ref": "#/$defs/StreamRange"
        },
        "seq_nums": {
            "description": "A slice object showing the Event numbers the resource corresponds to",
            "$ref": "#/$defs/StreamRange"
        },
        "stream_resource": {
            "title": "Stream Resource",
            "description": "The UID of the Stream Resource to which this Datum belongs.",
            "type": "string"
        },
        "uid": {
            "title": "Uid",
            "description": "Globally unique identifier for this Datum. A suggested formatting being '<stream_resource>/<stream_name>/<block_id>",
            "type": "string"
        }
    },
    "required": [
        "descriptor",
        "indices",
        "seq_nums",
        "stream_resource",
        "uid"
    ],
    "additionalProperties": false
}
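The StreamRange definition describes 'stop' as exclusive ("Last number in the range is less than this number"), which matches Python's built-in range. A minimal sketch, using a hypothetical helper `as_range` and the typical Stream Datum example:

```python
def as_range(stream_range):
    """Convert a StreamRange mapping into a half-open Python range."""
    return range(stream_range["start"], stream_range["stop"])


stream_datum = {
    "uid": "86340942-9865-47f9-9a8d-bdaaab1bfce2",
    "descriptor": "8c70b8c2-df32-40e3-9f50-29cda8142fa0",
    "stream_resource": "272132cf-564f-428f-bf6b-149ee4287024",
    "indices": {"start": 0, "stop": 1},
    "seq_nums": {"start": 1, "stop": 2},
}
seq_nums = list(as_range(stream_datum["seq_nums"]))  # Event sequence numbers covered
indices = list(as_range(stream_datum["indices"]))    # positions within the stream
print(seq_nums, indices)
```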

“Bulk Events” Document (DEPRECATED)#

This is another representation of Events. This representation is deprecated. Use EventPage instead.

{
    "patternProperties": {
        "^.*$": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "description": "The actual measurement data"
                    },
                    "timestamps": {
                        "type": "object",
                        "description": "The timestamps of the individual measurement data"
                    },
                    "filled": {
                        "type": "object",
                        "description": "Mapping the keys of externally-stored data to a boolean indicating whether that data has yet been loaded"
                    },
                    "descriptor": {
                        "type": "string",
                        "description": "UID to point back to Descriptor for this event stream"
                    },
                    "seq_num": {
                        "type": "integer",
                        "description": "Sequence number to identify the location of this Event in the Event stream"
                    },
                    "time": {
                        "type": "number",
                        "description": "The event time.  This maybe different than the timestamps on each of the data entries"
                    },
                    "uid": {
                        "type": "string",
                        "description": "Globally unique identifier for this Event"
                    }
                },
                "required": [
                    "uid",
                    "data",
                    "timestamps",
                    "time",
                    "descriptor",
                    "seq_num"
                ],
                "additionalProperties": false,
                "type": "object",
                "title": "bulk_events",
                "description": "Document to record a quanta of collected data"
            }
        }
    }
}
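For code that still receives this deprecated form, the regrouping into Event Pages might look like the sketch below. The function `bulk_events_to_pages` is hypothetical, as are the placeholder uids and values in the sample data. It assumes that all Events sharing a descriptor carry the same keys; the event_model package provides its own conversion utilities, which are the safer choice.

```python
from collections import defaultdict


def bulk_events_to_pages(bulk_events):
    """Regroup a 'bulk_events' mapping into one Event Page per descriptor."""
    by_descriptor = defaultdict(list)
    for events in bulk_events.values():
        for event in events:
            by_descriptor[event["descriptor"]].append(event)

    pages = []
    for descriptor, events in by_descriptor.items():
        # Assumption: Events that share a descriptor share the same keys.
        data_keys = list(events[0]["data"])
        filled_keys = list(events[0].get("filled", {}))
        pages.append({
            "descriptor": descriptor,
            "uid": [e["uid"] for e in events],
            "time": [e["time"] for e in events],
            "seq_num": [e["seq_num"] for e in events],
            "data": {k: [e["data"][k] for e in events] for k in data_keys},
            "timestamps": {k: [e["timestamps"][k] for e in events] for k in data_keys},
            "filled": {k: [e["filled"][k] for e in events] for k in filled_keys},
        })
    return pages


# Placeholder documents for illustration only.
bulk = {
    "primary": [
        {"uid": "a", "descriptor": "d1", "seq_num": 1, "time": 1.0,
         "data": {"x": 0.5}, "timestamps": {"x": 0.9}},
        {"uid": "b", "descriptor": "d1", "seq_num": 2, "time": 2.0,
         "data": {"x": 0.7}, "timestamps": {"x": 1.9}},
    ]
}
pages = bulk_events_to_pages(bulk)
print(pages)
```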

“Bulk Datum” Document (DEPRECATED)#

This is another representation of Datum. This representation is deprecated. Use DatumPage instead.

{
    "properties": {
        "datum_kwarg_list": {
            "type": "array",
            "items": {"type": "object"},
            "description": "Array of arguments to pass to the Handler to retrieve one quanta of data"
        },
        "resource": {
            "type": "string",
            "description": "UID of the Resource to which all these Datum documents belong"
        },
        "datum_ids": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Globally unique identifiers for each Datum (akin to 'uid' for other Document types), typically formatted as '<resource>/<integer>'"
        }
    },
    "required": [
        "datum_kwarg_list",
        "resource",
        "datum_ids"
    ],
    "additionalProperties": false,
    "type": "object",
    "title": "bulk_datum",
    "description": "Document to reference a quanta of externally-stored data"
}
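The corresponding migration from this deprecated form to a Datum Page is a matter of turning a list of kwargs dictionaries into a dictionary of lists. The following is only a sketch: `bulk_datum_to_page` is a hypothetical name, the sample identifiers are placeholders, and it assumes every entry in 'datum_kwarg_list' has the same keys.

```python
def bulk_datum_to_page(bulk_datum):
    """Convert a 'bulk_datum' document into the Datum Page layout."""
    kwarg_list = bulk_datum["datum_kwarg_list"]
    # Assumption: every kwargs dict has the same keys as the first one.
    keys = list(kwarg_list[0]) if kwarg_list else []
    return {
        "resource": bulk_datum["resource"],
        "datum_id": list(bulk_datum["datum_ids"]),
        "datum_kwargs": {k: [kw[k] for kw in kwarg_list] for k in keys},
    }


# Placeholder document for illustration only.
bulk_datum = {
    "resource": "res-1",
    "datum_ids": ["res-1/0", "res-1/1"],
    "datum_kwarg_list": [{"index": 0}, {"index": 1}],
}
datum_page = bulk_datum_to_page(bulk_datum)
print(datum_page)
```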