Changelog¶
A catalog of new features, improvements, and bug-fixes in each release. Follow links to the relevant GitHub issue or pull request for specific code changes and any related discussion.
v1.2.5 (2022-01-21)¶
Fixed¶
Several typos
Compatibility with pymongo 4
A problem with intake v0.6.5
Stop document metadata access
v1.2.4 (2021-10-07)¶
Fixed¶
tzlocal support
v1.2.3 (2021-03-29)¶
Fixed¶
An issue where databroker was not compatible with dask version 2021.3.0.
Typo in the format string for mongodb URIs.
Changed¶
Add an optional authsource configuration to the mongo connection string.
Drop support for Python 3.6.
Add CI tests for Python 3.9.
v1.2.2 (2021-02-19)¶
Fixed¶
An issue where databroker was not compatible with older versions of intake. This was a new issue introduced in the release of databroker 1.2.1.
v1.2.1 (2020-12-17)¶
Fixed¶
An issue where catalogs were not discovered correctly. This is a new issue caused by the release of intake 0.6.1.
Sqlite connection timeout CI failures.
v1.2.0 (2020-12-17)¶
Added¶
New accessors on BlueskyEventStream named config, timestamps, and config_timestamps provide xarray Datasets of secondary readings from Event Descriptors and Events.

    >>> run.primary
    <BlueskyEventStream 'primary' from Run 69fd42fa...>

    >>> run.primary.read()
    <xarray.Dataset>
    Dimensions:  (time: 3)
    Coordinates:
      * time     (time) float64 1.608e+09 1.608e+09 1.608e+09
    Data variables:
        det      (time) float64 1.0 1.0 1.0

    >>> run.primary.timestamps.read()
    <xarray.Dataset>
    Dimensions:  (time: 3)
    Coordinates:
      * time     (time) float64 1.608e+09 1.608e+09 1.608e+09
    Data variables:
        det      (time) float64 1.608e+09 1.608e+09 1.608e+09

    >>> list(run.primary.config)
    ['det']

    >>> run.primary.config.det.read()
    <xarray.Dataset>
    Dimensions:               (time: 3)
    Coordinates:
      * time                  (time) float64 1.608e+09 1.608e+09 1.608e+09
    Data variables:
        det_Imax              (time) int64 1 1 1
        det_center            (time) int64 0 0 0
        det_sigma             (time) int64 1 1 1
        det_noise             (time) <U4 'none' 'none' 'none'
        det_noise_multiplier  (time) int64 1 1 1

    >>> run.primary.config_timestamps.det.read()
    <xarray.Dataset>
    Dimensions:               (time: 3)
    Coordinates:
      * time                  (time) float64 1.608e+09 1.608e+09 1.608e+09
    Data variables:
        det_Imax              (time) float64 1.608e+09 1.608e+09 1.608e+09
        det_center            (time) float64 1.608e+09 1.608e+09 1.608e+09
        det_sigma             (time) float64 1.608e+09 1.608e+09 1.608e+09
        det_noise             (time) float64 1.608e+09 1.608e+09 1.608e+09
        det_noise_multiplier  (time) float64 1.608e+09 1.608e+09 1.608e+09
Added a “summary” projector that operates, inexpensively, on just the ‘start’ document.
Fixed¶
Adjust to upstream (not yet released) changes in intake.
When the ‘stop’ document is not yet available (or permanently missing) the IPython repr degrades gracefully.
When comparing a BlueskyRun (v2) or a Header (v1) to another object via equality, return False if the other object is of a different type, instead of raising an exception.
Changed¶
Remove configuration columns from the output of BlueskyEventStream.read() and .to_dask(). The new config accessor should be used to access these.
In v1, provide more helpful exceptions when a Datum referenced by an Event is not found. (In v2 this is already done, via event_model.Filler.)
The method BlueskyRun.canonical has been renamed to BlueskyRun.documents. The old name is deprecated and issues a warning when used.
v1.1.0 (2020-09-03)¶
Added¶
Experimental databroker.projector module.
A stats method on the BlueskyMongoCatalog to access MongoDB storage info (see the sketch below).
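A minimal sketch of calling the new method; the catalog construction is elided, and the exact contents of the returned storage statistics depend on the MongoDB server.

    # Assuming `catalog` is an existing BlueskyMongoCatalog instance.
    storage_info = catalog.stats()  # summary of MongoDB storage usage
    print(storage_info)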
Fixed¶
Do more to try to recover from inaccurate shape metadata.
Tolerate old Resource documents that rely on MongoDB _id and are missing uid.
v1.0.6 (2020-06-10)¶
Fixed¶
Xarray shape is now correct when multiple streams have matching keys.
Msgpack and jsonl backed catalogs now find new entries correctly.
The order of descriptors in v1.Header.descriptors now matches v0.Header.descriptors.
v1.0.5 (2020-06-04)¶
Fixed¶
The latest release of intake, v0.6.0, introduced a regression which databroker now works around.
v1.0.4 (2020-06-03)¶
Internals¶
Adjust our usage of intake’s Entry abstraction in preparation for changes in intake’s upcoming release.
Fixed¶
The canonical method now only yields a stop document if it is not None.
v1.0.3 (2020-05-12)¶
Added¶
Added SingleRunCache which collects the documents from a single run, and when complete, provides a BlueskyRun.
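A minimal sketch of using SingleRunCache from a bluesky plan, assuming the import path databroker.core and the callback/retrieve interface; the plan and detector list are illustrative.

    from bluesky import preprocessors as bpp
    from bluesky.plans import count
    from databroker.core import SingleRunCache  # import path assumed


    def plan(detectors):
        src = SingleRunCache()

        @bpp.subs_decorator(src.callback)   # route this run's documents into the cache
        def inner():
            yield from count(detectors)
            run = src.retrieve()            # a BlueskyRun built from the cached documents
            print(run.primary.read())

        yield from inner()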
v1.0.2 (2020-04-07)¶
Fixed¶
databroker now supports mongo backends with authentication.
v1.0.1 (2020-04-03)¶
Added¶
When a Broker is constructed from a YAML configuration file, the root_map values may be given as relative paths, interpreted relative to the location of that configuration file.
Changed¶
The minimum version of the dependency intake has been increased to v0.5.5, and various internal changes have been made to adjust for changes in intake.
Fixed¶
The LazyMap object now supports pickling.
The query TimeRange now properly propagates its timezone parameter through replace() (see the sketch below).
If installation with python2 is attempted, a helpful error message is shown.
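A minimal sketch of the TimeRange behavior, assuming the query class is importable from databroker.queries (an assumption) and that replace() returns an updated copy.

    from databroker.queries import TimeRange  # import path assumed

    q = TimeRange(since='2020-01-01', timezone='US/Eastern')
    q2 = q.replace(until='2020-02-01')
    # q2 retains timezone='US/Eastern' instead of falling back to a default.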
v1.0.0 (2020-03-13)¶
This release amounts to a full rewrite of databroker. See What are the API versions v0, v1, v2? for details.
See the v1.0.0 GitHub milestone for a full enumeration of the changes in this release.
v0.13.3 (2019-08-21)¶
Enhancements¶
Replaced deprecated unordered bulk write (requires pymongo >= 3.0).
Documentation¶
Update the links in the sidebar to point to the Bluesky Project.
Packaging¶
Added missing files in the source distribution for PyPI.
See the v0.13.3 GitHub milestone for a full enumeration of the changes in this release.
v0.13.2 (2019-07-30)¶
Bug Fixes¶
Support round trip of databroker configs, reporting the config, module and class.
Packaging¶
Removed vestigial dependency on dask, which is no longer used.
See the v0.13.2 GitHub milestone for a full enumeration of the changes in this release.
v0.13.1 (2019-07-30)¶
Bug Fixes¶
Make sqlite-backed assets registry threadsafe, for compatibility with bluesky 1.6.0.
v0.13.0 (2019-06-06)¶
API Changes¶
Drop support for Python 2
v0.12.2 (2019-03-11)¶
Bug Fixes¶
Support round trip of resource
Documentation¶
Fix typos in the tutorial
Update installing sentinel code example
v0.12.1 (2019-01-25)¶
Bug Fixes¶
Fixed a bug in EventSourceShim.docs_given_header when filtering the fields.
v0.12.0 (2019-01-03)¶
Enhancements/API changes¶
documents() now yields any Resource and Datum documents referenced by Event documents.
documents() now yields documents in strict time order, which may interlace Events from different streams. Previously, documents were yielded in time order by descriptor.
Added event_sources_by_name property to the BrokerES class.
Added event_sources kwarg to the Broker class.
Replaced the url, timezone and pvs kwargs in the ArchiverEventSource class with a config dictionary kwarg and updated other methods to use this.
Added name and pvs attributes to the ArchiverEventSource class and updated other methods to use these.
Added tables_given_times() method to the ArchiverEventSource class.
Added name property to the EventSourceShim class.
Bug Fixes¶
Fixed an issue in the tutorial where importing databroker was forgotten.
The docstring for the RegistryTemplate class has been fixed.
v0.11.3 (2018-09-05)¶
Bug Fixes¶
Removes an assumption that Descriptors have a ‘name’ field.
v0.11.2 (2018-06-19)¶
Bug Fixes¶
Fixed a number of typos in the documentation
Fixed rendering issue of the README.md file on PyPI.
v0.11.1 (2018-05-19)¶
Bug Fixes¶
Fixed a limitation whereby the sqlite backend could not be used by multiple threads. One important consequence of this limitation was that it broke the ability to insert documents generated by “monitoring” in bluesky.
Removed an accidental call to print.
v0.11.0 (2018-05-14)¶
Enhancements¶
Broker objects now have a db.name attribute, which is the name passed into Broker.named or None.
Header objects now have an ext attribute, containing a SimpleNamespace. By default, it is empty. It is intended to be used to pull metadata from external data sources, such as sample metadata, comments or tags, or proposal information. To register a data source, add an item to the dictionary Broker.external_fetchers. The value, which should be a callable, will be passed two arguments, the RunStart document and the RunStop document, and the result will be added to ext using the key. The callable is expected to handle all special cases (errors, etc.) internally and return None if it has nothing to add. (See the sketch after this list.)
Accept Resource and Datum documents via the generic insert method. To facilitate the “asset refactor” transition in bluesky and ophyd, ignore duplicate attempts to insert a document with the same uid. (This is controllable by a new flag ignore_duplicate_error on the Registry insert methods.)
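A minimal sketch of registering an external fetcher, following the description above; the fetcher name, the 'sample_id' field, and the returned metadata are made up for illustration.

    from databroker import Broker

    db = Broker.named('example')  # 'example' stands in for a real configuration name


    def fetch_sample_info(run_start, run_stop):
        # Called with the RunStart and RunStop documents. Handle errors here
        # and return None if there is nothing to add.
        sample_id = run_start.get('sample_id')
        if sample_id is None:
            return None
        return {'sample_id': sample_id}


    # The key becomes the attribute name on Header.ext.
    db.external_fetchers['sample'] = fetch_sample_info

    h = db[-1]
    h.ext.sample  # -> {'sample_id': ...} or None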
Bug Fixes¶
The Header.fields() method wrongly ignored its stream_name argument.
v0.10.0 (2018-02-20)¶
Enhancements¶
Add a special name, Broker.named('temp'), which creates new, temporary storage each time it is called. This is convenient for testing and teaching.
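For example (a minimal sketch; nothing inserted into this Broker persists beyond its temporary storage):

    from databroker import Broker

    db = Broker.named('temp')  # brand-new, throwaway storage on each call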
Deprecations¶
The Broker.__call__() method for searching headers by metadata accepted special keyword arguments, start_time and/or end_time, which filtered results by RunStart time. These names proved to be confusing, so they have been renamed since and until (terminology inspired by git log). The original names still work, but warn if used.
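A quick sketch contrasting the two spellings, assuming db is a Broker instance:

    # Preferred spelling:
    headers = db(since='2018-01-01', until='2018-02-01')

    # Still accepted, but issues a warning:
    headers = db(start_time='2018-01-01', end_time='2018-02-01')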
Bug Fixes¶
The mongoquery backend returned identical references (i.e. the same dictionary) on subsequent queries, meaning that mutations could propagate across results sets.
Ensure there is only one definition of a DuplicateHandler exception.
Remove an invalid keyword argument from get_images.
v0.9.4 (2017-12-06)¶
This release contains bug fixes and experimental new features.
Enhancements¶
Add experimental integration with glue.
The HDF5 handlers have been refactored, and a new HDF5 handler returning dask objects has been added.
Bug Fixes¶
Rendering the HTML repr (_repr_html_) of a Header produced an unnecessary warning.
Headers without a stop document wrongly produced an error and could not be created. This was a regression.
v0.9.3 (2017-09-13)¶
This release contains one bug fix for a feature that was new in v0.9.0.
Bug Fixes¶
Properly implement “filling” of external data in the case of multiple event streams with different data keys. This case generated a KeyError in v0.9.2.
v0.9.2 (2017-09-11)¶
This release contains one bug fix for a feature that was new in v0.9.0.
Bug Fixes¶
Allow handlers to be registered via a configuration file. This feature was intended to be added in v0.9.0, but it was broken and unusable.
v0.9.1 (2017-09-06)¶
This release contains small but important bug fixes. It is recommended that all users upgrade from v0.9.0.
Bug Fixes¶
Respect the fill kwarg in Header.table() and Broker.get_table(). In v0.9.0, a regression was introduced that always set it to True regardless of the user input.
Omit the special value '_legacy_config' from the results returned by list_configs() because it is a (private) synonym for one of the other values.
Make document retrieval lazy (as it was intended to be) by removing an internal call to check_fields_exist.
Do not attempt to fill external data that has already been filled.
v0.9.0 (2017-08-22)¶
Overview¶
This is a major update to databroker.
The packages metadatastore, filestore, portable-mds, portable-fs, metadataservice, and metadataclient have all been merged into databroker itself. The individual packages have been deprecated; all future development will occur in databroker.
In response to feedback, new convenience functions and methods have been added.
The configuration management has been completely overhauled.
Enhancements¶
User-facing API Changes¶
The following changes may break old user code.
DataBroker used to rely indirectly on configuration files located at:

    /etc/metadatastore.yml or ~/.config/metadatastore/connection.yml
    /etc/filestore.yml or ~/.config/filestore/connection.yml
These configuration files are now completely ignored. Users must adopt the new configuration system.
The order of parameters to these methods has been rearranged to be mutually consistent.
We judge that the short-term pain of updating some user code now is less than the long-term pain of asking everyone to keep mental track of random, inconsistent parameter orderings forever.
The option handler_override, which overrode handlers by field name, has been removed from all methods and functions that formerly supported it. Use the option handler_registry instead, which overrides handlers by handler spec name, a less complex and less error-prone operation.
The method Header.events() defaults to returning only events from the ‘primary’ stream, not all events.
Documents refer to other documents by a uid. In past versions of databroker they were dereferenced. That is:
    # Assume run_start, run_stop, descriptor, and event are documents.

    # True for databroker version < 0.9.0:
    event['descriptor'] == descriptor
    descriptor['run_start'] == run_start
    run_stop['run_start'] == run_start

    # True for databroker version 0.9.0:
    event['descriptor'] == descriptor['uid']
    descriptor['run_start'] == run_start['uid']
    run_stop['run_start'] == run_start['uid']
The type of db.filters changed from list to dict.
Deprecations¶
The following changes to recommended usage may produce warnings in user code, but it will not break user code in this release. It may break user code in a future release, so the warnings should be heeded during this cycle if possible.
The following usages are deprecated and will stop being supported in a future release:
    # THIS USAGE IS DEPRECATED
    from databroker import db
    # or, equivalently:
    from databroker import DataBroker
Instead, do:
    from databroker import Broker

    db = Broker('example')
where example is the name of some configuration. This new approach makes it possible to connect to multiple Brokers in the same process.

    db1 = Broker('laptop')
    db2 = Broker('beamline')
This is useful for transferring data, among other things.
Likewise, the top-level functions for fetching data are deprecated and will be removed in a future release:
    # THIS USAGE IS DEPRECATED
    from databroker import db, get_table, get_events, get_images

    h = db[-1]
    get_table(h)
The new recommended usage is:
    from databroker import Broker

    db = Broker.named('example')
    h = db[-1]
    h.table()
The method Broker.get_images and the class Images are deprecated and may be removed in a future release. See issue for the motivating discussion. Use the new method Header.data().
Databroker uses a custom dictionary subclass that supports dot access like event.data as a synonym for item lookup like event['data']. Employing a custom dictionary subclass has downsides, including performance and complexity. We are considering defaulting to plain dictionaries in a future release, which would break any user code that relies on dot access. To prepare for this possible change, the usage event.data now produces a warning advising users to switch to event['data']. More detail is available in the section Advanced: Controlling the Return Type.
The method Header.stream() has been renamed Header.documents(). The old name issues a warning in this release; it will be removed in a future one.
The modules databroker.broker and databroker.core have been combined, and all public members are importable from just databroker. The old modules will be maintained as shims to avoid breaking user code.
Internal API Changes¶
The following API changes affect the libraries that have been merged into databroker in this release (metadatastore, filestore, portable-mds, portable-fs, metadataclient, metadataservice). These changes are internal to databroker and will only affect advanced users.
All FileStore classes have been renamed to Registry.
The method change_root, which is implemented on various Registry classes, has been renamed to move_files.
The method get_datum on Registry classes is fully removed. Use retrieve instead.
The version keyword argument has been removed from all Registry classes and MDS classes. It is now part of the config dictionary.
A script for launching the “metadataservice” server has been moved to a CLI named start_md_server.
The “writers” formerly in filestore now require a Registry as an argument.
The modules filestore.commands and filestore.api have been removed. Same for metadatastore.commands and metadatastore.api.
Registry.correct_root supports uids as args, verify is now optional and defaults to False, and the arg resource is now resource_or_uid.
v0.8.4 (2017-05-24)¶
(TO DO)
v0.8.3 (2017-05-23)¶
(TO DO)
v0.8.2 (2017-05-22)¶
(TO DO)
v0.8.1 (2017-05-22)¶
(TO DO)
v0.8.0¶
API Changes¶
databroker.core¶
This module is semi-private
Removed process, stream, and restream as top-level functions. The implementation now lives in databroker.broker.BrokerES. These functions knew too much about the internals of the databroker to remain as separate functions.
Broker.__call__ returns an iterable Results object, akin to a generator, instead of a list. This means that queries with large result sets return quickly. Iterating through the Headers in the result set is up to the caller.
Header¶
Change Header from a doct.Document subclass to an attr-based class. A majority of the API has been maintained; however, there are changes in which exceptions are raised when trying to mutate the instance.
Method | Old Exception | New Exception
-------|---------------|--------------
       | doc.DocumentIsReadOnly | AttributeError
       | doc.DocumentIsReadOnly | AttributeError
       | doc.DocumentIsReadOnly | attr.exceptions.FrozenInstanceError
       | doc.DocumentIsReadOnly | TypeError
       | doc.DocumentIsReadOnly | attr.exceptions.FrozenInstanceError
       | doc.DocumentIsReadOnly | TypeError
Header.from_run_start¶
Take a Broker object instead of a MetadataStore object. This is now tacked onto the Header object.
Changes to functions in databroker.core¶
Explicitly passed mds/fs arguments have been removed; the functions instead rely on the DataBroker instance included in the header.
Break up internal structure of databroker¶
The core functions that touch events have a new required argument, es. This does not affect the API of the Broker object, only the functions in the core module.
Top level insert¶
Broker now has an insert method; use this over db.mds.insert.
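A minimal sketch, assuming db is a Broker instance and (name, doc) is a document pair emitted by the RunEngine:

    db.insert(name, doc)  # preferred
    # rather than reaching into the metadatastore: db.mds.insert(name, doc)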
v0.7.0 (2016-12-21)¶
Enhancements¶
Add convenience method for exporting from one Broker instance into another.
Experimental: support regex-based field selection in Broker methods.
Bug Fixes¶
Fix handling of timezones. To summarize: all times are stored as a float number that is a UNIX timestamp (number of seconds since 1970 in Greenwich). The get_events method simply returns this raw number. The get_table method provides the option (on by default) to convert these float numbers to datetime objects, which can be more convenient in some circumstances. There are two flags for controlling this feature: convert_times and localize_times. By default, convert_times=True and localize_times=True. This returns pandas datetime64 objects that are “naive” (meaning they don’t have a timezone attached) and are in the local time. This tells you the wall clock time when the experiment was performed, in the timezone configured in db.mds.config['timezone']. If localize_times=False, the datetime objects are again “naive” but in UTC time. This tells you the wall clock time of a clock in Greenwich when the experiment was performed. (See the sketch below.)
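A short sketch of the flags described above, assuming h is a Header retrieved from this Broker:

    # Raw UNIX timestamps (floats, seconds since 1970 UTC):
    events = list(db.get_events(h))

    # Naive datetimes in local time (the default):
    df_local = db.get_table(h)

    # Naive datetimes in UTC:
    df_utc = db.get_table(h, localize_times=False)

    # Leave times as raw floats in the table:
    df_raw = db.get_table(h, convert_times=False)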
v0.6.2 (2016-09-28)¶
(TO DO)
v0.6.1 (2016-09-23)¶
Enhancements¶
Remove the hard dependency on the metadatastore and filestore packages so that other providers of the metadatastore and filestore interfaces may be used instead.
v0.6.0 (2016-09-07)¶
Bug Fixes¶
Make get_table properly respect its stream_name argument. (Previously, any value but the default returned an empty result.)
Enhancements¶
Allow Broker to be imported without configuration present.
Add stateful “filters” to restrict searches.
Add aliases to save searches during a session.
Overhaul documents and tests.
API Changes¶
The default value of stream_name in get_events is now ALL, a sentinel value. (The default for get_table is still 'primary', but it now also accepts ALL.)
v0.5.0 (2016-07-25)¶
API Changes¶
Change the kwarg name to stream_name to select a descriptor by name.
Requires filestore >= v0.5.0.
Enhancements¶
Learned how to get all of the FS resource documents from a header
v0.4.1 (2016-05-09)¶
(TO DO)
v0.4.0 (2016-05-02)¶
(TO DO)
v0.3.3 (2016-02-26)¶
(TO DO)
v0.3.2 (2016-02-23)¶
(TO DO)
v0.3.1 (2016-02-04)¶
(TO DO)
v0.3.1 (2015-09-29)¶
(TO DO)
dataportal v0.2.2¶
Bug Fixes¶
Times, as returned by pandas-aware functions, are now reported correctly. Previously, these times were being reported as UTC, which is 4 or 5 hours different from US/Eastern time, depending on the time of year. (GH209)
dataportal v0.2.1¶
API Changes¶
get_images (an alias for Images) was added for consistency with other function names.
SubtractedImages removed; prefer PIMS pipeline feature.
dataportal v0.2.0 (2015-09-15)¶
API Changes¶
DataBroker[] for slicing by scan ID(s) or recency
DataBroker() for building queries from keyword arguments
get_events returns an events generator
get_table returns a DataFrame (see the sketch below)
Header, vastly simplified: it is merely a Document with a dedicated constructor that accepts a Run Start Document
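A minimal sketch of this dataportal-era API, assuming the names above are importable from dataportal; the query keyword 'owner' is illustrative.

    from dataportal import DataBroker, get_events, get_table  # import path assumed

    h = DataBroker[-1]                 # slice by recency (or by scan ID)
    headers = DataBroker(owner='dan')  # build a query from keyword arguments
    events = get_events(h)             # generator of Events
    df = get_table(h)                  # DataFrame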
dataportal v0.0.6¶
Enhancements¶
A new StepScan interface acts like DataBroker but immediately returns tabular data as a DataFrame in one step. (GH136)
Look up scans by the name of a detector or motor used. For example, to get all scans that measured ‘Tsam’, use DataBroker.find_headers(data_key='Tsam'). (GH88, GH107)
Look up scans using the first few characters of their unique ID, like DataBroker['aow23oif']. To be clear, this is the ophyd-provided uid, not the mongo _id. (GH130, GH131)
Replay remembers settings when flipping between scans, and it retains these settings between sessions. (GH114)
DataMuxer.to_sparse_dataframe returns all data with one Event per row. (GH134)
DataMuxer.plan.bin_on and DataMuxer.plan.bin_by_edges explain the planned operation of DataMuxer.bin_on and DataMuxer.bin_by_edges for a given data set and given arguments. (GH134)
API Changes¶
The Event documents are reorganized to be more intuitive and require less typing. Formerly, event.data returned a dictionary of (value, timestamp) tuples:

    event.data = {'motor1': (value, timestamp), 'motor2': (value, timestamp)}

Now, event.data is a dictionary of the data:

    event.data = {'motor1': value, 'motor2': value}

and event.timestamps is a dictionary of the timestamps:

    event.timestamps = {'motor1': timestamp, 'motor2': timestamp}
All functions that return Documents, including Headers, Events, and everything stored in metadatastore and filestore, now return Python generators, iterable objects that load data one element at a time. To convert these to normal Python lists, simply use list(gen). (GH127)
The output of DataMuxer.bin_* functions is indexed by bin number (0, 1, 2…). The Event time is given as a column.
Metadatastore and filestore require configuration settings. They look in the following locations, in increasing order of precedence. Use #3 or #4 to customize your own metadatastore and filestore.

1. CONDA_ENV/etc/name.yaml (if CONDA_ETC_ env is defined)
2. /etc/name.yaml
3. ~/.config/name/connection.yml
4. reading environmental variables formatted like MDS_DATABASE or FS_DATABASE

For example, in ~/.config/metadatastore/connection.yaml:

    host: localhost
    port: 27017
    database: my_metadatastore
    timezone: US/Eastern

and likewise with filestore/connection.yaml. (Filestore does not need a timezone field, however.) If no configuration can be found, they will raise an error on import. We avoid defaults so that experimental data cannot be accidentally saved to an unsafe destination.
Metadatastore configuration also requires a timezone field, which it uses to interpret human-friendly datetimes.
Bug Fixes¶
All DataMuxer output is sorted by Event time. (GH134)