Usage

There is a Python interface, but most users will find the command-line tool suitable for their needs.

Packing a Catalog

For the command line tool databroker-pack you must provide:

  • The name of the source catalog

  • The name of the target directory

  • Which Runs in the Catalog to pack: either --all, a query such as a time window, or a list of --uids.

The result is a directory, which you can optionally compress and transfer by any convenient means.
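As a sketch of the compress-and-transfer step, using standard tar (the directory name below is a hypothetical stand-in for real packed output):

```shell
# Hypothetical stand-in for a directory produced by databroker-pack.
mkdir -p directory_from_pack
# Compress it for transfer...
tar -czf directory_from_pack.tar.gz directory_from_pack
# ...and extract it on the destination machine.
tar -xzf directory_from_pack.tar.gz
```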

Examples

List the available options for CATALOG and exit.

databroker-pack --list-catalogs
<list of catalog names>

Export every Run in the Catalog into a self-contained directory with Documents and any external files (e.g. large array data from detectors).

databroker-pack CATALOG --all DIRECTORY --copy-external

Or, read the data from the external files and place it directly in the documents. This may make data access slower and less flexible, but it removes the requirement for the recipient to install any special I/O code to deal with detector formats.

databroker-pack CATALOG --all DIRECTORY --fill-external

Or, omit the external files and transfer them separately. The DIRECTORY will still contain text file manifests listing the locations of the external files on the source system, suitable for feeding to tools like rsync or globus transfer --batch. This is the recommended approach for very large transfers.

databroker-pack CATALOG --all DIRECTORY
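Each manifest is a plain text file with one file path per line. As a self-contained sketch of what the recipient does with such a list (the file names here are hypothetical stand-ins; at scale you would hand the manifest to rsync or Globus instead of copying in Python):

```python
# Self-contained sketch: copy every file listed in a manifest.
# "external_files_manifest.txt" and "detector_image.h5" are hypothetical
# stand-ins for names written by databroker-pack into DIRECTORY.
import os
import shutil
import tempfile

src = tempfile.mkdtemp()  # stand-in for the source filesystem
dst = tempfile.mkdtemp()  # stand-in for the destination

# Fake an external detector file and a manifest that points at it.
external_file = os.path.join(src, "detector_image.h5")
with open(external_file, "w") as f:
    f.write("detector data")
manifest = os.path.join(src, "external_files_manifest.txt")
with open(manifest, "w") as f:
    f.write(external_file + "\n")

# Copy each listed file (rsync or globus would do this for large transfers).
with open(manifest) as f:
    for line in f:
        shutil.copy(line.strip(), dst)
```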

Export Runs from a range of time.

databroker-pack CATALOG -q "TimeRange(since='2020')" DIRECTORY
databroker-pack CATALOG -q "TimeRange(since='2020', until='2020-03-01')" DIRECTORY

Export Runs from a range of time with a certain plan_name.

databroker-pack CATALOG -q "TimeRange(since='2020')" -q "{'plan_name': 'count'}" DIRECTORY

Export a specific Run by its scan_id

databroker-pack CATALOG -q "{'scan_id': 126360}" DIRECTORY

Export specific Runs given by their Run Start UID (or the first several characters) entered at the command prompt…

databroker-pack CATALOG --uids -
3c93c54e
47587fa8
ebad8c01
<Ctrl D>

…or read from a file.

databroker-pack CATALOG --uids uids_to_pack.txt
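The file format is simply one Run Start UID (or prefix) per line. A minimal sketch of generating such a file, reusing the UIDs from the interactive example above:

```python
# Write one Run Start UID per line, the format that
# `databroker-pack CATALOG --uids uids_to_pack.txt` expects.
uids = ["3c93c54e", "47587fa8", "ebad8c01"]
with open("uids_to_pack.txt", "w") as f:
    f.write("\n".join(uids) + "\n")
```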

Unpacking a Packed Catalog

There are two ways to do this:

1. inplace — Run databroker on top of the files as they are. This is recommended only for small exports (tens of Runs) when a Mongo database is not available.

2. mongo_normalized — Copy the documents from the packed directory into MongoDB, and point databroker at MongoDB.

Option 1: Unpacking “in place”

Use databroker-unpack to make DIRECTORY automatically discoverable by databroker. You must specify a NAME to give the catalog.

If the name already exists, the catalog will be updated to include content from the pre-existing location(s) and the new one. If you want to ensure that this catalog name is unique, prohibiting automatic merging, use the flag --no-merge.

databroker-unpack inplace DIRECTORY NAME

For example

databroker-unpack inplace path/to/directory_from_pack my_data

It is important not to move the directory after you do this.

Option 2: Unpacking into MongoDB

Note

If you need to install MongoDB, we refer you to the official guides for installing the MongoDB Community Edition.

Use databroker-unpack to copy the data from the documents (stored in the pack directory as .msgpack or .jsonl files) into MongoDB. Any external files (e.g. large detector images stored separately) will be left where they are and must not be deleted or moved once databroker-unpack has been run.

If the name already exists, the catalog will be updated to include content from the pre-existing location(s) and the new one. If you want to ensure that this catalog name is unique, prohibiting automatic merging, use the flag --no-merge.

databroker-unpack mongo_normalized DIRECTORY NAME

For example

databroker-unpack mongo_normalized path/to/directory_from_pack my_data

By default, this looks for an unauthenticated MongoDB running on localhost on the standard port. A custom MongoDB URI may be specified using the option --mongo-uri MONGO_URI. See databroker-unpack --help for more information.

Using an Unpacked Catalog

The newly “unpacked” catalog (e.g. my_data) will then show up in

databroker-pack --list-catalogs

and can be accessed like

>>> import databroker
>>> db = databroker.catalog['my_data'].get()

This catalog, db, contains the packed Runs, which can be accessed in the usual ways: db['<uid>'], db[<scan_id>], db[-1], or fully enumerated with list(db) (unwise if the Catalog is huge).

Use Without Unpacking

Alternatively, you can run databroker on top of a directory generated by databroker-pack without any unpacking step.

Important

Currently, the following only works if the packed directory is in its original location. In a future release, it will also work if the directory has been moved or copied to a different location.

import intake
catalog = intake.open_catalog('DIRECTORY/catalog.yml')

replacing DIRECTORY with the path to the directory generated by databroker-pack. The file defines a catalog named 'packed_catalog', which you can open like so.

db = catalog["packed_catalog"].get()