Usage

Simple Export

  1. Peruse the list of suitcases and install the suitcase for the format you want. For example, CSV.

    pip install suitcase-csv
    
  2. Access the bluesky “documents” containing the data/metadata that you want to export. For example, saved data can be accessed from the databroker.

    from databroker import Broker
    db = Broker.named('my_beamline')
    docs = db[-1].documents(fill=True)
    
  3. Use the export() function in the suitcase.

    import suitcase.csv
    suitcase.csv.export(docs, '')
    

    This will generate one or more files in the current directory. You may also specify a different directory like so:

    suitcase.csv.export(docs, 'path/to/usb_stick')
    

    The number of files generated depends on the format and on the specifics of the data being exported. For example, suitcase-csv generates one CSV file for each logical table (“stream”) in the data, so the count varies from run to run. The filenames are returned by the export() function.

    By default the file names are derived from the run’s unique ID, which is guaranteed to be unique but not very descriptive — names like e687d1b6-af34-4f8f-9f0d-2ebe1e1edcb7-primary.csv and e687d1b6-af34-4f8f-9f0d-2ebe1e1edcb7-baseline.csv. To tailor the name to your needs, you can specify a file prefix:

    suitcase.csv.export(docs, 'path/to/files', 'my-data-')
    

    which would lead to names like my-data-primary.csv and my-data-baseline.csv in this case.

    You can also template the file prefix with metadata (extracted from the RunStart document). Examples:

    export(docs, 'path/to/files', '{plan_name}-{motors}-')
    export(docs, 'path/to/files', '{time:%%Y-%%m-%%d_%%H:%%M}-') # timestamp
    export(docs, 'path/to/files', '{sample_name}')
    

    The last example assumes that sample_name was included in the metadata when the data was acquired.

  4. Repeat if multiple formats are desired. For example, you may wish to export to CSV (which captures only scalar data), TIFF (which captures only image data), and JSON (which is well-suited for exporting metadata). It may be useful to wrap these up in a custom function.

    from itertools import tee
    import suitcase.csv
    import suitcase.tiff
    import suitcase.json_metadata
    
    def my_exporter(docs, directory, file_prefix):
        docs1, docs2, docs3 = tee(docs, 3)
        suitcase.csv.export(docs1, directory, file_prefix)
        suitcase.tiff.export(docs2, directory, file_prefix)
        suitcase.json_metadata.export(docs3, directory, file_prefix)
    
    my_exporter(docs, 'path/to/files', 'my-data-')
    

    Note

    The first line of my_exporter above duplicates docs into three independent iterators. This is required because docs may be a generator, which is exhausted after a single pass, and we need to iterate over it three separate times.
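
The effect of tee can be seen without any suitcase installed. The sketch below uses a hypothetical fake_documents() generator as a stand-in for real bluesky documents; only itertools.tee is the real API here.

```python
from itertools import tee


def fake_documents():
    # Hypothetical stand-in for a document generator; yields (name, doc) pairs.
    yield ("start", {"uid": "abc"})
    yield ("event", {"seq_num": 1})
    yield ("stop", {"exit_status": "success"})


# A bare generator is exhausted after the first pass.
docs = fake_documents()
first_pass = [name for name, _ in docs]
second_pass = [name for name, _ in docs]  # nothing left to read

# tee() returns independent iterators over the same underlying stream.
docs1, docs2, docs3 = tee(fake_documents(), 3)
names1 = [name for name, _ in docs1]
names2 = [name for name, _ in docs2]
names3 = [name for name, _ in docs3]
```

Keep in mind that tee buffers items until every copy has consumed them, so when the copies are used one after another, as in my_exporter, the whole run is held in memory.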

Warning

Note that export() can only be used on one “run” (one RunStart document) at a time. To export multiple runs, loop over them and call export() once per run:

for header in db(since='2018-01'):
    export(header.documents(), '')
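
If the documents arrive as one flat stream rather than as separate headers, they have to be grouped by run first. The following is a minimal sketch of that grouping; split_runs is a hypothetical helper, not part of suitcase or databroker, and it assumes that runs are not interleaved and that each run begins with a 'start' document.

```python
def split_runs(docs):
    """Group a flat stream of (name, doc) pairs into one list per run.

    Hypothetical helper: a new run is assumed to begin at every 'start'
    document, and runs are assumed not to be interleaved.
    """
    run = []
    for name, doc in docs:
        if name == "start" and run:
            yield run
            run = []
        run.append((name, doc))
    if run:
        yield run


# Illustrative documents only; real documents carry many more keys.
stream = [
    ("start", {"uid": "run-1"}),
    ("event", {"seq_num": 1}),
    ("stop", {"run_start": "run-1"}),
    ("start", {"uid": "run-2"}),
    ("stop", {"run_start": "run-2"}),
]
runs = list(split_runs(stream))
```

Each element of runs could then be passed to export() on its own.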

Streaming Export

In addition to the export() function, each suitcase package implements a Serializer class. It produces exactly the same files and has the same options; export() is just a wrapper around Serializer. But where export() loops through a list or generator of documents, Serializer expects documents to be pushed through it, thus:

# Export documents from *one run only* in a streaming fashion.

from suitcase.csv import Serializer
serializer = Serializer('path/to/files')
for name, doc in docs:
    serializer(name, doc)

serializer.artifacts  # Access the filenames.

The filenames may be accessed at any time via serializer.artifacts. (This is what is returned by export().) The Serializer should be closed when finished. This closes all of the resources (e.g. files) that it has opened.
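
To illustrate how a pull-style export() can wrap a push-style Serializer, here is a minimal sketch. ToySerializer and toy_export are hypothetical stand-ins written for this example; the real classes write actual files, and the shape of artifacts shown here (a dict of lists) is an assumption.

```python
import io


class ToySerializer:
    """Hypothetical stand-in for a suitcase Serializer (not the real class)."""

    def __init__(self):
        self._buffer = io.StringIO()
        # Assumed shape: a label mapped to a list of outputs.
        self.artifacts = {"all": [self._buffer]}
        self.closed = False

    def __call__(self, name, doc):
        # Documents are pushed in one at a time.
        if self.closed:
            raise RuntimeError("serializer is already closed")
        self._buffer.write(f"{name}\n")

    def close(self):
        # A real serializer would release its files here.
        self.closed = True


def toy_export(docs):
    """Loop over documents, push each through a serializer, then close it."""
    serializer = ToySerializer()
    try:
        for name, doc in docs:
            serializer(name, doc)
    finally:
        serializer.close()
    return serializer.artifacts


artifacts = toy_export([("start", {}), ("event", {}), ("stop", {})])
```

The try/finally block ensures the serializer is closed even if one of the documents raises an error partway through.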

The Serializer is suitable for streaming export. Note that a given Serializer instance may only be used for one run (one RunStart document, one RunStop document, and everything in between). A new instance must be created for each new run. The RunRouter streamlines this process.

# Set up a RunRouter suitable for exporting from many runs.

from event_model import RunRouter
from suitcase.csv import Serializer

def factory(name, start_doc):

    serializer = Serializer('path/to/files')
    serializer('start', start_doc)

    return [serializer], []

rr = RunRouter([factory])

The RunRouter will call our factory at the beginning of each run, creating a fresh serializer instance and routing documents through it. We can push documents in directly

for name, doc in docs:
    rr(name, doc)

or subscribe it to the bluesky RunEngine to receive documents in a streaming fashion during acquisition.

RE.subscribe(rr)
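
The per-run dispatch can be pictured with a toy model. ToyRunRouter below is a hypothetical, heavily simplified illustration and is not event_model.RunRouter: it assumes runs arrive one after another, and it mirrors the factory above in leaving the RunStart document for the factory to handle.

```python
class ToyRunRouter:
    """Hypothetical, simplified illustration of per-run dispatch.

    NOT event_model.RunRouter. It only mimics the idea that each factory is
    called once per 'start' document and returns callbacks for that run.
    """

    def __init__(self, factories):
        self._factories = factories
        self._callbacks = []

    def __call__(self, name, doc):
        if name == "start":
            # New run: discard the old callbacks and build fresh ones.
            self._callbacks = []
            for factory in self._factories:
                callbacks, _ = factory(name, doc)
                self._callbacks.extend(callbacks)
            return  # the factory has already seen the start document
        for callback in self._callbacks:
            callback(name, doc)


received = {}


def factory(name, start_doc):
    # One fresh log per run, keyed by the run's uid.
    log = [name]
    received[start_doc["uid"]] = log

    def cb(name, doc):
        log.append(name)

    return [cb], []


rr = ToyRunRouter([factory])
for name, doc in [
    ("start", {"uid": "run-1"}),
    ("event", {"seq_num": 1}),
    ("stop", {"run_start": "run-1"}),
    ("start", {"uid": "run-2"}),
    ("stop", {"run_start": "run-2"}),
]:
    rr(name, doc)
```

After the loop, received holds one separate log per run, which is the isolation that a fresh Serializer per run relies on.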

For documents containing pointers to external files that need to be “filled” (that is, employing Resource and Datum documents), a Filler must be used as well. This is typically relevant for exporting images.

from event_model import RunRouter, Filler
import suitcase.tiff_series

def factory(name, start_doc):

    filler = Filler(...)
    serializer = suitcase.tiff_series.Serializer('path/to/files')
    serializer('start', start_doc)

    def cb(name, doc):
        filler(name, doc)  # Fill in place any externally-stored data.
        serializer(name, doc)

    return [cb], []

rr = RunRouter([factory])
RE.subscribe(rr)

Serialize to Any Buffer

While most users will use suitcase to write files to disk, advanced users may write to a memory buffer, a network socket, etc. This is useful if the data’s ultimate destination is a web client or another application that is ready to consume it. There is no need to waste time writing the data to disk and then reading it right back.

To support this naturally, suitcase’s architecture cleanly separates the serialization (documents-to-bytes) from the transport (what to do with the bytes).

This:

serializer = Serializer(directory)

is a shorthand for this:

from suitcase.utils import MultiFileManager

manager = MultiFileManager(directory)
serializer = Serializer(manager)

“Who asked for MultiFileManager?” you may ask. At first one might expect to simply hand the Serializer a writable buffer instead of a filename, as in Serializer(buffer). In fact, a more sophisticated interface is necessary because, for many formats, the Serializer needs to create multiple buffers, sometimes a mixture of text (string) buffers and binary (bytes) buffers. And for some formats, the number and type of buffers may vary from one dataset to another.

The MultiFileManager class handles opening the file(s) with the name requested by a Serializer and providing it with writable buffers. The Serializer interacts with files only indirectly, always mediated through the MultiFileManager. Therefore, to write to a different sort of buffer, you need only provide a different manager class. No changes are necessary to the Serializer itself.

This example will write the serialized data into memory buffers—subclasses of StringIO and/or BytesIO, as requested by a given Serializer. The buffers can then be accessed via serializer.artifacts or, equivalently, manager.artifacts.

from suitcase.utils import MemoryBuffersManager

manager = MemoryBuffersManager()
serializer = Serializer(manager)
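
The division of labor between serializer and manager can be sketched in a few lines. ToyMemoryManager and toy_serialize are hypothetical names invented for this illustration, and the open(label, postfix, mode) signature is an assumption; consult suitcase.utils for the actual manager interface.

```python
import io


class ToyMemoryManager:
    """Hypothetical manager that hands out in-memory buffers on request."""

    def __init__(self):
        self.artifacts = {}

    def open(self, label, postfix, mode):
        # Binary modes get a BytesIO, text modes get a StringIO.
        # postfix is unused here; a file-based manager would build a name from it.
        buffer = io.BytesIO() if "b" in mode else io.StringIO()
        self.artifacts.setdefault(label, []).append(buffer)
        return buffer


def toy_serialize(manager, rows, image_bytes):
    """Hypothetical serializer logic: it never opens files itself."""
    table = manager.open("table", "primary.csv", "w")
    for row in rows:
        table.write(",".join(str(value) for value in row) + "\n")
    image = manager.open("image", "frame.bin", "wb")
    image.write(image_bytes)


manager = ToyMemoryManager()
toy_serialize(manager, rows=[(1, 2.5), (2, 3.5)], image_bytes=b"\x00\x01")
```

Because toy_serialize only ever calls manager.open(), swapping in a manager that returns files or sockets would require no change to the serialization logic.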

There may be formats where it is not possible to write to anything but an ordinary file because the underlying I/O library requires a filename and cannot write to an arbitrary buffer. In that case, a clear error will be raised. See Write Your Own Suitcase for details.