Usage¶
Simple Export¶
Peruse the list of suitcases and install the suitcase for the format you want. For example, CSV.
pip install suitcase-csv
Access the bluesky “documents” containing the data/metadata that you want to export. For example, saved data can be accessed from the databroker.
from databroker import Broker
db = Broker.named('my_beamline')
docs = db[-1].documents(fill=True)
Use the export() function in the suitcase.
import suitcase.csv
suitcase.csv.export(docs, '')
This will generate one or more files in the current directory. You may also specify a different directory like so:
suitcase.csv.export(docs, 'path/to/usb_stick')
The number of files generated depends on the format and also the specifics of the data being exported. For example, suitcase-csv generates one CSV file for each logical table (“stream”) in the data, so the number of files varies from run to run. The filenames are returned by the export() function.
By default the file names are derived from the run’s unique ID, which is guaranteed to be unique but not very descriptive, giving names like e687d1b6-af34-4f8f-9f0d-2ebe1e1edcb7-primary.csv and e687d1b6-af34-4f8f-9f0d-2ebe1e1edcb7-baseline.csv. To tailor the name to your needs, you can specify a file prefix:
suitcase.csv.export(docs, 'path/to/files', 'my-data-')
which would lead to names like my-data-primary.csv and my-data-baseline.csv in this case.
You can also template the file prefix with metadata (extracted from the RunStart document). Examples:
export(docs, 'path/to/files', '{plan_name}-{motors}-')
export(docs, 'path/to/files', '{time:%%Y-%%m-%%d_%%H:%%M}-')  # timestamp
export(docs, 'path/to/files', '{sample_name}')
The last example assumes that sample_name was included in the metadata when the data was acquired.
Repeat if multiple formats are desired. For example, you may wish to export to CSV (which captures only scalar data), TIFF (which captures only image data), and JSON (which is well-suited for exporting metadata). It may be useful to wrap these up in a custom function.
from itertools import tee
import suitcase.csv
import suitcase.tiff_series
import suitcase.json_metadata

def my_exporter(docs, directory, file_prefix):
    docs1, docs2, docs3 = tee(docs, 3)
    suitcase.csv.export(docs1, directory, file_prefix)
    suitcase.tiff_series.export(docs2, directory, file_prefix)
    suitcase.json_metadata.export(docs3, directory, file_prefix)

my_exporter(docs, 'path/to/files', 'my-data-')
Note
The first line in my_exporter above duplicates docs into 3 identical versions. It is required because docs may be a generator that will be exhausted when used, and we need to use it 3 independent times.
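To see why the duplication matters, here is a small standard-library sketch. The fake_docs generator and its contents are made up for illustration and are not real bluesky documents. A generator can be consumed only once, whereas itertools.tee hands back independent iterators over the same items.

```python
from itertools import tee

def fake_docs():
    # Hypothetical stand-in for a stream of (name, doc) pairs.
    yield ('start', {'uid': 'abc'})
    yield ('stop', {'run_start': 'abc'})

# Consuming the same generator twice: the second pass sees nothing.
docs = fake_docs()
first_pass = [name for name, _ in docs]
second_pass = [name for name, _ in docs]

# tee() gives each consumer its own iterator over the same documents.
docs_a, docs_b = tee(fake_docs(), 2)
names_a = [name for name, _ in docs_a]
names_b = [name for name, _ in docs_b]
```

Note that tee buffers items until every copy has consumed them, so exporting the copies one after another holds the whole run in memory.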
Warning
Note that export() can only be used on one “run” (one RunStart document) at a time. Do multiple runs like this:
for header in db(since='2018-01'):
    export(header.documents(), '')
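The metadata templating of the file prefix described earlier in this section can be illustrated with plain str.format. This is only a sketch: it assumes the prefix is expanded by formatting it with the fields of the RunStart document, and the render_prefix helper and the example metadata are hypothetical, not part of suitcase.

```python
def render_prefix(template, start_doc):
    """Expand a file-prefix template using the fields of a RunStart-like dict."""
    # Assumption: every {field} in the template names a key in start_doc.
    return template.format(**start_doc)

# Hypothetical RunStart metadata.
start_doc = {'uid': 'e687d1b6', 'plan_name': 'scan', 'sample_name': 'Cu-foil'}

prefix = render_prefix('{plan_name}-{sample_name}-', start_doc)

# A field that was never recorded cannot be templated.
try:
    render_prefix('{operator}-', start_doc)
    missing_field_raised = False
except KeyError:
    missing_field_raised = True
```

Under this assumption, a template that names a field missing from the metadata fails with a KeyError, which is why the sample_name example depends on that field having been recorded at acquisition time.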
Streaming Export¶
In addition to the export() function, each suitcase package implements a Serializer class. It produces exactly the same files and has the same options; export() is just a wrapper around Serializer. But where export() loops through a list or generator of documents, Serializer expects documents to be pushed through, thus:
# Export documents from *one run only* in a streaming fashion.
from suitcase.csv import Serializer
serializer = Serializer('path/to/files')
for name, doc in docs:
    serializer(name, doc)
serializer.artifacts  # Access the filenames.
The filenames may be accessed at any time via serializer.artifacts. (This is what is returned by export().) The Serializer should be closed when finished. This closes all of the resources (e.g. files) that it has opened.
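The push-and-close pattern can be sketched with a toy class. ToySerializer below is hypothetical and is not a suitcase class. The shape of its artifacts attribute (a dict mapping a label to a list) and its close() method are assumptions chosen to mirror the description above. The point is the try/finally, which releases resources even if a document raises midway.

```python
import io

class ToySerializer:
    """Hypothetical push-style serializer; not part of suitcase."""

    def __init__(self):
        self._buffer = io.StringIO()
        # Assumed layout: label -> list of outputs produced so far.
        self.artifacts = {'stream_data': [self._buffer]}
        self.closed = False

    def __call__(self, name, doc):
        # Write one line per document pushed in.
        self._buffer.write(f"{name},{doc.get('uid', '')}\n")

    def close(self):
        # A real serializer would close its files here.
        self.closed = True

docs = [('start', {'uid': 'run-1'}),
        ('event', {'uid': 'ev-1'}),
        ('stop', {'uid': 'stop-1'})]

serializer = ToySerializer()
try:
    for name, doc in docs:
        serializer(name, doc)
finally:
    serializer.close()

text = serializer.artifacts['stream_data'][0].getvalue()
```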
This is suitable for streaming export. Note that a given Serializer instance may only be used for one run (one RunStart document, one RunStop document, and everything in between). A new instance must be created for each new run. The RunRouter streamlines this process.
# Set up a RunRouter suitable for exporting from many runs.
from event_model import RunRouter
from suitcase.csv import Serializer

def factory(name, start_doc):
    serializer = Serializer('path/to/files')
    serializer('start', start_doc)
    return [serializer], []

rr = RunRouter([factory])
The RunRouter will call our factory at the beginning of each run, creating a fresh serializer instance and routing documents through it. We can push documents in directly
for name, doc in docs:
    rr(name, doc)
or subscribe the RunRouter to the bluesky RunEngine to receive documents in a streaming fashion during acquisition.
RE.subscribe(rr)
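The routing idea can be sketched in a few lines of plain Python. ToyRunRouter below is a hypothetical, heavily simplified stand-in written for illustration; it is not event_model's implementation and ignores subfactories, Resource and Datum documents, and error handling. It shows the factory being called once per RunStart document and later documents being dispatched to the callbacks created for their run, even when runs are interleaved.

```python
class ToyRunRouter:
    """Hypothetical, simplified stand-in for a run router."""

    def __init__(self, factories):
        self._factories = factories
        self._callbacks = {}          # run uid -> callbacks for that run
        self._descriptor_to_run = {}  # descriptor uid -> run uid

    def __call__(self, name, doc):
        if name == 'start':
            run_uid = doc['uid']
            self._callbacks[run_uid] = []
            for factory in self._factories:
                # The factory sees the start document and returns callbacks.
                callbacks, _subfactories = factory(name, doc)
                self._callbacks[run_uid].extend(callbacks)
            return
        if name == 'descriptor':
            self._descriptor_to_run[doc['uid']] = doc['run_start']
        if name == 'event':
            run_uid = self._descriptor_to_run[doc['descriptor']]
        else:
            run_uid = doc['run_start']
        for callback in self._callbacks[run_uid]:
            callback(name, doc)
        if name == 'stop':
            del self._callbacks[run_uid]  # the run is finished

received = {}

def factory(name, start_doc):
    log = received.setdefault(start_doc['uid'], [])
    log.append(name)  # the factory itself handles the start document

    def callback(name, doc):
        log.append(name)

    return [callback], []

router = ToyRunRouter([factory])
stream = [
    ('start', {'uid': 'run-1'}),
    ('start', {'uid': 'run-2'}),
    ('descriptor', {'uid': 'd1', 'run_start': 'run-1'}),
    ('descriptor', {'uid': 'd2', 'run_start': 'run-2'}),
    ('event', {'uid': 'e1', 'descriptor': 'd1'}),
    ('event', {'uid': 'e2', 'descriptor': 'd2'}),
    ('stop', {'uid': 's1', 'run_start': 'run-1'}),
    ('stop', {'uid': 's2', 'run_start': 'run-2'}),
]
for name, doc in stream:
    router(name, doc)
```

In this sketch the router does not forward the start document to the new callbacks, so the factory handles it itself, mirroring the serializer('start', start_doc) call in the factory above.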
For documents containing pointers to external files that need to be “filled” (that is, employing Resource and Datum documents), a Filler must be used as well. This is typically relevant for exporting images.
from event_model import RunRouter, Filler
import suitcase.tiff_series

def factory(name, start_doc):
    filler = Filler(...)
    serializer = suitcase.tiff_series.Serializer('path/to/files')
    serializer('start', start_doc)

    def cb(name, doc):
        filler(name, doc)  # Fill in place any externally-stored data.
        serializer(name, doc)

    return [cb], []

rr = RunRouter([factory])
RE.subscribe(rr)
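The ordering inside cb matters: the fill step mutates the document in place, so the serializer that runs afterwards sees real data instead of a reference. The following toy sketch illustrates only that ordering. EXTERNAL_STORE, toy_fill and toy_serializer are hypothetical names, and the document layout is a simplified assumption; a real Filler resolves references through Resource and Datum documents and handler classes.

```python
# Hypothetical lookup table standing in for data stored in external files.
EXTERNAL_STORE = {'datum-1': [[0, 1], [2, 3]]}

def toy_fill(name, doc):
    """Replace references in an event document with the data, in place."""
    if name != 'event':
        return
    for key, is_filled in doc['filled'].items():
        if not is_filled:
            doc['data'][key] = EXTERNAL_STORE[doc['data'][key]]
            doc['filled'][key] = True

seen = []

def toy_serializer(name, doc):
    # Record what the serializer would have written.
    if name == 'event':
        seen.append(doc['data']['image'])

def cb(name, doc):
    toy_fill(name, doc)        # fill first ...
    toy_serializer(name, doc)  # ... then serialize the filled document

event = {'data': {'image': 'datum-1'}, 'filled': {'image': False}}
cb('event', event)
```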
Serialize to Any Buffer¶
While most users will use suitcase to write files to disk, advanced users may write to a memory buffer, a network socket, etc. This is useful if the data’s ultimate destination is a web client or some other application that is ready to consume it. There is no need to waste time writing the data to disk and then reading it right back.
To support this naturally, suitcase’s architecture cleanly separates the serialization (documents-to-bytes) from the transport (what to do with the bytes).
This:
serializer = Serializer(directory)
is a shorthand for this:
from suitcase.utils import MultiFileManager
manager = MultiFileManager(directory)
serializer = Serializer(manager)
“Who asked for MultiFileManager?” you may ask. At first one might expect to simply hand the Serializer a writable buffer instead of a filename, as in Serializer(buffer). In fact, a more sophisticated interface is necessary because, for many formats, the Serializer needs to create multiple buffers, sometimes a mixture of text (string) buffers and binary (bytes) buffers. And for some formats, the number and type of buffers may vary from one dataset to another.
The MultiFileManager class handles opening the file(s) with the name requested by a Serializer and providing it with writable buffers. The Serializer interacts with files only indirectly, always mediated through the MultiFileManager. Therefore, to write to a different sort of buffer, you need only provide a different manager class. No changes are necessary to the Serializer itself.
This example will write the serialized data into memory buffers (subclasses of StringIO and/or BytesIO, as requested by a given Serializer). The buffers can then be accessed via serializer.artifacts or, equivalently, manager.artifacts.
from suitcase.utils import MemoryBuffersManager
manager = MemoryBuffersManager()
serializer = Serializer(manager)
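The separation between serialization and transport can be sketched with a toy manager. Everything here is hypothetical: ToyMemoryManager, its open(label, postfix, mode) method and the toy_serialize function are illustrative assumptions, not suitcase's actual interface (consult suitcase.utils for that). The serializing function asks the manager for buffers and never touches the filesystem itself, so swapping the manager changes where the bytes go.

```python
import io

class ToyMemoryManager:
    """Hypothetical manager that hands out in-memory buffers."""

    def __init__(self):
        self.artifacts = {}  # label -> list of buffers handed out

    def open(self, label, postfix, mode):
        # Text or binary buffer, depending on what the caller requests.
        buffer = io.BytesIO() if 'b' in mode else io.StringIO()
        self.artifacts.setdefault(label, []).append(buffer)
        return buffer

def toy_serialize(manager, rows):
    """Write rows as CSV text plus a small binary blob via the manager."""
    table = manager.open('stream_data', 'primary.csv', 'xt')
    for row in rows:
        table.write(','.join(str(value) for value in row) + '\n')
    blob = manager.open('stream_data', 'raw.bin', 'xb')
    blob.write(bytes([1, 2, 3]))

manager = ToyMemoryManager()
toy_serialize(manager, [(1, 2.5), (2, 3.5)])
text_buffer, bytes_buffer = manager.artifacts['stream_data']
```

A disk-backed manager with the same open method could be passed to toy_serialize unchanged, which is the design point the text makes about the Serializer.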
There may be formats where it is not possible to write to anything but an ordinary file because the underlying I/O library requires a filename and cannot write to an arbitrary buffer. In that case, a clear error will be raised. See Write Your Own Suitcase for details.