Case Study: Reading and Exporting a Specialized Format
For this guide, we’ll use the example of XDI, a formalized, text-based format for X-ray spectroscopy data. An example XDI file is reproduced below.
```
# XDI/1.0 GSE/1.0
# Column.1: energy eV
# Column.2: i0
# Column.3: itrans
# Column.4: mutrans
# Element.edge: K
# Element.symbol: Cu
# Scan.edge_energy: 8980.0
# Mono.name: Si 111
# Mono.d_spacing: 3.13553
# Beamline.name: 13ID
# Beamline.collimation: none
# Beamline.focusing: yes
# Beamline.harmonic_rejection: rhodium-coated mirror
# Facility.name: APS
# Facility.energy: 7.00 GeV
# Facility.xray_source: APS Undulator A
# Scan.start_time: 2001-06-26T22:27:31
# Detector.I0: 10cm N2
# Detector.I1: 10cm N2
# Sample.name: Cu
# Sample.prep: Cu metal foil
# GSE.EXTRA: config 1
# ///
# Cu foil Room Temperature
# measured at beamline 13-ID
#----
# energy i0 itrans mutrans
8779.0 149013.7 550643.089065 -1.3070486
8789.0 144864.7 531876.119084 -1.3006104
8799.0 132978.7 489591.10592 -1.3033816
8809.0 125444.7 463051.104096 -1.3059724
8819.0 121324.7 449969.103983 -1.3107085
8829.0 119447.7 444386.117562 -1.3138152
8839.0 119100.7 440176.091039 -1.3072055
8849.0 117707.7 440448.106567 -1.3195882
8859.0 117754.7 442302.10637 -1.3233895
8869.0 117428.7 441944.116528 -1.3253521
8879.0 117383.7 442810.120466 -1.327693
8889.0 117185.7 443658.11566 -1.3312944
```
As you can see, the file’s contents comprise a single table and a header of key–value pairs (plus a few free-form comment lines). Therefore, it makes sense to export DataFrame structures and their associated metadata into this format.
Consider a folder `data/` containing a number of XDI files that we wish to serve using Tiled. An example folder containing a single `example.xdi` file can be generated by running `python -m tiled.examples.xdi`.
Note
All code below is in `tiled/examples/xdi.py`. The code is reproduced for reference, but running the examples only requires modifying `config.yml`.
Read the XDI file
The first step in exposing this data via Tiled is to write a function that can read XDI-formatted files.
```python
from tiled.adapters.dataframe import DataFrameAdapter
from tiled.structures.core import Spec


def read_xdi(filepath, specs=None, **kwargs):
    # Parse file into pandas.DataFrame and dict of metadata.
    df = ...
    metadata = ...
    # Tag the adapter with the "xdi" spec so that XDI-specific exporters apply.
    # (No trailing comma here: that would turn `specs` into a tuple.)
    specs = list(specs or []) + [Spec("xdi", version="1.0")]
    return DataFrameAdapter.from_pandas(df, metadata=metadata, specs=specs, **kwargs)
```
See the source code of `tiled.examples.xdi` for a fully worked example.
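The `df = ...` and `metadata = ...` placeholders above stand for the actual parsing. To give a rough idea of what that involves, here is a minimal, hypothetical parser (`parse_xdi_text` is our own name, not a Tiled or XDI-library function). It assumes the layout of the example file: a version line, `Namespace.tag: value` header lines, free-form comments after `///`, a `---` separator, a line of column labels, and whitespace-separated numbers. The metadata keys are chosen to match what the exporter defined later in this guide reads.

```python
import io

import pandas as pd


def parse_xdi_text(text):
    """Split XDI-formatted text into (DataFrame, metadata dict).

    Hypothetical helper, not the code in tiled.examples.xdi.
    """
    lines = text.splitlines()
    # First line, e.g. "# XDI/1.0 GSE/1.0"
    versions = lines[0].lstrip("#").split()
    xdi_version = versions[0].split("/", 1)[1]
    extra_version = " ".join(versions[1:]) or None

    fields, comments, labels = {}, [], None
    in_comments = False
    data_start = len(lines)
    for i, line in enumerate(lines[1:], start=1):
        if not line.startswith("#"):
            data_start = i  # first line of the numeric table
            break
        body = line[1:].strip()
        if body.startswith("///"):
            in_comments = True  # free-form comments follow
        elif body.startswith("---"):
            in_comments = False  # next header line holds the column labels
        elif in_comments:
            comments.append(line + "\n")  # keep the raw "# ..." line
        elif ":" in body and "." in body.split(":", 1)[0]:
            key, value = body.split(":", 1)  # "Namespace.tag: value"
            namespace, tag = key.split(".", 1)
            fields.setdefault(namespace, {})[tag] = value.strip()
        else:
            labels = body.split()  # e.g. "energy i0 itrans mutrans"

    table = "\n".join(lines[data_start:])
    df = pd.read_csv(io.StringIO(table), sep=r"\s+", header=None, names=labels)
    metadata = {
        "xdi_version": xdi_version,
        "extra_version": extra_version,
        "fields": fields,
        "comments": "".join(comments),
    }
    return df, metadata
```

Assuming `filepath` is a local path, the placeholders in `read_xdi` could then become `df, metadata = parse_xdi_text(pathlib.Path(filepath).read_text())`. Header values are kept as strings; converting, say, `Scan.edge_energy` to a float is left to the caller.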
Now take the following simple server configuration:
```yaml
# config.yml
trees:
  - path: /
    tree: tiled.catalog:from_uri
    args:
      uri: ./catalog.db
      readable_storage:
        - ./data/
      adapters_by_mimetype:
        application/x-xdi: tiled.examples.xdi:read_xdi
```
and serve it:
```
tiled serve config --public config.yml --api-key secret
```
Register the files:
```
tiled register http://localhost:8000/ \
  --api-key secret \
  --verbose \
  --ext '.xdi=application/x-xdi' \
  --adapter 'application/x-xdi=tiled.examples.xdi:read_xdi' \
  data/
```
As it stands, we can already access the data in standard formats such as CSV:
```
$ curl -H 'Accept: text/csv' 'http://localhost:8000/api/v1/table/full/example'
,energy,i0,itrans,mutrans
0,8779.0,149013.7,550643.089065,-1.3070486
1,8789.0,144864.7,531876.119084,-1.3006104
2,8799.0,132978.7,489591.10592,-1.3033816
3,8809.0,125444.7,463051.104096,-1.3059724
4,8819.0,121324.7,449969.103983,-1.3107085
5,8829.0,119447.7,444386.117562,-1.3138152
6,8839.0,119100.7,440176.091039,-1.3072055
7,8849.0,117707.7,440448.106567,-1.3195882
8,8859.0,117754.7,442302.10637,-1.3233895
9,8869.0,117428.7,441944.116528,-1.3253521
10,8879.0,117383.7,442810.120466,-1.327693
11,8889.0,117185.7,443658.11566,-1.3312944
```
Note
There are three equivalent ways to request a format, more formally called a “media type” or a “MIME type”.
1. Use the standard [HTTP `Accept` Header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept).
```
$ curl -H 'Accept: text/csv' 'http://localhost:8000/api/v1/table/full/example'
```
2. Place the media type in a `format` query parameter.
```
$ curl 'http://localhost:8000/api/v1/table/full/example?format=text/csv'
```
3. Provide just a file extension. This is user-friendly for people who do not know or care what
   a "media type" is. The server looks up `csv` in a registry mapping file extensions to media types.
```
$ curl 'http://localhost:8000/api/v1/table/full/example?format=csv'
```
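The three equivalent requests can also be built from Python with only the standard library. This is a minimal sketch: `format_requests` is a hypothetical helper name, and it assumes the local server and `example` path used throughout this guide. Note that `urlencode` percent-encodes the `/` in a media type (`text%2Fcsv`), which the server decodes back.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Example URL from this guide; assumes a local server on port 8000.
BASE = "http://localhost:8000/api/v1/table/full/example"


def format_requests(media_type, extension, base=BASE):
    """Build (but do not send) the three equivalent format requests."""
    return [
        # 1. Standard HTTP Accept header
        Request(base, headers={"Accept": media_type}),
        # 2. Media type in the `format` query parameter ("/" is percent-encoded)
        Request(f"{base}?{urlencode({'format': media_type})}"),
        # 3. Bare file extension in the `format` query parameter
        Request(f"{base}?{urlencode({'format': extension})}"),
    ]
```

Passing any of these requests to `urllib.request.urlopen` against a running server should return the same CSV body.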
Define an exporter
When a client requests an XDI-formatted response, the Tiled server will call our custom exporter with two arguments: the structure itself (in this case, a `pandas.DataFrame`) and a dictionary of metadata.
The metadata is freeform as far as Tiled is concerned: its content and any internal structure are completely up to the user. So if we have special requirements about what it must contain, we must perform that validation inside our exporter. We might, for example, refuse to export (raise an error) if required fields are missing from the metadata or if the DataFrame we are given does not have the expected columns.
The exporter must return either `str` or `bytes`.
```python
import io


def write_xdi(df, metadata):
    output = io.StringIO()

    # Version line, e.g. "# XDI/1.0 GSE/1.0"
    xdi_version = metadata.get("xdi_version")
    comments = metadata.get("comments", "")
    extra_version = metadata.get("extra_version")
    output.write(f"# XDI/{xdi_version} {extra_version}\n")

    # Header fields, one "# Namespace.tag: value" line each
    fields = metadata["fields"]
    for namespace, namespace_dict in fields.items():
        for tag, value in namespace_dict.items():
            output.write(f"# {namespace}.{tag}: {value}\n")

    # Free-form comments go between the "///" and "---" separator lines
    output.write("# /////////////\n")
    output.write("# generated by tiled\n")
    output.write(comments)
    output.write("# -------------\n")

    # Write column labels
    output.write("# ")
    output.write(" ".join(str(column) for column in df.columns))
    output.write("\n")

    # Write data; XDI separates columns with whitespace, not commas
    df.to_csv(output, sep=" ", header=False, index=False)
    return output.getvalue()
```
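To make the validation idea above concrete, here is a hypothetical `validate_xdi_inputs` helper. The required columns and header tags in it are illustrative assumptions, not rules imposed by Tiled, and the checks assume the `fields` layout that `write_xdi` reads (each namespace mapped to a dict of tags).

```python
import pandas as pd

# Hypothetical requirements, chosen for illustration only; adjust to your needs.
REQUIRED_COLUMNS = {"energy"}
REQUIRED_FIELDS = {"Element": {"symbol", "edge"}}


def validate_xdi_inputs(df, metadata):
    """Raise ValueError if df/metadata break our (assumed) XDI requirements."""
    missing_cols = REQUIRED_COLUMNS - set(df.columns)
    if missing_cols:
        raise ValueError(f"DataFrame is missing columns: {sorted(missing_cols)}")

    fields = metadata.get("fields")
    if not isinstance(fields, dict):
        raise ValueError("metadata['fields'] must be a dict of namespaces")
    for namespace, tags in REQUIRED_FIELDS.items():
        missing = tags - set(fields.get(namespace, {}))
        if missing:
            raise ValueError(f"metadata is missing {namespace} tags: {sorted(missing)}")

    # Column labels are written space-separated, so they must not contain whitespace.
    bad = [c for c in df.columns if any(ch.isspace() for ch in str(c))]
    if bad:
        raise ValueError(f"column labels contain whitespace: {bad}")
```

Calling `validate_xdi_inputs(df, metadata)` as the first statement of `write_xdi` would make an XDI request fail with an error rather than produce a malformed file. The whitespace check matters because the column-label line is split on spaces when the file is read back.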
Register the exporter
Add new sections to the configuration as follows.
```yaml
trees:
  - path: /
    tree: tiled.catalog:from_uri
    args:
      uri: ./catalog.db
      readable_storage:
        - ./data/
      adapters_by_mimetype:
        application/x-xdi: tiled.examples.xdi:read_xdi
media_types:
  xdi:
    application/x-xdi: tiled.examples.xdi:write_xdi
file_extensions:
  xdi: application/x-xdi
```
First consider the section

```yaml
media_types:
  xdi:
    application/x-xdi: tiled.examples.xdi:write_xdi
```
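The key, `xdi`, refers to a spec, so this exporter applies only to adapters (readers) that have `"xdi"` in their `specs` attribute. This is why `read_xdi` appends `Spec("xdi", version="1.0")` to the specs it passes along.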
The nested key, `application/x-xdi`, is a media type. In our case, there is no registered IANA media type for the format of interest, so, following the common convention for unregistered types, we invent one of the form `application/x-*`, as in `application/x-xdi`. There is, of course, some risk of name collisions when we invent names outside the official list, so be specific.
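The two levels of keys can be pictured as a nested lookup: first by spec, then by media type. The sketch below is a toy model under that assumption (specs are given as plain names, exporters as `module:function` strings, and `find_spec_exporter` is a hypothetical name); it is not how Tiled stores or dispatches exporters internally.

```python
# Toy model of the `media_types` config section: spec name -> {media type -> exporter}.
# This is NOT Tiled's internal data structure.
MEDIA_TYPES = {
    "xdi": {"application/x-xdi": "tiled.examples.xdi:write_xdi"},
}


def find_spec_exporter(specs, media_type, registry=MEDIA_TYPES):
    """Return the exporter registered for `media_type` under one of `specs`, else None."""
    for spec in specs:  # spec names, e.g. ["xdi"]
        exporter = registry.get(spec, {}).get(media_type)
        if exporter is not None:
            return exporter
    return None
```

A `None` result only means that no spec-specific exporter matched; as the CSV request earlier shows, the standard table formats stay available for the same node.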
The final section

```yaml
file_extensions:
  xdi: application/x-xdi
```

enables the usage

```
$ curl 'http://...?format=xdi'
```

by mapping `"xdi"` to the media type. This is optional: you can provide no file extensions for a media type, or you can provide multiple file extensions that map to the same media type. For example, both `tif` and `tiff` map to `image/tiff`.
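The extension registry can be modeled as a plain dictionary. This is a hypothetical sketch: `resolve_format` is our own name, the rule that a value containing `/` is already a media type is an assumption made for illustration, and the table lists only the mappings discussed in this guide.

```python
# Extension -> media type, covering only the examples from this guide.
FILE_EXTENSIONS = {
    "csv": "text/csv",
    "tif": "image/tiff",
    "tiff": "image/tiff",  # several extensions may share one media type
    "xdi": "application/x-xdi",
}


def resolve_format(fmt, extensions=FILE_EXTENSIONS):
    """Map a `format` query value to a media type (assumed rules, see above)."""
    if "/" in fmt:
        return fmt  # already a media type, e.g. "application/x-xdi"
    try:
        return extensions[fmt]
    except KeyError:
        raise ValueError(f"unknown file extension: {fmt!r}") from None
```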
Returning to the `media_types` section, the value `tiled.examples.xdi:write_xdi` is the module or package that our exporter is defined in, followed by `:` and then the function name.
The Python file where `write_xdi` is defined must be in an importable location.
During configuration parsing, Tiled temporarily adds the directory containing the config file itself to `sys.path`. This means that we can conveniently drop `xdi.py` next to `config.yml` and know that it will be found (it would then be referenced as `xdi:write_xdi`).
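Resolving a `module:function` string, with the config directory temporarily on `sys.path`, takes only a few lines of standard-library code. This sketch is a hypothetical stand-in (`prepend_sys_path` and `import_object` are our own names), not Tiled's implementation.

```python
import importlib
import sys
from contextlib import contextmanager


@contextmanager
def prepend_sys_path(directory):
    """Temporarily put `directory` at the front of sys.path."""
    sys.path.insert(0, str(directory))
    try:
        yield
    finally:
        sys.path.remove(str(directory))


def import_object(target):
    """Resolve a 'package.module:attribute' string to the object it names."""
    module_name, _, attr_name = target.partition(":")
    if not attr_name:
        raise ValueError(f"expected 'module:attribute', got {target!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)
```

For example, with `xdi.py` saved next to `config.yml`, `with prepend_sys_path(config_dir): write_xdi = import_object("xdi:write_xdi")` would load the exporter (`config_dir` being a hypothetical variable holding that directory). The imported module stays in `sys.modules` after the path is removed, so later imports of the same name reuse it.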
For long-term deployments, it’s better to place exporters in installable Python packages (e.g., with a proper `setup.py`), but for prototyping and development this is much more expedient.
Now if we restart the server with this updated `config.yml`,

```
tiled serve config --public config.yml
```
we can request the content as XDI in any of these ways:

```
$ curl -H 'Accept: application/x-xdi' 'http://localhost:8000/api/v1/table/full/example'
$ curl 'http://localhost:8000/api/v1/table/full/example?format=application/x-xdi'
$ curl 'http://localhost:8000/api/v1/table/full/example?format=xdi'
```