Case Study: Reading and Exporting a Specialized Format

For this guide, we’ll take the example of XDI, which is a formalized text-based format for X-ray Spectroscopy data. An example XDI file is reproduced below.

# XDI/1.0 GSE/1.0
# Column.1: energy eV
# Column.2: i0
# Column.3: itrans
# Column.4: mutrans
# Element.edge: K
# Element.symbol: Cu
# Scan.edge_energy: 8980.0
# Mono.name: Si 111
# Mono.d_spacing: 3.13553
# Beamline.name: 13ID
# Beamline.collimation: none
# Beamline.focusing: yes
# Beamline.harmonic_rejection: rhodium-coated mirror
# Facility.name: APS
# Facility.energy: 7.00 GeV
# Facility.xray_source: APS Undulator A
# Scan.start_time: 2001-06-26T22:27:31
# Detector.I0: 10cm  N2
# Detector.I1: 10cm  N2
# Sample.name: Cu
# Sample.prep: Cu metal foil
# GSE.EXTRA:  config 1
# ///
# Cu foil Room Temperature
# measured at beamline 13-ID
#----
# energy i0 itrans mutrans
  8779.0  149013.7  550643.089065  -1.3070486
  8789.0  144864.7  531876.119084  -1.3006104
  8799.0  132978.7  489591.10592  -1.3033816
  8809.0  125444.7  463051.104096  -1.3059724
  8819.0  121324.7  449969.103983  -1.3107085
  8829.0  119447.7  444386.117562  -1.3138152
  8839.0  119100.7  440176.091039  -1.3072055
  8849.0  117707.7  440448.106567  -1.3195882
  8859.0  117754.7  442302.10637  -1.3233895
  8869.0  117428.7  441944.116528  -1.3253521
  8879.0  117383.7  442810.120466  -1.327693
  8889.0  117185.7  443658.11566  -1.3312944

As you can see, the file’s contents comprise a single table and a header of key–value pairs. Therefore, it makes sense to export dataframe structures and their associated metadata into this format.

Consider a folder named data containing a number of XDI files that we wish to serve using Tiled. An example folder containing a single example.xdi file can be generated by running python -m tiled.examples.xdi.

Note

All code below is in tiled/examples/xdi.py. The code is reproduced for reference, but running the examples only requires modifying config.yml.

Define an Adapter

The first step in exposing this data via Tiled is to write a function that can read XDI-formatted files. It MUST accept a filepath (string) and it SHOULD also accept a file buffer, which makes it easier to use in tests.

def read_xdi(file):
    ...
    return df, metadata

See the source code of tiled.examples.xdi for a fully-worked example.
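
As a rough illustration of what such a reader has to do, the sketch below parses XDI text using only the standard library. It is not the implementation in tiled.examples.xdi: the names parse_xdi_text and FIELD_PATTERN are hypothetical, and the metadata keys (xdi_version, extra_version, fields, comments) are simply chosen to match what the write_xdi exporter in this guide reads. A real read_xdi would additionally open the path or buffer and wrap the rows in a pandas.DataFrame.

```python
import re

# Hypothetical pattern for header lines such as "# Element.symbol: Cu".
FIELD_PATTERN = re.compile(r"^#\s*([A-Za-z_]\w*)\.(\w+):\s*(.*)$")


def parse_xdi_text(text):
    """Return (column_names, rows, metadata) parsed from XDI-formatted text."""
    lines = text.splitlines()
    # The first line looks like "# XDI/1.0 GSE/1.0".
    versions = lines[0].lstrip("#").split()
    metadata = {
        "xdi_version": versions[0].split("/", 1)[1],
        "extra_version": " ".join(versions[1:]),
        "fields": {},
        "comments": "",
    }
    comments = []
    rows = []
    in_comments = False
    for line in lines[1:]:
        if not line.strip():
            continue
        if not line.startswith("#"):
            # Data rows are whitespace-separated numbers.
            rows.append([float(token) for token in line.split()])
            continue
        content = line[1:].strip()
        if content.startswith("///"):
            in_comments = True  # end of fields, start of free-form comments
        elif content.startswith("---"):
            in_comments = False  # end of the comment section
        elif in_comments:
            comments.append(content)
        else:
            match = FIELD_PATTERN.match(line)
            if match:
                namespace, tag, value = match.groups()
                metadata["fields"].setdefault(namespace, {})[tag] = value.strip()
    metadata["comments"] = "\n".join(comments)
    # Column names come from the "Column.N: name [unit]" fields, ordered by N.
    declared = metadata["fields"].get("Column", {})
    column_names = [declared[key].split()[0] for key in sorted(declared, key=int)]
    return column_names, rows, metadata
```

Keeping the parsing separate from file handling makes it easy to exercise with an in-memory string in tests.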

To indicate that this data carries additional metadata structure because it comes from an XDI file, we subclass DataFrameAdapter and define the specs attribute. We also define a classmethod, from_file, which uses our read_xdi function to construct an instance from a file.

import dask.dataframe

from tiled.adapters.dataframe import DataFrameAdapter


class XDIDataFrameAdapter(DataFrameAdapter):
    specs = ["xdi"]

    @classmethod
    def from_file(cls, file):
        df, metadata = read_xdi(file)
        return cls(dask.dataframe.from_pandas(df, npartitions=1), metadata=metadata)

Now take the following simple server configuration:

# config.yml
trees:
  - path: /
    tree: tiled.adapters.files:DirectoryAdapter.from_directory
    args:
      directory: "data"
      mimetypes_by_file_ext:
        .xdi: application/x-xdi
      readers_by_mimetype:
        application/x-xdi: tiled.examples.xdi:XDIDataFrameAdapter.from_file

and serve it:

tiled serve config --public config.yml

As is, we can access the data as CSV, for example.

$ curl -H 'Accept: text/csv' 'http://localhost:8000/api/node/full/example'
,energy,i0,itrans,mutrans
0,8779.0,149013.7,550643.089065,-1.3070486
1,8789.0,144864.7,531876.119084,-1.3006104
2,8799.0,132978.7,489591.10592,-1.3033816
3,8809.0,125444.7,463051.104096,-1.3059724
4,8819.0,121324.7,449969.103983,-1.3107085
5,8829.0,119447.7,444386.117562,-1.3138152
6,8839.0,119100.7,440176.091039,-1.3072055
7,8849.0,117707.7,440448.106567,-1.3195882
8,8859.0,117754.7,442302.10637,-1.3233895
9,8869.0,117428.7,441944.116528,-1.3253521
10,8879.0,117383.7,442810.120466,-1.327693
11,8889.0,117185.7,443658.11566,-1.3312944

Note

There are three equivalent ways to request a format, more formally called a “media type” or a “MIME type”.

1. Use the standard [HTTP `Accept` Header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept).

```
$ curl -H 'Accept: text/csv' 'http://localhost:8000/api/node/full/example'
```

2. Place the media type in a `format` query parameter.

```
$ curl 'http://localhost:8000/api/node/full/example?format=text/csv'
```

3. Provide just a file extension. This is user friendly for people who do not know or care what
a "media type" is. The server looks up `csv` in a registry mapping file extensions to media types.

```
$ curl 'http://localhost:8000/api/node/full/example?format=csv'
```
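
The same three requests can also be built from Python. The sketch below uses only the standard library and constructs the requests without sending them; the function name build_format_requests is hypothetical, and the URL is the one used in the curl examples above.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000/api/node/full/example"


def build_format_requests(media_type, extension):
    """Build the three equivalent requests for one format."""
    # 1. Media type in the standard Accept header.
    by_header = Request(BASE_URL, headers={"Accept": media_type})
    # 2. Media type in the format query parameter (urlencode escapes the "/").
    by_media_type = Request(f"{BASE_URL}?{urlencode({'format': media_type})}")
    # 3. File extension in the format query parameter.
    by_extension = Request(f"{BASE_URL}?{urlencode({'format': extension})}")
    return by_header, by_media_type, by_extension
```

With the server running, passing any of these to urllib.request.urlopen should return the same body.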

Define an exporter

When a client requests an XDI-formatted response, the Tiled server will call our custom exporter with two arguments: the structure itself (in this case, a pandas.DataFrame) and a dictionary of metadata. The metadata is freeform as far as Tiled is concerned (its content and any internal structure are completely up to the user), so if we have special requirements about what it must contain, we need to do that validation inside our exporter. We might, for example, refuse to export (raise an error) if required fields are missing from the metadata or if the DataFrame we are given does not have the expected columns.

The exporter must return either str or bytes.

import io


def write_xdi(df, metadata):
    output = io.StringIO()

    xdi_version = metadata.get("xdi_version")
    comments = metadata.get("comments", "")
    extra_version = metadata.get("extra_version")

    output.write(f"# XDI/{xdi_version} {extra_version}\n")

    fields = metadata["fields"]
    for namespace, namespace_dict in fields.items():
        for tag, value in namespace_dict.items():
            output.write(f"# {namespace}.{tag}: {value}\n")

    output.write("# /////////////\n")
    output.write("# generated by tiled\n")
    output.write(comments)
    output.write("# -------------\n")

    # write column labels
    columns = list(df.columns)
    output.write("# ")
    output.write(" ".join(columns))
    output.write("\n")

    # write data
    df.to_csv(output, header=False, index=False)
    return output.getvalue()
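
The exporter above does not validate its inputs. A minimal sketch of the kind of check described earlier might look like the following. The helper name check_exportable and the choice of required keys are assumptions made for illustration, not part of Tiled or of tiled.examples.xdi.

```python
# Hypothetical policy: top-level metadata keys that an export requires.
REQUIRED_METADATA_KEYS = ("xdi_version", "fields")


def check_exportable(column_names, metadata):
    """Raise ValueError if the inputs cannot sensibly be written as XDI."""
    missing = [key for key in REQUIRED_METADATA_KEYS if key not in metadata]
    if missing:
        raise ValueError(f"metadata is missing required keys: {missing}")
    # Compare the "Column.N: name [unit]" fields, if present, with the table.
    declared = metadata["fields"].get("Column", {})
    declared_names = [
        str(declared[key]).split()[0] for key in sorted(declared, key=int)
    ]
    if declared_names and declared_names != list(column_names):
        raise ValueError(
            f"columns {list(column_names)} do not match declared {declared_names}"
        )
```

Inside write_xdi this could be called as check_exportable(df.columns, metadata) before any output is written, so that a failed export produces an error rather than a malformed file.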

Register the exporter

Add new sections to the configuration as follows.

trees:
  - path: /
    tree: tiled.adapters.files:DirectoryAdapter.from_directory
    args:
      directory: "data"
      mimetypes_by_file_ext:
        .xdi: application/x-xdi
      readers_by_mimetype:
        application/x-xdi: tiled.examples.xdi:XDIDataFrameAdapter.from_file

media_types:
  xdi:
    application/x-xdi: tiled.examples.xdi:write_xdi
file_extensions:
  xdi: application/x-xdi

First consider the section

media_types:
  xdi:
    application/x-xdi: tiled.examples.xdi:write_xdi

The key xdi refers to a spec, so this exporter applies only to readers that have “xdi” in their specs attribute. The key application/x-xdi is a valid media type. In our case, there is no registered IANA Media Type for the format of interest. Therefore, the standard tells us to invent one of the form application/x-*, as in application/x-xdi. There is, of course, some risk of name collisions when we invent names outside of the official list, so be specific.

The final section

file_extensions:
  xdi: application/x-xdi

enables the usage

$ curl 'http://...?format=xdi'

by mapping "xdi" to the media type. This is optional. You can provide no file extensions for a media type. You can also provide multiple file extensions that map to the same media type. For example, both tif and tiff map to image/tiff.
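
Conceptually, the lookup amounts to a dictionary from file extensions to media types. The sketch below is an illustration of that idea only, not Tiled's actual code; FILE_EXTENSIONS and resolve_format are hypothetical names, and treating any value containing "/" as a media type is an assumption.

```python
# Illustrative registry mirroring the file_extensions section of the config.
FILE_EXTENSIONS = {
    "xdi": "application/x-xdi",
    "csv": "text/csv",
    "tif": "image/tiff",
    "tiff": "image/tiff",
}


def resolve_format(requested):
    """Map a ?format= value to a media type (hypothetical helper)."""
    if "/" in requested:
        return requested  # already looks like a media type
    try:
        return FILE_EXTENSIONS[requested.lstrip(".").lower()]
    except KeyError:
        raise ValueError(f"unknown file extension: {requested!r}") from None
```

Several extensions can point at one media type, which is why tif and tiff resolve to the same value here.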

In the media_types section, the value tiled.examples.xdi:write_xdi is the module or package in which our exporter is defined, followed by : and then the function name.

The Python file where write_xdi is defined must be in an importable location. During configuration parsing, Tiled temporarily adds the directory containing the config file itself to sys.path. This means that we can conveniently drop xdi.py next to config.yml and know that it will be found. For long-term deployments it’s better to place exporters in installable Python packages (i.e. with a proper setup.py, etc.), but for prototyping and development this is much more expedient.
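
To make the module:function notation concrete, here is a minimal sketch of how such a reference could be resolved with importlib. It is an assumption about the general technique, not a copy of Tiled's configuration parser, and import_object is a hypothetical name.

```python
import importlib


def import_object(reference):
    """Resolve a "package.module:name" reference to the object it names."""
    module_name, _, qualified_name = reference.partition(":")
    if not qualified_name:
        raise ValueError(f"expected 'module:name', got {reference!r}")
    # The module must be importable, i.e. findable via sys.path.
    obj = importlib.import_module(module_name)
    # Dotted names after the colon (e.g. "Class.method") are walked one
    # attribute at a time.
    for attribute in qualified_name.split("."):
        obj = getattr(obj, attribute)
    return obj
```

If the module is not on sys.path, importlib.import_module raises ModuleNotFoundError, which is the failure an exporter in a non-importable location would produce under this approach.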

Now if we restart the server again with this updated config.yml

tiled serve config --public config.yml

we can request the content as XDI in any of these ways:

$ curl -H 'Accept: application/x-xdi' 'http://localhost:8000/api/node/full/example'
$ curl 'http://localhost:8000/api/node/full/example?format=application/x-xdi'
$ curl 'http://localhost:8000/api/node/full/example?format=xdi'