# Requirements
This document is slightly out of date, as the requirements have evolved in tandem with the exploratory implementation.
- HTTP API that supports JSON and msgpack requests, with JSON and msgpack responses, as well as binary blob responses for chunked data.
- Be usable from any HTTP client and from languages other than Python. Avoid baking any Python-isms deeply into it. No pickle, no msgpack-python. (Those can be options, for the purpose of communicating with a Python client, but they should not be privileged or default.)
- Efficiently list and access entries, with pagination and random access.
- Efficiently search entries using an extensible set of queries, including broad ones like “full text search” and more application- and dataset-specific queries.
- Access metadata cheaply.
- Serve data from numpy arrays, DataFrames, and xarray structures in various formats in a tile-based (chunked, partitioned) fashion.
- A Python client with rich proxy objects that do chunk-based access transparently (like Intake’s `RemoteXarray` and similar). But, differently from current Intake and Databroker, do not switch dask-vs-not-dask or dask-vs-another-delayed-framework at call time. Use a consistent delayed framework (or none at all) within a given context. The only option at call time should be `read()`. Whether that reads into memory, dask, or something else should be set higher up—for example, on the client instance.
- Usable performance without any intrinsic caching in the server. Objects may do some internal caching for optimization, but the server will not explicitly hang on to any state between requests.
- Path toward adding state / caching in external systems (e.g. Redis, nginx).
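The JSON/msgpack requirement implies content negotiation on the server. A minimal sketch, assuming the conventional `application/x-msgpack` media type and a hypothetical helper name (this is illustrative, not the actual server code; it also ignores `q`-values for brevity):

```python
def negotiate_media_type(accept_header: str) -> str:
    """Choose a response media type from an HTTP Accept header.

    JSON is the default so that any HTTP client in any language works;
    msgpack is opt-in for clients that explicitly ask for it.
    (Hypothetical helper; the real server may negotiate differently.)
    """
    accepted = [part.split(";")[0].strip().lower()
                for part in accept_header.split(",")]
    if "application/x-msgpack" in accepted:
        return "application/x-msgpack"
    return "application/json"
```

For example, `negotiate_media_type("*/*")` falls back to JSON, while a request carrying `Accept: application/x-msgpack, application/json;q=0.5` gets msgpack.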
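The pagination-and-random-access requirement can be sketched with a simple offset/limit scheme (the parameter names `offset` and `limit` are placeholders; the real API might use different names or keyset pagination):

```python
def paginate(entries, offset=0, limit=100):
    """Return one page of entries plus the offset of the next page.

    A minimal offset/limit sketch: random access is just a matter of
    choosing the offset. `next_offset` is None on the last page, which
    a client can use as its stop condition.
    """
    page = entries[offset:offset + limit]
    next_offset = offset + limit if offset + limit < len(entries) else None
    return {
        "items": page,
        "offset": offset,
        "limit": limit,
        "next_offset": next_offset,
    }
```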
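The “one `read()` method, delayed-or-not decided higher up” idea can be sketched as follows. The class name, constructor signature, and the plain-callable stand-in for `dask.delayed` are all illustrative assumptions, not the real client API:

```python
class ArrayProxy:
    """Hypothetical chunk-aware proxy object.

    Whether read() returns an in-memory result or a collection of
    delayed tasks is fixed at construction time (e.g. on the client
    instance) -- read() itself takes no dask-vs-not-dask flag.
    """

    def __init__(self, fetch_chunk, num_chunks, delayed=False):
        self._fetch_chunk = fetch_chunk  # callable: chunk index -> list of values
        self._num_chunks = num_chunks
        self._delayed = delayed  # chosen once, higher up, not per call

    def read(self):
        # The only call-time entry point: no mode switches here.
        if self._delayed:
            # Stand-in for wrapping each chunk fetch in dask.delayed:
            # return zero-argument callables instead of fetching eagerly.
            return [lambda i=i: self._fetch_chunk(i)
                    for i in range(self._num_chunks)]
        result = []
        for i in range(self._num_chunks):
            result.extend(self._fetch_chunk(i))
        return result
```

Swapping in a different delayed framework (or none) would then be a construction-time concern, keeping call sites identical.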