Caching Design and Roadmap
For practical guides on client-side and service-side caching, see Keep a Local Copy and Tune Caches to Balance Speed and Memory Usage.
Note
This page discusses both current features and planned features. Italicized remarks in the discussion below make clear what exists now and what is on the roadmap.
Overview
Caching can make Tiled faster. Because, in general, caches make programs more complex and harder to trace, Tiled was designed without any caching at first. Caches were added later, cleanly separated from the rest of Tiled and with an easy opt-out path.
There are three types of centrally-managed cache in Tiled:
Client-side response cache. The Tiled Python client implements a standard web cache, similar in both concept and implementation to a web browser’s cache. It enables an offline “airplane mode”. If a server is available, it enables the client to inexpensively check whether the version it has is the latest one.
Service-side response cache. This is not yet implemented but is planned for the near future. This is also a standard web cache, on the server side. It stores the content of the most frequently requested responses. This covers use cases such as, “Several users are asking for the exact same chunks of data in the exact same format.”
Service-side object cache. The response caches operate near the outer edges of the application, stashing and retrieving HTTP response bytes. The object cache is more deeply integrated into the application: authors of Adapters can use it to stash any objects that may expedite future work. These objects may be serializable, such as chunks of array data, or unserializable, such as file handles. Requests that ask for overlapping but distinct slices of data, or for the same data in different formats, will not benefit from the response cache; they will “miss”. The object cache, however, can slice and encode its cached resources differently for each request. The object cache will not provide quite the same speed boost as a response cache, but it has a broader impact.
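To make the hit-or-miss distinction concrete, here is a minimal sketch in plain Python. The functions `load_array`, `serve_with_response_cache`, and `serve_with_object_cache` are hypothetical stand-ins, not Tiled APIs. The point is only that a cache keyed on the exact request misses on every variation, while a cache keyed on the underlying resource loads it once and re-slices and re-encodes it per request.

```python
# Hypothetical names throughout; this is an illustration, not Tiled's code.
import json
from typing import Dict, List, Tuple

loads: List[str] = []  # records every expensive read from storage


def load_array(key: str) -> List[int]:
    """Stand-in for an expensive read, e.g. loading a chunk from disk."""
    loads.append(key)
    return list(range(10))


def encode(data: List[int], fmt: str) -> bytes:
    if fmt == "json":
        return json.dumps(data).encode()
    if fmt == "csv":
        return ",".join(map(str, data)).encode()
    raise ValueError(f"unsupported format: {fmt}")


# Response cache: keyed on the whole request, stores final bytes.
response_cache: Dict[Tuple[str, int, int, str], bytes] = {}
# Object cache: keyed on the resource, stores decoded data.
object_cache: Dict[str, List[int]] = {}


def serve_with_response_cache(key: str, start: int, stop: int, fmt: str) -> bytes:
    request = (key, start, stop, fmt)
    if request not in response_cache:  # any difference in the request is a miss
        response_cache[request] = encode(load_array(key)[start:stop], fmt)
    return response_cache[request]


def serve_with_object_cache(key: str, start: int, stop: int, fmt: str) -> bytes:
    if key not in object_cache:
        object_cache[key] = load_array(key)
    # Reuse the cached object, slicing and encoding it for this request.
    return encode(object_cache[key][start:stop], fmt)


requests = [("x", 0, 5, "json"), ("x", 2, 8, "json"), ("x", 0, 5, "csv")]

loads.clear()
for r in requests:
    serve_with_response_cache(*r)
response_loads = len(loads)  # every distinct request missed

loads.clear()
for r in requests:
    serve_with_object_cache(*r)
object_loads = len(loads)  # loaded once, then sliced and encoded three ways
```

The trade-off described above shows up here too: a response-cache hit returns stored bytes with no further work, while an object-cache hit still pays for slicing and encoding on every request.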
Where is the cache content stored?
Caches can be private, stored in the memory space of the worker process, or shared by multiple workers in a horizontally-scaled deployment via an external service such as Redis.
The Tiled Python client currently supports a private, transient cache in memory and a shared, persistent cache backed by files on disk. (The disk cache uses file-based locking to ensure consistency.) The caching mechanism is pluggable: other storage mechanisms can be injected without changes to Tiled itself.
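A pluggable design like this can be sketched as caching logic that talks only to a small storage interface, so a transient in-memory store and a persistent on-disk store are interchangeable. The names `Storage`, `MemoryStorage`, `FileStorage`, and `ResponseCache` are hypothetical and do not mirror the Tiled client's actual classes; the file-based locking that Tiled's disk cache uses is also omitted here.

```python
# Hypothetical names throughout; a sketch of pluggable cache storage,
# not the Tiled client's actual classes.
import hashlib
import os
from typing import Callable, Dict, Optional, Protocol


class Storage(Protocol):
    """The only interface the caching logic depends on."""

    def get(self, key: str) -> Optional[bytes]: ...

    def set(self, key: str, value: bytes) -> None: ...


class MemoryStorage:
    """Private and transient: lives only as long as this process."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value


class FileStorage:
    """Persistent, and shared by any process that uses the same directory.

    Locking between concurrent writers is omitted for brevity.
    """

    def __init__(self, directory: str) -> None:
        self._directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key: str) -> str:
        # Hash the key so arbitrary URLs map to safe file names.
        digest = hashlib.sha256(key.encode()).hexdigest()
        return os.path.join(self._directory, digest)

    def get(self, key: str) -> Optional[bytes]:
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def set(self, key: str, value: bytes) -> None:
        # Write a temporary file, then rename it into place, so readers
        # never observe a partially written entry.
        tmp = self._path(key) + ".tmp"
        with open(tmp, "wb") as f:
            f.write(value)
        os.replace(tmp, self._path(key))


class ResponseCache:
    """Caching logic written against Storage, unaware of where bytes live."""

    def __init__(self, storage: Storage) -> None:
        self.storage = storage

    def fetch(self, url: str, download: Callable[[str], bytes]) -> bytes:
        cached = self.storage.get(url)
        if cached is not None:
            return cached  # hit: no network round trip
        content = download(url)
        self.storage.set(url, content)
        return content
```

Swapping `MemoryStorage()` for `FileStorage(some_directory)` changes where responses live without touching `ResponseCache`; two `ResponseCache` instances pointed at the same directory share entries.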
On the service side, only the object cache is currently implemented, and it supports storage in worker memory only, so workers cannot access resources cached by other workers. In the future, Tiled will support optionally configuring the service-side response and object caches to sync with a shared Redis cache. Response data, being bytes, is straightforward to store in a shared cache. But only a subset of the items in the object cache (those with known types and secure serialization schemes) will be eligible for the shared cache. For example, Tiled cannot place a file handle in Redis, and Tiled will not place unsigned pickled data in Redis (for security reasons).
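The eligibility rule can be sketched as a gate in front of the shared store: reject anything tied to one process, such as file handles, and never write a bare pickle, prefixing each payload with an HMAC that is verified before unpickling. This is a minimal sketch under stated assumptions: `to_shared`, `from_shared`, and `SECRET` are hypothetical, and HMAC-signed pickles are one common technique, not necessarily the scheme Tiled will adopt.

```python
# Hypothetical policy and names; not Tiled's actual code.
import hashlib
import hmac
import io
import pickle
from typing import Any, Optional

# Hypothetical secret shared by all workers; load it from configuration in practice.
SECRET = b"example-secret-do-not-use"
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


def _sign(payload: bytes) -> bytes:
    return hmac.new(SECRET, payload, hashlib.sha256).digest()


def to_shared(obj: Any) -> Optional[bytes]:
    """Return signed bytes for the shared cache, or None if ineligible."""
    if isinstance(obj, io.IOBase):
        return None  # file handles are meaningful only inside one process
    try:
        payload = pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return None  # not serializable: keep it in worker memory only
    # Never store an unsigned pickle: prepend an HMAC of the payload.
    return _sign(payload) + payload


def from_shared(blob: bytes) -> Any:
    signature, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    # Verify first, because unpickling untrusted bytes can execute code.
    if not hmac.compare_digest(signature, _sign(payload)):
        raise ValueError("signature mismatch; refusing to unpickle")
    return pickle.loads(payload)
```

Signing only protects against tampering by parties without the secret; any worker holding `SECRET` is still fully trusted.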
Connection to Dask
Dask provides an opt-in, experimental, opportunistic caching mechanism. It caches at the granularity of “tasks”, such as chunks of arrays or partitions of dataframes.
Tiled’s object cache is generic, not exclusive to dask code paths, but it plugs into dask in a similar way, so any Adapter that happens to use dask can leverage Tiled’s object cache simply, like this:
```python
from tiled.server.object_cache import get_object_cache

# dask_object stands for any dask collection, e.g. a dask.array.Array.
with get_object_cache().dask_context:
    # Any tasks that happen to already be cached will be looked up
    # instead of computed here. Anything that _is_ computed here may
    # be cached, depending on its bytesize and its cost (how long it took to
    # compute).
    dask_object.compute()
```
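Deciding whether to keep a result "depending on its bytesize and its cost" can be sketched as a score of compute seconds per byte: results that were slow to compute relative to their size are worth keeping, and a newcomer may evict only lower-scoring entries. `CostAwareCache` and its score are hypothetical illustrations of that idea, not the rule Tiled or dask actually use.

```python
# Hypothetical cost-aware admission policy; not Tiled's or dask's actual rule.
import time
from typing import Any, Callable, Dict, Tuple


class CostAwareCache:
    """Keeps items that are expensive to recompute relative to their size."""

    def __init__(self, capacity_bytes: int, min_score: float = 0.0) -> None:
        self.capacity_bytes = capacity_bytes
        self.min_score = min_score  # seconds of compute per byte required to admit
        self.used_bytes = 0
        # key -> (value, size in bytes, score)
        self._items: Dict[str, Tuple[Any, int, float]] = {}

    def __contains__(self, key: str) -> bool:
        return key in self._items

    def get_or_compute(
        self, key: str, compute: Callable[[], Any], nbytes: Callable[[Any], int]
    ) -> Any:
        if key in self._items:
            return self._items[key][0]  # hit: skip the computation
        start = time.perf_counter()
        value = compute()
        cost = time.perf_counter() - start  # how long it took to compute
        self.put(key, value, nbytes(value), cost)
        return value

    def put(self, key: str, value: Any, size: int, cost: float) -> bool:
        """Store value if it is worth its space; return True if it was stored."""
        score = cost / max(size, 1)
        if key in self._items or size > self.capacity_bytes or score < self.min_score:
            return False
        free = self.capacity_bytes - self.used_bytes
        victims, freed = [], 0
        # Consider evicting the lowest-scoring items first.
        for k, (_, s, sc) in sorted(self._items.items(), key=lambda kv: kv[1][2]):
            if free + freed >= size or sc >= score:
                break  # enough room, or the rest are worth more than the newcomer
            victims.append(k)
            freed += s
        if free + freed < size:
            return False  # admitting it would displace more valuable items
        for k in victims:
            self.used_bytes -= self._items.pop(k)[1]
        self._items[key] = (value, size, score)
        self.used_bytes += size
        return True
```

A cheap, large result (low score) is rejected rather than allowed to push out a small result that took a long time to compute.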
Items can be proactively cleared from the cache like so:
```python
from tiled.server.object_cache import get_object_cache, NO_CACHE

cache = get_object_cache()
if cache is not NO_CACHE:
    cache.discard_dask(dask_object.__dask_keys__())
```
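For a multi-dimensional dask array, `__dask_keys__()` returns a nested list of task keys such as `[[("x", 0, 0), ("x", 0, 1)], ...]`, one tuple per chunk. A cache that stores individual tasks therefore has to flatten that structure before discarding. The dict-backed cache and the helpers `flatten_keys` and `discard_keys` below are hypothetical, a sketch of the idea rather than Tiled's internals.

```python
# Hypothetical helpers; not Tiled's implementation of discard_dask.
from typing import Any, Dict, Hashable, Iterator


def flatten_keys(keys: Any) -> Iterator[Hashable]:
    """Yield individual task keys from arbitrarily nested lists."""
    if isinstance(keys, list):
        for item in keys:
            yield from flatten_keys(item)
    else:
        yield keys  # a single key, typically a str or a tuple


def discard_keys(cache: Dict[Hashable, Any], keys: Any) -> int:
    """Remove every listed key that is present; return how many were removed."""
    removed = 0
    for key in flatten_keys(keys):
        if key in cache:
            del cache[key]
            removed += 1
    return removed
```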
What other kinds of caching happen in Tiled?
The file-based directory-walking tree uses LRU caches, fixed at 10k items per subdirectory, to stash Adapter instances on first access. It discards them if the underlying file is removed or modified.
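The behavior described above, a bounded LRU cache whose entries are dropped when the underlying file changes or disappears, can be sketched with an `OrderedDict` and each file's modification time. The 10,000-item limit comes from the text; the class name `FileLRUCache`, the `build` callback, and the mtime-based change check are assumptions for illustration, not Tiled's actual code.

```python
# Hypothetical class; the mtime check is an assumed way of detecting changes.
import os
from collections import OrderedDict
from typing import Any, Callable, Tuple


class FileLRUCache:
    def __init__(self, build: Callable[[str], Any], maxsize: int = 10_000) -> None:
        self._build = build  # e.g. construct an Adapter for a file path
        self._maxsize = maxsize
        self._entries: "OrderedDict[str, Tuple[int, Any]]" = OrderedDict()

    def __len__(self) -> int:
        return len(self._entries)

    def get(self, path: str) -> Any:
        try:
            mtime = os.stat(path).st_mtime_ns
        except FileNotFoundError:
            self._entries.pop(path, None)  # file removed: drop any stale entry
            raise
        entry = self._entries.get(path)
        if entry is not None and entry[0] == mtime:
            self._entries.move_to_end(path)  # mark as most recently used
            return entry[1]
        # First access, or the file was modified since it was cached.
        value = self._build(path)
        self._entries[path] = (mtime, value)
        self._entries.move_to_end(path)
        if len(self._entries) > self._maxsize:
            self._entries.popitem(last=False)  # evict the least recently used
        return value
```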