# Control Speed, Memory Usage, and Disk Usage

*There are no solutions, only trade-offs.*

We will build Trees with a variety of trade-offs in speed and resource
usage. Any of these may be a good solution depending on the specifics of
your situation. They are presented in order of increasing complexity.

## Everything in memory

```python
import numpy
from tiled.readers.array import ArrayAdapter
from tiled.trees.in_memory import Tree

# Generate data and store it in memory.
a = numpy.random.random((100, 100))
b = numpy.random.random((100, 100))

tree = Tree(
    {
        "a": ArrayAdapter.from_array(a),
        "b": ArrayAdapter.from_array(b),
    }
)
```

* Server startup is **slow** because all data is generated or read up front.
* Data access is **fast** because all the data is ready in memory.
* The machine running the server must have sufficient RAM for all the
  entries in the Tree. This **is not** a scalable solution for large Trees
  with large data.

## Load on first read

For the next example we'll use a neat dictionary-like object. It is created
by mapping keys to *functions*. The first time a given item is accessed, the
function is called to generate the value, and the result is stashed
internally for next time.

```python
>>> from tiled.utils import OneShotCachedMap
>>> m = OneShotCachedMap({"a": lambda: 1, "b": lambda: 2})
>>> m["a"]  # The value is computed on demand, so this could be slow.
1
>>> m["a"]  # The value is returned immediately, remembered from last time.
1
>>> dict(m)  # It can be converted to an ordinary dict.
{'a': 1, 'b': 2}
```

Notice that once the object is created, it acts just like a normal Python
mapping. Downstream code does not have to do anything special to work with
it.

It can be integrated with ``Tree`` directly, just by replacing the
dictionary in the first example with a ``OneShotCachedMap``.

```python
import numpy
from tiled.utils import OneShotCachedMap
from tiled.readers.array import ArrayAdapter
from tiled.trees.in_memory import Tree

# Use OneShotCachedMap, which maps keys to *functions* that are
# run when the data is first accessed.
tree = Tree(
    OneShotCachedMap(
        {
            "a": lambda: ArrayAdapter.from_array(numpy.random.random((100, 100))),
            "b": lambda: ArrayAdapter.from_array(numpy.random.random((100, 100))),
        }
    )
)
```

* Server startup is **fast** because nothing is generated or read up front.
* The first access of each item is **slow** because the data is generated or
  read on demand. Subsequent access is **fast**.
* The machine running the server must have sufficient RAM for all the
  entries in the Tree. The memory usage will grow monotonically as items are
  accessed and stashed internally by ``OneShotCachedMap``. This **is not** a
  scalable solution for large Trees with large data.

## Load on first read and stash for a while (but not forever)

We'll use another neat dictionary-like object, very much like the previous
one. These two objects behave identically...

```python
from tiled.utils import CachingMap, OneShotCachedMap

OneShotCachedMap({"a": lambda: 1, "b": lambda: 2})
CachingMap({"a": lambda: 1, "b": lambda: 2}, cache={})
```

...except that ``CachingMap`` can regenerate values if need be, and it
allows us to control which values it stashes and for how long. For example,
using the third-party library ``cachetools``, we can keep up to N items,
discarding the least recently used item when the cache is full.

```
pip install cachetools
```

```python
from cachetools import LRUCache

CachingMap({"a": lambda: 1, "b": lambda: 2}, cache=LRUCache(1))
```

As another example, we can keep items for up to some time limit. This
enforces a maximum size as well, so that the cache cannot grow too large if
it is receiving many requests within the specified time window.

```python
from cachetools import TTLCache

# "TTL" stands for "time to live", measured in seconds.
CachingMap({"a": lambda: 1, "b": lambda: 2}, cache=TTLCache(maxsize=100, ttl=10))
```

If we keep a reference to our cache before passing it in, then other parts
of the program can explicitly evict items to force an update.

```python
cache = TTLCache(maxsize=100, ttl=10)
CachingMap({"a": lambda: 1, "b": lambda: 2}, cache=cache)

# Meanwhile, elsewhere in the code...
cache.pop("a")  # Force a refresh of this item next time it is accessed from the CachingMap.
```

``CachingMap`` integrates with ``Tree`` exactly the same as the others.

```python
from cachetools import TTLCache
import numpy
from tiled.utils import CachingMap
from tiled.readers.array import ArrayAdapter
from tiled.trees.in_memory import Tree

# Use CachingMap, which again maps keys to *functions* that are
# run when the data is first accessed. The values may be cached
# for a time.
tree = Tree(
    CachingMap(
        {
            "a": lambda: ArrayAdapter.from_array(numpy.random.random((100, 100))),
            "b": lambda: ArrayAdapter.from_array(numpy.random.random((100, 100))),
        },
        cache=TTLCache(maxsize=100, ttl=10),
    ),
)
```

* Server startup is **fast** because nothing is generated or read up front.
* The first access of each item is **slow** because the data is generated or
  read on demand. Later access may be **fast or slow** depending on whether
  the item has been evicted from the cache.
* With an appropriately-scaled cache, this **is** scalable for large Trees
  with large data.

## Proxy data from a network service and keep an on-disk cache

TO DO: Add fsspec example. An open question: how to put an upper bound on
the disk cache?
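In the meantime, here is a minimal sketch of the pattern using the
third-party library ``fsspec``, whose ``filecache::`` protocol downloads a
remote file on first access and stashes a copy on local disk for subsequent
reads. The URL, the cache directory, and the ``load_remote_array`` helper
are illustrative placeholders, not part of Tiled.

```python
import fsspec
import numpy
from tiled.readers.array import ArrayAdapter
from tiled.trees.in_memory import Tree
from tiled.utils import OneShotCachedMap

# Hypothetical remote file; the "filecache::" prefix tells fsspec to keep
# an on-disk copy in cache_storage after the first download.
URL = "filecache::https://example.com/data/a.npy"


def load_remote_array():
    # The first call downloads the file; later calls read the local copy.
    with fsspec.open(URL, "rb", filecache={"cache_storage": "/tmp/tiled-cache"}) as file:
        return ArrayAdapter.from_array(numpy.load(file))


tree = Tree(OneShotCachedMap({"a": load_remote_array}))
```

This composes naturally with the tools above: the expensive download happens
at most once per file, while the in-memory lifetime of the loaded array is
governed by ``OneShotCachedMap`` or ``CachingMap``, with the trade-offs
already described.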
## Load keys dynamically as well as values

In all the previous examples, the *keys* of the Tree were held in memory, in
Python, and the *values* were sometimes generated on demand. For Trees at
the scale of thousands of entries, backed by a database or a web service,
we'll need to generate the keys on demand too. At that point, we escalate to
writing a custom class satisfying the Tree specification, one which fetches
keys and values on demand by making queries specific to the underlying
storage, as sketched below.

TO DO: Simplify Bluesky MongoDB example into MVP and add it here.
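Pending that example, here is a minimal, hypothetical sketch of the
lazy-mapping core of such a class. The ``db`` client and its ``list_keys``,
``count``, and ``fetch`` methods are invented stand-ins for whatever queries
your storage actually supports, and a real implementation must satisfy the
rest of the Tree specification as well, not just the mapping interface shown
here.

```python
from collections.abc import Mapping

from tiled.readers.array import ArrayAdapter


class DatabaseBackedMapping(Mapping):
    """Fetch keys and values on demand from a backing store.

    ``db`` is a stand-in for any client object (database driver,
    web service client, ...) that can list keys, count records,
    and fetch one record by key.
    """

    def __init__(self, db):
        self._db = db

    def __iter__(self):
        # Stream keys from the backing store rather than holding
        # them all in a Python dict.
        yield from self._db.list_keys()  # hypothetical query

    def __len__(self):
        return self._db.count()  # hypothetical query

    def __getitem__(self, key):
        record = self._db.fetch(key)  # hypothetical query
        if record is None:
            raise KeyError(key)
        return ArrayAdapter.from_array(record)
```

Because ``Mapping`` derives the remaining dictionary methods from these
three, downstream code can treat this object just like the dictionaries,
``OneShotCachedMap``, and ``CachingMap`` used above, while the keys, like
the values, stay in the underlying storage until asked for.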