--- jupytext: text_representation: extension: .md format_name: myst format_version: 0.13 jupytext_version: 1.17.1 kernelspec: display_name: Python 3 (ipykernel) language: python name: python3 --- # 10 minutes to Tiled This is a short, tutorial-style introduction to Tiled, for new users. ## Connect To begin, we will use a public demo instance of Tiled. If you are reading this tutorial without an Internet connection, see the section below on [running your own Tiled server](#run-a-tiled-server) on your laptop. This tutorial focuses on accessing Tiled from Python. But you can also interact with Tiled from your web browser by navigating to [https://tiled-demo.nsls2.bnl.gov](https://tiled-demo.nsls2.bnl.gov) where you'll find a web-based user interface and more. ```{code-cell} ipython3 from tiled.client import from_uri c = from_uri("https://tiled-demo.nsls2.bnl.gov") ``` ```{note} At this point, some Tiled servers might prompt you to **log in** with a username and password. But the demo we are using here is configured to allow **public**, anonymous access. ``` ## Navigate Tiled holds its data in a directory-like "container". Here we see the names of the entries it contains. ```{code-cell} ipython3 c ``` Let's look inside "examples"—another container. ```{code-cell} ipython3 c['examples'] ``` When the container is large, we see the first several entries. ```{code-cell} ipython3 c['examples/xraydb'] ``` ````{tip} These are equivalent, but the first two run faster. ```python # fast c['examples/xraydb'] c['examples', 'xraydb'] # slower because Python cannot "see ahead" c['examples']['xraydb'] ``` ```` ## Search on metadata Every entry in Tiled has metadata, which we can access via the `metadata` attribute. Tiled does not enforce any constraints on this by default: the metadata may be any key–value pairs. In technical terms, it accepts arbitrary JSON. ```{code-cell} ipython3 c['examples/xraydb'].metadata ``` Let's take a peek at the first entry to get a sense of what we might search for in this container. ```{code-cell} ipython3 x = c['examples/xraydb'] x.values().first().metadata ``` Tiled supports many types of queries; in the following simple example, we'll search the Tiled catalog for nodes whose metadata contains a given key–value pair. ```{code-cell} ipython3 from tiled.queries import Key x.search(Key('element.category') == 'nonmetal') ``` Queries can be chained to progressively narrow results. For numerical parameters, querying over a given range could also be useful. ```{code-cell} ipython3 x.search( Key('element.category') == 'nonmetal' ).search( Key('element.atomic_number') < 16 ) ``` What other values does `element.category` take? We could answer that question by downloading all the entries and tabulating them in Python, but it's faster to ask Tiled to do this and just send the answer. ```{code-cell} ipython3 x.distinct('element.category', counts=True)['metadata'] ``` We can stash the results in a variable and access them in various ways. ```{code-cell} ipython3 results = x.search(Key('element.category') == 'noble_gas') print(f"Noble gases in this data set: {list(results)}") ``` We can efficiently access only the first result without downloading the metadata for _all_ the results. ```{code-cell} ipython3 first_result = results.values().first() first_result.metadata ``` ````{tip} Try these: ```python results.keys().first() results.keys().last() results.keys()[2] results.keys()[:3] results.values().first() results.values().last() results.values()[2] results.values()[:3] for key, value in results.items(): print(f"~~ {key} ~~") print(value.metadata) ``` ```` ## Access as Scientific Python data structures Tiled can download data directly into scientific Python data structures, such as **numpy**, **pandas**, and **xarray**. _This is how we encourage Python users to use Tiled for analysis._ It has several advantages: - No need to name or organize files. - No need to write a copy to your local disk and then read it into your program. Instead, load the data straight from the network into your data analysis. (Disks are often the slowest things we deal with in computing.) ```{code-cell} ipython3 c['examples/xraydb/C/edges'] ``` ```{code-cell} ipython3 c['examples/xraydb/C/edges'].read() ``` ```{code-cell} ipython3 c['examples/images/binary_blobs'] ``` ```{code-cell} ipython3 arr = c['examples/images/binary_blobs'].read() arr ``` ```{code-cell} ipython3 %matplotlib inline import matplotlib.pyplot as plt plt.imshow(arr) ``` We'll see shortly that you can also fetch just a slice of a dataset without downloading the whole thing. ## Export to a preferred format In this section, we tell Tiled how we want the data, and it sends it to us in that format. This works: - No matter what format the data is stored in - Even if that data isn't even stored in a file at all (e.g., in a database or an S3-like blob store) Let's download the table of edges for carbon from the xraydb data. ```{code-cell} ipython3 # Download as Excel spreadsheet c['examples/xraydb/C/edges'].export('my_table.xlsx') # Or, download as CSV file c['examples/xraydb/C/edges'].export('my_table.csv') ``` We can open the files here or in any other program. They are now just files on our local disk. ```{code-cell} ipython3 !cat my_table.csv ``` Let's download an image dataset as a PNG file. ```{code-cell} ipython3 c['examples/images/binary_blobs'].export('my_image.png') ``` Again, we can open the file here or in any other program. ```{code-cell} ipython3 :tags: [hide-input] from IPython.display import Image Image(filename='my_image.png') ``` Tiled tries to recognize the file format you want from the file extension, as in `my_file.png` above. It can be also be specified explicitly using: ```{code-cell} ipython3 c['examples/images/binary_blobs'].export('my_image.png', format='image/png') ``` We can review the file formats. ```{code-cell} ipython3 c['examples/images/binary_blobs'].formats ``` Different data structures support different formats: arrays fit into different formats than tables do. ```{code-cell} ipython3 c['examples/xraydb/C/edges'].formats ``` ```{tip} Tiled ships with support for a set of commonly-used formats, and server admins can [add custom ones](#custom-export-formats) to meet their users' particular requirements. ``` ## Slice remotely A major advantage of Tiled over traditional file transfer is the ability to download just the slice of a dataset you need, without downloading the entire thing. This is handy for both sophisticated applications and simple tasks like previewing whether a dataset is "the good one" before waiting for a full download. Think of Google Maps, fetching the data of interest on demand. Standard numpy slicing syntax works, fetching only the data you request. ```{code-cell} ipython3 # Top-right corner arr = c['examples/images/binary_blobs'][:50,-50:] plt.imshow(arr) ``` This works for exporting data to a file as well. ```{code-cell} ipython3 import numpy as np c['examples/images/binary_blobs'].export( 'top_right_corner.png', slice=np.s_[:50,-50:], ) ``` Tabular data is different than array data, so it slices differently. For tabular data, we can select the columns of interest. ```{code-cell} ipython3 c['examples/xraydb/C/edges'] ``` ```{code-cell} ipython3 c['examples/xraydb/C/edges'].read(['edge', 'energy_eV']) ``` And this again works for exporting to a file. ```{code-cell} ipython3 c['examples/xraydb/C/edges'].export('my_table.csv', columns=['edge', 'energy_eV']) ``` ## Locate data sources (e.g., files) Once you have identified data sets of interest in the Tiled catalog, it's easy to determine the physical location of the underlying data. You can then access them by any convenient means and download the entire original files, instead of using the export feature, if desired: - Direct filesystem access - File transfer via SFTP, Globus, etc. - File transfer via Tiled Here we'll see the file that backs the table of Carbon edges in our xraydb dataset. ```{code-cell} ipython3 from tiled.client.utils import get_asset_filepaths get_asset_filepaths(c['examples/xraydb/C/edges']) ``` Tiled knows a whole lot more than just the file path. The snippet below includes the format (`mimetype`) of the data, its `structure`, and other machine-readable information that is necessary for applications to navigate the file and load the data. ```{code-cell} ipython3 ds, = c['examples/xraydb/C/edges'].data_sources() ds ``` Now, the data may not be stored in a file at all. Tiled understands data stored in databases or S3-like blob stores as well, and these are becoming more common as data scales and moves into cloud environments. The data location is always given as a URL. That URL begins with `file://` if it's a plain old file or something else if it is not. ```{code-cell} ipython3 ds.assets[0].data_uri ``` ## Download raw files Sometimes it is best to just download the files exactly as they were. This may be the most convenient thing, or it may be necessary to comply with transparency requirements that mandate providing a byte-for-byte copy of the raw data. As shown above, Tiled can provide the filepaths, and you can fetch the files by any available means. Tiled can also download the files directly. It does this efficiently by launching parallel downloads. ```{code-cell} ipython3 c['examples/xraydb/C/edges'].raw_export('downloads/') ``` (run-a-tiled-server)= ## Run a Tiled server Up to this point, we've been reading from Tiled's public demo instance. To demonstrate writing data, we'll need our own server because the public demo doesn't allow us to write. (If you already have access to an institutional Tiled server that grants you write access, feel free to use that!) The simplest way to get started is to launch a local server with embedded storage and basic security: ```{code-cell} ipython3 from tiled.client import simple c = simple() ``` The server starts in the background (on a thread). You will see a URL printed when it starts. Your URL will differ: each launch generates a unique secret `api_key`. You can paste this URL into a browser to open Tiled's web interface. ```{tip} Just `simple()` uses temporary storage, which is convenient for experimentation. For persistent storage, pass a directory like `simple('data/')`. This embedded setup is convenient for personal use and small experiments but isn't designed for production or multi-user deployments. For robust, scalable options, see the user guide. ``` ## Upload data We now have an empty Tiled server that we can _write_ into. ```{code-cell} ipython3 ac = c.write_array([1, 2, 3]) ac.read() ``` We can optionally include metadata and/or give it a name, a `key`. (By default it gets a long random one.) ```{code-cell} ipython3 ac = c.write_array( [1, 2, 3], metadata={'color': 'blue'}, key='hello' ) ac.metadata ``` We can find it via search. ```{code-cell} ipython3 c.search(Key('color') == 'blue') ``` Similarly, we can upload tabular data. ```{code-cell} ipython3 tc = c.write_table({'a': [1, 2, 3], 'b': [4, 5, 6]}) tc.read() ``` We can organize items into nested containers. ```{code-cell} ipython3 c.create_container('x') c['x'].write_array([1,2,3], key='a') c['x'].write_array([4,5,6], key='b') c['x'] ``` ## Stream So far we've been pulling data from Tiled on demand. Streaming flips this around: Tiled pushes data to us as soon as it is written, which is useful when monitoring a live experiment or an instrument. The example below sets up callbacks that fire when a new entry is created in the Tiled catalog and when new data arrives. ```{code-cell} ipython3 # Collect references to active subscriptions. Otherwise, Python may # garbage collect them, and they will never run. subs = [] def on_child_created(update): "Called when a new entry is created in the container" print(f"New item named {update.key}") child_sub = update.child().subscribe() child_sub.new_data.add_callback(on_new_data) child_sub.start_in_thread(start=0) subs.append(child_sub) # Keep a reference. def on_new_data(update): "Called when new data is uploaded or registered for this entry" print(f"New data: {update.data()}") sub = c.subscribe() sub.child_created.add_callback(on_child_created) sub.start_in_thread() ``` ```{code-cell} ipython3 import numpy as np ac = c.write_array(np.array([1, 2, 3])) # Extend the array. ac.patch(np.array([4, 5, 6]), offset=3, extend=True) ac.patch(np.array([7, 8, 9]), offset=6, extend=True) # Wait for subscriptions to process. import time; time.sleep(1) ``` ```{tip} Under the hood, the pull-based methods (`read`, `search`, etc.) use HTTP REST requests, while subscriptions use a WebSocket connection. This is why subscriptions can receive data as it arrives rather than polling the server repeatedly. Uploaded data is streamed via the WebSocket connection before it is even saved to disk, which minimizes latency. ``` ## Register data Detectors or analysis programs often write files directly to disk. Tiled can make those files accessible without any re-uploading or reformatting. For security reasons, the server administrator must designate which directories data can be registered from, like so. ```{code-cell} ipython3 c = simple(readable_storage=['external_data']) ``` We'll make some example files to be registered with Tiled. ```{code-cell} ipython3 from pathlib import Path import numpy as np import pandas as pd import tifffile # Create directory. dir = Path('external_data') dir.mkdir(exist_ok=True) # Write an image stack of TIFFs. for i in range(10): tifffile.imwrite(f'external_data/image_stack{i:03}.tiff', np.random.random((5, 5))) # Write a table as a CSV. df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}) df1.to_csv('external_data/table.csv') print('Contents of external_data directory:') print('\n'.join(sorted(p.name for p in dir.iterdir()))) ``` We'll register them. ```{code-cell} ipython3 from tiled.client.register import register await register(c, 'external_data') ``` ```{tip} Note that the `register` function is asynchronous. In Jupyter or IPython, we must use `await register(...)`. In a Python script, call `asyncio.run(register(...))`. A commandline interface is also available. See `tiled register --help` for more. ``` Tiled correctly consolidated the TIFFs into a single logical entry in the container. ```{code-cell} ipython3 from tiled.client import tree tree(c) ``` The file formats are considered a low-level detail. The file extensions are intentionally stripped off the names (though this is configurable). The user does not need to know the format to read the data! ```{code-cell} ipython3 c['table'].read() ``` ```{code-cell} ipython3 print(c['image_stack']) plt.imshow(c['image_stack'][3]) ``` However, the storage details are available if wanted, via the `data_sources()` method. ```{code-cell} ipython3 print(c['image_stack'].data_sources()[0].mimetype) print(c['table'].data_sources()[0].mimetype) ``` This concludes the whirlwind tour of Tiled's core features using its Python client.