.. currentmodule:: databroker Find Runs in a Catalog ====================== In this tutorial we will: * Look up a specific Run by some identifier. * Look up a specific Run based on recency (e.g. "Show me the data I just took"). * Search for Runs using both simple and complex search queries. Set up for Tutorial ------------------- Before you begin, install ``databroker`` and ``databroker-pack``, following the :doc:`install`. Start your favorite interactive Python environment, such as ``ipython`` or ``jupyter lab``. For this tutorial, we'll use a catalog of publicly available, openly licensed sample data. Specifically, it is high-quality transmission XAS data from all over the periodical table. This utility downloads it and makes it discoverable to Databroker. .. ipython:: python import databroker.tutorial_utils databroker.tutorial_utils.fetch_BMM_example() Access the catalog and assign it to a variable for convenience. .. ipython:: python import databroker catalog = databroker.catalog['bluesky-tutorial-BMM'] Look-up ------- In this section we will look up a Run by its * Globally unique identifier --- unmemorable, but great for scripts * Counting-number "scan ID" --- easier to remember, but not necessarily unique * Recency --- e.g. "the data I just took" If you know exactly which Run you are looking for, the surest way to get it is to look it up by its globally unique identifier, its "uid". This is the recommended way to look up runs *in scripts* but it is not especially fluid for interactive use. .. ipython:: python catalog['c07e765b-ce5c-4c75-a16e-06f66546c1d4'] The uid may be abbreviated. The first 7 or 8 characters are usually sufficient to uniquely identify an entry. .. ipython:: python catalog['c07e765'] If the abbreviated uid is ambiguous---if it matches more than one Run---a ``ValueError`` is raised listing the matches. Try ``catalog['a']``, which will match two Runs in this Catalog and raise that error. Runs typically also have a counting number identifier, dubbed ``scan_id``. This is easier to remember. Keep in mind that ``scan_id`` *is not neccesarily unique*, and Databroker will always give you the most recent match. Some users are in the habit of resetting ``scan_id`` to 1 at the beginning of a new experiment or operating cycle. This is why lookup based on the globally unique identifier is safest for scripts and Jupyter notebooks, especially long-lived ones. .. ipython:: python catalog[23463] Finally, it is often convenient to access data by recency, as in "the data that I just took". .. ipython:: python catalog[-1] This syntax is meant to feel similar to accessing elements in a list or array in Python, where ``a[-N]`` means "``N`` elements from the end of ``a``". In summary: ================== =============================================== ``catalog["..."]`` Globally unique identifier ("uid") ``catalog[N]`` Counting number "scan ID" N (most recent match) ``catalog[-N]`` Nth most recent Run in the Catalog ================== =============================================== All of these always return *one* ``BlueskyRun`` or raise an exception. Search ------ Common search queries can be done with a high-level Python interface. .. ipython:: python :okwarning: from databroker.queries import TimeRange results = catalog.search(TimeRange(since="2020-03-05")) The result of a search is just another Catalog. It has a subset of the original Catalog's entries. We can compare the number of search results to the total number of entries in ``catalog``. .. ipython:: python print(f"Results: {len(results)} Total: {len(catalog)}") We can iterate through the results for batch processing .. ipython:: python for uid, run in results.items(): # Do something. ... or access a particular result by using any of the lookup methods in the section above, such as recency. This is a convenient way to quickly look at one search result. .. ipython:: python results[-1] Because ``results`` is just another Catalog, we can search on the search results to progressively narrow our results. .. ipython:: python narrowed_results = results.search({"num_points": {"$gt": 400}}) # Read on... print(f"Narrowed Results: {len(narrowed_results)} Results: {len(results)} Total: {len(catalog)}") Custom queries can be done with the `MongoDB query language`_. The simplest examples check for equality of a key and value, as in .. ipython:: python results = catalog.search({"XDI.Element.symbol": "Mn"}) len(results) The above matches Runs where the 'start' document looks like:: { ... "XDI": {"Element": {"symbol": "Mn"}}, ... } The allowed keys are totally open-ended as far as Databroker is concerned. This example is particular to the metadata recorded by the instrument that it came from. What's useful in your case will depend on what metadata was provided when the data was captured. Look at a couple Runs' start documents to get a sense of the metadata that would be useful in searches. .. code:: python run = catalog[-1] run.metadata["start"] Again, the syntax of a query is that of the `MongoDB query language`_. It's an expressive language for specifying searches over heterogeneous metadata. .. note:: When the data is stored by some means other than MongoDB, databroker uses Python libraries that support most of MongoDB's query language without actual MongoDB. Here is an example of a more sophisticated query, doing more than just checking for equality. .. ipython:: python query = { "XDI.Scan.edge_energy": {"$lte": 6539.0}, # less than or equal to "XDI.Element.symbol": "Mn", } results = catalog.search(query) len(results) See the MongoDB documentation linked above to learn other expressions like ``$lte``. .. _MongoDB query language: https://docs.mongodb.com/manual/reference/operator/query/