How to create a new Catalog backed by MongoDB

I want to set up a performant Catalog that scales to large numbers of Runs and supports the full search capability of Databroker.

  1. Install the MongoDB Community Edition. We recommend the latest stable version. Any version 3.x or later should be fine. Alternatively, you can run MongoDB in the MongoDB Docker container maintained by Docker. See Container Advice below if you go this route.

  2. Find where Databroker looks for Catalog configuration files on your system. It varies by OS and environment because Databroker does its best to be a polite guest and place configuration files where the local conventions dictate. Run this snippet to find the list of paths where it looks on your system.

    python3 -c "import databroker; print(databroker.catalog_search_path())"
    
  3. Compose a configuration file like this. The filename of the configuration file is unimportant, but using CATALOG_NAME.yml is conventional. The file should be placed in any one of the directories listed by the previous step.

    sources:
      CATALOG_NAME:
        driver: bluesky-mongo-normalized-catalog
        args:
          metadatastore_db: mongodb://HOST:PORT/DATABASE_NAME
          asset_registry_db: mongodb://HOST:PORT/DATABASE_NAME
    

    where CATALOG_NAME is the name of the entry that will appear in databroker.catalog. The two database URIs, metadatastore_db and asset_registry_db, are distinct only for historical reasons. For new deployments, we recommend setting them to the same value, i.e. using one database shared by both.

    If you are using Databroker on the same system where you are running MongoDB, then the URI would be mongodb://localhost:27017/DATABASE_NAME, where 27017 is MongoDB's default port and DATABASE_NAME is entirely up to you.

  4. Now CATALOG_NAME should appear in the list of catalog names:

    import databroker
    
    # List catalog names.
    list(databroker.catalog)
    

    If it does not appear, call databroker.catalog.force_reload() and retry. The catalog may be accessed like

    # Replace CATALOG_NAME with the name used in the configuration file.
    catalog = databroker.catalog["CATALOG_NAME"]
    

    using the CATALOG_NAME in the text of the configuration file. (Again, the filename of the configuration file is not relevant.)
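
Step 3 above can also be scripted. The following is a minimal sketch, not part of Databroker itself: write_catalog_config is a hypothetical helper, and my_catalog, my_database and the temporary directory are placeholder values. In practice the target directory would be one of the paths reported in step 2. It fills in the configuration shown above, using the same URI for both databases as recommended.

```python
import tempfile
from pathlib import Path

# Same layout as the configuration file shown in step 3.
CONFIG_TEMPLATE = """\
sources:
  {name}:
    driver: bluesky-mongo-normalized-catalog
    args:
      metadatastore_db: {uri}
      asset_registry_db: {uri}
"""


def write_catalog_config(directory, name, uri):
    """Write NAME.yml into `directory` and return its path (hypothetical helper)."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{name}.yml"
    # One shared database for both URIs, as recommended for new deployments.
    path.write_text(CONFIG_TEMPLATE.format(name=name, uri=uri))
    return path


if __name__ == "__main__":
    # A temporary directory stands in for a real catalog search path here.
    with tempfile.TemporaryDirectory() as tmp:
        config_path = write_catalog_config(
            tmp, "my_catalog", "mongodb://localhost:27017/my_database"
        )
        print(config_path.read_text())
```

Using the catalog name as the filename follows the CATALOG_NAME.yml convention; only the name inside the file determines the entry in databroker.catalog.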

See How to store data from the Run Engine or How to store analysis results to put some actual data in there, and see the tutorials for how to get it back out.

Security

Databroker was designed with access controls per Run in mind, and this is now being actively developed, but currently only all-or-nothing access is supported: Users can access all the Runs in the MongoDB or none of them.

  1. Enable authentication on MongoDB. Following the MongoDB instructions for enabling authentication, create a user with read and write access to your database and set a secure password.

  2. Edit your configuration file to add a template for the username and password in the URI as follows. Notice the addition of the query parameter authSource=admin as well.

    metadatastore_db: mongodb://{{ env(DATABROKER_MONGO_USER) }}:{{ env(DATABROKER_MONGO_PASSWORD) }}@HOST:PORT/DATABASE_NAME?authSource=admin
    asset_registry_db: mongodb://{{ env(DATABROKER_MONGO_USER) }}:{{ env(DATABROKER_MONGO_PASSWORD) }}@HOST:PORT/DATABASE_NAME?authSource=admin
    

    Refer to PyMongo authentication documentation for context.

  3. Set these environment variables to provide access to the database.

    export DATABROKER_MONGO_USER='...'
    export DATABROKER_MONGO_PASSWORD='...'
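
To see what the templated URI should look like once the variables are filled in, the sketch below builds it by hand. build_mongo_uri is a hypothetical helper, and the host, port, database name and fallback credentials are placeholders. PyMongo expects usernames and passwords in a URI to be percent-encoded, so the sketch applies urllib.parse.quote_plus. Whether the configuration template encodes the values for you is not stated here, so it may be safest to assume that a password containing characters such as @ or / has to be exported already encoded.

```python
import os
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host, port, database, auth_source="admin"):
    """Return a MongoDB URI with percent-encoded credentials (hypothetical helper)."""
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}?authSource={auth_source}"
    )


if __name__ == "__main__":
    # Fall back to placeholder credentials when the variables are not set.
    user = os.environ.get("DATABROKER_MONGO_USER", "example_user")
    password = os.environ.get("DATABROKER_MONGO_PASSWORD", "p@ss/word")
    print(build_mongo_uri(user, password, "localhost", 27017, "my_database"))
```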
    

Container Advice

If you choose to run MongoDB in a Docker container:

  • Be sure to mount persistent storage from the host machine into the volumes where MongoDB stores its data. When the container stops, you presumably still want your data!

  • See this resource for information on enabling authentication.
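
As an illustration of the storage advice, the sketch below assembles a docker run command that maps a host directory onto /data/db, the directory where the official mongo image keeps its data. mongo_docker_command is a hypothetical helper, and /path/to/mongo-data is a placeholder host path. The command is only printed, not executed, and authentication is not configured here.

```python
import shlex


def mongo_docker_command(host_data_dir, port=27017, image="mongo"):
    """Assemble a `docker run` command that persists MongoDB data on the host."""
    return [
        "docker", "run", "--detach",
        # Expose MongoDB's default port on the host.
        "--publish", f"{port}:27017",
        # Keep the database files on the host so they outlive the container.
        "--volume", f"{host_data_dir}:/data/db",
        image,
    ]


if __name__ == "__main__":
    # Print the command for review instead of running it.
    print(shlex.join(mongo_docker_command("/path/to/mongo-data")))
```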