Setting up your own catalog services

Rich Signell edited this page Oct 14, 2015 · 8 revisions

Catalog-driven workflows are awesome.
For proof, check out this video presentation (start at the 12:30 mark).

So you have a bunch of data accessible via OPeNDAP in THREDDS catalogs and now you want to be able to query them, right? Okay, let's do this:

  • Identify the THREDDS catalogs you want to harvest data from.
  • If your catalogs are all from THREDDS Data Servers with ISO metadata services enabled, you can set up a script like this one: https://github.com/USGS-CMG/usgs-cmg-portal/blob/master/catalog_harvest/get_ncml_daily_iso.py. It crawls a specified list of catalogs, applies filters to the datasets, and saves the resulting ISO metadata records as XML files in a local folder. We have this folder exposed to the outside world, making it a Web Accessible Folder (WAF), just in case someone else wants to read our metadata records directly. If you need to access THREDDS catalogs that do not have the ISO services enabled, you can run ncISO stand-alone to generate the ISO metadata records.
  • Install pycsw (super easy).
  • Set up a script to have pycsw ingest the ISO metadata records in your metadata folder. We use a script like:
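#!/bin/bash
# Folder (our WAF) holding the harvested ISO metadata records
WAF_TOPDIR='/usgs/data0/iso/iso_records'
# pycsw configuration file
PYCSW_CONFIG='/opt/pycsw/default.cfg'

# load all records into DB (-r recurses into subfolders)
/opt/pycsw/bin/pycsw-admin.py -c load_records -f "$PYCSW_CONFIG" -p "$WAF_TOPDIR" -r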
  • Query your pycsw endpoint in Python using OWSLib. See this nice CSW blog post by @ocefpaf.
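
To give a feel for the last step, here is a minimal sketch of a full-text CSW query with OWSLib. It is not taken from the blog post. The endpoint URL, the search term, and the helper names (find_service_urls, search_catalog, main) are placeholders made up for illustration. The scheme keywords used to spot OPeNDAP links are also an assumption, since the scheme strings depend on how your ISO records were generated.

```python
# Hypothetical endpoint: replace with the URL of your own pycsw instance.
CSW_ENDPOINT = "http://localhost:8000/csw"


def find_service_urls(references, keywords=("odp", "opendap")):
    """Return the URLs from a record's references whose scheme mentions a keyword.

    `references` is a list of dicts with 'scheme' and 'url' keys, as found on
    OWSLib CSW records. The default keywords are an assumption about how
    OPeNDAP links are labelled; check the schemes in your own records.
    """
    urls = []
    for ref in references:
        scheme = (ref.get("scheme") or "").lower()
        if ref.get("url") and any(key in scheme for key in keywords):
            urls.append(ref["url"])
    return urls


def search_catalog(endpoint, text, max_records=10):
    """Full-text search of a CSW endpoint; returns a dict of {identifier: record}."""
    # Imported here so the helper above can be used without OWSLib installed.
    from owslib import fes
    from owslib.csw import CatalogueServiceWeb

    csw = CatalogueServiceWeb(endpoint, timeout=60)
    # Match the search text anywhere in the record.
    query = fes.PropertyIsLike(
        propertyname="apiso:AnyText",
        literal="*{}*".format(text),
        escapeChar="\\",
        wildCard="*",
        singleChar="?",
    )
    # esn="full" asks for complete records, including their references.
    csw.getrecords2(constraints=[query], maxrecords=max_records, esn="full")
    return csw.records


def main():
    # "sea_water_temperature" is only an example search term.
    records = search_catalog(CSW_ENDPOINT, "sea_water_temperature")
    for identifier, record in records.items():
        print(record.title)
        for url in find_service_urls(record.references):
            print("    " + url)
```

Calling main() against your own endpoint should print the title of each matching record followed by any data access URLs it advertises. Those URLs can then be opened with your favorite OPeNDAP client.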