02. Data Discovery with CMR-STAC API

Timing

  • Exercise: 30 min

Summary

In this example we will access the NASA’s Harmonized Landsat Sentinel-2 (HLS) version 2 assets, which are archived in cloud optimized geoTIFF (COG) format in the LP DAAC Cumulus cloud space. The COGs can be used like any other geoTIFF file, but have some added features that make them more efficient within the cloud data access paradigm. These features include: overviews and internal tiling. Below we will demonstrate how to leverage these features.

But first, what is STAC?

SpatioTemporal Asset Catalog (STAC) is a specification that provides a common language for interpreting geospatial information in order to standardize indexing and discovering data.

The STAC specification is made up of a collection of related, yet independent specifications that when used together provide search and discovery capabilities for remote assets.

Four STAC Specifications

STAC Catalog (aka DAAC Archive)
STAC Collection (aka Data Product)
STAC Item (aka Granule)
STAC API

In the following sections, we will explore each of STAC element using NASA’s Common Metadata Repository (CMR) STAC application programming interface (API), or CMR-STAC API for short.

CMR-STAC API

The CMR-STAC API is NASA’s implementation of the STAC API specification for all NASA data holdings within EOSDIS. The current implementation does not allow for querries accross the entire NASA catalog. Users must execute searches within provider catalogs (e.g., LPCLOUD) to find the STAC Items they are searching for. All the providers can be found at the CMR-STAC endpoint here: https://cmr.earthdata.nasa.gov/stac/.

In this exercise, we will query the LPCLOUD provider to identify STAC Items from the Harmonized Landsat Sentinel-2 (HLS) collection that fall within our region of interest (ROI) and within our specified time range.

What you will learn from this tutorial

  • how to connect to NASA CMR-STAC API using Python’s pystac-client
  • how to navigate CMR-STAC records
  • how to read in a geojson file using geopandas to specify your region of interest
  • how to use the CMR-STAC API to search for data
  • how to perform post-search filtering of CMR-STAC API search result in Python
  • how to extract and save data access URLs for geospatial assets

This exercise can be found in the 2021 Cloud Hackathon Book


Import Required Packages

from pystac_client import Client  
from collections import defaultdict    
import json
import geopandas
import geoviews as gv
from cartopy import crs
gv.extension('bokeh', 'matplotlib')