Getting Started with NASA’s CMR-STAC API in R


This tutorial demonstrates how to interact with CMR-STAC in R. You will learn how to navigate and explore NASA’s Common Metadata Repository (CMR) SpatioTemporal Asset Catalog (STAC) to discover the datasets available through the LP DAAC Cumulus cloud archive.


Topics Covered in this Tutorial

  1. Introduction to STAC and the CMR-STAC API
    1a. What is STAC?
    1b. What is the CMR-STAC API?
  2. Get started with CMR-STAC
    2a. CMR-STAC API
    2b. STAC Catalog
    2c. STAC Collection
    2d. STAC Item
    2e. Assets
  3. CMR-STAC Search
    3a. Define Search Parameters
    3b. Search for Items

Prerequisites:

  • R and RStudio are required to execute this tutorial. Installation details can be found on the R Project (CRAN) and RStudio websites.

  • This tutorial has been tested on Windows using R Version 4.1.0 and RStudio version 1.4.1717.


Procedures:

Getting Started:

  • Clone or download the HLS_Tutorial_R repository from the LP DAAC Data User Resources Repository.

  • When you open this R Markdown notebook in RStudio, you can click the little green “Play” button in each grey code chunk to execute the code. Results are printed either in the R Console or inline in the R Markdown notebook, depending on your RStudio preferences.

Environment Setup:

1. Check your version of R by typing version into the console, and your version of RStudio by typing RStudio.Version(). Update them if needed.

  • Windows

    • Install and load installr:

      • install.packages("installr");library(installr)
    • Copy/Update the existing packages to the new R installation:

      • updateR()
    • Open RStudio, go to Help > Check for Updates to install a newer version of RStudio (if available).

  • Mac

    • Go to https://cloud.r-project.org/bin/macosx/.
    • Download the latest release (.pkg file) and finish the installation.
    • Open RStudio, go to Help > Check for Updates to install a newer version of RStudio (if available).
    • To update packages, go to Tools > Check for Package Updates. If updates are available, select All, and click Install Updates.

2. Required packages

  • Required packages:

    • httr
    • jsonlite
    • purrr
    • DT
    • dplyr
    • magrittr
    • xml2

Run the cell below to identify any missing packages to install, and then load all of the required packages.

packages <- c('httr','purrr','jsonlite','DT','magrittr', 'xml2', 'dplyr')
new.packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages, repos='http://cran.rstudio.com/') else print('All required packages are installed.')
[1] "All required packages are installed."
invisible(lapply(packages, library, character.only = TRUE))

1. Introduction to STAC and the CMR-STAC API

1a. What is STAC?

STAC is short for Spatiotemporal Asset Catalog, a series of specifications that provide a common language for interpreting geospatial information in order to standardize indexing and discovery of spatiotemporal assets (files containing information about the Earth across space and time).

There are four specifications that work both independently and together:

  1. STAC Catalog
  2. STAC Collection
  3. STAC Item
  4. STAC API

The STAC API specification builds on top of the three core specifications above. All of these specifications are intended to be used together, yet each piece is designed to be small, self-contained, and reusable in other contexts.

1b. What is the CMR-STAC API?

The Common Metadata Repository (CMR) is a metadata system that catalogs Earth Science data and associated metadata records. NASA’s CMR-STAC Application Programming Interface (API) is a translation API for STAC users who want to access and search through CMR’s vast metadata holdings using STAC keywords.


2. Get started with CMR-STAC

2a. CMR-STAC API

The CMR-STAC API contains endpoints that enable the querying of STAC items.

Assign the CMR-STAC URL to a static variable.

CMR_STAC_URL <- 'https://cmr.earthdata.nasa.gov/stac/'

Connect to the CMR-STAC landing page, which lists all the available data providers and their STAC endpoints. In this tutorial, the httr package is used to navigate the CMR-STAC API.

cmr_cat <- httr::GET(CMR_STAC_URL) %>%          # Request and retrieve the info from CMR-STAC URL
  httr::content()                               # Parse the JSON response into an R list
cat('You are using ', cmr_cat$title, ' version ', cmr_cat$stac_version, '. ', cmr_cat$description, sep = '')
You are using NASA CMR STAC Proxy version 1.0.0. This is the landing page for CMR-STAC. Each provider link contains a STAC endpoint.

Here, jsonlite is used to convert the content returned by the request into a data frame, and the DT package is used to display it as a more readable, interactive table. The providers’ names and URLs are found in the title and href fields, respectively.

cmr_cat_links <- cmr_cat$links %>% 
  jsonlite::toJSON(auto_unbox = TRUE) %>% 
  jsonlite::fromJSON() %>% 
  as.data.frame()
DT::datatable(cmr_cat_links)

The data frame above shows all the data providers with their associated STAC catalog endpoints. Note that the CMR-STAC API contains many different endpoints, not only for NASA LP DAAC but also for other NASA ESDIS DAACs. Use the title field to identify the data provider you are interested in. The data product used in this tutorial is hosted in the LP DAAC Cumulus Cloud space (LPCLOUD).

Assign LPCLOUD to the provider variable and get this provider’s endpoint from the CMR-STAC catalog using the URL in the href field.

provider <- 'LPCLOUD'
lpcloud_cat_link <- cmr_cat_links[which(cmr_cat_links$title == provider), 'href']
lpcloud_cat_link
[1] "https://cmr.earthdata.nasa.gov/stac/LPCLOUD"
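The same pattern, filtering a links data frame on one field and returning its href, recurs throughout this tutorial (by title here, and later by rel for the collections, self, and items links). A small hypothetical helper such as get_link_href() below, which is not part of httr or jsonlite, could keep those lookups consistent and warn when nothing matches. It is a sketch that assumes the links table has an href column plus the field being matched; the example table uses made-up values shaped like cmr_cat_links.

```r
# Hypothetical helper (not part of httr or jsonlite): return the href of every
# link whose `field` column equals `value`.
get_link_href <- function(links, value, field = "rel") {
  links <- as.data.frame(links, stringsAsFactors = FALSE)
  if (!all(c(field, "href") %in% names(links))) {
    stop("links must contain '", field, "' and 'href' columns")
  }
  hrefs <- links[which(links[[field]] == value), "href"]
  if (length(hrefs) == 0) {
    warning("No link found where ", field, " == '", value, "'")
  }
  hrefs
}

# Made-up links table for illustration, shaped like cmr_cat_links
example_links <- data.frame(
  rel   = c("self", "child"),
  title = c("NASA CMR STAC Proxy", "LPCLOUD"),
  href  = c("https://cmr.earthdata.nasa.gov/stac/",
            "https://cmr.earthdata.nasa.gov/stac/LPCLOUD"),
  stringsAsFactors = FALSE
)

get_link_href(example_links, "LPCLOUD", field = "title")  # lookup by title
get_link_href(example_links, "self")                      # lookup by rel (default)
```

With the objects created above, get_link_href(cmr_cat_links, provider, field = 'title') should return the same URL as lpcloud_cat_link; the later lookups by rel follow the same shape.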

2b. STAC Catalog

A STAC Catalog is a JSON file of links that organizes all the available collections. Below, connect to the LPCLOUD STAC Catalog endpoint using the httr package and print the information contained in the Catalog.

lpcloud_cat <- httr::GET(lpcloud_cat_link) %>% 
  httr::content()

lpcloud_cat <- lpcloud_cat %>% 
  jsonlite::toJSON(auto_unbox = TRUE) %>% 
  jsonlite::fromJSON() 

DT::datatable(lpcloud_cat$links)

The LPCLOUD STAC Catalog includes URL links to the root, collections, search, and child STAC Catalogs. The data frame above also shows the available collections in the LPCLOUD Catalog.


2c. STAC Collection

A STAC Collection is an extension of a STAC Catalog that contains additional information describing the STAC Items in that Collection.

Get the URL link to the STAC Collections.

lpcloud_col_link <- lpcloud_cat$links[which(lpcloud_cat$links$rel == 'collections'),'href']
lpcloud_col_link
[1] "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections"

Next, get the content describing the collections within the LPCLOUD Catalog. Important information such as the collection ID, title, description, and links to collection endpoints is provided here.

lpcloud_collection <- httr::GET(lpcloud_col_link) %>% 
  httr::content() 

lpcloud_collection <- lpcloud_collection %>% 
  jsonlite::toJSON(auto_unbox = TRUE, pretty = TRUE)

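Print the IDs of the collections within the LPCLOUD STAC Catalog. Note that this response may be paginated, so the list below may not include every LPCLOUD collection.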

lpcloud_collection_df <- jsonlite::fromJSON(lpcloud_collection)$collections
lpcloud_collection_df$id
 [1] "ASTGTM.v003"       "ECO_L1B_ATT.v002"  "ECO_L2_CLOUD.v002"
 [4] "ECO_L1B_GEO.v002"  "ECO_L2_LSTE.v002"  "ECO_L1B_RAD.v002" 
 [7] "ECO_L2T_LSTE.v002" "EMITL1BRAD.v001"   "EMITL2ARFL.v001"  
[10] "HLSL30.v2.0"      

In CMR-STAC, the Collection ID is used to query a specific product, so be sure to save the ID of the collection you are interested in. For instance, the Collection ID for ASTER Global Digital Elevation Model V003 is ASTGTM.v003. Note that the “id” is in the format productshortname.vVVV (where VVV = product version).
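Because each ID combines the product short name and its version, the two parts can be split programmatically, for instance to group several versions of the same product. The hypothetical helper parse_collection_id() below is a sketch that assumes IDs end in .v followed by the version; its pattern also accepts dotted versions such as HLSL30.v2.0 from the list above, which does not follow the three-digit VVV form. IDs that do not match return NA.

```r
# Hypothetical helper: split CMR-STAC collection IDs of the form
# "<shortname>.v<version>" into short name and version columns.
parse_collection_id <- function(ids) {
  # Greedy (.+) backtracks to the last ".v" followed only by digits/dots
  pattern <- "^(.+)\\.v([0-9][0-9.]*)$"
  ok <- grepl(pattern, ids)
  data.frame(
    id        = ids,
    shortname = ifelse(ok, sub(pattern, "\\1", ids), NA_character_),
    version   = ifelse(ok, sub(pattern, "\\2", ids), NA_character_),
    stringsAsFactors = FALSE
  )
}

parse_collection_id(c("ASTGTM.v003", "ECO_L2_LSTE.v002", "HLSL30.v2.0"))
```

In a live session, parse_collection_id(lpcloud_collection_df$id) should return one row per collection listed above.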

Here, get the URL link to the ASTGTM.v003 STAC Collection. If you are interested in querying a different LPCLOUD product, replace the Collection ID assigned to the collection variable below.

collection <- 'ASTGTM.v003'    # USER INPUT
col_links <- lpcloud_collection_df$links[which(lpcloud_collection_df$id == collection)] %>% 
  as.data.frame()

astgtm_URL <- col_links[which(col_links$rel == 'self'), 'href']
astgtm_URL
[1] "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM.v003"

A STAC Collection’s metadata applies to every STAC Item and asset that the Collection contains. Get the content of the ASTGTM.v003 collection URL and print the collection description.

astgtm_collection <- httr::GET(astgtm_URL) %>% 
  httr::content()

astgtm_collection <- astgtm_collection %>% 
  jsonlite::toJSON(auto_unbox = TRUE) %>% 
  jsonlite::fromJSON()

cat(astgtm_collection$description)
The ASTER Global Digital Elevation Model (GDEM) Version 3 (ASTGTM) provides a global digital elevation model (DEM) of land areas on Earth at a spatial resolution of 1 arc second (approximately 30 meter horizontal posting at the equator).

The development of the ASTER GDEM data products is a collaborative effort between National Aeronautics and Space Administration (NASA) and Japan’s Ministry of Economy, Trade, and Industry (METI). The ASTER GDEM data products are created by the Sensor Information Laboratory Corporation (SILC) in Tokyo. 

The ASTER GDEM Version 3 data product was created from the automated processing of the entire ASTER Level 1A (https://doi.org/10.5067/ASTER/AST_L1A.003) archive of scenes acquired between March 1, 2000, and November 30, 2013. Stereo correlation was used to produce over one million individual scene based ASTER DEMs, to which cloud masking was applied. All cloud screened DEMs and non-cloud screened DEMs were stacked. Residual bad values and outliers were removed. In areas with limited data stacking, several existing reference DEMs were used to supplement ASTER data to correct for residual anomalies. Selected data were averaged to create final pixel values before partitioning the data into 1 degree latitude by 1 degree longitude tiles with a one pixel overlap. To correct elevation values of water body surfaces, the ASTER Global Water Bodies Database (ASTWBD) (https://doi.org/10.5067/ASTER/ASTWBD.001) Version 1 data product was also generated. 

The geographic coverage of the ASTER GDEM extends from 83° North to 83° South. Each tile is distributed in GeoTIFF format and projected on the 1984 World Geodetic System (WGS84)/1996 Earth Gravitational Model (EGM96) geoid. Each of the 22,912 tiles in the collection contain at least 0.01% land area. 

Provided in the ASTER GDEM product are layers for DEM and number of scenes (NUM). The NUM layer indicates the number of scenes that were processed for each pixel and the source of the data.

While the ASTER GDEM Version 3 data products offer substantial improvements over Version 2, users are advised that the products still may contain anomalies and artifacts that will reduce its usability for certain applications. 

Improvements/Changes from Previous Versions 
• Expansion of acquisition coverage to increase the amount of cloud-free input scenes from about 1.5 million in Version 2 to about 1.88 million scenes in Version 3.
• Separation of rivers from lakes in the water body processing. 
• Minimum water body detection size decreased from 1 km2 to 0.2 km2. 

We can also get the spatial and temporal extent information. Below, we can see this collection has a near-global spatial extent. ASTER GDEM is a single, static dataset that incorporates observations from March 2000 to November 2013.

astgtm_collection$extent %>% 
  jsonlite::toJSON(auto_unbox = TRUE)
{"spatial":{"bbox":[[-180,-83,180,82]]},"temporal":{"interval":[["2000-03-01T00:00:00.000Z","2013-11-30T23:59:59.999Z"]]}} 
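The extent can also be used programmatically, for example to check whether a location of interest is covered before searching for Items. The sketch below uses a hypothetical helper, in_bbox(), that assumes a 2D bbox ordered as west, south, east, north (as printed above) and one that does not cross the antimeridian. It also parses the temporal interval with base R’s as.POSIXct(); the values are copied from the output above rather than read from a live response.

```r
# Hypothetical helper: TRUE where a lon/lat point lies inside a STAC bbox
# given as c(west, south, east, north) in decimal degrees.
# Assumes a 2D bbox that does not cross the antimeridian.
in_bbox <- function(lon, lat, bbox) {
  bbox <- as.numeric(unlist(bbox))  # accepts a vector, 1 x 4 matrix, or list
  stopifnot(length(bbox) == 4)
  lon >= bbox[1] & lon <= bbox[3] & lat >= bbox[2] & lat <= bbox[4]
}

# bbox and interval copied from the extent printed above; in a live session
# they come from astgtm_collection$extent$spatial$bbox and
# astgtm_collection$extent$temporal$interval
collection_bbox <- c(-180, -83, 180, 82)
in_bbox(lon = c(8.5, 0), lat = c(3.5, 85), bbox = collection_bbox)

# Parse the ISO 8601 UTC timestamps into POSIXct date-times
interval <- as.POSIXct(
  c("2000-03-01T00:00:00.000Z", "2013-11-30T23:59:59.999Z"),
  format = "%Y-%m-%dT%H:%M:%OSZ", tz = "UTC"
)
interval
```

The first point (8.5°E, 3.5°N) falls inside the collection bbox, while a point at 85°N lies north of its 82° northern edge.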

The STAC Collection also includes useful links. You can access all the Items within this collection through the items URL.

DT::datatable(astgtm_collection$links)

Get the URL to the ASTGTM.v003 Items.

items_url <- astgtm_collection$links[which(astgtm_collection$links$rel == 'items'), 'href']
items_url
[1] "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM.v003/items"

2d. STAC Item

A STAC Item represents data and metadata assets that are spatiotemporally coincident. Below, query the STAC Items within the ASTGTM.v003 STAC Collection and print the first Item in the collection.

astgtm_items <- httr::GET(items_url) %>% 
  httr::content(as = "text") %>%  
  jsonlite::fromJSON()

F1 <- astgtm_items$features[1,] %>% 
  jsonlite::toJSON(auto_unbox = TRUE, pretty = TRUE)
F1
[
  {
    "type": "Feature",
    "id": "ASTGTMV003_N03E008",
    "stac_version": "1.0.0",
    "stac_extensions": [],
    "collection": "ASTGTM.v003",
    "geometry": {
      "type": "Polygon",
      "coordinates": [
        [
          [7.9999, 2.9999],
          [9.0001, 2.9999],
          [9.0001, 4.0001],
          [7.9999, 4.0001],
          [7.9999, 2.9999]
        ]
      ]
    },
    "bbox": [7.9999, 2.9999, 9.0001, 4.0001],
    "links": [
      {
        "rel": "self",
        "href": "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM.v003/items/ASTGTMV003_N03E008"
      },
      {
        "rel": "parent",
        "href": "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM.v003"
      },
      {
        "rel": "collection",
        "href": "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM.v003"
      },
      {
        "rel": "root",
        "href": "https://cmr.earthdata.nasa.gov/stac/"
      },
      {
        "rel": "provider",
        "href": "https://cmr.earthdata.nasa.gov/stac/LPCLOUD"
      },
      {
        "rel": "via",
        "href": "https://cmr.earthdata.nasa.gov/search/concepts/G1716133754-LPCLOUD.json"
      },
      {
        "rel": "via",
        "href": "https://cmr.earthdata.nasa.gov/search/concepts/G1716133754-LPCLOUD.umm_json"
      }
    ],
    "properties": {
      "datetime": "2000-03-01T00:00:00.000Z",
      "start_datetime": "2000-03-01T00:00:00.000Z",
      "end_datetime": "2013-11-30T23:59:59.000Z"
    },
    "assets": {
      "003/ASTGTMV003_N03E008_dem": {
        "href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ASTGTM.003/ASTGTMV003_N03E008_dem.tif",
        "title": "Download ASTGTMV003_N03E008_dem.tif"
      },
      "003/ASTGTMV003_N03E008_num": {
        "href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ASTGTM.003/ASTGTMV003_N03E008_num.tif",
        "title": "Download ASTGTMV003_N03E008_num.tif"
      },
      "browse": {
        "href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-public/ASTGTM.003/ASTGTMV003_N03E008.1.jpg",
        "type": "image/jpeg",
        "title": "Download ASTGTMV003_N03E008.1.jpg"
      },
      "metadata": {
        "href": "https://cmr.earthdata.nasa.gov/search/concepts/G1716133754-LPCLOUD.xml",
        "type": "application/xml"
      },
      "003/ASTGTMV003_N02E022_dem": {},
      "003/ASTGTMV003_N02E022_num": {},
      "003/ASTGTMV003_N00W065_dem": {},
      "003/ASTGTMV003_N00W065_num": {},
      "003/ASTGTMV003_N01E009_dem": {},
      "003/ASTGTMV003_N01E009_num": {},
      "003/ASTGTMV003_N02E009_dem": {},
      "003/ASTGTMV003_N02E009_num": {},
      "003/ASTGTMV003_N03E021_dem": {},
      "003/ASTGTMV003_N03E021_num": {},
      "003/ASTGTMV003_N01E021_dem": {},
      "003/ASTGTMV003_N01E021_num": {},
      "003/ASTGTMV003_N01E042_dem": {},
      "003/ASTGTMV003_N01E042_num": {},
      "003/ASTGTMV003_N01W069_dem": {},
      "003/ASTGTMV003_N01W069_num": {},
      "003/ASTGTMV003_N01W080_dem": {},
      "003/ASTGTMV003_N01W080_num": {}
    }
  }
] 
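Printing whole Items quickly becomes unwieldy. One way to get an overview of a page is to pull a few fields from each Item into a data frame. The hypothetical helper summarize_items() below is a sketch that assumes the features were parsed without simplification, e.g. with jsonlite::fromJSON(httr::content(httr::GET(items_url), as = "text"), simplifyVector = FALSE), so that each Item is a plain nested list. The example Item reuses the id, bbox, and datetimes of the first Item printed above.

```r
# Return a default (NA) instead of NULL for missing fields,
# e.g. an Item that has only "datetime" and no start/end range
na_if_null <- function(x, na = NA_character_) if (is.null(x)) na else x

# Hypothetical helper: one row per STAC Item with its id, time range, and bbox.
# `features` is a list of Items, each a nested list (simplifyVector = FALSE).
summarize_items <- function(features) {
  # 4 x n matrix: rows are west, south, east, north
  bbox <- vapply(features, function(f) as.numeric(unlist(f$bbox)), numeric(4))
  data.frame(
    id    = vapply(features, function(f) f$id, character(1)),
    start = vapply(features, function(f) na_if_null(f$properties$start_datetime), character(1)),
    end   = vapply(features, function(f) na_if_null(f$properties$end_datetime), character(1)),
    west  = bbox[1, ], south = bbox[2, ], east = bbox[3, ], north = bbox[4, ],
    stringsAsFactors = FALSE
  )
}

# One Item built from the first feature printed above (other fields omitted)
first_item <- list(
  id = "ASTGTMV003_N03E008",
  bbox = list(7.9999, 2.9999, 9.0001, 4.0001),
  properties = list(
    datetime = "2000-03-01T00:00:00.000Z",
    start_datetime = "2000-03-01T00:00:00.000Z",
    end_datetime = "2013-11-30T23:59:59.000Z"
  )
)
summarize_items(list(first_item))
```

Applied to the features of an unsimplified first page, this should yield one row for each of the 10 returned Items.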

Notice that the number of items matching our request is far more than what is returned.

cat(astgtm_items$context$matched, 'items matched your request but', astgtm_items$context$returned, 'items are returned.')
22912 items matched your request but 10 items are returned.

This is because the response is paginated: by default, the CMR-STAC API returns the first 10 records. To explore more Items, append ?page=n to the URL, where n is the page number (e.g., ?page=2), and submit another request. Below, request the records on the second page.

page_2_url <- paste0(items_url, '?page=2')

astgtm_page2_items <- httr::GET(page_2_url) %>% 
  httr::content(as = "text") %>%  
  jsonlite::fromJSON()

astgtm_page2_items$features[1,] %>% 
  jsonlite::toJSON(auto_unbox = TRUE, pretty = TRUE)
[
  {
    "type": "Feature",
    "id": "ASTGTMV003_N03E042",
    "stac_version": "1.0.0",
    "stac_extensions": [],
    "collection": "ASTGTM.v003",
    "geometry": {
      "type": "Polygon",
      "coordinates": [
        [
          [41.9999, 2.9999],
          [43.0001, 2.9999],
          [43.0001, 4.0001],
          [41.9999, 4.0001],
          [41.9999, 2.9999]
        ]
      ]
    },
    "bbox": [41.9999, 2.9999, 43.0001, 4.0001],
    "links": [
      {
        "rel": "self",
        "href": "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM.v003/items/ASTGTMV003_N03E042"
      },
      {
        "rel": "parent",
        "href": "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM.v003"
      },
      {
        "rel": "collection",
        "href": "https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/ASTGTM.v003"
      },
      {
        "rel": "root",
        "href": "https://cmr.earthdata.nasa.gov/stac/"
      },
      {
        "rel": "provider",
        "href": "https://cmr.earthdata.nasa.gov/stac/LPCLOUD"
      },
      {
        "rel": "via",
        "href": "https://cmr.earthdata.nasa.gov/search/concepts/G1726373735-LPCLOUD.json"
      },
      {
        "rel": "via",
        "href": "https://cmr.earthdata.nasa.gov/search/concepts/G1726373735-LPCLOUD.umm_json"
      }
    ],
    "properties": {
      "datetime": "2000-03-01T00:00:00.000Z",
      "start_datetime": "2000-03-01T00:00:00.000Z",
      "end_datetime": "2013-11-30T23:59:59.000Z"
    },
    "assets": {
      "003/ASTGTMV003_N03E042_dem": {
        "href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ASTGTM.003/ASTGTMV003_N03E042_dem.tif",
        "title": "Download ASTGTMV003_N03E042_dem.tif"
      },
      "003/ASTGTMV003_N03E042_num": {
        "href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ASTGTM.003/ASTGTMV003_N03E042_num.tif",
        "title": "Download ASTGTMV003_N03E042_num.tif"
      },
      "browse": {
        "href": "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-public/ASTGTM.003/ASTGTMV003_N03E042.1.jpg",
        "type": "image/jpeg",
        "title": "Download ASTGTMV003_N03E042.1.jpg"
      },
      "metadata": {
        "href": "https://cmr.earthdata.nasa.gov/search/concepts/G1726373735-LPCLOUD.xml",
        "type": "application/xml"
      },
      "003/ASTGTMV003_N00W061_dem": {},
      "003/ASTGTMV003_N00W061_num": {},
      "003/ASTGTMV003_N02W066_dem": {},
      "003/ASTGTMV003_N02W066_num": {},
      "003/ASTGTMV003_N02W069_dem": {},
      "003/ASTGTMV003_N02W069_num": {},
      "003/ASTGTMV003_N01E022_dem": {},
      "003/ASTGTMV003_N01E022_num": {},
      "003/ASTGTMV003_N01E026_dem": {},
      "003/ASTGTMV003_N01E026_num": {},
      "003/ASTGTMV003_N02W064_dem": {},
      "003/ASTGTMV003_N02W064_num": {},
      "003/ASTGTMV003_N01W064_dem": {},
      "003/ASTGTMV003_N01W064_num": {},
      "003/ASTGTMV003_N01E027_dem": {},
      "003/ASTGTMV003_N01E027_num": {},
      "003/ASTGTMV003_N00E006_dem": {},
      "003/ASTGTMV003_N00E006_num": {}
    }
  }
] 
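Requesting pages one at a time can be automated by generating the ?page=n URLs and looping over them. The sketch below uses two hypothetical helpers: page_urls() builds the URLs (assuming ?page=1 returns the same first page as the bare items URL), and collect_item_ids() fetches each page and gathers the Item IDs. The fetch argument defaults to the same httr and jsonlite calls used above and can be swapped out, for example for a fake function in a test. With 22912 matched Items and 10 per page, walking the whole collection would take 2292 requests, so fetch only as many pages as you need.

```r
# Hypothetical helper: URLs for pages 1..n_pages using the ?page=n parameter
page_urls <- function(items_url, n_pages) {
  paste0(items_url, "?page=", seq_len(n_pages))
}

# Hypothetical helper: fetch each page and return all Item IDs as one vector.
# `fetch` takes a URL and returns a parsed page with a $features$id column;
# the default repeats the httr + jsonlite calls used above.
collect_item_ids <- function(items_url, n_pages,
                             fetch = function(u) {
                               jsonlite::fromJSON(httr::content(httr::GET(u), as = "text"))
                             }) {
  ids <- lapply(page_urls(items_url, n_pages), function(u) fetch(u)$features$id)
  unlist(ids)
}

# Number of pages needed to see every matched Item at 10 Items per page
ceiling(22912 / 10)
```

In a live session, collect_item_ids(items_url, n_pages = 3) should return the IDs from the first three pages (30 Items).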

2e. Assets

The STAC Item ID (CMR Granule ID) is the unique identifier assigned to each granule within a data collection. Each STAC Item contains assets, which include downloadable and streamable URLs to data files along with other asset objects, such as browse images and metadata. Below, the assets of the first Item (ASTGTMV003_N03E008) are extracted to get its downloadable data files.

items_df <- jsonlite::fromJSON(F1)                                     # Parse the first Item back into a data frame
assets <- purrr::map_df(items_df$assets, data.frame, .id = 'asset')   # One row per asset key
assets
                        asset
1  003/ASTGTMV003_N03E008_dem
2  003/ASTGTMV003_N03E008_num
3                      browse
4                    metadata
5  003/ASTGTMV003_N02E022_dem
6  003/ASTGTMV003_N02E022_num
7  003/ASTGTMV003_N00W065_dem
8  003/ASTGTMV003_N00W065_num
9  003/ASTGTMV003_N01E009_dem
10 003/ASTGTMV003_N01E009_num
11 003/ASTGTMV003_N02E009_dem
12 003/ASTGTMV003_N02E009_num
13 003/ASTGTMV003_N03E021_dem
14 003/ASTGTMV003_N03E021_num
15 003/ASTGTMV003_N01E021_dem
16 003/ASTGTMV003_N01E021_num
17 003/ASTGTMV003_N01E042_dem
18 003/ASTGTMV003_N01E042_num
19 003/ASTGTMV003_N01W069_dem
20 003/ASTGTMV003_N01W069_num
21 003/ASTGTMV003_N01W080_dem
22 003/ASTGTMV003_N01W080_num
                                                                                                  href
1  https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ASTGTM.003/ASTGTMV003_N03E008_dem.tif
2  https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ASTGTM.003/ASTGTMV003_N03E008_num.tif
3       https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-public/ASTGTM.003/ASTGTMV003_N03E008.1.jpg
4                               https://cmr.earthdata.nasa.gov/search/concepts/G1716133754-LPCLOUD.xml
5                                                                                                 <NA>
6                                                                                                 <NA>
7                                                                                                 <NA>
8                                                                                                 <NA>
9                                                                                                 <NA>
10                                                                                                <NA>
11                                                                                                <NA>
12                                                                                                <NA>
13                                                                                                <NA>
14                                                                                                <NA>
15                                                                                                <NA>
16                                                                                                <NA>
17                                                                                                <NA>
18                                                                                                <NA>
19                                                                                                <NA>
20                                                                                                <NA>
21                                                                                                <NA>
22                                                                                                <NA>
                                 title            type
1  Download ASTGTMV003_N03E008_dem.tif            <NA>
2  Download ASTGTMV003_N03E008_num.tif            <NA>
3    Download ASTGTMV003_N03E008.1.jpg      image/jpeg
4                                 <NA> application/xml
5                                 <NA>            <NA>
6                                 <NA>            <NA>
7                                 <NA>            <NA>
8                                 <NA>            <NA>
9                                 <NA>            <NA>
10                                <NA>            <NA>
11                                <NA>            <NA>
12                                <NA>            <NA>
13                                <NA>            <NA>
14                                <NA>            <NA>
15                                <NA>            <NA>
16                                <NA>            <NA>
17                                <NA>            <NA>
18                                <NA>            <NA>
19                                <NA>            <NA>
20                                <NA>            <NA>
21                                <NA>            <NA>
22                                <NA>            <NA>

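The links found in the href field can be used to download each asset. Only the first four rows belong to this Item: because jsonlite::fromJSON() simplified all 10 Items on the page into a single data frame, the assets column holds a key for every asset on the page, so the keys of the other nine Items appear empty here (as {} in the JSON above) and have NA in href.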
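Rows with NA in href have nothing to download, so it helps to drop them before building a download list. The hypothetical helper below keeps rows with a non-missing href and, by default, only the GeoTIFF (.tif) data files; the pattern argument is an assumption you can change, e.g. to "\\.jpg$" for the browse image. The example table mirrors the first rows printed above. Note that the lp-prod-protected URLs require NASA Earthdata Login credentials to download, which is not covered here.

```r
# Hypothetical helper: keep assets that have a URL and whose href matches
# `pattern` (by default, GeoTIFF data files ending in .tif).
select_download_assets <- function(assets, pattern = "\\.tif$") {
  has_url <- assets[!is.na(assets$href), , drop = FALSE]
  has_url[grepl(pattern, has_url$href), , drop = FALSE]
}

# Small table mirroring the first rows of the assets data frame above
example_assets <- data.frame(
  asset = c("003/ASTGTMV003_N03E008_dem", "003/ASTGTMV003_N03E008_num",
            "browse", "003/ASTGTMV003_N02E022_dem"),
  href  = c("https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ASTGTM.003/ASTGTMV003_N03E008_dem.tif",
            "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/ASTGTM.003/ASTGTMV003_N03E008_num.tif",
            "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-public/ASTGTM.003/ASTGTMV003_N03E008.1.jpg",
            NA),
  stringsAsFactors = FALSE
)

tif_assets <- select_download_assets(example_assets)
basename(tif_assets$href)   # local file names for the two GeoTIFFs
```

Applied to the assets data frame above, select_download_assets(assets) should return the dem and num GeoTIFFs of ASTGTMV003_N03E008.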