NASA Earthdata Cloud Clinic

Summary

Welcome to the NASA Earthdata Cloud Clinic.

We will walk through two options for direct cloud access and subsetting available in the Earthdata Cloud:

  1. The earthaccess Python library for data search and direct cloud access, followed by xarray subsetting
  2. The harmony-py Python library for direct cloud access and data subsetting

In both scenarios, we will access data directly from Amazon Web Services (AWS), specifically from the us-west-2 region, which is where all cloud-hosted NASA Earthdata reside. This shared compute environment (JupyterHub) runs in the same region. We will then load the data into Python as an xarray dataset.
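As a rough preview of the first option, the sketch below assembles the search parameters and then searches, opens, and loads one granule. It uses the `earthaccess.login`, `earthaccess.search_data`, and `earthaccess.open` calls together with `xarray.open_dataset`. The short name `EXAMPLE_SHORT_NAME`, the dates, the approximate Gulf of Mexico bounding box, and both helper function names are placeholders chosen for illustration, not values taken from this clinic.

```python
def build_search_params(short_name, start, end, bbox):
    """Assemble keyword arguments for earthaccess.search_data.

    bbox is (west, south, east, north) in decimal degrees.
    """
    west, south, east, north = bbox
    return {
        "short_name": short_name,
        "temporal": (start, end),
        "bounding_box": (west, south, east, north),
    }


def open_first_granule(params):
    """Search, open the first matching granule from S3, and load it with xarray.

    Needs Earthdata Login credentials; direct S3 access only works when the
    code runs in the us-west-2 region.
    """
    # Imported here so the parameter helper above can be used without them.
    import earthaccess
    import xarray as xr

    earthaccess.login()
    results = earthaccess.search_data(**params)
    files = earthaccess.open(results[:1])  # list of file-like objects
    return xr.open_dataset(files[0])


# Placeholder values: hypothetical short name, illustrative dates, and an
# approximate bounding box around the Gulf of Mexico.
params = build_search_params(
    "EXAMPLE_SHORT_NAME",
    "2021-07-01",
    "2021-07-02",
    (-98.0, 18.0, -80.0, 31.0),
)
print(params)
```

Calling `open_first_granule(params)` with a real collection short name would return an xarray dataset, assuming the first granule is in a format xarray can read.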

See the bottom of the notebook for additional resources, including several tutorials that served as a foundation for this clinic.

A note on subsetting

In addition to directly accessing the files archived and distributed by each of the NASA DAACs, many datasets also support services that allow us to customize the data via subsetting, reformatting, reprojection/regridding, and file aggregation. What does subsetting mean? The graphic below gives a generalized picture.

Three maps of the United States are shown, each with a red bounding box over the state of Colorado. Filtering and subsetting are demonstrated by overlaying SMAP L2 data: filtering keeps the data that overlap the rectangle, while subsetting crops the data to the rectangle.

Note: “direct cloud access” is also called “direct S3 access” or simply “direct access”, and “subsetting” is also called “transformation”.
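To make the difference between filtering and subsetting concrete, here is a minimal pure-Python sketch. The `Granule` class, the swath names, the point values, and the approximate Colorado bounding box are all made up for illustration; real subsetting services work on gridded or swath files rather than lists of points.

```python
from dataclasses import dataclass


@dataclass
class Granule:
    name: str
    points: list  # each point is (lon, lat, value)


def in_box(lon, lat, box):
    """Return True if the point falls inside (west, south, east, north)."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def filter_granules(granules, box):
    """Filtering: keep every whole granule that has any point in the box."""
    return [
        g for g in granules
        if any(in_box(lon, lat, box) for lon, lat, _ in g.points)
    ]


def subset_granule(granule, box):
    """Subsetting: keep only the points of a granule that fall in the box."""
    kept = [p for p in granule.points if in_box(p[0], p[1], box)]
    return Granule(granule.name, kept)


# Approximate bounding box for Colorado (illustrative values).
colorado = (-109.05, 37.0, -102.05, 41.0)

granules = [
    Granule("swath_a", [(-112.0, 39.0, 0.21), (-105.5, 39.5, 0.18), (-100.0, 40.0, 0.25)]),
    Granule("swath_b", [(-95.0, 30.0, 0.30), (-90.0, 32.0, 0.33)]),
]

filtered = filter_granules(granules, colorado)
subsetted = [subset_granule(g, colorado) for g in filtered]

print([g.name for g in filtered])
print([g.points for g in subsetted])
```

Filtering returns `swath_a` with all three of its points, including the two outside the box, whereas subsetting reduces it to the single point inside the box.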

Learning Objectives

  1. Utilize the earthaccess Python library to search for data using spatial and temporal filters and explore search results
  2. Perform in-region direct access of data from an Amazon Simple Storage Service (S3) bucket
  3. Extract variables and spatial slices from an xarray dataset
  4. Plot data using xarray
  5. Conceptualize data subsetting services provided by NASA Earthdata, including Harmony
  6. Plot a polygon GeoJSON file with a basemap using geoviews
  7. Utilize the harmony-py library to request data over the Gulf of Mexico

Prerequisites

First, we’ll import the Python packages and set up the authentication that will be used for both of our access and subsetting methods.

You’ll also need to be aware that data in NASA’s Earthdata Cloud reside in Amazon Web Services (AWS) Simple Storage Service (S3) buckets. Access is provided via temporary credentials; this access is limited to requests made within the US West (Oregon) (code: us-west-2) AWS region. While this compute location is required for direct S3 access, all data in Earthdata Cloud are still freely available via download.
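The sketch below shows one way temporary credentials could be handed to `s3fs`, whose `S3FileSystem` accepts `key`, `secret`, and `token` arguments. The credential field names (`accessKeyId`, `secretAccessKey`, `sessionToken`, `expiration`), the expiration timestamp format, and the placeholder values are assumptions made for illustration; check the response from the credentials endpoint of the DAAC you are working with. Libraries such as earthaccess can handle this step for us.

```python
from datetime import datetime, timezone


def s3fs_kwargs(creds):
    """Map a temporary-credentials dictionary to s3fs.S3FileSystem arguments.

    The input field names are assumed, not guaranteed.
    """
    return {
        "key": creds["accessKeyId"],
        "secret": creds["secretAccessKey"],
        "token": creds["sessionToken"],
    }


def is_expired(creds, now=None):
    """Return True once the credentials' expiration time has passed."""
    now = now or datetime.now(timezone.utc)
    expires = datetime.fromisoformat(creds["expiration"])
    return now >= expires


# Placeholder credentials, not real values.
example_creds = {
    "accessKeyId": "PLACEHOLDER_KEY",
    "secretAccessKey": "PLACEHOLDER_SECRET",
    "sessionToken": "PLACEHOLDER_TOKEN",
    "expiration": "2024-01-01 01:00:00+00:00",
}

kwargs = s3fs_kwargs(example_creds)
print(sorted(kwargs))
```

With real credentials, and when running in us-west-2, `s3fs.S3FileSystem(**kwargs)` would give a file system object for reading from the bucket. Because the credentials are temporary, a long-running session needs to request new ones after they expire.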

Import Required Packages

# Suppress warnings (a single 'ignore' filter covers all categories)
import warnings
warnings.filterwarnings('ignore')

# Direct access
import earthaccess 
from pprint import pprint
import xarray as xr

# Harmony
import geopandas as gpd
import geoviews as gv
gv.extension('bokeh', 'matplotlib')
from harmony import BBox, Client, Collection, Request, LinkType
import datetime as dt
import s3fs
%matplotlib inline