How to Perform Cross-DAAC S3 Bucket Access Using Python

Author: Chris Battisto

Date Authored: 10-02-22


Exercise: 20 minutes

Note: Because this notebook uses the S3 protocol, it will only run in an environment with us-west-2 AWS access.


This notebook demonstrates how to access two cloud-hosted Earthdata granules using the Common Metadata Repository (CMR) API. The granules come from two different DAACs (GES DISC and PO.DAAC). It shows how to obtain and plot two variables from two distinct granules hosted in S3 buckets: sea surface temperature (SST) from the GHRSST Level 4 MUR Global Foundation Sea Surface Temperature Analysis v4.1 (GHRSST), and calculated 2-meter (2M) wind velocity from the MERRA-2 hourly time-averaged reanalysis dataset (M2T1NXSLV.5.12.4). The two variables are then plotted together over the Caribbean Sea on 25 October 2012, when Hurricane Sandy was at peak strength.


This notebook was written using Python 3.8, and requires these libraries and files:

- netrc file with valid Earthdata Login credentials (see How to Generate Earthdata Prerequisite Files)
- Xarray
- requests
- S3FS
- Boto3
- NumPy
- Matplotlib
- Cartopy
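For reference, a netrc file for Earthdata Login follows the standard netrc format, with your own username and password substituted in (the placeholders below are illustrative):

```
machine urs.earthdata.nasa.gov
    login your_username
    password your_password
```

On Linux and macOS the file is typically stored at `~/.netrc` with permissions restricted to the owner (e.g. `chmod 600 ~/.netrc`).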

Import Libraries

import requests
import xarray as xr
import s3fs
import numpy as np
import pprint
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import boto3
from IPython.display import display, Markdown

%matplotlib inline

Check AWS Region before running notebook

A common error when executing this notebook occurs when it is run outside of the us-west-2 AWS region. Here, we check the region using the Boto3 Python library, and raise a ValueError if you are outside the region.

This cell is not necessary for users already inside the us-west-2 region, and can be commented out or deleted at the user's discretion.

if boto3.client('s3').meta.region_name == 'us-west-2':
    display(Markdown('### us-west-2 Region Check: ✅'))
else:
    display(Markdown('### us-west-2 Region Check: ❌'))
    raise ValueError('Your notebook is not running inside the AWS us-west-2 region, and will not be able to directly access NASA Earthdata S3 buckets')

us-west-2 Region Check: ✅

Obtain S3 Credentials and Open Bucket Granules

Direct S3 access is granted through temporary credentials that last one hour; once they expire, the credential request must be rerun to regain access to the bucket. We pass these credentials into the S3FS S3FileSystem() constructor, which "mounts" the S3 bucket in our notebook as if it were a locally stored file system. Here, we define a function that queries an S3 credentials endpoint using the Python requests library, authenticating with the Earthdata Login credentials stored in your netrc file, and returns an s3fs.core.S3FileSystem object that represents the "mounted" S3 bucket:

# Define a function for S3 access credentials

def begin_s3_direct_access(daac_url):
    # Retrieve the temporary credentials as JSON
    response = requests.get(daac_url).json()
    # Mount the bucket and return it as an S3FileSystem object
    return s3fs.S3FileSystem(key=response['accessKeyId'],
                             secret=response['secretAccessKey'],
                             token=response['sessionToken'])
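Because the credentials lapse after one hour, it can be handy to check whether they are still valid before reusing them. A minimal sketch, assuming the JSON payload includes an ISO-formatted `expiration` field as the DAAC s3credentials endpoints return; `credentials_expired` is a hypothetical helper, not part of the notebook:

```python
from datetime import datetime, timezone

def credentials_expired(creds):
    """Return True if a temporary S3 credentials payload has lapsed."""
    expiry = datetime.fromisoformat(creds['expiration'])
    return datetime.now(timezone.utc) >= expiry

# Hypothetical payload illustrating the expected fields
sample = {
    'accessKeyId': 'ASIA...',
    'secretAccessKey': '...',
    'sessionToken': '...',
    'expiration': '2012-10-25 10:00:00+00:00',
}

credentials_expired(sample)  # True, since this timestamp is long past
```

If this returns True, call begin_s3_direct_access() again to fetch fresh credentials.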

We need two tokens in order to access the two different buckets, so we call the function twice with the two S3 credential endpoint URLs retrieved from CMR, stored in separate variables. If you immediately receive an error, double-check that your username and password were entered correctly in your netrc file, and that you can open each credential endpoint URL directly in your browser:

# S3 credential endpoints for each DAAC, retrieved from CMR
gesdisc_s3 = 'https://data.gesdisc.earthdata.nasa.gov/s3credentials'
podaac_s3 = 'https://archive.podaac.earthdata.nasa.gov/s3credentials'

# Open S3 file systems with S3FS
gesdisc_fs = begin_s3_direct_access(gesdisc_s3)
podaac_fs = begin_s3_direct_access(podaac_s3)

# Check that each file system is intact as an S3FileSystem object, which means the token is valid

type(gesdisc_fs), type(podaac_fs)

Open Granules in Xarray

In order to open the granules in Xarray, we must use the S3FS open() method. Here, we pass the S3 URL of the granule in each file system into xr.open_dataset() and store the results in two variables:

# Open datasets with S3FS
# (merra_s3_url and ghrsst_s3_url are the granules' S3 URLs, obtained from a CMR granule search)

merra_ds = xr.open_dataset(gesdisc_fs.open(merra_s3_url))
ghrsst_ds = xr.open_dataset(podaac_fs.open(ghrsst_s3_url))

Now, the granules are stored in memory as Xarray datasets, which will persist for as long as the kernel is running. Here, we clip both datasets to the region where the hurricane was located:

min_lon = -89
min_lat = 14
max_lon = -67
max_lat = 31

merra_ds = merra_ds.sel(lat=slice(min_lat,max_lat), lon=slice(min_lon,max_lon))
ghrsst_ds = ghrsst_ds.sel(lat=slice(min_lat,max_lat), lon=slice(min_lon,max_lon))

Convert Dataset Grids

Here, we interpolate the GHRSST grid to the MERRA grid using Xarray’s interp() function:

ghrsst_ds = ghrsst_ds.interp(lat=merra_ds.lat, lon=merra_ds.lon)
<xarray.Dataset>
Dimensions:           (time: 1, lat: 35, lon: 35)
  * time              (time) datetime64[ns] 2012-10-25T09:00:00
  * lat               (lat) float64 14.0 14.5 15.0 15.5 ... 29.5 30.0 30.5 31.0
  * lon               (lon) float64 -88.75 -88.12 -87.5 ... -68.75 -68.12 -67.5
Data variables:
    analysed_sst      (time, lat, lon) float64 nan nan nan ... 300.0 299.4 299.4
    analysis_error    (time, lat, lon) float64 nan nan nan ... 0.37 0.37 0.37
    mask              (time, lat, lon) float64 2.0 2.0 2.0 2.0 ... 1.0 1.0 1.0
    sea_ice_fraction  (time, lat, lon) float64 nan nan nan nan ... nan nan nan
Attributes: (12/47)
    Conventions:                CF-1.5
    title:                      Daily MUR SST, Final product
    summary:                    A merged, multi-sensor L4 Foundation SST anal...
    institution:                Jet Propulsion Laboratory
    history:                    created at nominal 4-day latency; replaced nr...
    ...                         ...
    project:                    NASA Making Earth Science Data Records for Us...
    publisher_name:             GHRSST Project Office
    processing_level:           L4
    cdm_data_type:              grid
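Note that interp() regrids the fine (roughly 0.01°) GHRSST field onto the much coarser MERRA-2 grid, so the SSTs are effectively downsampled by linear interpolation. The idea can be illustrated with a standalone 1-D sketch using synthetic data (the values and variable names below are illustrative, not from the granules):

```python
import numpy as np

# Synthetic 1-D analogue of regridding a fine grid onto a coarse one
fine_lat = np.arange(14.0, 31.0, 0.01)       # GHRSST-like 0.01-degree latitudes
fine_sst = 300.0 + 0.1 * (fine_lat - 14.0)   # synthetic SST ramp in Kelvin
coarse_lat = np.arange(14.0, 31.0, 0.5)      # MERRA-2-like 0.5-degree latitudes

# Linear interpolation onto the coarse grid, analogous to xarray's
# interp() with its default linear method
coarse_sst = np.interp(coarse_lat, fine_lat, fine_sst)
```

Interpolating the coarse MERRA-2 winds up to the fine GHRSST grid would also work, but would invent detail the reanalysis does not contain, so downsampling the SSTs is the safer choice here.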

Plot Variables using Matplotlib and Cartopy

Finally, we use Matplotlib and Cartopy to plot a color mesh of SSTs overlaid with a quiver plot of 2M winds as Hurricane Sandy reached peak strength:

# Pre-configure wind vector variables
# U2M and V2M are the MERRA-2 eastward and northward 2-meter wind components
u = merra_ds.U2M
v = merra_ds.V2M
lon = merra_ds.lon
lat = merra_ds.lat

lons, lats = np.meshgrid(lon, lat)

# Plotting routines:

# Figure size
plt.rcParams['figure.figsize'] = 15, 15

# Figure and geography setup
fig = plt.figure()
ax = fig.add_subplot(2, 1, 1, projection=ccrs.PlateCarree())
ax.set_extent([-89, -67, 14, 31], crs=ccrs.PlateCarree())
ax.add_feature(cfeature.COASTLINE.with_scale('50m'), linewidth=0.5, zorder=5) 
ax.add_feature(cfeature.LAND, facecolor='white', zorder=2) 
ax.add_feature(cfeature.BORDERS, linewidth=0.5, zorder=5)
ax.add_feature(cfeature.STATES, zorder=5)

# Colormesh of SSTs
mmp = ax.pcolormesh(lons, lats, ghrsst_ds.analysed_sst.isel(time=0), 
              cmap='hot_r', transform=ccrs.PlateCarree(), zorder=1)

# Quiver plot of 2M vector field
q = ax.quiver(lons, lats, u.isel(time=0).values, v.isel(time=0).values, zorder=4, 
              transform=ccrs.PlateCarree(), scale_units='inches', color='gray')

# Quiver key for scale
ax.quiverkey(q, 1.22, 1.05, 10, r'$10 \frac{m}{s}$', zorder=4)

# Lat/lon grid lines
ax.gridlines(draw_labels=True, dms=True, x_inline=False, y_inline=False)

# SST color bar setup
cbar = plt.colorbar(mmp, pad=0.1)
cbar.set_label("Analyzed SST (K)")

# Figure title
fig.suptitle("GHRSST Analyzed SST and MERRA-2 2M Wind Vectors on 2012-10-25T00:00:00Z", size=16, y=0.95)
Text(0.5, 0.95, 'GHRSST Analyzed SST and MERRA-2 2M Wind Vectors on 2012-10-25T00:00:00Z')
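To keep a copy of the figure outside the notebook, Matplotlib's savefig() can write it to disk. A minimal, self-contained sketch (the stand-in figure and filename are illustrative; in the notebook you would call savefig() on the SST/wind figure built above):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so this also runs outside a notebook
import matplotlib.pyplot as plt
from pathlib import Path

# Stand-in figure for demonstration purposes
fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])

# Write the figure to a PNG, trimming surrounding whitespace
fig.savefig('sandy_sst_winds.png', dpi=150, bbox_inches='tight')

png_written = Path('sandy_sst_winds.png').exists()
```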