%matplotlib inline
import matplotlib.pyplot as plt
import os
import requests
import s3fs
from osgeo import gdal
import xarray as xr
import hvplot.xarray
import holoviews as hv
Accessing a NetCDF4/HDF5 File - S3 Direct Access
Summary
In this notebook, we will access monthly sea surface height from ECCO V4r4 (10.5067/ECG5D-SSH44). The data are provided as a time series of monthly netCDFs on a 0.5-degree latitude/longitude grid.
We will access a single netCDF file from inside the AWS cloud (specifically, the us-west-2 region) and load it into Python as an xarray dataset. This approach leverages S3 native protocols for efficient access to the data.
Requirements
1. AWS instance running in us-west-2
NASA Earthdata Cloud data in S3 can be directly accessed via temporary credentials; this access is limited to requests made within the US West (Oregon) (code: us-west-2) AWS region.
2. Earthdata Login
An Earthdata Login account is required to access data, and to discover restricted data, from the NASA Earthdata system. Please visit https://urs.earthdata.nasa.gov to register and manage your Earthdata Login account. The account is free to create and takes only a moment to set up.
3. netrc File
You will need a netrc file containing your NASA Earthdata Login credentials in order to execute the notebooks. A netrc file can be created manually in a text editor and saved to your home directory. For additional information see: Authentication for NASA Earthdata.
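As a minimal sketch of what such a file contains, the snippet below writes a single entry for urs.earthdata.nasa.gov and reads it back with Python's standard netrc module. The helper name write_netrc_entry, the placeholder username and password, and the temporary file location are assumptions for illustration only; a real file would normally be saved as .netrc in your home directory with permissions restricted to the owner.

```python
import netrc
import os
import tempfile

EDL_HOST = "urs.earthdata.nasa.gov"  # Earthdata Login host named above


def write_netrc_entry(path, login, password, host=EDL_HOST):
    """Hypothetical helper: write a one-machine netrc file at `path`."""
    with open(path, "w") as f:
        f.write(f"machine {host} login {login} password {password}\n")
    # Restrict the file to its owner
    os.chmod(path, 0o600)


# Demonstrate with a throwaway file and placeholder credentials
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, ".netrc")
write_netrc_entry(demo_path, "my_username", "my_password")

# Read the entry back to confirm the file parses
login, _, password = netrc.netrc(demo_path).authenticators(EDL_HOST)
print(login)
```

Reading the file back with the netrc module is a quick way to confirm that the entry parses before running the rest of the notebook.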
Learning Objectives
- how to retrieve temporary S3 credentials for in-region direct S3 bucket access
- how to perform in-region direct access of ECCO_L4_SSH_05DEG_MONTHLY_V4R4 data in S3
- how to plot the data
Import Packages
Get Temporary AWS Credentials
Direct S3 access is achieved by passing NASA-supplied temporary credentials to AWS so we can interact with S3 objects from applicable Earthdata Cloud buckets. For now, each NASA DAAC has a different AWS credentials endpoint. Below are the credentials endpoints for several DAACs:
s3_cred_endpoint = {
    'podaac': 'https://archive.podaac.earthdata.nasa.gov/s3credentials',
    'gesdisc': 'https://data.gesdisc.earthdata.nasa.gov/s3credentials',
    'lpdaac': 'https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials',
    'ornldaac': 'https://data.ornldaac.earthdata.nasa.gov/s3credentials',
    'ghrcdaac': 'https://data.ghrc.earthdata.nasa.gov/s3credentials'
}
Create a function that requests temporary credentials from an endpoint. Remember, each DAAC has its own endpoint, and credentials from one DAAC are not usable for cloud data from other DAACs.
def get_temp_creds(provider):
    # Request temporary S3 credentials from the given DAAC's endpoint
    return requests.get(s3_cred_endpoint[provider]).json()

temp_creds_req = get_temp_creds('podaac')
#temp_creds_req
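Because these credentials are temporary, a long-running session may need to request new ones. The sketch below shows one way to check for that. It assumes the credentials response includes an 'expiration' field holding an ISO-like timestamp with a UTC offset; that field name and format, the helper name creds_expired, and the sample values are assumptions for illustration, so inspect the actual response before relying on them.

```python
from datetime import datetime, timedelta, timezone


def creds_expired(creds, now=None, margin=timedelta(minutes=5)):
    """Return True if the credentials expire within `margin` of `now`.

    Assumes creds['expiration'] looks like '2022-03-11 22:18:21+00:00'.
    """
    now = now or datetime.now(timezone.utc)
    expires = datetime.fromisoformat(creds["expiration"])
    return expires - now <= margin


# Made-up response fragment, for illustration only
sample = {"expiration": "2022-03-11 22:18:21+00:00"}

early = datetime(2022, 3, 11, 21, 30, tzinfo=timezone.utc)
late = datetime(2022, 3, 11, 22, 15, tzinfo=timezone.utc)
print(creds_expired(sample, now=early))
print(creds_expired(sample, now=late))
```

When the check returns True, calling get_temp_creds again and rebuilding the file system object would be the natural next step.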
Set up an s3fs session for Direct Access
s3fs sessions are used for authenticated access to S3 buckets and allow for typical file-system style operations. Below we create a session by passing in the temporary credentials we received from our temporary credentials endpoint.
fs_s3 = s3fs.S3FileSystem(anon=False,
                          key=temp_creds_req['accessKeyId'],
                          secret=temp_creds_req['secretAccessKey'],
                          token=temp_creds_req['sessionToken'])
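The mapping from the credential fields to the S3FileSystem arguments is easy to get wrong, so it can help to isolate it. The following is a sketch only: the helper name s3fs_kwargs_from_creds and the fake credential values are hypothetical, and the helper does nothing beyond renaming the three fields used above.

```python
def s3fs_kwargs_from_creds(creds):
    """Hypothetical helper: map credential fields to S3FileSystem keyword arguments."""
    return {
        "anon": False,                       # authenticated, not anonymous, access
        "key": creds["accessKeyId"],         # AWS access key ID
        "secret": creds["secretAccessKey"],  # AWS secret access key
        "token": creds["sessionToken"],      # session token for temporary credentials
    }


# Fake values, for illustration only
fake_creds = {
    "accessKeyId": "FAKE_KEY_ID",
    "secretAccessKey": "FAKE_SECRET",
    "sessionToken": "FAKE_TOKEN",
}
kwargs = s3fs_kwargs_from_creds(fake_creds)
print(sorted(kwargs))

# With real credentials this would be used as:
# fs_s3 = s3fs.S3FileSystem(**s3fs_kwargs_from_creds(temp_creds_req))
```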
In this example we’re interested in the ECCO data collection from NASA’s PO.DAAC in Earthdata Cloud. Below we specify the s3 URL to the data asset in Earthdata Cloud. This URL can be found via Earthdata Search or programmatically through the CMR and CMR-STAC APIs.
s3_url = 's3://podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-01_ECCO_V4r4_latlon_0p50deg.nc'
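To make the structure of this URL explicit, the sketch below splits it into bucket, collection prefix, and file name using only the standard library. The year-month pattern in the file name is inferred from this one example and is an assumption, not a documented naming rule.

```python
import re
from urllib.parse import urlparse

s3_url = 's3://podaac-ops-cumulus-protected/ECCO_L4_SSH_05DEG_MONTHLY_V4R4/SEA_SURFACE_HEIGHT_mon_mean_2015-01_ECCO_V4r4_latlon_0p50deg.nc'

parsed = urlparse(s3_url)
bucket = parsed.netloc                    # S3 bucket name
object_key = parsed.path.lstrip('/')      # object key within the bucket
collection, filename = object_key.split('/', 1)

# Assumed pattern: the month appears as _YYYY-MM_ in the file name
month = re.search(r'_(\d{4}-\d{2})_', filename).group(1)

print(bucket)
print(collection)
print(month)
```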
Direct In-region Access
Open the netCDF file using the s3fs package, then load the cloud asset into an xarray dataset.
s3_file_obj = fs_s3.open(s3_url, mode='rb')
ssh_ds = xr.open_dataset(s3_file_obj, engine='h5netcdf')
ssh_ds
Get the SSH variable as an xarray dataarray
ssh_da = ssh_ds.SSH
ssh_da
Plot the SSH dataarray for time 2015-01-16T12:00:00 using hvplot.
ssh_da.hvplot.image(x='longitude', y='latitude', cmap='Spectral_r', aspect='equal').opts(clim=(ssh_da.attrs['valid_min'][0], ssh_da.attrs['valid_max'][0]))
Resources
Direct access to ECCO data in S3 (from us-west-2)
Data_Access__Direct_S3_Access__PODAAC_ECCO_SSH using CMR-STAC API to retrieve S3 links