# Suppress warnings
import warnings
'ignore')
warnings.simplefilter('ignore')
warnings.filterwarnings(from pprint import pprint
# Example 1 imports
import earthaccess
import xarray as xr
=False)
xr.set_options(display_expand_attrsimport matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
# Example 2 imports (Example 1 imports plus these...)
import datetime as dt
import json
from cartopy.mpl.ticker import LatitudeFormatter, LongitudeFormatter
import geopandas as gpd
import geoviews as gv
'bokeh', 'matplotlib', logo=False)
gv.extension(from harmony import Client, Collection, Request, CapabilitiesRequest
# Appendix 1 imports
from pathlib import Path
import rioxarray as rxr
%matplotlib inline
Part 2 of the In-cloud Science Workflow: Data subsetting and plotting with earthaccess, xarray, and harmony
Summary
Welcome to Part 2 of the In-cloud Science Workflow workshop.
In these examples we will use the xarray, earthaccess, and harmony-py libraries to subset data and make figures using cartopy
, matplotlib
, and geoviews
.
We will go through two examples of subsetting and plotting data in the Earthdata Cloud:
- Example 1 -
earthaccess
andxarray
for precipitation estimates from IMERG, Daily Level 3 data - Example 2 -
harmony-py
for direct cloud subsetting of precipitable water data from the DSCOVR EPIC Composite. - Appendix 1 - Snow cover data from MODIS/Terra, Daily Level 3 data with
rioxarray
- Appendix 2 - Snow mass data from SMAP, 3-hourly Level 4 data
In the first example, we will be accessing data directly from Amazon Web Services (AWS), specifically in the us-west-2 region, which is where all cloud-hosted NASA Earthdata reside. This shared compute environment (JupyterHub) is also running in the same location. We will then load the data into Python as an xarray
dataset and use xarray
to subset.
In the second example, we will demonstrate an example of pulling data via the cloud from an existing on-premise data server. In this example, the data are subsetted using one of the data transformation services provided in the NASA Earthdata system. Both xarray
and harmony-py
can be run outside of AWS as well.
See the bottom of the notebook for additional resources, including several tutorials that that served as a foundation for this clinic. Includes: https://github.com/rupesh2/atmospheric_rivers/tree/main
Note: “direct cloud access” is also called “direct S3 access” or simply “direct access”.
Learning Objectives
- Extract variables, temporal slices, and spatial slices from an
xarray
dataset - Plot data and exclude data points via boolean conditions, using
xarray
,cartopy
,matplotlib
, andrasterio
- Plot a polygon geojson file with a basemap using
geoviews
- Conceptualize data subsetting services provided by NASA Earthdata, including Harmony
- Utilize the
harmony-py
library to request data over the Bay of San Francisco
Import Required Packages
Picking up where we left off
We will authenticate our Earthaccess session, and then open the results like we did in the Search & Discovery section.
= earthaccess.login()
auth # are we authenticated?
if not auth.authenticated:
# ask for credentials and persist them in a .netrc file
="interactive", persist=True) auth.login(strategy
EARTHDATA_USERNAME and EARTHDATA_PASSWORD are not set in the current environment, try setting them or use a different strategy (netrc, interactive)
You're now authenticated with NASA Earthdata Login
Using token with expiration date: 01/26/2024
Using .netrc file for EDL
Example 1 - Xarray Subsetting - Precipitation estimates from IMERG, Daily Level 3
Dataset
We will use the GPM IMERG Final Precipitation L3 Daily dataset for this tutorial. The IMERG Precipitation Rate provides the rain and snow rates in millimeters per hour (mm/hr). It is estimated by the Integrated Multi-satellitE Retrievals for Global Precipitation Measurement (GPM) (IMERG) algorithm. The IMERG algorithm uses passive-microwave data from the GPM constellation of satellites and infrared data from geosynchronous satellites. IMERG “morphs” observations to earlier or later times using wind from weather-model analyses. The daily IMERG dataset is derived from the half-hourly GPM_3IMERGHH. The derived result represents the final estimate of the daily mean precipitation rate in mm/day.
The IMERG data has 0.1 x 0.1 degree latitude-longitude resolution (approximately 11 by 11 km at the Equator). The grid covers the globe, although precipitation cannot always be estimated near the Poles. The dataset and algorithm are described in the data user guide and the Algorithm Theoretical Basis Document (ATBD).
Please cite the dataset as: > Huffman, G.J., E.F. Stocker, D.T. Bolvin, E.J. Nelkin, Jackson Tan (2023), GPM IMERG Final Precipitation L3 1 day 0.1 degree x 0.1 degree V07, Edited by Andrey Savtchenko, Greenbelt, MD, Goddard Earth Sciences Data and Information Services Center (GES DISC), https://doi.org/10.5067/GPM/IMERGDF/DAY/07
= 'C2723754864-GES_DISC' # GPM IMERG Final Precipitation L3 1 day 0.1 degree x 0.1 degree V07 (GPM_3IMERGDF)
collection_id
# Bounds within which we search for data granules
= "2023-02-24"
date_start = "2023-02-26"
date_end = (date_start, date_end)
date_range = (-127.0761, 31.6444, -113.9039, 42.6310) # min lon, min lat, max lon, max lat
bbox
# For reference (e.g., to visualize in https://geojson.io/), here is a GeoJSON representing the above bounding box:
# {"type": "FeatureCollection", "features": [{"type": "Feature", "properties": {}, "geometry": {"type": "LineString", "bbox": [-127.0761, 31.6444, -113.9039, 42.631], "coordinates": [[-113.9039, 42.631], [-127.0761,42.631], [-127.0761, 31.6444], [-113.9039, 31.6444], [-113.9039, 42.631]]}}]}
= earthaccess.search_data(
results = collection_id,
concept_id = True,
cloud_hosted = date_range,
temporal = bbox,
bounding_box
)
= xr.open_mfdataset(earthaccess.open(results)) ds
Granules found: 3
Opening 3 granules, approx size: 0.08 GB
Note that xarray
works with “lazy” computation whenever possible. In this case, the metadata are loaded into JupyterHub memory, but the data arrays and their values are not — until there is a need for them.
Let’s print out all the variable names.
for v in ds.variables:
print(v)
precipitation
precipitation_cnt
precipitation_cnt_cond
MWprecipitation
MWprecipitation_cnt
MWprecipitation_cnt_cond
randomError
randomError_cnt
probabilityLiquidPrecipitation
lon
lat
time
time_bnds
Of the variables listed above, we are interested in three variables: precipitation
, precipitation_cnt_cond
, and probabilityLiquidPrecipitation
. Let’s print their attributes.
'precipitation'].attrs ds.variables[
{'units': 'mm/day',
'long_name': 'Daily mean precipitation rate (combined microwave-IR) estimate. Formerly precipitationCal.'}
'precipitation_cnt_cond'].attrs ds.variables[
{'units': 'count',
'long_name': 'Count of half-hourly precipitation retrievals for the day where precipitation is at least 0.01 mm/hr'}
'probabilityLiquidPrecipitation'].attrs ds.variables[
{'units': 'percent',
'long_name': 'Probability of liquid precipitation',
'description': 'Probability of liquid precipitation estimated with a diagnostic parameterization using ancillary data. 0=missing values; 1=likely solid; 100=likely liquid or no precipitation. Screen by positive precipitation or precipitation_cnt_cond to locate meaningful probabilities.'}
Subsetting
In addition to directly accessing the files archived and distributed by each of the NASA DAACs, many datasets also support services that allow us to customize the data via subsetting, reformatting, reprojection/regridding, and file aggregation. What does subsetting mean? To subset means to extract only the portions of a dataset that are needed for a given purpose. Here’s a generalized graphic of what we mean.
There are three primary types of subsetting that we will walk through: 1. Temporal 2. Spatial 3. Variable
In each case, we will be excluding parts of the dataset that are not wanted using xarray
. Note that “subsetting” is also called a data “transformation”.
ds.time.values
array(['2023-02-24T00:00:00.000000000', '2023-02-25T00:00:00.000000000',
'2023-02-26T00:00:00.000000000'], dtype='datetime64[ns]')
We start with a subset that represents the U.S. state of California. Notice the dimensions of the Dataset and each variable — time, lon, lat, and ‘nv’ (number of vertices) for the bounds variable.
# Display the full dataset's metadata
ds
<xarray.Dataset> Dimensions: (time: 3, lon: 3600, lat: 1800, nv: 2) Coordinates: * lon (lon) float32 -179.9 -179.9 ... 179.9 179.9 * lat (lat) float64 -89.95 -89.85 ... 89.85 89.95 * time (time) datetime64[ns] 2023-02-24 ... 2023... Dimensions without coordinates: nv Data variables: precipitation (time, lon, lat) float32 dask.array<chunksize=(1, 3600, 1800), meta=np.ndarray> precipitation_cnt (time, lon, lat) int8 dask.array<chunksize=(1, 3600, 1800), meta=np.ndarray> precipitation_cnt_cond (time, lon, lat) int8 dask.array<chunksize=(1, 3600, 1800), meta=np.ndarray> MWprecipitation (time, lon, lat) float32 dask.array<chunksize=(1, 3600, 1800), meta=np.ndarray> MWprecipitation_cnt (time, lon, lat) int8 dask.array<chunksize=(1, 3600, 1800), meta=np.ndarray> MWprecipitation_cnt_cond (time, lon, lat) int8 dask.array<chunksize=(1, 3600, 1800), meta=np.ndarray> randomError (time, lon, lat) float32 dask.array<chunksize=(1, 3600, 1800), meta=np.ndarray> randomError_cnt (time, lon, lat) int8 dask.array<chunksize=(1, 3600, 1800), meta=np.ndarray> probabilityLiquidPrecipitation (time, lon, lat) int8 dask.array<chunksize=(1, 3600, 1800), meta=np.ndarray> time_bnds (time, nv) datetime64[ns] dask.array<chunksize=(1, 2), meta=np.ndarray> Attributes: (9)
- lonPandasIndex
PandasIndex(Float64Index([ -179.9499969482422, -179.85000610351562, -179.75, -179.64999389648438, -179.5500030517578, -179.4499969482422, -179.35000610351562, -179.25, -179.14999389648438, -179.0500030517578, ... 179.0500030517578, 179.14999389648438, 179.25, 179.35000610351562, 179.4499969482422, 179.5500030517578, 179.64999389648438, 179.75, 179.85000610351562, 179.9499969482422], dtype='float64', name='lon', length=3600))
- latPandasIndex
PandasIndex(Float64Index([ -89.95, -89.85000000000001, -89.75, -89.65, -89.55, -89.45, -89.35000000000001, -89.25, -89.15, -89.05, ... 89.05, 89.15000000000002, 89.25000000000001, 89.35000000000001, 89.45, 89.55, 89.65000000000002, 89.75000000000001, 89.85000000000001, 89.95], dtype='float64', name='lat', length=1800))
- timePandasIndex
PandasIndex(DatetimeIndex(['2023-02-24', '2023-02-25', '2023-02-26'], dtype='datetime64[ns]', name='time', freq=None))
- BeginDate :
- 2023-02-24
- BeginTime :
- 00:00:00.000Z
- EndDate :
- 2023-02-24
- EndTime :
- 23:59:59.999Z
- FileHeader :
- StartGranuleDateTime=2023-02-24T00:00:00.000Z; StopGranuleDateTime=2023-02-24T23:59:59.999Z
- InputPointer :
- 3B-HHR.MS.MRG.3IMERG.20230224-S000000-E002959.0000.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S003000-E005959.0030.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S010000-E012959.0060.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S013000-E015959.0090.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S020000-E022959.0120.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S023000-E025959.0150.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S030000-E032959.0180.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S033000-E035959.0210.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S040000-E042959.0240.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S043000-E045959.0270.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S050000-E052959.0300.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S053000-E055959.0330.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S060000-E062959.0360.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S063000-E065959.0390.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S070000-E072959.0420.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S073000-E075959.0450.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S080000-E082959.0480.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S083000-E085959.0510.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S090000-E092959.0540.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S093000-E095959.0570.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S100000-E102959.0600.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S103000-E105959.0630.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S110000-E112959.0660.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S113000-E115959.0690.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S120000-E122959.0720.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S123000-E125959.0750.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S130000-E132959.0780.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S133000-E135959.0810.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S140000-E142959.0840.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S143000-E145959.0870.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S150000-E152959.0900.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S153000-E155959.0930.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S160000-E162959.0960.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S163000-E165959.0990.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S170000-E172959.1020.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S173000-E175959.1050.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S180000-E182959.1080.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S183000-E185959.1110.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S190000-E192959.1140.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S193000-E195959.1170.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S200000-E202959.1200.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S203000-E205959.1230.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S210000-E212959.1260.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S213000-E215959.1290.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S220000-E222959.1320.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S223000-E225959.1350.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S230000-E232959.1380.V07A.HDF5;3B-HHR.MS.MRG.3IMERG.20230224-S233000-E235959.1410.V07A.HDF5
- title :
- GPM IMERG Final Precipitation L3 1 day 0.1 degree x 0.1 degree (GPM_3IMERGDF)
- DOI :
- 10.5067/GPM/IMERGDF/DAY/07
- ProductionTime :
- 2023-08-29T08:52:09.517Z