How do I subset data granules?

How do I subset a data granule using Harmony?

Install the harmony-py package:

# Install harmony-py
pip install -U harmony-py

Import packages:

import datetime as dt

from harmony import BBox, Client, Collection, Request, LinkType

import s3fs
import xarray as xr

Set up Harmony client and authentication

We will authenticate the following Harmony request using a netrc file. See the appendix for more information on Earthdata Login and netrc setup. This basic line below to create a Harmony Client assumes that we have a .netrc available.

harmony_client = Client()

Create and submit Harmony request

We are interested in the GHRSST Level 4 MUR Global Foundation Sea Surface Temperature Analysis dataset. We are subsetting over the Pacific Ocean to the west of Mexico during 1:00 - 2:00 on 10 March 2021. The dataset is organized into daily files, so while we are specifying a single hour in our request, this will return that full day’s worth of data.

dataset_short_name = 'MUR-JPL-L4-GLOB-v4.1'

request = Request(
  collection=Collection(id=dataset_short_name),
  spatial=BBox(-125.469,15.820,-99.453,35.859),
  temporal={
    'start': dt.datetime(2021, 3, 10, 1),
    'stop': dt.datetime(2021, 3, 10, 2)
  }
)

job_id = harmony_client.submit(request)

harmony_client.wait_for_processing(job_id)

Open and read the subsetted file in xarray

Harmony data outputs can be accessed within the cloud using the s3 URLs and AWS credentials provided in the Harmony job response. Using aws_credentials we can retrieve the credentials needed to access the Harmony s3 staging bucket and its contents. We then use the AWS s3fs package to create a file system that can then be read by xarray.

results = harmony_client.result_urls(job_id, link_type=LinkType.s3)
urls = list(results)
url = urls[0]

creds = harmony_client.aws_credentials()

s3_fs = s3fs.S3FileSystem(
    key=creds['aws_access_key_id'],
    secret=creds['aws_secret_access_key'],
    token=creds['aws_session_token'],
    client_kwargs={'region_name':'us-west-2'},
)

f = s3_fs.open(url, mode='rb')
ds = xr.open_dataset(f)
ds

Plot data

Use the xarray built in plotting function to create a simple plot along the x and y dimensions of the dataset:

ds.analysed_sst.plot() ;

R code coming soon!

# Coming soon!

Matlab code coming soon!

#| echo: true
# Coming soon!

With wget and curl:

# Coming soon!

How do I subset an OPeNDAP granule in the cloud?

How do I subset a data granule using xarray?

How do I download a subset of NetCDF-4?

this might be a deprecated idea