import requests
import xarray as xr
import ujson
import s3fs
import fsspec
from tqdm import tqdm
from glob import glob
import os
import pathlib
import hvplot.xarray
from kerchunk.hdf import SingleHdf5ToZarr
from kerchunk.combine import MultiZarrToZarr
# The xarray produced from the reference file throws a SerializationWarning for each variable. Will need to explore why
import warnings
"ignore") warnings.simplefilter(
PO.DAAC ECCO SSH
Reading ECCO Sea Surface Height (SSH) Data Using a Kerchunk Reference File
Many of NASA’s current and legacy data collections are archived in netCDF4 format. By themselves, netCDF4 files are not cloud optimized, and reading them can take as long from a personal/local work environment as it does from a working environment deployed in the cloud. Using Kerchunk, we can treat these files as cloud-optimized assets by creating a metadata JSON file that describes the existing netCDF4 files, their chunks, and where to access them. The JSON reference file can then be read with Zarr and Xarray for efficient reads and fast processing.
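As a rough sketch of that workflow (the local file name and the inline_threshold value below are placeholders of our own; the cells later in this notebook work against S3 URLs instead), a single reference file can be generated and read back like this:

import ujson
import fsspec
import xarray as xr
from kerchunk.hdf import SingleHdf5ToZarr

# Placeholder granule; the notebook later points at netCDF4 files in S3
nc_file = "SEA_SURFACE_HEIGHT_example.nc"

# Scan the HDF5/netCDF4 internals and build the reference metadata
with fsspec.open(nc_file, mode="rb") as infile:
    reference = SingleHdf5ToZarr(infile, nc_file, inline_threshold=300).translate()

with open("single_file_reference.json", "w") as outfile:
    outfile.write(ujson.dumps(reference))

# Open the reference through fsspec's "reference" filesystem with the Zarr engine
fs = fsspec.filesystem("reference", fo="single_file_reference.json")
ds = xr.open_dataset(fs.get_mapper(""), engine="zarr", backend_kwargs={"consolidated": False})

The same pattern scales to many granules: one JSON per file, combined afterwards with MultiZarrToZarr into a single reference describing the whole collection.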
Requirements
1. AWS instance running in us-west-2
NASA Earthdata Cloud data in S3 can be directly accessed via temporary credentials; this access is limited to requests made within the US West (Oregon) (code: us-west-2) AWS region. A sketch of obtaining these temporary credentials follows this list.
2. Earthdata Login
An Earthdata Login account is required to access data, as well as discover restricted data, from the NASA Earthdata system. Thus, to access NASA data, you need Earthdata Login. Please visit https://urs.earthdata.nasa.gov to register and manage your Earthdata Login account. This account is free to create and only takes a moment to set up.
3. netrc File
You will need a netrc file containing your NASA Earthdata Login credentials in order to execute the notebooks. A netrc file can be created manually within a text editor and saved to your home directory. For additional information see: Authentication for NASA Earthdata.
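A minimal sketch of how the netrc credentials are typically exchanged for temporary S3 credentials (the PO.DAAC credentials endpoint below is an assumption on our part; other DAACs expose their own endpoints, and the variable names are placeholders):

import requests
import s3fs

# requests reads ~/.netrc automatically when no explicit auth is supplied
s3_credentials_url = "https://archive.podaac.earthdata.nasa.gov/s3credentials"  # assumed PO.DAAC endpoint
temp_creds = requests.get(s3_credentials_url).json()

# Build an S3 filesystem object scoped to the temporary credentials
fs_s3 = s3fs.S3FileSystem(
    anon=False,
    key=temp_creds["accessKeyId"],
    secret=temp_creds["secretAccessKey"],
    token=temp_creds["sessionToken"],
)

The returned credentials are temporary (they expire after about an hour), so long-running notebooks need to refresh them.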
Import required packages
Create a Dask client to process the output JSON files in parallel
Generating the Kerchunk reference file can take some time depending on the internal structure of the data. Dask allows us to execute the reference file generation process in parallel, thus speeding up the overall process.
import dask
from dask.distributed import Client
client = Client(n_workers=4)
client
2022-05-11 15:27:29,674 - distributed.diskutils - INFO - Found stale lock file and directory '/home/jovyan/earthdata-cloud-cookbook/examples/PODAAC/dask-worker-space/worker-mezhdsy7', purging
/srv/conda/envs/notebook/lib/python3.9/contextlib.py:126: UserWarning: Creating scratch directories is taking a surprisingly long time. This is often due to running workers on a network file system. Consider specifying a local-directory to point workers to write scratch data to a local disk.
next(self.gen)
Client: Client-ddf55e52-d13e-11ec-818c-b6609e8b92a4
Connection method: Cluster object | Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:41805/status
LocalCluster a24e60d3: Status: running | Workers: 4 | Total threads: 4 | Total memory: 15.18 GiB
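With the client running, the reference-generation step can be mapped across the workers. A minimal sketch, assuming a list of granule URLs in flist and the s3fs filesystem fs_s3 from the credentials step (gen_json, flist, and fs_s3 are placeholder names; the notebook defines its own versions later):

import ujson
from kerchunk.hdf import SingleHdf5ToZarr

def gen_json(url, fs_read):
    """Write a Kerchunk reference JSON for one netCDF4 granule and return its name."""
    with fs_read.open(url, mode="rb") as infile:
        reference = SingleHdf5ToZarr(infile, url, inline_threshold=300).translate()
    outname = url.split("/")[-1] + ".json"
    with open(outname, "w") as outfile:
        outfile.write(ujson.dumps(reference))
    return outname

# One task per granule; each of the 4 workers writes reference files independently
futures = client.map(gen_json, flist, fs_read=fs_s3)
reference_files = client.gather(futures)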