NASA Earthdata Python Library

Overview

TL;DR: earthdata is a Python package to search, preview and access NASA datasets (on-prem or in the cloud) with a few lines of code.


from earthdata import Auth, DataGranules, Store

# first we authenticate with NASA EDL
auth = Auth().login(strategy="netrc")

# Then we build a Query with spatiotemporal parameters
GranuleQuery = DataGranules().concept_id("C1575731655-LPDAAC_ECS").bounding_box(-134.7,58.9,-133.9,59.2)

# We get the metadata records from CMR
granules = GranuleQuery.get()

# Now it's time to download (or open) our list of data granules with get()
files = Store(auth).get(granules, local_path='./data')

# Now to the important science!

Why?

There are many ways to access NASA datasets: we can use the Earthdata search portal, DAAC-specific portals or tools, or even data.gov! Web portals are great, but they are not designed for programmatic access and reproducible workflows, both of which are extremely important in the age of the cloud and reproducible open science.

The good news is that NASA also exposes APIs that allow us to search, transform and access data programmatically. Many of the libraries built around these APIs contain amazing features and share some similarities. In this context, earthdata aims to be a simple library that deals with the important parts of the metadata, so we can access or download data without having to worry about whether a given dataset is on-prem or in the cloud.


Library | Language Agnostic | On-Prem Access | Cloud Access | Programmatic | Subsetting | GIS Operations | Authentication | Full Archive Coverage
earthdata | Python | ✅ | ✅ | ✅ | No | No | ✅ | ✅
HarmonyPy | Python* | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | No
OpenDAP | ✅ | ✅ | No | ✅ | ✅ | No | No | ✅
cmr-stac | Python | ✅ | ✅ | ✅ | No | No | No | ✅
Earthdata Portal | ✅ | ✅ | ✅ | No | No | No | ✅ | ✅
GDAL | ✅* | ✅ | ✅ | ✅ | No | ✅* | ✅* | ✅
rsat | R | ✅ | No | ✅ | No | ✅* | ✅* | No
getSpatialData | R | ✅ | No | ✅ | No | ✅* | ✅* | No

Installing earthdata with conda/mamba

conda install -c conda-forge earthdata

NASA EDL and the Auth class

What is Earthdata Login (EDL)?

Earthdata Login provides free and immediate access to thousands of EOSDIS data products covering all Earth science disciplines and topic areas for researchers, applied science users, application developers, and the general public. For more information about Earthdata Login benefits, features, and terms of service, see "What do I need to know about Earthdata Login". To learn more about EOSDIS and its mission to meet the needs of diverse users, please visit our Earthdata Website.

The Auth class handles authentication with NASA Earthdata for both on-prem and cloud-hosted datasets.

# We import the classes from earthdata
from earthdata import Auth, DataCollections, DataGranules, Store

auth = Auth()

# First we try to use a .netrc, if it's not present we use the interactive login
if not auth.login(strategy="netrc"):
    auth.login(strategy="interactive")
auth
You're now authenticated with NASA Earthdata Login
<earthdata.auth.Auth at 0x7f550afc7bb0>
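
The "netrc" strategy presumably relies on a .netrc file holding an entry for the Earthdata Login host, urs.earthdata.nasa.gov. How earthdata reads that file internally is not shown here, so the following is only a minimal sketch using Python's standard netrc module. The helper name read_edl_credentials and the demo credentials are hypothetical.

```python
import netrc
import os
import tempfile

# Earthdata Login host; a .netrc entry for it looks like:
#   machine urs.earthdata.nasa.gov login <username> password <password>
EDL_HOST = "urs.earthdata.nasa.gov"


def read_edl_credentials(netrc_path):
    """Return (username, password) for the EDL host, or None if unavailable.

    Hypothetical helper for illustration; not part of earthdata.
    """
    try:
        entry = netrc.netrc(netrc_path).authenticators(EDL_HOST)
    except (FileNotFoundError, netrc.NetrcParseError):
        return None
    if entry is None:
        return None
    username, _account, password = entry
    return username, password


# Demonstrate with a throwaway file instead of the real ~/.netrc.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_netrc")
    with open(demo_path, "w") as fh:
        fh.write(f"machine {EDL_HOST} login demo_user password demo_pass\n")
    found = read_edl_credentials(demo_path)
    missing = read_edl_credentials(os.path.join(tmp, "does_not_exist"))

print(found)    # ('demo_user', 'demo_pass')
print(missing)  # None
```

A check like this can explain why the netrc login falls through to the interactive prompt: either the file is missing or it has no entry for the EDL host.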

1. Querying for data collections (datasets)

The DataCollections client can query CMR for any collection using all of CMR's query parameters and has built-in accessors for the common ones. This makes it ideal for one-liners and easier search notation.

Note: use bboxfinder to get the bounding box coordinates of an area of interest.

CMR API Documentation

from pprint import pprint
# We can now search for collections using a pythonic API client for CMR.

# CollectionQuery = DataCollections().keyword('elevation change').bounding_box(-134.7,58.9,-133.9,59.2).temporal("2016-01-01", "2020-12-12")

CollectionQuery = DataCollections().parameters(
    keyword = 'earth wind fire',
    bounding_box = (-134.7,58.9,-133.9,59.2),
    temporal = ("2016-01-01", "2020-12-12")
)

print(f'Collections found: {CollectionQuery.hits()}')

# Select which UMM fields to return with fields(); meta is always included
collections = CollectionQuery.fields(['ShortName', 'Abstract', 'Version']).get(5)
# Inspect up to 5 results, printing just the concept-id and version
for collection in collections:
    # print(collection["meta"]["concept-id"])
    print(collection.concept_id(), collection.version())
    # pprint(collection)
    # display(collection)
Collections found: 5
C2213569167-ORNL_DAAC 1
C1917875080-LARC_ASDC 1
C2065183159-LARC_ASDC 1
C2244201513-LARC_ASDC 1
C1924032947-LARC_ASDC 1
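
The bounding boxes in these queries follow the CMR order: west longitude, south latitude, east longitude, north latitude. Swapping latitude and longitude is an easy mistake, so here is a small sketch of a sanity check that could run before building a query. Both helper names, validate_bbox and to_cmr_param, are hypothetical and not part of earthdata.

```python
def validate_bbox(bbox):
    """Check a (west, south, east, north) tuple given in decimal degrees.

    Hypothetical helper for illustration; not part of earthdata.
    West may be greater than east for boxes crossing the antimeridian,
    so only the value ranges and the latitude order are checked.
    """
    if len(bbox) != 4:
        raise ValueError("expected (west, south, east, north)")
    west, south, east, north = (float(v) for v in bbox)
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    return west, south, east, north


def to_cmr_param(bbox):
    """Format a validated bounding box as a comma-separated string."""
    return ",".join(str(v) for v in validate_bbox(bbox))


aoi = (-134.7, 58.9, -133.9, 59.2)  # the area used in the queries above
bbox_param = to_cmr_param(aoi)
print(bbox_param)  # -134.7,58.9,-133.9,59.2

# Latitude and longitude swapped by mistake: the check rejects it.
try:
    validate_bbox((58.9, -134.7, 59.2, -133.9))
    swapped_ok = True
except ValueError as err:
    swapped_ok = False
    print(f"rejected: {err}")
```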

The DataCollections class returns Python dictionaries with some handy methods.

collection.concept_id() # returns the concept-id, used to search for data granules
collection.abstract() # returns the abstract
collection.landing_page() # returns the landing page if present in the UMM fields
collection.get_data() # returns the portal where data can be accessed.

The same results can be obtained using the dict syntax:

collection["meta"]["concept-id"] # concept-id
collection["umm"]["RelatedUrls"] # URLs, with GET DATA, LANDING PAGE etc
# What if we want cloud collections?
CollectionQuery = DataCollections().daac("PODAAC").cloud_hosted(True)

print(f'Collections found: {CollectionQuery.hits()}')
collections = CollectionQuery.fields(['ShortName']).get(10)
# Printing the first collection
collections[0]
Collections found: 327
{
  "meta": {
    "concept-id": "C1940473819-POCLOUD",
    "granule-count": 2385823,
    "provider-id": "POCLOUD"
  },
  "umm": {
    "ShortName": "MODIS_A-JPL-L2P-v2019.0"
  }
}
# Printing the concept-id for the first 10 collections
[collection.concept_id() for collection in collections]
['C1940473819-POCLOUD',
 'C1940475563-POCLOUD',
 'C2075141559-POCLOUD',
 'C2075141638-POCLOUD',
 'C1996880725-POCLOUD',
 'C1996881146-POCLOUD',
 'C1996880450-POCLOUD',
 'C2036878688-POCLOUD',
 'C2036877595-POCLOUD',
 'C2075141605-POCLOUD']
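
Since each result is a Python dictionary, the same fields can also be pulled out with ordinary dict operations. The sketch below uses a record copied by hand from the output above rather than a live query. The summarize helper is hypothetical, and it uses .get() so that fields not requested through fields() come back as None instead of raising KeyError.

```python
# A record shaped like the one printed above (copied by hand, not fetched).
record = {
    "meta": {
        "concept-id": "C1940473819-POCLOUD",
        "granule-count": 2385823,
        "provider-id": "POCLOUD",
    },
    "umm": {"ShortName": "MODIS_A-JPL-L2P-v2019.0"},
}


def summarize(collection):
    """Flatten a collection record into a small summary dict.

    Hypothetical helper for illustration; not part of earthdata.
    """
    meta = collection.get("meta", {})
    umm = collection.get("umm", {})
    return {
        "concept_id": meta.get("concept-id"),
        "provider": meta.get("provider-id"),
        "short_name": umm.get("ShortName"),
        "version": umm.get("Version"),  # None here: Version was not requested
    }


summary = summarize(record)
print(summary["concept_id"], summary["short_name"], summary["version"])
# C1940473819-POCLOUD MODIS_A-JPL-L2P-v2019.0 None
```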

Cloud or On-prem with a simple parameter

  • cloud_hosted(True) will return cloud collections
  • cloud_hosted(False) will return on-prem collections
ShortName = "SMAP_JPL_L3_SSS_CAP_8DAY-RUNNINGMEAN_V5"

collections = DataCollections().short_name(ShortName).cloud_hosted(True).get()

for collection in collections:
    concept_id = collection.concept_id()
    print(concept_id)
    

collections = DataCollections().short_name(ShortName).cloud_hosted(False).get()

for collection in collections:
    concept_id = collection.concept_id()
    print(concept_id)
C2208422957-POCLOUD
C1972955240-PODAAC
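
The two concept-ids above show the same short name registered under two providers: POCLOUD for the cloud-hosted copy and PODAAC for the on-prem one. A concept-id is made of a type letter, a number and a provider id, so the provider can be read directly from the string. The sketch below assumes that layout, and the split_concept_id helper is hypothetical.

```python
import re

# Assumed layout: <type letters><number>-<provider id>, e.g. C2208422957-POCLOUD
CONCEPT_ID = re.compile(r"^(?P<kind>[A-Z]+)(?P<number>\d+)-(?P<provider>\w+)$")


def split_concept_id(concept_id):
    """Split a concept-id into (kind, number, provider).

    Hypothetical helper for illustration; not part of earthdata.
    """
    match = CONCEPT_ID.match(concept_id)
    if match is None:
        raise ValueError(f"not a concept-id: {concept_id!r}")
    return match["kind"], int(match["number"]), match["provider"]


# The two ids printed above for the same short name.
ids = ["C2208422957-POCLOUD", "C1972955240-PODAAC"]
providers = [split_concept_id(cid)[2] for cid in ids]
print(providers)  # ['POCLOUD', 'PODAAC']
```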

2. Querying for data granules

The DataGranules class provides functionality similar to the DataCollections class. For reliable granule queries, use the concept-id as the main key. You can also search for data granules using a short name, but that could (and more likely will) return different versions of the same data granules.

In this example we're querying for 3 data granules from the ICESat-2 ATL03 version 005 dataset.

# Generally speaking we won't need the auth instance for queries to collections and granules
# Query = DataGranules().short_name('ATL03').version("005").bounding_box(-134.7,58.9,-133.9,59.2)

GranuleQuery = DataGranules().parameters(
    short_name = "ATL03",
    version = "005",
    bounding_box = (-134.7,58.9,-133.9,59.2),
    # day_night_flag = "day",
    # cloud_cover = (0,25),
    # instrument = "MODIS",
    # platform = "TERRA"
)

granules = GranuleQuery.get(3)

for granule in granules:
    # print(granule)
    # pprint(granule)
    display(granule)

Data: https://n5eil01u.ecs.nsidc.org/DP5/ATLAS/ATL03.005/2018.10.14/ATL03_20181014001049_02350102_005_01.h5

Size: 1764.72 MB

Spatial: {'HorizontalSpatialDomain': {'Orbit': {'AscendingCrossing': -127.0482205607256, 'StartLatitude': 27.0, 'StartDirection': 'A', 'EndLatitude': 59.5, 'EndDirection': 'A'}}}

Data: https://n5eil01u.ecs.nsidc.org/DP5/ATLAS/ATL03.005/2018.11.09/ATL03_20181109112837_06390106_005_01.h5

Size: 313.02 MB

Spatial: {'HorizontalSpatialDomain': {'Orbit': {'AscendingCrossing': 50.741590031724314, 'StartLatitude': 59.5, 'StartDirection': 'D', 'EndLatitude': 27.0, 'EndDirection': 'D'}}}

Data: https://n5eil01u.ecs.nsidc.org/DP5/ATLAS/ATL03.005/2018.11.11/ATL03_20181111224708_06770102_005_01.h5

Size: 1769.47 MB

Spatial: {'HorizontalSpatialDomain': {'Orbit': {'AscendingCrossing': -126.78857810482624, 'StartLatitude': 27.0, 'StartDirection': 'A', 'EndLatitude': 59.5, 'EndDirection': 'A'}}}
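
Before handing a list of granules to Store(auth).get(), it can help to estimate how much data will be downloaded. The sketch below adds up the three sizes printed above and compares the total with the free space in the current directory. The sizes are copied by hand from the output, and treating 1 MB as 10**6 bytes is an assumption about how they are reported.

```python
import shutil

# Sizes in MB, copied by hand from the three granules displayed above.
sizes_mb = [1764.72, 313.02, 1769.47]

total_mb = sum(sizes_mb)
print(f"{len(sizes_mb)} granules, {total_mb:.2f} MB in total")
# 3 granules, 3847.21 MB in total

# Free space where the data would be stored (assumes 1 MB = 10**6 bytes).
free_mb = shutil.disk_usage(".").free / 1e6
fits = free_mb > total_mb
print("enough free space" if fits else "not enough free space")
```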

# The same query pattern works for cloud-hosted datasets; here we filter by short name and a temporal range

GranuleQuery = DataGranules().parameters(
    short_name = "MITgcm_LLC4320_Pre-SWOT_JPL_L4_ACC_SMST_v1.0",
    # temporal = ("2012-01-01")
    temporal = ("2012-01-01", "2012-01-05")
)

granules = GranuleQuery.get()

for granule in granules:
    # print(granule)
    # pprint(granule)
    display(granule)

Data: https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/MITgcm_LLC4320_Pre-SWOT_JPL_L4_ACC_SMST_v1.0/LLC4320_pre-SWOT_ACC_SMST_20120101.nc, s3://podaac-ops-cumulus-protected/MITgcm_LLC4320_Pre-SWOT_JPL_L4_ACC_SMST_v1.0/LLC4320_pre-SWOT_ACC_SMST_20120101.nc

Size: 4987.44 MB

Spatial: {'HorizontalSpatialDomain': {'Geometry': {'BoundingRectangles': [{'WestBoundingCoordinate': 148.01, 'SouthBoundingCoordinate': -57.498, 'EastBoundingCoordinate': 157.99, 'NorthBoundingCoordinate': -53.006}]}}}
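
This cloud-hosted granule lists two links to the same file: an HTTPS URL and an s3:// URI. The sketch below groups links by scheme and splits the S3 link into bucket and key, using only the standard library. Both helper names are hypothetical, and which link earthdata itself picks is not shown here.

```python
from urllib.parse import urlparse

# The two links listed for the granule above.
links = [
    "https://archive.podaac.earthdata.nasa.gov/podaac-ops-cumulus-protected/"
    "MITgcm_LLC4320_Pre-SWOT_JPL_L4_ACC_SMST_v1.0/LLC4320_pre-SWOT_ACC_SMST_20120101.nc",
    "s3://podaac-ops-cumulus-protected/"
    "MITgcm_LLC4320_Pre-SWOT_JPL_L4_ACC_SMST_v1.0/LLC4320_pre-SWOT_ACC_SMST_20120101.nc",
]


def group_by_scheme(urls):
    """Group URLs by scheme, e.g. {'https': [...], 's3': [...]}."""
    grouped = {}
    for url in urls:
        grouped.setdefault(urlparse(url).scheme, []).append(url)
    return grouped


def s3_bucket_and_key(url):
    """Split an s3:// URI into (bucket, key)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


grouped = group_by_scheme(links)
bucket, key = s3_bucket_and_key(grouped["s3"][0])
print(sorted(grouped))  # ['https', 's3']
print(bucket)           # podaac-ops-cumulus-protected
```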