TL;DR: earthdata is a Python package to search, preview and access NASA datasets (on-prem or in the cloud) with a few lines of code.
from earthdata import Auth, DataGranules, Store

# First we authenticate with NASA EDL
auth = Auth().login(strategy="netrc")

# Then we build a Query with spatiotemporal parameters
GranuleQuery = DataGranules().concept_id("C1575731655-LPDAAC_ECS").bounding_box(-134.7,58.9,-133.9,59.2)

# We get the metadata records from CMR
granules = GranuleQuery.get()

# Now it's time to download (or open) our data granules list with get()
files = Store(auth).get(granules, local_path='./data')

# Now to the important science!
Why?
There are many ways to access NASA datasets. We can use the Earthdata search portal, DAAC-specific portals or tools, or even data.gov! Web portals are great, but they are not designed for programmatic access and reproducible workflows, both of which are extremely important in the age of the cloud and reproducible open science.
The good news is that NASA also exposes APIs that allow us to search, transform and access data in a programmatic way. Several libraries build on these APIs; they offer powerful features, some of which overlap. In this context, earthdata aims to be a simple library that deals with the important parts of the metadata so we can access or download data without having to worry about whether a given dataset is on-prem or in the cloud.
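| Library | Language Agnostic | On-Prem Access | Cloud Access | Programmatic | Subsetting | GIS Operations | Authentication | Full Archive Coverage |
|---|---|---|---|---|---|---|---|---|
| earthdata | Python | Yes | Yes | Yes | No | No | Yes | Yes |
| HarmonyPy | Python* | Yes | Yes | Yes | Yes | Yes | Yes | No |
| OpenDAP | Yes | Yes | No | Yes | Yes | No | No | Yes |
| cmr-stac | Python | Yes | Yes | Yes | No | No | No | Yes |
| Earthdata Portal | Yes | Yes | Yes | No | No | No | Yes | Yes |
| GDAL | Yes* | Yes | Yes | Yes | No | Yes* | Yes* | Yes |
| rsat | R | Yes | No | Yes | No | Yes* | Yes* | No |
| getSpatialData | R | Yes | No | Yes | No | Yes* | Yes* | No |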
Installing earthdata with conda/mamba
conda install -c conda-forge earthdata
NASA EDL and the Auth class
What is Earthdata Login (EDL)?
Earthdata Login provides free and immediate access to thousands of EOSDIS data products covering all Earth science disciplines and topic areas for researchers, applied science users, application developers, and the general public. For more information about Earthdata Login benefits, features, and terms of service, go to What do I need to know about Earthdata Login. To learn more about EOSDIS and its mission to meet the needs of diverse users, please visit our Earthdata Website.
The Auth class handles authentication with NASA Earthdata for both on-prem and cloud-hosted datasets.
# We import the classes from earthdata
from earthdata import Auth, DataCollections, DataGranules, Store

auth = Auth()

# First we try to use a .netrc, if it's not present we use the interactive login
if not auth.login(strategy="netrc"):
    auth.login(strategy="interactive")

auth
You're now authenticated with NASA Earthdata Login
<earthdata.auth.Auth at 0x7f550afc7bb0>
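The `netrc` strategy depends on a `.netrc` file that holds Earthdata Login credentials. As a minimal sketch, the standard-library `netrc` module can check whether such an entry exists before `auth.login` is called. The helper name `has_edl_entry` is hypothetical and not part of earthdata, and the sketch assumes the credentials are stored under the machine name `urs.earthdata.nasa.gov`.

```python
import netrc
import os
import tempfile

# Assumption: EDL credentials are stored under this machine name.
EDL_HOST = "urs.earthdata.nasa.gov"


def has_edl_entry(netrc_path, host=EDL_HOST):
    """Return True if the netrc file at netrc_path has an entry for host."""
    try:
        entries = netrc.netrc(netrc_path)
    except (FileNotFoundError, netrc.NetrcParseError):
        # A missing or malformed file means there is nothing usable.
        return False
    return entries.authenticators(host) is not None


# Demonstration with a throwaway file and placeholder credentials.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_netrc")
    with open(demo_path, "w") as f:
        f.write(f"machine {EDL_HOST} login demo_user password demo_pass\n")
    found = has_edl_entry(demo_path)
    missing = has_edl_entry(os.path.join(tmp, "does_not_exist"))

print(found, missing)
```

If the check fails, falling back to the interactive strategy, as in the snippet above, is one reasonable option.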
1. Querying for data collections (datasets)
The DataCollections client can query CMR for any collection using all of CMR's query parameters and has built-in accessors for the common ones. This makes it ideal for one-liners and easier search notation.
Note: use bboxfinder to get the bounding box coordinates of an area of interest.
from pprint import pprint

# We can now search for collections using a pythonic API client for CMR.
# CollectionQuery = DataCollections().keyword('elevation change').bounding_box(-134.7,58.9,-133.9,59.2).temporal("2016-01-01", "2020-12-12")
CollectionQuery = DataCollections().parameters(
    keyword='earth wind fire',
    bounding_box=(-134.7,58.9,-133.9,59.2),
    temporal=("2016-01-01", "2020-12-12")
)

print(f'Collections found: {CollectionQuery.hits()}')

# filtering what UMM fields to print using display(), meta is always included
collections = CollectionQuery.fields(['ShortName', 'Abstract', 'Version']).get(5)

# Inspect 5 results, printing just the concept-id and version
for collection in collections:
    # print(collection["meta"]["concept-id"])
    print(collection.concept_id(), collection.version())
    # pprint(collection)
    # display(collection)
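The bounding box in the query above is a tuple of four numbers. CMR documents the order as lower-left longitude, lower-left latitude, upper-right longitude, upper-right latitude. A small sanity check can catch swapped coordinates before a query is sent. The helper `check_bbox` below is a hypothetical sketch and not part of earthdata.

```python
def check_bbox(bbox):
    """Validate a (west, south, east, north) bounding box given in degrees."""
    if len(bbox) != 4:
        raise ValueError("expected 4 values: west, south, east, north")
    west, south, east, north = bbox
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= 90 and -90 <= north <= 90):
        raise ValueError("latitudes must be within [-90, 90]")
    if south > north:
        raise ValueError("south latitude must not exceed north latitude")
    # west > east is deliberately allowed: it can describe a box that
    # crosses the antimeridian.
    return tuple(float(v) for v in bbox)


bbox = check_bbox((-134.7, 58.9, -133.9, 59.2))
print(bbox)
```

Passing the same values in latitude-first order, `(58.9, -134.7, 59.2, -133.9)`, raises a `ValueError` because -134.7 is not a valid latitude.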
The DataCollections class returns python dictionaries with some handy methods.
collection.concept_id()    # returns the concept-id, used to search for data granules
collection.abstract()      # returns the abstract
collection.landing_page()  # returns the landing page if present in the UMM fields
collection.get_data()      # returns the portal where data can be accessed
The same results can be obtained using the dict syntax:
collection["meta"]["concept-id"]  # concept-id
collection["umm"]["RelatedUrls"]  # URLs, with GET DATA, LANDING PAGE etc
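Because the results are plain dictionaries, fields without a built-in accessor can be extracted with ordinary Python. The sketch below filters `RelatedUrls` by their `Type`. The record is a made-up, heavily trimmed stand-in for a CMR result (the concept-id, URLs and `Type` strings are illustrative placeholders), and `urls_of_type` is a hypothetical helper, not an earthdata method.

```python
# Made-up, trimmed record that mimics the "meta"/"umm" layout of a result.
sample_collection = {
    "meta": {"concept-id": "C0000000000-EXAMPLE"},
    "umm": {
        "ShortName": "EXAMPLE_DATASET",
        "RelatedUrls": [
            {"URL": "https://example.com/data", "Type": "GET DATA"},
            {"URL": "https://example.com/about", "Type": "DATA SET LANDING PAGE"},
        ],
    },
}


def urls_of_type(collection, url_type):
    """Return the URLs in umm.RelatedUrls whose Type equals url_type."""
    related = collection.get("umm", {}).get("RelatedUrls", [])
    return [item["URL"] for item in related if item.get("Type") == url_type]


print(urls_of_type(sample_collection, "GET DATA"))
```

Using `.get()` with defaults keeps the helper from failing on records that lack a `RelatedUrls` field.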
# What if we want cloud collections
CollectionQuery = DataCollections().daac("PODAAC").cloud_hosted(True)

print(f'Collections found: {CollectionQuery.hits()}')
collections = CollectionQuery.fields(['ShortName']).get(10)
# Inspect the first collection
collections[0]
cloud_hosted(False) will return on-prem collections
ShortName = "SMAP_JPL_L3_SSS_CAP_8DAY-RUNNINGMEAN_V5"

collections = DataCollections().short_name(ShortName).cloud_hosted(True).get()
for collection in collections:
    concept_id = collection.concept_id()
    print(concept_id)

collections = DataCollections().short_name(ShortName).cloud_hosted(False).get()
for collection in collections:
    concept_id = collection.concept_id()
    print(concept_id)
C2208422957-POCLOUD
C1972955240-PODAAC
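The two IDs printed above differ in the suffix after the hyphen, which names the CMR provider: POCLOUD for the cloud-hosted copy and PODAAC for the on-prem one. A minimal sketch for splitting a concept-id into its parts follows; `split_concept_id` is a hypothetical helper, and the pattern assumes the usual layout of a type prefix, a numeric ID, a hyphen and a provider ID.

```python
import re

# Assumed layout: type prefix (e.g. "C" for collections), digits, "-", provider.
CONCEPT_ID_PATTERN = re.compile(r"^([A-Z]+)(\d+)-(\w+)$")


def split_concept_id(concept_id):
    """Split a concept-id into its type prefix, numeric ID and provider."""
    match = CONCEPT_ID_PATTERN.match(concept_id)
    if match is None:
        raise ValueError(f"not a recognizable concept-id: {concept_id!r}")
    prefix, number, provider = match.groups()
    return {"type": prefix, "number": int(number), "provider": provider}


for cid in ("C2208422957-POCLOUD", "C1972955240-PODAAC"):
    print(cid, "->", split_concept_id(cid)["provider"])
```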
2. Querying for data granules
The DataGranules class provides functionality similar to the DataCollections class. For reliable granule queries, use the collection's concept-id as the main key. You can also search for data granules by short name, but that could (and most likely will) return different versions of the same data granules.
In this example we're querying for 3 data granules from the ICESat-2 ATL03 version 005 dataset.
# Generally speaking we won't need the auth instance for queries to collections and granules
# Query = DataGranules().short_name('ATL03').version("005").bounding_box(-134.7,58.9,-133.9,59.2)
GranuleQuery = DataGranules().parameters(
    short_name="ATL03",
    version="005",
    bounding_box=(-134.7,58.9,-133.9,59.2),
    # day_night_flag = "day",
    # cloud_cover = (0,25),
    # instrument = "MODIS",
    # platform = "TERRA"
)
granules = GranuleQuery.get(3)
for granule in granules:
    # print(granule)
    # pprint(granule)
    display(granule)
# Generally speaking we won't need the auth instance for queries to collections and granules
# Query = DataGranules().short_name('ATL03').version("005").bounding_box(-134.7,58.9,-133.9,59.2)
GranuleQuery = DataGranules().parameters(
    short_name="MITgcm_LLC4320_Pre-SWOT_JPL_L4_ACC_SMST_v1.0",
    # temporal = ("2012-01-01")
    temporal=("2012-01-01", "2012-01-05")
)
granules = GranuleQuery.get()
for granule in granules:
    # print(granule)
    # pprint(granule)
    display(granule)
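For reference, a granule search like the one above can also be written as a plain request to the CMR search API. The sketch below only builds the URL and sends nothing. It assumes the public endpoint `https://cmr.earthdata.nasa.gov/search/granules.json` and the CMR parameters `short_name`, `temporal`, `bounding_box` and `page_size`. It does not claim to mirror how earthdata constructs its requests, and `build_granule_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: the public CMR granule search endpoint (JSON output).
CMR_GRANULES = "https://cmr.earthdata.nasa.gov/search/granules.json"


def build_granule_url(short_name, temporal=None, bounding_box=None, page_size=10):
    """Build a CMR granule search URL without sending a request."""
    params = {"short_name": short_name, "page_size": page_size}
    if temporal is not None:
        start, end = temporal
        # CMR expects an ISO 8601 range written as "start,end".
        params["temporal"] = f"{start}T00:00:00Z,{end}T00:00:00Z"
    if bounding_box is not None:
        # Order: lower-left lon, lower-left lat, upper-right lon, upper-right lat.
        params["bounding_box"] = ",".join(str(v) for v in bounding_box)
    return f"{CMR_GRANULES}?{urlencode(params)}"


url = build_granule_url(
    "MITgcm_LLC4320_Pre-SWOT_JPL_L4_ACC_SMST_v1.0",
    temporal=("2012-01-01", "2012-01-05"),
)
print(url)

# Decode the query string again to inspect the parameters.
query = parse_qs(urlsplit(url).query)
print(query)
```

Seeing the raw parameters can help when debugging why a query returns more or fewer granules than expected.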