Programmatic access and processing of NSIDC data can happen in two ways: the traditional Search -> Download -> Analyze pattern, or the more modern Search -> Process-in-the-cloud -> Analyze approach.
There is nothing wrong with downloading data to our local machine, but that can get complicated, or even impossible, if a dataset is too large. For this reason NSIDC, along with other NASA data centers, has started to collocate or migrate its dataset holdings to the cloud.
In order to use NSIDC cloud collections we need to:

1. Authenticate ourselves with the NASA Earthdata Login API (EDL).
2. Search granules/collections using a CMR client that supports authentication.
3. Parse the CMR responses, looking for AWS S3 URLs.
4. Access the data granules using temporary AWS credentials issued by the NSIDC cloud credentials endpoint.
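As a roadmap, the four steps condense to something like the sketch below. It uses the CMRAuth helper and granule.search calls that are introduced step by step later in this notebook, and placeholder EDL credentials, so treat it as a preview rather than a runnable starting point.

# Condensed sketch of the full workflow (each step is explained below)
from cmr.search import granule
from cmr_auth import CMRAuth   # helper module used throughout this notebook
import s3fs

CMR_auth = CMRAuth('your_edl_user', 'your_edl_password')  # 1. authenticate with EDL (placeholders)
cmr_token = CMR_auth.get_token()                          # token for authenticated CMR queries

results = granule.search({'concept-id': 'C2027878642-NSIDC_CPRD',  # 2. search granules
                          'token': cmr_token}, limit=10)

# 3. parse the response for s3:// URLs (QueryResult.data_links(only_s3=True), shown below)

s3_cred = CMR_auth.get_s3_credentials()                   # 4. temporary AWS credentials
s3_fs = s3fs.S3FileSystem(key=s3_cred['accessKeyId'],
                          secret=s3_cred['secretAccessKey'],
                          token=s3_cred['sessionToken'])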
Data used:
ICESat-2 ATL03: This data set contains height above the WGS 84 ellipsoid (ITRF2014 reference frame), latitude, longitude, and time for all photons.
Most collections at NSIDC have not yet been migrated to the cloud and can be found using CMR with no authentication at all. Here is a simple example for altimetry data (ATL03) from the ICESat-2 mission. First we'll search the regular collection, and then we'll do the same using the cloud collection.
Note: This notebook uses a low-level CMR endpoint; this is not the only workflow for data discovery.
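For context, "low level" here means we talk to the CMR search API directly. The equivalent raw request can be made with nothing but the requests library against the public CMR endpoint; a minimal sketch (the URL and parameters below are the public CMR search API, not part of this notebook's helper modules):

import requests

# Query the public CMR search endpoint directly for NSIDC-hosted collections
resp = requests.get('https://cmr.earthdata.nasa.gov/search/collections.json',
                    params={'keyword': 'ice', 'provider': 'NSIDC_ECS', 'page_size': 3})
for entry in resp.json()['feed']['entry']:
    print(entry['id'], '-', entry['title'])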
from cmr.search import collection as cmr_collection
from cmr.search import granule
from cmr.auth import token
import textwrap

# NON_AWS collections are hosted at the NSIDC DAAC data center
# AWS_CLOUD collections are hosted at AWS S3 us-west-2
NSIDC_PROVIDERS = {'NSIDC_HOSTED': 'NSIDC_ECS', 'AWS_HOSTED': 'NSIDC_CPRD'}

# First let's search for some collections hosted at NSIDC using a keyword
collections = cmr_collection.search({'keyword': 'ice',
                                     'provider': NSIDC_PROVIDERS['NSIDC_HOSTED']})

# Let's print some information about the first 3 collections that match our provider
for collection in collections[0:3]:
    wrapped_abstract = '\n'.join(textwrap.wrap(f"Abstract: {collection['umm']['Abstract']}", 80)) + '\n'
    print(f"concept-id: {collection['meta']['concept-id']}\n"
          f"Title: {collection['umm']['EntryTitle']}\n"
          + wrapped_abstract)
concept-id: C1997321091-NSIDC_ECS
Title: ATLAS/ICESat-2 L2A Global Geolocated Photon Data V004
Abstract: This data set (ATL03) contains height above the WGS 84 ellipsoid
(ITRF2014 reference frame), latitude, longitude, and time for all photons
downlinked by the Advanced Topographic Laser Altimeter System (ATLAS) instrument
on board the Ice, Cloud and land Elevation Satellite-2 (ICESat-2) observatory.
The ATL03 product was designed to be a single source for all photon data and
ancillary information needed by higher-level ATLAS/ICESat-2 products. As such,
it also includes spacecraft and instrument parameters and ancillary data not
explicitly required for ATL03.
concept-id: C1705401930-NSIDC_ECS
Title: ATLAS/ICESat-2 L2A Global Geolocated Photon Data V003
Abstract: This data set (ATL03) contains height above the WGS 84 ellipsoid
(ITRF2014 reference frame), latitude, longitude, and time for all photons
downlinked by the Advanced Topographic Laser Altimeter System (ATLAS) instrument
on board the Ice, Cloud and land Elevation Satellite-2 (ICESat-2) observatory.
The ATL03 product was designed to be a single source for all photon data and
ancillary information needed by higher-level ATLAS/ICESat-2 products. As such,
it also includes spacecraft and instrument parameters and ancillary data not
explicitly required for ATL03.
concept-id: C2003771331-NSIDC_ECS
Title: ATLAS/ICESat-2 L3A Land Ice Height V004
Abstract: This data set (ATL06) provides geolocated, land-ice surface heights
(above the WGS 84 ellipsoid, ITRF2014 reference frame), plus ancillary
parameters that can be used to interpret and assess the quality of the height
estimates. The data were acquired by the Advanced Topographic Laser Altimeter
System (ATLAS) instrument on board the Ice, Cloud and land Elevation Satellite-2
(ICESat-2) observatory.
# Now let's do the same with short names, a more specific way of finding data.
# First let's search for some collections hosted at NSIDC
collections = cmr_collection.search({'short_name': 'ATL03',
                                     'provider': NSIDC_PROVIDERS['NSIDC_HOSTED']})

# Note how we get back the same collection twice; that's because we have 2 versions available.
for collection in collections[0:3]:
    wrapped_abstract = '\n'.join(textwrap.wrap(f"Abstract: {collection['umm']['Abstract']}", 80)) + '\n'
    print(f"concept-id: {collection['meta']['concept-id']}\n"
          f"Title: {collection['umm']['EntryTitle']}\n"
          + wrapped_abstract)
concept-id: C1997321091-NSIDC_ECS
Title: ATLAS/ICESat-2 L2A Global Geolocated Photon Data V004
Abstract: This data set (ATL03) contains height above the WGS 84 ellipsoid
(ITRF2014 reference frame), latitude, longitude, and time for all photons
downlinked by the Advanced Topographic Laser Altimeter System (ATLAS) instrument
on board the Ice, Cloud and land Elevation Satellite-2 (ICESat-2) observatory.
The ATL03 product was designed to be a single source for all photon data and
ancillary information needed by higher-level ATLAS/ICESat-2 products. As such,
it also includes spacecraft and instrument parameters and ancillary data not
explicitly required for ATL03.
concept-id: C1705401930-NSIDC_ECS
Title: ATLAS/ICESat-2 L2A Global Geolocated Photon Data V003
Abstract: This data set (ATL03) contains height above the WGS 84 ellipsoid
(ITRF2014 reference frame), latitude, longitude, and time for all photons
downlinked by the Advanced Topographic Laser Altimeter System (ATLAS) instrument
on board the Ice, Cloud and land Elevation Satellite-2 (ICESat-2) observatory.
The ATL03 product was designed to be a single source for all photon data and
ancillary information needed by higher-level ATLAS/ICESat-2 products. As such,
it also includes spacecraft and instrument parameters and ancillary data not
explicitly required for ATL03.
# Now that we have the concept-ids we can look for data granules in a collection
# and pass spatiotemporal parameters.
from cmr_serializer import QueryResult

# A bbox over the Juneau Icefield
# bbox = min Longitude, min Latitude, max Longitude, max Latitude
query = {'concept-id': 'C1997321091-NSIDC_ECS',
         'bounding_box': '-135.1977,58.3325,-133.3410,58.9839'}

# Querying for ATL03 v4 using its concept-id and a bounding box
results = granule.search(query, limit=1000)

# QueryResult is a wrapper with convenient methods to work with CMR query results.
granules = QueryResult(results).items()

print(f"Total granules found: {len(results)}\n")

for g in granules[0:3]:
    display(g)
# We can access the data links with the data_links() method
for g in granules[0:10]:
    print(g.data_links())
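Since these are regular HTTPS URLs behind Earthdata Login, a granule could also be downloaded the classic way; a sketch assuming your EDL credentials are stored in ~/.netrc for urs.earthdata.nasa.gov:

import requests

# Download the first HTTPS link of the first granule found above.
# requests picks up ~/.netrc automatically when no explicit auth is given.
url = granules[0].data_links()[0]
with requests.get(url, stream=True) as r:
    r.raise_for_status()
    with open(url.split('/')[-1], 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024 * 1024):
            f.write(chunk)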
Cloud Collections
Some NSIDC cloud collections are not yet public, so we need to authenticate ourselves with CMR first.
import getpass
import textwrap

from cmr.search import collection as cmr_collection
from cmr.search import granule
from cmr.auth import token
from cmr_auth import CMRAuth

# NON_AWS collections are hosted at the NSIDC DAAC data center
# AWS_CLOUD collections are hosted at AWS S3 us-west-2
NSIDC_PROVIDERS = {'NSIDC_HOSTED': 'NSIDC_ECS', 'AWS_HOSTED': 'NSIDC_CPRD'}

# Use your own EDL username
USER = 'betolink'

print('Enter your NASA Earthdata login password:')
password = getpass.getpass()

CMR_auth = CMRAuth(USER, password)
# Token to search private collections on CMR
cmr_token = CMR_auth.get_token()
Enter your NASA Earthdata login password:
········
# Now let's run our authenticated queries on CMR
query = {'short_name': 'ATL03',
         'token': cmr_token,
         'provider': NSIDC_PROVIDERS['AWS_HOSTED']}
collections = cmr_collection.search(query)

for collection in collections[0:3]:
    wrapped_abstract = '\n'.join(textwrap.wrap(f"Abstract: {collection['umm']['Abstract']}", 80)) + '\n'
    print(f"concept-id: {collection['meta']['concept-id']}\n"
          f"Title: {collection['umm']['EntryTitle']}\n"
          + wrapped_abstract)
concept-id: C2027878642-NSIDC_CPRD
Title: ATLAS/ICESat-2 L2A Global Geolocated Photon Data V004
Abstract: This data set (ATL03) contains height above the WGS 84 ellipsoid
(ITRF2014 reference frame), latitude, longitude, and time for all photons
downlinked by the Advanced Topographic Laser Altimeter System (ATLAS) instrument
on board the Ice, Cloud and land Elevation Satellite-2 (ICESat-2) observatory.
The ATL03 product was designed to be a single source for all photon data and
ancillary information needed by higher-level ATLAS/ICESat-2 products. As such,
it also includes spacecraft and instrument parameters and ancillary data not
explicitly required for ATL03.
# Now that we have the concept-id for ATL03 in the cloud, we do the same thing we
# did with the NSIDC-hosted ATL03, but using the cloud concept-id.
from cmr_serializer import QueryResult

# A bbox over the Juneau Icefield
query = {'concept-id': 'C2027878642-NSIDC_CPRD',
         'token': cmr_token,
         'bounding_box': '-135.1977,58.3325,-133.3410,58.9839'}

# Querying for ATL03 v4 using its concept-id and a bounding box
results = granule.search(query, limit=1000)
granules = QueryResult(results).items()

print(f"Total granules found: {len(results)}\n")

# Print the first 3 granules; you can use print(g) for the plain text representation.
for g in granules[0:3]:
    display(g)
NOTE: Not all the data granules for NSIDC datasets have been migrated to S3. This may result in different granule counts between the NSIDC-hosted collections and the ones in AWS S3. A quick way to check is sketched below.
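To verify, we can run the same spatial query against both concept-ids and compare the counts:

# Same bounding box, one query per provider; counts may differ during migration.
bbox = '-135.1977,58.3325,-133.3410,58.9839'
on_prem = granule.search({'concept-id': 'C1997321091-NSIDC_ECS',
                          'bounding_box': bbox}, limit=1000)
in_cloud = granule.search({'concept-id': 'C2027878642-NSIDC_CPRD',
                           'token': cmr_token,
                           'bounding_box': bbox}, limit=1000)
print(f"NSIDC hosted: {len(on_prem)} granules, AWS hosted: {len(in_cloud)} granules")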
# We can list just the S3 links
for g in granules[0:10]:
    print(g.data_links(only_s3=True))
We note that our RelatedLinks array now contains links to AWS S3; these are the direct URIs for our data granules in the AWS us-west-2 region.
Data Access using AWS S3
IMPORTANT: This section will only work if this notebook is running in the AWS us-west-2 region.
There is more than one way of accessing data on AWS S3: we can download it to our local machine using the official aws-cli client, or read it directly with a Python library such as boto3 or s3fs.
Performance tip: using the HTTPS URLs will reduce access performance, since those requests are processed internally by AWS's content delivery system (CloudFront). For better performance we should access the s3:// URLs with boto3 or a higher-level S3-enabled library (e.g. s3fs).
# READ-only temporary credentials
import s3fs
import h5py

# These credentials only last 1 hour.
s3_cred = CMR_auth.get_s3_credentials()

s3_fs = s3fs.S3FileSystem(key=s3_cred['accessKeyId'],
                          secret=s3_cred['secretAccessKey'],
                          token=s3_cred['sessionToken'])

# Now you could copy granules to your cloud instance (EC2, JupyterHub, etc.) using:
# s3_fs.get('s3://SOME_LOCATION/ATL03_20181015124359_02580106_004_01.h5', 'test.h5')
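As an alternative to s3fs, the same temporary credentials can be handed to boto3; a minimal sketch, assuming boto3 is installed (the bucket and key are the same granule opened below):

import boto3

# Build an S3 client from the temporary NSIDC credentials
s3 = boto3.client('s3',
                  aws_access_key_id=s3_cred['accessKeyId'],
                  aws_secret_access_key=s3_cred['secretAccessKey'],
                  aws_session_token=s3_cred['sessionToken'])

# Copy one granule to the local (in-region) filesystem
s3.download_file('nsidc-cumulus-prod-protected',
                 'ATLAS/ATL03/004/2018/10/15/ATL03_20181015124359_02580106_004_01.h5',
                 'ATL03_20181015124359_02580106_004_01.h5')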
We now have the proper credentials and file system mapper to access the data within AWS us-west-2.
with s3_fs.open('s3://nsidc-cumulus-prod-protected/ATLAS/ATL03/004/2018/10/15/ATL03_20181015124359_02580106_004_01.h5', 'rb') as s3f:
    with h5py.File(s3f, 'r') as f:
        print([key for key in f.keys()])
Using xarray to open files on S3
ATL03 data has a complex, multi-group HDF5 structure, and by default xarray only reads the root group, so it doesn't know how to extract the important bits (the photon data) on its own. A way around this is shown after the next cell.
import xarray

with s3_fs.open('s3://nsidc-cumulus-prod-protected/ATLAS/ATL03/004/2018/10/15/ATL03_20181015124359_02580106_004_01.h5', 'rb') as s3f:
    ds = xarray.open_dataset(s3f)
    for varname in ds:
        print(varname)
ds
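To get at the photon data itself we can point xarray at one of the beam groups explicitly; a sketch assuming the standard ATL03 layout, where per-beam photon heights live in groups such as gt1l/heights:

# Open a single beam group instead of the (mostly empty) root group
with s3_fs.open('s3://nsidc-cumulus-prod-protected/ATLAS/ATL03/004/2018/10/15/ATL03_20181015124359_02580106_004_01.h5', 'rb') as s3f:
    heights = xarray.open_dataset(s3f, group='gt1l/heights')
    print(heights)   # h_ph, lat_ph, lon_ph, delta_time, ...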
“Downloading” files on S3 using the official aws-cli tool
The quotes around “downloading” are because ideally you’ll be working on an EC2 instance (i.e. a virtual machine) in the us-west-2 region, so the copy never leaves AWS.
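A sketch of that pattern from Python: export the temporary credentials as the standard AWS environment variables and shell out to the aws CLI (this assumes the aws-cli is installed on the instance):

import os
import subprocess

# Expose the temporary credentials to the aws CLI via environment variables
env = os.environ.copy()
env.update({'AWS_ACCESS_KEY_ID': s3_cred['accessKeyId'],
            'AWS_SECRET_ACCESS_KEY': s3_cred['secretAccessKey'],
            'AWS_SESSION_TOKEN': s3_cred['sessionToken']})

# Copy one granule into the current directory
subprocess.run(['aws', 's3', 'cp',
                's3://nsidc-cumulus-prod-protected/ATLAS/ATL03/004/2018/10/15/ATL03_20181015124359_02580106_004_01.h5',
                '.'], env=env, check=True)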