Appendix

Extra resources for Earthdata Cloud.

About the Cloud

Maybe you’ve heard that NASA Earthdata is “moving to the cloud” but you want to know why. You can read the details of the Earthdata Cloud Evolution, but here we summarize the benefits of the cloud and point to additional resources on its use and history. In short, the cloud will make it unnecessary to download data, allow data to be processed and manipulated without purchasing or installing new software, and, ultimately, reduce the time it takes to get the data needed to do science.

Amazon Web Services

NASA’s Office of the Chief Information Officer chose Amazon Web Services (AWS) as the source of general-purpose cloud services, although some areas within NASA are working with Google Earth Engine (GEE) to make NASA data accessible in the GEE cloud-based analysis platform. The following resources provide background on AWS, but much of the information is aimed at people who want to develop in the cloud rather than simply access data. Remember, all of NASA’s science information (including the algorithms, metadata, and documentation associated with science mission data) must be freely available to the public. This means that anyone, anywhere in the world, can access NASA Earth science data without restriction. However, advanced cloud operations could require users to set up their own cloud account through AWS or another cloud provider.

  • Cloud Primer for Amazon Web Services: This primer provides step-by-step tutorials on how to get started in the AWS cloud.

  • What is AWS? Amazon Web Services is the world’s most comprehensive and broadly adopted cloud, offering over 200 fully featured services from data centers globally.

Cloud Optimized Data Formats

Traditional file formats can easily be migrated to the cloud, but serving or processing data in those formats from the cloud is inefficient and often requires that the data be downloaded, translated to another format, and stored in memory. Cloud optimized formats are being developed to better serve the analysis-in-place workflows that make the cloud so beneficial to science users.

  • Cloud-Optimized Format Study: The cloud infrastructure provides a number of capabilities that can dramatically improve access and use of Earth Observation data. However, in many cases, data may need to be reorganized and/or reformatted in order to make them tractable to support cloud-native analysis and access patterns. The purpose of this study is to examine different formats for storing data on the cloud.

  • Cloud Optimized GeoTIFF: A Cloud Optimized GeoTIFF is a regular GeoTIFF file with an internal organization that enables more efficient workflows on the cloud. It does this by leveraging the ability of clients issuing HTTP GET range requests to ask for just the parts of a file they need.

  • Cloud Optimized Formats: NetCDF-as-Zarr Optimizations and Next Steps: Building on the work by USGS/HDF to access netCDF as Zarr, the authors found that a sidecar metadata record that includes byte offsets lets users “access HDF5 format data as efficiently as Zarr format data using the Zarr library.” In other words, users can gain the cloud-optimized performance of Zarr while retaining the archival benefits of NetCDF4.
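
The two ideas in the list above, range requests and a sidecar record of byte offsets, can be illustrated with a small sketch. It is not the method used by any of the linked resources: the chunk names, the sidecar layout, and the helper functions range_header and read_chunk are all hypothetical, and an in-memory buffer stands in for a file in cloud storage so that the example runs without network access.

```python
import io
import json


def range_header(offset: int, length: int) -> dict:
    """Build an HTTP Range header for `length` bytes starting at `offset`.

    HTTP byte ranges are inclusive at both ends, hence the -1.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def read_chunk(fileobj, offset: int, length: int) -> bytes:
    """Local stand-in for a range request: seek, then read only `length` bytes."""
    fileobj.seek(offset)
    return fileobj.read(length)


# Hypothetical archive file: three "chunks" packed back to back.
chunks = {"temp/0": b"AAAA", "temp/1": b"BBBBBB", "temp/2": b"CC"}
archive = io.BytesIO(b"".join(chunks.values()))

# Hypothetical sidecar record: chunk key -> byte offset and length in the file.
sidecar = {}
position = 0
for key, payload in chunks.items():
    sidecar[key] = {"offset": position, "length": len(payload)}
    position += len(payload)

# The sidecar would be stored next to the data file, for example as JSON.
sidecar_json = json.dumps(sidecar)

# A client looks up one chunk and fetches only those bytes.
entry = json.loads(sidecar_json)["temp/1"]
header = range_header(entry["offset"], entry["length"])
data = read_chunk(archive, entry["offset"], entry["length"])
print(header, data)
```

Against real cloud storage, the header built here would accompany an HTTP GET request so that the server returns only the requested bytes; the archive file itself is left unchanged, and only the small sidecar record is added alongside it.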

Environments

Development on AWS

Python

  • Python on AWS: Tools, docs, and sample code for developing applications on the AWS cloud.

R

Additional Coding Resources

Python

R