Appendix
Extra resources for Earthdata Cloud.
About the Cloud
Maybe you’ve heard that NASA Earthdata is “moving to the cloud,” but you want to know why. You can read the details of the Earthdata Cloud Evolution, but here we summarize the benefits of the cloud and point to additional resources on its use and history. In short, working in the cloud can make downloading data unnecessary, allow for data processing and manipulation without new software purchases or installation, and, ultimately, reduce the amount of time it takes to get the data needed to do science.
Amazon Web Services
NASA’s Office of the Chief Information Officer chose Amazon Web Services (AWS) as the source of general-purpose cloud services, although some areas within NASA are working with Google Earth Engine (GEE) to make NASA data accessible in the GEE cloud-based analysis platform. The following resources provide background on AWS, but much of the information is most relevant to folks who want to develop in the cloud rather than simply access data. Remember, all of NASA’s science information (including the algorithms, metadata, and documentation associated with science mission data) must be freely available to the public. This means that anyone, anywhere in the world, can access NASA Earth science data without restriction. However, advanced cloud operations could require users to set up their own cloud account through AWS or another cloud provider.
Cloud Primer for Amazon Web Services This primer provides step-by-step tutorials on how to get started in the AWS cloud.
What is AWS? Amazon Web Services describes itself as the world’s most comprehensive and broadly adopted cloud, offering over 200 fully featured services from data centers globally.
Cloud Optimized Data Formats
Traditional file formats can easily be migrated to the cloud, but serving or processing data in those formats from the cloud is inefficient: it often requires that the data be downloaded, translated to another format, and stored in memory. Cloud optimized formats are being developed to better serve the analysis-in-place workflows that make the cloud so beneficial to science users.
Cloud-Optimized Format Study The cloud infrastructure provides a number of capabilities that can dramatically improve access to and use of Earth Observation data. However, in many cases, data may need to be reorganized and/or reformatted to make them tractable for cloud-native analysis and access patterns. The purpose of this study is to examine different formats for storing data on the cloud.
Cloud Optimized GeoTIFF A Cloud Optimized GeoTIFF is a regular GeoTIFF file with an internal organization that enables more efficient workflows on the cloud. It does this by leveraging the ability of clients issuing HTTP GET range requests to ask for just the parts of a file they need.
Cloud Optimized Formats: NetCDF-as-Zarr Optimizations and Next Steps Building on the work by USGS/HDF to access netCDF as Zarr, the authors found that a sidecar metadata record that includes byte offsets lets users “access HDF5 format data as efficiently as Zarr format data using the Zarr library.” In other words, users can gain the cloud-optimized performance of Zarr while retaining the archival benefits of NetCDF4.
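The two ideas described in these resources, HTTP range requests and a sidecar record of byte offsets, can be illustrated with a minimal Python sketch. It is not the actual COG, Zarr, or NetCDF machinery: the chunk names, the JSON index layout, and the helper functions are all hypothetical, and an in-memory buffer stands in for a remote file so the example runs without network access. It only shows the general pattern of looking up an offset and length, then reading just those bytes.

```python
import io
import json
import struct


def build_blob(chunks):
    """Pack named chunks of float32 values into one byte string.

    Returns the bytes plus a sidecar index that maps each chunk name to
    its byte offset and length inside the blob (a hypothetical layout).
    """
    blob = bytearray()
    index = {}
    for name, values in chunks.items():
        payload = struct.pack(f"<{len(values)}f", *values)
        index[name] = {"offset": len(blob), "length": len(payload)}
        blob.extend(payload)
    return bytes(blob), index


def range_header(offset, length):
    """Build an HTTP Range header; the end byte position is inclusive."""
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def read_chunk(fileobj, entry):
    """Read one chunk using only its offset and length from the index."""
    fileobj.seek(entry["offset"])
    raw = fileobj.read(entry["length"])
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


# Hypothetical data: two small "tiles" stored back to back in one file.
chunks = {"tile_0": [1.0, 2.0, 3.0], "tile_1": [4.0, 5.0]}
blob, index = build_blob(chunks)
sidecar = json.dumps(index)  # small text record kept alongside the data

# A client reads the sidecar first, then asks for just the bytes it needs.
entry = json.loads(sidecar)["tile_1"]
header = range_header(entry["offset"], entry["length"])
values = read_chunk(io.BytesIO(blob), entry)  # stand-in for a remote read

print(header)  # {'Range': 'bytes=12-19'}
print(values)  # [4.0, 5.0]
```

Against a real object store, the header built by `range_header` would accompany an HTTP GET so that only those 8 bytes travel over the network, rather than the whole file.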
Environments
Managing Python Environments This book is intended to introduce students to modern computing software, programming tools, and best practices that are broadly applicable to the analysis and visualization of Earth and Environmental data. This section describes basic programming in the open-source Python language.
Reproducible and upgradable Conda environments with conda-lock
The definitive guide to Python virtual environments with conda
Development on AWS
Python
Python on AWS Tools, docs, and sample code to develop applications on the AWS cloud.
R
Getting started with R on Amazon Web Services This guide demonstrates how to use AWS in R with the Paws AWS software development kit.
R for Cloud Computing This book will help you kick-start analytics on the cloud, with chapters on cloud computing, R, and common tasks performed in analytics.
Additional Coding Resources
Python
Intro to Geospatial Raster and Vector Data with Python This tutorial provides an introduction to raster data, and describes how to plot, program, and access satellite imagery using Python.
R
R for Data Science Online Learning Community The R4DS Online Learning Community is a community of R learners at all skill levels working together to improve their skills.
The Environmental Data Science Book This book is a living, open and community-driven online resource to showcase and support the publication of data, research and open-source tools for collaborative, reproducible and transparent Environmental Data Science.
CU EarthLab’s Earth Data Science This site offers free online courses, tutorials, and tools for Earth science using R and Python.