Openscapes Community Call: NASA Earthdata Cloud with Coiled
Quicklinks
- Video recording
- Collaborative Notes
- Slides: Processing Terabytes of Laser Altimetry Data in The Cloud
- Python notebook: Processing Large-scale Time Series of ICESat-2 Sea Ice Height in the Cloud
Our 9th Openscapes Community Call featured NASA Openscapes Mentors and the Coiled team demoing approaches to supporting researchers using NASA Earthdata in the Cloud. This built on a previous demo at the National Snow and Ice Data Center User Working Group that presented different Cloud Environment Opportunities to meet users where they are (blog post).
Going to AGU 2023? Come say hi to the Coiled team at their booth (right at the entrance, next to Google).
Background
NASA Openscapes is a project and community supporting researchers using NASA Earthdata in the Cloud. This community call welcomed our speakers Amy Steiker, Luis Lopez, and Andy Barrett from the National Snow and Ice Data Center (NSIDC) who are NASA Openscapes Mentors, and James Bourbeau from Coiled who is collaborating with NASA Openscapes Mentors and Champions science teams.
We followed the Liberating Structures What? So What? Now What? format, with silent journal prompts for reflection and 15 minutes of Q&A drawn from questions in the chat.
Easy Scalable Cloud Computing with Coiled
The call started with a few demos, first from Andy Barrett and Amy Steiker from NSIDC. Andy shared a science use case based on translating photon measurements from ICESat-2 into sea ice thickness. These data were first accessed with the earthaccess Python library, then needed to be regridded over geographic areas, which Amy demoed in this Jupyter notebook. Amy ran this code on her laptop and used Coiled to spin up remote virtual machines (VMs) in the cloud to run her computations.
Then, James ran through two common workflows that process terabyte-scale cloud datasets. In the first example, we saw how to churn through many cloud-hosted NASA Earthdata files (~500 GB of NetCDF files) in parallel on the cloud. This involved lightly decorating an existing Python function with the Coiled Function decorator. The entire workflow ran in under 10 minutes and cost ~$0.36.
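The "one function per file, run in parallel" pattern can be sketched locally with the standard library. This is a minimal stand-in, not the code from the call: `summarize_granule` and the `granules` data are hypothetical, and a thread pool takes the place of cloud VMs. Coiled's documentation describes decorating such a function with `@coiled.function(...)` and fanning out with its `.map` method; the exact arguments used in the demo are not given here.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize_granule(values):
    """Hypothetical per-file step: reduce one granule's values to a mean."""
    return sum(values) / len(values)


# Hypothetical stand-ins for the contents of three cloud-hosted files.
granules = [[1.0, 2.0, 3.0], [10.0, 20.0], [5.0]]

# Local stand-in for the fan-out. With Coiled, the function would instead
# be decorated with @coiled.function(...) and the work submitted with
# summarize_granule.map(granules), so each call runs on a cloud VM.
with ThreadPoolExecutor(max_workers=3) as pool:
    means = list(pool.map(summarize_granule, granules))

print(means)  # [2.0, 15.0, 5.0]
```

The per-file function stays ordinary Python, so only the line that dispatches the work changes when moving from a laptop to the cloud.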
In the next example, we used Xarray to process 6 TB of cloud-hosted NOAA water model data, computing the average water table depth for each county in the US for the year 2020. We parallelized and distributed the work across 50 VMs using a Coiled cluster. The workflow ran in under 5 minutes and cost ~$1.
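The per-county average is a grouped (zonal) mean: each grid cell carries a county label, and depths are averaged within each label. The toy sketch below shows that reduction in plain Python. It is an illustration only, not the Xarray code from the demo, and the `zonal_mean` helper, the depth values, and the county labels are all hypothetical.

```python
from collections import defaultdict


def zonal_mean(values, zones):
    """Average values within each zone label, skipping missing cells."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, zone in zip(values, zones):
        if value is None:  # treat None as a missing grid cell
            continue
        totals[zone] += value
        counts[zone] += 1
    return {zone: totals[zone] / counts[zone] for zone in counts}


# Hypothetical flattened grid: one depth and one county label per cell.
depths = [1.0, 3.0, 2.0, 4.0, None, 6.0]
counties = ["A", "A", "B", "B", "B", "C"]

print(zonal_mean(depths, counties))  # {'A': 2.0, 'B': 3.0, 'C': 6.0}
```

At 6 TB the same reduction no longer fits on one machine, which is where splitting the grid into chunks and distributing them across a cluster comes in.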
Luis commented on how cloud computing is a barrier for many teams, but tools like Coiled provide options for working in the cloud easily. In fact, Coiled is just half the magic (provisioning cloud resources); the rest is the open source packages, which together help science move faster.
Closing
Discussion topics included questions about egress costs, compute time, community standards, and more. See the meeting notes for full details.
Resources
- Geospatial Cloud Resources from Coiled
- Processing Terabyte-Scale NASA Cloud Datasets with Coiled
- Processing a 250 TB dataset with Coiled, Dask, and Xarray
- Cloud Environment Opportunities. Managed JupyterHub options for Cryosphere and NASA Earthdata user communities.
Citation
@online{steiker2023,
author = {Steiker, Amy and Lopez, Luis and Barrett, Andrew and
Lowndes, Julie and Robinson, Erin and Bourbeau, James},
title = {Openscapes {Community} {Call:} {NASA} {Earthdata} {Cloud}
with {Coiled}},
date = {2023-12-08},
url = {https://nasa-openscapes.org/news/2023-12-08-coiled-community-call},
langid = {en}
}