First Forays into the Cloud - first “fledging” story from NASA Openscapes!

blog
nasa-framework
champions
Aronne Merrelli describes developing cloud workflows with 2i2c, parallelizing workflows with Coiled, and attaching his university credit card to do real science in the cloud.
Author

Aronne Merrelli, University of Michigan

Published

July 22, 2024

Aronne Merrelli is a first example of a scientist “fledging” from the learning JupyterHub managed by 2i2c and Openscapes to another cloud space to do real science - here with Coiled. Aronne says cloud is “like a super power” because he can ask bigger questions, and he shared his story with the 2024 NASA Openscapes Champions science teams. Aronne’s story is available via slides and YouTube and Julie Lowndes is summarizing here as a blog.

Quick links


“In the 2023 Champions Cohort we got a teaser of how to work in the Cloud. With Coiled I did some simple things and now I’ve moved into my own workflow to Cloud. I have an Amazon account now paying my own way.” - Aronne Merrelli

Cloud Takeaways from Aronne Merrelli

Aronne Merrelli, University of Michigan, is a research scientist and a 2023 NASA Openscapes Champion, and he shared his story with 2024 Champions teams. His main takeaways:

  • Finds cloud to be “a super power”

    • New science questions: can process data too big to imagine processing in current places

    • Details: “I did an analysis for my AGU poster: 150 TB of L1 and L2 data, and I only needed some tiny fraction that were of interest to me. In the old way I would need to find a way to download it all first, and I don’t have a machine big enough. But parallelizing with Coiled, I can subset. Now once I realize that, I realize there are bigger datasets that just seemed unworkable before.” - Aronne Merrelli

  • Cloud costs: cheaper than expected

  • Infrastructure: less intensive than expected

    • Shifting workflow to earthaccess then parallelizing with Coiled. Little setup required

Aronne describes himself as a Level 1 or Level 2 algorithm scientist. He had been interested in learning how to use the cloud and had some previous unhelpful experiences, since most discussions revolved around the administration of cloud services that made it seem like there was a large overhead to using the cloud, including specialized packages that seemed hard to use, or inefficient on datasets he uses.

What makes this a fledging story

NASA Openscapes gave a chance for Aaron to learn how to run code in the cloud without much extra overhead. Learning in the 2i2c JupyterHub and then with Coiled, he could focus on the science and not require a cloud optimized data format. The key thing is that he didn’t really need to modify his workflow! He sees cloud computing as a new capability that allows him to do analyses on big data sets that would have been hard to do on any other machine. Here are the steps of what fledging looked like:

  • Learned when and how to cloud (via NASA Openscapes Champions cohort)

    • Cross-DAAC Mentors led, scientist focused, open science, inclusive (Aronne’s previous attempts to learn were geared towards engineers)
  • Experimented in JupyterHub (managed by 2i2c & Openscapes, NASA credits)

    • Extremely easy to get your ‘toes in the water’ running code in the cloud

    • Ran tutorials, prototyped code, easy with earthaccess & corn environment 

  • Experimented in Coiled (managed by Coiled & Openscapes, NASA credits)

    • Very easy to use, essentially zero administrative overhead

    • Learned to parallelize code in small pieces, easy transition to my workflow

  • Did real science in Coiled (managed by Coiled & University credit card!!!)

    • UM’s institutional AWS account is a big help here

slide showing an x-y axis of Workflow scales, with 'calc time' y axis showing laptop/desktop, research group server, university shared compute cluster with small x axis data size. Then, Cloud computing is in the top-right space, not overlapping

Slide from Aronne Merrelli showing how cloud computing is a new way to work: “see this as a new capability that allows me to do analyses on big data sets that would have been hard to do on any other machine.”

Aronne’s recommendations to proceed

For Champions curious what this could look like for them, Aronne suggests they identify a near term task that you can shift to cloud processing. Aronne chose to do the analysis for his AGU poster through Coiled, which fit his 3 criteria:

  1. something you were going to do anyway

  2. requires use of a dataset in the cloud (NASA Earthdata cloud, NOAA open data)

  3. Is “large-ish”, so you will get some immediate benefit (meaning, that you will potentially save yourself some time by doing it in the cloud)

Fledging - building momentum

Aronne’s fledging story is a huge deal, building on the work of many many people for over a decade. It took 10 years of data migration by NASA engineering teams, 3 years of NASA Openscapes Mentors developing common approaches for teaching, to get this fledging story. This first example is the hardest, and we expect more to come (like a flywheel that is toughest to spin the first time, ten times, hundred times.)

Thanks so much to Aronne for sharing what he learned with the NASA Openscapes Mentors, 2024 Champions science teams and other audiences. Thank you to Coiled for partnering with NASA Openscapes and supporting scientists like Aronne!

Citation

BibTeX citation:
@online{merrelli,_university_of_michigan2024,
  author = {Merrelli, University of Michigan, Aronne},
  title = {First {Forays} into the {Cloud} - First “Fledging” Story from
    {NASA} {Openscapes!}},
  date = {2024-07-22},
  url = {https://nasa-openscapes.github.io/news/2024-07-22-aronne-merrelli-fledging-parallelized-science-in-the-cloud/},
  langid = {en}
}
For attribution, please cite this work as:
Merrelli, University of Michigan, Aronne. 2024. “First Forays into the Cloud - First ‘Fledging’ Story from NASA Openscapes!” July 22, 2024. https://nasa-openscapes.github.io/news/2024-07-22-aronne-merrelli-fledging-parallelized-science-in-the-cloud/.