05. Direct S3 Data Access with rioxarray

Timing

  • Exercise: 20 minutes

Summary

In the previous exercises we searched for and discovered cloud data assets that met certain criteria (i.e., intersecting our region of interest within a specified date range). The end goal was to find and save web links to the data assets we want to use in our workflow. The links we found allow us to download data via HTTPS (Hypertext Transfer Protocol Secure). However, NASA also allows direct in-region S3 bucket access to the same assets. In addition to saving the HTTPS links, we also created and saved the S3 links for those same cloud assets, and we will use them here. In this exercise we will demonstrate how to perform direct in-region S3 bucket access for Harmonized Landsat Sentinel-2 (HLS) cloud data assets.
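The saved HTTPS and S3 links point at the same objects, so one can be derived from the other. As a hedged illustration (the helper name and the example granule path are made up for this sketch; the bucket-as-first-path-component layout matches LP DAAC's cloud archive), an HTTPS asset link can be rewritten into its s3:// form like this:

```python
from urllib.parse import urlparse

def https_to_s3(https_url):
    """Rewrite an Earthdata Cloud HTTPS asset link into its s3:// form.

    Illustrative helper, not part of the tutorial code: it assumes the
    bucket name is the first path component of the HTTPS URL.
    """
    path = urlparse(https_url).path.lstrip("/")
    bucket, _, key = path.partition("/")
    return f"s3://{bucket}/{key}"

# Example with a made-up HLS asset path:
link = ("https://data.lpdaac.earthdatacloud.nasa.gov/"
        "lp-prod-protected/HLSS30.020/HLS.S30.T10TEK.2021183T190919.v2.0/"
        "HLS.S30.T10TEK.2021183T190919.v2.0.B04.tif")
print(https_to_s3(link))
```

In this exercise we simply use the S3 links saved earlier, but the rewrite above shows the relationship between the two link types.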

Direct S3 Access

NASA Earthdata Cloud provides two pathways for accessing data from the cloud. The first is via HTTPS. The other is through direct S3 bucket access. Below are some benefits and considerations when choosing to use direct S3 bucket access for NASA cloud assets.

Benefits

  • Retrieving data can be much quicker
  • No need to download data! Work with data in a more efficient manner, “next to it, in the cloud”
  • Increased capacity to do parallel processing, due to working in the cloud
  • You are working completely within the AWS cloud ecosystem and thus have access to the full range of AWS offerings (e.g., infrastructure, the S3 API, services)

Considerations

  • If your workflow is in the cloud, choose S3 over HTTPS
  • Access only works within AWS us-west-2 region
  • Need temporary AWS credentials (an S3 “token”) to access the S3 bucket
  • The token expires after 1 hour (currently)
  • The token only works at the DAAC that generated it
  • Direct S3 access on its own does not solve “cloud” problems, but it is one key technology for solving big data problems
  • With very large data volumes you still have to load data into memory and parallelize the computation; many tools support this, but they are not covered in this tutorial
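The token workflow described above can be sketched as follows. This is a hedged sketch rather than the tutorial's exact code: the LP DAAC credentials endpoint URL is real, but the helper names are ours, and the request assumes your Earthdata Login is configured in `~/.netrc`.

```python
def session_kwargs(creds):
    """Map the JSON keys returned by a DAAC's s3credentials endpoint
    (accessKeyId, secretAccessKey, sessionToken) onto the keyword
    names that boto3.Session expects."""
    return {
        "aws_access_key_id": creds["accessKeyId"],
        "aws_secret_access_key": creds["secretAccessKey"],
        "aws_session_token": creds["sessionToken"],
    }

def get_temp_session(endpoint="https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials"):
    """Request 1-hour temporary S3 credentials from a DAAC and return an
    authenticated boto3 session. Authentication comes from the Earthdata
    Login stored in ~/.netrc."""
    import boto3      # imported lazily so the mapping helper above
    import requests   # can be used without these packages installed
    creds = requests.get(endpoint).json()
    return boto3.Session(**session_kwargs(creds))
```

Because each token only works at the DAAC that generated it, you would point `endpoint` at the credentials service of whichever DAAC hosts your data.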

What you will learn from this tutorial

  • how to retrieve temporary S3 credentials for in-region direct S3 bucket access
  • how to configure our notebook environment for in-region direct S3 bucket access
  • how to access a single HLS file via in-region direct S3 bucket access
  • how to create an HLS time series data array from cloud assets via in-region direct S3 bucket access
  • how to plot results
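Putting those steps together, the core access pattern looks roughly like this sketch (the function name and GDAL option choices are our assumptions; `session` is a boto3 session carrying the temporary DAAC credentials, and `s3_url` is one of the saved s3:// links):

```python
# GDAL settings commonly used for cloud-optimized GeoTIFFs: skip bucket
# directory listings and only attempt range requests against .tif files.
GDAL_COG_OPTIONS = {
    "GDAL_DISABLE_READDIR_ON_OPEN": "EMPTY_DIR",
    "CPL_VSIL_CURL_ALLOWED_EXTENSIONS": "TIF",
}

def open_hls_band(s3_url, session):
    """Read one HLS band directly from S3 into an xarray DataArray.

    Sketch only; imports are deferred so the pattern can be read without
    rasterio/rioxarray installed.
    """
    import rasterio as rio
    import rioxarray
    from rasterio.session import AWSSession

    # Wrap the boto3 session so GDAL signs its S3 range requests.
    with rio.Env(AWSSession(session), **GDAL_COG_OPTIONS):
        return rioxarray.open_rasterio(s3_url).squeeze("band", drop=True)
```

Since the credentials expire after an hour, long-running work should re-request credentials and rebuild the rasterio environment when the token lapses.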

This exercise can be found in the 2021 Cloud Hackathon Book


Import Required Packages

%matplotlib inline
import os
from datetime import datetime

# AWS access and credential requests
import boto3
import requests

# Array handling and raster I/O
import numpy as np
import xarray as xr
import rasterio as rio
from rasterio.session import AWSSession
from rasterio.plot import show
import rioxarray

# Plotting
import matplotlib.pyplot as plt
import geoviews as gv
import holoviews as hv
import hvplot.xarray
gv.extension('bokeh', 'matplotlib')