05. Direct S3 Data Access with rioxarray
Timing
- Exercise: 20 minutes
Summary
In the previous exercises we searched for and discovered cloud data assets that met certain search criteria (i.e., intersects with our region of interest and for a specified date range). The end goal was to find and save web links to the data assets we want to use in our workflow. The links we found allow us to download data via HTTPS (Hypertext Transfer Protocol Secure). However, NASA allows for direct in-region S3 bucket access for the same assets. In addition to saving the HTTPS links, we also created and saved the S3 links for those same cloud assets and we will use them here. In this exercise we will demonstrate how to perform direct in-region S3 bucket access for Harmonized Landsat Sentinel-2 (HLS) cloud data assets.
Direct S3 Access
NASA Earthdata Cloud provides two pathways for accessing data from the cloud. The first is via HTTPS. The other is through direct S3 bucket access. Below are some benefits and considerations when choosing to use direct S3 bucket access for NASA cloud assets.
Benefits
- Retrieving data can be much quicker
- No need to download data! Work with data in a more efficient manner, “next to it, in the cloud”
- Increased capacity to do parallel processing, due to working in the cloud
- You are working completely within the AWS cloud ecosystem and thus have access to the might of all AWS offerings (e.g., infrastructure, S3 API, services, etc.)
Considerations
- If your workflow is in the cloud, choose S3 over HTTPS
- Access only works within AWS us-west-2 region
- Need an AWS S3 “token” to access S3 Bucket
- Token expires after 1 hour (currently)
- Token only works at the DAAC that generates it, e.g.,
- PO.DAAC token generator: https://archive.podaac.earthdata.nasa.gov/s3credentials
- LP DAAC token generator: https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials
- Direct S3 access on its own does not solve ‘cloud’ problems, but it is one key technology in solving big data problems
- You still have to load data into memory and parallelize the computation when working with very large data volumes. Many tools exist for doing this, but they are not discussed in this tutorial
What you will learn from this tutorial
- how to retrieve temporary S3 credentials for in-region direct S3 bucket access
- how to configure our notebook environment for in-region direct S3 bucket access
- how to access a single HLS file via in-region direct S3 bucket access
- how to create an HLS time series data array from cloud assets via in-region direct S3 bucket access
- how to plot results
This exercise can be found in the 2021 Cloud Hackathon Book
Import Required Packages
%matplotlib inline
import matplotlib.pyplot as plt
from datetime import datetime
import os
import requests
import boto3
import numpy as np
import xarray as xr
import rasterio as rio
from rasterio.session import AWSSession
from rasterio.plot import show
import rioxarray
import geoviews as gv
import hvplot.xarray
import holoviews as hv
gv.extension('bokeh', 'matplotlib')
Configure Local Environment and Get Temporary Credentials
To perform direct S3 data access one needs to acquire temporary S3 credentials. The credentials give users direct access to S3 buckets in NASA Earthdata Cloud. AWS credentials should not be shared, so take precautions when using them in notebooks or scripts. Note, these temporary credentials are valid for only 1 hour. For more information regarding the temporary credentials visit https://data.lpdaac.earthdatacloud.nasa.gov/s3credentialsREADME. A netrc file is required to acquire these credentials. Use the NASA Earthdata Authentication tutorial to create a netrc file in your home directory.
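For reference, a netrc file is a plain-text file named .netrc in your home directory containing your Earthdata Login credentials for urs.earthdata.nasa.gov. A minimal entry, with placeholder values, looks like the line below; requests picks these credentials up automatically when the credentials endpoint redirects to Earthdata Login.
machine urs.earthdata.nasa.gov login your_earthdata_username password your_earthdata_password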
s3_cred_endpoint = 'https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials'
def get_temp_creds():
    temp_creds_url = s3_cred_endpoint
    return requests.get(temp_creds_url).json()
temp_creds_req = get_temp_creds()
#temp_creds_req    # !!! BEWARE, removing the # on this line will print your temporary S3 credentials.
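The response is a small Python dictionary. The accessKeyId, secretAccessKey, and sessionToken fields are used below; the endpoint also reports when the credentials expire. A quick, optional way to inspect the response without printing the secrets themselves (the 'expiration' key name is an assumption about the LP DAAC response, so confirm it against your own output):
print(list(temp_creds_req.keys()))          # e.g. ['accessKeyId', 'secretAccessKey', 'sessionToken', 'expiration']
print(temp_creds_req.get('expiration'))     # when these temporary credentials stop working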
Insert the credentials into our boto3 session and configure our rasterio environment for data access
Create a boto3 Session object using your temporary credentials. This Session is used to pass credentials and configuration to AWS so we can interact with S3 objects from applicable buckets.
session = boto3.Session(aws_access_key_id=temp_creds_req['accessKeyId'],
                        aws_secret_access_key=temp_creds_req['secretAccessKey'],
                        aws_session_token=temp_creds_req['sessionToken'],
                        region_name='us-west-2')
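As an optional sanity check (not part of the original workflow), the session can be used to request the metadata of one of the S3 objects we will open below; the bucket and key come from the first link in our saved list, and the call only succeeds in-region with valid temporary credentials:
s3 = session.client('s3')
resp = s3.head_object(Bucket='lp-prod-protected',
                      Key='HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.B04.tif')
print(resp['ContentLength'], 'bytes')   # confirms the credentials and region are working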
For this exercise, we are going to open up a context manager for the notebook using the rasterio.env module to store the required GDAL and AWS configurations we need to access the data in Earthdata Cloud. While the context manager is open (rio_env.__enter__()) we will be able to run the open or get data commands that would typically be executed within a with statement, thus allowing us to more freely interact with the data. We’ll close the context (rio_env.__exit__()) at the end of the notebook.
GDAL environment variables must be configured to access Earthdata Cloud data assets. Geospatial data access Python packages like rasterio and rioxarray depend on GDAL, leveraging GDAL’s “Virtual File Systems” to read remote files. GDAL has many environment variables that control its behavior. Changing these settings can mean the difference between being able to access a file or not. They can also have an impact on performance.
rio_env = rio.Env(AWSSession(session),
                  GDAL_DISABLE_READDIR_ON_OPEN='EMPTY_DIR',
                  GDAL_HTTP_COOKIEFILE=os.path.expanduser('~/cookies.txt'),
                  GDAL_HTTP_COOKIEJAR=os.path.expanduser('~/cookies.txt'))
rio_env.__enter__()
<rasterio.env.Env at 0x7f1619fbb0a0>
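For reference, the more conventional pattern wraps data access in a with statement so the environment is set up and torn down automatically; we call __enter__() and __exit__() manually here only so the configuration persists across notebook cells. A minimal sketch of the equivalent with form (any reads must happen inside the block):
with rio.Env(AWSSession(session),
             GDAL_DISABLE_READDIR_ON_OPEN='EMPTY_DIR',
             GDAL_HTTP_COOKIEFILE=os.path.expanduser('~/cookies.txt'),
             GDAL_HTTP_COOKIEJAR=os.path.expanduser('~/cookies.txt')):
    # data must be read while the environment is active
    da = rioxarray.open_rasterio('s3://lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.B04.tif').load()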
Read in S3 Links
In the CMR-STAC API tutorial we saved multiple text files containing links, both HTTPS and S3 links, to Harmonized Landsat Sentinel-2 (HLS) cloud data assets. We will now read in one of those files and show how to access those data assets.
List the available files in the data directory
NOTE: If you are running the notebook from the tutorials-templates directory, please use the following path to connect to the data directory: “../tutorials/data”
[f for f in os.listdir('./data') if '.txt' in f]
['HTTPS_T13TGF_B02_Links.txt',
'S3_T13TGF_B05_Links.txt',
'HTTPS_T13TGF_Fmask_Links.txt',
'S3_T13TGF_B8A_Links.txt',
'HTTPS_T13TGF_B04_Links.txt',
'S3_T13TGF_B04_Links.txt',
'S3_T13TGF_Fmask_Links.txt',
'HTTPS_T13TGF_B8A_Links.txt',
'HTTPS_T13TGF_B05_Links.txt',
'S3_T13TGF_B02_Links.txt']
We will save our list of links and a single link as Python objects for use later.
NOTE: If you are running the notebook from the tutorials-templates directory, please use the following path to connect to the data directory: “../tutorials/data”
s3_links = open('./data/S3_T13TGF_B04_Links.txt').read().splitlines()
s3_links
['s3://lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021133T173859.v2.0/HLS.S30.T13TGF.2021133T173859.v2.0.B04.tif',
's3://lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021140T173021.v2.0/HLS.L30.T13TGF.2021140T173021.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021140T172859.v2.0/HLS.S30.T13TGF.2021140T172859.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021145T172901.v2.0/HLS.S30.T13TGF.2021145T172901.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021155T172901.v2.0/HLS.S30.T13TGF.2021155T172901.v2.0.B04.tif',
's3://lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021156T173029.v2.0/HLS.L30.T13TGF.2021156T173029.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021158T173901.v2.0/HLS.S30.T13TGF.2021158T173901.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021163T173909.v2.0/HLS.S30.T13TGF.2021163T173909.v2.0.B04.tif',
's3://lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021165T172422.v2.0/HLS.L30.T13TGF.2021165T172422.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021165T172901.v2.0/HLS.S30.T13TGF.2021165T172901.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021173T173909.v2.0/HLS.S30.T13TGF.2021173T173909.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021185T172901.v2.0/HLS.S30.T13TGF.2021185T172901.v2.0.B04.tif',
's3://lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021188T173037.v2.0/HLS.L30.T13TGF.2021188T173037.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021190T172859.v2.0/HLS.S30.T13TGF.2021190T172859.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021193T173909.v2.0/HLS.S30.T13TGF.2021193T173909.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021198T173911.v2.0/HLS.S30.T13TGF.2021198T173911.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021200T172859.v2.0/HLS.S30.T13TGF.2021200T172859.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021203T173909.v2.0/HLS.S30.T13TGF.2021203T173909.v2.0.B04.tif',
's3://lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021204T173042.v2.0/HLS.L30.T13TGF.2021204T173042.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021208T173911.v2.0/HLS.S30.T13TGF.2021208T173911.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021210T172859.v2.0/HLS.S30.T13TGF.2021210T172859.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021215T172901.v2.0/HLS.S30.T13TGF.2021215T172901.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021218T173911.v2.0/HLS.S30.T13TGF.2021218T173911.v2.0.B04.tif',
's3://lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021220T173049.v2.0/HLS.L30.T13TGF.2021220T173049.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021220T172859.v2.0/HLS.S30.T13TGF.2021220T172859.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021223T173909.v2.0/HLS.S30.T13TGF.2021223T173909.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021228T173911.v2.0/HLS.S30.T13TGF.2021228T173911.v2.0.B04.tif',
's3://lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021229T172441.v2.0/HLS.L30.T13TGF.2021229T172441.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021230T172859.v2.0/HLS.S30.T13TGF.2021230T172859.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021235T172901.v2.0/HLS.S30.T13TGF.2021235T172901.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021243T173859.v2.0/HLS.S30.T13TGF.2021243T173859.v2.0.B04.tif']
s3_link = s3_links[0]
s3_link
's3://lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.B04.tif'
Read in a single HLS file
We’ll access the HLS S3 object using the rioxarray Python package. The package is an extension of xarray and rasterio, allowing users to read in and interact with geospatial data using xarray data structures. We will also leverage the tight integration between xarray and dask to lazily read in data via the chunks parameter. This allows us to connect to the HLS S3 object and read only metadata, not loading the data into memory until we request it via the load() function.
hls_da = rioxarray.open_rasterio(s3_link, chunks=True)
hls_da
<xarray.DataArray (band: 1, y: 3660, x: 3660)>
[13395600 values with dtype=int16]
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 7e+05 7e+05 7e+05 ... 8.097e+05 8.097e+05 8.097e+05
  * y            (y) float64 4.6e+06 4.6e+06 4.6e+06 ... 4.49e+06 4.49e+06
    spatial_ref  int64 0
Attributes:
    _FillValue:    -9999.0
    scale_factor:  0.0001
    add_offset:    0.0
    long_name:     Red
When GeoTIFFs/Cloud Optimized GeoTIFFs are read in, a band coordinate variable is automatically created (see the printout above). In this exercise we will not use that coordinate variable, so we will remove it using the squeeze() function to avoid confusion.
hls_da = hls_da.squeeze('band', drop=True)
hls_da
<xarray.DataArray (y: 3660, x: 3660)>
[13395600 values with dtype=int16]
Coordinates:
  * x            (x) float64 7e+05 7e+05 7e+05 ... 8.097e+05 8.097e+05 8.097e+05
  * y            (y) float64 4.6e+06 4.6e+06 4.6e+06 ... 4.49e+06 4.49e+06
    spatial_ref  int64 0
Attributes:
    _FillValue:    -9999.0
    scale_factor:  0.0001
    add_offset:    0.0
    long_name:     Red
Plot the HLS S3 object
hls_da.hvplot.image(x='x', y='y', cmap='fire', rasterize=True, width=800, height=600, colorbar=True)   # colormaps -> https://holoviews.org/user_guide/Colormaps.html
We can print out the data values as a numpy array by appending .values
hls_da.values
array([[-9999, -9999, -9999, ..., 1527, 1440, 1412],
[-9999, -9999, -9999, ..., 1493, 1476, 1407],
[-9999, -9999, -9999, ..., 1466, 1438, 1359],
...,
[-9999, -9999, -9999, ..., 1213, 1295, 1159],
[-9999, -9999, -9999, ..., 1042, 1232, 1185],
[-9999, -9999, -9999, ..., 954, 1127, 1133]], dtype=int16)
Up to this point, we have not saved anything but metadata into memory. To save or load the data into memory we can call the .load()
function.
hls_da_data = hls_da.load()
hls_da_data
<xarray.DataArray (y: 3660, x: 3660)>
array([[-9999, -9999, -9999, ...,  1527,  1440,  1412],
       [-9999, -9999, -9999, ...,  1493,  1476,  1407],
       [-9999, -9999, -9999, ...,  1466,  1438,  1359],
       ...,
       [-9999, -9999, -9999, ...,  1213,  1295,  1159],
       [-9999, -9999, -9999, ...,  1042,  1232,  1185],
       [-9999, -9999, -9999, ...,   954,  1127,  1133]], dtype=int16)
Coordinates:
  * x            (x) float64 7e+05 7e+05 7e+05 ... 8.097e+05 8.097e+05 8.097e+05
  * y            (y) float64 4.6e+06 4.6e+06 4.6e+06 ... 4.49e+06 4.49e+06
    spatial_ref  int64 0
Attributes:
    _FillValue:    -9999.0
    scale_factor:  0.0001
    add_offset:    0.0
    long_name:     Red
del hls_da_data
Read in HLS as a time series
Now we’ll read in multiple HLS S3 objects as a time series xarray. Let’s print the links list again to see what we’re working with.
s3_links
['s3://lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021133T172406.v2.0/HLS.L30.T13TGF.2021133T172406.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021133T173859.v2.0/HLS.S30.T13TGF.2021133T173859.v2.0.B04.tif',
's3://lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021140T173021.v2.0/HLS.L30.T13TGF.2021140T173021.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021140T172859.v2.0/HLS.S30.T13TGF.2021140T172859.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021145T172901.v2.0/HLS.S30.T13TGF.2021145T172901.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021155T172901.v2.0/HLS.S30.T13TGF.2021155T172901.v2.0.B04.tif',
's3://lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021156T173029.v2.0/HLS.L30.T13TGF.2021156T173029.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021158T173901.v2.0/HLS.S30.T13TGF.2021158T173901.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021163T173909.v2.0/HLS.S30.T13TGF.2021163T173909.v2.0.B04.tif',
's3://lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021165T172422.v2.0/HLS.L30.T13TGF.2021165T172422.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021165T172901.v2.0/HLS.S30.T13TGF.2021165T172901.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021173T173909.v2.0/HLS.S30.T13TGF.2021173T173909.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021185T172901.v2.0/HLS.S30.T13TGF.2021185T172901.v2.0.B04.tif',
's3://lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021188T173037.v2.0/HLS.L30.T13TGF.2021188T173037.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021190T172859.v2.0/HLS.S30.T13TGF.2021190T172859.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021193T173909.v2.0/HLS.S30.T13TGF.2021193T173909.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021198T173911.v2.0/HLS.S30.T13TGF.2021198T173911.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021200T172859.v2.0/HLS.S30.T13TGF.2021200T172859.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021203T173909.v2.0/HLS.S30.T13TGF.2021203T173909.v2.0.B04.tif',
's3://lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021204T173042.v2.0/HLS.L30.T13TGF.2021204T173042.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021208T173911.v2.0/HLS.S30.T13TGF.2021208T173911.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021210T172859.v2.0/HLS.S30.T13TGF.2021210T172859.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021215T172901.v2.0/HLS.S30.T13TGF.2021215T172901.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021218T173911.v2.0/HLS.S30.T13TGF.2021218T173911.v2.0.B04.tif',
's3://lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021220T173049.v2.0/HLS.L30.T13TGF.2021220T173049.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021220T172859.v2.0/HLS.S30.T13TGF.2021220T172859.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021223T173909.v2.0/HLS.S30.T13TGF.2021223T173909.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021228T173911.v2.0/HLS.S30.T13TGF.2021228T173911.v2.0.B04.tif',
's3://lp-prod-protected/HLSL30.020/HLS.L30.T13TGF.2021229T172441.v2.0/HLS.L30.T13TGF.2021229T172441.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021230T172859.v2.0/HLS.S30.T13TGF.2021230T172859.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021235T172901.v2.0/HLS.S30.T13TGF.2021235T172901.v2.0.B04.tif',
's3://lp-prod-protected/HLSS30.020/HLS.S30.T13TGF.2021243T173859.v2.0/HLS.S30.T13TGF.2021243T173859.v2.0.B04.tif']
Currently, the utilities and packages used in Python to read GeoTIFF/COG files do not recognize the acquisition dates stored in the files’ internal metadata. To account for the date of each file we must create a time variable and add it as a dimension in our final time series xarray. We’ll create a function that extracts the date from each file link and creates an xarray variable containing a time array of datetime objects.
def time_index_from_filenames(file_links):
    '''
    Helper function to create a pandas DatetimeIndex
    '''
    return [datetime.strptime(f.split('.')[-5], '%Y%jT%H%M%S') for f in file_links]

time = xr.Variable('time', time_index_from_filenames(s3_links))
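To see what the helper extracts, the fifth-from-last dot-separated segment of each link is the acquisition timestamp, encoding year, day of year, and time. For the first file, 2021133T172406 parses as follows:
datetime.strptime('2021133T172406', '%Y%jT%H%M%S')
# datetime.datetime(2021, 5, 13, 17, 24, 6) -> day 133 of 2021 is May 13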
We’ll now specify a chunk size that matches the internal tiling of HLS files. This will help improve performance.
chunks = dict(band=1, x=512, y=512)
Now, we will create our time series.
hls_ts_da = xr.concat([rioxarray.open_rasterio(f, chunks=chunks).squeeze('band', drop=True) for f in s3_links], dim=time)
hls_ts_da
<xarray.DataArray (time: 32, y: 3660, x: 3660)>
dask.array<concatenate, shape=(32, 3660, 3660), dtype=int16, chunksize=(1, 512, 512), chunktype=numpy.ndarray>
Coordinates:
  * x            (x) float64 7e+05 7e+05 7e+05 ... 8.097e+05 8.097e+05 8.097e+05
  * y            (y) float64 4.6e+06 4.6e+06 4.6e+06 ... 4.49e+06 4.49e+06
    spatial_ref  int64 0
  * time         (time) datetime64[ns] 2021-05-13T17:24:06 ... 2021-08-31T17:...
Attributes:
    _FillValue:    -9999.0
    scale_factor:  0.0001
    add_offset:    0.0
    long_name:     Red
Since we used the chunks parameter while reading the data, the hls_ts_da object has not been read into memory yet. To do that we’ll use the load() function.
Now, we’ll see what we have. Use hvplot to plot our time series.
hls_ts_da_data = hls_ts_da.load()
hls_ts_da_data.hvplot.image(x='x', y='y', rasterize=True, width=800, height=600, colorbar=True, cmap='fire').opts(clim=(hls_ts_da_data.values.min(), hls_ts_da_data.values.max()))
# Exit our context
rio_env.__exit__()
Concluding Remarks
The above exercise demonstrated how to perform in-region direct S3 bucket access for HLS cloud data assets. HLS cloud data assets are stored as Cloud Optimized GeoTIFFs, a format that has benefited from data discovery and access advancements within the Python ecosystem. Knowing the data storage format (e.g., COG, netCDF4, or Zarr store) and/or the data access protocol you’re using is critical in determining which Python data access method to use. For COG data, the rioxarray package is often preferred due to its ability to bring the geospatial data format into an xarray object. For netCDF4 files, the standard xarray package in combination with s3fs allows users to perform in-region direct access reads into an xarray object. Finally, if you are using OPeNDAP to connect to data, specialized packages like pydap have been integrated into xarray to streamline access directly into an xarray object.
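As a point of comparison, a minimal sketch of the netCDF4 pattern mentioned above might look like the following; the object path is a placeholder, s3fs is handed the same temporary credentials we generated earlier, and depending on the file you may need to specify an engine such as h5netcdf:
import s3fs
fs = s3fs.S3FileSystem(key=temp_creds_req['accessKeyId'],
                       secret=temp_creds_req['secretAccessKey'],
                       token=temp_creds_req['sessionToken'])
with fs.open('s3://some-nasa-bucket/path/to/granule.nc') as f:   # placeholder path
    ds = xr.open_dataset(f)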
Resources
- Build time series from multiple GeoTIFF files
- Hvplot/Holoview Colormap
- https://git.earthdata.nasa.gov/projects/LPDUR/repos/lpdaac_cloud_data_access/browse
- https://git.earthdata.nasa.gov/projects/LPDUR/repos/hls-tutorial/browse
- Direct S3 Data Access - Rough PODAAC ECCO SSH Example
- Direct access to ECCO data in S3 (from us-west-2)
- Direct S3 Data Access with GDAL Virtual Raster Format (VRT)
- Direct S3 Data Access with rioxarray - Clipping Example