MATLAB Access Single NASA EarthData L2 NetCDF

This tutorial was written by Erin Robinson, Luis Lopez (NASA Openscapes) and Cedric Wannaz, Lisa Kempler (MathWorks). It is also available as a 9-minute video walk-through.

Launch MATLAB in 2i2c JupyterHub

You can use MATLAB in the 2i2c JupyterHub by bringing your own license. If you'd like to access MATLAB from the Amazon Web Services Console instead, see the details at the bottom of this page.

screenshot of the JupyterHub launch page that has Python, R, and MATLAB as selections, and MATLAB is selected.

Select MATLAB from the Server Options and press Start

screenshot of JupyterHub interface launch page, with MATLAB kernel options

Launch MATLAB from the JupyterHub interface by double-clicking

screenshot of MATLAB login page prompting your email address

Enter your MATLAB login information. NOTE: You will also need to be able to access your MATLAB account via MATLAB Online.

Credentials

Direct S3 access is achieved by passing NASA-supplied temporary credentials to AWS so that we can interact with S3 objects in the applicable Earthdata Cloud buckets. For now, each NASA DAAC has its own AWS credentials endpoint; for example, the PO.DAAC endpoint used later in this tutorial is https://archive.podaac.earthdata.nasa.gov/s3credentials.

You will need your Earthdata login to access these endpoints. Create an account here: https://urs.earthdata.nasa.gov/users/new.

Prerequisites

If you’d rather watch these instructions, there is a short demo video here: https://drive.google.com/drive/folders/1en_SHtjtF4pUt4WU0SzyV16SEegjiwrT?usp=sharing

  1. Save the following function in a text file called loadAWSCredentials.m, either in your MATLAB working folder or in any folder that is on the MATLAB search path.
function s3Credentials = loadAWSCredentials(daacCredentialsEndpoint, login, password)
    if nargin < 2 || isempty(login)
        login = getenv('EARTHDATA_LOGIN') ;
    end
    if nargin < 3 || isempty(password)
        password = getenv('EARTHDATA_PASSWORD') ;
    end

    % Get S3 credentials from EarthData
    opts = weboptions('ContentType', 'json', 'HeaderFields', ...
        {'Authorization', ['Basic ',matlab.net.base64encode([login,':',password])]});
    s3Credentials = webread(daacCredentialsEndpoint, opts) ;

    % Set relevant environment variables with AWS credentials/region
    setenv('AWS_ACCESS_KEY_ID', s3Credentials.accessKeyId) ;
    setenv('AWS_SECRET_ACCESS_KEY', s3Credentials.secretAccessKey) ;
    setenv('AWS_SESSION_TOKEN',  s3Credentials.sessionToken) ;
    setenv('AWS_DEFAULT_REGION', 'us-west-2') ;
end
  2. Define environment variables that specify your EarthData credentials by executing the following (updated with your credentials):
setenv("EARTHDATA_LOGIN","your_earthdata_login");
setenv("EARTHDATA_PASSWORD","your_earthdata_password");

This can be done manually in the Command Window, in your MATLAB startup script, or in any MATLAB function/script that is executed before calling loadAWSCredentials. We recommend not doing it in your project's main script, so that you can share the script without divulging your credentials; one possible setup is sketched below.
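
For example (an assumption about your setup, not a requirement), you could keep the two setenv calls in a user startup.m in your userpath folder, which MATLAB runs automatically at the start of each session:

% startup.m -- placed in userpath so MATLAB runs it automatically at startup.
% Keep this file out of version control so your credentials are never shared.
setenv("EARTHDATA_LOGIN","your_earthdata_login");
setenv("EARTHDATA_PASSWORD","your_earthdata_password");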

  3. Search NASA EarthData and find the S3 link you want to access.

Accessing data

MATLAB's HDF5 interfaces can access netCDF-4 files stored in S3 or Azure, while its netCDF interfaces cannot. The example below shows how to access a netCDF-4 file using the HDF5 interfaces. See also this video: MATLAB and HDF5: Compression, Cloud, and Community – Ellen Johnson, MathWorks.

MATLAB provides two interfaces for accessing HDF5 datasets:

  • High-level access functions make it easy to read and view data in an HDF5 file or write a variable from the MATLAB® workspace into an HDF5 file.
  • Low-level functions in the HDF5 library packages provide direct access to the more than 300 functions in the HDF5 C library from the HDF Group.

The low-level functions give you access to advanced HDF5 functionality and bypass the I/O overhead of the high-level interface, but they have a substantially steeper learning curve unless you are already familiar with the HDF5 C API.

Example

  1. Find the relevant DAAC/endpoint for the dataset that you want to access. For the following dataset:
s3://podaac-ops-cumulus-protected/MODIS_A-JPL-L2P-v2019.0/20100619062008-JPL-L2P_GHRSST-SSTskin-MODIS_A-N-v02.0-fv01.0.nc

The relevant DAAC is PODAAC and the relevant endpoint is:

https://archive.podaac.earthdata.nasa.gov/s3credentials
  2. Set AWS credentials using the loadAWSCredentials function, passing the relevant endpoint:
daacCredentialsEndpoint = "https://archive.podaac.earthdata.nasa.gov/s3credentials";
loadAWSCredentials(daacCredentialsEndpoint);

These credentials are temporary; they expire every 30 minutes or so and must be refreshed periodically.
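
If your analysis runs longer than that, one approach (a minimal sketch; refreshAWSCredentialsIfNeeded is a hypothetical helper, not part of the tutorial code) is to re-request credentials before long-running reads:

function refreshAWSCredentialsIfNeeded(daacCredentialsEndpoint)
    % Hypothetical helper: refresh the temporary credentials if the last
    % refresh was more than ~25 minutes ago.
    persistent lastRefresh
    if isempty(lastRefresh) || minutes(datetime("now") - lastRefresh) > 25
        loadAWSCredentials(daacCredentialsEndpoint);  % defined earlier in this tutorial
        lastRefresh = datetime("now");
    end
end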

  3. Access the data. Define the path of the relevant file and dataset:
FILE_NAME = "s3://podaac-ops-cumulus-protected/MODIS_A-JPL-L2P-v2019.0/20100619062008-JPL-L2P_GHRSST-SSTskin-MODIS_A-N-v02.0-fv01.0.nc";
DATASET_NAME = "sea_surface_temperature_4um";
DATASET_PATH = "/" + DATASET_NAME ;

First, check that you can access the dataset:

h5disp(FILE_NAME);

If credentials and paths are correct, h5disp will display the structure of the data file.
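
If you prefer to explore the file programmatically rather than printing the whole structure, h5info returns the same metadata as a struct. A minimal sketch (field navigation may differ slightly between products):

info = h5info(FILE_NAME);      % same metadata as h5disp, returned as a struct
disp({info.Datasets.Name});    % names of the datasets at the file's root level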

Using MATLAB high-level HDF5 interface

% Read the dataset's attributes and values using the high-level interface
units = h5readatt(FILE_NAME,DATASET_PATH,"units");
fill_value = h5readatt(FILE_NAME,DATASET_PATH,"_FillValue");
name = h5readatt(FILE_NAME,DATASET_PATH,"long_name");
data = h5read(FILE_NAME,DATASET_PATH);
data(data == fill_value) = NaN;   % replace fill values with NaN

% Read the latitude and longitude coordinate datasets
lat = h5read(FILE_NAME,"/lat");
lon = h5read(FILE_NAME,"/lon");
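
Note that h5read returns the values exactly as stored. Some products pack this variable as integers with CF scale_factor and add_offset attributes; if the file you are reading has them (an assumption, check the h5disp output), a minimal sketch for unpacking to physical units:

% Assumption: the variable carries CF scale_factor/add_offset attributes.
raw = double(h5read(FILE_NAME, DATASET_PATH));
raw(raw == double(fill_value)) = NaN;                    % mask fill values first
scale_factor = h5readatt(FILE_NAME, DATASET_PATH, "scale_factor");
add_offset   = h5readatt(FILE_NAME, DATASET_PATH, "add_offset");
data = raw * double(scale_factor) + double(add_offset);  % unpack to physical units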

Build a map using contour:

contour(lon,lat,data);
title(sprintf("%s [%s]",name,units));

A square map with latitude and longitude axes, titled sea surface temperature in kelvin.

MATLAB example map

Using MATLAB low-level HDF5 interface

% Open file and root group
file_id = H5F.open(FILE_NAME,"H5F_ACC_RDONLY","H5P_DEFAULT");
root_id = H5G.open(file_id,"/") ;

% Open/read sea surface temperature (SST) dataset
data_id = H5D.open(file_id,DATASET_PATH);
data=H5D.read(data_id,"H5T_NATIVE_DOUBLE","H5S_ALL","H5S_ALL","H5P_DEFAULT");

% Open/read/close SST units attribute
attr_id = H5A.open_by_name(root_id,DATASET_NAME,"units");
units = H5A.read(attr_id,"H5ML_DEFAULT");
H5A.close(attr_id);

% Open/read/close SST long_name attribute
attr_id = H5A.open_by_name(root_id,DATASET_NAME,"long_name");
long_name = H5A.read(attr_id,"H5ML_DEFAULT");
H5A.close(attr_id);

% Open/read/close SST _FillValue attribute
attr_id = H5A.open_by_name(root_id,DATASET_NAME,"_FillValue");
fillvalue = H5A.read(attr_id,"H5T_NATIVE_DOUBLE");
H5A.close(attr_id);

% Close SST dataset and root group
H5D.close(data_id);
H5G.close(root_id);

% Open/read/close latitude dataset
lat_id = H5D.open(file_id,"lat");
lat = H5D.read(lat_id,"H5T_NATIVE_DOUBLE","H5S_ALL","H5S_ALL","H5P_DEFAULT");
H5D.close(lat_id);

% Open/read/close longitude dataset
lon_id = H5D.open(file_id,"lon");
lon = H5D.read(lon_id,"H5T_NATIVE_DOUBLE","H5S_ALL","H5S_ALL","H5P_DEFAULT");
H5D.close(lon_id);

% Close file
H5F.close(file_id);

% Replace fill value with NaN
data(data==fillvalue) = NaN;

% Plot contour map
contour(lon,lat,data);
title(sprintf("%s [%s]",long_name{1},units{1}));

Launch with AWS Console

An alternative is to launch MATLAB from the AWS Console. This chapter is for research teams currently working in MATLAB with NASA EarthData who want to take advantage of doing their analysis in the cloud. Our initial focus is on Amazon Web Services (AWS). For general background on the cloud and NASA Earthdata's migration to the cloud, check out earlier chapters of the cookbook [TO DO: Add a link and specific reference].

Prerequisites

  1. MATLAB License - You must have access to a MATLAB license that allows for use in the cloud. To check your license, enter the following at the MATLAB command line:

    ver -support
  2. AWS Account - For the current experiments, we will access the AWS console, so you need an AWS account. It is free to set up, but will require a credit card linked to the account. https://aws.amazon.com/

  3. AWS Region: We are working in AWS US-West-2 because NASA Earthdata is hosted in US-West-2. If there is an option to pick an AWS region, pick US-West-2.

  4. GitHub Account - If you want to use GitHub as part of your workflow and don’t already have an account, create a GitHub account: https://github.com/

  5. NASA Earthdata login - Create an account here: https://urs.earthdata.nasa.gov/users/new

  6. Microsoft Remote Desktop - When we spin up the AWS virtual machine with MATLAB, we will need a remote desktop option. These how-tos were done on a Mac, and we tested the free Microsoft Remote Desktop. Download for Mac

Prerequisite: AWS EC2 Key Pair

This isn’t included in the numbered prereq list above because it is a bit more involved than creating an account or knowing what region we work in. You need to create an SSH Key Pair in the region you want to work in. This is a one-time step that you won’t need to do each time you launch the stack.

  1. Log in to AWS

  2. In the search box, enter 'Create AWS Key Pair.' Under Features, choose 'Key Pairs' (AWS help)
    AWS search for key pair

  3. Check that the top bar says 'Oregon' (aka US-West-2); if not, click the down arrow and choose US-West-2. Then click the orange 'Create key pair' button.


    Troubleshooting tip: If you create the key outside of the region you want to work in, it will not show up when you launch the stack below.

  4. On the form, give your key a name and choose the RSA and PEM options (these should be the defaults).

OK - now we are ready to start!

Creating the AWS Stack with MATLAB

Note: The first time I launched the AWS Stack it seemed to take forever. I thought, "how could I possibly need to do this every time I want to use MATLAB in the cloud?" It does speed up; eventually it gets a bit faster. I have also learned to plan a bit better - if I know I want to do some work, I get the stack launched, have coffee, and when I come back it's ready to roll.

  1. From this MATLAB GitHub page click the release for 2022a under deployment steps.

  2. This brings up the MATLAB on Amazon Web Services (Linux VM) page. Choose and click the 'Launch Stack' link for US-West-2.

  3. This opens the ‘Quick create stack’ form based on the MATLAB template. That means that when you launch this stack it will come with MATLAB on the desktop. Fill out the form to create the AWS stack:

    1. Give the stack a name like ‘matlab-test’

    2. Keep the pre-filled options the same for now.

    3. Remote Access:

      1. “Allow Connections From:” You will need to know your IP address. You can google, “what’s my IP address?”

      2. Enter your IP address followed by a /32 like this -> [my.IP.address/32]

      3. In the SSH Key Pair - the key pair you created above should show up in the drop down. If it doesn’t show up, see the troubleshooting tip.

      4. Pick a remote password. This is not your AWS password or your GitHub password; it is the password you will use to log in with Microsoft Remote Desktop (username: ubuntu)

    4. Network configuration

      1. There is one VPC option - choose that

      2. For the subnet - I pick the first one and it works, so pick the first option.

    5. Autoshutdown hasn’t worked for me so far, so for now I leave this set as never and delete the stack when I am finished.

    6. Check the box that “I acknowledge that AWS CloudFormation might create IAM resources.”

    7. Click ‘Create stack’

    8. Wait…. [~ 10 minutes]

  4. You can check the status by clicking the refresh button in the upper-right corner

Launch the AWS Stack with Microsoft Remote Desktop

  1. Once the stack is created, it will say 'CREATE_COMPLETE' on the left side.

  2. Click the outputs tab and copy the value text. It will start with ‘ec2-…’

  3. Open Microsoft Remote Desktop

    1. Click the + to add a PC

    2. Paste the value text as the PC Name

    3. Click on the grey box of your new computer in the remote desktop window

    4. A login will pop up

      1. Username is ubuntu

      2. Password is the remote password you set in the 'Remote Access' step of 'Creating the AWS Stack with MATLAB' above

    5. A certificate message will pop up - say ok

    6. The desktop will launch

  4. Wait … [~2 mins]

Open MATLAB on Remote Desktop

  1. Click the MATLAB icon on the remote desktop
  2. Wait … [~4 mins]
  3. Login with your MATLAB credentials
  4. You are in!
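
Once MATLAB is running in us-west-2, the direct S3 access steps from the first half of this tutorial apply unchanged. For example (assuming you have saved loadAWSCredentials.m and set your Earthdata environment variables as described above):

daacCredentialsEndpoint = "https://archive.podaac.earthdata.nasa.gov/s3credentials";
loadAWSCredentials(daacCredentialsEndpoint);
h5disp("s3://podaac-ops-cumulus-protected/MODIS_A-JPL-L2P-v2019.0/20100619062008-JPL-L2P_GHRSST-SSTskin-MODIS_A-N-v02.0-fv01.0.nc");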

Shutdown your AWS Stack

After each session you need to shut down the AWS stack. If you forget this step and leave it running, it is like leaving a computer on for the whole month. The large instance costs about $0.50/day, so it adds up to a few dollars a month.

  1. Go back to AWS

  2. Search for stack

  3. Click on the name of your stack

  4. Click ‘Delete’

  5. Confirm the delete