matlab

NASA EarthData and the AWS Cloud for Matlab Users

This chapter is for research teams currently working in Matlab with NASA EarthData and wanting to take advantage of doing analysis in the Cloud. Our initial focus is on Amazon Web Services (AWS). For general, background on the Cloud and NASA Earthdata’s migration to the Cloud, checkout earlier chapters of the cookbook [TO DO: Add a link and specific reference].

Prerequisites

  1. Matlab License - You must have access to a Matlab license that allows for access to the cloud. To check your license, in the matlab command line enter:

    ver -support
  2. AWS Account - For the current experiments, we will access the AWS console, so you need an AWS account. It is free to set up, but will require a credit card linked to the account. https://aws.amazon.com/

  3. AWS Region: We are working in AWS US-West-2 because NASA Earthdata is hosted in US-West-2. If there is an option to pick an AWS region, pick US-West-2.

  4. Github Account - If you want to use Github as part of your workflow and don’t already have an account, create a Github account: https://github.com/

  5. NASA Earthdata login - Create an account here: https://urs.earthdata.nasa.gov/users/new

  6. Microsoft Remote Desktop - When we spin up the AWS virtual machine with Matlab, we will need to have a remote desktop option. These how-to’s were done on a mac and we tested the free Microsoft Remote Desktop. Download for macs

Prerequisite: AWS EC2 Key Pair

This isn’t included in the numbered prereq list above because it is a bit more involved than creating an account or knowing what region we work in. You need to create an SSH Key Pair in the region you want to work in. This is a one-time step that you won’t need to do each time you launch the stack.

  1. Log in to AWS

  2. In the search box enter, ‘Create AWS Key Pair.’ Under features choose ‘Key Pairs’ (AWS help)
    AWS search for key pair

  3. Check that the top bar says, ‘Oregon’ (AKA - US-West-2) and if not, click that down arrow to choose US-West-2. Then click the orange ‘Create key pair’ button.


    Troubleshooting tip: If you create the key outside of the region you want to work in, it will not show up when you launch the stack below.

  4. On the form - give your key a name, choose the RSA and PEM options (these should be the defaults.)

Ok - now we are ready to start!

Creating the AWS Stack with Matlab

Note: The first time I launched the AWS Stack it seemed to take for-ever. I thought “how could I possibly need to do this every time I want to use Matlab in the cloud?” It does speed up, eventually it get’s a bit faster. I also have learned to plan a bit better - if I know I want to do some work, I get the launched, have coffee, and when I come back it’s ready to roll.

  1. From this Matlab Github page click the release for 2022a under deployment steps.

  2. This brings up a list of Matlab on Amazon Web Services (Linux VM). Choose & click the ‘launch stack’ link for US-West-2.

  3. This opens the ‘Quick create stack’ form based on the Matlab template. That means that when you launch this stack it will come with Matlab on the desktop. Fill out the form to create the AWS stack:

    1. Give the stack a name like ‘matlab-test’

    2. Keep the pre-filled options the same for now.

    3. Remote Access:

      1. “Allow Connections From:” You will need to know your IP address. You can google, “what’s my IP address?”

      2. Enter your IP address followed by a /32 like this -> [my.IP.address/32]

      3. In the SSH Key Pair - the key pair you created above should show up in the drop down. If it doesn’t show up, see the troubleshooting tip.

      4. Pick a remote password. This is not your AWS password or your Github password, this is the password that you will use to login with the microsoft remote desktop (username: ubuntu)

    4. Network configuration

      1. There is one VPC option - choose that

      2. For subnet - I pick the first one and it works. So pick the first option.

    5. Autoshutdown hasn’t worked for me so far, so for now I leave this set as never and delete the stack when I am finished.

    6. Check the box that “I acknowledge that AWS CloudFormation might create IAM resources.”

    7. Click ‘Create stack’

    8. Wait…. [~ 10 minutes]

  4. You can check the status by clicking the refresh button on the right corner

Launch the AWS Stack with Microsoft Remote Desktop

  1. Once the stack is created it will say ‘Create_complete’ on the left side.

  2. Click the outputs tab and copy the value text. It will start with ‘ec2-…’

  3. Open Microsoft Remote Desktop

    1. Click the + to add a PC

    2. Paste the value text as the PC Name

    3. Click on the grey box of your new computer in the remote desktop window

    4. A login will pop up

      1. Username is ubuntu

      2. Password is the password you set in 3.3.3 above in the section on ‘Creating AWS Stack’

    5. A certificate message will pop up - say ok

    6. The desktop will launch

  4. Wait … [~2 mins]

Open Matlab on Remote Desktop

  1. Click the Matlab icon on the remote desktop
  2. Wait … [~4 mins]
  3. Login with your Matlab credentials
  4. You are in!

Access Single NASA EarthData L2 Netcdf (EXPERIMENTAL)

The general approach to access NASA Earthdata follows the Python tutorials. Direct S3 access is achieved by passing NASA supplied temporary credentials to AWS so we can interact with S3 objects from applicable Earthdata Cloud buckets. For now, each NASA DAAC has different AWS credentials endpoints. Below are some of the credential endpoints to various DAACs:

You will need your EarthData login to access these links.

  1. Search NASA Earthdata and find the S3 link you want to access [LINK TO TUTORIAL]

  2. Set AWS Environment Variables - Go to the link above for the Access key, secret access key and token.

  3. Copy and past those variables into the code below

    setenv('AWS_ACCESS_KEY_ID', 'REPLACE WITH PODAAC ACCESS KEY');

    setenv('AWS_SECRET_ACCESS_KEY', 'REPLACE WITH PODAAC SECRET KEY');

    setenv('AWS_SESSION_TOKEN', 'REPLACE WITH PODAAC SESSION TOKEN');

    setenv('AWS_DEFAULT_REGION', 'us-west-2');

    NOTE: These expire ever 30 mins or so and this step needs to be done. We are searching for a better method.

  4. In Matlab on AWS only the HDF commands work

    FILE_NAME = 's3://podaac-ops-cumulus-protected/MODIS_A-JPL-L2P-v2019.0/20100619062008-JPL-L2P_GHRSST-SSTskin-MODIS_A-N-v02.0-fv01.0.nc';

    file_id = H5F.open(FILE_NAME, 'H5F_ACC_RDONLY', 'H5P_DEFAULT');

  5. This code works to read the file - Copy each section into the Matlab command
    NOTE: We need a way to find all variables in a given netcdf file. I used the python code to find variables and then brought it here.

    DATAFIELD_NAME='sea_surface_temperature_4um';

    data_id = H5D.open(file_id, DATAFIELD_NAME);

    data=H5D.read(data_id,'H5T_NATIVE_DOUBLE', 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT');

    % Read units.

    ATTRIBUTE = 'units';

    attr_id = H5A.open_name(data_id, ATTRIBUTE);

    units = H5A.read(attr_id, 'H5ML_DEFAULT');

    H5A.close(attr_id);

    % Read long_name.

    ATTRIBUTE = 'long_name';

    attr_id = H5A.open_name (data_id, ATTRIBUTE);

    long_name=H5A.read (attr_id, 'H5ML_DEFAULT');

    H5A.close(attr_id);

    % Read the fill value.

    ATTRIBUTE = '_FillValue';

    attr_id = H5A.open_name(data_id, ATTRIBUTE);

    fillvalue=H5A.read(attr_id, 'H5T_NATIVE_DOUBLE');

    H5A.close(attr_id);

    % Read latitude data.

    LAT_NAME='lat';

    lat_id=H5D.open(file_id, LAT_NAME);

    lat=H5D.read(lat_id, 'H5T_NATIVE_DOUBLE', 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT');

    % Read longitude data.

    LON_NAME='lon';

    lon_id=H5D.open(file_id, LON_NAME);

    lon=H5D.read(lon_id, 'H5T_NATIVE_DOUBLE', 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT');

    % Close and release resources.

    H5D.close(data_id);

    H5D.close(lat_id);

    H5D.close(lon_id);

    H5F.close(file_id);

    % Replace the fill value with NaN.

    data(data==fillvalue) = NaN;

  6. Create an image

    contour(data);

Shutdown your AWS Stack

After each session you need to turn off the AWS Stack. If you forget this step and leave it running it is like keeping a computer on for the month. For the large instance it costs $0.5/day so it’s a few dollars a month.

  1. Go back to AWS

  2. Search for stack

  3. Click on the name of your stack

  4. Click ‘Delete’

  5. Confirm the delete

To do:

  • Add Github & Matlab project instructions

  • Alternative experiments to this stack - Octave in the 2i2c Jupyter hub or Matlab in the 2i2c Jupyter hub.