ICESat-2’s Nested Variables

This notebook (download) illustrates the use of icepyx for managing lists of available and wanted ICESat-2 data variables. The two use cases for variable management within your workflow are:

  1. During the data access process, whether that’s via order and download (e.g. via NSIDC DAAC) or remote (e.g. via the cloud).

  2. When reading in data to a Python object (whether from local files or the cloud).

A given ICESat-2 product may have over 200 variable + path combinations. icepyx includes a custom Variables module that is “aware” of the ATLAS sensor and how the ICESat-2 data products are stored. The module can be accessed independently, but is optimally used as a component of a Query object (Case 1) or Read object (Case 2).

This notebook illustrates in detail how the Variables module behaves using a Query data access example. However, module usage is analogous through an icepyx ICESat-2 Read object. More detailed example workflows specifically for the query and read tools within icepyx are available as separate Jupyter Notebooks.

Questions? Be sure to check out the FAQs throughout this notebook, indicated as italic headings.

Why do ICESat-2 products need a custom variable manager?

It can be confusing and cumbersome to comb through the 200+ variable and path combinations contained in ICESat-2 data products. The icepyx Variables module makes it easier for users to quickly find and extract the specific variables they would like to work with across multiple beams, keywords, and variables and provides reader-friendly formatting to browse variables. A future development goal for icepyx includes developing an interactive widget to further improve the user experience. For data read-in, additional tools are available to target specific beam characteristics (e.g. strong versus weak beams).

Some technical details about the Variables module

For those eager to push the limits or who want to know more implementation details…

The only required input to the Variables module is vartype. vartype has two acceptible string values, ‘order’ and ‘file’. If you use the module as shown in icepyx examples (namely through a Read or Query object), then this flag will be passed automatically. It simply tells the software how to generate the list of possible variable values - either by pinging NSIDC for a list of available variables (query) or from the user-supplied file (read).

Import packages, including icepyx

import icepyx as ipx
from pprint import pprint

Interacting with ICESat-2 Data Variables

Each variables instance (which is actually an associated Variables class object) contains two variable list attributes. One is the list of possible or available variables (avail attribute) and is unmutable, or unchangeable, as it is based on the input product specifications or files. The other is the list of variables you’d like to actually have (in your downloaded file or data object) from all the potential options (wanted attribute) and is updateable.

Thus, your avail list depends on your data source and whether you are accessing or reading data, while your wanted list may change for each analysis you are working on or depending on what variables you want to see.

The variables parameter has methods to:

  • get a list of all available variables, either available from the NSIDC or the file (avail() method).

  • append new variables to the wanted list (append() method).

  • remove variables from the wanted list (remove() method).

We’ll showcase the use of all of these methods and attributes below using an icepyx.Query object. Usage is identical in the case of an icepyx.Read object. More detailed example workflows specifically for the query and read tools within icepyx are available as separate Jupyter Notebooks.

Create a query object and log in to Earthdata

For this example, we’ll be working with a land ice product (ATL06) for an area along West Greenland (Disko Bay). A second option for an atmospheric product (ATL09) that uses profiles instead of the ground track (gt) categorization is also provided.

region_a = ipx.Query('ATL06',[-55, 68, -48, 71],['2019-02-22','2019-02-28'], \
                           start_time='00:00:00', end_time='23:59:59')
# Uncomment and run the code in this cell to use the second variable subsetting suite of examples,
# with the beam specifier containing "profile" instead of "gt#l"

# region_a = ipx.Query('ATL09',[-55, 68, -48, 71],['2019-02-22','2019-02-28'], \
#              start_time='00:00:00', end_time='23:59:59')
region_a.earthdata_login('icepyx_devteam','icepyx.dev@gmail.com')

ICESat-2 data variables

ICESat-2 data is natively stored in a nested file format called hdf5. Much like a directory-file system on a computer, each variable (file) has a unique path through the heirarchy (directories) within the file. Thus, some variables (e.g. 'latitude', 'longitude') have multiple paths (one for each of the six beams in most products).

Determine what variables are available

region_a.order_vars.avail will return a list of all valid path+variable strings.

region_a.order_vars.avail()

To increase readability, you can use built in functions to show the 200+ variable + path combinations as a dictionary where the keys are variable names and the values are the paths to that variable. region_a.order_vars.parse_var_list(region_a.order_vars.avail()) will return a dictionary of variable:paths key:value pairs.

region_a.order_vars.parse_var_list(region_a.order_vars.avail())

By passing the boolean options=True to the avail method, you can obtain lists of unique possible variable inputs (var_list inputs) and path subdirectory inputs (keyword_list and beam_list inputs) for your data product. These can be helpful for building your wanted variable list.

region_a.order_vars.avail(options=True)

Building your wanted variable list

Now that you know which variables and path components are available, you need to build a list of the ones you’d like included. There are several options for generating your initial list as well as modifying it, giving the user complete control.

The options for building your initial list are:

  1. Use a default list for the product (not yet fully implemented across all products. Have a default variable list for your field/product? Submit a pull request or post it as an issue on GitHub!)

  2. Provide a list of variable names

  3. Provide a list of profiles/beams or other path keywords, where “keywords” are simply the unique subdirectory names contained in the full variable paths of the product. A full list of available keywords for the product is displayed in the error message upon entering keyword_list=[''] into the append function (see below for an example) or by running region_a.order_vars.avail(options=True), as above.

Note: all products have a short list of “mandatory” variables/paths (containing spacecraft orientation and time information needed to convert the data’s delta_time to a readable datetime) that are automatically added to any built list. If you have any recommendations for other variables that should always be included (e.g. uncertainty information), please let us know!

Examples of using each method to build and modify your wanted variable list are below.

region_a.order_vars.wanted
region_a.order_vars.append(defaults=True)
pprint(region_a.order_vars.wanted)

The keywords available for this product are shown in the error message upon entering a blank keyword_list, as seen in the next cell.

region_a.order_vars.append(keyword_list=[''])

Modifying your wanted variable list

Generating and modifying your variable request list, which is stored in region_a.order_vars.wanted, is controlled by the append and remove functions that operate on region_a.order_vars.wanted. The input options to append are as follows (the full documentation for this function can be found by executing help(region_a.order_vars.append)).

  • defaults (default False) - include the default variable list for your product (not yet fully implemented for all products; please submit your default variable list for inclusion!)

  • var_list (default None) - list of variables (entered as strings)

  • beam_list (default None) - list of beams/profiles (entered as strings)

  • keyword_list (default None) - list of keywords (entered as strings); use keyword_list=[''] to obtain a list of available keywords

Similarly, the options for remove are:

  • all (default False) - reset region_a.order_vars.wanted to None

  • var_list (as above)

  • beam_list (as above)

  • keyword_list (as above)

region_a.order_vars.remove(all=True)
pprint(region_a.order_vars.wanted)

Examples (Overview)

Below are a series of examples to show how you can use append and remove to modify your wanted variable list. For clarity, region_a.order_vars.wanted is cleared at the start of many examples. However, multiple append and remove commands can be called in succession to build your wanted variable list (see Examples 3+).

There are two example tracks. The first is for land ice (ATL06) data that is separated into beams. The second is for atmospheric data (ATL09) that is separated into profiles. Both example tracks showcase the same functionality and are provided for users of both data types.


Example Track 1 (Land Ice - run with ATL06 dataset)

Example 1.1: choose variables

Add all latitude and longitude variables across all six beam groups. Note that the additional required variables for time and spacecraft orientation are included by default.

region_a.order_vars.append(var_list=['latitude','longitude'])
pprint(region_a.order_vars.wanted)

Example 1.2: specify beams and variable

Add latitude for only gt1l and gt2l

region_a.order_vars.remove(all=True)
pprint(region_a.order_vars.wanted)
var_dict = region_a.order_vars.append(beam_list=['gt1l', 'gt2l'], var_list=['latitude'])
pprint(region_a.order_vars.wanted)

Example 1.3: add/remove selected beams+variables

Add latitude for gt3l and remove it for gt2l

region_a.order_vars.append(beam_list=['gt3l'],var_list=['latitude'])
region_a.order_vars.remove(beam_list=['gt2l'], var_list=['latitude'])
pprint(region_a.order_vars.wanted)

Example 1.4: keyword_list

Add latitude and longitude for all beams and with keyword land_ice_segments

region_a.order_vars.append(var_list=['latitude', 'longitude'],keyword_list=['land_ice_segments'])
pprint(region_a.order_vars.wanted)

Example 1.5: target a specific variable + path

Remove gt1r/land_ice_segments/longitude (but keep gt1r/land_ice_segments/latitude)

region_a.order_vars.remove(beam_list=['gt1r'], var_list=['longitude'], keyword_list=['land_ice_segments'])
pprint(region_a.order_vars.wanted)

Example 1.6: add variables not specific to beams/profiles

Add rgt under orbit_info.

region_a.order_vars.append(keyword_list=['orbit_info'],var_list=['rgt'])
pprint(region_a.order_vars.wanted)

Example 1.7: add all variables+paths of a group

In addition to adding specific variables and paths, we can filter all variables with a specific keyword as well. Here, we add all variables under orbit_info. Note that paths already in region_a.order_vars.wanted, such as 'orbit_info/rgt', are not duplicated.

region_a.order_vars.append(keyword_list=['orbit_info'])
pprint(region_a.order_vars.wanted)

Example 1.8: add all possible values for variables+paths

Append all longitude paths and all variables/paths with keyword land_ice_segments.

Similarly to what is shown in Example 4, if you submit only one append call as region_a.order_vars.append(var_list=['longitude'], keyword_list=['land_ice_segments']) rather than the two append calls shown below, you will only add the variable longitude and only paths containing land_ice_segments, not ALL paths for longitude and ANY variables with land_ice_segments in their path.

region_a.order_vars.append(var_list=['longitude'])
region_a.order_vars.append(keyword_list=['land_ice_segments'])
pprint(region_a.order_vars.wanted)

Example 1.9: remove all variables+paths associated with a beam

Remove all paths for gt1l and gt3r

region_a.order_vars.remove(beam_list=['gt1l','gt3r'])
pprint(region_a.order_vars.wanted)

Example 1.10: generate a default list for the rest of the tutorial

Generate a reasonable variable list prior to download

region_a.order_vars.remove(all=True)
region_a.order_vars.append(defaults=True)
pprint(region_a.order_vars.wanted)

Example Track 2 (Atmosphere - run with ATL09 dataset commented out at the start of the notebook)

Example 2.1: choose variables

Add all latitude and longitude variables

region_a.order_vars.append(var_list=['latitude','longitude'])
pprint(region_a.order_vars.wanted)

Example 2.2: specify beams/profiles and variable

Add latitude for only profile_1 and profile_2

region_a.order_vars.remove(all=True)
pprint(region_a.order_vars.wanted)
var_dict = region_a.order_vars.append(beam_list=['profile_1','profile_2'], var_list=['latitude'])
pprint(region_a.order_vars.wanted)

Example 2.3: add/remove selected beams+variables

Add latitude for profile_3 and remove it for profile_2

region_a.order_vars.append(beam_list=['profile_3'],var_list=['latitude'])
region_a.order_vars.remove(beam_list=['profile_2'], var_list=['latitude'])
pprint(region_a.order_vars.wanted)

Example 2.4: keyword_list

Add latitude for all profiles and with keyword low_rate

region_a.order_vars.append(var_list=['latitude'],keyword_list=['low_rate'])
pprint(region_a.order_vars.wanted)

Example 2.5: target a specific variable + path

Remove 'profile_1/high_rate/latitude' (but keep 'profile_3/high_rate/latitude')

region_a.order_vars.remove(beam_list=['profile_1'], var_list=['latitude'], keyword_list=['high_rate'])
pprint(region_a.order_vars.wanted)

Example 2.6: add variables not specific to beams/profiles

Add rgt under orbit_info.

region_a.order_vars.append(keyword_list=['orbit_info'],var_list=['rgt'])
pprint(region_a.order_vars.wanted)

Example 2.7: add all variables+paths of a group

In addition to adding specific variables and paths, we can filter all variables with a specific keyword as well. Here, we add all variables under orbit_info. Note that paths already in region_a.order_vars.wanted, such as 'orbit_info/rgt', are not duplicated.

region_a.order_vars.append(keyword_list=['orbit_info'])
pprint(region_a.order_vars.wanted)

Example 2.8: add all possible values for variables+paths

Append all longitude paths and all variables/paths with keyword high_rate. Simlarly to what is shown in Example 4, if you submit only one append call as region_a.order_vars.append(var_list=['longitude'], keyword_list=['high_rate']) rather than the two append calls shown below, you will only add the variable longitude and only paths containing high_rate, not ALL paths for longitude and ANY variables with high_rate in their path.

region_a.order_vars.append(var_list=['longitude'])
region_a.order_vars.append(keyword_list=['high_rate'])
pprint(region_a.order_vars.wanted)

Example 2.9: remove all variables+paths associated with a profile

Remove all paths for profile_1 and profile_3

region_a.order_vars.remove(beam_list=['profile_1','profile_3'])
pprint(region_a.order_vars.wanted)

Example 2.10: generate a default list for the rest of the tutorial

Generate a reasonable variable list prior to download

region_a.order_vars.remove(all=True)
region_a.order_vars.append(defaults=True)
pprint(region_a.order_vars.wanted)

Using your wanted variable list

Now that you have your wanted variables list, you need to use it within your icepyx object (Query or Read) will automatically use it.

With a Query object

In order to have your wanted variable list included with your order, you must pass it as a keyword argument to the subsetparams() attribute or the order_granules() or download_granules() (which calls order_granules under the hood if you have not already placed your order) functions.

region_a.subsetparams(Coverage=region_a.order_vars.wanted)

Or, you can put the Coverage parameter directly into order_granules: region_a.order_granules(Coverage=region_a.order_vars.wanted)

However, then you cannot view your subset parameters (region_a.subsetparams) prior to submitting your order.

region_a.order_granules()# <-- you do not need to include the 'Coverage' kwarg to
                             # order if you have already included it in a call to subsetparams
region_a.download_granules('/home/jovyan/icepyx/dev-notebooks/vardata') # <-- you do not need to include the 'Coverage' kwarg to
                             # download if you have already submitted it with your order

With a Read object

Calling the load() method on your Read object will automatically look for your wanted variable list and use it. Please see the read-in example Jupyter Notebook for a complete example of this usage.

Credits

  • based on the subsetting notebook by: Jessica Scheick and Zheng Liu