Subsetting ICESat-2 Data
This notebook illustrates the use of icepyx for subsetting ICESat-2 data ordered through the NASA NSIDC DAAC. We’ll show how to find out what subsetting options are available and how to specify them for your order.
For more information on using icepyx to find, order, and download data, see our complementary ICESat-2 Data Access Notebook.
Questions? Be sure to check out the FAQs throughout this notebook, indicated as italic headings.
What is SUBSETTING anyway?
Anyone who’s worked with geospatial data has probably encountered subsetting. Typically, we search for data wherever it is stored and download the chunks (granules, scenes, passes, swaths, etc.) that contain something we are interested in. Then, we have to extract from each chunk the pieces we actually want to analyze. Those pieces might be geospatial (e.g., an area of interest) or temporal (e.g., certain months of a time series). This process of extracting the data we are going to use is called subsetting.
In the case of ICESat-2 data from the NASA NSIDC DAAC, we can do this subsetting step on the data prior to download, reducing our number of data processing steps and resulting in smaller, faster downloads and lower storage requirements.
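To make the idea concrete, here is a minimal sketch of what spatial and temporal subsetting amounts to when done locally, using only the Python standard library. The records, the coordinates, and the `subset_records` helper are made up for illustration; they are not ICESat-2 data and not part of icepyx.

```python
from datetime import date

# Hypothetical point records: (longitude, latitude, acquisition date).
records = [
    (-54.0, 69.0, date(2019, 2, 23)),  # inside the box, inside the dates
    (-40.0, 69.0, date(2019, 2, 23)),  # outside the box (too far east)
    (-50.0, 70.5, date(2019, 3, 15)),  # inside the box, outside the dates
    (-49.5, 68.2, date(2019, 2, 28)),  # inside both (end date is inclusive)
]


def subset_records(records, bbox, start, end):
    """Keep records inside bbox = [min_lon, min_lat, max_lon, max_lat]
    whose date lies between start and end (both inclusive)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return [
        (lon, lat, day)
        for lon, lat, day in records
        if min_lon <= lon <= max_lon
        and min_lat <= lat <= max_lat
        and start <= day <= end
    ]


kept = subset_records(
    records, [-55, 68, -48, 71], date(2019, 2, 22), date(2019, 2, 28)
)
print(len(kept), "of", len(records), "records kept")
```

Doing this step on the server side means the discarded records never have to be downloaded in the first place.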
Import packages, including icepyx
import icepyx as ipx
import numpy as np
import xarray as xr
import pandas as pd
import h5py
import os,json
from pprint import pprint
Create a query object and log in to Earthdata
For this example, we’ll be working with a sea ice product (ATL07) for an area along West Greenland (Disko Bay).
region_a = ipx.Query('ATL07', [-55, 68, -48, 71], ['2019-02-22', '2019-02-28'],
                     start_time='00:00:00', end_time='23:59:59')
Discover customization options
You can see the customization options for a given product by calling show_custom_options(). The options are presented as a dictionary of key-value pairs. Three options are currently available:
bboxSubset: bounding box subsetting
shapeSubset: polygon subsetting
temporalSubset: temporal subsetting
outputFormats indicates that only HDF5 is a supported output format. variableSubset, concatenate, and reproject are currently unavailable (set to false).
Note that these subsetting options are available for all L2-L3A products. Subsetting options are not currently supported for L3B or Quick Looks products.
region_a.show_custom_options()
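The exact structure returned by show_custom_options() is not reproduced in this notebook, so the following is only a sketch. It assumes a dictionary shaped like the description above (boolean flags plus a list of output formats) and shows one way to list which options are switched on. Both the `example_options` dictionary and the `enabled_options` helper are hypothetical.

```python
# Hypothetical dictionary mirroring the options described in the text;
# the real output of show_custom_options() may be structured differently.
example_options = {
    "bboxSubset": True,
    "shapeSubset": True,
    "temporalSubset": True,
    "variableSubset": False,
    "concatenate": False,
    "reproject": False,
    "outputFormats": ["HDF5"],  # illustrative value, not a boolean flag
}


def enabled_options(options):
    """Return the sorted names of boolean options that are switched on."""
    return sorted(name for name, value in options.items() if value is True)


print(enabled_options(example_options))
```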
By default, spatial and temporal subsetting based on your initial inputs is applied to your order. This is true whether you use the order_granules() function or the download_granules() function (which calls order_granules() under the hood if you have not already placed your order). If you don’t want your order to be spatially subset, you can use the subset=False argument in either order_granules() or download_granules().
Additional subsetting options must be specified as keyword arguments to the order/download functions.
Why do I have to provide spatial bounds to icepyx even if I don’t use them to subset my data order?
Because they’re still needed for the granule-level search. Spatial inputs are usually required for any data search, on any platform, even if your search parameters cover the entire globe.
The spatial information you provide is used to search the data repository and determine which granules might contain data over your area of interest.
When you use that spatial information for subsetting, icepyx is actually asking the NASA Harmony subsetter to extract the appropriate data from each granule.
Thus, even if you set subset=False and download entire granules, you still need to provide some inputs on what geographic area you’d like data for.
About data variables in a query object
A given ICESat-2 product may have over 200 variable + path combinations.
icepyx includes a custom Variables module that is “aware” of the ATLAS sensor and how the ICESat-2 data products are stored.
The ICESat-2 Data Variables Example provides a detailed set of examples on how to use icepyx’s built-in Variables module.
While variable subsetting is not supported for ICESat-2 data, you can refer to the aforementioned Jupyter Notebook to learn how to interact with ICESat-2 variables after requesting your data.
Why not just download all the data and subset locally? What if I need more granules?
Taking advantage of the NASA Harmony subsetting service is a great way to reduce your download size and thus your download time and the amount of storage required, especially if you’re storing your data locally during analysis. Because you download your data using icepyx, it is easy to go back and get additional data with the same, similar, or different parameters. Related tools (e.g., captoolkit) will let you easily merge files if you’re uncomfortable merging them during read-in for processing.
short_name = 'ATL06'
spatial_extent = './supporting_files/simple_test_poly.gpkg'
date_range = ['2019-10-01','2019-10-05']
region_a = ipx.Query(short_name, spatial_extent,
                     cycles=['03','04','05','06'], tracks=['0849','0902'])
print(region_a.product)
print(region_a.product_version)
print(region_a.cycles)
print(region_a.tracks)
print(region_a.spatial_extent)
region_a.visualize_spatial_extent()
We can also print a list of available granules for our query:
region_a.avail_granules(cloud=True)
Applying granule subsetting to your order and downloading the results
order = region_a.order_granules(subset=True)
order
Checking an order status
order.status()
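If you would rather wait in a loop than re-run the status cell by hand, a generic polling helper is one option. This is a sketch under stated assumptions: the return value of order.status() is not shown in this notebook, so the helper takes any zero-argument callable that returns a state string, and the state names used here ("running", "successful", "failed") are placeholders. `wait_for_state` is a hypothetical helper, not part of icepyx.

```python
import time


def wait_for_state(get_state, target_states, interval_s=5.0, max_checks=20,
                   sleep=time.sleep):
    """Call get_state() until it returns one of target_states.

    Returns the final state, or raises TimeoutError after max_checks calls.
    The sleep function is injectable so the loop can be tested without waiting.
    """
    for attempt in range(max_checks):
        state = get_state()
        if state in target_states:
            return state
        if attempt < max_checks - 1:
            sleep(interval_s)
    raise TimeoutError(f"no target state after {max_checks} checks")


# Stand-in for a real status call: yields a fixed sequence of placeholder states.
fake_states = iter(["running", "running", "successful"])
final = wait_for_state(
    lambda: next(fake_states),
    {"successful", "failed"},
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
print(final)
```

To use it with a real order, you would pass a small function that extracts a state string from whatever order.status() returns.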
Downloading subsetted granules
files = order.download_granules("./data")
Why does the subsetter say no matching data was found?
Sometimes, granules (“files”) returned in our initial search end up not containing any data in our specified area of interest. This is because the initial search is completed using summary metadata for a granule. You’ve likely encountered this before when viewing available imagery online: your spatial search turns up a bunch of images with only a few border or corner pixels, maybe even in no data regions, in your area of interest. Thus, when you go to extract the data from the area you want (i.e., spatially subset it), you don’t get any usable data from that image.
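The mismatch between a granule-level search and a point-level subset can be sketched with a toy example. Here the granule's summary footprint is simplified to a bounding box (real granule metadata is more detailed), and the track points are invented. The footprint overlaps the area of interest, so the search returns the granule, yet none of its points fall inside the area.

```python
def boxes_overlap(a, b):
    """True if boxes [min_lon, min_lat, max_lon, max_lat] share any area or edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def points_in_box(points, box):
    """Return the (lon, lat) points that fall inside the box."""
    return [
        (lon, lat)
        for lon, lat in points
        if box[0] <= lon <= box[2] and box[1] <= lat <= box[3]
    ]


area_of_interest = [-55.0, 68.0, -48.0, 71.0]

# Hypothetical granule: a ground track that passes near the area but misses it.
track_points = [(-47.0, 66.0), (-47.5, 69.0), (-50.0, 73.0)]

# Summary footprint: the bounding box that encloses all points in the granule.
lons = [lon for lon, _ in track_points]
lats = [lat for _, lat in track_points]
footprint = [min(lons), min(lats), max(lons), max(lats)]

print("search match:", boxes_overlap(footprint, area_of_interest))
print("points after subsetting:", points_in_box(track_points, area_of_interest))
```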
Handling large orders
By default, for large orders (more than 300 granules), the Harmony subsetter will only process the first 100 granules, placing the order into a “previewing” status. This allows users to check that results look correct. Once the job has completed its preview of the first 100 granules, we can resume the order if we are satisfied that our request is correct. The following guidance is commented out by default but can be uncommented to test this large order behavior.
# short_name = 'ATL06'
# spatial_extent = './supporting_files/simple_test_poly.gpkg'
# date_range = ['2018-10-01','2020-02-05']
# region_a = ipx.Query(short_name, spatial_extent, date_range)
# order = region_a.order_granules(subset=True)
# order
This order includes 311 input granules, and therefore it is automatically placed into a previewing state. We can inspect the status of this order and wait until it moves to a “paused” state, once the initial 100 granules are complete.
# order.status()
If we are satisfied with the order, then we can resume processing:
# order.resume()
# order
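The preview behavior can be summarized in a few lines of arithmetic. This sketch assumes, as the 311-granule example suggests, that orders of more than 300 granules are previewed and paused after the first 100; both numbers are hard-coded assumptions here rather than values read from the service, and `plan_order` is a hypothetical helper.

```python
PREVIEW_THRESHOLD = 300  # assumed: orders larger than this are previewed
PREVIEW_SIZE = 100       # assumed: granules processed before the order pauses


def plan_order(n_granules, threshold=PREVIEW_THRESHOLD, preview_size=PREVIEW_SIZE):
    """Describe how an order of n_granules would be handled under the assumed rule."""
    if n_granules > threshold:
        return {
            "previewed": True,
            "processed_before_pause": preview_size,
            "remaining_after_resume": n_granules - preview_size,
        }
    return {
        "previewed": False,
        "processed_before_pause": n_granules,
        "remaining_after_resume": 0,
    }


print(plan_order(311))  # large order: previewed, then paused until resumed
print(plan_order(50))   # small order: processed in one go
```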
Working with the downloaded data
Now that the subsetted files have been downloaded, we can work with them using the icepyx Read class. See the Reading ICESat-2 Data in for Analysis notebook for more information.
Credits
notebook contributors: Zheng Liu, Jessica Scheick, Amy Steiker, and Theresa Andersen
some source material: NSIDC Data Access Notebook by Amy Steiker and Bruce Wallin