{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "16806722-f5bb-4063-bd4b-60c8b0d24d2a",
   "metadata": {
    "user_expressions": []
   },
   "source": [
    "# QUEST Example: Finding Argo and ICESat-2 data\n",
    "\n",
    "In this notebook, we are going to find Argo and ICESat-2 data over a region of the Pacific Ocean. Normally, we would require multiple data portals or Python packages to accomplish this. However, thanks to the [QUEST (Query, Unify, Explore SpatioTemporal) module](https://icepyx.readthedocs.io/en/latest/contributing/quest-available-datasets.html), we can use icepyx to find both!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ed25d839-4114-41db-9166-8c027368686c",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Basic packages\n",
    "import geopandas as gpd\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "from pprint import pprint\n",
    "\n",
    "# icepyx and QUEST\n",
    "import icepyx as ipx"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5c35f5df-b4fb-4a36-8d6f-d20f1552767a",
   "metadata": {
    "user_expressions": []
   },
   "source": [
    "## Define the Quest Object\n",
    "\n",
    "QUEST builds off of the general querying process originally designed for ICESat-2, but makes it applicable to other datasets.\n",
    "\n",
    "Just like the ICESat-2 Query object, we begin by defining our Quest object. We provide the following bounding parameters:\n",
    "* `spatial_extent`: Data is constrained to the given box over the Pacific Ocean.\n",
    "* `date_range`: Only grab data from April 18-19, 2022 (to keep download sizes small for this example)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5d0546d-f0b8-475d-9fd4-62ace696e316",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Spatial bounds, given as SW/NE corners\n",
    "spatial_extent = [-154, 30, -143, 37]\n",
    "\n",
    "# Start and end dates, in YYYY-MM-DD format\n",
    "date_range = ['2022-04-18', '2022-04-19']\n",
    "\n",
    "# Initialize the QUEST object\n",
    "reg_a = ipx.Quest(spatial_extent=spatial_extent, date_range=date_range)\n",
    "\n",
    "print(reg_a)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8732bf56-1d44-4182-83f7-4303a87d231a",
   "metadata": {
    "user_expressions": []
   },
   "source": [
    "Notice that we have defined our spatial and temporal domains, but we do not have any datasets in our QUEST object. The next section leads us through that process."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1598bbca-3dcb-4b63-aeb1-81c27d92a1a2",
   "metadata": {
    "user_expressions": []
   },
   "source": [
    "## Getting the data\n",
    "\n",
    "Let's first query the ICESat-2 data. If we want to extract information about the water column, the ATL03 product is likely the desired choice.\n",
    "* `short_name`: ATL03"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "309a7b26-cfc3-46fc-a683-43e154412074",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# ICESat-2 product\n",
    "short_name = 'ATL03'\n",
    "\n",
    "# Add ICESat-2 to QUEST datasets\n",
    "reg_a.add_icesat2(product=short_name)\n",
    "print(reg_a)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad4bbcfe-3199-4a28-8739-c930d1572538",
   "metadata": {
    "user_expressions": []
   },
   "source": [
    "Let's see the available files over this region."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a2b4e56f-ceff-45e7-b52c-e7725dc6c812",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "pprint(reg_a.datasets['icesat2'].avail_granules(ids=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a081854-dae4-4e99-a550-02c02a71b6de",
   "metadata": {
    "user_expressions": []
   },
   "source": [
    "Note the ICESat-2 functions shown here are the same as those used for direct icepyx queries. The user is referred to other [example workbooks](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_data_access.html) for detailed explanations about icepyx functionality.\n",
    "\n",
    "Accessing ICESat-2 data requires Earthdata login credentials. When running the `download_all()` function below, an authentication check will be passed when attempting to download the ICESat-2 files."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8264515a-00f1-4f57-b927-668a71294079",
   "metadata": {
    "user_expressions": []
   },
   "source": [
    "Now let's grab Argo data using the same constraints. This is as simple as using the below function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c857fdcc-e271-4960-86a9-02f693cc13fe",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Add argo to the desired QUEST datasets\n",
    "reg_a.add_argo()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7bade19e-5939-410a-ad54-363636289082",
   "metadata": {
    "user_expressions": []
   },
   "source": [
    "When accessing Argo data, the variables of interest will be organized as vertical profiles as a function of pressure. By default, only temperature is queried, so the user should supply a list of desired parameters using the code below. The user may also limit the pressure range of the returned data by passing `presRange=\"0,200\"`.\n",
    "\n",
    "*Note: Our example shows only physical Argo float parameters, but the process is identical for including BGC float parameters.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6739c3aa-1a88-4d8e-9fd8-479528c20e97",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Customized variable query to retrieve salinity instead of temperature\n",
    "reg_a.add_argo(params=['salinity'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d06436c-2271-4229-8196-9f5180975ab1",
   "metadata": {
    "user_expressions": []
   },
   "source": [
    "Additionally, a user may view or update the list of requested Argo and Argo-BGC parameters at any time through `reg_a.datasets['argo'].params`. If a user submits an invalid parameter (\"temp\" instead of \"temperature\", for example), an `AssertionError` will be raised. `reg_a.datasets['argo'].presRange` behaves anologously for limiting the pressure range of Argo data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e34756b8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# update the list of argo parameters\n",
    "reg_a.datasets['argo'].params = ['temperature','salinity']\n",
    "\n",
    "# show the current list\n",
    "reg_a.datasets['argo'].params"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "453900c1-cd62-40c9-820c-0615f63f17f5",
   "metadata": {
    "user_expressions": []
   },
   "source": [
    "As for ICESat-2 data, the user can interact directly with the Argo data object (`reg_a.datasets['argo']`) to search or download data outside of the `Quest.search_all()` and `Quest.download_all()` functionality shown below.\n",
    "\n",
    "The approach to directly search or download Argo data is to use `reg_a.datasets['argo'].search_data()`, and `reg_a.datasets['argo'].download()`. In both cases, the existing parameters and pressure ranges are used unless the user passes new `params` and/or `presRange` kwargs, respectively, which will directly update those values (stored attributes)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f55be4e-d261-49c1-ac14-e19d8e0ff828",
   "metadata": {
    "user_expressions": []
   },
   "source": [
    "With our current setup, let's see what Argo parameters we will get."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "435a1243",
   "metadata": {},
   "outputs": [],
   "source": [
    "# see what argo parameters will be searched for or downloaded\n",
    "reg_a.datasets['argo'].params"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c15675df",
   "metadata": {},
   "outputs": [],
   "source": [
    "reg_a.datasets['argo'].search_data()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "70d36566-0d3c-4781-a199-09bb11dad975",
   "metadata": {
    "user_expressions": []
   },
   "source": [
    "Now we can access the data for both Argo and ICESat-2! The below function will do this for us.\n",
    "\n",
    "**Important**: The Argo data will be compiled into a Pandas DataFrame, which must be manually saved by the user as demonstrated below. The ICESat-2 data is saved as processed HDF-5 files to the directory provided."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a818c5d7-d69a-4aad-90a2-bc670a54c3a7",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "path = './quest/downloaded-data/'\n",
    "\n",
    "# Access Argo and ICESat-2 data simultaneously\n",
    "reg_a.download_all(path=path)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad29285e-d161-46ea-8a57-95891fa2b237",
   "metadata": {
    "tags": [],
    "user_expressions": []
   },
   "source": [
    "We now have one available Argo profile, containing `temperature` and `pressure`, in a Pandas DataFrame. BGC Argo is also available through QUEST, so we could add more variables to this list.\n",
    "\n",
    "If the user wishes to add more profiles, parameters, and/or pressure ranges to a pre-existing DataFrame, then they should use `reg_a.datasets['argo'].download(keep_existing=True)` to retain previously downloaded data and have the new data added."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6970f0ad-9364-4732-a5e6-f93cf3fc31a3",
   "metadata": {
    "user_expressions": []
   },
   "source": [
    "The `reg_a.download_all()` function also provided a file containing ICESat-2 ATL03 data. Recall that because these data files are very large, we focus on only one file for this example.\n",
    "\n",
    "The below workflow uses the icepyx Read module to quickly load ICESat-2 data into an Xarray DataSet. To read in multiple files, see the [icepyx Read tutorial](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_data_read-in.html) for how to change your input source."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "88f4b1b0-8c58-414c-b6a8-ce1662979943",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "filename = 'processed_ATL03_20220419002753_04111506_006_02.h5'\n",
    "\n",
    "reader = ipx.Read(data_source=path+filename)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "665d79a7-7360-4846-99c2-222b34df2a92",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# decide which portions of the file to read in\n",
    "reader.vars.append(beam_list=['gt2l'], \n",
    "                   var_list=['h_ph', \"lat_ph\", \"lon_ph\", 'signal_conf_ph'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e7158814-50f0-4940-980c-9bb800360982",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "ds = reader.load()\n",
    "ds"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1040438c-d806-4964-b4f0-1247da9f3f1f",
   "metadata": {
    "user_expressions": []
   },
   "source": [
    "To make the data more easily plottable, let's convert the data into a Pandas DataFrame. Note that this method is memory-intensive for ATL03 data, so users are suggested to look at small spatial domains to prevent the notebook from crashing. Here, since we only have data from one granule and ground track, we have sped up the conversion to a dataframe by first removing extra data dimensions we don't need for our plots. Several of the other steps completed below using Pandas have analogous operations in Xarray that would further reduce memory requirements and computation times."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "50d23a8e",
   "metadata": {},
   "outputs": [],
   "source": [
    "is2_pd =(ds.squeeze()\n",
    "        .reset_coords()\n",
    "        .drop_vars([\"source_file\",\"data_start_utc\",\"data_end_utc\",\"gran_idx\"])\n",
    "        .to_dataframe()\n",
    "        )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "01bb5a12",
   "metadata": {},
   "outputs": [],
   "source": [
    "is2_pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fc67e039-338c-4348-acaf-96f605cf0030",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Create a new dataframe with only \"ocean\" photons, as indicated by the \"ds_surf_type\" flag\n",
    "is2_pd = is2_pd.reset_index(level=[0,1])\n",
    "is2_pd_ocean = is2_pd[is2_pd.ds_surf_type==1].drop(columns=\"photon_idx\")\n",
    "is2_pd_ocean"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "976ed530-1dc9-412f-9d2d-e51abd28c564",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Set Argo data as its own DataFrame\n",
    "argo_df = reg_a.datasets['argo'].argodata"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f9a3b8cf-f3b9-4522-841b-bf760672e37f",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Convert both DataFrames into GeoDataFrames\n",
    "is2_gdf = gpd.GeoDataFrame(is2_pd_ocean, \n",
    "                           geometry=gpd.points_from_xy(is2_pd_ocean['lon_ph'], is2_pd_ocean['lat_ph']),\n",
    "                           crs='EPSG:4326'\n",
    ")\n",
    "argo_gdf = gpd.GeoDataFrame(argo_df, \n",
    "                            geometry=gpd.points_from_xy(argo_df.lon, argo_df.lat),\n",
    "                            crs='EPSG:4326'\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86cb8463-dc14-4c1d-853e-faf7bf4300a5",
   "metadata": {
    "user_expressions": []
   },
   "source": [
    "To view the relative locations of ICESat-2 and Argo, the below cell uses the `explore()` function from GeoPandas. The time variables cause errors in the function, so we will drop those variables first. \n",
    "\n",
    "Note that for large datasets like ICESat-2, loading the map might take a while."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7178fecc-6ca1-42a1-98d4-08f57c050daa",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Drop time variables that would cause errors in explore() function\n",
    "is2_gdf = is2_gdf.drop(['delta_time','atlas_sdp_gps_epoch'], axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5ff40f7b-3a0f-4e32-8187-322a5b7cb44d",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Plot ICESat-2 track (medium/high confidence photons only) on a map\n",
    "m = is2_gdf[is2_gdf['signal_conf_ph']>=3].explore(column='rgt', tiles='Esri.WorldImagery',\n",
    "                                                  name='ICESat-2')\n",
    "\n",
    "# Add Argo float locations to map\n",
    "argo_gdf.explore(m=m, name='Argo', marker_kwds={\"radius\": 6}, color='red')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8b7063ec-a2f8-4509-a7ce-5b0482b48682",
   "metadata": {
    "user_expressions": []
   },
   "source": [
    "While we're at it, let's plot temperature and pressure profiles for each of the Argo floats in the area."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "da2748b7-b174-4abb-a44a-bd73d1d36eba",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Plot vertical profile of temperature vs. pressure for all of the floats\n",
    "fig, ax = plt.subplots(figsize=(12, 6))\n",
    "for pid in np.unique(argo_df['profile_id']):\n",
    "    argo_df[argo_df['profile_id']==pid].plot(ax=ax, x='temperature', y='pressure', label=pid)\n",
    "plt.gca().invert_yaxis()\n",
    "plt.xlabel('Temperature [$\\degree$C]')\n",
    "plt.ylabel('Pressure [hPa]')\n",
    "plt.ylim([750, -10])\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "08481fbb-2298-432b-bd50-df2e1ca45cf5",
   "metadata": {
    "user_expressions": []
   },
   "source": [
    "Lastly, let's look at some near-coincident ICESat-2 and Argo data in a multi-panel plot."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1269de3c-c15d-4120-8284-3b072069d5ee",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Only consider ICESat-2 signal photons\n",
    "is2_pd_signal = is2_pd_ocean[is2_pd_ocean['signal_conf_ph']>=0]\n",
    "\n",
    "## Multi-panel plot showing ICESat-2 and Argo data\n",
    "\n",
    "# Calculate Extent\n",
    "lons = [-154, -143, -143, -154, -154]\n",
    "lats = [30, 30, 37, 37, 30]\n",
    "lon_margin = (max(lons) - min(lons)) * 0.1\n",
    "lat_margin = (max(lats) - min(lats)) * 0.1\n",
    "\n",
    "# Create Plot\n",
    "fig,([ax1,ax2],[ax3,ax4]) = plt.subplots(2, 2, figsize=(12, 6))\n",
    "\n",
    "# Plot Relative Global View\n",
    "world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))\n",
    "world.plot(ax=ax1, color='0.8', edgecolor='black')\n",
    "argo_df.plot.scatter(ax=ax1, x='lon', y='lat', s=25.0, c='green', zorder=3, alpha=0.3)\n",
    "is2_pd_signal.plot.scatter(ax=ax1, x='lon_ph', y='lat_ph', s=10.0, zorder=2, alpha=0.3)\n",
    "ax1.plot(lons, lats, linewidth=1.5, color='orange', zorder=2)\n",
    "ax1.set_xlim(-160,-100)\n",
    "ax1.set_ylim(20,50)\n",
    "ax1.set_aspect('equal', adjustable='box')\n",
    "ax1.set_xlabel('Longitude', fontsize=18)\n",
    "ax1.set_ylabel('Latitude', fontsize=18)\n",
    "\n",
    "# Plot Zoomed View of Ground Tracks\n",
    "argo_df.plot.scatter(ax=ax2, x='lon', y='lat', s=50.0, c='green', zorder=3, alpha=0.3)\n",
    "is2_pd_signal.plot.scatter(ax=ax2, x='lon_ph', y='lat_ph', s=10.0, zorder=2, alpha=0.3)\n",
    "ax2.plot(lons, lats, linewidth=1.5, color='orange', zorder=1)\n",
    "ax2.set_xlim(min(lons) - lon_margin, max(lons) + lon_margin)\n",
    "ax2.set_ylim(min(lats) - lat_margin, max(lats) + lat_margin)\n",
    "ax2.set_aspect('equal', adjustable='box')\n",
    "ax2.set_xlabel('Longitude', fontsize=18)\n",
    "ax2.set_ylabel('Latitude', fontsize=18)\n",
    "\n",
    "# Plot ICESat-2 along-track vertical profile. A dotted line notes the location of a nearby Argo float\n",
    "is2 = ax3.scatter(is2_pd_signal['lat_ph'], is2_pd_signal['h_ph']+13.1, s=0.1)\n",
    "ax3.axvline(34.43885, linestyle='--', linewidth=3, color='black')\n",
    "ax3.set_xlim([34.3, 34.5])\n",
    "ax3.set_ylim([-20, 5])\n",
    "ax3.set_xlabel('Latitude', fontsize=18)\n",
    "ax3.set_ylabel('Approx. IS-2 Depth [m]', fontsize=16)\n",
    "ax3.set_yticklabels(['15', '10', '5', '0', '-5'])\n",
    "\n",
    "# Plot vertical ocean profile of the nearby Argo float\n",
    "argo_df.plot(ax=ax4, x='temperature', y='pressure', linewidth=3)\n",
    "# ax4.set_yscale('log')\n",
    "ax4.invert_yaxis()\n",
    "ax4.get_legend().remove()\n",
    "ax4.set_xlabel('Temperature [$\\degree$C]', fontsize=18)\n",
    "ax4.set_ylabel('Argo Pressure', fontsize=16)\n",
    "\n",
    "plt.tight_layout()\n",
    "\n",
    "# Save figure\n",
    "#plt.savefig('/icepyx/quest/figures/is2_argo_figure.png', dpi=500)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "37720c79",
   "metadata": {},
   "source": [
    "Recall that the Argo data must be saved manually.\n",
    "The dataframe associated with the Quest object can be saved using `reg_a.save_all(path)`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9b6548e2-0662-4c8b-a251-55ca63aff99b",
   "metadata": {},
   "outputs": [],
   "source": [
    "reg_a.save_all(path)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}