{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "bd168fbd-f5e8-4b32-906f-5c658b9758a0", "metadata": {}, "source": [ "# Derived variables\n", "\n", "This notebook shows how to use derived variables. A derived variable is a variable that is not available as an input dataset, but computed from one or more input variables." ] }, { "cell_type": "code", "execution_count": 1, "id": "f0ccfe7f-c535-4606-99ce-be24960aece1", "metadata": {}, "outputs": [], "source": [ "import yaml\n", "\n", "import esmvalcore.preprocessor\n", "from esmvalcore.cmor.table import get_tables\n", "from esmvalcore.config import CFG\n", "from esmvalcore.dataset import Dataset, datasets_to_recipe" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f4374495-19c4-4c3b-9fac-d929a5e595ad", "metadata": {}, "source": [ "First, we configure ESMValCore so it searches the ESGF for data:" ] }, { "cell_type": "code", "execution_count": 2, "id": "5d2711ea-6738-4a82-97b1-bc7d1212098a", "metadata": {}, "outputs": [], "source": [ "CFG[\"projects\"][\"CMIP6\"].pop(\n", " \"data\",\n", " None,\n", ") # Clear existing CMIP6 configuration for finding input data\n", "CFG.nested_update(\n", " {\n", " \"projects\": {\n", " \"CMIP6\": {\n", " \"data\": {\n", " \"intake-esgf\": {\n", " \"type\": \"esmvalcore.io.intake_esgf.IntakeESGFDataSource\",\n", " \"priority\": 2,\n", " \"facets\": {\n", " \"activity\": \"activity_drs\",\n", " \"dataset\": \"source_id\",\n", " \"ensemble\": \"member_id\",\n", " \"exp\": \"experiment_id\",\n", " \"institute\": \"institution_id\",\n", " \"grid\": \"grid_label\",\n", " \"mip\": \"table_id\",\n", " \"project\": \"project\",\n", " \"short_name\": \"variable_id\",\n", " },\n", " },\n", " },\n", " },\n", " },\n", " },\n", ")" ] }, { "cell_type": "markdown", "id": "d5f03519-3580-4baa-93f0-71cb406bf29a", "metadata": {}, "source": [ "## Which variables can be derived?\n", "\n", "The interface for working with derived variables from Python is not very polished yet. 
To list all available derived variables, we can run:" ] }, { "cell_type": "code", "execution_count": 3, "id": "57610048-42ca-4451-96bd-e787cf0eab33", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['clmmtisccp',\n", " 'sfcwind',\n", " 'clhtkisccp',\n", " 'rsntcs',\n", " 'swcre',\n", " 'hfns',\n", " 'troz',\n", " 'cllmtisccp',\n", " 'lwcre',\n", " 'hurs',\n", " 'rsus',\n", " 'clltkisccp',\n", " 'rlntcs',\n", " 'rlnst',\n", " 'sispeed',\n", " 'sm',\n", " 'soz',\n", " 'rtnt',\n", " 'xco2',\n", " 'vegfrac',\n", " 'lvp',\n", " 'toz',\n", " 'rsnstcsnorm',\n", " 'ohc',\n", " 'xch4',\n", " 'rsnst',\n", " 'lapserate',\n", " 'rsnt',\n", " 'rsns',\n", " 'alb',\n", " 'siextent',\n", " 'netcre',\n", " 'chlora',\n", " 'rlnstcs',\n", " 'rsnstcs',\n", " 'uajet',\n", " 'clmtkisccp',\n", " 'sithick',\n", " 'amoc',\n", " 'rlns',\n", " 'co2s',\n", " 'lwp',\n", " 'ctotal',\n", " 'qep',\n", " 'asr',\n", " 'rlus',\n", " 'clhmtisccp',\n", " 'et']" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(esmvalcore.preprocessor._derive.ALL_DERIVED_VARIABLES) # noqa: SLF001" ] }, { "cell_type": "markdown", "id": "f0294f89-fa3d-43da-a370-d2e4613fbdda", "metadata": {}, "source": [ "Note that [modules, functions, and variables starting with a single `_` character should be considered internal](https://peps.python.org/pep-0008/#descriptive-naming-styles) and there are no guarantees about the stability of this interface. Guidance on adding new derived variables to ESMValCore is available in [Deriving a variable](https://docs.esmvaltool.org/projects/ESMValCore/en/latest/develop/derivation.html)." 
] }, { "cell_type": "markdown", "id": "d1f58094-65d1-4d55-bf02-3a14d4cdea1c", "metadata": {}, "source": [ "## Finding available datasets" ] }, { "attachments": {}, "cell_type": "markdown", "id": "aea7a272-7d26-44d9-8766-379379e5d152", "metadata": {}, "source": [ "We define a dataset template to search for all CMIP6 models that provide every input dataset required to derive `lwcre`, the longwave cloud radiative effect at the top of the atmosphere, at monthly resolution for the historical experiment. Note that ESMValCore uses its own facet names to keep naming uniform across CMIP phases and other projects. The mapping to the facet names used on ESGF can be found in [Facets](https://docs.esmvaltool.org/projects/ESMValCore/en/latest/reference/facets.html)." ] }, { "cell_type": "code", "execution_count": 4, "id": "23c26e29-ea87-40d7-a962-85a06fc77221", "metadata": {}, "outputs": [], "source": [ "dataset_template = Dataset(\n", "    short_name=\"lwcre\",\n", "    mip=\"Amon\",\n", "    project=\"CMIP6\",\n", "    exp=\"historical\",\n", "    dataset=\"*\",\n", "    institute=\"*\",\n", "    ensemble=\"r1i1p1f1\",\n", "    grid=\"gn\",\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "baf29fbb-eed5-47bd-8805-c27ad34b0539", "metadata": {}, "source": [ "Next, we use the `Dataset.derived_variable_from_files` method to build a list of datasets from the available files. This may take a while, because searching the ESGF for many files can be slow. The search results are cached, so subsequent searches will be faster." 
] }, { "cell_type": "code", "execution_count": 5, "id": "d657320b-25c7-48f3-bfe1-5f3b94d7b789", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Found 37 datasets, showing the first 3 pairs:\n" ] }, { "data": { "text/plain": [ "[(Dataset:\n", " {'dataset': 'TaiESM1',\n", " 'project': 'CMIP6',\n", " 'mip': 'Amon',\n", " 'short_name': 'rlut',\n", " 'ensemble': 'r1i1p1f1',\n", " 'exp': 'historical',\n", " 'grid': 'gn',\n", " 'institute': 'AS-RCEC'},\n", " Dataset:\n", " {'dataset': 'TaiESM1',\n", " 'project': 'CMIP6',\n", " 'mip': 'Amon',\n", " 'short_name': 'rlutcs',\n", " 'ensemble': 'r1i1p1f1',\n", " 'exp': 'historical',\n", " 'grid': 'gn',\n", " 'institute': 'AS-RCEC'}),\n", " (Dataset:\n", " {'dataset': 'AWI-CM-1-1-MR',\n", " 'project': 'CMIP6',\n", " 'mip': 'Amon',\n", " 'short_name': 'rlut',\n", " 'ensemble': 'r1i1p1f1',\n", " 'exp': 'historical',\n", " 'grid': 'gn',\n", " 'institute': 'AWI'},\n", " Dataset:\n", " {'dataset': 'AWI-CM-1-1-MR',\n", " 'project': 'CMIP6',\n", " 'mip': 'Amon',\n", " 'short_name': 'rlutcs',\n", " 'ensemble': 'r1i1p1f1',\n", " 'exp': 'historical',\n", " 'grid': 'gn',\n", " 'institute': 'AWI'}),\n", " (Dataset:\n", " {'dataset': 'AWI-ESM-1-1-LR',\n", " 'project': 'CMIP6',\n", " 'mip': 'Amon',\n", " 'short_name': 'rlut',\n", " 'ensemble': 'r1i1p1f1',\n", " 'exp': 'historical',\n", " 'grid': 'gn',\n", " 'institute': 'AWI'},\n", " Dataset:\n", " {'dataset': 'AWI-ESM-1-1-LR',\n", " 'project': 'CMIP6',\n", " 'mip': 'Amon',\n", " 'short_name': 'rlutcs',\n", " 'ensemble': 'r1i1p1f1',\n", " 'exp': 'historical',\n", " 'grid': 'gn',\n", " 'institute': 'AWI'})]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "datasets = list(dataset_template.derived_variable_from_files())\n", "print(f\"Found {len(datasets)} datasets, showing the first 3 pairs:\")\n", "datasets[:3]" ] }, { "cell_type": "markdown", "id": "ed6e8979-7b3b-4337-8e52-70e102d3d98f", "metadata": {}, "source": 
[ "This returns a list of tuples, each holding the input datasets required to derive `lwcre` for one model. We can see that `lwcre` is derived from the variables `rlut` and `rlutcs`." ] }, { "cell_type": "markdown", "id": "d79ebf03-08bf-42ae-a756-1417566e3be8", "metadata": {}, "source": [ "## Composing a recipe with derived variables" ] }, { "cell_type": "markdown", "id": "3f88a30e-9dcd-431d-b469-3efd367795de", "metadata": {}, "source": [ "To use the datasets found above in a recipe, we specify the name of the variable to be derived, along with the `derive: true` option:" ] }, { "cell_type": "code", "execution_count": 6, "id": "f7f13430-b359-4ef6-a06b-82af64857cfa", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "datasets:\n", "- dataset: ACCESS-CM2\n", "  institute: CSIRO-ARCCSS\n", "- dataset: ACCESS-ESM1-5\n", "  institute: CSIRO\n", "- dataset: AWI-CM-1-1-MR\n", "  institute: AWI\n", "- dataset: AWI-ESM-1-1-LR\n", "  institute: AWI\n", "- dataset: BCC-CSM2-MR\n", "  institute: BCC\n", "- dataset: BCC-ESM1\n", "  institute: BCC\n", "- dataset: CAMS-CSM1-0\n", "  institute: CAMS\n", "- dataset: CAS-ESM2-0\n", "  institute: CAS\n", "- dataset: CESM2\n", "  institute: NCAR\n", "- dataset: CESM2-FV2\n", "  institute: NCAR\n", "- dataset: CESM2-WACCM\n", "  institute: NCAR\n", "- dataset: CESM2-WACCM-FV2\n", "  institute: NCAR\n", "- dataset: CMCC-CM2-HR4\n", "  institute: CMCC\n", "- dataset: CMCC-CM2-SR5\n", "  institute: CMCC\n", "- dataset: CMCC-ESM2\n", "  institute: CMCC\n", "- dataset: CanESM5\n", "  institute: CCCma\n", "- dataset: CanESM5-1\n", "  institute: CCCma\n", "- dataset: FGOALS-g3\n", "  institute: CAS\n", "- dataset: FIO-ESM-2-0\n", "  institute: FIO-QLNM\n", "- dataset: GISS-E2-1-G\n", "  institute: NASA-GISS\n", "- dataset: GISS-E2-1-G-CC\n", "  institute: NASA-GISS\n", "- dataset: GISS-E2-1-H\n", "  institute: NASA-GISS\n", "- dataset: GISS-E2-2-G\n", "  institute: NASA-GISS\n", "- dataset: GISS-E2-2-H\n", "  institute: NASA-GISS\n", "- dataset: 
ICON-ESM-LR\n", "  institute: MPI-M\n", "- dataset: IITM-ESM\n", "  institute: CCCR-IITM\n", "- dataset: MIROC6\n", "  institute: MIROC\n", "- dataset: MPI-ESM-1-2-HAM\n", "  institute: HAMMOZ-Consortium\n", "- dataset: MPI-ESM1-2-HR\n", "  institute: MPI-M\n", "- dataset: MPI-ESM1-2-LR\n", "  institute: MPI-M\n", "- dataset: MRI-ESM2-0\n", "  institute: MRI\n", "- dataset: NESM3\n", "  institute: NUIST\n", "- dataset: NorCPM1\n", "  institute: NCC\n", "- dataset: NorESM2-LM\n", "  institute: NCC\n", "- dataset: NorESM2-MM\n", "  institute: NCC\n", "- dataset: SAM0-UNICON\n", "  institute: SNU\n", "- dataset: TaiESM1\n", "  institute: AS-RCEC\n", "diagnostics:\n", "  diagnostic_name:\n", "    variables:\n", "      lwcre:\n", "        derive: true\n", "        ensemble: r1i1p1f1\n", "        exp: historical\n", "        grid: gn\n", "        mip: Amon\n", "        project: CMIP6\n", "\n" ] } ], "source": [ "recipe_datasets = [\n", "    input_datasets[0].copy(\n", "        short_name=\"lwcre\",\n", "        diagnostic=\"diagnostic_name\",\n", "        derive=True,\n", "    )\n", "    for input_datasets in datasets\n", "]\n", "print(yaml.safe_dump(datasets_to_recipe(recipe_datasets)))" ] }, { "cell_type": "markdown", "id": "265a0d2e-2541-4171-8d0b-406a42a519e1", "metadata": {}, "source": [ "A `force_derivation` option is also available in the recipe: when set to `true`, the variable is derived even if it is already available as a dataset." 
] }, { "cell_type": "markdown", "id": "4d7ea302-57ac-4054-9130-13860827bfc2", "metadata": {}, "source": [ "## Computing the derived variable" ] }, { "cell_type": "markdown", "id": "79d9d439-f95e-4ae8-8585-0d2b506d338c", "metadata": {}, "source": [ "Let's load the input data needed to derive `lwcre` for the first dataset:" ] }, { "cell_type": "code", "execution_count": 7, "id": "22a1bd2d-f329-4610-8076-3c109dade67e", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING:esmvalcore.cmor.check:There were warnings in variable rlut:\n", " rlut: attribute positive not present\n", "loaded from file \n", "WARNING:esmvalcore.cmor.check:There were warnings in variable rlutcs:\n", " rlutcs: attribute positive not present\n", "loaded from file \n" ] }, { "data": { "text/plain": [ "[,\n", " ]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cubes = [d.load() for d in datasets[0]]\n", "cubes" ] }, { "cell_type": "markdown", "id": "9a043222-d765-4725-b94a-5c18d01e1e04", "metadata": {}, "source": [ "Because the Python interface for derived variables is not very polished yet, we need to pass in a few arguments, which can be retrieved from the CMOR table:" ] }, { "cell_type": "code", "execution_count": 8, "id": "470c3124-7ba6-48bc-b18a-432b5ccd604e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'short_name': 'lwcre',\n", " 'long_name': 'TOA Longwave Cloud Radiative Effect',\n", " 'units': 'W m-2'}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var_info = get_tables(CFG, project=\"CMIP6\").get_variable(\n", "    table_name=\"Amon\",\n", "    short_name=\"lwcre\",\n", "    derived=True,\n", ")\n", "kwargs = {\n", "    k: getattr(var_info, k) for k in [\"short_name\", \"long_name\", \"units\"]\n", "}\n", "kwargs" ] }, { "cell_type": "markdown", "id": "6ae6afe3-29f8-4a03-a635-7d1034770113", "metadata": {}, "source": [ "Now we are ready to derive the variable:" ] }, { 
"cell_type": "code", "execution_count": 9, "id": "0c3e81ef-5237-453f-b27a-2b1aadf371be", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "
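<table>
<tr><th>Toa Longwave Cloud Radiative Effect (W m-2)</th><th>time</th><th>latitude</th><th>longitude</th></tr>
<tr><td>Shape</td><td>1980</td><td>192</td><td>288</td></tr>
<tr><td colspan=4>Dimension coordinates</td></tr>
<tr><td>time</td><td>x</td><td>-</td><td>-</td></tr>
<tr><td>latitude</td><td>-</td><td>x</td><td>-</td></tr>
<tr><td>longitude</td><td>-</td><td>-</td><td>x</td></tr>
<tr><td colspan=4>Attributes</td></tr>
<tr><td>Conventions</td><td colspan=3>'CF-1.7 CMIP-6.2'</td></tr>
<tr><td>activity_drs</td><td colspan=3>'CMIP'</td></tr>
<tr><td>activity_id</td><td colspan=3>'CMIP'</td></tr>
<tr><td>branch_method</td><td colspan=3>'Hybrid-restart from year 0671-01-01 of piControl'</td></tr>
<tr><td>branch_time</td><td colspan=3>0.0</td></tr>
<tr><td>branch_time_in_child</td><td colspan=3>-674885</td></tr>
<tr><td>branch_time_in_parent</td><td colspan=3>171550.0</td></tr>
<tr><td>cmor_version</td><td colspan=3>'3.5.0'</td></tr>
<tr><td>contact</td><td colspan=3>'Dr. Wei-Liang Lee (leelupin@gate.sinica.edu.tw)'</td></tr>
<tr><td>data_specs_version</td><td colspan=3>'01.00.31'</td></tr>
<tr><td>experiment</td><td colspan=3>'all-forcing simulation of the recent past'</td></tr>
<tr><td>experiment_id</td><td colspan=3>'historical'</td></tr>
<tr><td>external_variables</td><td colspan=3>'areacella'</td></tr>
<tr><td>forcing_index</td><td colspan=3>1</td></tr>
<tr><td>frequency</td><td colspan=3>'mon'</td></tr>
<tr><td>further_info_url</td><td colspan=3>'https://furtherinfo.es-doc.org/CMIP6.AS-RCEC.TaiESM1.historical.none.r ...'</td></tr>
<tr><td>grid</td><td colspan=3>'finite-volume grid with 0.9x1.25 degree lat/lon resolution'</td></tr>
<tr><td>grid_label</td><td colspan=3>'gn'</td></tr>
<tr><td>initialization_index</td><td colspan=3>1</td></tr>
<tr><td>institution</td><td colspan=3>'Research Center for Environmental Changes, Academia Sinica, Nankang, Taipei ...'</td></tr>
<tr><td>institution_id</td><td colspan=3>'AS-RCEC'</td></tr>
<tr><td>license</td><td colspan=3>'CMIP6 model data produced by NCC is licensed under a Creative Commons Attribution ...'</td></tr>
<tr><td>member_id</td><td colspan=3>'r1i1p1f1'</td></tr>
<tr><td>mip_era</td><td colspan=3>'CMIP6'</td></tr>
<tr><td>model_id</td><td colspan=3>'TaiESM1'</td></tr>
<tr><td>nominal_resolution</td><td colspan=3>'100 km'</td></tr>
<tr><td>original_units</td><td colspan=3>'W/m2'</td></tr>
<tr><td>parent_activity_id</td><td colspan=3>'CMIP'</td></tr>
<tr><td>parent_experiment_id</td><td colspan=3>'piControl'</td></tr>
<tr><td>parent_mip_era</td><td colspan=3>'CMIP6'</td></tr>
<tr><td>parent_source_id</td><td colspan=3>'TaiESM1'</td></tr>
<tr><td>parent_sub_experiment_id</td><td colspan=3>'none'</td></tr>
<tr><td>parent_time_units</td><td colspan=3>'days since 1850-1-1 00:00:00'</td></tr>
<tr><td>parent_variant_label</td><td colspan=3>'r1i1p1f1'</td></tr>
<tr><td>physics_index</td><td colspan=3>1</td></tr>
<tr><td>positive</td><td colspan=3>'down'</td></tr>
<tr><td>product</td><td colspan=3>'model-output'</td></tr>
<tr><td>realization_index</td><td colspan=3>1</td></tr>
<tr><td>realm</td><td colspan=3>'atmos'</td></tr>
<tr><td>references</td><td colspan=3>'10.5194/gmd-2019-377'</td></tr>
<tr><td>run_variant</td><td colspan=3>'N/A'</td></tr>
<tr><td>source</td><td colspan=3>'TaiESM 1.0 (2018): \\naerosol: SNAP (same grid as atmos)\\natmos: TaiAM1 ...'</td></tr>
<tr><td>source_id</td><td colspan=3>'TaiESM1'</td></tr>
<tr><td>source_type</td><td colspan=3>'AOGCM AER BGC'</td></tr>
<tr><td>sub_experiment</td><td colspan=3>'none'</td></tr>
<tr><td>sub_experiment_id</td><td colspan=3>'none'</td></tr>
<tr><td>table_id</td><td colspan=3>'Amon'</td></tr>
<tr><td>table_info</td><td colspan=3>'Creation Date:(24 July 2019) MD5:0bb394a356ef9d214d027f1aca45853e'</td></tr>
<tr><td>title</td><td colspan=3>'TaiESM1 output prepared for CMIP6'</td></tr>
<tr><td>variant_label</td><td colspan=3>'r1i1p1f1'</td></tr>
</table>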
\n", " " ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cube = esmvalcore.preprocessor.derive(cubes, **kwargs)\n", "cube" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.13" } }, "nbformat": 4, "nbformat_minor": 5 }