Derived variables

This notebook shows how to use derived variables. A derived variable is not available as an input dataset; instead, it is computed from one or more input variables.

import yaml

import esmvalcore.preprocessor
from esmvalcore.cmor.table import get_tables
from esmvalcore.config import CFG
from esmvalcore.dataset import Dataset, datasets_to_recipe

First, we configure ESMValCore so it searches the ESGF for data:

CFG["projects"]["CMIP6"].pop(
    "data",
    None,
)  # Clear existing CMIP6 configuration for finding input data
CFG.nested_update(
    {
        "projects": {
            "CMIP6": {
                "data": {
                    "intake-esgf": {
                        "type": "esmvalcore.io.intake_esgf.IntakeESGFDataSource",
                        "priority": 2,
                        "facets": {
                            "activity": "activity_drs",
                            "dataset": "source_id",
                            "ensemble": "member_id",
                            "exp": "experiment_id",
                            "institute": "institution_id",
                            "grid": "grid_label",
                            "mip": "table_id",
                            "project": "project",
                            "short_name": "variable_id",
                        },
                    },
                },
            },
        },
    },
)
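
Why the existing data entry is removed before calling CFG.nested_update can be illustrated with a plain-Python sketch. It assumes that a nested update merges dictionaries recursively instead of replacing them wholesale. The nested_update helper and the "local" data source below are hypothetical stand-ins for illustration, not ESMValCore's implementation or defaults.

```python
from copy import deepcopy


def nested_update(target: dict, new: dict) -> dict:
    """Recursively merge ``new`` into a copy of ``target`` (illustrative only)."""
    result = deepcopy(target)
    for key, value in new.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are dictionaries: merge them key by key.
            result[key] = nested_update(result[key], value)
        else:
            # Otherwise the new value replaces the old one.
            result[key] = deepcopy(value)
    return result


# Hypothetical starting configuration with one pre-existing data source.
config = {"projects": {"CMIP6": {"data": {"local": {"priority": 1}}}}}
update = {"projects": {"CMIP6": {"data": {"intake-esgf": {"priority": 2}}}}}

# Without clearing "data" first, the old source survives the merge.
merged = nested_update(config, update)
print(sorted(merged["projects"]["CMIP6"]["data"]))

# Clearing "data" first leaves only the newly configured source.
cleared = deepcopy(config)
cleared["projects"]["CMIP6"].pop("data", None)
merged_after_clear = nested_update(cleared, update)
print(sorted(merged_after_clear["projects"]["CMIP6"]["data"]))
```

Under this assumption, popping the entry first ensures that the ESGF source is the only place searched for CMIP6 data.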

Which variables can be derived?

The interface for working with derived variables from Python is not very polished yet. To list all available derived variables, we can run:

list(esmvalcore.preprocessor._derive.ALL_DERIVED_VARIABLES)  # noqa: SLF001
['clmmtisccp',
 'sfcwind',
 'clhtkisccp',
 'rsntcs',
 'swcre',
 'hfns',
 'troz',
 'cllmtisccp',
 'lwcre',
 'hurs',
 'rsus',
 'clltkisccp',
 'rlntcs',
 'rlnst',
 'sispeed',
 'sm',
 'soz',
 'rtnt',
 'xco2',
 'vegfrac',
 'lvp',
 'toz',
 'rsnstcsnorm',
 'ohc',
 'xch4',
 'rsnst',
 'lapserate',
 'rsnt',
 'rsns',
 'alb',
 'siextent',
 'netcre',
 'chlora',
 'rlnstcs',
 'rsnstcs',
 'uajet',
 'clmtkisccp',
 'sithick',
 'amoc',
 'rlns',
 'co2s',
 'lwp',
 'ctotal',
 'qep',
 'asr',
 'rlus',
 'clhmtisccp',
 'et']

Note that modules, functions, and variables whose names start with a single _ character should be considered internal; there are no guarantees about the stability of this interface. Guidance on adding new derived variables to ESMValCore is available in Deriving a variable.
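
Because the listing above is unordered, a sorted view and a membership test are often easier to work with. The sketch below uses a hand-copied subset of the printed names as a stand-in for the real collection; the same expressions should work on the real object, assuming it supports iteration and the in operator.

```python
# Stand-in for esmvalcore.preprocessor._derive.ALL_DERIVED_VARIABLES:
# a hand-copied subset of the names shown in the output above.
derived_variables = ["swcre", "lwcre", "netcre", "rsnt", "rtnt", "toz", "amoc"]

# Sort the names for readability.
sorted_names = sorted(derived_variables)
print(sorted_names)

# Select the cloud radiative effect variables by their common suffix.
cre_names = [name for name in sorted_names if name.endswith("cre")]
print(cre_names)

# Check whether specific variables are in the collection.
for name in ("lwcre", "tas"):
    status = "derivable" if name in derived_variables else "not in the list"
    print(f"{name}: {status}")
```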

Finding available datasets

We define a dataset template to search for all CMIP6 models that provide every input dataset required to derive lwcre, the longwave cloud radiative effect at the top of the atmosphere, at monthly resolution for the historical experiment. Note that ESMValCore uses its own facet names to keep naming uniform across CMIP phases and other projects. The mapping to the facet names used on ESGF can be found in Facets.

dataset_template = Dataset(
    short_name="lwcre",
    mip="Amon",
    project="CMIP6",
    exp="historical",
    dataset="*",
    institute="*",
    ensemble="r1i1p1f1",
    grid="gn",
)
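
To make the facet renaming concrete, the sketch below translates ESMValCore facet names into ESGF facet names using the mapping from the intake-esgf configuration earlier in this notebook. The to_esgf_facets helper is hypothetical and only illustrates the renaming; the actual search is carried out by the configured data source. The example uses rlut, one of the input variables, because a derived variable such as lwcre is not itself published on ESGF.

```python
# ESMValCore facet name -> ESGF facet name, copied from the
# intake-esgf configuration earlier in this notebook.
FACET_MAP = {
    "activity": "activity_drs",
    "dataset": "source_id",
    "ensemble": "member_id",
    "exp": "experiment_id",
    "institute": "institution_id",
    "grid": "grid_label",
    "mip": "table_id",
    "project": "project",
    "short_name": "variable_id",
}


def to_esgf_facets(facets: dict) -> dict:
    """Rename ESMValCore facets to ESGF names, skipping unmapped facets."""
    return {FACET_MAP[key]: value for key, value in facets.items() if key in FACET_MAP}


esgf_query = to_esgf_facets(
    {
        "short_name": "rlut",
        "mip": "Amon",
        "project": "CMIP6",
        "exp": "historical",
        "ensemble": "r1i1p1f1",
        "grid": "gn",
    }
)
print(esgf_query)
```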

Next, we use the Dataset.derived_variable_from_files method to build a list of datasets from the available files. This may take a while, because searching the ESGF for many files can be slow. Because the search results are cached, subsequent searches will be faster.

datasets = list(dataset_template.derived_variable_from_files())
print(f"Found {len(datasets)} datasets, showing the first 3 pairs:")
datasets[:3]
Found 37 datasets, showing the first 3 pairs:
[(Dataset:
  {'dataset': 'TaiESM1',
   'project': 'CMIP6',
   'mip': 'Amon',
   'short_name': 'rlut',
   'ensemble': 'r1i1p1f1',
   'exp': 'historical',
   'grid': 'gn',
   'institute': 'AS-RCEC'},
  Dataset:
  {'dataset': 'TaiESM1',
   'project': 'CMIP6',
   'mip': 'Amon',
   'short_name': 'rlutcs',
   'ensemble': 'r1i1p1f1',
   'exp': 'historical',
   'grid': 'gn',
   'institute': 'AS-RCEC'}),
 (Dataset:
  {'dataset': 'AWI-CM-1-1-MR',
   'project': 'CMIP6',
   'mip': 'Amon',
   'short_name': 'rlut',
   'ensemble': 'r1i1p1f1',
   'exp': 'historical',
   'grid': 'gn',
   'institute': 'AWI'},
  Dataset:
  {'dataset': 'AWI-CM-1-1-MR',
   'project': 'CMIP6',
   'mip': 'Amon',
   'short_name': 'rlutcs',
   'ensemble': 'r1i1p1f1',
   'exp': 'historical',
   'grid': 'gn',
   'institute': 'AWI'}),
 (Dataset:
  {'dataset': 'AWI-ESM-1-1-LR',
   'project': 'CMIP6',
   'mip': 'Amon',
   'short_name': 'rlut',
   'ensemble': 'r1i1p1f1',
   'exp': 'historical',
   'grid': 'gn',
   'institute': 'AWI'},
  Dataset:
  {'dataset': 'AWI-ESM-1-1-LR',
   'project': 'CMIP6',
   'mip': 'Amon',
   'short_name': 'rlutcs',
   'ensemble': 'r1i1p1f1',
   'exp': 'historical',
   'grid': 'gn',
   'institute': 'AWI'})]

This returned 37 tuples, one per model, each containing the input datasets required to derive lwcre. We can see that lwcre is derived from the variables rlut and rlutcs.
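
The structure of this result can be summarised per model. The sketch below uses plain dictionaries as stand-ins for the Dataset objects, keeping only the two facets needed here. Reading facets from real Dataset objects with the same square-bracket syntax is an assumption.

```python
# Stand-ins for the tuples of Dataset objects shown above; only the
# "dataset" and "short_name" facets are kept for this illustration.
found = [
    (
        {"dataset": "TaiESM1", "short_name": "rlut"},
        {"dataset": "TaiESM1", "short_name": "rlutcs"},
    ),
    (
        {"dataset": "AWI-CM-1-1-MR", "short_name": "rlut"},
        {"dataset": "AWI-CM-1-1-MR", "short_name": "rlutcs"},
    ),
]

# Map each model name to the input variables found for it.
inputs_per_model = {
    group[0]["dataset"]: [d["short_name"] for d in group] for group in found
}
print(inputs_per_model)

# Collect the distinct input variables across all models.
required_inputs = sorted({name for names in inputs_per_model.values() for name in names})
print(required_inputs)
```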

Composing a recipe with derived variables

To use the datasets found above in a recipe, we specify the name of the variable to be derived, along with the derive: true option:

recipe_datasets = [
    input_datasets[0].copy(
        short_name="lwcre",
        diagnostic="diagnostic_name",
        derive=True,
    )
    for input_datasets in datasets
]
print(yaml.safe_dump(datasets_to_recipe(recipe_datasets)))
datasets:
- dataset: ACCESS-CM2
  institute: CSIRO-ARCCSS
- dataset: ACCESS-ESM1-5
  institute: CSIRO
- dataset: AWI-CM-1-1-MR
  institute: AWI
- dataset: AWI-ESM-1-1-LR
  institute: AWI
- dataset: BCC-CSM2-MR
  institute: BCC
- dataset: BCC-ESM1
  institute: BCC
- dataset: CAMS-CSM1-0
  institute: CAMS
- dataset: CAS-ESM2-0
  institute: CAS
- dataset: CESM2
  institute: NCAR
- dataset: CESM2-FV2
  institute: NCAR
- dataset: CESM2-WACCM
  institute: NCAR
- dataset: CESM2-WACCM-FV2
  institute: NCAR
- dataset: CMCC-CM2-HR4
  institute: CMCC
- dataset: CMCC-CM2-SR5
  institute: CMCC
- dataset: CMCC-ESM2
  institute: CMCC
- dataset: CanESM5
  institute: CCCma
- dataset: CanESM5-1
  institute: CCCma
- dataset: FGOALS-g3
  institute: CAS
- dataset: FIO-ESM-2-0
  institute: FIO-QLNM
- dataset: GISS-E2-1-G
  institute: NASA-GISS
- dataset: GISS-E2-1-G-CC
  institute: NASA-GISS
- dataset: GISS-E2-1-H
  institute: NASA-GISS
- dataset: GISS-E2-2-G
  institute: NASA-GISS
- dataset: GISS-E2-2-H
  institute: NASA-GISS
- dataset: ICON-ESM-LR
  institute: MPI-M
- dataset: IITM-ESM
  institute: CCCR-IITM
- dataset: MIROC6
  institute: MIROC
- dataset: MPI-ESM-1-2-HAM
  institute: HAMMOZ-Consortium
- dataset: MPI-ESM1-2-HR
  institute: MPI-M
- dataset: MPI-ESM1-2-LR
  institute: MPI-M
- dataset: MRI-ESM2-0
  institute: MRI
- dataset: NESM3
  institute: NUIST
- dataset: NorCPM1
  institute: NCC
- dataset: NorESM2-LM
  institute: NCC
- dataset: NorESM2-MM
  institute: NCC
- dataset: SAM0-UNICON
  institute: SNU
- dataset: TaiESM1
  institute: AS-RCEC
diagnostics:
  diagnostic_name:
    variables:
      lwcre:
        derive: true
        ensemble: r1i1p1f1
        exp: historical
        grid: gn
        mip: Amon
        project: CMIP6

A force_derivation option is also available for use in the recipe. When set to true, the variable is derived even if it is already available as a dataset.
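
One way to read the interaction between derive and force_derivation is sketched below. The should_derive helper is hypothetical and not ESMValCore code. It assumes that with derive: true alone, a variable that is already available as a dataset is used directly, and that force_derivation has no effect unless derive is also set.

```python
def should_derive(*, derive: bool, force_derivation: bool, available: bool) -> bool:
    """Return True if the variable would be computed from its inputs.

    ``available`` states whether the variable itself exists as a dataset.
    Illustrative reading of the two recipe options, not the actual logic.
    """
    if not derive:
        return False
    return force_derivation or not available


# The variable is not published: it has to be derived.
print(should_derive(derive=True, force_derivation=False, available=False))

# The variable is published and derivation is not forced: use it directly.
print(should_derive(derive=True, force_derivation=False, available=True))

# The variable is published but derivation is forced: derive it anyway.
print(should_derive(derive=True, force_derivation=True, available=True))
```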

Computing the derived variable

Let’s load the input data needed to derive the variable for the first dataset:

cubes = [d.load() for d in datasets[0]]
cubes
WARNING:esmvalcore.cmor.check:There were warnings in variable rlut:
 rlut: attribute positive not present
loaded from file 
WARNING:esmvalcore.cmor.check:There were warnings in variable rlutcs:
 rlutcs: attribute positive not present
loaded from file 
[<iris 'Cube' of toa_outgoing_longwave_flux / (W m-2) (time: 1980; latitude: 192; longitude: 288)>,
 <iris 'Cube' of toa_outgoing_longwave_flux_assuming_clear_sky / (W m-2) (time: 1980; latitude: 192; longitude: 288)>]

Because the interface for using derived variables from Python isn’t very polished yet, we need to pass in some arguments, which can be retrieved from the CMOR table:

var_info = get_tables(CFG, project="CMIP6").get_variable(
    table_name="Amon",
    short_name="lwcre",
    derived=True,
)
kwargs = {
    k: getattr(var_info, k) for k in ["short_name", "long_name", "units"]
}
kwargs
{'short_name': 'lwcre',
 'long_name': 'TOA Longwave Cloud Radiative Effect',
 'units': 'W m-2'}

Now we are ready to derive the variable:

cube = esmvalcore.preprocessor.derive(cubes, **kwargs)
cube
Toa Longwave Cloud Radiative Effect (W m-2) time latitude longitude
Shape 1980 192 288
Dimension coordinates
time x - -
latitude - x -
longitude - - x
Attributes
Conventions 'CF-1.7 CMIP-6.2'
activity_drs 'CMIP'
activity_id 'CMIP'
branch_method 'Hybrid-restart from year 0671-01-01 of piControl'
branch_time 0.0
branch_time_in_child -674885
branch_time_in_parent 171550.0
cmor_version '3.5.0'
contact 'Dr. Wei-Liang Lee (leelupin@gate.sinica.edu.tw)'
data_specs_version '01.00.31'
experiment 'all-forcing simulation of the recent past'
experiment_id 'historical'
external_variables 'areacella'
forcing_index 1
frequency 'mon'
further_info_url 'https://furtherinfo.es-doc.org/CMIP6.AS-RCEC.TaiESM1.historical.none.r ...'
grid 'finite-volume grid with 0.9x1.25 degree lat/lon resolution'
grid_label 'gn'
initialization_index 1
institution 'Research Center for Environmental Changes, Academia Sinica, Nankang, Taipei ...'
institution_id 'AS-RCEC'
license 'CMIP6 model data produced by NCC is licensed under a Creative Commons Attribution ...'
member_id 'r1i1p1f1'
mip_era 'CMIP6'
model_id 'TaiESM1'
nominal_resolution '100 km'
original_units 'W/m2'
parent_activity_id 'CMIP'
parent_experiment_id 'piControl'
parent_mip_era 'CMIP6'
parent_source_id 'TaiESM1'
parent_sub_experiment_id 'none'
parent_time_units 'days since 1850-1-1 00:00:00'
parent_variant_label 'r1i1p1f1'
physics_index 1
positive 'down'
product 'model-output'
realization_index 1
realm 'atmos'
references '10.5194/gmd-2019-377'
run_variant 'N/A'
source 'TaiESM 1.0 (2018): \naerosol: SNAP (same grid as atmos)\natmos: TaiAM1 ...'
source_id 'TaiESM1'
source_type 'AOGCM AER BGC'
sub_experiment 'none'
sub_experiment_id 'none'
table_id 'Amon'
table_info 'Creation Date:(24 July 2019) MD5:0bb394a356ef9d214d027f1aca45853e'
title 'TaiESM1 output prepared for CMIP6'
variant_label 'r1i1p1f1'
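
For intuition about what the derivation computes, the sketch below reproduces the arithmetic on a few made-up numbers. It assumes that lwcre is the clear-sky outgoing longwave flux (rlutcs) minus the all-sky outgoing longwave flux (rlut), which is consistent with the positive 'down' attribute in the output above. The flux values are hypothetical, not taken from TaiESM1.

```python
# Hypothetical monthly-mean fluxes in W m-2 for three grid cells.
rlut = [238.5, 221.0, 265.25]  # all-sky TOA outgoing longwave flux
rlutcs = [262.0, 250.5, 270.25]  # clear-sky TOA outgoing longwave flux

# Assumed definition: clear-sky minus all-sky outgoing longwave flux.
# Clouds reduce the outgoing flux, so the result is typically positive.
lwcre = [clear_sky - all_sky for clear_sky, all_sky in zip(rlutcs, rlut)]
print(lwcre)
```

The real preprocessor function does the same kind of element-wise operation on whole Iris cubes and also sets metadata such as the name and units passed in via kwargs.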