Derived variables
This notebook shows how to use derived variables. A derived variable is not available as an input dataset but is computed from one or more input variables.
import yaml
import esmvalcore.preprocessor
from esmvalcore.cmor.table import get_tables
from esmvalcore.config import CFG
from esmvalcore.dataset import Dataset, datasets_to_recipe
First, we configure ESMValCore so it searches the ESGF for data:
CFG["projects"]["CMIP6"].pop(
    "data",
    None,
)  # Clear existing CMIP6 configuration for finding input data
CFG.nested_update(
    {
        "projects": {
            "CMIP6": {
                "data": {
                    "intake-esgf": {
                        "type": "esmvalcore.io.intake_esgf.IntakeESGFDataSource",
                        "priority": 2,
                        "facets": {
                            "activity": "activity_drs",
                            "dataset": "source_id",
                            "ensemble": "member_id",
                            "exp": "experiment_id",
                            "institute": "institution_id",
                            "grid": "grid_label",
                            "mip": "table_id",
                            "project": "project",
                            "short_name": "variable_id",
                        },
                    },
                },
            },
        },
    },
)
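The facets entry in the configuration above maps ESMValCore facet names (keys) to ESGF facet names (values). As a rough illustration of what that mapping means, the following plain-Python sketch renames a set of ESMValCore facets to ESGF search terms. The helper to_esgf_facets is hypothetical and not part of ESMValCore; it simply drops facets that have no ESGF name.

```python
# Facet mapping copied from the configuration above:
# ESMValCore facet name -> ESGF facet name.
FACETS = {
    "activity": "activity_drs",
    "dataset": "source_id",
    "ensemble": "member_id",
    "exp": "experiment_id",
    "institute": "institution_id",
    "grid": "grid_label",
    "mip": "table_id",
    "project": "project",
    "short_name": "variable_id",
}


def to_esgf_facets(facets: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: rename ESMValCore facets to ESGF facet names.

    Facets without an entry in FACETS are dropped, because there is no
    ESGF name to search on.
    """
    return {FACETS[key]: value for key, value in facets.items() if key in FACETS}


query = to_esgf_facets(
    {
        "short_name": "rlut",
        "mip": "Amon",
        "project": "CMIP6",
        "exp": "historical",
        "ensemble": "r1i1p1f1",
        "grid": "gn",
    }
)
print(query)
```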
Which variables can be derived?
The interface for working with derived variables from Python is not very polished yet. To list all available derived variables, we can run:
list(esmvalcore.preprocessor._derive.ALL_DERIVED_VARIABLES) # noqa: SLF001
['clmmtisccp',
'sfcwind',
'clhtkisccp',
'rsntcs',
'swcre',
'hfns',
'troz',
'cllmtisccp',
'lwcre',
'hurs',
'rsus',
'clltkisccp',
'rlntcs',
'rlnst',
'sispeed',
'sm',
'soz',
'rtnt',
'xco2',
'vegfrac',
'lvp',
'toz',
'rsnstcsnorm',
'ohc',
'xch4',
'rsnst',
'lapserate',
'rsnt',
'rsns',
'alb',
'siextent',
'netcre',
'chlora',
'rlnstcs',
'rsnstcs',
'uajet',
'clmtkisccp',
'sithick',
'amoc',
'rlns',
'co2s',
'lwp',
'ctotal',
'qep',
'asr',
'rlus',
'clhmtisccp',
'et']
Note that modules, functions, and variables whose names start with a single _ character are considered internal, and there is no guarantee that this interface will remain stable. Guidance on adding new derived variables to ESMValCore is available in Deriving a variable.
Finding available datasets
We define a dataset template to search for all CMIP6 models that provide every input dataset required to derive lwcre, the longwave cloud radiative effect at the top of the atmosphere, at monthly resolution for the historical experiment. Note that ESMValCore uses its own facet names to achieve more uniform naming across CMIP phases and other projects. The mapping to the facet names used on ESGF can be found in Facets.
dataset_template = Dataset(
    short_name="lwcre",
    mip="Amon",
    project="CMIP6",
    exp="historical",
    dataset="*",
    institute="*",
    ensemble="r1i1p1f1",
    grid="gn",
)
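The "*" values in the template act as wildcards: any value is accepted for those facets, while the other facets must match exactly. The sketch below mimics that idea with fnmatch over a few made-up records (ModelA, ModelB, and ModelC are invented names); the matches helper is an illustrative assumption, not ESMValCore's search code.

```python
from fnmatch import fnmatchcase

# Template with a wildcard for the dataset facet.
template = {
    "mip": "Amon",
    "exp": "historical",
    "dataset": "*",
    "ensemble": "r1i1p1f1",
    "grid": "gn",
}

# Made-up records standing in for search results.
available = [
    {"mip": "Amon", "exp": "historical", "dataset": "ModelA", "ensemble": "r1i1p1f1", "grid": "gn"},
    {"mip": "Amon", "exp": "historical", "dataset": "ModelB", "ensemble": "r1i1p1f1", "grid": "gr"},
    {"mip": "Amon", "exp": "historical", "dataset": "ModelC", "ensemble": "r2i1p1f1", "grid": "gn"},
]


def matches(record: dict, template: dict) -> bool:
    """Return True if every template facet matches the record ("*" matches anything)."""
    return all(
        fnmatchcase(str(record.get(key, "")), pattern)
        for key, pattern in template.items()
    )


found = [record["dataset"] for record in available if matches(record, template)]
print(found)  # ['ModelA']
```

ModelB is rejected because of its grid and ModelC because of its ensemble member, while the dataset name itself is never a reason for rejection.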
Next, we use the Dataset.derived_variable_from_files method to find, for each model, the input datasets needed to derive the variable from the available files. This may take a while, because searching ESGF for many files can be slow. The search results are cached, so subsequent searches will be faster.
datasets = list(dataset_template.derived_variable_from_files())
print(f"Found {len(datasets)} datasets, showing the first 3 pairs:")
datasets[:3]
Found 37 datasets, showing the first 3 pairs:
[(Dataset:
{'dataset': 'TaiESM1',
'project': 'CMIP6',
'mip': 'Amon',
'short_name': 'rlut',
'ensemble': 'r1i1p1f1',
'exp': 'historical',
'grid': 'gn',
'institute': 'AS-RCEC'},
Dataset:
{'dataset': 'TaiESM1',
'project': 'CMIP6',
'mip': 'Amon',
'short_name': 'rlutcs',
'ensemble': 'r1i1p1f1',
'exp': 'historical',
'grid': 'gn',
'institute': 'AS-RCEC'}),
(Dataset:
{'dataset': 'AWI-CM-1-1-MR',
'project': 'CMIP6',
'mip': 'Amon',
'short_name': 'rlut',
'ensemble': 'r1i1p1f1',
'exp': 'historical',
'grid': 'gn',
'institute': 'AWI'},
Dataset:
{'dataset': 'AWI-CM-1-1-MR',
'project': 'CMIP6',
'mip': 'Amon',
'short_name': 'rlutcs',
'ensemble': 'r1i1p1f1',
'exp': 'historical',
'grid': 'gn',
'institute': 'AWI'}),
(Dataset:
{'dataset': 'AWI-ESM-1-1-LR',
'project': 'CMIP6',
'mip': 'Amon',
'short_name': 'rlut',
'ensemble': 'r1i1p1f1',
'exp': 'historical',
'grid': 'gn',
'institute': 'AWI'},
Dataset:
{'dataset': 'AWI-ESM-1-1-LR',
'project': 'CMIP6',
'mip': 'Amon',
'short_name': 'rlutcs',
'ensemble': 'r1i1p1f1',
'exp': 'historical',
'grid': 'gn',
'institute': 'AWI'})]
Each of the 37 results is a tuple containing the input datasets required to derive lwcre for one model. The tuples show that lwcre is derived from the variables rlut and rlutcs.
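A model only appears in these results when every required input variable was found for it. The sketch below illustrates that filtering on made-up search results; complete_datasets is a hypothetical helper, not the logic used inside derived_variable_from_files.

```python
from collections import defaultdict

# Input variables needed to derive lwcre (rlut and rlutcs, as listed above).
REQUIRED = {"rlut", "rlutcs"}

# Made-up search results: (dataset, short_name) pairs that were found.
found_files = [
    ("ModelA", "rlut"),
    ("ModelA", "rlutcs"),
    ("ModelB", "rlut"),  # ModelB lacks rlutcs, so lwcre cannot be derived
    ("ModelC", "rlutcs"),
    ("ModelC", "rlut"),
]


def complete_datasets(found, required):
    """Hypothetical helper: return the datasets that provide every required input."""
    available = defaultdict(set)
    for dataset, short_name in found:
        available[dataset].add(short_name)
    return sorted(name for name, names in available.items() if required <= names)


print(complete_datasets(found_files, REQUIRED))  # ['ModelA', 'ModelC']
```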
Composing a recipe with derived variables
To use the datasets found above in a recipe, we take the first input dataset of each tuple, set its short_name to the variable to be derived (lwcre), assign it to a diagnostic, and add the derive: true option:
recipe_datasets = [
    input_datasets[0].copy(
        short_name="lwcre",
        diagnostic="diagnostic_name",
        derive=True,
    )
    for input_datasets in datasets
]
print(yaml.safe_dump(datasets_to_recipe(recipe_datasets)))
datasets:
- dataset: ACCESS-CM2
  institute: CSIRO-ARCCSS
- dataset: ACCESS-ESM1-5
  institute: CSIRO
- dataset: AWI-CM-1-1-MR
  institute: AWI
- dataset: AWI-ESM-1-1-LR
  institute: AWI
- dataset: BCC-CSM2-MR
  institute: BCC
- dataset: BCC-ESM1
  institute: BCC
- dataset: CAMS-CSM1-0
  institute: CAMS
- dataset: CAS-ESM2-0
  institute: CAS
- dataset: CESM2
  institute: NCAR
- dataset: CESM2-FV2
  institute: NCAR
- dataset: CESM2-WACCM
  institute: NCAR
- dataset: CESM2-WACCM-FV2
  institute: NCAR
- dataset: CMCC-CM2-HR4
  institute: CMCC
- dataset: CMCC-CM2-SR5
  institute: CMCC
- dataset: CMCC-ESM2
  institute: CMCC
- dataset: CanESM5
  institute: CCCma
- dataset: CanESM5-1
  institute: CCCma
- dataset: FGOALS-g3
  institute: CAS
- dataset: FIO-ESM-2-0
  institute: FIO-QLNM
- dataset: GISS-E2-1-G
  institute: NASA-GISS
- dataset: GISS-E2-1-G-CC
  institute: NASA-GISS
- dataset: GISS-E2-1-H
  institute: NASA-GISS
- dataset: GISS-E2-2-G
  institute: NASA-GISS
- dataset: GISS-E2-2-H
  institute: NASA-GISS
- dataset: ICON-ESM-LR
  institute: MPI-M
- dataset: IITM-ESM
  institute: CCCR-IITM
- dataset: MIROC6
  institute: MIROC
- dataset: MPI-ESM-1-2-HAM
  institute: HAMMOZ-Consortium
- dataset: MPI-ESM1-2-HR
  institute: MPI-M
- dataset: MPI-ESM1-2-LR
  institute: MPI-M
- dataset: MRI-ESM2-0
  institute: MRI
- dataset: NESM3
  institute: NUIST
- dataset: NorCPM1
  institute: NCC
- dataset: NorESM2-LM
  institute: NCC
- dataset: NorESM2-MM
  institute: NCC
- dataset: SAM0-UNICON
  institute: SNU
- dataset: TaiESM1
  institute: AS-RCEC
diagnostics:
  diagnostic_name:
    variables:
      lwcre:
        derive: true
        ensemble: r1i1p1f1
        exp: historical
        grid: gn
        mip: Amon
        project: CMIP6
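In this output, the facets that are identical for all datasets (derive, ensemble, exp, grid, mip, and project) appear once under the lwcre variable, while the facets that differ (dataset and institute) are listed per dataset. The plain-Python sketch below reproduces that grouping idea; split_common_facets is a hypothetical helper and not the implementation of datasets_to_recipe.

```python
def split_common_facets(datasets):
    """Hypothetical helper: split facets into shared and per-dataset parts.

    Facets with the same value in every dataset are returned once as
    `common`; all remaining facets are kept per dataset.
    """
    first = datasets[0]
    common = {
        key: value
        for key, value in first.items()
        if all(d.get(key) == value for d in datasets)
    }
    per_dataset = [
        {key: value for key, value in d.items() if key not in common}
        for d in datasets
    ]
    return common, per_dataset


example_datasets = [
    {"dataset": "AWI-CM-1-1-MR", "institute": "AWI", "mip": "Amon",
     "exp": "historical", "derive": True},
    {"dataset": "TaiESM1", "institute": "AS-RCEC", "mip": "Amon",
     "exp": "historical", "derive": True},
]
common, per_dataset = split_common_facets(example_datasets)
print(common)  # {'mip': 'Amon', 'exp': 'historical', 'derive': True}
print(per_dataset)
```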
The recipe also supports a force_derivation option: when set to true, the variable is derived even if it is already available as an input dataset.
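The interplay between derive and force_derivation can be summarized as a small decision function. This is a hypothetical sketch of the behaviour described above, assuming force_derivation only has an effect together with derive: true; the available argument stands for whether the variable itself was found as input files.

```python
def should_derive(derive: bool, force_derivation: bool, available: bool) -> bool:
    """Hypothetical summary of the derive/force_derivation recipe options.

    With derive: true, the variable is derived only if it is not already
    available as input data, unless force_derivation: true is also set.
    """
    if not derive:
        # Assumption: force_derivation is ignored without derive: true.
        return False
    return force_derivation or not available


print(should_derive(derive=True, force_derivation=False, available=True))  # False
print(should_derive(derive=True, force_derivation=True, available=True))   # True
```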
Computing the derived variable
Let’s load the input data needed to derive the variable for the first dataset (TaiESM1):
cubes = [d.load() for d in datasets[0]]
cubes
WARNING:esmvalcore.cmor.check:There were warnings in variable rlut:
rlut: attribute positive not present
loaded from file
WARNING:esmvalcore.cmor.check:There were warnings in variable rlutcs:
rlutcs: attribute positive not present
loaded from file
[<iris 'Cube' of toa_outgoing_longwave_flux / (W m-2) (time: 1980; latitude: 192; longitude: 288)>,
<iris 'Cube' of toa_outgoing_longwave_flux_assuming_clear_sky / (W m-2) (time: 1980; latitude: 192; longitude: 288)>]
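The two cubes hold the all-sky (rlut) and clear-sky (rlutcs) outgoing longwave flux at the top of the atmosphere. Assuming lwcre is defined as rlutcs - rlut (the usual convention, in which clouds that reduce the outgoing longwave flux give a positive effect), the derivation is a point-by-point subtraction. A toy illustration on plain Python lists with made-up values:

```python
# Made-up TOA outgoing longwave fluxes (W m-2) at three grid points.
rlut = [240.0, 210.0, 255.0]    # all-sky
rlutcs = [265.0, 250.0, 260.0]  # clear-sky

# Assumed definition: lwcre = rlutcs - rlut, evaluated point by point.
lwcre = [clear - allsky for clear, allsky in zip(rlutcs, rlut)]
print(lwcre)  # [25.0, 40.0, 5.0]
```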
Because the Python interface for derived variables isn’t very polished yet, we need to pass some arguments to the derive function ourselves. These can be retrieved from the CMOR table:
var_info = get_tables(CFG, project="CMIP6").get_variable(
    table_name="Amon",
    short_name="lwcre",
    derived=True,
)
kwargs = {
    k: getattr(var_info, k) for k in ["short_name", "long_name", "units"]
}
kwargs
{'short_name': 'lwcre',
'long_name': 'TOA Longwave Cloud Radiative Effect',
'units': 'W m-2'}
Now we are ready to derive the variable:
cube = esmvalcore.preprocessor.derive(cubes, **kwargs)
cube
| Toa Longwave Cloud Radiative Effect (W m-2) | time | latitude | longitude |
|---|---|---|---|
| Shape | 1980 | 192 | 288 |
| Dimension coordinates | |||
| time | x | - | - |
| latitude | - | x | - |
| longitude | - | - | x |
| Attributes | |||
| Conventions | 'CF-1.7 CMIP-6.2' | ||
| activity_drs | 'CMIP' | ||
| activity_id | 'CMIP' | ||
| branch_method | 'Hybrid-restart from year 0671-01-01 of piControl' | ||
| branch_time | 0.0 | ||
| branch_time_in_child | -674885 | ||
| branch_time_in_parent | 171550.0 | ||
| cmor_version | '3.5.0' | ||
| contact | 'Dr. Wei-Liang Lee (leelupin@gate.sinica.edu.tw)' | ||
| data_specs_version | '01.00.31' | ||
| experiment | 'all-forcing simulation of the recent past' | ||
| experiment_id | 'historical' | ||
| external_variables | 'areacella' | ||
| forcing_index | 1 | ||
| frequency | 'mon' | ||
| further_info_url | 'https://furtherinfo.es-doc.org/CMIP6.AS-RCEC.TaiESM1.historical.none.r ...' | ||
| grid | 'finite-volume grid with 0.9x1.25 degree lat/lon resolution' | ||
| grid_label | 'gn' | ||
| initialization_index | 1 | ||
| institution | 'Research Center for Environmental Changes, Academia Sinica, Nankang, Taipei ...' | ||
| institution_id | 'AS-RCEC' | ||
| license | 'CMIP6 model data produced by NCC is licensed under a Creative Commons Attribution ...' | ||
| member_id | 'r1i1p1f1' | ||
| mip_era | 'CMIP6' | ||
| model_id | 'TaiESM1' | ||
| nominal_resolution | '100 km' | ||
| original_units | 'W/m2' | ||
| parent_activity_id | 'CMIP' | ||
| parent_experiment_id | 'piControl' | ||
| parent_mip_era | 'CMIP6' | ||
| parent_source_id | 'TaiESM1' | ||
| parent_sub_experiment_id | 'none' | ||
| parent_time_units | 'days since 1850-1-1 00:00:00' | ||
| parent_variant_label | 'r1i1p1f1' | ||
| physics_index | 1 | ||
| positive | 'down' | ||
| product | 'model-output' | ||
| realization_index | 1 | ||
| realm | 'atmos' | ||
| references | '10.5194/gmd-2019-377' | ||
| run_variant | 'N/A' | ||
| source | 'TaiESM 1.0 (2018): \naerosol: SNAP (same grid as atmos)\natmos: TaiAM1 ...' | ||
| source_id | 'TaiESM1' | ||
| source_type | 'AOGCM AER BGC' | ||
| sub_experiment | 'none' | ||
| sub_experiment_id | 'none' | ||
| table_id | 'Amon' | ||
| table_info | 'Creation Date:(24 July 2019) MD5:0bb394a356ef9d214d027f1aca45853e' | ||
| title | 'TaiESM1 output prepared for CMIP6' | ||
| variant_label | 'r1i1p1f1' | ||
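Conceptually, the derive call above looks up the function that computes lwcre, applies it to the input cubes, and then sets the variable name, long name, and units taken from the CMOR table, as the resulting cube's name and units suggest. The sketch below models only that flow in plain Python; SimpleCube, DERIVE_FUNCTIONS, and derive_sketch are hypothetical stand-ins rather than iris or ESMValCore objects, and lwcre = rlutcs - rlut is assumed.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleCube:
    """Hypothetical stand-in for an iris cube: data plus a few metadata fields."""

    var_name: str
    data: list[float]
    long_name: str = ""
    units: str = ""
    attributes: dict = field(default_factory=dict)


def _lwcre(cubes):
    """Assumed recipe for lwcre: clear-sky minus all-sky outgoing flux."""
    by_name = {cube.var_name: cube for cube in cubes}
    data = [cs - a for cs, a in zip(by_name["rlutcs"].data, by_name["rlut"].data)]
    return SimpleCube(var_name="lwcre", data=data)


# Hypothetical registry mapping a derived short_name to its function.
DERIVE_FUNCTIONS = {"lwcre": _lwcre}


def derive_sketch(cubes, short_name, long_name, units):
    """Look up the function, compute the result, then attach CMOR metadata."""
    cube = DERIVE_FUNCTIONS[short_name](cubes)
    cube.var_name = short_name
    cube.long_name = long_name
    cube.units = units
    return cube


inputs = [
    SimpleCube("rlut", [240.0, 210.0], units="W m-2"),
    SimpleCube("rlutcs", [265.0, 250.0], units="W m-2"),
]
result = derive_sketch(
    inputs,
    short_name="lwcre",
    long_name="TOA Longwave Cloud Radiative Effect",
    units="W m-2",
)
print(result.var_name, result.data, result.units)  # lwcre [25.0, 40.0] W m-2
```

Keeping the computation in a per-variable function and the metadata handling in one shared wrapper means each new derived variable only needs to supply its formula.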