ECE3-POSTPROC is a suite of post-processing tools for EC-Earth 3. It includes HIRESCLIM2, ECMEAN and AMWG. The last two require the output from the first one. A REPRODUCIBILITY TEST is also provided. Other tools are available but have not been clean up or tested.
The code has been ported on cca (ECMWF), rhino (KNMI), marconi (CNR), and marenostrum4 (BSC). See the Porting instructions below to include other machines.
You first need to get the code:
git clone https://github.com/plesager/ece3-postproc.git
The general idea is that you pass the experiment name (EXP) and the years to process (typically a range, say, YEAR1 YEAR2) as input to the scripts on the command line.
For this to work as intended, locations of the post-processing code, of the EC-Earth model data, and your platform, must be known. They are unlikely to change and set as environment variables, so in your shell rc file (bash.rc or similar):
export ECE3_POSTPROC_TOPDIR=<dir where this file is> export ECE3_POSTPROC_DATADIR=<dir where your ecearth init data (not run output) are located> export ECE3_POSTPROC_MACHINE=<name of your (HPC) machine>
The ”name of your (HPC) machine” is used to retrieve the machine configuration. The list of available platforms is:
ls ./conf/
If yours is not present, you need to port the code. See the Porting section below.
Optionally, you can also set your HPC account:
export ECE3_POSTPROC_ACCOUNT=<HPC account>
If not set, your default account is used to submit job to HPC. At ECMWF, this is the 1st one in the list you get with the command (on ecgate only): “account -l $USER”
If you want to temporarily change the HPC account, you can also use the command line option when calling each tool.
Finally, the code relies on $SCRATCH and $USER being defined. If not, define one. A lot of temporary data files, job scripts and their log are being written on the $SCRATCH. The $USER is used in few job manager commands (SLURM or PBS), and should already be defined.
If you use a job scheduler (SLURM, PBSpro, …), there is a set of wrappers that let you submit jobs in parallel. If not, you can still run the wrapped scripts. In all cases, the calls are made in the script sub-directory of the package [FIXME?]. The wrappers calls are described here [TODO: add description for case without wrappers].
Each tool has its specifics settings. If the code has already been ported to your machine, you will have to change very little there, as described hereafter. They define the needed executable/lib (cdo, nco, netcdf,…), location of auxiliary data, of run output, and where to save the results. For reference, the machine configurations are:
./conf/<your-machine>/conf_<tool>_<your-machine>.sh
cd ${ECE3_POSTPROC_TOPDIR}/script
./hc.sh [-a account] [-r rundir] [-m months_per_leg] [-c] EXP YEAR1 YEAR2 YREF
This will create a set of netcdf files with monthly (default) averages of variables of interest. The files are found in the ${ECE3_POSTPROC_POSTDIR}/post directory.
For more information about the script options, just call
./hc.sh -h
If you specify an alternate “rundir” to find the model output, the default EC-Earth3 output structure is assumed and must be readable:
rundir/EXP/output/{ifs,nemo}/leg_number
Upon success, ${ECE3_POSTPROC_POSTDIR}/postcheck_EXP_YYYY.txt files are created with some basic information. By repeating the command with the -c option, these files are printed. In case of problem the location of the log is printed.
In “./conf/<your-machine>/conf_hiresclim_<your-machine>.sh”, you can set some options (if you want daily or 6h output on top of the monthly one, or the nemo_extra output for example). However the most important settings that you have to change are the templates for the model output and for the results location. For example:
export IFSRESULTS0='/scratch/ms/nl/${USER}/ECEARTH-RUNS/${EXPID}/output/ifs/${LEGNB}'
export NEMORESULTS0='/scratch/ms/nl/${USER}/ECEARTH-RUNS/${EXPID}/output/nemo/${LEGNB}'
export ECE3_POSTPROC_POSTDIR='/scratch/ms/nl/${USER}/ECEARTH-RUNS/${EXPID}/post'
They must include at least either the ${year} or the ${LEGNB} to find the correct data, and must be single-quoted. The ${EXPID} will be expanded to the EXP argument passed to the script.
The NEMO variable names expected by HIRESCLIM2 may differ from those found in EC-Earth3 output. If needed, you can change the variables name from EC-Earth in conf_hiresclim_<your-machine>.sh.
./ecm.sh [-a account] [-r rundir] [-c] [-y] [-p] EXP YEAR1 YEAR2
The options are the same as for hiresclim2. For details, call
./ecm.sh -h
Output tables with Performance Indices and mean global fluxes are found in:
${ECE3_POSTPROC_DIAGDIR}/table/${EXPID}
and one line summary is found:
${ECE3_POSTPROC_DIAGDIR}/table/globtable.txt
${ECE3_POSTPROC_DIAGDIR}/table/gregory.txt
If the option -y was used, you also get yearly global means available in:
${ECE3_POSTPROC_DIAGDIR}/table/yearly_fldmean_${exp}.txt
and its subset
${ECE3_POSTPROC_DIAGDIR}/table/gregory_${exp}.txt
which has only the three variables needed for a Gregory plot.
The default output directory ${ECE3_POSTPROC_DIAGDIR} is set in the
$ECE3_POSTPROC_TOPDIR/conf/${ECE3_POSTPROC_MACHINE}/conf_ecmean_${ECE3_POSTPROC_MACHINE}.sh
config file.
You can quickly check for success by executing the command again with -c option. It will print the summary line from globtable.txt and gregory.txt files, if they exist. For more insight, have a look at the submitted scripts and logs, which are in $SCRATCH/tmp_ece3_ecmean.
EC-Mean creates a climatology from the experiment to derive the performance indices. The climatology is by default in the same directory as the HIRESCLIM2 output:
${ECE3_POSTPROC_POSTDIR}/clim-${YEAR1}-${YEAR2}
and not removed, since it can be use for other purposes (notably the reproducibility test).
TODO
The acceptance/reproducibility test consists in 4 steps:
- run an ensemble of 5 members
- running EC-mean to get the climatology and the Reichler & Kim (R&K) performance indices of each run
- cast the R&K indices into a format suitable for the next step
- repeat for another ensemble and compare
The acceptance/reproducibility test relies on a set of scripts written in R. Few R packages are needed: s2dverification, ncdf4, RColorBrewer. If you do not control your environment and R and/or the packages are missing, it may be easier to work on another machine where you can easy installed the packages. For example:
# define a personal R library location,
mkdir /usr/people/sager/Rlib
# and make sure that R is aware of it (put that one in your ~/.bashrc):
export R_LIBS=/usr/people/sager/Rlib/
# within R, install:
install.packages("s2dverification", lib="/usr/people/sager/Rlib/")
install.packages("ncdf4", lib="/usr/people/sager/Rlib/")
install.packages("RColorBrewer", lib="/usr/people/sager/Rlib/")
You must run 5 experiments for 20 years with perturbed initial conditions. Your experiments name should be made of 3 characters (the stem) followed by a number from 1-to-5. For example: cca1, cca2, cca3, cca4, cca5. The stem uniquely defines your ensemble. If you do not follow this format, collecting the R&K indices in a format suitable for the comparison scripts will be slightly more complicated but still feasible (see below). Your runs will differ by their initial conditions, which require some setup.
you can create these initial conditions on the fly, by adding a call to the perturbation script in your classic/ece-*.sh.tmpl, i.e. by replacing:
ln -s \
${ini_data_dir}/ifs/${ifs_grid}/${leg_start_date_yyyymmdd}/ICMSHECE3INIT \
ICMSH${exp_name}INIT
with
# apply AMIP perturbation to 3D temperature
${ECE3_POSTPROC_TOPDIR}/reproducibility/perturb_ifs_ic.py -s t \
${ini_data_dir}/ifs/${ifs_grid}/${leg_start_date_yyyymmdd}/ICMSHECE3INIT \
ICMSH${exp_name}INIT
A perturbation script is also available for ocean restart but has not been tested yet. But you can used perturbed ocean restarts already prepared beforehand. For example, with the following 1950 initial conditions provided by BSC, which are available through ftp, see https://dev.ec-earth.org/issues/447#note-1, and look like this once unpacked:
ic
├── atmos
│ ├── ICMGGa0raINIT
│ ├── ICMGGa0raINIUA
│ └── ICMSHa0raINIT
├── ice
│ └── a0ra_fc0_19491231_restart_ice.nc
└── ocean
├── a0ra_fc0_19491231_restart.nc
├── a0ra_fc1_19491231_restart.nc
├── a0ra_fc2_19491231_restart.nc
├── a0ra_fc3_19491231_restart.nc
└── a0ra_fc4_19491231_restart.nc
You just need to submit 5 runs that start from these different restarts. What follows is some tips to help you streamline the process. Start by reorganizing the initial conditions so you can use the same script template in all your runtime dirs. For example, you can:
cd ic/ocean/
mkdir 0{1..5}
for k in {1..5}; do cd 0$k; ln -s ../a0ra_fc$((k-1))_19491231_restart.nc restart_oce.nc ; cd - ; done
for k in {1..5}; do cd 0$k; ln -s ../../ice/a0ra_fc0_19491231_restart_ice.nc restart_ice.nc ; cd - ; done
which gives you:
[2041] >>> tree ic
ic
├── atmos
│ ├── ICMGGa0raINIT
│ ├── ICMGGa0raINIUA
│ └── ICMSHa0raINIT
├── ice
│ └── a0ra_fc0_19491231_restart_ice.nc
└── ocean
├── 01
│ ├── restart_ice.nc -> ../../ice/a0ra_fc0_19491231_restart_ice.nc
│ └── restart_oce.nc -> ../a0ra_fc0_19491231_restart.nc
├── 02
│ ├── restart_ice.nc -> ../../ice/a0ra_fc0_19491231_restart_ice.nc
│ └── restart_oce.nc -> ../a0ra_fc1_19491231_restart.nc
├── 03
│ ├── restart_ice.nc -> ../../ice/a0ra_fc0_19491231_restart_ice.nc
│ └── restart_oce.nc -> ../a0ra_fc2_19491231_restart.nc
├── 04
│ ├── restart_ice.nc -> ../../ice/a0ra_fc0_19491231_restart_ice.nc
│ └── restart_oce.nc -> ../a0ra_fc3_19491231_restart.nc
├── 05
│ ├── restart_ice.nc -> ../../ice/a0ra_fc0_19491231_restart_ice.nc
│ └── restart_oce.nc -> ../a0ra_fc4_19491231_restart.nc
├── a0ra_fc0_19491231_restart.nc
├── a0ra_fc1_19491231_restart.nc
├── a0ra_fc2_19491231_restart.nc
├── a0ra_fc3_19491231_restart.nc
└── a0ra_fc4_19491231_restart.nc
Then you modify your ece-esm.sh.tmpl template script to account for that data tree as follow (just 5 lines to change):
Index: ece-esm.sh.tmpl
===================================================================
--- ece-esm.sh.tmpl (revision 5029)
+++ ece-esm.sh.tmpl (working copy)
@@ -25,7 +25,7 @@
# config="ifs nemo lim3 rnfmapper xios:detached oasis lpjg:fdbck" # "Veg" : GCM+LPJ-Guess
# config="ifs nemo lim3 rnfmapper xios:detached oasis tm5:chem,o3,ch4,aero" # "AerChem" : GCM+TM5
-config="ifs nemo lim3 rnfmapper xios:detached oasis lpjg:fdbck tm5:co2"
+config="ifs nemo:start_from_restart lim3 rnfmapper xios:detached oasis"
# minimum sanity
has_config amip nemo && error "Cannot have both nemo and amip in config!!"
@@ -189,7 +189,7 @@
# This is only needed if the experiment is started from an existing set of NEMO
# restart files
-nem_restart_file_path=${start_dir}/nemo-rst
+nem_restart_file_path="<full-path-to-your-ic-dir>/ocean/0${exp_name:3}"
nem_restart_offset=0
@@ -450,13 +450,13 @@
# Initial data
ln -s \
- ${ini_data_dir}/ifs/${ifs_grid}/${leg_start_date_yyyymmdd}/ICMGGECE3INIUA \
+ <full-path-to-your-ic-dir>/atmos/ICMGGa0raINIUA \
ICMGG${exp_name}INIUA
ln -s \
- ${ini_data_dir}/ifs/${ifs_grid}/${leg_start_date_yyyymmdd}/ICMSHECE3INIT \
+ <full-path-to-your-ic-dir>/atmos/ICMSHa0raINIT \
ICMSH${exp_name}INIT
rm -f ICMGG${exp_name}INIT
- cp ${ini_data_dir}/ifs/${ifs_grid}/${leg_start_date_yyyymmdd}/ICMGGECE3INIT \
+ cp <full-path-to-your-ic-dir>/atmos/ICMGGa0raINIT \
ICMGG${exp_name}INIT
# add bare_soil_albedo to ICMGG*INIT
Then, using your favorite method, run 5 experiments with a name that ends with 1,…,5.
For each of your 5 experiments, you need to run hireclim2 followed by EC-mean to get their resulting climatology and their Reichler-Kim performance indices. For example, assuming your experiment runs from 1990-2009:
# Get monthly means
cd ${ECE3_POSTPROC_TOPDIR}/script
for k in {1..5}; do ./hc.sh cca${k} 1990 2009 1990; done
# Once the /hc.sh/ jobs are finished, get climatology and PI
for k in {1..5}; do ./ecm.sh cca${k} 1990 2009; done
Then you need to gather the PI results into a format suitable for the R scripts:
cd ${ECE3_POSTPROC_TOPDIR}/reproducibility/
./collect_ens.sh [-t] STEM NB_MEMBER YEAR1 YEAR2
The -t option let you collect both the PI indices and the climatology from each run into a tar file in your $SCRATCH. This is *useful for sharing and then being able to compare with other ensemble results*.
If your run names and/or EC-mean output do not follow the default settings, you can still collect the data without too much work. Indeed the collect_ens.sh is essentially one line of code that is easy to hack and run at the command line or an ad hoc script:
var2d="t2m msl qnet tp ewss nsss SST SSS SICE T U V Q"
for var in ${var2d}
do
for rname in your-list-of-run-names
do
cat ${path-to-rk-tables}/PI2_RK08_${rname}_${year1}_${year2}.txt | grep "^${var} " | \
tail -1 | \
awk {'print $2'} >> ${EnsembleName}_${year1}_${year2}_${var}.txt
done
doneOnce you have two ensembles processed, you can compare them. Both ensembles output collected in the previous step should be gathered in a DATADIR, where:
# For run ${nb} of ensemble ${stem}, climatological data are expected in:
$DATADIR/${stem}${nb}/post/clim-${year1}-${year2}/
# For one ensemble, ${stem}, tables are expected in:
$DATADIR/${stem}/If you use the -t option to collect all these data in a tar file (see previous step), DATADIR is just the directory where you unpack the archive. If not, it should not be difficult to re-organize your output with few mkdir and mv calls.
With the data in place, the statistics package can be run:
./compare.sh -d $DATADIR stem1 stem2 start_year end_year nb_member
A PDF file with all generated plots is created in DATADIR/plots. That default location can be overwritten at the command line with the -p option.
ec:/nm6/EC-EARTH/ECEARTH3.2b/INPUT/ece-post-proc.tar.gz
- add platform templates in a conf/<your_platform_name> directory (adapt
existing ones to your job scheduler)
conf/<your-machine>/hc_<your-machine>.tmpl conf/<your-machine>/header_<your-machine>.tmplThe job scheduler command to submit job is set in the configuration scripts.
- add a configuration script for each tools:
conf/<your-machine>/conf_hiresclim_<your-machine>.sh conf/<your-machine>/conf_timeseries_<your-machine>.sh conf/<your-machine>/conf_ecmean_<your-machine>.sh conf/<your-machine>/conf_amwg_<your-machine>.shTODO: combine those into two config files: one USER oriented (i.e anything that changes with the experiment to process), and one for the machine (i.e. setup that should not changed with the experiment/user).
- You must install nco, netcdf, python, cdo, and cdftools if missing.
- For CDFTOOLS you cannot use the light one that ships with barakuda.
- If the netCDF4 python module is not available, you cannot build
the 3D relative humidity. Set in your
./conf/<your-machine>/conf_hiresclim_<your-machine>.sh:
rh_build=0 - Some EC-Earth experiments put the water flux output from NEMO in
the SBC files instead of the grid_T files. Then you need
export use_SBC=1in your ./conf/conf_hiresclim_<your-machine>.sh config.
This is needed only if the output files of NEMO are per processes. In which case you need to do something along these lines:
cd <EC-EARTH-DIR>/sources/nemo-3.6/TOOLS/REBUILD_NEMO/ <F90-COMPILER> rebuild_nemo.f90 -o ../rebuild_nemo.exe -I<PATH-TO-NETCDF-INSTALLATION>/include -L<PATH-TO-NETCDF-INSTALLATION>/lib -lnetcdf -lnetcdff
Copied from a suite of post-processing tools from Jost (it/ccjh) on Monday, March 27, 2017. This project is a quick attempt at cleaning up the tools suite and making it easier to port. Added and adapted (Jan 2018) the code for the reproducibility test developed by Martin Ménégoz and Francois Massonnet.
Modified to work with default ecearth-3 output tree. Removed the possibility to run somebody else code (just clone it!) but can still processed output from another user.
Improved the performance of HIRECLIM2 with parallelization over the years. Can process monthly legged runs. Catch all errors with “set -e” everywhere. Try to be smart in dealing with and cleaning up temporary dirs, by using mktemp, …