aerosol table by larsbuntemeyer · Pull Request #74 · WCRP-CORDEX/data-request-table

larsbuntemeyer · 2025-07-29T11:41:09Z

The initial aerosol table was created using:

import pandas as pd

def sheet_url(url, sheet_name):
    """create google spreadsheet url based on sheet name"""
    sheet_name = sheet_name.replace(" ", "%20")
    return url.format(sheet_id=sheet_id, sheet_name=sheet_name)


def retrieve_google_sheet(url, sheet_name, skiprows=4):
    """retrieve single sheet of data request"""
    return pd.read_csv(sheet_url(url, sheet_name), skiprows=skiprows, dtype=str)

def handle_inconsistencies(df):
    """handle some random inconsistencies"""
    df.loc[df["priority"] == "TIER 2", "priority"] = "TIER2"
    df.loc[df["priority"] == "TIER 1", "priority"] = "TIER1"
    return df

def freq_list(row):
    """create list of frequencies from boolean entries ('x')"""
    if row["mon"] == "fx":
        return ["fx"]
    return [f for f in freqs if row[f] == "x"]

def update_cell_methods(df):
    # special fx cases
    df.loc[df.frequency == "fx", "cell_methods"] = "area: mean"

    # flux units, see https://github.com/WCRP-CORDEX/cordex-cmip6-data-request/issues/23
    df.loc[df.units == "W m-2", "cell_methods"] = "area: time: mean"

    return df

def handle_special_cell_methods(df):
    for var, v in df.cell_methods.items():
        for f, cm in v.items():
            df.loc[(df.out_name == var) & (df.frequency == f), "cell_methods"] = cm
    return df

def clean_df(df, drop=True):
    """tidy up dataframe"""
    # remove unnamed columns (copy so later assignments don't hit a view of the input)
    df = df.loc[:, ~df.columns.str.contains("Unnamed")].copy()

    df["standard_name"] = df["standard_name"].fillna("")

    # lower case column names and renaming to cmip6 formats
    df.columns = df.columns.str.lower()
    df.rename(
        columns={"output variable name": "out_name", "comments": "comment"},
        inplace=True,
    )

    # frequency columns to tidy data
    df["frequency"] = df.apply(lambda row: freq_list(row), axis=1)
    df = df.explode("frequency", ignore_index=True)

    df = handle_inconsistencies(df)

    # flag instantaneous ("i") values at subdaily frequencies
    subdaily_pt = (df["frequency"].isin(["1hr", "3hr", "6hr"])) & (df["ag"] == "i")
    # we no longer rename these frequencies (e.g. to "3hrPt"),
    # see https://github.com/WCRP-CORDEX/cordex-cmip6-data-request/issues/24
    # df.loc[subdaily_pt, "frequency"] = df[subdaily_pt].frequency + "Pt"

    # set cell methods depending on frequency
    df["cell_methods"] = "area: time: mean"
    df.loc[subdaily_pt, "cell_methods"] = "area: mean time: point"

    # update some more cell_methods
    df = update_cell_methods(df)
    # replace embedded newlines and strip surrounding whitespace
    df.replace(r"\n", " ", regex=True, inplace=True)
    strip_cols = ["standard_name", "long_name"]
    for col in strip_cols:
        df[col] = df[col].str.strip()
    if drop is True:
        df.drop(columns=freqs, inplace=True)
        #df.drop(columns=["ag"], inplace=True)
        df = df.dropna(subset=["out_name", "frequency"], how="all")

    # handle min max cell_methods
    # na=False keeps rows with a missing out_name from breaking the boolean mask
    df.loc[df.out_name.str.contains("min", na=False), "cell_methods"] = "area: mean time: minimum"
    df.loc[df.out_name.str.contains("max", na=False), "cell_methods"] = "area: mean time: maximum"

    # handle special cases
    #df = handle_special_cell_methods(df)

    # set these to lowercase
    lowercase = ["CAPE", "LI", "CIN", "CAPEmax", "LImax", "CINmax"]
    lc = df.out_name.isin(lowercase)

    df.loc[lc, "out_name"] = df[lc].out_name.str.lower()

    # set positive values
    up = ["outgoing", "upward", "upwelling"]
    down = ["incoming", "downward", "downwelling", "sinking"]
    ups = df.loc[df.standard_name.str.contains("|".join(up), case=False)]
    downs = df.loc[df.standard_name.str.contains("|".join(down), case=False)]
    df.loc[ups.index, "positive"] = "up"
    df.loc[downs.index, "positive"] = "down"

    return df

freqs = ["mon", "day", "6hr", "3hr", "1hr"]

sheet_names = ["Aersol CORE", "Aerosol Tier 1", "Aerosol Tier 2"]
#url = "https://docs.google.com/spreadsheets/d/1_KLWJuVdxryyq3DsB5NIJwoneuVqSUVN/edit?pli=1&gid=1672965248#gid=1672965248"

sheet_id = "1_KLWJuVdxryyq3DsB5NIJwoneuVqSUVN"
url = (
    "https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/"
    "tq?tqx=out:csv&sheet={sheet_name}"
)

def retrieve_data_request():
    data = []
    for sheet_name in sheet_names:
        df = retrieve_google_sheet(url, sheet_name, skiprows=0).rename(columns={"Output frequency mon": "mon"})
        df.columns.values[1] = "units"
        #df = clean_df(df)
        data.append(df)
    return data

df = pd.concat(retrieve_data_request(), ignore_index=True)
df = clean_df(df)
df.to_csv("aerosol.csv", index=False)

larsbuntemeyer · 2025-07-29T12:14:21Z

@pierrenabat I added a table in this PR containing all requested aerosol variables and some metadata derived from the information provided. I kept the "ag" column for now to check cell methods.

The default for cell methods (all frequencies are averaged values) is

"area: time: mean"

With "i" in the aggregation column, for subdaily frequencies it is

"area: mean time: point"

However, I'm unsure how to handle the "c" (cumulative) case. Should the subdaily cell method be something like "area: mean time: sum"? I couldn't find anything in CMIP6 to go on, e.g., there are no cumulative subdaily frequencies.
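
The two rules described in this comment can be sketched as a small lookup. This is a minimal illustration only: the function name `default_cell_methods` is hypothetical, and the later overrides in the script (fx, "W m-2", min/max) are left out. As in the script above, any other flag, including "c", falls through to the default.

```python
SUBDAILY = {"1hr", "3hr", "6hr"}


def default_cell_methods(frequency, ag):
    """Return cell_methods for a frequency and aggregation flag.

    Hypothetical helper mirroring only the two rules described in the comment.
    """
    # instantaneous values at subdaily frequencies are point values in time
    if frequency in SUBDAILY and ag == "i":
        return "area: mean time: point"
    # everything else is treated as a time mean
    return "area: time: mean"


# toy rows: (frequency, ag flag)
for frequency, ag in [("mon", "i"), ("3hr", "i"), ("3hr", "c"), ("day", "")]:
    print(frequency, ag or "-", "->", default_cell_methods(frequency, ag))
```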

larsbuntemeyer · 2025-07-29T12:21:17Z

@jesusff for aerosols, I now see that the comments request many pressure levels for aerosol variables, e.g.,

List of levels: 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 925, 950, 975, 1000 hPa (all in 1 file if possible)

The default data request splits pressure levels into individual datasets with scalar coordinates. Should we stick with one approach or have both? I'm unsure...
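
To make the two options concrete, here is a rough sketch of how one requested variable could be represented either way. The variable name `xyz`, the helper names, and the dictionary layout are placeholders, not part of the data request; the name-plus-level pattern is an assumption modelled on names such as `ta850` in the default request.

```python
# pressure levels (hPa) from the list quoted above
LEVELS_HPA = [200, 250, 300, 350, 400, 450, 500, 550, 600, 650,
              700, 750, 800, 850, 900, 925, 950, 975, 1000]


def split_by_level(out_name, levels_hpa):
    """Option 1: one dataset per level, each with a scalar pressure coordinate (Pa)."""
    return [
        {"out_name": f"{out_name}{level}", "plev_pa": level * 100}
        for level in levels_hpa
    ]


def single_dataset(out_name, levels_hpa):
    """Option 2: one 3D dataset carrying all levels along a vertical coordinate (Pa)."""
    return {"out_name": out_name, "plev_pa": [level * 100 for level in levels_hpa]}


per_level = split_by_level("xyz", LEVELS_HPA)   # "xyz" is a placeholder name
combined = single_dataset("xyz", LEVELS_HPA)
print(len(per_level), "datasets vs. 1 dataset with", len(combined["plev_pa"]), "levels")
```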

pierrenabat · 2025-08-01T13:35:01Z

@larsbuntemeyer thanks for creating the table.
The variables with ag="cumulative" can have a cell_methods equal to "area: time: mean". These variables are equivalent to fluxes, such as existing variables like precipitation, evapotranspiration or radiation fluxes.
For the pressure levels, if it is not possible to have the different levels in the same file, you can split them into individual datasets if you prefer.
I will also complete the missing information for the variables concerned.
Thanks!
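
One way to read the equivalence with fluxes: an amount accumulated over an output interval, divided by the interval length, is the time-mean rate over that interval, which is what "time: mean" describes. A minimal sketch with made-up numbers; the function name and the units are illustrative assumptions, not taken from the data request.

```python
def mean_flux(accumulated, interval_seconds):
    """Convert an amount accumulated over an interval into a time-mean rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return accumulated / interval_seconds


# made-up example: 7200 J m-2 accumulated over one hour (3600 s)
rate = mean_flux(7200.0, 3600)
print(rate, "W m-2")
```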

larsbuntemeyer · 2025-08-01T14:12:09Z

> variables with ag="cumulative" can have a cell_methods equal to "area: time: mean"

~~Alright, i'll update that.~~
Edit: No update required, see also WCRP-CORDEX/cordex-cmip6-data-request#23

> For the pressure levels, if it is not possible to have the different levels in the same file, you can split them in individual datasets if you prefer.

It should be possible, but it would not be consistent with the default data request. I think we need more opinions on this.

jesusff · 2025-08-08T10:48:30Z

I've added a separate discussion on the model levels issue in #76

For the moment, I'd leave this aerosol request as it is now, with 3D variables including the vertical dimension. @pierrenabat, how standard is the set of levels you propose here?

larsbuntemeyer · 2025-08-13T11:09:51Z

Ok, we can keep 3D variables and I will add a coordinate. However, we still have to decide about the invalid standard names, see #34 (comment)

About half of the variables have invalid standard names. Should we remove them for now?

larsbuntemeyer added 2 commits July 29, 2025 13:40

initial aerosol table

b161cf3

added ag column

9c74bcf

added positive attribute

f110e21

larsbuntemeyer self-assigned this Aug 1, 2025

jesusff mentioned this pull request Aug 8, 2025

Open the data request to 3D (spatial) fields in a single file? #76

Open

pierrenabat and others added 3 commits December 4, 2025 10:49

Add files via upload

895fcad

Update aerosol standard_names for CF compliance

Add missing fields (units and Tier2 variables)

1b739ab

Merge pull request #84 from pierrenabat/aerosol

fc40e48

Update aerosol data request

larsbuntemeyer wants to merge 6 commits into main from aerosol
