Add studies for storage estimate #1

@jesusff

The framework is now in place to estimate the internal storage needed to keep the variables required by the initial Med-CORDEX community studies for Phase 3. The internal data request file currently contains all atmospheric and ocean variables. Study keywords (space separated) should replace the current priorities in that file. Until then, the code/storage_estimate.py script reports the priorities as if they were studies:

$ python code/storage_estimate.py eval hist ssp
Considering experiments: ['eval', 'hist', 'ssp']
Total CORE                  study estimated data request size is:   31.213 TB
Total TIER1                 study estimated data request size is:  105.730 TB
Total TIER2                 study estimated data request size is:   44.042 TB
Total ALL-STUDIES           study estimated data request size is:  180.986 TB

Note that this estimate is restricted to simulations with status "completed" or "running" (i.e. planned simulations are excluded). All evaluation, historical and scenario simulations are included.
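
As a rough illustration of the proposed change from priorities to space-separated study keywords, the sketch below counts variables per study and frequency. The row layout, the study names (`heatwaves`, `medicanes`) and the helper names are hypothetical and are not taken from the data request file or from code/storage_estimate.py.

```python
from collections import defaultdict

# Hypothetical data request rows: (variable, frequency, study keywords).
# The study names are made up for illustration only.
rows = [
    ("tas", "1hr", "heatwaves medicanes"),
    ("pr", "day", "medicanes"),
    ("tos", "mon", "heatwaves"),
    ("orog", "fx", ""),  # no study keyword yet
]


def variables_per_study(rows):
    """Count variables per study and frequency.

    A variable listed under several studies is counted once in each of them.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for _name, freq, studies in rows:
        for study in studies.split():  # split on whitespace
            counts[study][freq] += 1
    return {study: dict(by_freq) for study, by_freq in counts.items()}


def variables_in_any_study(rows):
    """Count each variable once if it belongs to at least one study."""
    return sum(1 for _name, _freq, studies in rows if studies.split())


per_study = variables_per_study(rows)
n_any = variables_in_any_study(rows)
```

One consequence of this design is that a total over all studies can no longer be obtained by adding the per-study counts, because a variable shared by two studies would be counted twice. Counting each variable once, as in `variables_in_any_study`, avoids that.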

A more verbose output helps to check the details for potential problems:

ic| simulation_count: experiment  evaluation  historical  ssp126  ssp245  ssp370  ssp585
                      domain                                                            
                      MED-12               5           5       1       1       2       3
                      MED-25               1           0       0       0       0       0
ic| studies: {'TIER2', 'CORE', 'TIER1'}
Considering experiments: ['eval', 'hist', 'ssp']

ic| variable_count: priority   CORE
                    frequency      
                    1hr          13
                    day          19
                    fx            7
                    mon          23
ic| freq_factor: Index([8760, 365, 0, 12], dtype='int64', name='frequency')
ic| variable_records_per_yr: priority     CORE
                             frequency        
                             1hr        113880
                             day          6935
                             fx              0
                             mon           276
ic| nrecords_factor: 121091
ic| exp_factor: Index([41, 54, 86, 86, 86, 86], dtype='int64', name='experiment')
ic| size_TB: experiment  evaluation  historical  ssp126  ssp245  ssp370  ssp585
             domain                                                            
             MED-12           5.885       7.751   2.469   2.469   4.938   7.407
             MED-25           0.294       0.000   0.000   0.000   0.000   0.000
ic| size_TB.T.sum(): domain
                     MED-12    30.919
                     MED-25     0.294
                     dtype: float64
Total CORE                  study estimated data request size is:   31.213 TB

ic| variable_count: priority   TIER1
                    frequency       
                    1hr           30
                    6hr           71
                    day          116
                    mon          115
ic| freq_factor: Index([8760, 1460, 365, 12], dtype='int64', name='frequency')
ic| variable_records_per_yr: priority    TIER1
                             frequency        
                             1hr        262800
                             6hr        103660
                             day         42340
                             mon          1380
ic| nrecords_factor: 410180
ic| exp_factor: Index([41, 54, 86, 86, 86, 86], dtype='int64', name='experiment')
ic| size_TB: experiment  evaluation  historical  ssp126  ssp245  ssp370  ssp585
             domain                                                            
             MED-12          19.935      26.256   8.363   8.363  16.726   25.09
             MED-25           0.997       0.000   0.000   0.000   0.000    0.00
ic| size_TB.T.sum(): domain
                     MED-12    104.733
                     MED-25      0.997
                     dtype: float64
Total TIER1                 study estimated data request size is:  105.730 TB

ic| variable_count: priority   TIER2
                    frequency       
                    1hr            8
                    6hr           51
                    day           70
                    fx             7
                    mon           64
ic| freq_factor: Index([8760, 1460, 365, 0, 12], dtype='int64', name='frequency')
ic| variable_records_per_yr: priority   TIER2
                             frequency       
                             1hr        70080
                             6hr        74460
                             day        25550
                             fx             0
                             mon          768
ic| nrecords_factor: 170858
ic| exp_factor: Index([41, 54, 86, 86, 86, 86], dtype='int64', name='experiment')
ic| size_TB: experiment  evaluation  historical  ssp126  ssp245  ssp370  ssp585
             domain                                                            
             MED-12           8.304      10.937   3.484   3.484   6.967  10.451
             MED-25           0.415       0.000   0.000   0.000   0.000   0.000
ic| size_TB.T.sum(): domain
                     MED-12    43.627
                     MED-25     0.415
                     dtype: float64
Total TIER2                 study estimated data request size is:   44.042 TB

ic| variable_count: priority   ALL-STUDIES
                    frequency             
                    1hr                 51
                    6hr                122
                    day                205
                    fx                  14
                    mon                202
ic| freq_factor: Index([8760, 1460, 365, 0, 12], dtype='int64', name='frequency')
ic| variable_records_per_yr: priority   ALL-STUDIES
                             frequency             
                             1hr             446760
                             6hr             178120
                             day              74825
                             fx                   0
                             mon               2424
ic| nrecords_factor: 702129
ic| exp_factor: Index([41, 54, 86, 86, 86, 86], dtype='int64', name='experiment')
ic| size_TB: experiment  evaluation  historical  ssp126  ssp245  ssp370  ssp585
             domain                                                            
             MED-12          34.125      44.945  14.316  14.316  28.631  42.947
             MED-25           1.706       0.000   0.000   0.000   0.000   0.000
ic| size_TB.T.sum(): domain
                     MED-12    179.280
                     MED-25      1.706
                     dtype: float64
Total ALL-STUDIES           study estimated data request size is:  180.986 TB
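
The record counts in the verbose output above can be cross-checked by hand. The sketch below is not the script's code. It only re-derives `nrecords_factor` from the printed `variable_count` and `freq_factor` values, assuming that `variable_records_per_yr` is their product per frequency and that `nrecords_factor` is the sum over frequencies.

```python
# Records per year for each frequency, copied from the printed freq_factor.
freq_factor = {"1hr": 8760, "6hr": 1460, "day": 365, "fx": 0, "mon": 12}

# Variable counts per frequency, copied from the printed variable_count tables.
variable_count = {
    "CORE": {"1hr": 13, "day": 19, "fx": 7, "mon": 23},
    "TIER1": {"1hr": 30, "6hr": 71, "day": 116, "mon": 115},
    "TIER2": {"1hr": 8, "6hr": 51, "day": 70, "fx": 7, "mon": 64},
    "ALL-STUDIES": {"1hr": 51, "6hr": 122, "day": 205, "fx": 14, "mon": 202},
}


def nrecords_factor(counts):
    """Sum over frequencies of (number of variables) x (records per year)."""
    return sum(n * freq_factor[freq] for freq, n in counts.items())


nrecords = {study: nrecords_factor(c) for study, c in variable_count.items()}

for study, value in nrecords.items():
    print(f"{study:12s} {value}")
# CORE         121091
# TIER1        410180
# TIER2        170858
# ALL-STUDIES  702129
```

The values match the printed `nrecords_factor` lines, and the ALL-STUDIES figure equals the sum of the three priorities, which is consistent with every variable currently carrying exactly one priority.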
