Skip to content

26 create data pipeline for standard wrf hydro scenario creation#362

Open
Martin20494 wants to merge 10 commits intomasterfrom
26-create-data-pipeline-for-standard-wrf-hydro-scenario-creation
Open

26 create data pipeline for standard wrf hydro scenario creation#362
Martin20494 wants to merge 10 commits intomasterfrom
26-create-data-pipeline-for-standard-wrf-hydro-scenario-creation

Conversation

@Martin20494
Copy link
Contributor

DESCRIPTION OF PR:

Develop automatic process to simulate streamflow data using WRF-Hydro open source.

Developer Checklist

  • Make code change
  • Update tests
    • Update / create new tests
    • Ensure these tests have the expected behaviour
    • Test locally and ensure tests are passing
  • Update documentation
    • Readme
    • Docstrings
    • Comments
    • Wiki

Reviewer Checklist

  • Check new code for code smells
  • Check new tests
    • Ensure adequate coverage
    • Check for code smells within tests
  • Check if documentation needs updating
    • Readme
    • Docstrings
    • Comments
    • Wiki

LukeParky and others added 7 commits May 5, 2025 15:41
- Add more packages into the environment.yml
- Add necessary python modules from WRF-Hydro for preprocessing step
- Create a script to run all python modules
- Add three files including three classes: one is to generate domain files, one is to generate routing stack, and the other one is to generate forcing data from ERA5. The final class to run WRF-Hydro simulation will be added later.
- Add more files:
  * WRFHydroSimulation.py is to run simulations
  * simulatedResultManipulation.py is to flip and reproject streamflow. This file is potentially for manipulating more results
  * RUN_all.py is to run whole process to run WRF-Hydro simulation
# A Window path where geogrid folder, create_soilproperties.R, create_wrfinput.R,
# CHANPARM.TBL, GENPARM.TBL, HYDRO.TBL, MPTABLE.TBL, and SOILPARM.TBL are located
# [CHANGE INTO DIGITAL TWIN PATH - THIS SHOULD BE FIXED]
self.window_geogrid_path = r"C:\Users\mng42\wrf_wps\wrf_hydro_inputs\simulation_mataura_50m_002\geogrid"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This folder should be accessible to the digital twin in some way

Martin Nguyen added 2 commits August 11, 2025 14:04
- run_all.py has not been adjusted yet
- Silent pylint (needs checking):
  * wrfhydro_simulation.py: #pylint: disable=too-many-instance-attributes, invalid-name
  * domain_files.py: #pylint: disable=too-many-instance-attributes
  * forcing_dataset.py: #pylint: disable=differing-param-doc, differing-type-doc, wrong-import-position
- forcing_dataset.py needs to fix longitude
- Add more packages in enviroment
Comment on lines 9 to 10
os.environ["ESMFMKFILE"] = r"C:\Users\mng42\AppData\Local\anaconda3\envs\wrfhydro_005\Library\lib\esmf.mk"
from datetime import datetime, timedelta
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Luke to come back and help remove this environ check

Comment on lines +30 to +32
Definition:
Init function to state common arguments
References:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Match the style from the main codebase (numpy style docs)

Copy link
Member

@LukeParky LukeParky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really good code, I think we should create another issue about using WSL/Linux in a more native way. My main points of improvement I want to see before merging are around using the type system more where it can benefit you and help others understand.

Also there are still linting errors and differences between the documentation style.
Would you like to handle the documentation style in a different PR?

Feel free to comment on my suggestions if you need more info or don't agree with any.


return selected_as_rectangle_2193

return selected_as_rectangle_2193
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this might be an accidental change, set it back.
Adding a suggestion that you can click on to revert it if it's accidental.

Suggested change
return selected_as_rectangle_2193
return selected_as_rectangle_2193

- conda-forge
- defaults
dependencies:
- python==3.11 # 11 for StrEnum
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of dependencies, so I'll come back and check these. Do you think this can be merged with the main environment.yml so we only have one?

Comment on lines +34 to +45
- cdsapi # For WRF-Hydro - download ERA5 files
- xesmf # For WRF-Hydro - regrid
- esmpy # For WRF-Hydro - regrid
- esmf # For WRF-Hydro - regrid
- xarray # For WRF-Hydro - handle geospatial array data
- netCDF4 # For WRF-Hydro - handle geospatial array data
- rioxarray # For WRF-Hydro - handle geospatial array data
- matplotlib # For WRF-Hydro - plot results (MAYBE)
- numpy # For WRF-Hydro - handle array data
- pandas=1.5.3 # For WRF-Hydro - handle tabular data
- hdf5 # For WRF-Hydro - handle geospatial array data
- gdal=3.6.2 # For WRF-Hydro - handle geospatial array data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note for self: come back and check dependencies

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need an example directory? Can this all be moved into other folders depending on their use? And then it will evolve to be the final production code

@@ -0,0 +1,367 @@
"""Runs to generate domain files which are inputs for WRF-Hydro"""
# pylint: disable=too-many-instance-attributes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should only be used on the class it's needed for, not the whole file

Comment on lines +34 to +42
CHRTOUT_DOMAIN: int = 1,
CHANOBS_DOMAIN: int = 1,
CHRTOUT_GRID: int = 1,
LSMOUT_DOMAIN: int = 1,
RTOUT_DOMAIN: int = 1,
output_gw: int = 1,
outlake: int = 0,
frxst_pts_out: int = 1
) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these parameters be bool, so that we can use the rich python typing structure, and only convert to int when you're writing the file?

Comment on lines +255 to +262
content = f"""&HYDRO_nlist
!!!! ---------------------- SYSTEM COUPLING ----------------------- !!!!

! Specify what is being coupled: 1=HRLDAS (offline Noah-LSM), 2=WRF, 3=NASA/LIS, 4=CLM
sys_cpl = 1

!!!! ------------------- MODEL INPUT DATA FILES ------------------- !!!!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a lot of these large content files could be stored as string template files that are read from disk, what are your thoughts?

Comment on lines +466 to +469
for each_file_path in selected_all_files_paths:
shutil.copy2(
str(Path(each_file_path)),
str(Path(self.domain_path) / Path(each_file_path).name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See recommendations earlier about using Path without converting to str

Comment on lines +478 to +479
Arguments:
Already defined above.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Arguments:
Already defined above.

No use having that line

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks repeated in other places too

Already defined above.
"""
# Define command notes and log name
command_notes = r"mpirun -np 2 ./wrf_hydro_NoahMP.exe >> run.log 2>&1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you experimented with higher values of -np? I think it is the multithreading number of processes variable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants