
[WIP] Nexus: Enhanced simulations with error handling, branching and looping #5673

Open
kayahans wants to merge 2 commits into QMCPACK:develop from kayahans:feature/enhanced_simulations

@kayahans
Contributor

Proposed changes

This PR introduces new capabilities in Nexus for simulation error handling, branching, and looping workflows. The implementation adds EnhancedSimulation and EnhancedProjectManager classes that extend the existing simulation framework with these advanced workflow features.

The new functionality is opt-in only and backward compatible: existing Nexus scripts continue to work unchanged. The enhanced capabilities are only activated when simulations are explicitly converted to EnhancedSimulation instances using the make_enhanced() wrapper function.
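The opt-in mechanism can be pictured with a minimal sketch. It assumes that `make_enhanced()` takes an already constructed simulation object and turns it into an enhanced subclass of its own type, so that code-specific behavior is preserved. The `Simulation`, `EnhancedMixin` and `QESimulation` classes below are simplified, hypothetical stand-ins rather than the real Nexus classes, and the PR's actual implementation may work differently.

```python
class Simulation:
    """Hypothetical stand-in for a Nexus Simulation; not the real class."""

    def __init__(self, identifier):
        self.identifier = identifier

    def run(self):
        return f"{self.identifier}: ran"


class EnhancedMixin:
    """Hypothetical mixin that layers extra behavior over any Simulation subclass."""

    enhanced = True

    def run(self):
        # Count attempts, then defer to the original run() of the wrapped class.
        self.attempts = getattr(self, "attempts", 0) + 1
        return super().run()


_enhanced_types = {}  # cache: original class -> its enhanced subclass


def make_enhanced(sim):
    """Re-class `sim` in place so it becomes an enhanced subclass of its own type.

    Objects never passed here keep their original class and behavior,
    which is what keeps the feature opt-in.
    """
    if getattr(sim, "enhanced", False):
        return sim  # already enhanced
    base = type(sim)
    if base not in _enhanced_types:
        _enhanced_types[base] = type("Enhanced" + base.__name__, (EnhancedMixin, base), {})
    sim.__class__ = _enhanced_types[base]
    return sim


# Usage with a hypothetical code-specific subclass.
class QESimulation(Simulation):
    pass


scf = make_enhanced(QESimulation("scf"))
print(type(scf).__name__, isinstance(scf, QESimulation), scf.run(), scf.attempts)
# prints: EnhancedQESimulation True scf: ran 1
```

With this design, a script that never calls `make_enhanced()` runs exactly as before. Enhanced objects still pass `isinstance` checks against their original class, which makes them easy to slot into an existing workflow.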

Examples in nexus/examples/quantum_espresso/ (directories 03-08) demonstrate the new features:

  • 03_machine: Machine-specific execution and dependencies
  • 04_loop: Loop-enabled simulations with iteration control
  • 05_conditionals: Conditional execution utilities
  • 06_branching: Branching workflows with create_branch()
  • 07_merging: Merging strategies for parallel branches
  • 08_error_handlers: Error handling with automatic retry and input modification
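The idea behind a loop with iteration control (the 04_loop item) can be sketched as rerunning a step on its own output until a convergence test passes, with a hard cap on iterations. The names `run_loop`, `halfway_to_two` and `small_change` are hypothetical and do not come from the PR. The toy "simulation" simply moves a number halfway toward 2.0 on each pass.

```python
def run_loop(step, state, converged, max_iterations=20):
    """Repeatedly apply `step` to its own output until `converged(old, new)` is True.

    Returns (final_state, iterations_used). Raises if the cap is reached first,
    so a runaway loop cannot hang the workflow.
    """
    for iteration in range(1, max_iterations + 1):
        new_state = step(state)
        if converged(state, new_state):
            return new_state, iteration
        state = new_state
    raise RuntimeError(f"no convergence after {max_iterations} iterations")


# Toy stand-in for a simulation whose output feeds its next input:
# each pass moves the value halfway toward 2.0.
def halfway_to_two(x):
    return (x + 2.0) / 2.0


def small_change(old, new):
    return abs(new - old) < 1e-3


final, n = run_loop(halfway_to_two, 0.0, small_change)
print(final, n)
# prints: 1.9990234375 11
```

The iteration cap is the important design choice here. Once a workflow graph may contain cycles, every cycle needs an explicit exit condition.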

Note: The examples are designed for demonstration purposes. Error handlers may not be necessary for these simple test cases, but they illustrate the error handling capabilities available for more complex workflows.
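"Automatic retry and input modification" can be illustrated with a small sketch. A run reports an error key, a handler registered for that key returns modified inputs, and the run is retried up to a limit. Everything here is hypothetical: `run_with_handlers`, `toy_scf` and the convention of signaling errors with a string key are assumptions, not the PR's API. `mixing_beta` is a real Quantum ESPRESSO input, but the rule "converges only when mixing_beta <= 0.3" is invented for the demo.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Inputs = Dict[str, float]


@dataclass
class Attempt:
    inputs: Inputs
    error: Optional[str]  # None means the run succeeded


def run_with_handlers(
    run: Callable[[Inputs], Optional[str]],
    inputs: Inputs,
    handlers: Dict[str, Callable[[Inputs], Inputs]],
    max_retries: int = 3,
) -> Tuple[Inputs, List[Attempt]]:
    """Run; on a recognized error, let its handler modify the inputs and retry."""
    history: List[Attempt] = []
    current = dict(inputs)
    for _ in range(max_retries + 1):
        error = run(current)
        history.append(Attempt(dict(current), error))
        if error is None:
            return current, history
        if error not in handlers:
            raise RuntimeError(f"no handler registered for {error!r}")
        current = handlers[error](dict(current))
    raise RuntimeError(f"still failing after {max_retries} retries")


# Toy run: "converges" only when mixing_beta <= 0.3.
def toy_scf(inputs: Inputs) -> Optional[str]:
    return None if inputs["mixing_beta"] <= 0.3 else "scf_not_converged"


# Handler: halve mixing_beta whenever the SCF fails to converge.
handlers = {"scf_not_converged": lambda p: {**p, "mixing_beta": p["mixing_beta"] / 2}}
final, history = run_with_handlers(toy_scf, {"mixing_beta": 0.7}, handlers)
print(final["mixing_beta"], len(history))
```

Keeping a history of attempts makes it possible to see afterwards which input changes led to success. An unrecognized error fails loudly instead of being retried blindly.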

This PR is shared for collaboration and discussion to gather feedback on the implementation approach and identify potential improvements.

What type(s) of changes does this code introduce?

  • New feature
  • Documentation or build script changes

Does this introduce a breaking change?

  • No

What systems has this change been tested on?

Local development environment

Checklist

  • I have read the pull request guidance and develop docs
  • This PR is up to date with the current state of 'develop'
  • Code added or changed in the PR has been clang-formatted
  • This PR adds tests to cover any new code, or to catch a bug that is being fixed
  • Documentation has been added (if appropriate)

@prckent
Contributor

prckent commented Nov 20, 2025

Good to see! Q: Can these simply replace the current classes? We learned from QMCPACK C++ that having two or more versions of anything is a bad idea because of the maintenance cost and the challenges it creates for contributors, e.g. Driver.cpp, DriverEnhanced.cpp, DriverEnhancedNew2.cpp, etc.

@kayahans
Contributor Author

kayahans commented Nov 21, 2025

@prckent they inherit from the current classes, so it should be easy to replace them. However, for now I thought it was better to keep them as separate classes so that testing is easier and nothing existing breaks.

The examples I provided work with the --status-only and --generate-only options. I am open to ideas for new test cases that would better show how the error recovery works.

@jtkrogel
Copy link
Contributor

Very interesting, Kayahan. I will have to look this over closely. I had considered this type of functionality a long time ago but never had time to pursue it. Moving from a DAG (directed acyclic graph) to a general directed graph (DG), which allows cycles such as loops, would be a big step forward.
