
@jwhite242
Collaborator

Adds finer control over how resource specs are attached to batch jobs and to launchers, each configured separately. This enables more dynamic resource configurations, such as the gpumode introduced with flux for modern AMD machines, which can change the number of logical GPUs after job scheduling time. Pre-1.1.12 behavior attaches tasks to the jobspec in flux, which can produce 'unsatisfiable' job errors because there are not enough logical GPUs at jobspec validation time to fulfill the job request. Handling this case requires leaving the tasks unbound from the jobspec while still binding them to the flux run command generated by $(LAUNCHER).
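To make the split concrete, here is a minimal Python sketch of the idea, not the code in this PR. The names `StepResources`, `batch_directives`, `launcher_command`, and the two `bind_tasks_to_*` flags are hypothetical and chosen only for illustration. It assumes the standard flux options `-N` (nodes), `-n` (tasks), and `-g` (GPUs per task), and the `#flux:` batch directive prefix.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StepResources:
    """Hypothetical container for one step's resource request."""
    nodes: int
    procs: int
    gpus_per_task: int = 0


def batch_directives(res: StepResources, bind_tasks_to_batch: bool) -> List[str]:
    """Build the batch script header lines (assumed '#flux:' directive form)."""
    lines = [f"#flux: -N {res.nodes}"]
    if bind_tasks_to_batch:
        # Older-style behavior: task and GPU counts go into the jobspec,
        # so they are validated at submission time.
        lines.append(f"#flux: -n {res.procs}")
        if res.gpus_per_task:
            lines.append(f"#flux: -g {res.gpus_per_task}")
    return lines


def launcher_command(res: StepResources, bind_tasks_to_launcher: bool = True) -> str:
    """Build the command that a launcher token would expand to."""
    cmd = ["flux", "run"]
    if bind_tasks_to_launcher:
        # Task and GPU counts are applied inside the allocation, after any
        # change in logical GPU count has already taken effect.
        cmd += ["-n", str(res.procs)]
        if res.gpus_per_task:
            cmd += ["-g", str(res.gpus_per_task)]
    return " ".join(cmd)
```

With `bind_tasks_to_batch=False`, the batch header requests nodes only, while the launcher line still carries the task and GPU counts. That is the combination the description above calls for when logical GPU counts are not final at jobspec validation time.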

This initial version adds support for flux only; slurm/lsf/etc. handling will follow in a subsequent release. In this version the new controls are a no-op for everything except flux-scheduled jobs.
