Hi,
I tried to use the dynamic partition. I could add the different types of nodes, but the jobs were dying quickly. The error was:
slurmstepd: error: common_file_write_uints: write value '39981' to '/sys/fs/cgroup/cpuset/slurm/uid_20006/job_6457/step_batch/cgroup.procs' failed: No space left on device
slurmstepd: error: unable to add pids to '/sys/fs/cgroup/cpuset/slurm/uid_20006/job_6457/step_batch'
slurmstepd: error: task_g_set_affinity: File exists
slurmstepd: error: _exec_wait_child_wait_for_parent: failed: Interrupted system call
slurmstepd: error: job_manager: exiting abnormally: Slurmd could not execve job
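On cgroup v1, a write to `cgroup.procs` in a cpuset fails with "No space left on device" (ENOSPC) when that cpuset has an empty `cpuset.cpus` or `cpuset.mems`, so inspecting those two files may help narrow this down. The following is a minimal sketch, assuming the cpuset controller is mounted at `/sys/fs/cgroup/cpuset` as in the paths above; `check_cpuset` is a hypothetical helper name, not part of Slurm.

```shell
# Hypothetical helper: report whether a cpuset cgroup directory has CPUs and
# memory nodes assigned. Returns non-zero if either file is missing or empty.
check_cpuset() {
    dir="$1"
    rc=0
    for f in cpuset.cpus cpuset.mems; do
        if [ ! -r "$dir/$f" ]; then
            echo "$f: missing"
            rc=1
            continue
        fi
        # Strip whitespace so a file holding only a newline counts as empty.
        value=$(tr -d '[:space:]' < "$dir/$f")
        if [ -z "$value" ]; then
            echo "$f: EMPTY"
            rc=1
        else
            echo "$f: $value"
        fi
    done
    return "$rc"
}
```

It could be run on an affected node as `check_cpuset /sys/fs/cgroup/cpuset/slurm`, or against a job's own directory while the job is still starting, since the per-job cgroup is likely removed once the step exits.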
When I modified the Slurm config to use only task/affinity in TaskPlugin, jobs ran normally. The hpc and htc partitions do not have this problem.
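To confirm which task plugins a given config file sets, something like the following could be used. It is a sketch only: `task_plugin_of` is a hypothetical helper name, and the `slurm.conf` location used in the usage note is an assumption that may differ on this cluster.

```shell
# Hypothetical helper: print the TaskPlugin value(s) from a slurm.conf-style
# file. Parameter names are case-insensitive, and TaskPluginParam is skipped
# because the pattern requires "=" (or whitespace then "=") right after the key.
task_plugin_of() {
    grep -i '^[[:space:]]*TaskPlugin[[:space:]]*=' "$1" \
        | sed 's/^[^=]*=[[:space:]]*//; s/[[:space:]]*#.*$//'
}
```

For example, `task_plugin_of /etc/slurm/slurm.conf` would print `task/affinity` after the change described above. On a running cluster, `scontrol show config | grep -i '^TaskPlugin'` shows the value the daemons actually loaded.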
This is how I was creating the node sets:
scontrol create nodename=ukdri-cluster2-dyn4-[1-10] Feature=dyn,Standard_F4s_V2 cpus=4 State=CLOUD RealMemory=7782
scontrol create nodename=ukdri-cluster2-dyn4-[1-3] Feature=dyn,StandardF48s_V2 cpus=48 State=CLOUD RealMemory=93388
Thanks,
Thibaut