The dynamic partition does not work with the TaskPlugin task/cgroup #388

@thibauthourlier

Hi,
I tried to use the dynamic partition. I could add the different node types, but the jobs were dying quickly. The error was:

  slurmstepd: error: common_file_write_uints: write value '39981' to '/sys/fs/cgroup/cpuset/slurm/uid_20006/job_6457/step_batch/cgroup.procs' failed: No space left on device
  slurmstepd: error: unable to add pids to '/sys/fs/cgroup/cpuset/slurm/uid_20006/job_6457/step_batch'
  slurmstepd: error: task_g_set_affinity: File exists
  slurmstepd: error: _exec_wait_child_wait_for_parent: failed: Interrupted system call
  slurmstepd: error: job_manager: exiting abnormally: Slurmd could not execve job
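
For context, on a cgroup v1 cpuset hierarchy the kernel generally returns "No space left on device" when a process is attached to a cpuset whose `cpuset.cpus` or `cpuset.mems` is empty. Whether that is what happened here is an assumption, not something the log confirms. A small helper to check a cpuset directory could look like the sketch below; the function name `check_cpuset` is hypothetical.

```shell
# Hypothetical helper: report the cpuset.cpus and cpuset.mems values of a
# cgroup v1 cpuset directory, flagging any that are empty.
check_cpuset() {
  dir="$1"
  for f in cpuset.cpus cpuset.mems; do
    # Read the contents instead of testing file size:
    # cgroup pseudo-files report a size of 0 even when they hold a value.
    val=$(cat "$dir/$f" 2>/dev/null | tr -d '[:space:]')
    if [ -z "$val" ]; then
      echo "$f is empty or unreadable in $dir"
    else
      echo "$f = $val"
    fi
  done
}

# Example, using the parent of the path from the log above. The per-job
# directory is likely removed once the job has died, so the parent levels
# may be the more useful place to look:
# check_cpuset /sys/fs/cgroup/cpuset/slurm
```

Running it on an affected dynamic node and on a working hpc or htc node would show whether the two differ.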

When I changed the Slurm config to use only task/affinity in TaskPlugin, jobs ran normally. The hpc and htc partitions do not have this problem.
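
A minimal sketch of that workaround as a `slurm.conf` fragment. The previous value shown in the comment is an assumption, since the original setting is not quoted in this report:

```
# Assumed previous setting (not quoted in this report):
# TaskPlugin=task/affinity,task/cgroup

# Workaround: drop task/cgroup and keep only task/affinity
TaskPlugin=task/affinity
```

With this setting, Slurm still binds tasks to CPUs but no longer confines them through the cgroup task plugin, so this is a workaround rather than a fix.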

This is how I was creating the node sets:

scontrol create nodename=ukdri-cluster2-dyn4-[1-10] Feature=dyn,Standard_F4s_V2 cpus=4 State=CLOUD RealMemory=7782
scontrol create nodename=ukdri-cluster2-dyn4-[1-3] Feature=dyn,StandardF48s_V2 cpus=48 State=CLOUD RealMemory=93388

Thanks,
Thibaut
