Check CPU/thread usage for a node in the Slurm job manager [Resolved]

I am working on a cluster machine that uses the Slurm job manager. I just started a multithreaded program and would like to check the core and thread usage for a given node ID. For example:

scoreusage -N 92512

where "scoreusage" is the command that I am unsure of.


Question Credit: Austin Downey
Asked September 19, 2019
Posted Under: Unix Linux
2 Answers

It's been a few years since I ran a slurm cluster, but squeue should give you what you want. Try:

squeue --nodelist 92512 -o "%A %j %C %J"

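(That should give the job ID, job name, CPU count, and requested threads per core for your jobs on node 92512. Note that --nodelist expects a node name, so substitute the name your cluster uses for that node.)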

BTW, unless you specifically only want details from one particular node, you might be better off searching by job id rather than node id.
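For example, a minimal sketch of the per-job query, assuming you already know the job ID. The function name job_cpu_summary and the job ID 12345 are made up for illustration; -j selects by job ID, and %N (the job's node list) is added to the format so you can see where the job landed.

```shell
# Hypothetical wrapper: same columns as the squeue example above,
# but selected by job ID instead of by node.
# %A = job ID, %j = job name, %C = CPUs, %J = threads per core, %N = node list
job_cpu_summary() {
    squeue -j "$1" -o "%A %j %C %J %N"
}

# Example (12345 is a placeholder job ID):
#   job_cpu_summary 12345
```

These are the values Slurm has allocated or been asked for, not live utilisation, so treat them as a first check only.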

There is a lot of good Slurm documentation on the web, easily found via Google. Most universities and other sites running an HPC cluster write their own docs, help pages, and cheat-sheets, customised to the details of their specific cluster(s), so take that into account and adapt any examples to YOUR cluster. There's also good generic documentation on using Slurm at https://slurm.schedmd.com/documentation.html


credit: cas
Answered September 19, 2019

I find the built-in SLURM tools very basic. Instead, you can use something like htop to monitor the (running) job in real time.

  1. Find which node the job is running on:
     $ scontrol show job $JOB_ID | grep ' NodeList'
        NodeList=<HOSTNAME>
  2. ssh into the node: $ ssh <HOSTNAME>
  3. Run the monitoring program as required, e.g. $ htop
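
The steps above can be sketched as a small shell helper. This is only an illustration: the function name job_nodes is made up, JOB_ID is assumed to be set already, and it assumes your cluster allows ssh to compute nodes where you have a running job. The function reads the scontrol output from stdin, so you can try it on saved output without a live cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the NodeList value from `scontrol show job` output.
# Only fields that start with "NodeList=" match, so ReqNodeList= and
# ExcNodeList= are skipped.
job_nodes() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^NodeList=/) { sub(/^NodeList=/, "", $i); print $i }
    }'
}

# Usage on a cluster (JOB_ID is assumed to be set):
#   node=$(scontrol show job "$JOB_ID" | job_nodes)
#   ssh -t "$node" htop -u "$USER"
```

For a multi-node job the value can be a compressed range such as node[01-04]; `scontrol show hostnames` expands that into one hostname per line, and you would then pick one node to ssh into.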

credit: Sparhawk
Answered September 19, 2019