
Hyperthreading #32

Open
itamblyn opened this issue Sep 10, 2020 · 11 comments
Labels: documentation (Improvements or additions to documentation)

@itamblyn (Contributor)

The command `top` shows 32 CPUs. Are the Trixie compute nodes dual socket, i.e. do they have two Xeon 6130 processors? If so, I don't understand why we aren't seeing 64 CPUs in `top` (because of hyperthreading).
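For reference, `lscpu` reports the socket/core/thread layout directly. A sketch of what a dual-socket Xeon 6130 node with hyperthreading disabled would report (the values shown are assumed, not captured from a Trixie node):

```sh
# Hypothetical output for a dual-socket Xeon Gold 6130 node with
# hyperthreading disabled (2 sockets x 16 cores x 1 thread = 32 CPUs):
$ lscpu | grep -E 'Thread|Core|Socket'
Thread(s) per core:    1
Core(s) per socket:    16
Socket(s):             2
```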

@itamblyn added the documentation label on Sep 10, 2020
@joeydumont

There are two sockets with 16 cores each. Hyperthreading is disabled.

@itamblyn (Contributor, Author)

...why is hyperthreading disabled?

@joeydumont

Hyperthreading is typically disabled on HPC clusters, as each core is expected to be almost always fully utilized. If you oversubscribe threads to cores, you can lose performance due to constant context switching.

Of course, it depends a lot on what the typical workload looks like. For CPU-bound workloads, hyperthreading is better left disabled. It might be different for other workloads.
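As an aside, on recent Linux kernels (4.19+) you can inspect, and sometimes toggle, SMT at runtime through sysfs. A sketch, assuming the kernel exposes the SMT control interface and the firmware permits it:

```sh
# Is simultaneous multithreading (SMT) currently active? (1 = yes, 0 = no)
$ cat /sys/devices/system/cpu/smt/active
0
# Control state: "on", "off", "forceoff", or "notsupported".
$ cat /sys/devices/system/cpu/smt/control
off
# If the firmware leaves SMT on, root can disable it at runtime:
# echo off | sudo tee /sys/devices/system/cpu/smt/control
```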

@itamblyn changed the title from "Hardware specifications" to "Hyperthreading" on Sep 10, 2020
@itamblyn (Contributor, Author)

Ok, I think this needs to be revisited. Trixie is not a "general" HPC machine: it was designed for GPU and data-intensive workloads, so we should be tuning it for that purpose.

Hyperthreading should be turned back on, as I suspect we are bottlenecking the cards right now (or at least we have the potential to).

@SamuelLarkin (Collaborator)

Have you looked at `nvidia-smi -l` on one of your nodes? I see 4 V100s at 100%, which to me indicates that there isn't a bottleneck where the GPUs are waiting for data from the CPUs.
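A sketch of two ways to watch this continuously (both flags are standard `nvidia-smi` options; the one-second interval is just an example):

```sh
# Refresh the full summary every second:
$ nvidia-smi -l 1
# Or log only GPU and memory utilization per device, in CSV form:
$ nvidia-smi --query-gpu=index,utilization.gpu,utilization.memory \
             --format=csv -l 1
```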

@itamblyn (Contributor, Author)

That just indicates there isn't a bottleneck with that particular model.

I am not aware of any vendor supplying deep-learning gear with hyperthreading disabled. The burden of proof runs the other way here.

@joeydumont

I am fairly sure this is how the cluster was first delivered. If you want to enable hyperthreading, we can queue that work for the next compute node refresh. We'll probably want to involve the working group on that decision.

My own experience is with CPU-based workloads, so I won't argue for/against hyperthreading here. We can implement whatever the working group thinks is best for the cluster.

@kryczko commented Sep 10, 2020

On the Niagara supercomputer, if you request 40 CPUs you get 40 physical cores and the hyperthreads that go along with them (80 logical CPUs in total). The user can effectively turn hyperthreading off on the fly with OMP_NUM_THREADS=1 (more generally, by running one thread per physical core).
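A sketch of that pattern for a full node, assuming a hypothetical OpenMP binary `./my_app` and 40 physical cores; `OMP_PLACES` and `OMP_PROC_BIND` are standard OpenMP environment variables:

```sh
# One OpenMP thread per physical core; the hyperthreads stay idle.
export OMP_NUM_THREADS=40     # assumed physical-core count for the node
export OMP_PLACES=cores       # one place per physical core
export OMP_PROC_BIND=close    # pin threads to consecutive places
./my_app                      # hypothetical OpenMP binary
```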

@ddamoursNRC (Collaborator)

Documentation has been updated to reflect that hyperthreading is currently off.
We are developing a plan to benchmark hyperthreading on and off for some of our workloads.
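A minimal sketch of such an A/B comparison, assuming a representative workload script `./train.sh` (hypothetical) and the runtime SMT toggle mentioned above:

```sh
# Time the same workload with SMT off and then on (requires root and a
# kernel/firmware combination that allows runtime SMT control).
for smt in off on; do
    echo "$smt" | sudo tee /sys/devices/system/cpu/smt/control
    /usr/bin/time -v -o "time_smt_${smt}.log" ./train.sh
done
```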

@ddamoursNRC (Collaborator)

Agreement has been reached to turn hyperthreading on for all compute nodes during the next scheduled maintenance window. Details will be communicated once this occurs.

@itamblyn (Contributor, Author) commented Aug 4, 2022

Did this happen? Can this issue be closed?
