Home
For the past two years, I have been slowly improving the features of GPUProfiler for Windows environments, and I have often heard requests for a similar tool for Linux, or at least a monitor that gives users greater and more immediate insight into what GPU features are doing. I wanted a tool like this myself, so I started making it.

ngputop displays the utilization of GPU resources for all NVIDIA GPUs detected on a bare-metal machine, on some hypervisor hosts (XenServer, RHEL RHV, ESXi), or in virtual machines* (*see limitations).

UI Example - Graphics view mode

Inferencing on an Ubuntu vGPU VM

| Key | Alternate | Command | Feature | State |
|---|---|---|---|---|
| F1 | h | Help | Display | Enabled |
| c | | Compute view | Display | Enabled |
| g | | Graphics view | Display | Enabled |
| F2 | | Setup | Configuration | Future use |
| F3 | | Search | Display | Future use |
| F4 | | Filter | Display | Future use |
| F5 | | Start | Profile | Future use |
| F6 | | Stop | Profile | Future use |
| F10 | q | Exit | Operation | Enabled |
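
The key bindings above can be read as a lookup from key to command, with the "Future use" entries ignored for now. The following is a minimal sketch of that idea in Python. The key names, commands, and states are copied from the table, while `KEY_BINDINGS` and `resolve_key` are hypothetical names chosen for illustration and do not come from ngputop's source.

```python
# Illustrative only: the bindings mirror the table above, but the
# dispatch logic is an assumption, not ngputop's actual implementation.
KEY_BINDINGS = {
    "F1": ("Help", True),
    "h": ("Help", True),
    "c": ("Compute view", True),
    "g": ("Graphics view", True),
    "F2": ("Setup", False),    # Future use
    "F3": ("Search", False),   # Future use
    "F4": ("Filter", False),   # Future use
    "F5": ("Start", False),    # Future use
    "F6": ("Stop", False),     # Future use
    "F10": ("Exit", True),
    "q": ("Exit", True),
}


def resolve_key(key):
    """Return the command bound to `key`, or None if the key is
    unbound or its command is reserved for future use."""
    command, enabled = KEY_BINDINGS.get(key, (None, False))
    return command if enabled else None


for key in ("h", "g", "F5", "x"):
    print(key, "->", resolve_key(key))
```

Keeping the enabled flag next to each binding means a reserved key such as F5 can be switched on later without touching the lookup function.
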

ngputop currently has two view modes, "compute" and "graphics". Each view shows a different set of gauges, chosen to match the utilization metrics that use case typically needs to observe.

Compute mode displays the following 'gauges' for each detected NVIDIA GPU:
| Abbreviation | Meaning |
|---|---|
| SM | Streaming Multiprocessor (SM) utilization |
| FB | Frame buffer utilization |
| CL | SM Clock |
| PW | Power consumption |
| TP | GPU temperature in Celsius |
| FN | Fan speed (% of maximum) |

Graphics mode displays the following 'gauges' for each detected NVIDIA GPU:
| Abbreviation | Meaning |
|---|---|
| SM | Streaming Multiprocessor (SM) utilization |
| FB | Frame buffer utilization |
| MC | Memory controller utilization |
| EN | Encoder utilization |
| DE | Decoder utilization |
| PW | Power consumption |
| TP | GPU temperature in Celsius |
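
To make the difference between the two views concrete, the sketch below lists the gauge abbreviations per mode and draws one gauge as a text bar. It is a rough illustration in Python, not ngputop's rendering code: `VIEW_GAUGES` and `render_gauge` are hypothetical names, the bar style is invented, and the sample readings are made up.

```python
# Gauge abbreviations per view mode, copied from the two tables above.
VIEW_GAUGES = {
    "compute": ["SM", "FB", "CL", "PW", "TP", "FN"],
    "graphics": ["SM", "FB", "MC", "EN", "DE", "PW", "TP"],
}


def render_gauge(label, value, maximum, width=20):
    """Draw one gauge as text: the label, a bar filled in proportion
    to value / maximum (clamped to 0..1), and the percentage."""
    if maximum <= 0:
        fraction = 0.0
    else:
        fraction = min(max(value / maximum, 0.0), 1.0)
    filled = round(fraction * width)
    bar = "#" * filled + "-" * (width - filled)
    return f"{label} [{bar}] {fraction * 100:3.0f}%"


# Made-up readings for illustration: (current value, maximum).
sample = {"SM": (50, 100), "FB": (4096, 16384), "PW": (70, 250)}
for label in VIEW_GAUGES["compute"]:
    if label in sample:
        print(render_gauge(label, *sample[label]))
```

Scaling each reading against a maximum is why threshold values (power limit, total frame buffer, maximum clock) matter as much as the current readings.
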

NVML functions and their nvidia-smi query equivalents:

| NVML Function | Purpose | nvidia-smi query equivalent |
|---|---|---|
| nvmlDeviceGetUtilizationRates | Get GPU utilization | utilization.gpu, utilization.memory |
| nvmlDeviceGetTemperatureThreshold | Get the temperature thresholds | |
| nvmlDeviceGetTemperature | Get the temperature | temperature.gpu |
| nvmlDeviceGetProcessUtilization | Get GPU utilization per process | using the query "--query-compute-apps=pid,name,used_memory" |
| nvmlDeviceGetPowerUsage | Get the current power usage | power.draw |
| nvmlDeviceGetPowerManagementLimit | Get the power thresholds | power.limit, power.min_limit, power.max_limit |
| nvmlDeviceGetPciInfo_v2 | Get the PCI bus details | pci.bus_id |
| nvmlDeviceGetName | Get the GPU product name | name |
| nvmlDeviceGetMemoryInfo | Get the frame buffer thresholds | memory.total, memory.used, memory.free |
| nvmlDeviceGetMaxClockInfo | Get the clock thresholds | clocks.max.sm |
| nvmlDeviceGetHandleByIndex | Get a "handle" for each GPU | |
| nvmlDeviceGetFanSpeed | Get the fan speed | fan.speed |
| nvmlDeviceGetEncoderUtilization | Get the encoder utilization | |
| nvmlDeviceGetDecoderUtilization | Get the decoder utilization | |
| nvmlDeviceGetCount | Get number of GPUs | count |
| nvmlDeviceGetClockInfo | Get the current clock data | clocks.current.graphics, clocks.current.sm |
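
For readers who want to cross-check these values without writing NVML code, the nvidia-smi column can be used directly. The Python sketch below builds a `--query-gpu` command from some of the field names in the table and parses the CSV output into dictionaries. The helper names (`build_query`, `parse_rows`, `query_gpus`) are hypothetical, and the sample line is invented so that the parsing step can be tried on a machine without a GPU.

```python
import csv
import io
import subprocess

# Query fields taken from the right-hand column of the table above.
FIELDS = ["name", "pci.bus_id", "utilization.gpu", "utilization.memory",
          "temperature.gpu", "power.draw", "fan.speed"]


def build_query(fields):
    """Build the nvidia-smi argument list for a CSV query of `fields`."""
    return ["nvidia-smi", "--query-gpu=" + ",".join(fields),
            "--format=csv,noheader,nounits"]


def parse_rows(text, fields):
    """Turn nvidia-smi CSV output (one line per GPU) into a list of dicts."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(fields, (cell.strip() for cell in row)))
            for row in reader if row]


def query_gpus(fields=FIELDS):
    """Run nvidia-smi and parse its output (needs an NVIDIA driver install)."""
    result = subprocess.run(build_query(fields), capture_output=True,
                            text=True, check=True)
    return parse_rows(result.stdout, fields)


# Invented sample line, shaped like nvidia-smi CSV output, for a dry run.
sample = "Example GPU, 00000000:00:05.0, 12, 3, 45, 27.50, 40\n"
print(" ".join(build_query(FIELDS)))
print(parse_rows(sample, FIELDS))
```

On a machine with the NVIDIA driver installed, calling `query_gpus()` would run the printed command. Values come back as strings because some fields can report `[N/A]` on hardware that lacks the sensor.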