@aharivel

Add a new collector that exports SR-IOV metrics for physical functions and virtual functions. Supports VFs bound to both network drivers and vfio-pci by reading stats from the parent PF when direct VF stats are unavailable.

Exported metrics include traffic counters (rx/tx bytes, unicast, multicast, broadcast), error counters (dropped, allocation failures), and TX performance metrics. Each metric includes NUMA node information for topology-aware monitoring.
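As a rough illustration of where the NUMA node label could come from, the sketch below reads the `numa_node` attribute that sysfs exposes for a network device. The helper names (`parseNumaNode`, `numaNode`), the `sysfsRoot` parameter, and the default interface name are assumptions made for this example, not code taken from the pull request.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseNumaNode converts the raw contents of a sysfs numa_node file
// (for example "1\n") into an integer. The kernel reports -1 when the
// node is unknown.
func parseNumaNode(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

// numaNode reads <sysfsRoot>/class/net/<iface>/device/numa_node.
// sysfsRoot is a parameter (normally "/sys") so the function can be
// pointed at a fixture directory; this is a choice made for the sketch.
func numaNode(sysfsRoot, iface string) (int, error) {
	path := filepath.Join(sysfsRoot, "class", "net", iface, "device", "numa_node")
	data, err := os.ReadFile(path)
	if err != nil {
		return -1, fmt.Errorf("read %s: %w", path, err)
	}
	return parseNumaNode(string(data))
}

func main() {
	iface := "eth0" // hypothetical interface name
	if len(os.Args) > 1 {
		iface = os.Args[1]
	}
	node, err := numaNode("/sys", iface)
	if err != nil {
		fmt.Println("numa node unavailable:", err)
		return
	}
	fmt.Printf("%s is on NUMA node %d\n", iface, node)
}
```

Keeping the parsing separate from the file read makes the -1 "unknown" case easy to exercise without real hardware.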

Parses per-VF statistics from Intel PF drivers (ixgbe, i40e, ice), which use different naming conventions for VF stats.
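To make the naming problem concrete, here is a minimal sketch of normalizing per-VF stat names with a list of regular expressions. The thread does not show the actual stat names, so both patterns below (`vf003.rx_bytes` and `VF 3 Rx Bytes`) are hypothetical placeholders, as are the `vfStat` type and the `parseVFStat` helper; real driver output would need to be checked before relying on them.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// vfStatPatterns lists HYPOTHETICAL name formats. Each pattern captures
// the VF index (group 1) and the counter name (group 2).
var vfStatPatterns = []*regexp.Regexp{
	regexp.MustCompile(`^vf(\d+)[._](\w+)$`), // assumed style: "vf003.rx_bytes"
	regexp.MustCompile(`^VF (\d+) (.+)$`),    // assumed style: "VF 3 Rx Bytes"
}

// vfStat identifies one counter of one VF in a driver-independent form.
type vfStat struct {
	VF      int
	Counter string
}

// parseVFStat tries each pattern in turn and returns a normalized
// (VF index, counter name) pair. ok is false for names that do not
// look like per-VF stats under any of the assumed patterns.
func parseVFStat(name string) (stat vfStat, ok bool) {
	for _, re := range vfStatPatterns {
		m := re.FindStringSubmatch(name)
		if m == nil {
			continue
		}
		idx, err := strconv.Atoi(m[1])
		if err != nil {
			continue
		}
		// Normalize "Rx Bytes" and "rx_bytes" to the same key.
		counter := strings.ToLower(strings.ReplaceAll(m[2], " ", "_"))
		return vfStat{VF: idx, Counter: counter}, true
	}
	return vfStat{}, false
}

func main() {
	for _, name := range []string{"vf003.rx_bytes", "VF 3 Rx Bytes", "rx_packets"} {
		stat, ok := parseVFStat(name)
		fmt.Printf("%q -> %+v (per-VF: %v)\n", name, stat, ok)
	}
}
```

A table of patterns keeps driver-specific knowledge in one place, so supporting another naming style would only mean adding an entry.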

Signed-off-by: Anthony Harivel <aharivel@redhat.com>

@rjarry left a comment

I wonder if it would make sense and/or be simpler to query the statistics using https://github.com/vishvananda/netlink.

The stats are parsed here: https://github.com/vishvananda/netlink/blob/c6faf428e8f84dcb73774e7c77a1e4fe38bbdb4d/link_linux.go#L3991

That way, no need to deal with hardware specific ethtool extended stats.

@aharivel

> I wonder if it would make sense and/or be simpler to query the statistics using https://github.com/vishvananda/netlink.
>
> The stats are parsed here: https://github.com/vishvananda/netlink/blob/c6faf428e8f84dcb73774e7c77a1e4fe38bbdb4d/link_linux.go#L3991
>
> That way, no need to deal with hardware specific ethtool extended stats.

The problem arises when the VF is bound to vfio-pci, since netlink communicates with the kernel only. That's why I used "ethtool -S" to get the metrics directly from the PF.

From what I see, it works only for the mlx5 driver, because each VF has a representor on the eswitch (e.g., eth0_0, eth0_1, or enp3s0f0_0), and when the VF is bound to vfio-pci, the representor still exists and has stats.

So it's going to be tricky to tell which VFs are Intel and which are mlx in order to retrieve the metrics the right way.

Any other ideas?
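One possible way to tell the cases apart, sketched under assumptions: sysfs exposes the driver bound to a PCI device as a `driver` symlink, so the PF and VF driver names can be compared before choosing where to read stats. The `chooseSource` decision table below is only an interpretation of this discussion, and the helper names and the `sysfsRoot` parameter are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// statsSource names where per-VF counters would be read from.
type statsSource int

const (
	sourceVFNetdev    statsSource = iota // VF has its own netdev
	sourcePFEthtool                      // per-VF counters from the PF's ethtool stats
	sourceRepresentor                    // eswitch representor netdev
)

func (s statsSource) String() string {
	return [...]string{"vf-netdev", "pf-ethtool", "representor"}[s]
}

// driverFromLink extracts the driver name from the target of a sysfs
// "driver" symlink, e.g. "../../../bus/pci/drivers/vfio-pci".
func driverFromLink(target string) string {
	return filepath.Base(target)
}

// boundDriver returns the driver bound to the PCI device at pciAddr.
// An error usually means no driver is bound (the symlink is absent).
func boundDriver(sysfsRoot, pciAddr string) (string, error) {
	link := filepath.Join(sysfsRoot, "bus", "pci", "devices", pciAddr, "driver")
	target, err := os.Readlink(link)
	if err != nil {
		return "", err
	}
	return driverFromLink(target), nil
}

// chooseSource is an ASSUMED decision table, not the PR's logic:
// a VF bound to a network driver is read directly, a vfio-pci VF under
// an mlx5 PF goes through its representor, and anything else falls
// back to the PF's ethtool stats.
func chooseSource(pfDriver, vfDriver string) statsSource {
	switch {
	case vfDriver != "" && vfDriver != "vfio-pci":
		return sourceVFNetdev
	case pfDriver == "mlx5_core":
		return sourceRepresentor
	default:
		return sourcePFEthtool
	}
}

func main() {
	fmt.Println(chooseSource("ice", "iavf"))
	fmt.Println(chooseSource("ice", "vfio-pci"))
	fmt.Println(chooseSource("mlx5_core", "vfio-pci"))
}
```

The representor case assumes the mlx5 PF is in a mode where representors exist, which this sketch does not verify.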

Replace exec.Command("ethtool", "-S", ...) with the safchain/ethtool
Go library which uses ioctl directly. This removes shell exec overhead
and provides cleaner error handling while maintaining the same
functionality.

Signed-off-by: Anthony Harivel <aharivel@redhat.com>
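A minimal sketch of how the library-based approach might be wrapped so it stays testable without hardware. The `statsReader` interface is an assumption introduced here; to my knowledge `*ethtool.Ethtool` from github.com/safchain/ethtool satisfies it through its `Stats(intf string) (map[string]uint64, error)` method, but the example deliberately uses a fake reader and made-up stat names rather than importing the library.

```go
package main

import (
	"fmt"
	"strings"
)

// statsReader is a hypothetical abstraction over an ethtool client.
// A value returned by ethtool.NewEthtool() (safchain/ethtool) is
// expected to satisfy it.
type statsReader interface {
	Stats(intf string) (map[string]uint64, error)
}

// statsWithPrefix fetches all ethtool stats for the PF and keeps only
// the entries whose name starts with prefix.
func statsWithPrefix(r statsReader, pf, prefix string) (map[string]uint64, error) {
	all, err := r.Stats(pf)
	if err != nil {
		return nil, fmt.Errorf("ethtool stats for %s: %w", pf, err)
	}
	out := make(map[string]uint64)
	for name, value := range all {
		if strings.HasPrefix(name, prefix) {
			out[name] = value
		}
	}
	return out, nil
}

// fakeReader stands in for real hardware; the stat names are invented.
type fakeReader map[string]uint64

func (f fakeReader) Stats(string) (map[string]uint64, error) { return f, nil }

func main() {
	r := fakeReader{"vf0_rx_bytes": 1500, "vf1_rx_bytes": 64, "rx_bytes": 9000}
	got, err := statsWithPrefix(r, "eth0", "vf0_")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```

With the real library, the caller would create the client once with `ethtool.NewEthtool()`, defer its `Close()`, and pass it in place of `fakeReader`.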