@priyankakinij
Collaborator

This PR adds further validation of InfiniBand (IB) usage for the multi-node distributed PyTorch test case, based on the RDMA hardware counters.

  1. Read the RDMA tx/rx hardware counters before the test.
  2. Submit the sbatch job that runs the distributed PyTorch test.
  3. Read the RDMA tx/rx hardware counters again after the test.
  4. Validate the RDMA counters against the earlier readings and display the result.
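
The four steps above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names, the counter directory, the counter file names, and the pass criterion (every tracked counter must increase) are all assumptions made for the example. It also assumes the counters are readable as plain integer files, as the Linux sysfs `hw_counters` directories are, and that the job is submitted with `sbatch --wait` so the call blocks until the job finishes.

```python
import subprocess
from pathlib import Path


def read_counters(counter_dir, names):
    """Read one integer per file for each counter name under counter_dir."""
    counters = {}
    for name in names:
        # Each counter is assumed to be a text file holding a single integer.
        counters[name] = int((Path(counter_dir) / name).read_text().strip())
    return counters


def counter_deltas(before, after):
    """Return after - before for every counter present in both snapshots."""
    return {name: after[name] - before[name] for name in before if name in after}


def validate_rdma_usage(before, after):
    """Assumed criterion: pass only if every tracked counter increased."""
    deltas = counter_deltas(before, after)
    passed = bool(deltas) and all(delta > 0 for delta in deltas.values())
    return passed, deltas


def run_validation(sbatch_script, counter_dir, names):
    """Hypothetical driver: snapshot counters, run the job, snapshot again."""
    before = read_counters(counter_dir, names)           # step 1
    # `--wait` makes sbatch block until the submitted job terminates.
    subprocess.run(["sbatch", "--wait", sbatch_script], check=True)  # step 2
    after = read_counters(counter_dir, names)            # step 3
    passed, deltas = validate_rdma_usage(before, after)  # step 4
    for name, delta in sorted(deltas.items()):
        print(f"{name}: +{delta}")
    print("IB validation:", "PASSED" if passed else "FAILED")
    return passed
```

A call might look like `run_validation("test.sbatch", counter_dir, ["tx_counter", "rx_counter"])`, where the script name and counter names are placeholders. Note that this sketch only reads counters on the node where it runs; a multi-node check would need to collect them on each compute node.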
