Unify lookups and simplify smi shutdown #1069
Open
The existing code had a few issues in how it handled the various lists and lookup tables for GPU information. Per-GPU details such as KFD IDs, PCIe ID, location info, and node index were split across multiple lists. Updating one list and missing another would result in inconsistency and usage issues.
To avoid this, a unified GpuInfo object now holds all of this information, and a list of these objects gives the static info for every device in the system. This keeps the information encapsulated.
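As a rough illustration of the idea, a unified record might look like the sketch below. The field names, field types, and the `add_gpu` helper are assumptions made for this example and may not match the actual GpuInfo definition in the PR.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical layout: one record per GPU instead of several parallel lists.
// Field names and widths are illustrative assumptions, not the PR's definition.
struct GpuInfo {
  std::uint16_t gpu_id;       // KFD GPU ID
  std::uint16_t location_id;  // PCIe location info
  std::uint16_t device_id;    // PCIe device ID
  std::uint32_t node_id;      // node index
};

using GpuList = std::vector<GpuInfo>;

// Appends one complete record, so an ID can never be stored
// without the location and node data that belong to it.
void add_gpu(GpuList& list, std::uint16_t gpu_id, std::uint16_t location_id,
             std::uint16_t device_id, std::uint32_t node_id) {
  list.push_back(GpuInfo{gpu_id, location_id, device_id, node_id});
}
```

Because every field travels in the same object, there is no second list to forget when a device is added or removed.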
Along with this, removed the find() call in lookups, which adds significant overhead on multi-GPU setups because each lookup is O(n). Instead, a lookup hashmap is constructed once, and every lookup then takes O(1) time on average, which gives a significant runtime improvement on multi-GPU systems.
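A minimal sketch of the build-once, look-up-many pattern is shown below. The `GpuLookup` class, its method name, and the trimmed-down `GpuInfo` fields are hypothetical and only illustrate the approach; the PR's actual types may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Trimmed-down, hypothetical record used only for this sketch.
struct GpuInfo {
  std::uint16_t gpu_id;
  std::uint16_t location_id;
  std::uint32_t node_id;
};

// Hypothetical wrapper: the index map is built once in the constructor.
class GpuLookup {
 public:
  explicit GpuLookup(std::vector<GpuInfo> devices)
      : devices_(std::move(devices)) {
    for (std::size_t i = 0; i < devices_.size(); ++i) {
      index_by_gpu_id_[devices_[i].gpu_id] = i;
    }
  }

  // Average O(1) hash lookup instead of a linear scan over the list.
  // Returns nullptr when the ID is unknown.
  const GpuInfo* by_gpu_id(std::uint16_t gpu_id) const {
    auto it = index_by_gpu_id_.find(gpu_id);
    return it == index_by_gpu_id_.end() ? nullptr : &devices_[it->second];
  }

 private:
  std::vector<GpuInfo> devices_;
  std::unordered_map<std::uint16_t, std::size_t> index_by_gpu_id_;
};
```

Storing indices rather than pointers in the map keeps the lookup valid if the `GpuLookup` object is copied. Additional maps keyed by location ID or node index could be added the same way.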
This file now provides a single place for init and shutdown of the amdsmi library, instead of scattering shutdown calls across multiple module-specific rvs_module.cpp files.
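One common way to centralize a library's lifetime is an RAII owner, sketched below. This is an assumption about the general technique, not a description of how the PR implements it. The `SmiSession` class is hypothetical, and the two callables stand in for the real amdsmi init and shutdown calls so the example runs without the library.

```cpp
#include <functional>
#include <utility>

// Hypothetical RAII owner: init runs once on construction,
// shutdown runs once on destruction, and only if init succeeded.
class SmiSession {
 public:
  SmiSession(std::function<bool()> init, std::function<void()> shutdown)
      : shutdown_(std::move(shutdown)), active_(init()) {}

  ~SmiSession() {
    if (active_) shutdown_();
  }

  // Non-copyable so shutdown cannot run twice.
  SmiSession(const SmiSession&) = delete;
  SmiSession& operator=(const SmiSession&) = delete;

  bool active() const { return active_; }

 private:
  std::function<void()> shutdown_;  // stand-in for the library shutdown call
  bool active_;                     // true when init reported success
};
```

With a single owner like this, individual modules do not need their own shutdown code paths.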
Added further tests to verify:

./bintest/unit.rvs.gpu_util