Add tool usage scenarios #13
Draft: harp-intel wants to merge 1 commit into main from tool-chooser
+140
−6
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,17 +2,151 @@ | |
|
|
||
| This directory contains documentation for performance monitoring and profiling tools used in optimization work. | ||
|
|
||
| ## Tool Summaries | ||
| ## Tool Reference | ||
|
|
||
| **Intel® gProfiler** - A system-wide profiler that combines multiple sampling profilers to visualize CPU usage across native programs, Java, Python runtimes, and kernel routines. It also includes Intel® gProfiler Performance Studio, a self-hosted solution for aggregating results from multiple instances. | ||
| ### Intel® gProfiler | ||
|
|
||
| **Intel® Performance Counter Monitor (PCM)** - An API and toolset for monitoring performance and energy metrics of Intel processors. Provides real-time monitoring of key metrics including memory bandwidth, cache miss latencies, PCIe bandwidth, and energy states. Available on Linux, Windows, macOS, FreeBSD, and ChromeOS. | ||
| System-wide profiler combining multiple sampling profilers across native programs, Java, Python runtimes, and kernel routines. Includes optional gProfiler Performance Studio for cluster-wide aggregation. | ||
|
|
||
| **Intel® PerfSpect** - A comprehensive performance engineering toolkit that monitors CPU metrics, reports system configuration and health, collects system telemetry, generates flamegraphs from call-stacks, and modifies performance-related configuration settings. | ||
| 📊 **Best for:** Production monitoring, multi-language environments, cluster analysis, low-overhead continuous profiling | ||
|
|
||
| **Intel® VTune™ Profiler** - An optimization tool for application and system performance analysis across AI, HPC, cloud, IoT, and storage workloads. Capabilities include identifying microarchitecture and memory bottlenecks, optimizing accelerators, analyzing parallelism, and multi-node analysis. | ||
| ### Intel® Performance Counter Monitor (PCM) | ||
|
|
||
| ## Individual Tool Documentation | ||
| API and toolset for monitoring performance and energy metrics of Intel processors including memory bandwidth, cache behavior, PCIe bandwidth, and energy states. | ||
|
|
||
| 📊 **Best for:** Hardware-level metrics, memory analysis, power consumption, real-time dashboards | ||
|
|
||
| ### Intel® PerfSpect | ||
|
|
||
| **Easy to install and use.** Comprehensive performance engineering toolkit for system health reporting, configuration analysis, flamegraph generation, telemetry collection, and tuning parameter modification. Provides quick insights across multiple dimensions without the learning curve or deep complexity of other tools. | ||
|
|
||
| 📊 **Best for:** System assessment, configuration validation, quick troubleshooting, health checks, getting started with performance analysis | ||
|
|
||
| ⚡ **Key advantage:** Accessibility and speed of use, though with less depth than specialized tools | ||
|
|
||
| ### Intel® VTune™ Profiler | ||
|
|
||
| In-depth application and system profiler with microarchitecture analysis, parallelism examination, multi-node analysis, and GPU/accelerator optimization capabilities. | ||
|
|
||
| 📊 **Best for:** Deep application optimization, microarchitecture analysis, GPU optimization, HPC workloads, complex debugging | ||
|
|
||
| ## Choosing the Right Tool | ||
|
|
||
| Start with your primary goal or problem, then follow the decision path to find the best tool(s). | ||
|
|
||
| ### START: What is your primary goal? | ||
|
|
||
| #### **"I need a quick system assessment" (Easy start)** | ||
|
|
||
| → **Use: PerfSpect** ⭐ Easiest to install and use | ||
|
|
||
| - Validating system configuration before performance testing | ||
| - Getting a health check and performance baseline | ||
| - Quick automated system tuning recommendations | ||
| - Pre-flight checks before running benchmarks | ||
| - Understanding current system telemetry and state | ||
| - **Start here if you're new to performance analysis** – no steep learning curve | ||
|
|
||
| --- | ||
|
|
||
| #### **"My application/workload is slow - I need to find where time is spent"** | ||
|
|
||
| **→ Do you need to analyze multiple languages or continuous production monitoring?** | ||
|
|
||
| - **YES (multi-language or production monitoring)** → **Use: gProfiler** | ||
| - Multi-language environments (native, Java, Python) requiring unified profiling | ||
| - Finding performance bottlenecks in microservices architectures | ||
| - Analyzing resource utilization across production systems with low overhead | ||
| - Identifying hot functions and stack traces without code instrumentation | ||
| - Comparing performance patterns across multiple machines over time | ||
|
|
||
| - **NO (single application, development/testing)** → **Use: VTune** | ||
| - Optimizing algorithm efficiency by identifying instruction-level bottlenecks | ||
| - Deep investigation with detailed performance metrics | ||
| - Identifying specific microarchitecture bottlenecks (stalls, cache misses) | ||
| - Complex performance investigations requiring advanced visualization | ||
|
|
||
| --- | ||
|
|
||
| #### **"I'm analyzing/optimizing distributed systems at scale"** | ||
|
|
||
| **→ Do you need to aggregate data from multiple machines?** | ||
|
|
||
| - **YES** → **Use: gProfiler + gProfiler Performance Studio** | ||
| - Cluster-wide performance analysis | ||
| - Comparing performance patterns across multiple machines or time periods | ||
| - Holistic view of what is happening on your entire cluster | ||
|
|
||
| - **NO (single machine analysis)** → **Use: gProfiler or VTune** (based on depth needed) | ||
|
|
||
| --- | ||
|
|
||
| #### **"I'm experiencing memory or bandwidth issues"** | ||
|
|
||
| **→ Are you investigating processor-level metrics?** | ||
|
|
||
| - **YES** → **Use: PCM** | ||
| - Analyzing memory bandwidth utilization and DRAM behavior | ||
| - Identifying memory bandwidth bottlenecks in data-intensive workloads | ||
| - Detecting inefficient cache usage patterns | ||
| - Monitoring cache miss latencies and PCIe bandwidth | ||
| - Detailed microarchitecture analysis (cache efficiency, memory stalls) | ||
| - Real-time system performance dashboards | ||
|
|
||
| - **NO (need application-level insights)** → **Use: VTune** | ||
| - Identifying which parts of the code are causing memory issues | ||
| - Detailed cache miss analysis at the instruction level | ||
|
|
||
| --- | ||
|
|
||
| #### **"My parallel/multi-threaded application doesn't scale"** | ||
|
|
||
| → **Use: VTune** | ||
|
|
||
| - Analyzing multi-threaded parallelism and scalability issues | ||
| - Debugging poor thread scaling in parallel applications | ||
| - Examining how effectively threads are utilized | ||
|
|
||
| --- | ||
|
|
||
| #### **"I need to optimize GPU or accelerators"** | ||
|
|
||
| → **Use: VTune** | ||
|
|
||
| - GPU/accelerator optimization and analysis | ||
| - Analyzing GPU utilization and accelerator integration | ||
| - Multi-node cluster performance analysis for HPC applications | ||
| - AI/ML workload optimization and profiling | ||
|
|
||
| --- | ||
|
|
||
| #### **"I need to monitor power consumption or energy efficiency"** | ||
|
|
||
| → **Use: PCM** | ||
|
|
||
| - Tracking energy consumption and CPU sleep states | ||
| - Power consumption analysis for cloud deployments | ||
| - Integration with monitoring systems like Prometheus for continuous tracking | ||
|
|
||
| --- | ||
|
|
||
| #### **"I need to visualize call stacks and hot code paths"** | ||
|
|
||
| **→ Do you want quick, shallow analysis or deep investigation?** | ||
|
|
||
| - **Quick and easy** → **Use: PerfSpect** | ||
| - Generating flamegraphs for visualization of call stacks | ||
| - Quick visualization of application hot paths | ||
| - Simple setup and immediate insights | ||
|
|
||
| - **Production-scale or deep analysis** → **Use: gProfiler** | ||
| - System-wide flamegraphs across all processes | ||
| - Continuous profiling with minimal overhead | ||
| - More sophisticated analysis capabilities | ||
|
|
||
| --- | ||
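The decision path above can also be read as a small lookup table. The following is a minimal sketch of that idea, not part of any Intel tool: the goal keys, the `FOLLOW_UP` questions, and the `recommend()` helper are hypothetical names chosen for illustration, and the recommendations simply mirror the headings in this section.

```python
from typing import Dict, List, Optional

# Hypothetical encoding of the decision path in this README.
# A key of None means the goal maps straight to a tool with no follow-up question.
DECISION_PATH: Dict[str, Dict[Optional[bool], List[str]]] = {
    "quick_assessment": {None: ["PerfSpect"]},
    "find_where_time_is_spent": {True: ["gProfiler"], False: ["VTune"]},
    "distributed_at_scale": {
        True: ["gProfiler", "gProfiler Performance Studio"],
        False: ["gProfiler", "VTune"],  # pick one based on the depth needed
    },
    "memory_or_bandwidth": {True: ["PCM"], False: ["VTune"]},
    "thread_scaling": {None: ["VTune"]},
    "gpu_or_accelerators": {None: ["VTune"]},
    "power_or_energy": {None: ["PCM"]},
    "call_stack_visualization": {True: ["PerfSpect"], False: ["gProfiler"]},
}

# The yes/no question asked for each branching goal (True means "yes").
FOLLOW_UP: Dict[str, str] = {
    "find_where_time_is_spent": "Multiple languages or continuous production monitoring?",
    "distributed_at_scale": "Aggregate data from multiple machines?",
    "memory_or_bandwidth": "Investigating processor-level metrics?",
    "call_stack_visualization": "Quick and easy (rather than production-scale or deep)?",
}


def recommend(goal: str, answer: Optional[bool] = None) -> List[str]:
    """Return the tool(s) suggested for a goal and an optional yes/no answer."""
    branches = DECISION_PATH[goal]  # raises KeyError for an unknown goal
    if None in branches:
        return branches[None]
    if answer is None:
        raise ValueError(f"{goal!r} needs a yes/no answer: {FOLLOW_UP[goal]}")
    return branches[answer]


if __name__ == "__main__":
    print(recommend("quick_assessment"))
    print(recommend("memory_or_bandwidth", answer=True))
    print(recommend("call_stack_visualization", answer=False))
```

Keeping the table as data rather than nested `if` statements makes it easy to add a row if another tool is later included in the guide.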
|
|
||
| ## More Information | ||
|
|
||
| - [gProfiler](gprofiler/README.md) | ||
|
Contributor
I think the links should be moved to the top, or maybe even incorporated into the sections about the tools. |
||
| - [PCM](pcm/README.md) | ||
|
|
||
It might be helpful to add one or two popular analysis tools, such as Linux perf, and describe the scenarios in which they are useful.