Profile Performance with perf and FlameGraphs

By Admin · Mar 15, 2026 · Updated Apr 23, 2026

Performance profiling is the foundation of meaningful optimization. Rather than guessing where bottlenecks exist, Linux perf combined with Brendan Gregg's FlameGraphs provides a visual, data-driven approach to understanding exactly where your application spends CPU time. This guide covers installing and using perf, generating FlameGraphs, and interpreting results to fix real performance issues.

Installing perf and FlameGraph Tools

The perf tool ships with the Linux kernel tools package. Install it alongside the FlameGraph scripts:

# Ubuntu/Debian
sudo apt install linux-tools-common linux-tools-$(uname -r) linux-tools-generic

# RHEL/AlmaLinux
sudo dnf install perf

# Clone FlameGraph repository
cd /opt
sudo git clone https://github.com/brendangregg/FlameGraph.git
sudo chmod +x /opt/FlameGraph/*.pl

Verify the installation by checking versions:

perf --version
# perf version 6.x.x

ls /opt/FlameGraph/stackcollapse-perf.pl
# Should exist
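
The two manual checks above can be folded into a single preflight function. This is only a minimal sketch: the function name `missing_profiling_tools` is hypothetical, and the `/opt/FlameGraph` default is assumed from the clone step earlier in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each missing prerequisite, one per line.
# The default directory matches the clone location used in this guide.
missing_profiling_tools() {
    local fg_dir="${1:-/opt/FlameGraph}"
    local script

    # perf must be on PATH
    command -v perf >/dev/null 2>&1 || echo "perf"

    # The two scripts the FlameGraph pipeline relies on must be present
    for script in stackcollapse-perf.pl flamegraph.pl; do
        [ -f "$fg_dir/$script" ] || echo "$fg_dir/$script"
    done
    return 0
}
```

Calling `missing_profiling_tools` with no arguments prints nothing when everything is in place, which makes it easy to use as a guard at the top of a profiling script.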

Basic CPU Profiling with perf

The most common use case is CPU profiling to find which functions consume the most processor time:

# Profile the entire system for 30 seconds
sudo perf record -ag -F 99 -- sleep 30

# Profile a specific process (pgrep -d, joins multiple matching PIDs with commas, which -p accepts)
sudo perf record -g -F 99 -p "$(pgrep -d, -f "your-app")" -- sleep 30

# Profile a specific command
sudo perf record -g -F 99 -- ./your-application --args

Key flags explained:

  • -g: Capture call graphs (stack traces) — essential for FlameGraphs
  • -F 99: Sample at 99 Hz (avoids lockstep sampling artifacts at round numbers like 100)
  • -a: Profile all CPUs system-wide
  • -p PID: Profile a specific process
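
To get a feel for how `-F` and the recording duration interact, the arithmetic can be sketched as below. The function name `expected_samples` and the 4-CPU machine are assumptions for illustration. The result is an upper bound: idle CPUs may contribute far fewer samples.

```shell
# Hypothetical helper: upper bound on samples for a system-wide run.
# samples = frequency (Hz) * duration (s) * number of CPUs
expected_samples() {
    local freq="$1" seconds="$2" cpus="$3"
    echo $(( freq * seconds * cpus ))
}

# 99 Hz for 30 seconds on an assumed 4-CPU machine
expected_samples 99 30 4    # prints 11880
```

On a real machine, `$(nproc)` can supply the CPU count.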

Reading perf report

After recording, examine results with perf report:

# Interactive TUI report
sudo perf report

# Text output sorted by overhead
sudo perf report --stdio --sort=comm,dso,symbol | head -50

Generating FlameGraphs

FlameGraphs transform perf stack traces into interactive SVG visualizations. The width of each box represents the proportion of samples in which that function was on the stack, whether it was running itself or waiting on one of its callees:

# Step 1: Record with perf
sudo perf record -ag -F 99 -- sleep 60

# Step 2: Generate the folded stack traces
sudo perf script | /opt/FlameGraph/stackcollapse-perf.pl > /tmp/out.folded

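# Step 3: Generate the SVG (tee runs under sudo because the web root is usually not writable by regular users)
/opt/FlameGraph/flamegraph.pl /tmp/out.folded | sudo tee /var/www/html/flamegraph.svg > /dev/null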

# One-liner
sudo perf script | /opt/FlameGraph/stackcollapse-perf.pl | /opt/FlameGraph/flamegraph.pl > flamegraph.svg
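
The folded file produced in step 2 is plain text: one line per unique stack, with frames joined by semicolons and followed by a sample count. That makes it easy to filter before rendering. The sketch below uses a made-up sample file, and the process names and helper functions `filter_folded_by_comm` and `total_samples` are hypothetical.

```shell
# Hypothetical sample of folded output: "comm;frame;...;leaf count"
sample=$(mktemp)
cat > "$sample" <<'EOF'
myapp;main;handle_request;parse_json 40
myapp;main;handle_request;render 25
nginx;ngx_worker;ngx_http_process 30
myapp;main;gc_sweep 5
EOF

# Keep only stacks whose first frame (the process name) matches $1
filter_folded_by_comm() {
    local comm="$1" file="$2"
    awk -v comm="$comm" -F';' '$1 == comm' "$file"
}

# Sum the trailing sample counts of folded input read from stdin
total_samples() {
    awk '{ sum += $NF } END { print sum + 0 }'
}

filter_folded_by_comm myapp "$sample" | total_samples   # prints 70
```

The filtered stream can be piped straight into flamegraph.pl, which reads standard input when no file is given, to render a graph for a single process from a system-wide recording.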

Interpreting FlameGraphs

Understanding how to read a FlameGraph is critical:

  • X-axis: represents the proportion of total samples (wider = more CPU time). It is NOT a timeline.
  • Y-axis: shows stack depth — the bottom is the entry point, the top is the leaf function actually executing.
  • Plateaus: wide flat tops indicate functions that directly consume CPU (optimization targets).
  • Towers: narrow deep stacks suggest deep call chains but minimal CPU time per function.
  • Color: random warm colors by default; no semantic meaning unless customized.
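
The plateau rule can also be checked numerically: a function's self time is the number of samples in which it is the last frame of a folded stack. The following sketch assumes a folded file in the stackcollapse-perf.pl format; the sample data and the function name `self_time_by_function` are hypothetical.

```shell
# Hypothetical folded stacks; handle_request appears both as a leaf and as a parent
folded=$(mktemp)
cat > "$folded" <<'EOF'
myapp;main;handle_request;parse_json 40
myapp;main;handle_request;render 25
myapp;main;gc_sweep 5
myapp;main;handle_request 30
EOF

# Self time: samples where the function is the leaf (top-of-stack) frame
self_time_by_function() {
    awk '{
        count = $NF
        sub(/ [0-9]+$/, "")          # drop the trailing count, leaving the stack
        n = split($0, frames, ";")
        self[frames[n]] += count
        total += count
    }
    END {
        for (f in self)
            printf "%s %d %.1f%%\n", f, self[f], 100 * self[f] / total
    }' "$1" | sort -k2,2nr
}

self_time_by_function "$folded"
```

In this sample, handle_request is on the stack in 95 of 100 samples but is the leaf in only 30 of them, so its box would be wide while its plateau is much narrower. The sketch splits output on spaces, so symbols that contain spaces would need a different delimiter.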

Advanced Profiling Techniques

Off-CPU Analysis

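Sometimes applications are slow not because of CPU usage but because they are waiting (I/O, locks, sleep). Off-CPU FlameGraphs show which code paths lead to those waits. The recipe below records context switches, so the width of each frame reflects how often a stack was switched off the CPU, not how long it stayed blocked:

# Record scheduler events
sudo perf record -e sched:sched_switch -a -g -- sleep 30

# Generate a FlameGraph of the stacks that were switched out
sudo perf script | /opt/FlameGraph/stackcollapse-perf.pl > /tmp/offcpu.folded
/opt/FlameGraph/flamegraph.pl --color=io --title="Off-CPU Context Switches" --countname=switches /tmp/offcpu.folded > offcpu.svg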

Memory Allocation Profiling

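# Track kernel memory allocations (the kmem:kmalloc tracepoint does not see user-space malloc)
sudo perf record -e kmem:kmalloc -a -g -- sleep 15
sudo perf script | /opt/FlameGraph/stackcollapse-perf.pl | \
    /opt/FlameGraph/flamegraph.pl --color=mem --title="Kernel Memory Allocations" --countname=calls > kmalloc.svg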

Differential FlameGraphs

Compare performance before and after a change:

# Before the change
sudo perf record -ag -F 99 -o perf-before.data -- sleep 30
sudo perf script -i perf-before.data | /opt/FlameGraph/stackcollapse-perf.pl > before.folded

# After the change
sudo perf record -ag -F 99 -o perf-after.data -- sleep 30
sudo perf script -i perf-after.data | /opt/FlameGraph/stackcollapse-perf.pl > after.folded

# Generate diff FlameGraph (red = regression, blue = improvement)
/opt/FlameGraph/difffolded.pl before.folded after.folded | \
    /opt/FlameGraph/flamegraph.pl > diff.svg
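
For a quick textual view of what changed, the two folded files can be compared directly. This is a rough sketch of the idea, not how difffolded.pl is implemented; the sample stacks and the function name `folded_delta` are hypothetical, and it assumes both profiles cover a comparable number of samples.

```shell
# Hypothetical before/after folded profiles
before=$(mktemp); after=$(mktemp)
cat > "$before" <<'EOF'
myapp;main;parse_json 40
myapp;main;render 30
EOF
cat > "$after" <<'EOF'
myapp;main;parse_json 20
myapp;main;render 35
myapp;main;compress 10
EOF

# Print "stack before after delta" for every stack seen in either profile
folded_delta() {
    awk '
        {
            count = $NF
            sub(/ [0-9]+$/, "")      # drop the trailing count, leaving the stack
            if (FILENAME == ARGV[1]) b[$0] += count; else a[$0] += count
            seen[$0] = 1
        }
        END {
            for (s in seen)
                printf "%s %d %d %+d\n", s, b[s], a[s], a[s] - b[s]
        }' "$1" "$2" | sort
}

folded_delta "$before" "$after"
```

A positive delta corresponds to a stack that would be drawn red in the differential FlameGraph, and a negative delta to one drawn blue.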

Profiling Specific Runtimes

Node.js with perf

# Run Node.js with perf map support
node --perf-basic-prof your-app.js &

# The above creates /tmp/perf-PID.map (PID is the Node.js process ID) for symbol resolution
sudo perf record -g -F 99 -p $(pgrep -f "your-app") -- sleep 30
sudo perf script | /opt/FlameGraph/stackcollapse-perf.pl | /opt/FlameGraph/flamegraph.pl > node-flame.svg
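
If JavaScript frames show up as unresolved addresses, the first thing to check is whether the map file exists for the process being profiled. A minimal sketch, in which the function name `has_perf_map` is hypothetical and the second argument exists only so the check can be pointed at another directory:

```shell
# Hypothetical helper: succeed if a non-empty perf map file exists for a PID.
# map_dir defaults to /tmp, where --perf-basic-prof writes its map file.
has_perf_map() {
    local pid="$1" map_dir="${2:-/tmp}"
    [ -s "$map_dir/perf-$pid.map" ]
}
```

It could be used as `has_perf_map "$(pgrep -f "your-app")" || echo "no perf map found"` before starting a recording.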

Java/JVM with perf

# Use async-profiler for JVM (better than perf for Java)
# Download from https://github.com/async-profiler/async-profiler
# Replace PID with the process ID of the target JVM
./asprof -d 30 -f flamegraph.html PID

Python with py-spy

# py-spy generates FlameGraphs directly
pip install py-spy
sudo py-spy record -o profile.svg --pid $(pgrep -f "python app.py")

Automating Performance Baselines

Create a script to capture regular performance snapshots:

#!/bin/bash
# /usr/local/bin/perf-snapshot.sh
# Run as root (for example from root's crontab): perf and /var/log both need elevated privileges.
set -euo pipefail

OUTDIR="/var/log/perf-snapshots"
mkdir -p "$OUTDIR"
DATE=$(date +%Y%m%d-%H%M%S)

# Record a 60-second system-wide CPU profile
perf record -ag -F 99 -o "$OUTDIR/perf-$DATE.data" -- sleep 60

# Render the profile as an SVG next to the raw data
perf script -i "$OUTDIR/perf-$DATE.data" | \
    /opt/FlameGraph/stackcollapse-perf.pl | \
    /opt/FlameGraph/flamegraph.pl > "$OUTDIR/flamegraph-$DATE.svg"

# Clean up raw data older than 7 days
find "$OUTDIR" -name "perf-*.data" -mtime +7 -delete
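
The script above deletes only the raw perf data, so the rendered SVGs accumulate indefinitely. A retention helper along the following lines could prune them as well. The function name `prune_snapshots` and the 30-day window in the usage note are assumptions, not part of the original script.

```shell
# Hypothetical helper: delete files directly inside $1 that match the
# glob $2 and were last modified more than $3 days ago.
prune_snapshots() {
    local dir="$1" pattern="$2" days="$3"
    find "$dir" -maxdepth 1 -type f -name "$pattern" -mtime +"$days" -delete
}
```

For example, `prune_snapshots /var/log/perf-snapshots 'flamegraph-*.svg' 30` would keep roughly a month of FlameGraphs while the raw data is still removed after a week.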

Common Pitfalls

  • Missing symbols: Install debug symbols (-dbgsym or -debuginfo packages) for meaningful function names.
  • Kernel symbols: Run sudo sysctl kernel.kptr_restrict=0 to see kernel function names.
  • Frame pointers: Compile applications with -fno-omit-frame-pointer for accurate stack traces. Some recent distributions, such as Fedora 38 and Ubuntu 24.04, build their packages with frame pointers enabled by default.
  • Sampling bias: Use 99 Hz (not 100 Hz) to avoid synchronization artifacts with periodic timers.
  • Short profiles: Profile for at least 30-60 seconds under realistic load. At 99 Hz, a 30-second run yields at most about 3,000 samples per CPU, and shorter runs leave rarely executed code paths under-represented.
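
The kernel symbol pitfall can be detected before recording by reading the sysctl value. This is a small sketch; the function name `check_kptr_restrict` is hypothetical, and the optional path argument exists only so the check can be exercised against a test file.

```shell
# Hypothetical helper: report the current kptr_restrict setting.
# Prints "ok" when the value is 0, the setting recommended above.
check_kptr_restrict() {
    local file="${1:-/proc/sys/kernel/kptr_restrict}"
    local value
    value=$(cat "$file" 2>/dev/null) || { echo "unknown"; return 0; }
    if [ "$value" = "0" ]; then
        echo "ok"
    else
        echo "restricted (kptr_restrict=$value)"
    fi
}
```

Running `check_kptr_restrict` with no arguments reads the live setting and needs no root privileges.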

Summary

The perf + FlameGraph combination is the gold standard for performance analysis on Linux. Start with a CPU FlameGraph to identify hot spots, use off-CPU analysis for latency issues, and leverage differential FlameGraphs to validate your optimizations. Always profile under realistic production-like workloads, and make profiling a regular part of your deployment pipeline rather than a one-time debugging exercise.
