How to Use perf for Linux Performance Analysis
The perf tool is the official Linux kernel profiler, providing hardware-level performance counters, CPU sampling, and tracing capabilities. It is invaluable for diagnosing CPU bottlenecks, cache misses, and system-level performance issues on your Breeze server.
Installing perf
# On Ubuntu/Debian
sudo apt install -y linux-tools-common linux-tools-$(uname -r)
# Verify installation
perf --version
Basic CPU Profiling
Record CPU samples for a running process or command:
# Profile a specific command
sudo perf record -g ./my-application
# Profile already-running processes by PID for 30 seconds
# (pgrep -d, joins multiple PIDs with commas, the list format -p expects)
sudo perf record -g -p $(pgrep -d, nginx) -- sleep 30
# Profile the entire system for 10 seconds
sudo perf record -g -a -- sleep 10
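The -g flag captures call graphs, which show not only which functions are hot but also which call paths lead to them. By default, perf unwinds stacks using frame pointers; for binaries built without them, --call-graph dwarf usually produces more complete stacks at the cost of larger data files.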
Viewing the Report
# Interactive TUI report
sudo perf report
# Text output sorted by overhead
sudo perf report --stdio --sort=overhead
# Show call graph in text mode
sudo perf report --stdio --call-graph=flat
Navigate the TUI with arrow keys. Press Enter to expand functions and see callers/callees.
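For scripted checks, the text report can be filtered down to the rows that matter. The following is a minimal sketch, assuming the default --stdio row layout in which each entry starts with an overhead percentage; the helper name hot_symbols and the sample rows (compute, parse_input) are hypothetical and are not real perf output.

```shell
#!/bin/sh
# hot_symbols MIN_PCT: keep only report rows whose overhead is >= MIN_PCT.
# Reads `perf report --stdio` text on stdin. Assumption: each entry row
# starts with a percentage such as "45.12%"; header and call-graph lines
# do not, so they are dropped.
hot_symbols() {
    awk -v min="$1" '
        $1 ~ /^[0-9]+(\.[0-9]+)?%$/ {
            pct = $1
            sub(/%$/, "", pct)
            if (pct + 0 >= min + 0) print
        }'
}

# Demonstration on a hand-written sample (hypothetical symbols).
hot_symbols 5 <<'EOF'
# Overhead  Command         Shared Object   Symbol
    45.12%  my-application  my-application  [.] compute
    12.30%  my-application  libc.so.6       [.] memcpy
     3.40%  my-application  my-application  [.] parse_input
EOF
```

With real data, the helper would typically be fed from something like sudo perf report --stdio --no-children --call-graph=none | hot_symbols 5, so that each symbol appears once with its own overhead.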
Real-Time Monitoring with perf top
See which functions consume the most CPU in real time:
# System-wide live CPU profiling
sudo perf top
# Filter to a specific process (pgrep -d, joins multiple worker PIDs with commas)
sudo perf top -p $(pgrep -d, php-fpm)
# Show call graph
sudo perf top -g
Counting Hardware Events
Use perf stat to count specific hardware performance counters:
# Default counter set for a command (typically task-clock, context switches,
# page faults, cycles, instructions, branches, and branch misses)
sudo perf stat ./my-application
# Detailed statistics
sudo perf stat -d ./my-application
# Specific events
sudo perf stat -e cache-misses,cache-references,instructions,cycles ./my-application
Sample output:
Performance counter stats for './my-application':
1,234,567,890 cycles
987,654,321 instructions # 0.80 insn per cycle
5,432,100 cache-references
123,456 cache-misses # 2.27 % of all cache refs
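The two ratios in the sample output are simple divisions: instructions per cycle is instructions / cycles, and the miss rate is cache-misses / cache-references. As a rough sketch, they can be recomputed from the machine-readable output of perf stat -x, (comma-separated, counter value in field 1 and event name in field 3). The helper name summarize_perf_stat is hypothetical, and the sketch assumes plain event names; some systems report names with suffixes or PMU prefixes, which this does not handle.

```shell
#!/bin/sh
# summarize_perf_stat: read `perf stat -x,` CSV on stdin and print
# instructions-per-cycle and the cache miss percentage.
summarize_perf_stat() {
    awk -F, '
        # Only rows with a numeric counter value are counted; comment lines
        # and "<not supported>" rows are skipped.
        $1 ~ /^[0-9]+$/ { count[$3] = $1 }
        END {
            if (count["cycles"] > 0)
                printf "ipc=%.2f\n", count["instructions"] / count["cycles"]
            if (count["cache-references"] > 0)
                printf "cache_miss_pct=%.2f\n", 100 * count["cache-misses"] / count["cache-references"]
        }'
}

# Possible usage (stat.csv is a placeholder file name):
#   sudo perf stat -x, -o stat.csv \
#       -e cycles,instructions,cache-references,cache-misses ./my-application
#   summarize_perf_stat < stat.csv
```

Fed the counts from the sample output above, the helper prints ipc=0.80 and cache_miss_pct=2.27, matching the annotations perf adds itself.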
Generating Flame Graphs
Flame graphs visualize sampled call stacks as stacked bars, which makes hot code paths easy to spot:
# Clone Brendan Gregg's flame graph tools
git clone https://github.com/brendangregg/FlameGraph.git
# Record system-wide call stacks at 99 Hz for 30 seconds
sudo perf record -F 99 -g -a -- sleep 30
# Generate the flame graph
sudo perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > flamegraph.svg
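Open the resulting SVG in a browser. Each bar's width is proportional to the number of samples in which that function appeared on the stack, so wide bars mark the code paths that consume the most CPU time. Click a bar to zoom into its call stack.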
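The intermediate output of stackcollapse-perf.pl is also useful on its own: each line is a semicolon-separated stack (root first, leaf last) followed by a space and a sample count. The sketch below sums those counts per leaf frame to list the functions that were actually running on the CPU most often. The helper name top_leaf_frames is hypothetical.

```shell
#!/bin/sh
# top_leaf_frames [N]: sum sample counts per leaf (innermost) frame from
# folded stacks on stdin and print the N largest, highest first.
# N defaults to 10.
top_leaf_frames() {
    awk '
        {
            count = $NF                     # last field is the sample count
            stack = $0
            sub(/ [0-9]+$/, "", stack)      # drop the trailing count
            n = split(stack, frame, ";")    # frame[n] is the leaf
            leaf[frame[n]] += count
        }
        END { for (name in leaf) print leaf[name], name }
    ' | sort -rn | head -n "${1:-10}"
}

# Possible usage:
#   sudo perf script | ./FlameGraph/stackcollapse-perf.pl | top_leaf_frames 10
```

This gives a quick text ranking when opening an SVG is inconvenient, for example over an SSH session.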
Tracing System Calls
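# Trace system calls for a process, showing only calls that took longer than 10 ms
# (--duration is a latency filter, not a time limit; stop with Ctrl+C)
sudo perf trace -p $(pgrep -d, myapp) --duration 10
# Summarize syscall statistics over 10 seconds
sudo perf trace -s -p $(pgrep -d, myapp) -- sleep 10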
Profiling Specific Scenarios
Common use cases for perf on a Breeze server:
- High CPU from an unknown source: run perf top to identify the hot function immediately
- Slow disk I/O: trace block I/O request events
  sudo perf record -e block:block_rq_issue -a -- sleep 10
- Network latency: trace network stack events
  sudo perf record -e 'net:*' -a -- sleep 10
- Lock contention: record context switches with call graphs to see where threads block (for example, in futex waits)
  sudo perf record -e 'sched:sched_switch' -g -a -- sleep 10
- Memory allocation pressure: count page faults
  sudo perf stat -e page-faults ./my-application
Best Practices
- Use -g for call graphs: without them, you see which functions are hot but not which callers lead to them
- Install debug symbols: install the matching -dbgsym packages (apt install <package>-dbgsym) to get function names instead of hex addresses
- Profile under realistic load: use a load testing tool to generate traffic while profiling
- Keep recordings short: 10-30 seconds is usually sufficient; long recordings produce unwieldy data files
- Compare before and after: record a baseline profile, make your optimization, then record again to measure the improvement
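The before-and-after comparison can be sketched with perf record -o, which writes each recording to its own file, and perf diff, which reports per-symbol changes between two recordings. The helper names (run, record_profile, compare_profiles) and the file names baseline.data and optimized.data are hypothetical. Setting DRY_RUN prints each command instead of executing it, which is handy for reviewing what would run.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# record_profile OUTFILE CMD...: sample CMD at 99 Hz with call graphs,
# writing the recording to OUTFILE instead of the default perf.data.
record_profile() {
    outfile=$1
    shift
    run sudo perf record -F 99 -g -o "$outfile" -- "$@"
}

# compare_profiles OLD NEW: show how each symbol's share changed.
compare_profiles() {
    run sudo perf diff "$1" "$2"
}

# Possible workflow:
#   record_profile baseline.data ./my-application
#   ... apply the optimization and rebuild ...
#   record_profile optimized.data ./my-application
#   compare_profiles baseline.data optimized.data
```

Both recordings should be taken under the same load and for the same duration; otherwise the deltas reflect the workload difference rather than the optimization.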