bpftrace is a high-level tracing language for Linux that uses eBPF to safely trace kernel and user-space programs with low overhead. It fills the gap between simple tools like strace and complex frameworks like SystemTap. This guide covers practical bpftrace programs for debugging performance issues on production servers.
Installing bpftrace
# Ubuntu 22.04+ / Debian
sudo apt install bpftrace
# RHEL/AlmaLinux 9
sudo dnf install bpftrace
# Verify
bpftrace --version
# Prints a version such as "bpftrace v0.20.2"; distro packages can be older and may lack newer features
Essential One-Liners
System Call Tracing
# Count syscalls by process
sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
# Syscall latency histogram
sudo bpftrace -e '
tracepoint:raw_syscalls:sys_enter { @start[tid] = nsecs; }
tracepoint:raw_syscalls:sys_exit /@start[tid]/ {
@ns = hist(nsecs - @start[tid]);
delete(@start[tid]);
}'
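The enter/exit pairing above works because each thread can be inside at most one syscall at a time, so a map keyed by tid holds exactly one pending timestamp per thread. A missing key reads as 0 in bpftrace, which makes the /@start[tid]/ predicate skip exits whose entry happened before tracing started. Here is a minimal Python model of that logic; the event tuples and the "enter"/"exit" labels are made up for illustration:

```python
# Model of the @start[tid] enter/exit pairing (not bpftrace itself).
# Each event is (kind, tid, timestamp_ns).

def pair_latencies(events):
    start = {}       # plays the role of @start[tid]
    latencies = []
    for kind, tid, ts in events:
        if kind == "enter":
            start[tid] = ts                        # @start[tid] = nsecs
        elif kind == "exit" and tid in start:      # the /@start[tid]/ predicate
            latencies.append(ts - start.pop(tid))  # pop() == delete(@start[tid])
    return latencies

events = [
    ("exit", 7, 100),      # thread 7 was mid-syscall when tracing began: skipped
    ("enter", 1, 1_000),
    ("enter", 2, 1_500),
    ("exit", 1, 4_000),
    ("exit", 2, 2_000),
]
print(pair_latencies(events))  # [3000, 500]
```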
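# Slowest observed latency per syscall number (top 10 printed on Ctrl-C)
sudo bpftrace -e '
tracepoint:raw_syscalls:sys_enter { @start[tid] = nsecs; @sc[tid] = args->id; }
tracepoint:raw_syscalls:sys_exit /@start[tid]/ {
@slow[@sc[tid]] = max(nsecs - @start[tid]);
delete(@start[tid]); delete(@sc[tid]);
}
END { print(@slow, 10); clear(@slow); clear(@start); clear(@sc); }'
# Keys are syscall numbers, not addresses; look them up in asm/unistd_64.h on x86_64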
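max() keeps only the single worst latency seen for each key, not an average, and a top-N view then ranks those worst cases. The Python model below uses made-up samples keyed by syscall number (on x86_64, 0 is read and 1 is write); worst_per_key and top_n are our own helper names. bpftrace itself prints map entries sorted by value in ascending order, while this sketch returns the highest first for readability:

```python
import heapq

def worst_per_key(samples):
    """samples: iterable of (syscall_id, latency_ns). Models @slow[id] = max(...)."""
    worst = {}
    for key, ns in samples:
        worst[key] = max(worst.get(key, 0), ns)
    return worst

def top_n(mapping, n):
    """Return the n entries with the largest values, highest first."""
    return heapq.nlargest(n, mapping.items(), key=lambda kv: kv[1])

samples = [(0, 1200), (1, 300), (0, 900), (232, 50_000), (1, 4_000)]
worst = worst_per_key(samples)
print(worst)            # {0: 1200, 1: 4000, 232: 50000}
print(top_n(worst, 2))  # [(232, 50000), (1, 4000)]
```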
Disk I/O Analysis
# I/O latency histogram per device (keys are kernel dev_t numbers)
sudo bpftrace -e '
tracepoint:block:block_rq_issue { @start[args->dev, args->sector] = nsecs; }
tracepoint:block:block_rq_complete /@start[args->dev, args->sector]/ {
@usecs[args->dev] = hist((nsecs - @start[args->dev, args->sector]) / 1000);
delete(@start[args->dev, args->sector]);
}
END { clear(@start); }'
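The block tracepoints expose args->dev as the kernel's internal dev_t, which packs the major number above a 20-bit minor number (MINORBITS in include/linux/kdev_t.h). To match a numeric device key against the MAJ:MIN column of lsblk, it can be decoded as in this sketch; decode_dev is a hypothetical helper name, and the device names in the comments are only typical assignments:

```python
MINORBITS = 20                      # kernel-internal dev_t layout
MINORMASK = (1 << MINORBITS) - 1

def decode_dev(dev: int) -> str:
    """Split a kernel-internal dev_t into "major:minor"."""
    return f"{dev >> MINORBITS}:{dev & MINORMASK}"

print(decode_dev((8 << 20) | 16))  # 8:16   (typically /dev/sdb)
print(decode_dev(259 << 20))       # 259:0  (typically an NVMe namespace)
```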
# I/O size distribution
sudo bpftrace -e '
tracepoint:block:block_rq_issue {
@bytes = hist(args->bytes);
}'
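hist() groups values into power-of-two buckets: 0 and 1 get their own buckets, then [2, 4), [4, 8), and so on, with negative values collected in a separate (..., 0) bucket. The helper below (hist_bucket is our own name, a sketch rather than bpftrace source) reproduces those boundaries for non-negative values, so you can predict where a given request size lands; 5000 bytes, for example, falls in the bucket bpftrace labels [4K, 8K):

```python
def hist_bucket(value: int):
    """Return the half-open [lo, hi) range of the power-of-two bucket
    that bpftrace's hist() would use for a non-negative integer."""
    if value < 0:
        raise ValueError("bpftrace puts negative values in a separate bucket")
    if value == 0:
        return (0, 1)                       # printed as [0]
    lo = 1 << (value.bit_length() - 1)      # largest power of two <= value
    return (lo, lo * 2)                     # value 1 gives (1, 2), printed as [1]

for v in (0, 1, 3, 4096, 5000):
    print(v, hist_bucket(v))
# 0 (0, 1)
# 1 (1, 2)
# 3 (2, 4)
# 4096 (4096, 8192)
# 5000 (4096, 8192)
```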
# read() syscall counts by process (sys_enter_read sees file descriptors, not file names)
sudo bpftrace -e '
tracepoint:syscalls:sys_enter_read {
@reads[comm, pid] = count();
}'
Network Tracing
# Time spent in tcp_v4_connect(), in microseconds (covers sending the SYN, not the full handshake)
sudo bpftrace -e '
kprobe:tcp_v4_connect { @start[tid] = nsecs; }
kretprobe:tcp_v4_connect /@start[tid]/ {
@connect_us = hist((nsecs - @start[tid]) / 1000);
delete(@start[tid]);
}'
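Integer division in bpftrace truncates, so the choice of divisor decides how much resolution survives. A quick check with made-up deltas shows why sub-millisecond timings should be divided down to microseconds rather than milliseconds; to_units is a hypothetical helper, and Python's // matches truncation for these non-negative values:

```python
def to_units(delta_ns: int, divisor: int) -> int:
    # Truncating integer division, as bpftrace does for integer arithmetic
    return delta_ns // divisor

deltas_ns = [45_000, 180_000, 950_000, 2_300_000]   # 45 us .. 2.3 ms (made up)
print([to_units(d, 1_000_000) for d in deltas_ns])  # [0, 0, 0, 2]
print([to_units(d, 1_000) for d in deltas_ns])      # [45, 180, 950, 2300]
```

With the millisecond divisor, three of the four samples collapse into the [0] bucket.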
# Transmitted packets by process name (comm is whatever task is on-CPU at transmit time, not necessarily the sender)
sudo bpftrace -e '
tracepoint:net:net_dev_xmit {
@[comm] = count();
}'
# TCP retransmit count by destination (IPv4; IPv6 addresses are in args->daddr_v6)
sudo bpftrace -e '
tracepoint:tcp:tcp_retransmit_skb {
@retrans[ntop(args->daddr)] = count();
}'
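The tcp:tcp_retransmit_skb tracepoint stores addresses as raw byte arrays in network byte order: daddr is 4 bytes for IPv4 and daddr_v6 is 16 bytes for IPv6. ntop() turns such an array into the familiar text form, much like the C library's inet_ntop(). Python's socket.inet_ntop shows the same conversion; ntop4 and ntop6 are our own wrapper names:

```python
import socket

def ntop4(daddr: bytes) -> str:
    # 4-byte IPv4 address, network byte order
    return socket.inet_ntop(socket.AF_INET, daddr)

def ntop6(daddr_v6: bytes) -> str:
    # 16-byte IPv6 address, network byte order
    return socket.inet_ntop(socket.AF_INET6, daddr_v6)

print(ntop4(bytes([192, 168, 1, 10])))                             # 192.168.1.10
print(ntop6(bytes.fromhex("20010db8000000000000000000000001")))    # 2001:db8::1
```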
Application-Level Tracing
USDT Probes (User-Space)
# List available USDT probes in an application
sudo bpftrace -l 'usdt:/usr/sbin/mysqld:*'
# Slow MySQL queries (>100 ms); needs a mysqld built with DTrace/USDT probes, which MySQL removed in 8.0
sudo bpftrace -e '
usdt:/usr/sbin/mysqld:mysql:query__start {
@start[tid] = nsecs;
@query[tid] = str(arg0);
}
usdt:/usr/sbin/mysqld:mysql:query__done /@start[tid]/ {
$dur = (nsecs - @start[tid]) / 1000000;
if ($dur > 100) {
printf("SLOW %dms: %s\n", $dur, @query[tid]);
}
delete(@start[tid]); delete(@query[tid]);
}'
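The query probes are keyed by tid rather than pid for a reason: mysqld runs queries on many threads of a single process, so a pid-keyed map would let one thread's query__start overwrite another's. The Python model below replays two overlapping queries with made-up ids and timings; pair_by is a hypothetical helper:

```python
def pair_by(events, key_index):
    """events: (kind, pid, tid, ts_ms). key_index 1 keys state by pid, 2 by tid."""
    start, durations = {}, []
    for ev in events:
        kind, key, ts = ev[0], ev[key_index], ev[3]
        if kind == "start":
            start[key] = ts          # a second start on the same key overwrites the first
        elif key in start:
            durations.append(ts - start.pop(key))
    return durations

# Two worker threads (tids 501, 502) of the same mysqld process (pid 500)
events = [
    ("start", 500, 501, 0),
    ("start", 500, 502, 10),
    ("done",  500, 502, 30),
    ("done",  500, 501, 250),
]
print(pair_by(events, 2))  # keyed by tid: [20, 250]
print(pair_by(events, 1))  # keyed by pid: [20], the 250 ms slow query is lost
```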
Tracing PHP-FPM
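# PHP function call counts by function name and file (requires PHP built with --enable-dtrace;
# PHP 7+ also needs USE_ZEND_DTRACE=1 in php-fpm's environment; path shown is the Debian/Ubuntu one)
sudo bpftrace -e '
usdt:/usr/sbin/php-fpm8.3:php:function__entry {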
@calls[str(arg0), str(arg1)] = count();
}'
# PHP compilation events
sudo bpftrace -e '
usdt:/usr/sbin/php-fpm8.3:php:compile__file__entry {
printf("Compiling: %s\n", str(arg0));
}'
Writing bpftrace Scripts
File Open Latency by Process
#!/usr/bin/env bpftrace
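// Save as file-open-latency.bt and run with: sudo bpftrace file-open-latency.bt
// Prints openat() calls that take longer than 1 ms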
tracepoint:syscalls:sys_enter_openat
{
@start[tid] = nsecs;
@fname[tid] = str(args->filename);
}
tracepoint:syscalls:sys_exit_openat
/@start[tid]/
{
$dur_us = (nsecs - @start[tid]) / 1000;
if ($dur_us > 1000) {
printf("%-16s %-6d %8d us %s\n",
comm, pid, $dur_us, @fname[tid]);
}
delete(@start[tid]);
delete(@fname[tid]);
}
END
{
clear(@start);
clear(@fname);
}
Memory Allocation Tracer
#!/usr/bin/env bpftrace
// Count brk/mmap calls per process and sum requested mmap sizes.
// These are virtual-memory requests (including file mappings), not resident memory.
tracepoint:syscalls:sys_enter_brk
{
@brk[comm] = count();
}
tracepoint:syscalls:sys_enter_mmap
{
@mmap[comm] = count();
@mmap_bytes[comm] = sum(args->len);
}
interval:s:10
{
printf("\n--- mmap bytes requested since start (cumulative) ---\n");
print(@mmap_bytes);
}
Production Safety
bpftrace is designed for production use with important safety guarantees:
- Verified programs: the eBPF verifier rejects programs that could crash the kernel, loop forever, or access invalid memory
- Bounded execution: programs have instruction limits and bounded loop counts
- Read-only by default: probes observe the kernel without modifying it; actions such as system() require the --unsafe flag
- Low overhead: cost scales with event frequency, so probes on rare events are nearly free, while probes on every syscall or packet can add measurable CPU load
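A rough way to reason about overhead is events per second multiplied by the cost of each probe firing, divided by total CPU capacity. The sketch below assumes a per-event cost of 100 ns, an illustrative figure rather than a measured bpftrace number, on a hypothetical 16-CPU host; measure on your own hardware before trusting any estimate:

```python
def cpu_overhead(events_per_sec: float, ns_per_event: float, cpus: int) -> float:
    """Fraction of total CPU capacity spent in probe handlers."""
    return events_per_sec * ns_per_event / (cpus * 1e9)

# ASSUMPTION: 100 ns per probe firing, 16 CPUs
print(f"{cpu_overhead(1_000, 100, 16):.6%}")      # 0.000625%  (rare events, e.g. retransmits)
print(f"{cpu_overhead(2_000_000, 100, 16):.2%}")  # 1.25%      (every syscall on a busy host)
```

The same probe cost gives very different totals depending on how often the probe fires, which is why high-frequency tracepoints deserve short run times.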
# Run with timeout for safety
sudo timeout 60 bpftrace -e 'tracepoint:block:block_rq_issue { @[comm] = count(); }'
# Print and reset counts every second instead of once at exit
sudo bpftrace -e '
interval:s:1 { print(@); clear(@); }
tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
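Without clear(@), each interval would print running totals since the program started; with it, each print shows only that interval's counts. A small Python model with made-up process names illustrates the difference; per_interval is a hypothetical helper:

```python
from collections import Counter

def per_interval(ticks, reset=True):
    """ticks: one list of comm names per 1 s interval.
    Returns the snapshot printed at the end of each interval."""
    counts, snapshots = Counter(), []
    for seen in ticks:
        counts.update(seen)              # @[comm] = count()
        snapshots.append(dict(counts))   # print(@)
        if reset:
            counts.clear()               # clear(@)
    return snapshots

ticks = [["nginx", "nginx", "mysqld"], ["nginx"]]
print(per_interval(ticks))               # [{'nginx': 2, 'mysqld': 1}, {'nginx': 1}]
print(per_interval(ticks, reset=False))  # [{'nginx': 2, 'mysqld': 1}, {'nginx': 3, 'mysqld': 1}]
```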
Comparing Tools
| Tool | Best For | Overhead |
|---|---|---|
| strace | Quick debugging, single process | High (ptrace) |
| perf | CPU profiling, PMC analysis | Low-Medium |
| bpftrace | Custom tracing, latency analysis | Low (scales with event rate) |
| SystemTap | Complex scripted tracing | Low |
Summary
bpftrace is the Swiss Army knife of Linux observability. Its concise syntax makes it easy to write one-liners for quick investigations, while its scripting capabilities support complex multi-probe analyses. Start with the one-liners in this guide to answer common questions like "why is I/O slow?", "which process is making the most syscalls?", and "what's causing TCP retransmissions?". As long as you keep very high-frequency probes short-lived, the performance impact on production servers is usually small.