bpftrace Advanced Tracing

By Admin · Mar 15, 2026 · Updated Apr 24, 2026

bpftrace is a high-level tracing language for Linux that uses eBPF to trace kernel and user-space programs safely and with low overhead. It fills the gap between simple tools like strace and complex frameworks like SystemTap. This guide covers practical bpftrace programs for debugging performance issues on production servers.

Installing bpftrace

# Ubuntu/Debian (22.04+)
sudo apt install bpftrace

# RHEL/AlmaLinux 9
sudo dnf install bpftrace

# Verify
bpftrace --version
# bpftrace v0.20+
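
A successful install does not guarantee that tracing will start: bpftrace also needs root privileges and a kernel that exposes the probes you want. The sketch below is one way to check this before running the one-liners in this guide. The function name bpftrace_preflight and its messages are illustrative assumptions, not part of bpftrace.

```shell
# Hypothetical helper (not part of bpftrace): report whether tracing can start.
bpftrace_preflight() {
    # 1. Is the binary on PATH?
    if ! command -v bpftrace >/dev/null 2>&1; then
        echo "bpftrace: not installed"
        return 1
    fi
    # 2. Loading eBPF programs normally requires root.
    if [ "$(id -u)" -ne 0 ]; then
        echo "bpftrace: installed, but run this check as root"
        return 2
    fi
    # 3. Can bpftrace see a tracepoint that is used later in this guide?
    if bpftrace -l 'tracepoint:syscalls:sys_enter_openat' 2>/dev/null | grep -q openat; then
        echo "bpftrace: ready"
    else
        echo "bpftrace: syscall tracepoints not visible"
        return 3
    fi
}

bpftrace_preflight || true   # print the status without aborting the shell
```

A non-zero return code identifies the first check that failed, so the function can also gate a longer script.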

Essential One-Liners

System Call Tracing

# Count syscalls by process
sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'

# Syscall latency histogram
sudo bpftrace -e '
tracepoint:raw_syscalls:sys_enter { @start[tid] = nsecs; }
tracepoint:raw_syscalls:sys_exit /@start[tid]/ {
    @ns = hist(nsecs - @start[tid]);
    delete(@start[tid]);
}'

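# Slowest syscalls: worst-case latency per syscall number (top 10 printed on exit)
sudo bpftrace -e '
tracepoint:raw_syscalls:sys_enter { @start[tid] = nsecs; }
tracepoint:raw_syscalls:sys_exit /@start[tid]/ {
    // args->id is a syscall number, not a kernel address, so ksym() cannot name it
    @slow_ns[args->id] = max(nsecs - @start[tid]);
    delete(@start[tid]);
}
END { clear(@start); print(@slow_ns, 10); clear(@slow_ns); }'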

Disk I/O Analysis

# I/O latency histogram by disk
sudo bpftrace -e '
tracepoint:block:block_rq_issue { @start[args->dev, args->sector] = nsecs; }
tracepoint:block:block_rq_complete /@start[args->dev, args->sector]/ {
    @usecs[args->dev] = hist((nsecs - @start[args->dev, args->sector]) / 1000);
    delete(@start[args->dev, args->sector]);
}'

# I/O size distribution
sudo bpftrace -e '
tracepoint:block:block_rq_issue {
    @bytes = hist(args->bytes);
}'

# Read syscalls by process (sys_enter_read exposes the file descriptor, not the file name)
sudo bpftrace -e '
tracepoint:syscalls:sys_enter_read {
    @reads[comm, pid] = count();
}'

Network Tracing

# Time spent inside tcp_v4_connect() (it returns once the SYN is sent, so this is not the full handshake time)
sudo bpftrace -e '
kprobe:tcp_v4_connect { @start[tid] = nsecs; }
kretprobe:tcp_v4_connect /@start[tid]/ {
    @connect_us = hist((nsecs - @start[tid]) / 1000);
    delete(@start[tid]);
}'

# Transmitted packets by on-CPU task (may be a kernel thread or softirq context rather than the sender)
sudo bpftrace -e '
tracepoint:net:net_dev_xmit {
    @[comm] = count();
}'

# TCP retransmit count by destination
sudo bpftrace -e '
tracepoint:tcp:tcp_retransmit_skb {
    @retrans[ntop(args->daddr)] = count();
}'

Application-Level Tracing

USDT Probes (User-Space)

# List available USDT probes in an application
sudo bpftrace -l 'usdt:/usr/sbin/mysqld:*'

# MySQL query latency (requires a mysqld built with USDT/DTrace probes)
sudo bpftrace -e '
usdt:/usr/sbin/mysqld:mysql:query__start {
    @start[tid] = nsecs;
    @query[tid] = str(arg0);
}
usdt:/usr/sbin/mysqld:mysql:query__done /@start[tid]/ {
    $dur = (nsecs - @start[tid]) / 1000000;
    if ($dur > 100) {
        printf("SLOW %dms: %s\n", $dur, @query[tid]);
    }
    delete(@start[tid]); delete(@query[tid]);
}'

Tracing PHP-FPM

# PHP function call tracing (requires PHP with USDT/dtrace enabled)
sudo bpftrace -e '
usdt:/usr/bin/php-fpm8.3:php:function__entry {
    @calls[str(arg0), str(arg1)] = count();
}'

# PHP compilation events
sudo bpftrace -e '
usdt:/usr/bin/php-fpm8.3:php:compile__file__entry {
    printf("Compiling: %s\n", str(arg0));
}'
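
The USDT examples above only attach if the target binary was built with probes. USDT probes are recorded as ELF notes owned by "stapsdt", so a quick check is possible before any tracing is attempted. This is a minimal sketch that assumes readelf (from binutils) is installed; the helper name has_usdt is hypothetical.

```shell
# Hypothetical helper: does BINARY carry USDT notes?
# USDT probes are stored as ELF notes whose owner is "stapsdt".
has_usdt() {
    if readelf -n "$1" 2>/dev/null | grep -q stapsdt; then
        echo "USDT notes found in $1"
    else
        echo "no USDT notes found in $1"
        return 1
    fi
}

# Example paths taken from this guide; adjust them to your installation.
has_usdt /usr/sbin/mysqld || true
has_usdt /usr/bin/php-fpm8.3 || true
```

A missing file, a missing readelf, and a binary without probes all produce the "no USDT notes" result, so treat it as "cannot trace this path" rather than as proof that the software lacks probe support.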

Writing bpftrace Scripts

File Open Latency by Process

#!/usr/bin/env bpftrace
// Save as file-open-latency.bt

tracepoint:syscalls:sys_enter_openat
{
    @start[tid] = nsecs;
    @fname[tid] = str(args->filename);
}

tracepoint:syscalls:sys_exit_openat
/@start[tid]/
{
    $dur_us = (nsecs - @start[tid]) / 1000;
    if ($dur_us > 1000) {
        printf("%-16s %-6d %8d us  %s\n",
            comm, pid, $dur_us, @fname[tid]);
    }
    delete(@start[tid]);
    delete(@fname[tid]);
}

END
{
    clear(@start);
    clear(@fname);
}
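
On a busy server, tracing every process can produce more output than is useful. bpftrace scripts accept positional parameters ($1, $2, and so on), which allows a variant of the script above that is restricted to one PID. The sketch below writes such a variant to a temporary directory. The file name open-latency-pid.bt and the histogram map name are illustrative assumptions.

```shell
# Write a PID-filtered variant of the open-latency script to a temporary directory.
bt_dir="$(mktemp -d)"
cat > "$bt_dir/open-latency-pid.bt" <<'EOF'
#!/usr/bin/env bpftrace
// Usage: sudo ./open-latency-pid.bt <PID>
// $1 is the first positional parameter passed on the command line.

tracepoint:syscalls:sys_enter_openat
/pid == $1/
{
    @start[tid] = nsecs;
}

tracepoint:syscalls:sys_exit_openat
/@start[tid]/
{
    // Histogram of openat() latency in microseconds for the chosen PID
    @open_us = hist((nsecs - @start[tid]) / 1000);
    delete(@start[tid]);
}

END
{
    clear(@start);
}
EOF
chmod +x "$bt_dir/open-latency-pid.bt"
echo "Run with: sudo $bt_dir/open-latency-pid.bt <PID>"
```

The heredoc delimiter is quoted so that the shell leaves $1 untouched and bpftrace receives it as written.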

Memory Allocation Tracer

#!/usr/bin/env bpftrace
// Track which processes allocate the most memory via brk/mmap

tracepoint:syscalls:sys_enter_brk
{
    @brk[comm] = count();
}

tracepoint:syscalls:sys_enter_mmap
{
    @mmap[comm] = count();
    @mmap_bytes[comm] = sum(args->len);
}

interval:s:10
{
    printf("\n--- Memory Allocation Summary ---\n");
    print(@mmap_bytes);
}

Production Safety

bpftrace is designed for production use with important safety guarantees:

  • Verified programs: eBPF verifier prevents crashes, infinite loops, and invalid memory access
  • Bounded execution: Programs have instruction limits and bounded loop counts
  • No kernel modification: Read-only observation by default
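  • Low overhead: Typically less than 1% CPU for low-frequency probes; cost grows with event rate, so bound high-frequency probes such as raw_syscalls:sys_enter with a filter or a timeout

# Run with timeout for safety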
sudo timeout 60 bpftrace -e 'tracepoint:block:block_rq_issue { @[comm] = count(); }'

# Limit output rate
sudo bpftrace -e '
interval:s:1 { print(@); clear(@); }
tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
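
The time bound can be wrapped in a small function so that every ad hoc trace gets one. The sketch below ends the session with SIGINT, the same signal Ctrl-C sends, and follows up with SIGKILL if the tracer has not exited a few seconds later. The function name trace_for and the BPFTRACE_BIN override are assumptions for illustration, not bpftrace features.

```shell
# Hypothetical wrapper (the name and the BPFTRACE_BIN hook are assumptions).
# Usage: trace_for SECONDS 'bpftrace program'   (run as root)
trace_for() {
    secs="$1"
    prog="$2"
    # -s INT: end the session the way Ctrl-C would.
    # -k 5:   send SIGKILL if the tracer is still alive 5 seconds after that.
    timeout -s INT -k 5 "$secs" "${BPFTRACE_BIN:-bpftrace}" -e "$prog"
}

# Example (not executed here):
# trace_for 30 'tracepoint:block:block_rq_issue { @[comm] = count(); }'
```

When the time limit is reached, timeout exits with status 124, which lets a calling script tell a normal time-bounded run from a bpftrace error.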

Comparing Tools

Tool        Best For                            Overhead
strace      Quick debugging, single process     High (ptrace)
perf        CPU profiling, PMC analysis         Low-Medium
bpftrace    Custom tracing, latency analysis    Very Low
SystemTap   Complex scripted tracing            Low

Summary

bpftrace is the Swiss Army knife of Linux observability. Its concise syntax makes it easy to write one-liners for quick investigations, while its scripting capabilities support complex multi-probe analyses. Start with the one-liners in this guide to answer common questions like "why is I/O slow?", "which process is making the most syscalls?", and "which destinations are seeing TCP retransmissions?", all without meaningful performance impact on your production servers.