Go's compiled nature and efficient garbage collector make it performant out of the box, but production applications still benefit significantly from profiling and optimization. This guide covers Go's built-in profiling tools, common performance patterns, and optimization techniques that can reduce latency and memory usage in production deployments.
Profiling with pprof
Go includes a powerful profiling toolkit that runs with minimal overhead in production:
HTTP Server Profiling
// Import the pprof HTTP handlers for their registration side effect
import (
	"log"
	"net/http"
	_ "net/http/pprof"
)

// In your main function, start a debug server
go func() {
	log.Println(http.ListenAndServe("localhost:6060", nil))
}()
// Now you can access profiles at:
// http://localhost:6060/debug/pprof/
CPU Profiling
# Collect a 30-second CPU profile
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

# Interactive commands in pprof:
(pprof) top 20            # Top 20 CPU consumers
(pprof) list functionName # Line-by-line costs for a specific function
(pprof) web               # Open the call graph in a browser
(pprof) svg > cpu.svg     # Export the call graph as SVG

# For a flame graph, use the pprof web UI instead:
go tool pprof -http=:8080 "http://localhost:6060/debug/pprof/profile?seconds=30"
Memory Profiling
# Heap profile (live, in-use allocations)
go tool pprof http://localhost:6060/debug/pprof/heap

# Allocs profile (all allocations since program start)
go tool pprof http://localhost:6060/debug/pprof/allocs
# In pprof:
(pprof) top 20 -cum # Sort by cumulative allocations
(pprof) list functionName
# Compare two heap profiles to find leaks
go tool pprof -diff_base=before.prof after.prof
Reducing Allocations
Excessive heap allocation is one of the most common performance bottlenecks in Go programs: every allocation adds work for the garbage collector.
sync.Pool for Temporary Objects
var bufPool = sync.Pool{
	New: func() interface{} {
		return new(bytes.Buffer)
	},
}

func handleRequest(w http.ResponseWriter, r *http.Request) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)

	// Reuse buf instead of allocating a new bytes.Buffer per request
	buf.WriteString("response data")
	w.Write(buf.Bytes())
}
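The same pattern works for byte slices, with one subtlety: storing a `*[]byte` in the pool (rather than the slice itself) avoids an extra allocation when the slice header is boxed into the pool's `interface{}`. A minimal sketch (the pool name, buffer size, and `buildGreeting` helper are illustrative, not from the original):

```go
package main

import (
	"fmt"
	"sync"
)

// Pool of *[]byte: storing a pointer avoids allocating a new
// interface box for the slice header on every Put.
var slicePool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, 0, 4096)
		return &b
	},
}

func buildGreeting(name string) string {
	bp := slicePool.Get().(*[]byte)
	b := (*bp)[:0] // reset length, keep capacity
	b = append(b, "hello, "...)
	b = append(b, name...)
	s := string(b) // copy out before returning the buffer
	*bp = b
	slicePool.Put(bp)
	return s
}

func main() {
	fmt.Println(buildGreeting("gopher"))
}
```

Note that the buffer is copied to a string before `Put`, since the pool may hand the same backing array to another goroutine immediately.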
Pre-allocate Slices and Maps
// BAD: grows dynamically, causing multiple allocations
var results []string
for _, item := range items {
	results = append(results, process(item))
}

// GOOD: single allocation up front
results := make([]string, 0, len(items))
for _, item := range items {
	results = append(results, process(item))
}
// Same for maps
m := make(map[string]int, expectedSize)
Avoid String Concatenation in Loops
// BAD: allocates a new string each iteration
var s string
for _, item := range items {
	s += item.Name + "," // O(n²) total copying
}

// GOOD: use strings.Builder
var b strings.Builder
b.Grow(len(items) * 20) // Pre-allocate a size estimate
for _, item := range items {
	b.WriteString(item.Name)
	b.WriteByte(',')
}
s := b.String()
Escape Analysis
# See what escapes to the heap
go build -gcflags="-m" ./...
# Verbose output
go build -gcflags="-m -m" ./...
# Common escape causes:
# - Returning pointers to local variables
# - Sending to channels
# - Storing in interface values
# - Closures capturing variables
# - Slices that grow beyond stack size
// Stack allocated (fast)
func sum(a, b int) int {
	return a + b
}

// Heap allocated (slower: the returned pointer escapes)
func newUser(name string) *User {
	u := User{Name: name} // escapes to heap
	return &u
}
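Another cause from the list above, storing values in interfaces, can be seen in a small sketch (hypothetical names; the exact `-gcflags=-m` diagnostics vary by Go version):

```go
package main

import "fmt"

type User struct{ Name string }

// u stays on the stack: the struct is returned by value.
func makeUser(name string) User {
	return User{Name: name}
}

// n escapes to the heap: it is boxed into the interface{}
// parameters of fmt.Println (reported as "n escapes to heap"
// by go build -gcflags=-m with current compilers).
func report(n int) {
	fmt.Println("count:", n)
}

func main() {
	u := makeUser("alice")
	report(len(u.Name))
}
```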
Goroutine Management
// Limit concurrent goroutines with semaphore pattern
func processItems(items []Item) {
	sem := make(chan struct{}, 100) // Max 100 concurrent
	var wg sync.WaitGroup
	for _, item := range items {
		sem <- struct{}{} // Acquire a slot (blocks while 100 are running)
		wg.Add(1)
		go func(it Item) {
			defer wg.Done()
			defer func() { <-sem }() // Release the slot
			process(it)
		}(item)
	}
	wg.Wait()
}