Optimize Python with cProfile

Python's dynamic nature makes it easy to write slow code without realizing it. The cProfile module, combined with visualization tools and targeted profilers, helps you find and fix performance bottlenecks with precision. This guide covers profiling techniques from basic to advanced, applicable to Django, Flask, FastAPI, and standalone Python applications.

Basic cProfile Usage

cProfile is Python's built-in deterministic profiler. Its overhead is modest but not negligible, which makes it best suited to development and staging; for attaching to live production processes, see the py-spy section later in this guide:

# Profile an entire script
python -m cProfile -s cumulative your_script.py

# Save profile data for later analysis
python -m cProfile -o output.prof your_script.py

# Profile specific code sections
import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()

# ... code to profile ...
result = expensive_function()

profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(20)  # Top 20 functions
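
If you saved profile data with -o, you can load it back later with pstats instead of re-running the program. A minimal sketch, assuming the output.prof filename from the command above:

import pstats

stats = pstats.Stats('output.prof')   # load the saved profile from disk
stats.strip_dirs()                    # shorten long file paths in the report
stats.sort_stats('cumulative')
stats.print_stats(20)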

Reading cProfile Output

# Output columns explained:
# ncalls     — number of calls
# tottime    — total time IN this function (excluding sub-calls)
# percall    — tottime / ncalls
# cumtime    — cumulative time IN this function INCLUDING sub-calls
# percall    — cumtime / ncalls
# filename:lineno(function)

# Focus on functions with high cumtime first
# Then look at tottime to find the actual CPU consumers
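
Library internals can drown out your own code in the report. print_stats accepts restrictions, so you can limit the output to files matching a pattern. A sketch, assuming your code lives under a directory named myapp:

stats = pstats.Stats('output.prof')
stats.sort_stats('cumulative')
stats.print_stats('myapp', 20)   # only rows matching 'myapp', capped at 20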

Visualizing with snakeviz

snakeviz provides an interactive browser-based visualization of cProfile data:

pip install snakeviz

# Generate profile data
python -m cProfile -o profile.prof your_script.py

# Launch interactive viewer
snakeviz profile.prof
# Opens browser at http://localhost:8080

# For remote VPS, bind to all interfaces
snakeviz profile.prof --server --hostname 0.0.0.0 --port 8080
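
You can also produce the .prof file from inside your program rather than via the command line, which is useful when only one code path needs inspection. A sketch using Profile.dump_stats:

import cProfile

profiler = cProfile.Profile()
profiler.enable()
result = expensive_function()          # the code path under investigation
profiler.disable()
profiler.dump_stats('profile.prof')    # then: snakeviz profile.prof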

Line-by-Line Profiling

Once cProfile identifies slow functions, use line_profiler to find the exact slow lines:

pip install line_profiler

# Decorate functions to profile
@profile  # Injected into builtins by kernprof at runtime, so no import is needed
def process_data(data):
    cleaned = [clean(item) for item in data]       # Line 1
    sorted_data = sorted(cleaned, key=lambda x: x['score'])  # Line 2
    results = [transform(item) for item in sorted_data]  # Line 3
    return results

# Run with kernprof
kernprof -l -v your_script.py

# Output shows time per line:
# Line #  Hits   Time    Per Hit  % Time  Line Contents
#   1     1      5.2s    5.2s     65.0%   cleaned = [clean(item)...
#   2     1      0.8s    0.8s     10.0%   sorted_data = sorted...
#   3     1      2.0s    2.0s     25.0%   results = [transform...
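
If you prefer not to run your script through kernprof, line_profiler can also be driven programmatically. A sketch, assuming the process_data function and data from the example above:

from line_profiler import LineProfiler

lp = LineProfiler()
profiled = lp(process_data)   # wrap the function for line-level timing
profiled(data)
lp.print_stats()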

Memory Profiling

pip install memory_profiler

# Decorate functions
import pandas as pd
from memory_profiler import profile

@profile
def load_large_dataset():
    data = pd.read_csv('large_file.csv')  # Shows memory spike
    filtered = data[data['status'] == 'active']
    return filtered.to_dict('records')

# Run
python -m memory_profiler your_script.py

# Track memory over time
mprof run your_script.py
mprof plot  # Generates memory usage graph
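
For a one-off measurement without decorating anything, memory_profiler also exposes a memory_usage helper that samples a callable while it runs. A minimal sketch:

from memory_profiler import memory_usage

# proc is a (function, args, kwargs) tuple; usage is sampled every 0.1 s by default
samples = memory_usage((load_large_dataset, (), {}), interval=0.1)
print(f"peak memory: {max(samples):.1f} MiB")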

Profiling Web Applications

Django Profiling

# django-silk for request profiling
pip install django-silk

# settings.py
INSTALLED_APPS += ['silk']
MIDDLEWARE += ['silk.middleware.SilkyMiddleware']

# urls.py
from django.urls import path, include
urlpatterns += [path('silk/', include('silk.urls'))]

# Access profiling dashboard at /silk/
# Shows per-request timing, SQL queries, and profiling data

# django-debug-toolbar for development
pip install django-debug-toolbar

# Quick cProfile middleware (register it in MIDDLEWARE to use)
import cProfile
import pstats

class ProfileMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if 'profile' in request.GET:
            prof = cProfile.Profile()
            prof.enable()
            response = self.get_response(request)
            prof.disable()
            stats = pstats.Stats(prof)
            stats.sort_stats('cumulative')
            stats.print_stats(30)
            return response
        return self.get_response(request)
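
Printing to the server log works, but it is often handier to read the report in the browser. A variant that returns the stats as the response body (a sketch, assuming the middleware is registered in MIDDLEWARE):

import io
import cProfile
import pstats
from django.http import HttpResponse

class ProfileToResponseMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if 'profile' not in request.GET:
            return self.get_response(request)
        prof = cProfile.Profile()
        prof.enable()
        self.get_response(request)          # run the view, discard its response
        prof.disable()
        stream = io.StringIO()
        pstats.Stats(prof, stream=stream).sort_stats('cumulative').print_stats(30)
        return HttpResponse(stream.getvalue(), content_type='text/plain')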

FastAPI Profiling

from fastapi import FastAPI, Request
import cProfile
import pstats
import io

app = FastAPI()

@app.middleware("http")
async def profile_request(request: Request, call_next):
    if request.query_params.get("profile"):
        profiler = cProfile.Profile()
        profiler.enable()
        response = await call_next(request)
        profiler.disable()
        stream = io.StringIO()
        stats = pstats.Stats(profiler, stream=stream)
        stats.sort_stats("cumulative")
        stats.print_stats(20)
        print(stream.getvalue())
        return response
    return await call_next(request)
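
To trigger it, append ?profile=1 to any request URL; the report is printed to the application's stdout, so watch the server console or logs.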

Common Python Performance Fixes

Use Built-in Functions

# SLOW: Manual loop
total = 0
for item in large_list:
    total += item

# FAST: Built-in sum (implemented in C)
total = sum(large_list)

# SLOW: String concatenation in loop
result = ""
for s in strings:
    result += s  # Creates new string each iteration

# FAST: join
result = "".join(strings)

Generator Expressions for Memory

# SLOW: Creates entire list in memory
squares = [x**2 for x in range(10_000_000)]
total = sum(squares)

# FAST: Generator — no list created
total = sum(x**2 for x in range(10_000_000))
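
The difference is easy to see by inspecting the container sizes: the list holds every element, while the generator holds only its own state. A sketch with a smaller range so it runs quickly:

import sys

squares_list = [x**2 for x in range(1_000_000)]
squares_gen = (x**2 for x in range(1_000_000))

print(sys.getsizeof(squares_list))   # several MB for the list object alone
print(sys.getsizeof(squares_gen))    # a couple of hundred bytes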

Dictionary Lookups over List Searches

# SLOW: O(n) list search
users_list = [{'id': 1, 'name': 'Alice'}, ...]
user = next(u for u in users_list if u['id'] == target_id)  # O(n)

# FAST: O(1) dict lookup
users_dict = {u['id']: u for u in users_list}
user = users_dict[target_id]  # O(1)

Slots for Memory-Heavy Classes

# Regular class: ~200 bytes per instance (uses __dict__)
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

# Slots: ~80 bytes per instance
class Point:
    __slots__ = ['x', 'y']
    def __init__(self, x, y):
        self.x = x
        self.y = y
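
The per-instance figures above are approximate and vary by Python version; tracemalloc from the standard library lets you measure them on your own interpreter. A sketch creating 100,000 instances:

import tracemalloc

tracemalloc.start()
points = [Point(i, i) for i in range(100_000)]
current, _peak = tracemalloc.get_traced_memory()
print(f"{current / 100_000:.0f} bytes per instance (including list overhead)")
tracemalloc.stop()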

Async Profiling

pip install yappi

import asyncio
import yappi

yappi.set_clock_type("wall")  # Wall time for async
yappi.start()

# Run your async code
asyncio.run(main())

yappi.stop()
stats = yappi.get_func_stats()
stats.sort("ttot", "desc")
stats.print_all(columns={
    0: ("name", 60),
    1: ("ncall", 10),
    2: ("ttot", 8),
    3: ("tsub", 8),
})

Continuous Profiling in Production

# py-spy: a low-overhead sampling profiler that attaches to a running process
# without restarting or modifying it, making it safe for production use
pip install py-spy

# Attach to running process
sudo py-spy top --pid 12345

# Generate flame graph
sudo py-spy record -o profile.svg --pid 12345 --duration 60

# Record with native extensions visible
sudo py-spy record -o profile.svg --pid 12345 --native
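
py-spy also has a dump subcommand that prints a one-shot snapshot of every thread's current stack, which is handy for diagnosing a hung process:

sudo py-spy dump --pid 12345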

Summary

Python profiling follows a funnel: start with cProfile to identify slow functions, drill down with line_profiler for the exact bottleneck lines, and use memory_profiler for memory issues. For web applications, use framework-specific tools like django-silk or middleware-based profiling. In production, py-spy provides low-overhead flame graphs without touching the running process. Always profile before optimizing: intuition about performance is frequently wrong, and data-driven optimization prevents wasting time on code that is not actually the bottleneck.
