OpenTelemetry (OTel) is the industry standard for application observability, providing a unified framework for collecting metrics, traces, and logs from your applications. Unlike vendor-specific APM tools, OTel is open source and works with any backend (Grafana, Datadog, New Relic, Jaeger). This guide covers implementing OTel for comprehensive application performance monitoring.
## OpenTelemetry Architecture

A typical deployment has three layers:

- SDK — instruments your application code to emit telemetry
- Collector — receives, processes, and exports telemetry data
- Backend — stores and visualizes data (Grafana, Jaeger, etc.)
## OTel Collector Setup

```yaml
# docker-compose.yml
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.96.0
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "8889:8889"   # Prometheus metrics
```
```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:
    check_interval: 1s   # required; how often memory usage is measured
    limit_mib: 512
    spike_limit_mib: 128
  batch:
    timeout: 5s
    send_batch_size: 1024

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp]
      # memory_limiter must be first so it can refuse data before any buffering
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [loki]
```
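While wiring this up, the collector's `debug` exporter is useful for confirming that telemetry is actually arriving. A sketch, assuming you temporarily add it alongside the real trace exporter and watch the collector's stdout:

```yaml
# Sketch: temporary debug exporter for troubleshooting the pipeline
exporters:
  debug:
    verbosity: detailed   # print every received span to collector stdout

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger, debug]   # remove debug once verified
```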
## Auto-Instrumentation

### Node.js

```bash
# Zero-code instrumentation
npm install @opentelemetry/auto-instrumentations-node

# Start with auto-instrumentation
OTEL_SERVICE_NAME=my-api \
OTEL_EXPORTER_OTLP_ENDPOINT=http://collector:4318 \
node --require @opentelemetry/auto-instrumentations-node/register app.js

# Automatically instruments: HTTP, Express, MySQL, PostgreSQL, Redis, MongoDB, gRPC
```
### Python

```bash
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install   # Auto-detect and install instrumentations

OTEL_SERVICE_NAME=my-api \
OTEL_EXPORTER_OTLP_ENDPOINT=http://collector:4318 \
opentelemetry-instrument python app.py

# Automatically instruments: Flask, Django, FastAPI, SQLAlchemy, Redis, requests
```
### Java

```bash
# Download the Java agent
wget https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar

# Attach to JVM
java -javaagent:opentelemetry-javaagent.jar \
  -Dotel.service.name=my-api \
  -Dotel.exporter.otlp.endpoint=http://collector:4318 \
  -jar myapp.jar
```
## Key Metrics to Monitor (RED Method)

The exact metric and label names depend on your SDK's semantic-convention version; the queries below assume the metrics shown.

```promql
# Rate — requests per second
rate(http_server_request_duration_seconds_count[5m])

# Errors — error percentage
sum(rate(http_server_request_duration_seconds_count{http_status_code=~"5.."}[5m]))
  / sum(rate(http_server_request_duration_seconds_count[5m])) * 100

# Duration — latency percentiles
histogram_quantile(0.99, sum by (le) (rate(http_server_request_duration_seconds_bucket[5m])))
histogram_quantile(0.95, sum by (le) (rate(http_server_request_duration_seconds_bucket[5m])))
histogram_quantile(0.50, sum by (le) (rate(http_server_request_duration_seconds_bucket[5m])))
```
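`histogram_quantile` works on cumulative bucket counters: it finds the bucket containing the target rank and interpolates linearly within it. A simplified sketch of that interpolation in plain JavaScript (Prometheus's real implementation also handles edge cases such as empty histograms and non-zero lowest bounds):

```javascript
// Simplified histogram_quantile. Buckets are cumulative counts at ascending
// upper bounds (the "le" label), mirroring a Prometheus histogram series.
function histogramQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count; // +Inf bucket holds everything
  const rank = q * total;
  let prevBound = 0;
  let prevCount = 0;
  for (const { le, count } of buckets) {
    if (count >= rank) {
      if (le === Infinity) return prevBound; // cannot interpolate past last finite bound
      // Linear interpolation inside the bucket containing the rank
      return prevBound + (le - prevBound) * ((rank - prevCount) / (count - prevCount));
    }
    prevBound = le;
    prevCount = count;
  }
  return NaN;
}

// Example: request latencies bucketed at 0.1s, 0.5s, 1s, +Inf
const buckets = [
  { le: 0.1, count: 50 },
  { le: 0.5, count: 90 },
  { le: 1.0, count: 99 },
  { le: Infinity, count: 100 },
];
console.log(histogramQuantile(0.5, buckets)); // → 0.1 (p50 falls in the first bucket)
```

This is also why accuracy depends on bucket boundaries: a p99 that lands in a wide bucket is interpolated, not measured.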
## Baggage and Context Propagation

```javascript
// Propagate context across service boundaries.
// OTel automatically propagates trace context via HTTP headers:
//   traceparent: 00-<trace-id>-<parent-span-id>-<trace-flags>
//   tracestate:  vendor=value

// Add custom baggage for cross-service context
const { propagation, context } = require('@opentelemetry/api');

const baggage = propagation.createBaggage({
  'user.id': { value: '12345' },
  'tenant.id': { value: 'acme' },
});
const ctx = propagation.setBaggage(context.active(), baggage);
```
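On the wire, the trace context above is just the `traceparent` header. A minimal parser for the W3C Trace Context format (`version-traceid-parentid-traceflags`), handy when debugging propagation between services; the sample header value is the one from the W3C spec:

```javascript
// Parse a W3C traceparent header: version-traceid-parentid-traceflags
function parseTraceparent(header) {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null; // malformed header
  const [, version, traceId, parentId, flags] = m;
  return {
    version,
    traceId,
    parentId,
    sampled: (parseInt(flags, 16) & 0x01) === 1, // bit 0 = sampled flag
  };
}

const ctx2 = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
console.log(ctx2.traceId);  // 4bf92f3577b34da6a3ce929d0e0e4736
console.log(ctx2.sampled);  // true
```

If a downstream service sees no `traceparent` at all, the culprit is usually a proxy or HTTP client that strips unknown headers.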
## Best Practices

- Start with auto-instrumentation — it covers HTTP, databases, and caches with zero code changes
- Use the OTel Collector as a gateway — it decouples applications from backends
- Implement sampling (head-based in the SDK, or tail-based in the Collector) in production to control data volume
- Add resource attributes (service.name, deployment.environment, service.version) for filtering
- Use the RED method (Rate, Errors, Duration) for service-level monitoring
- Export to open backends (Prometheus + Grafana) to avoid vendor lock-in
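Two of these practices (sampling and resource attributes) can be configured entirely through the standard SDK environment variables, with no code changes. A sketch, assuming a 10% sample rate and a placeholder version number:

```bash
# Head-based sampling: keep 10% of new traces, respecting the parent's decision
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1

# Resource attributes for filtering in the backend (version is a placeholder)
export OTEL_RESOURCE_ATTRIBUTES="service.name=my-api,deployment.environment=production,service.version=1.2.3"
```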