Jaeger is an open-source distributed tracing platform that helps you monitor and troubleshoot microservice architectures. It tracks requests as they flow through multiple services, showing exactly where latency occurs and how services interact. This guide covers deploying Jaeger and instrumenting applications for distributed tracing.
Installation
# All-in-one (development/testing)
docker run -d --name jaeger \
-p 6831:6831/udp \
-p 16686:16686 \
-p 14268:14268 \
-p 4317:4317 \
-p 4318:4318 \
jaegertracing/all-in-one:1.55
# Access Jaeger UI at http://localhost:16686
# Ports:
# 6831/udp — Jaeger agent (Thrift compact)
# 14268 — Jaeger collector HTTP
# 4317 — OTLP gRPC
# 4318 — OTLP HTTP
# 16686 — Jaeger UI
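Once the container is up, you can check that the OTLP/HTTP port accepts data by posting a single hand-built span to it and then looking for that span in the UI. The sketch below uses only the Python standard library. It assumes the receiver on port 4318 accepts the JSON encoding of OTLP, which the OTLP/HTTP specification defines alongside protobuf. The service, span, and scope names are arbitrary placeholders.

```python
import json
import secrets
import time
import urllib.request

def build_otlp_payload(service_name: str, span_name: str, duration_ms: int = 50) -> dict:
    """Build a minimal OTLP/JSON export request containing a single span."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    span = {
        "traceId": secrets.token_hex(16),  # 16 random bytes -> 32 hex characters
        "spanId": secrets.token_hex(8),    # 8 random bytes -> 16 hex characters
        "name": span_name,
        "kind": 2,                          # SPAN_KIND_SERVER
        "startTimeUnixNano": str(start_ns),
        "endTimeUnixNano": str(end_ns),
    }
    return {
        "resourceSpans": [{
            # service.name is the resource attribute Jaeger shows as the service
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{"scope": {"name": "otlp-smoke-test"}, "spans": [span]}],
        }]
    }

def send(payload: dict, url: str = "http://localhost:4318/v1/traces") -> int:
    """POST the payload to an OTLP/HTTP endpoint and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Against a local all-in-one, `send(build_otlp_payload("smoke-test-service", "ping"))` should return a 2xx status. After that, `smoke-test-service` should appear in the service list of the Search view. The SDKs shown later build and batch these payloads for you, so treat this only as a connectivity check.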
Production Deployment
# docker-compose.yml
services:
  jaeger-collector:
    image: jaegertracing/jaeger-collector:1.55
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch:9200
    ports:
      - "14268:14268"
      - "4317:4317"
      - "4318:4318"
    # depends_on only orders startup; restart covers the window before Elasticsearch is ready
    restart: on-failure
    depends_on:
      - elasticsearch
  jaeger-query:
    image: jaegertracing/jaeger-query:1.55
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - ES_SERVER_URLS=http://elasticsearch:9200
    ports:
      - "16686:16686"
    restart: on-failure
    depends_on:
      - elasticsearch
  elasticsearch:
    image: elasticsearch:8.12.0
    environment:
      - discovery.type=single-node
      # Disables authentication and TLS; keep Elasticsearch on a private network
      - xpack.security.enabled=false
    volumes:
      - es_data:/usr/share/elasticsearch/data
volumes:
  es_data:
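Storage now lives in Elasticsearch, so a quick end-to-end check is to ask jaeger-query which services it can see. This sketch assumes the `/api/services` JSON endpoint on port 16686, which the Jaeger UI itself calls. It also assumes the response looks like `{"data": ["service-a", ...]}`. That endpoint is an internal UI API rather than a documented stable interface, so treat the response shape as an assumption.

```python
import json
import urllib.request

def parse_services(body: bytes) -> list:
    """Return the sorted service names from a /api/services response body."""
    # "data" is assumed to hold a list of names; treat null or missing as empty
    return sorted(json.loads(body).get("data") or [])

def list_services(base_url: str = "http://localhost:16686") -> list:
    """Ask jaeger-query which services have spans in storage."""
    with urllib.request.urlopen(f"{base_url}/api/services", timeout=5) as response:
        return parse_services(response.read())
```

Suppose `list_services()` returns the names of services that sent spans through jaeger-collector. That means the collector and the query service are reading and writing the same Elasticsearch cluster.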
Instrumenting Applications
Node.js with OpenTelemetry
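// npm install @opentelemetry/api @opentelemetry/sdk-node @opentelemetry/exporter-trace-otlp-http @opentelemetry/instrumentation-http @opentelemetry/instrumentation-express
// tracing.js: load before any other module (e.g. `node --require ./tracing.js app.js`)
// so the instrumentations can patch http and express as they are loaded
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http');
const { ExpressInstrumentation } = require('@opentelemetry/instrumentation-express');
const sdk = new NodeSDK({
  serviceName: 'api-service',
  // Send spans to Jaeger's OTLP/HTTP receiver on port 4318
  traceExporter: new OTLPTraceExporter({
    url: 'http://jaeger:4318/v1/traces',
  }),
  // The Express instrumentation requires the HTTP instrumentation to be enabled as well
  instrumentations: [
    new HttpInstrumentation(),
    new ExpressInstrumentation(),
  ],
});
sdk.start();
// Flush buffered spans before the process exits
process.on('SIGTERM', () => {
  sdk.shutdown()
    .catch((err) => console.error('Error shutting down tracing', err))
    .finally(() => process.exit(0));
});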
Python with OpenTelemetry
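# pip install flask opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp opentelemetry-instrumentation-flask
from flask import Flask
from opentelemetry import trace
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.flask import FlaskInstrumentor
# Name the service (example name); without it, Jaeger lists the spans under "unknown_service"
provider = TracerProvider(resource=Resource.create({SERVICE_NAME: "flask-service"}))
# Buffer finished spans and export them in batches over OTLP/HTTP to port 4318
processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://jaeger:4318/v1/traces"))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
app = Flask(__name__)
# Create a server span for every incoming request handled by this app
FlaskInstrumentor().instrument_app(app)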
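The HTTP and Flask instrumentations are what join spans from different services into a single trace. By default, the OpenTelemetry SDKs propagate context in the W3C Trace Context `traceparent` header. They inject it on outgoing requests and extract it from incoming ones. To make that mechanism concrete, here is a standard-library sketch that parses the header and derives the header for a downstream call. The function and type names are hypothetical, and the sketch handles only the four-field layout defined for version `00`.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version-traceid-parentid-flags, lowercase hex only
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

class SpanContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool

def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Parse a traceparent header; return None if it is malformed or invalid."""
    match = _TRACEPARENT.match(header.strip())
    if not match:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff is forbidden, and all-zero IDs are invalid
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))

def child_traceparent(parent: SpanContext) -> str:
    """Header for an outgoing call: same trace ID, same sampled flag, and a new
    span ID standing in for the client span that makes the call."""
    flags = "01" if parent.sampled else "00"
    return f"00-{parent.trace_id}-{secrets.token_hex(8)}-{flags}"
```

In a real service you never call these functions by hand, because the instrumentation does the equivalent on every request. That is why a Node.js API calling a Flask service appears in Jaeger as one trace.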
Custom Spans
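// Node.js: add custom spans for important operations
const { trace, SpanStatusCode } = require('@opentelemetry/api');
const tracer = trace.getTracer('my-service');
async function processOrder(orderId) {
  // startActiveSpan returns whatever the callback returns (here, a Promise)
  return tracer.startActiveSpan('process-order', async (span) => {
    span.setAttribute('order.id', orderId);
    try {
      // Nested span for the database query; it becomes a child of process-order
      const order = await tracer.startActiveSpan('db-query', async (dbSpan) => {
        try {
          dbSpan.setAttribute('db.system', 'postgresql');
          // `db` is assumed to be an existing PostgreSQL client (e.g. a pg Pool)
          return await db.query('SELECT * FROM orders WHERE id = $1', [orderId]);
        } finally {
          dbSpan.end(); // always end the span, even if the query throws
        }
      });
      span.setStatus({ code: SpanStatusCode.OK });
      return order;
    } catch (err) {
      // Mark the span as failed in Jaeger and attach the exception
      span.recordException(err);
      span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
      throw err;
    } finally {
      span.end();
    }
  });
}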
Analyzing Traces in Jaeger UI
- Search — find traces by service, operation, tags, or duration
- Timeline view — see the full request flow across services with timing
- Compare — compare two traces to identify performance differences
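- Dependencies — visualize the service dependency graph (with Elasticsearch or Cassandra storage, the graph is built by the separate spark-dependencies job)
- Monitor — track RED metrics (Rate, Errors, Duration) per service; this tab needs Service Performance Monitoring set up with a metrics backend such as Prometheus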
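A span's bar in the Timeline view includes the time spent in its children, so the longest bar does not necessarily show where the time actually goes. Subtracting the time covered by child spans gives each span's self time. The sketch below computes self time for one trace in Jaeger's JSON trace model. It assumes each span carries `spanID`, `operationName`, `startTime` and `duration` (both in microseconds), and `CHILD_OF` entries in `references`. That is the shape of one element of the `data` array returned by the query service's `/api/traces/<traceID>` endpoint, which, like `/api/services`, is an internal API. The sketch merges overlapping children (parallel calls) so that shared time is not subtracted twice.

```python
from collections import defaultdict

def self_times(trace: dict) -> dict:
    """Return {spanID: self time in microseconds} for one Jaeger JSON trace."""
    spans = {s["spanID"]: s for s in trace["spans"]}
    children = defaultdict(list)
    for s in trace["spans"]:
        for ref in s.get("references", []):
            if ref.get("refType") == "CHILD_OF" and ref["spanID"] in spans:
                children[ref["spanID"]].append(s)
    result = {}
    for span_id, s in spans.items():
        start, end = s["startTime"], s["startTime"] + s["duration"]
        # Clip each child's interval to the parent, then merge overlapping intervals
        intervals = sorted(
            (max(c["startTime"], start), min(c["startTime"] + c["duration"], end))
            for c in children[span_id]
        )
        covered, cur_start, cur_end = 0, None, None
        for a, b in intervals:
            if a >= b:
                continue  # child lies entirely outside the parent
            if cur_end is None or a > cur_end:
                if cur_end is not None:
                    covered += cur_end - cur_start
                cur_start, cur_end = a, b
            else:
                cur_end = max(cur_end, b)
        if cur_end is not None:
            covered += cur_end - cur_start
        result[span_id] = s["duration"] - covered
    return result

def slowest_operations(trace: dict, top: int = 3) -> list:
    """(operationName, self time) pairs, largest self time first."""
    times = self_times(trace)
    names = {s["spanID"]: s["operationName"] for s in trace["spans"]}
    return sorted(((names[i], t) for i, t in times.items()), key=lambda p: -p[1])[:top]
```

Ranking by self time rather than total duration usually points at the span doing the slow work, not at its callers.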
Best Practices
- Use the OpenTelemetry SDK for instrumentation; it is vendor-neutral and can export to any OTLP-compatible backend, not just Jaeger
- Rely on auto-instrumentation libraries for HTTP clients/servers, database drivers, and message queue consumers instead of writing those spans by hand
- Add custom spans only for business-critical operations — too many spans add overhead
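- Use sampling in production to reduce data volume (for example, keep 1-10% of traces), and decide per trace so that every service keeps or drops the same request together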
- Add meaningful attributes to spans (user_id, order_id, error details)
- Use Elasticsearch or Cassandra for production storage; the all-in-one image's default in-memory storage is lost whenever the container restarts
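Ratio sampling only works across services if every service keeps or drops the same trace. A common way to achieve that is to derive the decision from the trace ID itself. The sketch below keeps a trace when the low 64 bits of its ID fall below `ratio * 2**64`. It is similar in spirit to the OpenTelemetry SDKs' `TraceIdRatioBased` sampler but is not a copy of any SDK's implementation. In practice you would configure the built-in sampler rather than write your own, for example with `OTEL_TRACES_SAMPLER=parentbased_traceidratio` and `OTEL_TRACES_SAMPLER_ARG=0.05`.

```python
TRACE_ID_LIMIT = (1 << 64) - 1  # mask for the low 64 bits of a 128-bit trace ID

def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Keep a trace iff the low 64 bits of its ID fall below ratio * 2**64.

    The verdict depends only on the trace ID and the ratio, so every service
    configured with the same ratio agrees on which traces to keep.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (int(trace_id_hex, 16) & TRACE_ID_LIMIT) < bound
```

With `ratio=0.05`, roughly one trace in twenty is kept. In the parent-based variant, only the root service makes this decision. Downstream services follow the sampled flag carried in the incoming `traceparent` header.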