AI-Powered Monitoring
Traditional monitoring relies on static thresholds, which miss subtle anomalies and generate false alerts during normal variations. ML-based anomaly detection learns your system's normal behavior and alerts only on genuine deviations from it.
Using Prophet for Time Series Anomaly Detection
pip install prophet pandas
from prophet import Prophet
import pandas as pd

# Load historical values for one metric (e.g. CPU, memory, or request rate)
df = pd.read_csv("metrics.csv")
df.columns = ["ds", "y"]  # Prophet requires ds (timestamp) and y (value)
df["ds"] = pd.to_datetime(df["ds"])
# Sort by time so the rows line up with the forecast returned by predict()
df = df.sort_values("ds").reset_index(drop=True)

# Train model on historical data
model = Prophet(
    changepoint_prior_scale=0.05,
    interval_width=0.99,  # 99% uncertainty interval
)
model.fit(df)

# Predict and detect anomalies: flag points outside the predicted interval
forecast = model.predict(df)
df["yhat_upper"] = forecast["yhat_upper"]
df["yhat_lower"] = forecast["yhat_lower"]
df["anomaly"] = (df["y"] > df["yhat_upper"]) | (df["y"] < df["yhat_lower"])

anomalies = df[df["anomaly"]]
if not anomalies.empty:
    print(f"Found {len(anomalies)} anomalies!")
    print(anomalies)
Real-Time Monitoring Script
#!/usr/bin/env python3
import time
from collections import deque

import numpy as np
import psutil


class AnomalyDetector:
    def __init__(self, window=60):
        self.history = deque(maxlen=window)

    def check(self, value):
        # Score the value against earlier samples only, so an outlier cannot
        # mask itself by inflating the mean and standard deviation
        is_anomaly = False
        if len(self.history) >= 10:
            mean = np.mean(self.history)
            std = np.std(self.history)
            z_score = abs(value - mean) / (std + 1e-10)
            is_anomaly = z_score > 3  # Alert if more than 3 standard deviations
        self.history.append(value)
        return is_anomaly


cpu_detector = AnomalyDetector()
mem_detector = AnomalyDetector()

while True:
    cpu = psutil.cpu_percent(interval=5)  # Blocks for 5 s while measuring
    mem = psutil.virtual_memory().percent
    if cpu_detector.check(cpu):
        print(f"CPU ANOMALY: {cpu}%")
    if mem_detector.check(mem):
        print(f"MEMORY ANOMALY: {mem}%")
    time.sleep(5)  # With the measurement above: one sample roughly every 10 s
Integration with Prometheus
# Query Prometheus for metric history
# Feed the history into the anomaly detection model
# Alert via Alertmanager when anomalies are detected
# Use PromQL to export training data, for example:
#   rate(http_requests_total[5m])
#   rate(node_cpu_seconds_total[5m])   # a counter, so wrap it in rate()
#   node_memory_MemAvailable_bytes
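The outline above can be sketched with the standard library alone. This is a minimal sketch, not a definitive integration: it assumes a Prometheus server at http://localhost:9090 (adjust to your setup), uses the Prometheus HTTP API endpoint /api/v1/query_range, and reads only the first series in the response. The helper names and the alert labels (MetricAnomaly, severity) are hypothetical, chosen for illustration.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS_URL = "http://localhost:9090"  # assumption: adjust to your server


def build_range_query_url(query, start, end, step="5m", base_url=PROMETHEUS_URL):
    """Build a /api/v1/query_range URL for a PromQL expression.

    start and end should be timezone-aware datetimes.
    """
    params = urlencode({
        "query": query,
        "start": start.timestamp(),
        "end": end.timestamp(),
        "step": step,
    })
    return f"{base_url}/api/v1/query_range?{params}"


def parse_matrix(payload):
    """Turn a range-query response into (ds, y) rows for the first series."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {payload.get('error')}")
    result = payload["data"]["result"]
    if not result:
        return []
    # Each sample is [unix_timestamp, "value-as-string"].
    # Timestamps are returned as naive UTC because Prophet rejects
    # timezone-aware values in the ds column.
    return [
        (datetime.fromtimestamp(float(ts), tz=timezone.utc).replace(tzinfo=None),
         float(value))
        for ts, value in result[0]["values"]
    ]


def fetch_history(query, start, end, step="5m"):
    """Fetch one series from Prometheus (performs a network request)."""
    with urlopen(build_range_query_url(query, start, end, step)) as resp:
        return parse_matrix(json.load(resp))


def build_alert(metric, value, when):
    """Build one alert object in the shape Alertmanager's v2 API accepts.

    The label and annotation values are hypothetical examples.
    when should be a timezone-aware datetime.
    """
    return {
        "labels": {"alertname": "MetricAnomaly", "metric": metric,
                   "severity": "warning"},
        "annotations": {"summary": f"{metric} anomaly: observed {value}"},
        "startsAt": when.astimezone(timezone.utc).isoformat(),
    }
```

The rows returned by fetch_history can be loaded with pd.DataFrame(rows, columns=["ds", "y"]) and passed to the Prophet example. A 5-minute step keeps two weeks of history at about 4,000 points, well under the per-series point limit that Prometheus enforces on range queries. To raise an alert, POST a JSON list of build_alert objects to Alertmanager's /api/v2/alerts endpoint.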
Best Practices
- Train models on at least 2 weeks of historical data
- Account for daily and weekly seasonality patterns
- Use separate models for different metric types
- Combine AI detection with traditional threshold alerts
- Regularly retrain models as your system's behavior evolves
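As one way to combine learned detection with traditional threshold alerts, the sketch below pairs a fixed ceiling with a rolling z-score and reports which rule fired. The class name HybridDetector and its parameters (hard_limit, min_samples, z_limit) are hypothetical, and the rolling z-score stands in for whatever model you actually use; treat it as an illustration under those assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class HybridDetector:
    """Static ceiling plus rolling z-score; either rule can raise an alert."""

    def __init__(self, hard_limit, window=60, min_samples=10, z_limit=3.0):
        self.hard_limit = hard_limit
        self.min_samples = min_samples
        self.z_limit = z_limit
        self.history = deque(maxlen=window)

    def check(self, value):
        """Return the list of rules that fired for this value (may be empty)."""
        reasons = []
        # Traditional rule: always active, even before any history exists
        if value >= self.hard_limit:
            reasons.append("threshold")
        # Learned rule: needs enough earlier samples to form a baseline
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            if sigma > 0 and abs(value - mu) / sigma > self.z_limit:
                reasons.append("zscore")
        self.history.append(value)
        return reasons


detector = HybridDetector(hard_limit=90)
for sample in [40, 42] * 10:
    detector.check(sample)      # builds the baseline, no alerts
print(detector.check(60))       # far from the baseline, below the ceiling
```

The static rule covers the warm-up period and catches absolute limits that matter regardless of history, while the z-score rule catches deviations that stay below the ceiling. Note that an accepted outlier enters the window and widens the baseline for later samples.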