
AI Monitoring and Anomaly Detection

By Admin · Mar 15, 2026 · Updated Apr 23, 2026

AI-Powered Monitoring

Traditional monitoring relies on static thresholds, which miss subtle anomalies and fire false alerts during normal variation such as daily traffic peaks. ML-based anomaly detection instead learns your system's normal behavior from historical data and alerts when a metric deviates from what the model expects.

Using Prophet for Time Series Anomaly Detection

pip install prophet pandas

from prophet import Prophet
import pandas as pd

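# Load historical metrics (CPU, memory, request rate)
# Assumes metrics.csv has exactly two columns: timestamp, then value
df = pd.read_csv("metrics.csv")
df.columns = ["ds", "y"]  # Prophet requires ds (datestamp) and y (value)
df["ds"] = pd.to_datetime(df["ds"])
df = df.sort_values("ds").reset_index(drop=True)  # keep rows aligned with the forecast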

# Train model on historical data
model = Prophet(
    changepoint_prior_scale=0.05,
    interval_width=0.99  # 99% uncertainty interval around the forecast
)
model.fit(df)

# Predict and detect anomalies
forecast = model.predict(df)
df["yhat_upper"] = forecast["yhat_upper"]
df["yhat_lower"] = forecast["yhat_lower"]
df["anomaly"] = (df["y"] > df["yhat_upper"]) | (df["y"] < df["yhat_lower"])

anomalies = df[df["anomaly"]]
if not anomalies.empty:
    print(f"Found {len(anomalies)} anomalies!")
    print(anomalies)
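
Once the model is fitted, the same interval check can be applied to new observations. The sketch below is one way to do that, assuming the forecast comes from model.predict(...) and so carries ds, yhat_lower, and yhat_upper columns. The helper name flag_anomalies and the deviation column are illustrative and not part of Prophet. It joins on ds rather than on row position, so the result does not depend on row order.

```python
import pandas as pd


def flag_anomalies(actual: pd.DataFrame, forecast: pd.DataFrame) -> pd.DataFrame:
    """Join observed values to forecast bounds on ds and flag points outside them.

    `actual` needs columns ds and y; `forecast` needs ds, yhat_lower, yhat_upper.
    """
    bounds = forecast[["ds", "yhat_lower", "yhat_upper"]]
    merged = actual.merge(bounds, on="ds", how="inner")
    above = merged["y"] > merged["yhat_upper"]
    below = merged["y"] < merged["yhat_lower"]
    merged["anomaly"] = above | below
    # Distance outside the interval (0 when inside), useful for ranking alerts
    merged["deviation"] = (
        (merged["y"] - merged["yhat_upper"]).clip(lower=0)
        + (merged["yhat_lower"] - merged["y"]).clip(lower=0)
    )
    return merged
```

With the fitted model from above, a call might look like flag_anomalies(new_df, model.predict(new_df[["ds"]])), where new_df is a hypothetical DataFrame of fresh samples with ds and y columns.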

Real-Time Monitoring Script

#!/usr/bin/env python3
import time

import numpy as np
import psutil
from collections import deque

class AnomalyDetector:
    def __init__(self, window=60):
        self.history = deque(maxlen=window)

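    def check(self, value):
        # Score the new value against the previous samples only, so an
        # outlier cannot inflate its own baseline mean and deviation
        baseline = list(self.history)
        self.history.append(value)
        if len(baseline) < 10:
            return False  # not enough history yet

        mean = np.mean(baseline)
        std = np.std(baseline)
        z_score = abs(value - mean) / (std + 1e-10)
        return z_score > 3  # Alert if more than 3 standard deviations from the mean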

cpu_detector = AnomalyDetector()
mem_detector = AnomalyDetector()

while True:
    cpu = psutil.cpu_percent(interval=5)
    mem = psutil.virtual_memory().percent

    if cpu_detector.check(cpu):
        print(f"CPU ANOMALY: {cpu}%")
    if mem_detector.check(mem):
        print(f"MEMORY ANOMALY: {mem}%")

    time.sleep(5)  # cpu_percent(interval=5) already blocks for 5 s, so readings arrive about every 10 s
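
To try the rolling z-score logic without waiting on live samples, it can be replayed over a recorded or synthetic series. The sketch below is a standard-library variant of the detector above: it uses statistics instead of NumPy and scores each value against earlier samples only. The class name RollingZScore, the replay helper, and the synthetic readings are illustrative assumptions.

```python
from collections import deque
from statistics import fmean, pstdev


class RollingZScore:
    """Rolling z-score check over the last `window` samples."""

    def __init__(self, window=60, min_samples=10, threshold=3.0):
        self.history = deque(maxlen=window)
        self.min_samples = min_samples
        self.threshold = threshold

    def check(self, value):
        baseline = list(self.history)  # samples seen before this one
        self.history.append(value)
        if len(baseline) < self.min_samples:
            return False  # still warming up
        mean = fmean(baseline)
        std = pstdev(baseline)  # population std, same as np.std's default
        return abs(value - mean) / (std + 1e-10) > self.threshold


def replay(values, detector):
    """Return the indexes of the values the detector flags."""
    return [i for i, v in enumerate(values) if detector.check(v)]


# Synthetic CPU readings: small oscillation around 20%, one spike at index 30
readings = [20.0 + (i % 5) for i in range(40)]
readings[30] = 95.0
flagged = replay(readings, RollingZScore())
print(flagged)  # [30]
```

Replaying a recorded series this way is also a cheap way to tune the window and threshold before wiring the detector to live metrics.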

Integration with Prometheus

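# 1. Query Prometheus for metric history
# 2. Feed the history into the anomaly detection model
# 3. Alert via Alertmanager when anomalies are detected

# Example PromQL expressions for exporting training data
# (node_cpu_seconds_total is a counter, so wrap it in rate()):
# rate(http_requests_total[5m])
# rate(node_cpu_seconds_total{mode!="idle"}[5m])
# node_memory_MemAvailable_bytes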
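
One way to pull that history is the Prometheus HTTP API's /api/v1/query_range endpoint. The sketch below assumes a server at http://localhost:9090 (a placeholder) and a query that returns a single series; the helper names are illustrative. It converts the response into ds/y rows like those the Prophet example expects.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

PROMETHEUS_URL = "http://localhost:9090"  # assumption: adjust to your server


def build_range_url(query, start, end, step="60s", base_url=PROMETHEUS_URL):
    """Build a query_range URL; start and end are Unix timestamps in seconds."""
    params = urllib.parse.urlencode(
        {"query": query, "start": start, "end": end, "step": step}
    )
    return f"{base_url}/api/v1/query_range?{params}"


def matrix_to_rows(payload):
    """Turn a query_range response into [(ds, y), ...] for the first series."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {payload.get('error')}")
    result = payload["data"]["result"]
    if not result:
        return []
    rows = []
    for ts, value in result[0]["values"]:  # each sample is [unix_ts, "value"]
        # Naive UTC timestamps: Prophet does not accept timezone-aware ds values
        ds = datetime.fromtimestamp(float(ts), tz=timezone.utc).replace(tzinfo=None)
        rows.append((ds, float(value)))
    return rows


def fetch_training_rows(query, start, end, step="60s"):
    """Query Prometheus over HTTP and return (ds, y) rows."""
    url = build_range_url(query, start, end, step)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return matrix_to_rows(json.load(resp))
```

The rows can be loaded with pd.DataFrame(rows, columns=["ds", "y"]). Prometheus caps the number of points a single range query may return per series, so a two-week window may need a coarser step or several smaller requests.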

Best Practices

  • Train models on at least 2 weeks of historical data, so each weekly pattern appears more than once
  • Account for daily and weekly seasonality patterns
  • Use separate models for different metric types
  • Combine AI detection with traditional threshold alerts
  • Regularly retrain models as your system behavior evolves
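
The fourth point, combining learned detection with static thresholds, can be sketched as a small decision function. The function name, the severity labels, and the 90% hard limit are illustrative assumptions, not recommended values.

```python
def classify_alert(value, is_anomaly, hard_limit=90.0):
    """Combine a static threshold with a model-based anomaly flag.

    Returns "critical" when the hard limit is breached (regardless of the model),
    "warning" when only the model flags the value, and None otherwise.
    """
    if value >= hard_limit:
        return "critical"
    if is_anomaly:
        return "warning"
    return None
```

Keeping the static limit as an independent check means a badly trained or stale model cannot suppress an alert on an outright resource exhaustion.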
