Email archiving is essential for compliance, legal discovery, and organizational knowledge preservation. MailArchiva is a dedicated email archiving solution that captures, indexes, and stores all email passing through your mail server. This guide covers deployment options, integration with Postfix and Exchange, and search/retrieval workflows.
Why Archive Email?
- Compliance — regulations like GDPR, HIPAA, SOX, and SEC Rule 17a-4 require email retention
- Legal discovery — quickly search and export emails for litigation holds
- Business continuity — recover accidentally deleted emails
- Knowledge management — searchable organizational memory
MailArchiva Installation
# Download MailArchiva
wget https://www.mailarchiva.com/downloads/mailarchiva-server-latest.deb
# Install
sudo dpkg -i mailarchiva-server-latest.deb
sudo apt-get install -f # Fix dependencies if needed
# Start the service
sudo systemctl enable --now mailarchiva
# Access web interface
# https://your-server:8443
# Default credentials: admin / admin (change the password immediately after first login)
Alternative: Open-Source Archiving with Postfix
For a free alternative, you can use Postfix's always_bcc parameter to copy every message to a dedicated archive mailbox:
# /etc/postfix/main.cf
always_bcc = archive@example.com
# This sends a copy of every email (inbound and outbound) to the archive address
# Combine with a search tool like Apache Solr or Elasticsearch for indexing
Building a Custom Archive Pipeline
#!/usr/bin/env python3
# archive-pipe.py: Postfix content_filter for archiving
import sys
import email
import json
import hashlib
import subprocess
from datetime import datetime, timezone
from pathlib import Path

EX_TEMPFAIL = 75  # tells Postfix to keep the message queued and retry later

# Read email from stdin
raw = sys.stdin.buffer.read()
msg = email.message_from_bytes(raw)
now = datetime.now(timezone.utc)

def header(name):
    # Headers may be missing (None) or non-string Header objects
    value = msg.get(name)
    return "" if value is None else str(value)

# Extract metadata
metadata = {
    "message_id": header("Message-ID"),
    "from": header("From"),
    "to": header("To"),
    "cc": header("Cc"),
    "subject": header("Subject"),
    "date": header("Date"),
    "archived_at": now.isoformat(),
    "size": len(raw),
    "hash": hashlib.sha256(raw).hexdigest(),
}

try:
    # Store the raw email and a JSON sidecar under a UTC date directory
    archive_dir = Path("/archive/mail") / now.strftime("%Y/%m/%d")
    archive_dir.mkdir(parents=True, exist_ok=True)
    (archive_dir / f"{metadata['hash']}.eml").write_bytes(raw)
    (archive_dir / f"{metadata['hash']}.json").write_text(json.dumps(metadata, indent=2))
except OSError:
    # Do not deliver mail that could not be archived; let Postfix retry
    sys.exit(EX_TEMPFAIL)

# Re-inject into Postfix for delivery and pass sendmail's exit status back
result = subprocess.run(["/usr/sbin/sendmail", "-G", "-i"] + sys.argv[1:], input=raw)
sys.exit(result.returncode)
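Because the pipeline names each file after the SHA-256 digest of its contents, the archive can be checked for corruption or tampering by re-hashing every message. The following is a minimal sketch of that check, assuming the /archive/mail layout used by the script above; the function name find_corrupt_messages is illustrative and not part of any existing tool.

```python
import hashlib
from pathlib import Path


def find_corrupt_messages(archive_root: Path) -> list[Path]:
    """Return .eml files whose SHA-256 digest no longer matches their filename."""
    corrupt = []
    for eml in sorted(archive_root.rglob("*.eml")):
        digest = hashlib.sha256(eml.read_bytes()).hexdigest()
        if digest != eml.stem:  # the pipeline stores each message as <sha256>.eml
            corrupt.append(eml)
    return corrupt


# Example usage (path is the one assumed by archive-pipe.py):
# for path in find_corrupt_messages(Path("/archive/mail")):
#     print("digest mismatch:", path)
```

Running a check like this on a schedule gives early warning of disk errors, since a mismatch means the stored bytes differ from what was originally archived.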
MailArchiva Configuration
Journal-Based Archiving (Recommended)
Configure your mail server to journal (BCC) all email to MailArchiva:
# Postfix: journal all mail to MailArchiva
# /etc/postfix/main.cf
always_bcc = journal@archive.example.com
# Configure MailArchiva to receive on a dedicated port
# In MailArchiva admin → Archive → SMTP Listener
# Set port: 2525
# Set allowed hosts: your-mail-server-ip
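Once the listener is configured, it helps to confirm that it accepts mail before relying on it. The sketch below sends a single probe message with Python's standard smtplib; the host name, port 2525, and addresses are the placeholder values from the configuration above, and the helper names build_probe and send_probe are illustrative.

```python
import smtplib
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def build_probe(sender: str, recipient: str) -> EmailMessage:
    """Build a small, clearly labelled test message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Archive listener probe"
    msg["Date"] = formatdate(localtime=True)
    msg["Message-ID"] = make_msgid()
    msg.set_content("Test message for the archive SMTP listener.")
    return msg


def send_probe(host: str, port: int, msg: EmailMessage) -> None:
    """Deliver the probe over plain SMTP; raises smtplib.SMTPException on failure."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.send_message(msg)


# Example usage (assumed host and addresses):
# probe = build_probe("postmaster@example.com", "journal@archive.example.com")
# send_probe("archive.example.com", 2525, probe)
```

After sending, search the archive for the probe's subject to confirm the message was indexed, not just accepted.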
Milter-Based Archiving
# MailArchiva can act as a milter
# /etc/postfix/main.cf
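# MailArchiva milter port (main.cf does not allow trailing comments on a setting line)
smtpd_milters = inet:localhost:8891
# "accept" lets mail through if the milter is unreachable, so those messages are not archived
milter_default_action = accept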
Search and Retrieval
MailArchiva provides full-text search across all archived emails:
- Search operators: from:, to:, subject:, body:, date:, has:attachment
- Boolean operators: AND, OR, NOT, parentheses for grouping
- Date ranges: date:[2025-01-01 TO 2025-03-15]
- Wildcard search: invoice* matches invoice, invoices, invoicing
# Example searches
from:ceo@company.com AND subject:confidential
to:finance@company.com AND has:attachment AND date:[2025-01-01 TO *]
(from:vendor1.com OR from:vendor2.com) AND body:"purchase order"
Retention Policies
# Configure in MailArchiva admin → Policies → Retention
# Example policies:
# - General email: retain 7 years
# - Financial email: retain 10 years
# - Legal hold: retain indefinitely
# - Internal newsletters: retain 1 year
# Rules can be based on:
# - Sender/recipient domains
# - Subject line patterns
# - Date ranges
# - Custom headers
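To make the rule model concrete, here is a small sketch of first-match retention rules based on sender domain and subject pattern. It is not MailArchiva's policy engine: the RetentionRule class, the retention_for function, and the finance.example.com domain are all hypothetical, and the retention periods simply mirror the example policies above.

```python
import re
from dataclasses import dataclass
from email.utils import parseaddr
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    name: str
    years: int
    sender_domain: Optional[str] = None    # match on the sender's domain
    subject_pattern: Optional[str] = None  # case-insensitive regex on the subject

    def matches(self, sender: str, subject: str) -> bool:
        address = parseaddr(sender)[1].lower()
        if self.sender_domain and not address.endswith("@" + self.sender_domain):
            return False
        if self.subject_pattern and not re.search(self.subject_pattern, subject, re.IGNORECASE):
            return False
        return True


# Rules are checked in order; the final rule has no conditions and acts as the default.
RULES = [
    RetentionRule("Internal newsletters", 1, subject_pattern=r"\bnewsletter\b"),
    RetentionRule("Financial email", 10, sender_domain="finance.example.com"),
    RetentionRule("General email", 7),
]


def retention_for(sender: str, subject: str, rules=RULES) -> RetentionRule:
    """Return the first rule that matches the message."""
    for rule in rules:
        if rule.matches(sender, subject):
            return rule
    raise LookupError("no retention rule matched; add a default rule")
```

Ordering matters in a first-match design: placing the most specific rules first and a catch-all last ensures every message receives some retention period.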
Storage Management
# MailArchiva stores emails in volumes
# Each volume is a directory containing indexed email data
# Estimate storage needs:
# Average email size: 75KB
# 100 users × 50 emails/day × 75KB = 375MB/day ≈ 137GB/year
# Use tiered storage:
# - Hot storage (SSD): recent 6 months
# - Cold storage (HDD/S3): older archives
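The estimate above is easy to parameterize for your own numbers. This sketch reproduces the same arithmetic using decimal units (1 GB = 1,000,000 KB) and an optional safety buffer; the function name estimate_storage_gb and its defaults are illustrative.

```python
def estimate_storage_gb(users: int, emails_per_day: int, avg_kb: float = 75,
                        days: int = 365, buffer: float = 0.0) -> float:
    """Estimate raw archive growth in GB (decimal units), before compression or indexing."""
    total_kb = users * emails_per_day * avg_kb * days
    return total_kb * (1 + buffer) / 1_000_000


# 100 users x 50 emails/day x 75 KB, as in the estimate above (about 137 GB/year)
print(f"{estimate_storage_gb(100, 50):.0f} GB/year")
# The same volume with a 20% buffer
print(f"{estimate_storage_gb(100, 50, buffer=0.2):.0f} GB/year")
```

Treat the result as a lower bound for planning, since search indexes and metadata add overhead that depends on the archiving product.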
Legal Hold and Export
# Legal hold: prevent deletion of emails matching criteria
# MailArchiva admin → Legal Hold → Create Hold
# Define scope: custodians, date range, keywords
# Held emails are protected from retention policy deletion
# Export for legal discovery:
# 1. Run search with relevant criteria
# 2. Select results and choose export format
# 3. Formats: PST, EML, PDF, MBOX
# 4. Export includes metadata, headers, and attachments
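If you run the custom archive pipeline instead of MailArchiva, a comparable export can be sketched with Python's standard mailbox module. The example below assumes the .eml and .json sidecar layout written by archive-pipe.py and filters on a substring of the sender; the function name export_to_mbox and the matching logic are illustrative only.

```python
import json
import mailbox
from pathlib import Path


def export_to_mbox(archive_root: Path, mbox_path: Path, sender_contains: str) -> int:
    """Copy archived messages whose "from" metadata contains a substring into an mbox file."""
    exported = 0
    box = mailbox.mbox(str(mbox_path))  # created if it does not exist
    try:
        for meta_file in sorted(archive_root.rglob("*.json")):
            meta = json.loads(meta_file.read_text())
            if sender_contains.lower() not in (meta.get("from") or "").lower():
                continue
            # The raw message sits next to its metadata file
            box.add(meta_file.with_suffix(".eml").read_bytes())
            exported += 1
    finally:
        box.close()  # flushes pending writes to disk
    return exported


# Example usage (paths are assumptions):
# count = export_to_mbox(Path("/archive/mail"), Path("/tmp/export.mbox"), "ceo@company.com")
```

Exporting the untouched raw bytes keeps original headers and attachments intact, which is usually what a discovery request requires.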
Best Practices
- Use journal-based archiving to capture all mail without impacting mail flow
- Store archives on separate storage from your mail server for resilience
- Implement retention policies from day one — retroactive compliance is difficult
- Test search and export regularly to ensure the archive is functional
- Encrypt archive storage at rest for sensitive email data
- Set up monitoring to alert if archiving stops or falls behind
- Plan storage capacity based on your organization's email volume plus a 20% buffer
- Document your archiving policy for compliance audits
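The monitoring practice above can be approximated for a file-based archive by checking how recently a message was written. This is a minimal sketch, assuming the .eml layout from the custom pipeline and a one-hour threshold; the function names and the threshold are illustrative, and a real deployment would hand the result to whatever alerting system you already use.

```python
import time
from pathlib import Path
from typing import Optional


def seconds_since_last_archive(archive_root: Path, now: Optional[float] = None) -> Optional[float]:
    """Return the age of the newest .eml file in seconds, or None if the archive is empty."""
    newest = max((p.stat().st_mtime for p in archive_root.rglob("*.eml")), default=None)
    if newest is None:
        return None
    return (time.time() if now is None else now) - newest


def archiving_is_stale(archive_root: Path, max_age_seconds: float = 3600,
                       now: Optional[float] = None) -> bool:
    """Treat an empty archive, or one with no recent writes, as stale."""
    age = seconds_since_last_archive(archive_root, now)
    return age is None or age > max_age_seconds


# Example usage (path is an assumption):
# if archiving_is_stale(Path("/archive/mail")):
#     print("WARNING: no mail archived in the last hour")
```

Pick a threshold that reflects your quietest period, since a low-volume mail server may legitimately go an hour without traffic on weekends.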