Migrating from MongoDB to PostgreSQL involves transforming a schema-less document model into a relational schema, converting BSON data types, and rewriting application queries. PostgreSQL's JSONB support makes this transition smoother by allowing a gradual migration where some data remains semi-structured. This guide covers planning, execution, and validation of the migration.
Why Migrate from MongoDB to PostgreSQL?
- ACID transactions — full multi-table transaction support without the complexity of MongoDB's multi-document transactions
- SQL ecosystem — rich tooling, reporting capabilities, and developer familiarity
- Cost — PostgreSQL is fully open source under the permissive PostgreSQL License, with no SSPL restrictions or paid enterprise tiers
- JSONB — PostgreSQL handles semi-structured data almost as well as MongoDB, with SQL power on top
- Joins — replace application-level joins ($lookup) with efficient database-level joins
Migration Strategy Overview
- Schema analysis — understand your MongoDB collections and document structures
- Schema design — create PostgreSQL tables (relational + JSONB hybrid)
- Data export — dump MongoDB data as JSON
- Data transformation — convert and load into PostgreSQL
- Application migration — update queries and data access layer
- Validation — verify data integrity and performance
Step 1: Analyze MongoDB Schema
// Use a schema analysis tool such as variety.js (github.com/variety/variety)
// or mongodb-schema (npm install -g mongodb-schema)
// Or manually analyze in mongosh:
var fields = {};
db.users.find().limit(1000).forEach(doc => {
  Object.keys(doc).forEach(key => {
    var type = typeof doc[key];
    if (doc[key] === null) type = "null";  // typeof null is "object"
    if (Array.isArray(doc[key])) type = "array";
    if (doc[key] instanceof ObjectId) type = "ObjectId";
    if (doc[key] instanceof Date) type = "Date";
    fields[key] = fields[key] || {};
    fields[key][type] = (fields[key][type] || 0) + 1;
  });
});
printjson(fields);
Step 2: Design PostgreSQL Schema
The key decision is which embedded documents to normalize into separate tables versus keeping as JSONB:
-- MongoDB document:
-- {
--   _id: ObjectId("..."),
--   email: "user@example.com",
--   name: "John Doe",
--   address: { street: "123 Main", city: "NYC", zip: "10001" },
--   orders: [{ product: "Widget", qty: 3, price: 9.99 }],
--   tags: ["premium", "early-adopter"],
--   metadata: { source: "api", version: 2, custom: {...} }
-- }
-- PostgreSQL schema:
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),  -- built in since PostgreSQL 13; enable pgcrypto on older versions
    mongo_id VARCHAR(24) UNIQUE,  -- keep for reference during migration
    email VARCHAR(255) UNIQUE NOT NULL,
    name VARCHAR(255) NOT NULL,
    -- Normalize address into columns (frequently queried)
    address_street VARCHAR(255),
    address_city VARCHAR(100),
    address_zip VARCHAR(20),
    -- Keep tags as an array
    tags TEXT[] DEFAULT '{}',
    -- Keep flexible metadata as JSONB
    metadata JSONB DEFAULT '{}',
    created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Normalize orders into a separate table (one-to-many)
CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID REFERENCES users(id) ON DELETE CASCADE,
    product VARCHAR(255) NOT NULL,
    quantity INT NOT NULL,
    price DECIMAL(10,2) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
-- email already has an index from its UNIQUE constraint
CREATE INDEX idx_users_tags ON users USING GIN(tags);
CREATE INDEX idx_users_metadata ON users USING GIN(metadata);
CREATE INDEX idx_orders_user_id ON orders(user_id);
Step 3: Export Data from MongoDB
# Export each collection as a single JSON array
mongoexport --uri="mongodb://user:pass@localhost/mydb" \
  --collection=users \
  --out=/tmp/users.json \
  --jsonArray
# For large collections, omit --jsonArray to get the default
# streaming format (one document per line)
mongoexport --uri="mongodb://user:pass@localhost/mydb" \
  --collection=users \
  --out=/tmp/users.jsonl
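If mongoexport is not available, the same JSONL file can be produced from Python. This is a minimal sketch, assuming the pymongo package; bson.json_util serializes ObjectId and Date values as extended JSON, matching mongoexport's output.
#!/usr/bin/env python3
# export_users.py -- illustrative alternative to mongoexport
from pymongo import MongoClient
from bson.json_util import dumps  # renders ObjectId/Date as extended JSON

client = MongoClient("mongodb://user:pass@localhost/mydb")
db = client.get_default_database()

with open("/tmp/users.jsonl", "w") as f:
    # Stream the cursor so the whole collection never sits in memory
    for doc in db.users.find():
        f.write(dumps(doc) + "\n")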
Step 4: Transform and Load Data
#!/usr/bin/env python3
# migrate_users.py
import json

import psycopg2
from psycopg2.extras import execute_values, Json

INSERT_SQL = """
    INSERT INTO users (mongo_id, email, name, address_street,
                       address_city, address_zip, tags, metadata)
    VALUES %s ON CONFLICT (mongo_id) DO NOTHING
"""

conn = psycopg2.connect("dbname=myapp user=postgres")
cur = conn.cursor()

def flush(batch):
    execute_values(cur, INSERT_SQL, batch)
    conn.commit()

with open('/tmp/users.jsonl') as f:
    batch = []
    for line in f:
        doc = json.loads(line)
        addr = doc.get('address', {})
        row = (
            # mongoexport wraps ObjectId values as {"$oid": "..."}
            doc['_id']['$oid'] if isinstance(doc['_id'], dict) else str(doc['_id']),
            doc.get('email'),
            doc.get('name', ''),
            addr.get('street'),
            addr.get('city'),
            addr.get('zip'),
            doc.get('tags', []),            # psycopg2 adapts Python lists to TEXT[]
            Json(doc.get('metadata', {})),  # Json() adapts dicts to JSONB
        )
        batch.append(row)
        if len(batch) >= 5000:
            flush(batch)
            batch = []
    if batch:
        flush(batch)

cur.close()
conn.close()
print("Migration complete")
Step 5: Query Migration
-- MongoDB: db.users.find({ email: "user@example.com" })
SELECT * FROM users WHERE email = 'user@example.com';
-- MongoDB: db.users.find({ tags: "premium" })
SELECT * FROM users WHERE 'premium' = ANY(tags);
-- MongoDB: db.users.find({ "metadata.source": "api" })
SELECT * FROM users WHERE metadata->>'source' = 'api';
-- MongoDB: db.users.find({ "address.city": "NYC" }).sort({ name: 1 })
SELECT * FROM users WHERE address_city = 'NYC' ORDER BY name;
-- MongoDB: db.users.aggregate([
--   { $lookup: { from: "orders", localField: "_id", foreignField: "user_id", as: "orders" } }
-- ])
-- FILTER avoids a [null] array for users with no orders, matching $lookup's []
SELECT u.*, COALESCE(json_agg(o) FILTER (WHERE o.id IS NOT NULL), '[]') AS orders
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id;  -- grouping by the primary key makes u.* legal here
-- MongoDB: db.users.updateOne({ email: "user@example.com" }, { $set: { "metadata.verified": true } })
UPDATE users SET metadata = jsonb_set(metadata, '{verified}', 'true')
WHERE email = 'user@example.com';
-- MongoDB: db.users.updateOne({ email: "user@example.com" }, { $push: { tags: "verified" } })
UPDATE users SET tags = array_append(tags, 'verified')
WHERE email = 'user@example.com';
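The data access layer changes along with the queries. As an illustration, assuming the application previously used pymongo, here is a typical lookup before and after (a sketch; production code would use a connection pool or an ORM):
# Before: pymongo
# user = db.users.find_one({"email": "user@example.com"})

# After: psycopg2
import psycopg2
from psycopg2.extras import RealDictCursor

conn = psycopg2.connect("dbname=myapp user=postgres")
with conn.cursor(cursor_factory=RealDictCursor) as cur:
    cur.execute(
        "SELECT id, email, name, tags, metadata FROM users WHERE email = %s",
        ("user@example.com",),
    )
    user = cur.fetchone()  # dict-like row, close to the old document shape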
Step 6: Validation
# Compare counts
mongo mydb --eval "db.users.countDocuments()"
psql myapp -c "SELECT COUNT(*) FROM users"
# Spot-check random documents
mongo mydb --eval "db.users.find().skip(Math.random() * 10000).limit(5).pretty()"
# Compare with PostgreSQL results for the same mongo_ids
# Verify JSONB data integrity
psql myapp -c "SELECT COUNT(*) FROM users WHERE metadata IS NOT NULL AND metadata != '{}'::jsonb"
Best Practices
- Use JSONB columns for truly flexible/sparse data — do not over-normalize
- Keep the mongo_id column during migration for cross-referencing; remove it once the migration is validated
- Migrate in batches of 5,000-10,000 documents to manage memory and transaction size
- Run both databases in parallel during transition to validate results before cutover (see the shadow-read sketch after this list)
- Use PostgreSQL arrays for simple lists (tags) and JSONB for nested structures (metadata)
- Create GIN indexes on JSONB and array columns for query performance
- Test all application queries against PostgreSQL before switching production traffic
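One way to run both databases in parallel is a shadow read: keep serving from MongoDB while comparing against PostgreSQL and logging disagreements. A sketch under those assumptions (the function and the compared field are illustrative, not from a particular library):
# shadow_read.py -- illustrative dual-run comparison during the transition
import logging

def get_user(email, mongo_db, pg_cur):
    """Serve from MongoDB (still the source of truth); shadow-read PostgreSQL."""
    doc = mongo_db.users.find_one({"email": email})

    pg_cur.execute("SELECT name FROM users WHERE email = %s", (email,))
    row = pg_cur.fetchone()

    # Log disagreements instead of failing the request
    mongo_name = doc.get("name") if doc else None
    pg_name = row[0] if row else None
    if mongo_name != pg_name:
        logging.warning("shadow mismatch for %s: %r vs %r", email, mongo_name, pg_name)

    return doc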