Managing spaCy effectively is a crucial skill for any system administrator running NLP services. This tutorial provides step-by-step instructions for configuring named-entity recognition (NER), along with best practices for production environments.
Prerequisites
- Basic familiarity with the Linux command line
- A VPS running Ubuntu 22.04 or later
- At least 4GB RAM (8GB+ recommended for loading larger pipelines)
- Root or sudo access to the server
Installing Dependencies
For production deployments, consider implementing high availability by running multiple instances behind a load balancer. This approach provides both redundancy and improved performance under heavy load.
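As a sketch, that topology might look like the following nginx configuration (the upstream name, ports, and file path are illustrative, not part of any spaCy tooling):

```nginx
# Hypothetical /etc/nginx/conf.d/ner.conf -- two app instances behind one proxy
upstream ner_backend {
    least_conn;              # route each request to the least-busy instance
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
}

server {
    listen 80;

    location / {
        proxy_pass http://ner_backend;
    }
}
```

Each backend instance is a separate copy of the application, so a crash or restart of one instance does not take the service down.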
# Install Python dependencies
pip install torch transformers accelerate
pip install spacy fastapi uvicorn
The spacy package provides the NLP pipeline itself, fastapi and uvicorn supply the serving layer, and torch, transformers, and accelerate are only needed if you plan to run a transformer-based pipeline such as en_core_web_trf.
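Installing the packages is not enough on its own: a pretrained pipeline must be downloaded before any NER code will run. Assuming the small English pipeline:

```shell
# Download the small English pipeline (includes an NER component)
python -m spacy download en_core_web_sm

# Verify that spaCy imports correctly
python -c "import spacy; print(spacy.__version__)"
```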
Important Notes
If you encounter issues during setup, check the system logs first. Most problems can be diagnosed by examining the output of journalctl or the application-specific log files in /var/log/.
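For example, assuming the service runs under a hypothetical systemd unit named ner-api (substitute your own unit and log file names):

```shell
# Show the last 100 log lines for the unit (the unit name is illustrative)
journalctl -u ner-api -n 100 --no-pager

# Show the most recent lines of a system log file
tail -n 50 /var/log/syslog
```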
Model Configuration
Performance benchmarks show that a properly tuned spaCy service can handle significantly more concurrent requests than a default single-process setup. The key improvements come from running multiple worker processes and batching documents through the pipeline.
import spacy
# Download the pipeline first: python -m spacy download en_core_web_sm
# Disable components the NER endpoint doesn't need, saving memory and latency
nlp = spacy.load(
    "en_core_web_sm",
    disable=["parser", "lemmatizer"],
)
The small en_core_web_sm pipeline runs comfortably on a 4GB VPS. The md and lg pipelines need more memory, and the transformer pipeline (en_core_web_trf) is best reserved for servers with 8GB+ RAM; choose a pipeline that matches your server's specifications.
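When throughput matters, batch texts through the pipeline with nlp.pipe rather than calling nlp() once per document. The sketch below is self-contained: it builds a rule-based entity_ruler so nothing needs to be downloaded, and the patterns and example texts are purely illustrative; with a pretrained pipeline you would use spacy.load instead.

```python
import spacy

# Blank pipeline plus a rule-based entity_ruler, so this runs without downloads
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple"},
    {"label": "GPE", "pattern": "Berlin"},
])

texts = [
    "Apple is opening an office in Berlin.",
    "Berlin hosted the conference.",
]

# nlp.pipe processes texts in batches, which is much faster than
# calling nlp(text) one document at a time
for doc in nlp.pipe(texts, batch_size=64):
    print([(ent.text, ent.label_) for ent in doc.ents])
```

The batch_size value is a tuning knob: larger batches improve throughput at the cost of per-request latency.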
Advanced Settings
The ner component is one stage in the processing pipeline, running alongside components such as the tagger and parser. Understanding which components your application actually uses will help you make better configuration decisions, since every enabled component costs memory and per-request latency.
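You can inspect a pipeline with nlp.pipe_names and temporarily narrow it with select_pipes. A self-contained sketch using two rule-based stand-in components (a pretrained pipeline such as en_core_web_sm would list more stages, including ner):

```python
import spacy

# Blank pipeline with two rule-based components, so nothing needs downloading
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
nlp.add_pipe("entity_ruler")

print(nlp.pipe_names)  # ['sentencizer', 'entity_ruler']

# Temporarily run only the component(s) you need; the rest are
# restored when the context manager exits
with nlp.select_pipes(enable=["entity_ruler"]):
    print(nlp.pipe_names)  # ['entity_ruler']

print(nlp.pipe_names)  # ['sentencizer', 'entity_ruler'] again
```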
Running the Inference Server
After applying these changes, monitor the server's resource usage for at least 24 hours to ensure stability. Tools like htop, iostat, and vmstat can provide real-time insights into system performance.
# Check GPU/CPU memory usage
nvidia-smi # For GPU
free -h # For system RAM
# Start the inference server (assumes a FastAPI app object named "app" in app.py)
uvicorn app:app --host 0.0.0.0 --port 8000 --workers 2
Note that file paths may vary depending on your Linux distribution. The examples here are for Debian/Ubuntu; adjust paths accordingly for RHEL/CentOS-based systems.
Common Issues and Solutions
- Slow performance: Check for disk I/O bottlenecks with iostat -x 1 and network issues with mtr. Review application logs for slow queries or requests.
- Permission denied errors: Ensure files and directories have the correct ownership. Use chown -R to fix ownership and chmod for permissions.
Wrapping Up
Following this guide, your spaCy setup should be production-ready. Keep an eye on resource usage as your traffic grows, and don't forget to test your backup and recovery procedures periodically.