What is RAG?
Retrieval-Augmented Generation combines a retrieval system with a language model. Instead of relying solely on the LLM's training data, RAG retrieves relevant documents from your knowledge base and includes them in the prompt as context, producing more accurate and up-to-date responses.
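The retrieve-then-augment loop can be sketched in a few lines of plain Python. This is a toy illustration, not part of any library: term overlap stands in for real embedding similarity, and the final prompt would normally be sent to an LLM.

```python
import re

# Toy RAG loop: retrieve relevant text, then augment the prompt with it.
# Term-overlap scoring stands in for real embedding similarity.

def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    # Rank documents by how many query terms they share.
    terms = tokenize(query)
    return sorted(docs, key=lambda d: len(terms & tokenize(d)), reverse=True)[:top_k]

def build_prompt(query: str, context: list[str]) -> str:
    # Inline the retrieved passages ahead of the question.
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

docs = [
    "The refund policy allows returns within 30 days of purchase.",
    "Our office is open Monday through Friday.",
    "Standard shipping takes 5 business days.",
]
context = retrieve("What is the refund policy?", docs)
prompt = build_prompt("What is the refund policy?", context)
```

A real pipeline swaps `tokenize`/`retrieve` for embedding similarity over a vector index, which is exactly what LlamaIndex automates below.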
Installing LlamaIndex
pip install llama-index llama-index-llms-ollama llama-index-embeddings-huggingface
# Or with OpenAI:
pip install llama-index llama-index-llms-openai
Basic RAG Pipeline
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
# Configure LLM and embedding model
llm = Ollama(model="llama3", request_timeout=120)
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
# Load documents
documents = SimpleDirectoryReader("/path/to/docs").load_data()
# Create vector index
index = VectorStoreIndex.from_documents(
    documents, embed_model=embed_model
)
# Query
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What is our refund policy?")
print(response)
Advanced: Persistent Storage
from llama_index.core import StorageContext
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore
# Use ChromaDB for persistent vector storage
chroma_client = chromadb.PersistentClient(path="/opt/chroma_db")
collection = chroma_client.get_or_create_collection("my_docs")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Index documents (only needed once)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)
# Later, load existing index
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)
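The "index once, load later" pattern above can be reduced to a runnable sketch with no external dependencies. Here a JSON file stands in for Chroma, and `embed` is a placeholder for a real embedding model; the file path and function names are illustrative, not LlamaIndex or Chroma APIs.

```python
import json
import os
import tempfile

# Sketch of the "build once, load later" pattern, reduced to a JSON file.
# `embed` is a stand-in for a real embedding model.

def embed(text: str) -> list[float]:
    # Placeholder embedding: character-frequency vector over a-z.
    return [text.lower().count(c) / max(len(text), 1) for c in "abcdefghijklmnopqrstuvwxyz"]

def build_or_load(path: str, docs: list[str]) -> dict:
    if os.path.exists(path):
        # Later runs: load the persisted index instead of re-embedding.
        with open(path) as f:
            return json.load(f)
    # First run: embed every document and persist the result.
    store = {"docs": docs, "vectors": [embed(d) for d in docs]}
    with open(path, "w") as f:
        json.dump(store, f)
    return store

path = os.path.join(tempfile.mkdtemp(), "index.json")
first = build_or_load(path, ["refund policy", "shipping times"])  # builds
second = build_or_load(path, [])                                  # loads from disk
```

The design point is the same as with Chroma: embedding is the expensive step, so persist its output and make startup a cheap load.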
Document Types Supported
- PDF, DOCX, PPTX, XLSX
- HTML, Markdown, plain text
- CSV, JSON
- Notion, Slack, GitHub (via readers)
- Databases (SQL queries as documents)
Optimization Tips
- Chunk documents into 512-1024 token segments for better retrieval
- Use metadata filters to narrow search scope
- Implement re-ranking for higher-quality results
- Cache embeddings to avoid recomputation
- Use hybrid search (vector + keyword) for better recall
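The first tip above can be sketched as a plain function. Production splitters count model tokens and respect sentence boundaries; this sketch approximates tokens with whitespace-separated words, and the sizes are illustrative.

```python
def chunk_words(text: str, chunk_size: int = 64, overlap: int = 16) -> list[str]:
    # Split into fixed-size word windows; each chunk repeats the last
    # `overlap` words of the previous one so context spans chunk boundaries.
    words = text.split()
    step = chunk_size - overlap
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]

doc = " ".join(f"w{i}" for i in range(200))
chunks = chunk_words(doc)
```

The overlap matters: without it, a sentence cut at a chunk boundary is unretrievable as a whole, since neither chunk contains all of it.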