
RAG Pipeline with LlamaIndex

By Admin · Mar 15, 2026 · Updated Apr 23, 2026

What is RAG?

Retrieval-Augmented Generation (RAG) combines a retrieval system with a language model. Instead of relying solely on the LLM's training data, RAG retrieves relevant documents from your knowledge base and includes them in the prompt as context, producing more accurate and up-to-date responses.
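The retrieve-then-generate loop can be sketched without any framework. Everything below — the sample knowledge base, the word-overlap scorer, the prompt template — is an illustrative stand-in for the vector search and LLM call that LlamaIndex handles for you:

```python
import re

# Toy knowledge base; in a real pipeline these would be embedded chunks.
KNOWLEDGE_BASE = [
    "The refund policy allows returns within 30 days of purchase.",
    "Support is available Monday through Friday.",
    "All plans include unlimited API calls.",
]

def tokenize(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (stand-in for vector search)."""
    q = tokenize(query)
    return sorted(KNOWLEDGE_BASE, key=lambda d: len(q & tokenize(d)), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Augment the query with the retrieved context before the LLM call."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What is the refund policy?"))
```

The key property is that the model answers from the supplied context rather than from memory, which is what keeps responses grounded in your own documents.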

Installing LlamaIndex

pip install llama-index llama-index-llms-ollama llama-index-embeddings-huggingface
# Or with OpenAI:
pip install llama-index llama-index-llms-openai

Basic RAG Pipeline

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Configure LLM and embedding model
llm = Ollama(model="llama3", request_timeout=120)
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Load documents
documents = SimpleDirectoryReader("/path/to/docs").load_data()

# Create vector index
index = VectorStoreIndex.from_documents(
    documents, embed_model=embed_model
)

# Query
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What is our refund policy?")
print(response)

Advanced: Persistent Storage

import chromadb
from llama_index.core import StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore

# Use ChromaDB for persistent vector storage
chroma_client = chromadb.PersistentClient(path="/opt/chroma_db")
collection = chroma_client.get_or_create_collection("my_docs")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Index documents (only needed once); reuse the embed_model from above,
# otherwise LlamaIndex falls back to its default OpenAI embeddings
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)

# Later, load the existing index straight from the vector store
# (queries must use the same embedding model the index was built with)
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)

Document Types Supported

  • PDF, DOCX, PPTX, XLSX
  • HTML, Markdown, plain text
  • CSV, JSON
  • Notion, Slack, GitHub (via readers)
  • Databases (SQL queries as documents)

Optimization Tips

  • Chunk documents into 512-1024 token segments for better retrieval
  • Use metadata filters to narrow search scope
  • Implement re-ranking for higher quality results
  • Cache embeddings to avoid recomputation
  • Use hybrid search (vector + keyword) for better recall
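The first tip can be sketched with a simple overlapping chunker. This toy version splits on words for clarity — a real pipeline would count tokens and split on sentence boundaries, as LlamaIndex's built-in splitters do — but the size/overlap mechanics are the same:

```python
def chunk_words(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into overlapping word-based chunks.

    Illustrative stand-in for a token-aware splitter; chunk_size and
    overlap here count words, not tokens.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# A 1000-word document with 512-word chunks and 64-word overlap
doc = " ".join(f"w{i}" for i in range(1000))
print(len(chunk_words(doc)))  # 3
```

The overlap ensures that a sentence falling on a chunk boundary still appears intact in at least one chunk, which is why retrieval quality usually improves with a modest overlap.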
