What is RAG?
Retrieval-Augmented Generation combines a retrieval system with a language model. Instead of relying solely on the LLM's training data, RAG retrieves relevant documents from your knowledge base and includes them in the prompt as context, producing more accurate and up-to-date responses.
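The retrieve-then-augment loop can be sketched in a few lines of plain Python. This is a toy illustration, not part of any library: term overlap stands in for real embedding similarity, and the final prompt would normally be sent to an LLM.

```python
import re

# Toy RAG loop: retrieve relevant text, then augment the prompt with it.
# Term-overlap scoring stands in for real embedding similarity.

def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    # Rank documents by how many query terms they share.
    terms = tokenize(query)
    return sorted(docs, key=lambda d: len(terms & tokenize(d)), reverse=True)[:top_k]

def build_prompt(query: str, context: list[str]) -> str:
    # Inline the retrieved passages ahead of the question.
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

docs = [
    "The refund policy allows returns within 30 days of purchase.",
    "Our office is open Monday through Friday.",
    "Standard shipping takes 5 business days.",
]
context = retrieve("What is the refund policy?", docs)
prompt = build_prompt("What is the refund policy?", context)
```

A real pipeline swaps `tokenize`/`retrieve` for embedding similarity over a vector index, which is exactly what LlamaIndex automates below.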
Installing LlamaIndex
pip install llama-index llama-index-llms-ollama llama-index-embeddings-huggingface
# Or with OpenAI:
pip install llama-index llama-index-llms-openai
Basic RAG Pipeline
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
# Configure LLM and embedding model
llm = Ollama(model="llama3", request_timeout=120)
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
# Load documents
documents = SimpleDirectoryReader("/path/to/docs").load_data()
# Create vector index
index = VectorStoreIndex.from_documents(
    documents, embed_model=embed_model
)
# Query
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What is our refund policy?")
print(response)
Advanced: Persistent Storage
from llama_index.core import StorageContext
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore
# Use ChromaDB for persistent vector storage
chroma_client = chromadb.PersistentClient(path="/opt/chroma_db")
collection = chroma_client.get_or_create_collection("my_docs")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Index documents (only needed once)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, embed_model=embed_model
)
# Later, load existing index
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)
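The "index once, load later" pattern above can be reduced to a runnable sketch with no external dependencies. Here a JSON file stands in for Chroma, and `embed` is a placeholder for a real embedding model; the file path and function names are illustrative, not LlamaIndex or Chroma APIs.

```python
import json
import os
import tempfile

# Sketch of the "build once, load later" pattern, reduced to a JSON file.
# `embed` is a stand-in for a real embedding model.

def embed(text: str) -> list[float]:
    # Placeholder embedding: character-frequency vector over a-z.
    return [text.lower().count(c) / max(len(text), 1) for c in "abcdefghijklmnopqrstuvwxyz"]

def build_or_load(path: str, docs: list[str]) -> dict:
    if os.path.exists(path):
        # Later runs: load the persisted index instead of re-embedding.
        with open(path) as f:
            return json.load(f)
    # First run: embed every document and persist the result.
    store = {"docs": docs, "vectors": [embed(d) for d in docs]}
    with open(path, "w") as f:
        json.dump(store, f)
    return store

path = os.path.join(tempfile.mkdtemp(), "index.json")
first = build_or_load(path, ["refund policy", "shipping times"])  # builds
second = build_or_load(path, [])                                  # loads from disk
```

The design point is the same as with Chroma: embedding is the expensive step, so persist its output and make startup a cheap load.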
Document Types Supported
- PDF, DOCX, PPTX, XLSX
- HTML, Markdown, plain text
- CSV, JSON
- Notion, Slack, GitHub (via readers)
- Databases (SQL queries as documents)
Optimization Tips
- Chunk documents into 512-1024 token segments for better retrieval
- Use metadata filters to narrow search scope
- Implement re-ranking for higher-quality results
- Cache embeddings to avoid recomputation
- Use hybrid search (vector + keyword) for better recall
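The first tip above can be sketched as a plain function. Production splitters count model tokens and respect sentence boundaries; this sketch approximates tokens with whitespace-separated words, and the sizes are illustrative.

```python
def chunk_words(text: str, chunk_size: int = 64, overlap: int = 16) -> list[str]:
    # Split into fixed-size word windows; each chunk repeats the last
    # `overlap` words of the previous one so context spans chunk boundaries.
    words = text.split()
    step = chunk_size - overlap
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]

doc = " ".join(f"w{i}" for i in range(200))
chunks = chunk_words(doc)
```

The overlap matters: without it, a sentence cut at a chunk boundary is unretrievable as a whole, since neither chunk contains all of it.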