How to Set Up a RAG Pipeline with LangChain
Retrieval-Augmented Generation (RAG) grounds a large language model's answers in your own documents, producing responses that are accurate and verifiable against the source material. LangChain is one of the most widely used frameworks for building RAG pipelines, and running the entire stack on your Breeze keeps your data private and under your control.
Prerequisites
- A Breeze instance with at least 4 GB of RAM
- Python 3.10 or later
- A running vector database (Qdrant, ChromaDB, or FAISS)
- An LLM endpoint (local or API-based)
Installing LangChain and Dependencies
```bash
python3 -m venv ~/rag-env
source ~/rag-env/bin/activate
pip install langchain langchain-community langchain-huggingface \
    sentence-transformers chromadb pypdf unstructured
```
Loading and Splitting Documents
The first step in any RAG pipeline is ingesting your documents and splitting them into chunks:
```python
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load every PDF under /data/documents/ (PyPDFLoader yields one Document per page).
loader = DirectoryLoader("/data/documents/", glob="**/*.pdf", loader_cls=PyPDFLoader)
documents = loader.load()

# Overlapping chunks preserve context that would otherwise be cut at a chunk boundary.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)
print(f"Split {len(documents)} documents into {len(chunks)} chunks")
```
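To see why `chunk_overlap` matters, here is a toy, dependency-free sketch of sliding-window splitting. LangChain's `RecursiveCharacterTextSplitter` is more sophisticated (it prefers to split on paragraph and sentence boundaries), but the size/step arithmetic is the same idea:

```python
def naive_split(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-size windows that advance by (chunk_size - chunk_overlap) characters."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# With the settings used above, consecutive chunks share 200 characters.
chunks = naive_split("a" * 2500, chunk_size=1000, chunk_overlap=200)
print(len(chunks))     # windows start at 0, 800, 1600, 2400 -> 4 chunks
print(len(chunks[0]))  # 1000
```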
Creating Embeddings and Storing in a Vector Database
Generate embeddings for each chunk and store them in ChromaDB:
```python
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

# all-MiniLM-L6-v2 is a small, fast sentence-transformers model (384-dim vectors).
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="/data/chromadb",  # persist the index to disk across restarts
)
```
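Conceptually, the vector store ranks chunks by the similarity of their embedding vectors to the query's embedding. A toy, dependency-free sketch of that ranking step (the chunk texts and three-dimensional vectors here are made up for illustration; real embeddings have hundreds of dimensions):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy "index": each chunk is stored alongside its embedding vector.
index = [
    ("refund policy chunk", [0.9, 0.1, 0.0]),
    ("shipping times chunk", [0.1, 0.8, 0.2]),
]
query_vec = [0.85, 0.15, 0.05]  # pretend embedding of the user's question
best = max(index, key=lambda item: cosine(query_vec, item[1]))
print(best[0])  # -> refund policy chunk
```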
Building the RAG Chain
Connect the retriever to an LLM to build the complete RAG pipeline:
```python
from langchain.chains import RetrievalQA
from langchain_community.llms import Ollama

llm = Ollama(model="llama3", base_url="http://localhost:11434")
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})  # top-4 chunks per query

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # "stuff" concatenates all retrieved chunks into one prompt
    retriever=retriever,
    return_source_documents=True,
)

result = qa_chain.invoke({"query": "What is our refund policy?"})
print(result["result"])
for doc in result["source_documents"]:
    print(f"  Source: {doc.metadata['source']}, Page: {doc.metadata.get('page', 'N/A')}")
```
Adding a Custom Prompt Template
Customize the system prompt to control the model’s behavior:
```python
from langchain.prompts import PromptTemplate

template = """Use the following context to answer the question. If you cannot find the answer in the context, say "I don't have enough information to answer that."

Context: {context}

Question: {question}

Answer:"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt},
)
```
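Under a "stuff" chain, the retrieved chunks are simply concatenated into the `{context}` slot before the prompt reaches the model. A dependency-free sketch of that assembly step (the `retrieved_docs` strings are hypothetical stand-ins for real retrieved chunks):

```python
template = """Use the following context to answer the question. If you cannot find the answer in the context, say "I don't have enough information to answer that."

Context: {context}

Question: {question}

Answer:"""

retrieved_docs = ["Refunds are accepted within 30 days.", "Items must be unused."]
prompt_text = template.format(
    context="\n\n".join(retrieved_docs),  # "stuff": all chunks pasted in together
    question="What is our refund policy?",
)
print(prompt_text)
```

This is also why `chunk_size` and `k` interact: four 1000-character chunks plus the template must fit within the model's context window.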
Serving the RAG Pipeline as an API
Expose your RAG pipeline via FastAPI so other applications can query it:
```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Question(BaseModel):
    query: str

@app.post("/ask")
async def ask(q: Question):
    result = qa_chain.invoke({"query": q.query})
    return {"answer": result["result"]}
```
Run it with `uvicorn main:app --host 0.0.0.0 --port 8000` and your Breeze becomes a private knowledge base with AI-powered answers.
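Once the server is up, any HTTP client on your network can query it. For example, assuming the default port above:

```shell
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"query": "What is our refund policy?"}'
```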