How to Set Up a RAG Pipeline with LangChain
Retrieval-Augmented Generation (RAG) grounds a large language model's answers in your own documents, producing responses that are accurate and verifiable against the source material. LangChain is one of the most widely used frameworks for building RAG pipelines, and running the entire stack on your Breeze keeps your data private and under your control.
Prerequisites
- A Breeze instance with at least 4 GB of RAM
- Python 3.10 or later
- A running vector database (Qdrant, ChromaDB, or FAISS)
- An LLM endpoint (local or API-based)
Installing LangChain and Dependencies
```bash
python3 -m venv ~/rag-env
source ~/rag-env/bin/activate
pip install langchain langchain-community langchain-huggingface \
    sentence-transformers chromadb pypdf unstructured
```
Loading and Splitting Documents
The first step in any RAG pipeline is ingesting your documents and splitting them into chunks:
```python
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load every PDF under /data/documents/ (PyPDFLoader yields one Document per page).
loader = DirectoryLoader("/data/documents/", glob="**/*.pdf", loader_cls=PyPDFLoader)
documents = loader.load()

# Overlapping chunks preserve context that would otherwise be cut at a chunk boundary.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)
print(f"Split {len(documents)} documents into {len(chunks)} chunks")
```
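To see why `chunk_overlap` matters, here is a toy, dependency-free sketch of sliding-window splitting. LangChain's `RecursiveCharacterTextSplitter` is more sophisticated (it prefers to split on paragraph and sentence boundaries), but the size/step arithmetic is the same idea:

```python
def naive_split(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Fixed-size windows that advance by (chunk_size - chunk_overlap) characters."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# With the settings used above, consecutive chunks share 200 characters.
chunks = naive_split("a" * 2500, chunk_size=1000, chunk_overlap=200)
print(len(chunks))     # windows start at 0, 800, 1600, 2400 -> 4 chunks
print(len(chunks[0]))  # 1000
```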
Creating Embeddings and Storing in a Vector Database
Generate embeddings for each chunk and store them in ChromaDB:
```python
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

# all-MiniLM-L6-v2 is a small, fast sentence-transformers model (384-dim vectors).
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="/data/chromadb",  # persist the index to disk across restarts
)
```
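Conceptually, the vector store ranks chunks by the similarity of their embedding vectors to the query's embedding. A toy, dependency-free sketch of that ranking step (the chunk texts and three-dimensional vectors here are made up for illustration; real embeddings have hundreds of dimensions):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy "index": each chunk is stored alongside its embedding vector.
index = [
    ("refund policy chunk", [0.9, 0.1, 0.0]),
    ("shipping times chunk", [0.1, 0.8, 0.2]),
]
query_vec = [0.85, 0.15, 0.05]  # pretend embedding of the user's question
best = max(index, key=lambda item: cosine(query_vec, item[1]))
print(best[0])  # -> refund policy chunk
```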
Building the RAG Chain
Connect the retriever to an LLM to build the complete RAG pipeline:
```python
from langchain.chains import RetrievalQA
from langchain_community.llms import Ollama

llm = Ollama(model="llama3", base_url="http://localhost:11434")
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})  # top-4 chunks per query

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # "stuff" concatenates all retrieved chunks into one prompt
    retriever=retriever,
    return_source_documents=True,
)

result = qa_chain.invoke({"query": "What is our refund policy?"})
print(result["result"])
for doc in result["source_documents"]:
    print(f"  Source: {doc.metadata['source']}, Page: {doc.metadata.get('page', 'N/A')}")
```
Adding a Custom Prompt Template
Customize the system prompt to control the model’s behavior:
```python
from langchain.prompts import PromptTemplate

template = """Use the following context to answer the question. If you cannot find the answer in the context, say "I don't have enough information to answer that."

Context: {context}

Question: {question}

Answer:"""

prompt = PromptTemplate(template=template, input_variables=["context", "question"])
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt},
)
```
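Under a "stuff" chain, the retrieved chunks are simply concatenated into the `{context}` slot before the prompt reaches the model. A dependency-free sketch of that assembly step (the `retrieved_docs` strings are hypothetical stand-ins for real retrieved chunks):

```python
template = """Use the following context to answer the question. If you cannot find the answer in the context, say "I don't have enough information to answer that."

Context: {context}

Question: {question}

Answer:"""

retrieved_docs = ["Refunds are accepted within 30 days.", "Items must be unused."]
prompt_text = template.format(
    context="\n\n".join(retrieved_docs),  # "stuff": all chunks pasted in together
    question="What is our refund policy?",
)
print(prompt_text)
```

This is also why `chunk_size` and `k` interact: four 1000-character chunks plus the template must fit within the model's context window.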
Serving the RAG Pipeline as an API
Expose your RAG pipeline via FastAPI so other applications can query it:
```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Question(BaseModel):
    query: str

@app.post("/ask")
async def ask(q: Question):
    result = qa_chain.invoke({"query": q.query})
    return {"answer": result["result"]}
```
Run it with `uvicorn main:app --host 0.0.0.0 --port 8000` and your Breeze becomes a private knowledge base with AI-powered answers.
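Once the server is up, any HTTP client on your network can query it. For example, assuming the default port above:

```shell
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"query": "What is our refund policy?"}'
```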