Building RAG Applications with LangChain and Vector Databases
Learn how to build production-ready Retrieval-Augmented Generation systems using LangChain, ChromaDB, and OpenAI embeddings for accurate, grounded AI responses.
LangChain · RAG · AI · Vector DB
What is RAG?
Retrieval-Augmented Generation (RAG) is a technique that combines the power of large language models with external knowledge retrieval. Instead of relying solely on the model's training data, RAG fetches relevant documents at query time and uses them as context.
This approach solves two critical problems:
- Hallucination — grounding responses in real data
- Stale knowledge — accessing up-to-date information beyond the training cutoff
Architecture Overview
A typical RAG pipeline consists of three stages:
```typescript
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { ChromaClient } from "chromadb";

// 1. Embed your documents
const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
});

// 2. Store in a vector database
const client = new ChromaClient();
const collection = await client.getOrCreateCollection({
  name: "knowledge-base",
});

// Index your chunked documents once, before querying
const docs = ["..."]; // your document chunks
await collection.add({
  ids: docs.map((_, i) => `doc-${i}`),
  embeddings: await embeddings.embedDocuments(docs),
  documents: docs,
});

// 3. Query and generate
async function askQuestion(query: string): Promise<string> {
  const queryEmbedding = await embeddings.embedQuery(query);
  const results = await collection.query({
    queryEmbeddings: [queryEmbedding],
    nResults: 5,
  });
  // Flatten the per-query result lists and drop any null entries
  const context = results.documents.flat().filter(Boolean).join("\n\n");

  const llm = new ChatOpenAI({ model: "gpt-4o" });
  const response = await llm.invoke(
    `Context:\n${context}\n\nQuestion: ${query}`
  );
  return response.content as string;
}
```

Chunking Strategies
How you split your documents matters enormously for retrieval quality:
| Strategy | Best For | Chunk Size |
|---|---|---|
| Fixed-size | Simple documents | 500-1000 tokens |
| Recursive | Structured text | 500-800 tokens |
| Semantic | Complex docs | Variable |
| Sentence-based | Q&A systems | 3-5 sentences |
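To make the recursive strategy concrete, here is a minimal sketch of the idea behind recursive character splitting (the approach LangChain's `RecursiveCharacterTextSplitter` takes): try coarse separators like paragraph breaks first, and only fall back to finer ones for pieces that are still too large. The function name and separator list here are illustrative, not LangChain's API.

```typescript
// Minimal recursive character splitter: prefers paragraph breaks,
// then newlines, then spaces, then hard character cuts.
function recursiveSplit(
  text: string,
  chunkSize: number,
  separators: string[] = ["\n\n", "\n", " ", ""]
): string[] {
  if (text.length <= chunkSize) return [text];

  const [sep, ...rest] = separators;
  // Last resort (sep === ""): hard cut every chunkSize characters
  const pieces =
    sep === ""
      ? text.match(new RegExp(`.{1,${chunkSize}}`, "gs")) ?? []
      : text.split(sep);

  const chunks: string[] = [];
  let current = "";
  for (const piece of pieces) {
    if (piece.length > chunkSize) {
      // Piece still too big: recurse with the next, finer separator
      if (current) {
        chunks.push(current);
        current = "";
      }
      chunks.push(...recursiveSplit(piece, chunkSize, rest));
    } else if (
      (current ? current.length + sep.length : 0) + piece.length <= chunkSize
    ) {
      // Greedily merge small pieces back together up to chunkSize
      current = current ? current + sep + piece : piece;
    } else {
      chunks.push(current);
      current = piece;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

For example, `recursiveSplit("aaa bbb ccc ddd", 7)` falls through to the space separator and yields `["aaa bbb", "ccc ddd"]`, keeping every chunk within the size limit while splitting on natural boundaries.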
Key Takeaways
- Start with recursive character splitting — it works well for most cases
- Use hybrid search (keyword + semantic) for better recall
- Always include a reranking step before feeding context to the LLM
- Monitor your pipeline with tools like LangSmith for observability
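The hybrid-search takeaway above needs a way to merge the keyword and semantic result lists into one ranking. Reciprocal Rank Fusion (RRF) is one common method; this is a minimal sketch, with an illustrative function name, where each list contributes `1 / (k + rank)` per document and `k = 60` is the conventional smoothing constant.

```typescript
// Fuse multiple ranked lists of document IDs (best first) into one
// ranking using Reciprocal Rank Fusion.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, rank) => {
      // rank is 0-based, so the top document scores 1 / (k + 1)
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}
```

Documents that appear high in both the keyword and semantic lists float to the top, while documents found by only one retriever are still retained further down, which is exactly the recall boost hybrid search is after.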
RAG is not a silver bullet, but when implemented correctly, it dramatically improves the reliability and accuracy of your AI applications.