A Comprehensive Guide to RAG Developer Stack - Analytics Vidhya
Building a RAG (Retrieval-Augmented Generation) application isn’t just about plugging in a few tools—it’s about choosing the right stack that makes retrieval and generation not just possible but efficient and scalable.
Let’s say you’re working on something like “Smart Chat with PDF”—an AI app that lets users interact with PDFs conversationally. It’s not as simple as just loading a file and asking questions. You need to:
- Extract relevant content from the PDF
- Chunk the text into meaningful pieces
- Store those chunks in a vector database
- Then, when a user asks something, the app runs a similarity search, fetches the most relevant chunks, and passes them to the language model to generate a coherent and accurate response
Sounds like a lot? It is. Working across multiple tools, frameworks, and databases can get overwhelming fast.
That’s exactly why I created the RAG Developer’s Stack—a curated set of tools and frameworks designed to streamline this whole process. From smart data extractors to efficient vector databases and cost-effective generation models, it’s everything you need to build robust, production-ready RAG applications without reinventing the wheel every time.
Table of contents
- Why You Need RAG Developer Stack?
- RAG Developer Stack for Your Next Project
- Large Language Models (LLMs)
- LLMs Used in Response Generation for RAG
- Frameworks
- Data Extraction
- Embeddings
- Vector Databases
- Rerankers
- Evaluation
- Open LLMs Access
- Conclusion
Why You Need RAG Developer Stack?
First, a brief overview: Retrieval-Augmented Generation (RAG) enhances the capabilities of large language models (LLMs) by integrating external information retrieval mechanisms. This approach allows LLMs to generate more accurate, contextually relevant, and factually grounded responses by supplementing their static training data with up-to-date or domain-specific information.
How does RAG work?
RAG operates in four key stages:
- Indexing: Data from external sources (e.g., documents, databases) is converted into vector representations (embeddings) and stored in a vector database. This enables efficient retrieval of relevant information.
- Retrieval: When a user submits a query, the system retrieves the most relevant data from the indexed sources using similarity-based search techniques.
- Augmentation: The retrieved information is combined with the user’s query through prompt engineering, effectively “augmenting” the input to the LLM.
- Generation: The LLM uses both its internal knowledge and the augmented prompt to produce a response. This process ensures that the output is informed by both pre-trained data and real-time, authoritative sources.
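The four stages above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production pipeline: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the three documents are made up, and the generation step is left out because it needs an actual LLM.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# 1. Indexing: embed every document once and keep the vectors
documents = [
    "Chroma is an open-source embedding database.",
    "Rerankers reorder retrieved documents by relevance.",
    "Ollama runs open LLMs on a local machine.",
]
index = [(doc, embed(doc)) for doc in documents]


# 2. Retrieval: rank the indexed documents by similarity to the query
def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


# 3. Augmentation: merge the retrieved context and the question into one prompt
def augment(query, contexts):
    return "Context:\n" + "\n".join(contexts) + f"\n\nQuestion: {query}\nAnswer:"


# 4. Generation: the prompt would now be sent to an LLM (omitted here)
query = "Which tool runs open LLMs on a local machine?"
prompt = augment(query, retrieve(query))
print(prompt)
```

Swapping `embed` for a real embedding model and the in-memory `index` for a vector database gives the architecture the rest of this guide builds on.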
Now, why do you need a RAG developer stack?
Why Do You Need a RAG Developer Stack?
- Accelerate Development: Leverage pre-built, ready-to-integrate components to move from prototype to production faster.
- Boost Accuracy: Retrieve real-time, context-relevant data to ground responses and reduce hallucinations.
- Strengthen Deployment: Built-in tools enhance security, observability, and scalability, making production readiness a smoother ride.
- Maximize Flexibility: Modular design lets you mix and match tools, adapting to the unique demands of different industries and use cases.
- Customizable by Design: Developers can hand-pick components that fit their workflow, architecture, and performance goals.
RAG Developer Stack for Your Next Project
Here are 9 things you should know to develop RAG Projects:
1. Large Language Models (LLMs)
LLMs are the brains of RAG systems, leveraging transformer-based architectures to generate coherent and contextually relevant text. These models come in two categories:
- Open-source LLMs: Examples include LLaMA, Falcon, and Mistral, which allow customization and local deployment.
- Closed LLMs: Proprietary models like GPT-4 and Gemini (formerly Bard) offer advanced capabilities but are typically accessible only via APIs.
Example of LLM Usage in RAG
I have already imported the JSON documents using the JSON loader. The pipeline below shows where the LLM fits into RAG.
Prompt Template
```python
from langchain_core.prompts import ChatPromptTemplate

rag_prompt = """You are an assistant who is an expert in question-answering tasks.
Answer the following question using only the following pieces of retrieved context.
If the answer is not in the context, do not make up answers, just say that you don't know.
Keep the answer detailed and well formatted based on the information from the context.

Question:
{question}

Context:
{context}

Answer:
"""

rag_prompt_template = ChatPromptTemplate.from_template(rag_prompt)
```
Pipeline Construction
```python
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

# Initialize ChatGPT model
chatgpt = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)


# Format documents into a single string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


# Construct the RAG pipeline
# (similarity_retriever is the retriever built in the Vector Databases section)
qa_rag_chain = (
    {
        "context": (similarity_retriever | format_docs),
        "question": RunnablePassthrough()
    }
    | rag_prompt_template
    | chatgpt
)
```
Example Usage
```python
from IPython.display import display, Markdown

query = "What is the difference between AI, ML, and DL?"
result = qa_rag_chain.invoke(query)

# Display the generated answer
display(Markdown(result.content))
```
2. LLMs Used in Response Generation for RAG
In Retrieval-Augmented Generation (RAG) systems, the response generation LLM plays an important role as the final decision-maker — it takes the retrieved documents, user query, and context and synthesizes everything into a coherent, relevant, and often conversational response. While retrieval models bring in potentially useful information, the LLM can reason, summarize, and contextualize, which ensures the output feels intelligent and human-like.
A strong response model can filter noisy or partial information, infer unstated connections, and deliver answers that align with user intent. This is especially critical in applications like enterprise search, customer support, legal/medical assistants, and technical Q&A, where users expect precise, grounded, and trustworthy responses.
In a nutshell, without a capable generation model, even the best retrieval stack falls flat — making this component the core brain of any RAG pipeline.
Commercial LLMs
Model | Developer | Key Strengths | Common Use Cases |
---|---|---|---|
GPT-4.5 | OpenAI | Advanced text generation, summarization, conversational fluency | Chatbots, customer support, content creation |
Claude 3.7 Sonnet | Anthropic | Real-time conversations, strong reasoning, “extended thinking mode” | Business automation, customer service |
Gemini 2.0 Pro | Google DeepMind | Multimodal (text + image), high performance | Data analysis, enterprise automation, content generation |
Cohere Command R | Cohere | Retrieval-Augmented Generation (RAG), enterprise-grade design | Knowledge management, support automation, moderation |
DeepSeek | DeepSeek AI | On-premise deployment, secure data handling, high customizability | Finance, healthcare, privacy-sensitive industries |
Open-Source LLMs
Model | Developer | Key Strengths | Common Use Cases |
---|---|---|---|
LLaMA 3 | Meta | Scalable (up to 405B params), multimodal capabilities | Conversational AI, research, content generation |
Mistral 7B | Mistral AI | Lightweight yet powerful, optimized for code and chat | Code generation, chatbots, content automation |
Falcon 180B | Technology Innovation Institute | Efficient, high-performance, open-access | Real-time applications, science/research bots |
DeepSeek R1 | DeepSeek AI | Strong logic/reasoning, 128K context window | Math tasks, summarization, complex reasoning |
Qwen2.5-72B-Instruct | Alibaba Cloud | 72.7 billion parameters, long contexts up to 128K tokens, coding, mathematical reasoning, and multilingual support | Structured outputs like JSON for technical applications in RAG workflows |
3. Frameworks
Frameworks simplify the development of RAG applications by providing pre-built components:
- LangChain: Framework for LLM application development with modular architecture for prompt management, chaining, memory handling, and agent creation. Excels at building RAG pipelines with built-in support for document loaders, retrievers, and vector stores.
- LlamaIndex: Specialized framework for data indexing and retrieval, connecting unstructured data with language models through custom indices. Optimized for ingesting, transforming, and querying large datasets for chatbots and knowledge management.
- LangGraph: Framework that integrates LLMs with graph-based structures, allowing developers to define application logic using nodes and edges. Ideal for complex workflows with multiple branches and feedback loops, especially in multi-agent systems.
- RAGFlow: Framework built specifically for Retrieval-Augmented Generation systems, orchestrating retrievers, rankers, and generators into coherent pipelines. Enhances relevance when pulling from external data sources for search-driven interfaces and Q&A systems.
Frameworks like LangChain, LangGraph, and LlamaIndex significantly streamline RAG (Retrieval-Augmented Generation) development by offering modular tools for integrating retrieval and generation processes. LangChain simplifies chaining LLM calls, managing prompts, and connecting to vector stores. LangGraph introduces graph-based flow control, enabling dynamic and multi-step RAG workflows. LlamaIndex focuses on data ingestion, indexing, and retrieval, making large datasets queryable by LLMs. Together, they abstract away complex infrastructure, allowing developers to focus on logic and data quality. These tools enable rapid prototyping and robust deployment of RAG applications for tasks like question answering, document search, and knowledge assistance.
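To make the "nodes and edges" idea concrete, here is a minimal sketch in plain Python. It does not use the LangGraph API; the `nodes`, `edges`, and `run_graph` names are hypothetical, and both steps return canned values instead of calling a retriever or an LLM.

```python
def retrieve(state):
    # Hypothetical retrieval node: returns a fixed context for illustration
    return {"context": ["RAG combines retrieval with generation."]}


def generate(state):
    # Hypothetical generation node: echoes the context instead of calling an LLM
    return {"answer": f"Based on {len(state['context'])} chunk(s): {state['context'][0]}"}


# Nodes are named functions; edges say which node runs next
nodes = {"retrieve": retrieve, "generate": generate}
edges = {"START": "retrieve", "retrieve": "generate", "generate": "END"}


def run_graph(state):
    """Walk the edges from START to END, merging each node's output into the state."""
    current = edges["START"]
    while current != "END":
        state = {**state, **nodes[current](state)}
        current = edges[current]
    return state


result = run_graph({"question": "What is RAG?"})
print(result["answer"])
```

Each node reads the shared state and returns only the keys it updates, which is the same contract the `retrieve` and `generate` functions follow in the LangGraph example later in this section.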
Example of Frameworks for RAG Building
Let’s build a simple RAG using LangChain:
```python
%pip install --quiet --upgrade langchain-text-splitters langchain-community langgraph
!pip install -qU "langchain[openai]"
```
Chat model
```python
import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("gpt-4o-mini", model_provider="openai")
```
Select embeddings model
```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
```
Select vector store
```python
from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore(embeddings)
```
Creating the indexing pipeline
```python
import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

# Index chunks
_ = vector_store.add_documents(documents=all_splits)

# Define prompt for question-answering
prompt = hub.pull("rlm/rag-prompt")


# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


# Define application steps
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
```
```python
response = graph.invoke({"question": "What are Types of Memory?"})
print(response["answer"])
```
Output
The types of memory include Sensory Memory, Short-Term Memory (STM), and Long-Term Memory (LTM). Sensory Memory retains impressions of sensory information for a few seconds, while Short-Term Memory holds currently relevant information for 20-30 seconds. Long-Term Memory can store information for days to decades and includes explicit (declarative) and implicit (procedural) memory.
4. Data Extraction
When your data lives outside the application, in files or on the web, dedicated extraction tools do the heavy lifting. RAG applications require robust tools for extracting structured and unstructured data from various sources:
- Websites, PDFs, Word documents, slides, etc.
- Tools like BeautifulSoup or PyPDF2 can automate this process.
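As a rough idea of what such tools do for web pages, the sketch below pulls the visible text out of an HTML string using only Python's standard library. The `TextExtractor` class and the sample HTML are made up for illustration; a library like BeautifulSoup handles far more edge cases.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)


sample_html = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>RAG Stack</h1><p>Extract, chunk, embed.</p></body></html>"
)
text = extract_text(sample_html)
print(text)  # RAG Stack Extract, chunk, embed.
```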
Example of Data Extraction for RAG Building
```python
%pip install -U langchain-community
%pip install langchain pypdf
```
Let’s extract content from the PDF
```python
from langchain_community.document_loaders import PyPDFLoader

# Define the path to your PDF file
pdf_path = "/content/Multimodal Agent Using Agno Framework.pdf"

# Initialize the PyPDFLoader
loader = PyPDFLoader(pdf_path)

# Load the PDF and split it into pages
documents = loader.load()

# Print the content of each page (page numbers start at 1)
for i, doc in enumerate(documents):
    print(f"Page {i + 1} Content:")
    print(doc.page_content)
    print("\n")
```
5. Embeddings
Text embeddings transform textual data into numerical vectors for similarity-based retrieval. Beyond text embeddings:
- Image embeddings: Used in multimodal RAG applications.
- Multi-modal embeddings: Combine text, image, and other data types for complex tasks.
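Similarity-based retrieval usually comes down to comparing vectors, most often with cosine similarity. The sketch below uses hand-written 3-dimensional vectors as hypothetical embeddings; real models produce hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: the first document points the same way as the query
query_vec = [1.0, 0.0, 1.0]
doc_vecs = {
    "doc_about_ml": [2.0, 0.0, 2.0],
    "doc_about_cooking": [0.0, 3.0, 0.0],
}

scores = {name: cosine_similarity(query_vec, vec) for name, vec in doc_vecs.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Cosine similarity ignores vector length, so `doc_about_ml` scores close to 1.0 even though its values are twice as large as the query's, while the orthogonal `doc_about_cooking` scores 0.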
Here are the embedding models across providers:
OpenAI Embeddings
- Latest models: text-embedding-3-small (lower cost) and text-embedding-3-large (higher accuracy)
- Features: Dynamic dimension adjustment (e.g., 256-3072 dim), multilingual support, optimized for search/RAG
Cohere Embed v3
- Specializes in document quality ranking and noisy data handling
- Models: English/multilingual variants (1024/384 dim), compression-aware training for cost efficiency
Nomic Embed v2
- Open-source MoE architecture (305M active params) with Matryoshka embeddings
- Multilingual (100 languages), outperforms models 2x its size on MTEB/BEIR benchmarks
Gemini Embedding
- Experimental model (gemini-embedding-exp-03-07) with 8K token input and 3K dimensions
- MTEB leaderboard leader (68.32 mean score), supports 100 languages
Ollama Embeddings
- Hosts models like mxbai-embed-large and custom variants (e.g., suntray-embedding)
- Designed for RAG workflows with local inference and ChromaDB integration
BGE (BAAI)
- BERT-based models (large/base/small-en-v1.5) for retrieval/RAG
- Open-source, supports instruction tuning (e.g., “Represent this sentence…”)
Mixedbread
- The mxbai-embed-large-v1 model by Mixedbread AI is a state-of-the-art sentence embedding solution designed for multilingual and multimodal retrieval tasks.
- It supports advanced techniques like Matryoshka Representation Learning (MRL) and binary quantization, enabling efficient memory usage and cost reduction at scale. With strong performance across diverse tasks, it rivals larger proprietary models while maintaining open-source accessibility.
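Two of the size-reduction ideas mentioned above, Matryoshka-style truncation and binary quantization, can be illustrated with plain lists. This is a simplified sketch: the 8-dimensional vector is invented, and truncation only preserves quality for models that were trained with MRL so that the leading dimensions carry the most information.

```python
import math


def truncate_and_normalize(vec, dims):
    """Matryoshka-style shortening: keep the first `dims` values, rescale to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head


def binary_quantize(vec):
    """Binary quantization: store one bit per dimension (1 if positive, else 0)."""
    return [1 if x > 0 else 0 for x in vec]


def hamming_distance(a, b):
    """Number of differing bits; a cheap distance for quantized vectors."""
    return sum(x != y for x, y in zip(a, b))


full = [0.6, -0.3, 0.5, 0.1, -0.4, 0.2, -0.1, 0.3]  # hypothetical 8-dim embedding
short = truncate_and_normalize(full, 4)
bits = binary_quantize(full)

print(len(short), bits)  # 4 [1, 0, 1, 1, 0, 1, 0, 1]
```

Truncation halves the storage here, and binary quantization replaces each float with a single bit, which is where the memory and cost savings come from.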
Splitting the PDF content into chunks
```python
from glob import glob

from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter


def create_simple_chunks(file_path, chunk_size=3500, chunk_overlap=200):
    # Load every page of the PDF
    print('Loading pages:', file_path)
    loader = PyMuPDFLoader(file_path)
    doc_pages = loader.load()

    # Split the pages into overlapping chunks
    print('Chunking pages:', file_path)
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size,
                                              chunk_overlap=chunk_overlap)
    doc_chunks = splitter.split_documents(doc_pages)
    print('Finished processing:', file_path)
    print()
    return doc_chunks


pdf_files = glob('./rag_docs/*.pdf')

# Process PDF files
paper_docs = []
for fp in pdf_files:
    paper_docs.extend(create_simple_chunks(file_path=fp))
```
Output
```text
Loading pages: ./rag_docs/cnn_paper.pdf
Chunking pages: ./rag_docs/cnn_paper.pdf
Finished processing: ./rag_docs/cnn_paper.pdf

Loading pages: ./rag_docs/attention_paper.pdf
Chunking pages: ./rag_docs/attention_paper.pdf
Finished processing: ./rag_docs/attention_paper.pdf

Loading pages: ./rag_docs/vision_transformer.pdf
Chunking pages: ./rag_docs/vision_transformer.pdf
Finished processing: ./rag_docs/vision_transformer.pdf

Loading pages: ./rag_docs/resnet_paper.pdf
Chunking pages: ./rag_docs/resnet_paper.pdf
Finished processing: ./rag_docs/resnet_paper.pdf
```
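To see what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-window chunker. It is only an illustration: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences, which this sketch ignores, and `chunk_text` is a hypothetical helper name.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


chunks = chunk_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk.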
Creating the Embeddings
```python
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Initialize embedding model
openai_embed_model = OpenAIEmbeddings(model='text-embedding-3-small')

# Combine documents
# (wiki_docs_processed is assumed to hold previously processed documents; it is not built in this guide)
total_docs = wiki_docs_processed + paper_docs

# Create and save vector database
chroma_db = Chroma.from_documents(documents=total_docs,
                                  collection_name='my_db',
                                  embedding=openai_embed_model,
                                  collection_metadata={"hnsw:space": "cosine"},
                                  persist_directory="./my_db")
```
6. Vector Databases
Vector databases store embeddings (numerical representations of text or other data), enabling efficient retrieval of semantically similar chunks. Examples include:
- Pinecone: A managed vector database platform designed for high-performance and scalable applications, enabling efficient storage and retrieval of high-dimensional vector embeddings.
- Chroma DB: An open-source AI-native embedding database that includes features like vector search, document storage, full-text search, and metadata filtering, facilitating seamless retrieval in AI applications.
- Qdrant: An open-source vector database and search engine written in Rust, offering fast and scalable vector similarity search services with extended filtering support, suitable for neural-network or semantic-based matching.
- Milvus DB: An open-source vector database built for scalable similarity search, capable of handling large-scale and dynamic vector data, and supporting various index types for efficient retrieval.
- Weaviate: An open-source vector database that stores both objects and vectors, allowing for combining vector search with structured filtering, and is modular, cloud-native, and real-time.
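At their core, these databases store vectors next to the original text and metadata, then answer "which stored vectors are closest to this one?", optionally restricted by a metadata filter. The toy class below shows that contract with a brute-force scan; the `TinyVectorStore` name and its data are hypothetical, and real systems use approximate indexes such as HNSW instead of scanning everything.

```python
import math


class TinyVectorStore:
    """Brute-force in-memory vector store with exact-match metadata filtering."""

    def __init__(self):
        self.records = []  # list of (vector, text, metadata)

    def add(self, vector, text, metadata=None):
        self.records.append((vector, text, metadata or {}))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def search(self, query_vector, k=2, where=None):
        where = where or {}
        # Structured filtering first, then similarity ranking
        candidates = [
            r for r in self.records
            if all(r[2].get(key) == value for key, value in where.items())
        ]
        candidates.sort(key=lambda r: self._cosine(query_vector, r[0]), reverse=True)
        return [text for _, text, _ in candidates[:k]]


store = TinyVectorStore()
store.add([1.0, 0.0], "Intro to CNNs", {"source": "paper"})
store.add([0.9, 0.1], "CNN wiki page", {"source": "wiki"})
store.add([0.0, 1.0], "Attention paper", {"source": "paper"})

hits = store.search([1.0, 0.0], k=2, where={"source": "paper"})
print(hits)  # ['Intro to CNNs', 'Attention paper']
```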
Example of Vector Database for RAG Building
Note: We already created the embeddings above; now we will store them in the vector database.
Using Chroma db to store the embeddings
```python
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Initialize embedding model
openai_embed_model = OpenAIEmbeddings(model='text-embedding-3-small')

# Combine documents
# (wiki_docs_processed is assumed to hold previously processed documents; it is not built in this guide)
total_docs = wiki_docs_processed + paper_docs

# Create and save vector database
chroma_db = Chroma.from_documents(documents=total_docs,
                                  collection_name='my_db',
                                  embedding=openai_embed_model,
                                  collection_metadata={"hnsw:space": "cosine"},
                                  persist_directory="./my_db")
```
Loading the Vector database
```python
chroma_db = Chroma(persist_directory="./my_db",
                   collection_name='my_db',
                   embedding_function=openai_embed_model)
```
Retrieving the information and getting the output
```python
from IPython.display import display, Markdown

similarity_retriever = chroma_db.as_retriever(search_type="similarity",
                                              search_kwargs={"k": 5})

# Query for semantic similarity
query = "What is machine learning?"
top_docs = similarity_retriever.invoke(query)


# Display results
def display_docs(docs):
    for doc in docs:
        print('Metadata:', doc.metadata)
        print('Content Brief:')
        display(Markdown(doc.page_content[:1000]))
        print()


display_docs(top_docs)
```
7. Rerankers
Rerankers refine the retrieval process by improving the relevance of retrieved documents.
They operate in a two-stage retrieval pipeline:
- Initial recall retrieves a broad set of candidates from the vector database.
- Rerankers prioritize the most relevant documents based on additional scoring mechanisms like semantic similarity or contextual relevance.
This approach significantly enhances the precision of RAG systems.
By integrating rerankers into the stack, developers can ensure higher-quality responses tailored to user queries while optimizing retrieval efficiency.
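The two-stage pattern can be sketched without any model at all. In the illustration below, stage one keeps every document that shares a word with the query, and stage two reorders the survivors with a finer score. Jaccard word overlap is only a stand-in for a real reranker model, and the corpus is invented.

```python
import re


def tokens(text):
    return set(re.findall(r"\w+", text.lower()))


def recall_stage(query, corpus, k):
    """Stage 1: cheap, broad recall. Keep documents sharing at least one query word."""
    q = tokens(query)
    return [doc for doc in corpus if q & tokens(doc)][:k]


def rerank_stage(query, candidates, top_n):
    """Stage 2: score each candidate against the query and keep the best ones."""
    q = tokens(query)

    def score(doc):
        d = tokens(doc)
        return len(q & d) / len(q | d)  # Jaccard overlap as a toy relevance score

    return sorted(candidates, key=score, reverse=True)[:top_n]


corpus = [
    "Vector search returns many candidate chunks",
    "A reranker scores each candidate against the query",
    "Bananas are rich in potassium",
]
query = "how does a reranker score a candidate"

candidates = recall_stage(query, corpus, k=10)
top = rerank_stage(query, candidates, top_n=1)
print(top)  # ['A reranker scores each candidate against the query']
```

The first stage lets two documents through, and the second stage puts the one that actually matches the question on top, which is the job a reranker model does at larger scale.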
Also read: Comprehensive Guide on Reranker for RAG
Example of Rerankers for RAG Building
```python
%pip install --upgrade --quiet cohere langchain-cohere
```
Set up the Cohere and ContextualCompressionRetriever
```python
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank
from langchain_community.llms import Cohere
from langchain.chains import RetrievalQA

# Both Cohere clients expect the COHERE_API_KEY environment variable
llm = Cohere(temperature=0)

# Assumption: reuse the retriever built in the Vector Databases section as the first stage
retriever = similarity_retriever

# Second stage: rerank the retrieved candidates with Cohere
compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=retriever
)

chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=compression_retriever
)
```
8. Evaluation
Evaluation ensures the accuracy and relevance of RAG systems:
- Giskard: A library for testing machine learning pipelines.
- Ragas: Specifically designed to evaluate RAG pipelines by analyzing retrieval quality and generated outputs.
- Arize Phoenix: An open-source observability library for evaluating, troubleshooting, and improving LLM outputs with features like model drift detection and cohort analysis.
- Comet Opik: A fully open-source platform for evaluating, testing, and monitoring LLM applications with tools for observability, automated scoring, and unit testing across the development lifecycle.
- DeepEval: deepeval offers three LLM evaluation metrics to evaluate retrievals:
  - ContextualPrecisionMetric: evaluates whether the reranker in your retriever ranks more relevant nodes in your retrieval context higher than irrelevant ones.
  - ContextualRecallMetric: evaluates whether the embedding model in your retriever is able to accurately capture and retrieve relevant information based on the context of the input.
  - ContextualRelevancyMetric: evaluates whether the text chunk size and top-K of your retriever are able to retrieve information without much irrelevancy.
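To build intuition for what ranking-aware and recall-style metrics measure, the sketch below computes two classic retrieval scores from hand-labelled relevance flags. This is not how deepeval implements its metrics (those rely on an LLM as judge); `average_precision` and `recall` are simplified, hypothetical helpers.

```python
def average_precision(relevance):
    """Rewards rankings that place relevant chunks first.

    relevance: 0/1 flags for the retrieved chunks, in ranked order.
    """
    hits, total = 0, 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def recall(relevance, total_relevant):
    """Share of all relevant chunks in the corpus that were actually retrieved."""
    return sum(relevance) / total_relevant if total_relevant else 0.0


good_ranking = [1, 1, 0]  # relevant chunks ranked first
poor_ranking = [0, 1, 1]  # same chunks, but an irrelevant one on top

print(average_precision(good_ranking))  # 1.0
print(round(average_precision(poor_ranking), 3))  # 0.583
print(recall(good_ranking, total_relevant=4))  # 0.5
```

Both rankings retrieve the same two relevant chunks, so recall is identical, but the precision-style score drops when an irrelevant chunk is ranked first. That is the kind of signal a reranker is meant to improve.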
Example of Evaluation for RAG Building
```python
from datasets import load_dataset
from qdrant_client import QdrantClient
from tqdm import tqdm
from langchain.docstore.document import Document as LangchainDocument
from langchain_text_splitters import RecursiveCharacterTextSplitter
from openai import OpenAI
import deepeval
from deepeval.metrics import (
    AnswerRelevancyMetric,
    ContextualPrecisionMetric,
    ContextualRecallMetric,
    ContextualRelevancyMetric,
    FaithfulnessMetric,
)
from deepeval.test_case import LLMTestCase

# Get your key from https://platform.openai.com/api-keys
OPENAI_API_KEY = "<openai_api_key>"

# Get your Confident AI API key from https://app.confident-ai.com
CONFIDENT_AI_API_KEY = "<confident_ai_api_key>"

# Get a FREE forever cluster at https://cloud.qdrant.io/
# More info: https://qdrant.tech/documentation/cloud/create-cluster/
QDRANT_URL = "<qdrant_url>"
QDRANT_API_KEY = "<qdrant_api_key>"
COLLECTION_NAME = "qdrant-deepeval"

EVAL_SIZE = 10
RETRIEVAL_SIZE = 3

dataset = load_dataset("atitaarora/qdrant_doc", split="train")

langchain_docs = [
    LangchainDocument(
        page_content=doc["text"], metadata={"source": doc["source"]}
    )
    for doc in tqdm(dataset)
]

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=50,
    add_start_index=True,
    separators=["\n\n", "\n", ".", " ", ""],
)

# Accumulate the chunks of every document
docs_processed = []
for doc in langchain_docs:
    docs_processed += text_splitter.split_documents([doc])

client = QdrantClient(url=QDRANT_URL, api_key=QDRANT_API_KEY)

docs_contents, docs_metadatas = [], []
for doc in docs_processed:
    if hasattr(doc, "page_content") and hasattr(doc, "metadata"):
        docs_contents.append(doc.page_content)
        docs_metadatas.append(doc.metadata)
    else:
        print(
            "Warning: Some documents do not have 'page_content' or 'metadata' attributes."
        )

# Uses FastEmbed - https://qdrant.tech/documentation/fastembed/
# To generate embeddings for the documents
# The default model is `BAAI/bge-small-en-v1.5`
client.add(
    collection_name=COLLECTION_NAME,
    metadata=docs_metadatas,
    documents=docs_contents,
)

openai_client = OpenAI(api_key=OPENAI_API_KEY)


def query_with_context(query, limit):
    search_result = client.query(
        collection_name=COLLECTION_NAME, query_text=query, limit=limit
    )

    contexts = [
        "document: " + r.document + ",source: " + r.metadata["source"]
        for r in search_result
    ]

    prompt_start = """ You're assisting a user who has a question based on the documentation.
        Your goal is to provide a clear and concise response that addresses their query while referencing relevant information
        from the documentation.
        Remember to:
        Understand the user's question thoroughly.
        If the user's query is general (e.g., "hi," "good morning"),
        greet them normally and avoid using the context from the documentation.
        If the user's query is specific and related to the documentation, locate and extract the pertinent information.
        Craft a response that directly addresses the user's query and provides accurate information
        referring the relevant source and page from the 'source' field of fetched context from the documentation to support your answer.
        Use a friendly and professional tone in your response.
        If you cannot find the answer in the provided context, do not pretend to know it.
        Instead, respond with "I don't know".

        Context:\n"""

    prompt_end = f"\n\nQuestion: {query}\nAnswer:"

    prompt = prompt_start + "\n\n---\n\n".join(contexts) + prompt_end

    res = openai_client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=prompt,
        temperature=0,
        max_tokens=636,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
        stop=None,
    )

    return (contexts, res.choices[0].text)


qdrant_qna_dataset = load_dataset("atitaarora/qdrant_doc_qna", split="train")


def create_deepeval_dataset(dataset, eval_size, retrieval_window_size):
    test_cases = []
    for i in range(eval_size):
        entry = dataset[i]
        question = entry["question"]
        answer = entry["answer"]
        context, rag_response = query_with_context(
            question, retrieval_window_size
        )
        test_case = LLMTestCase(
            input=question,
            actual_output=rag_response,
            expected_output=answer,
            retrieval_context=context,
        )
        test_cases.append(test_case)
    return test_cases


test_cases = create_deepeval_dataset(
    qdrant_qna_dataset, EVAL_SIZE, RETRIEVAL_SIZE
)

deepeval.login_with_confident_api_key(CONFIDENT_AI_API_KEY)

deepeval.evaluate(
    test_cases=test_cases,
    metrics=[
        AnswerRelevancyMetric(),
        FaithfulnessMetric(),
        ContextualPrecisionMetric(),
        ContextualRecallMetric(),
        ContextualRelevancyMetric(),
    ],
)
```
9. Open LLMs Access
Platforms enabling local or API-based access to open LLMs include:
- Ollama: Allows running open LLMs locally.
- Groq, Hugging Face, Together AI: Provide API integrations for open LLMs.
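For local access, a running Ollama server exposes an HTTP API on port 11434, so an open model can be called with nothing but the standard library. The sketch below builds a non-streaming request for the `/api/generate` endpoint. The helper names are hypothetical, and `generate` only works once the server is running and the model has been pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint


def build_generate_request(prompt, model="deepseek-r1:1.5b"):
    """Build a POST request asking Ollama for a single, non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="deepseek-r1:1.5b"):
    """Send the request; needs a running Ollama server with the model available."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


request = build_generate_request("What is RAG?")
print(request.full_url)
```

Hosted providers follow the same idea with their own endpoints and API keys, which is why frameworks such as LangChain can swap one backend for another with little code change.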
Example of Open LLMs Access for RAG Building
Download and install Ollama:

```shell
curl -fsSL https://ollama.com/install.sh | sh
```
After this, pull the DeepSeek R1:1.5b using:
```shell
ollama pull deepseek-r1:1.5b
```
Install the required libraries
```python
!pip install langchain==0.3.11
!pip install langchain-openai==0.2.12
!pip install langchain-community==0.3.11
!pip install langchain-chroma==0.1.4
```
OpenAI Embedding Models

```python
from langchain_openai import OpenAIEmbeddings

openai_embed_model = OpenAIEmbeddings(model='text-embedding-3-small')
```
Create a Vector DB and persist on the disk
```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_chroma import Chroma

# Load the PDF and split it into pages
loader = PyPDFLoader('AgenticAI.pdf')
pages = loader.load_and_split()
texts = [doc.page_content for doc in pages]

chroma_db = Chroma.from_texts(
    texts=texts,
    collection_name='db_docs',
    collection_metadata={"hnsw:space": "cosine"},  # Set distance function to cosine
    embedding=openai_embed_model
)
```
Build a RAG Chain
```python
from langchain_core.prompts import ChatPromptTemplate

prompt = """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If no context is present or if you don't know the answer, just say that you don't know.
Do not make up the answer unless it is there in the provided context.
Keep the answer concise and to the point with regard to the question.

Question:
{question}

Context:
{context}

Answer:
"""

prompt_template = ChatPromptTemplate.from_template(prompt)
```
Load Connection to LLM
```python
from langchain_community.llms import Ollama

deepseek = Ollama(model="deepseek-r1:1.5b")
```
LangChain Syntax for RAG Chain
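```python
from langchain.chains import RetrievalQA

# Assumption: the retriever is not defined in the original snippet; the
# k and score_threshold values below are illustrative, not tuned
similarity_threshold_retriever = chroma_db.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 3, "score_threshold": 0.3},
)

rag_chain = RetrievalQA.from_chain_type(
    llm=deepseek,
    chain_type="stuff",
    retriever=similarity_threshold_retriever,
    chain_type_kwargs={"prompt": prompt_template},
)

query = "Tell the Leaders’ Perspectives on Agentic AI"
rag_chain.invoke(query)
```

Output

{'query': 'Tell the Leaders’ Perspectives on Agentic AI',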
Conclusion
Building effective RAG applications isn’t just about plugging in a language model—it’s about choosing the right RAG Developer stack across the board, from frameworks and embeddings to vector databases and retrieval tools. When these components are thoughtfully integrated, they enable intelligent, scalable systems that can chat with PDFs, pull relevant facts in real time, and generate context-aware responses. As the ecosystem continues to evolve, staying agile with your tools and grounded in solid architecture will be key to building reliable, future-proof AI solutions.