
A Comprehensive Guide to RAG Developer Stack - Analytics Vidhya

Apr 25, 2025

Building a RAG (Retrieval-Augmented Generation) application isn’t just about plugging in a few tools—it’s about choosing the right stack that makes retrieval and generation not just possible but efficient and scalable.

Let’s say you’re working on something like “Smart Chat with PDF”—an AI app that lets users interact with PDFs conversationally. It’s not as simple as just loading a file and asking questions. You need to:

  1. Extract relevant content from the PDF
  2. Chunk the text into meaningful pieces
  3. Store those chunks in a vector database
  4. Then, when a user asks something, the app runs a similarity search, fetches the most relevant chunks, and passes them to the language model to generate a coherent and accurate response

Sounds like a lot? It is. Working across multiple tools, frameworks, and databases can get overwhelming fast.

That’s exactly why I created the RAG Developer’s Stack—a curated set of tools and frameworks designed to streamline this whole process. From smart data extractors to efficient vector databases and cost-effective generation models, it’s everything you need to build robust, production-ready RAG applications without reinventing the wheel every time.

Table of contents

  • Why You Need a RAG Developer Stack
  • RAG Developer Stack for Your Next Project
  • Large Language Models (LLMs)
  • LLMs Used in Response Generation for RAG
  • Frameworks
  • Data Extraction
  • Embeddings
  • Vector Databases
  • Rerankers
  • Evaluation
  • Open LLMs Access
  • Conclusion

Why You Need a RAG Developer Stack


First, a brief on RAG: Retrieval-Augmented Generation (RAG) enhances the capabilities of large language models (LLMs) by integrating external information retrieval mechanisms. This approach allows LLMs to generate more accurate, contextually relevant, and factually grounded responses by supplementing their static training data with up-to-date or domain-specific information.

How does RAG work?

RAG operates in four key stages (a minimal end-to-end sketch follows this list):

  1. Indexing: Data from external sources (e.g., documents, databases) is converted into vector representations (embeddings) and stored in a vector database. This enables efficient retrieval of relevant information.
  2. Retrieval: When a user submits a query, the system retrieves the most relevant data from the indexed sources using similarity-based search techniques.
  3. Augmentation: The retrieved information is combined with the user’s query through prompt engineering, effectively “augmenting” the input to the LLM.
  4. Generation: The LLM uses both its internal knowledge and the augmented prompt to produce a response. This process ensures that the output is informed by both pre-trained data and real-time, authoritative sources.
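To make these stages concrete, here is a deliberately tiny, dependency-free sketch of the retrieve-augment-generate loop. The bag-of-words embed function is a toy stand-in for a real embedding model, and the in-memory list stands in for a vector database; later sections show production-grade components for each stage.

import math

# Toy embedding: bag-of-words counts (a stand-in for a real embedding model)
def embed(text, vocab):
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# 1. Indexing: embed each chunk and store (vector, text) pairs
docs = [
    "RAG combines retrieval with generation",
    "Vector databases store embeddings for fast search",
    "LLMs generate text from prompts",
]
vocab = sorted({w for d in docs for w in d.lower().split()})
index = [(embed(d, vocab), d) for d in docs]

# 2. Retrieval: rank stored chunks by similarity to the query
query = "where are embeddings stored"
ranked = sorted(index, key=lambda pair: cosine(pair[0], embed(query, vocab)), reverse=True)
context = ranked[0][1]

# 3. Augmentation: fold the retrieved context into the prompt
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# 4. Generation: a real system would now call an LLM with the augmented prompt
print(prompt)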

Now, why do you need a RAG developer stack?

Why Do You Need a RAG Developer Stack?

  • Accelerate Development: Leverage pre-built, ready-to-integrate components to move from prototype to production faster.
  • Boost Accuracy: Retrieve real-time, context-relevant data to ground responses and reduce hallucinations.
  • Strengthen Deployment: Built-in tools enhance security, observability, and scalability, making production readiness a smoother ride.
  • Maximize Flexibility: Modular design lets you mix and match tools, adapting to the unique demands of different industries and use cases.
  • Customizable by Design: Developers can hand-pick components that fit their workflow, architecture, and performance goals.

RAG Developer Stack for Your Next Project

Here are the nine components you should know to develop RAG projects:

1. Large Language Models (LLMs)


LLMs are the brains of RAG systems, leveraging transformer-based architectures to generate coherent and contextually relevant text. These models come in two categories:

  • Open-source LLMs: Examples include LLaMA, Falcon, and Cohere, which allow customization and local deployment.
  • Closed LLMs: Proprietary models like GPT-4 and Bard offer advanced capabilities but are typically accessible via APIs.


Example of LLM Usage in RAG

I have already imported the JSON documents using the JSONLoader; the pipeline below shows how an LLM is used in RAG. (It assumes a retriever named similarity_retriever, like the one built in the vector databases section.)

Prompt Template

from langchain_core.prompts import ChatPromptTemplate
rag_prompt = """You are an assistant who is an expert in question-answering tasks.
Answer the following question using only the following pieces of retrieved context.
If the answer is not in the context, do not make up answers, just say that you don't know.
Keep the answer detailed and well formatted based on the information from the context.
Question:
{question}
Context:
{context}
Answer:
"""
rag_prompt_template = ChatPromptTemplate.from_template(rag_prompt)

Pipeline Construction

from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

# Initialize the chat model
chatgpt = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

# Format retrieved documents into a single string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Construct the RAG pipeline (similarity_retriever is the vector store retriever)
qa_rag_chain = (
    {
        "context": (similarity_retriever | format_docs),
        "question": RunnablePassthrough()
    }
    | rag_prompt_template
    | chatgpt
)

Example Usage

query = "What is the difference between AI, ML, and DL?"
result = qa_rag_chain.invoke(query)
# Display the generated answer
from IPython.display import display, Markdown
display(Markdown(result.content))


2. LLMs Used in Response Generation for RAG

In Retrieval-Augmented Generation (RAG) systems, the response generation LLM plays an important role as the final decision-maker — it takes the retrieved documents, user query, and context and synthesizes everything into a coherent, relevant, and often conversational response. While retrieval models bring in potentially useful information, the LLM can reason, summarize, and contextualize, which ensures the output feels intelligent and human-like.

A strong response model can filter noisy or partial information, infer unstated connections, and deliver answers that align with user intent. This is especially critical in applications like enterprise search, customer support, legal/medical assistants, and technical Q&A, where users expect precise, grounded, and trustworthy responses.

In a nutshell, without a capable generation model, even the best retrieval stack falls flat — making this component the core brain of any RAG pipeline.

Commercial LLMs

| Model | Developer | Key Strengths | Common Use Cases |
|-------|-----------|---------------|------------------|
| GPT-4.5 | OpenAI | Advanced text generation, summarization, conversational fluency | Chatbots, customer support, content creation |
| Claude 3.7 Sonnet | Anthropic | Real-time conversations, strong reasoning, “extended thinking mode” | Business automation, customer service |
| Gemini 2.0 Pro | Google DeepMind | Multimodal (text + image), high performance | Data analysis, enterprise automation, content generation |
| Cohere Command R | Cohere | Retrieval-Augmented Generation (RAG), enterprise-grade design | Knowledge management, support automation, moderation |
| DeepSeek | DeepSeek AI | On-premise deployment, secure data handling, high customizability | Finance, healthcare, privacy-sensitive industries |

Open-Source LLMs

| Model | Developer | Key Strengths | Common Use Cases |
|-------|-----------|---------------|------------------|
| LLaMA 3 | Meta | Scalable (up to 405B params), multimodal capabilities | Conversational AI, research, content generation |
| Mistral 7B | Mistral AI | Lightweight yet powerful, optimized for code and chat | Code generation, chatbots, content automation |
| Falcon 180B | Technology Innovation Institute | Efficient, high-performance, open-access | Real-time applications, science/research bots |
| DeepSeek R1 | DeepSeek AI | Strong logic/reasoning, 128K context window | Math tasks, summarization, complex reasoning |
| Qwen2.5-72B-Instruct | Alibaba Cloud | 72.7B parameters, 128K-token context, coding, mathematical reasoning, multilingual support, structured outputs (e.g., JSON) | Technical applications in RAG workflows |

3. Frameworks


Frameworks simplify the development of RAG applications by providing pre-built components:

  • LangChain: Framework for LLM application development with modular architecture for prompt management, chaining, memory handling, and agent creation. Excels at building RAG pipelines with built-in support for document loaders, retrievers, and vector stores.
  • LlamaIndex: Specialized framework for data indexing and retrieval, connecting unstructured data with language models through custom indices. Optimized for ingesting, transforming, and querying large datasets for chatbots and knowledge management.
  • LangGraph: Integrates LLMs with graph-based structures, allowing developers to define application logic using nodes and edges. Ideal for complex workflows with multiple branches and feedback loops, especially in multi-agent systems.
  • RAGFlow: A Framework specifically for Retrieval-Augmented Generation systems, orchestrating retrievers, rankers, and generators into coherent pipelines. Enhances relevance when pulling from external data sources for search-driven interfaces and Q&A systems.


Frameworks like LangChain, LangGraph, and LlamaIndex significantly streamline RAG (Retrieval-Augmented Generation) development by offering modular tools for integrating retrieval and generation processes. LangChain simplifies chaining LLM calls, managing prompts, and connecting to vector stores. LangGraph introduces graph-based flow control, enabling dynamic and multi-step RAG workflows. LlamaIndex focuses on data ingestion, indexing, and retrieval, making large datasets queryable by LLMs. Together, they abstract away complex infrastructure, allowing developers to focus on logic and data quality. These tools enable rapid prototyping and robust deployment of RAG applications for tasks like question answering, document search, and knowledge assistance.

Example of Frameworks for RAG Building

Let’s build a simple RAG using LangChain:

%pip install --quiet --upgrade langchain-text-splitters langchain-community langgraph
!pip install -qU "langchain[openai]"

Chat model

import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
  os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("gpt-4o-mini", model_provider="openai")

Select embeddings model

from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

Select vector store

from langchain_core.vectorstores import InMemoryVectorStore
vector_store = InMemoryVectorStore(embeddings)

Creating the indexing pipeline

import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

# Index chunks
_ = vector_store.add_documents(documents=all_splits)

# Define prompt for question-answering
prompt = hub.pull("rlm/rag-prompt")


# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


# Define application steps
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
response = graph.invoke({"question": "What are Types of Memory?"})
print(response["answer"])

Output

The types of memory include Sensory Memory, Short-Term Memory (STM), and Long-Term Memory (LTM). Sensory Memory retains impressions of sensory information for a few seconds, while Short-Term Memory holds currently relevant information for 20-30 seconds. Long-Term Memory can store information for days to decades and includes explicit (declarative) and implicit (procedural) memory.

4. Data Extraction


RAG applications require robust tools for extracting structured and unstructured data from a wide range of sources:

  • Websites, PDFs, Word documents, slides, etc.
  • Tools like BeautifulSoup or PyPDF2 can automate this process (a quick web-extraction sketch follows this list).
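For HTML sources, a minimal BeautifulSoup sketch might look like the following. The URL is a placeholder, and real pages usually need more targeted selectors than a blanket get_text():

import requests
from bs4 import BeautifulSoup

# Fetch a page (placeholder URL) and reduce it to visible text
url = "https://example.com/article"
html = requests.get(url, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# Drop script/style noise before extracting text
for tag in soup(["script", "style"]):
    tag.decompose()

text = soup.get_text(separator="\n", strip=True)
print(text[:500])  # preview the first 500 characters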

Example of Data Extraction for RAG Building

pip install -U langchain-community
%pip install langchain pypdf

Let’s extract content from the PDF

# %pip install langchain pypdf

from langchain.document_loaders import PyPDFLoader

# Define the path to your PDF file
pdf_path = "/content/Multimodal Agent Using Agno Framework.pdf"

# Initialize the PyPDFLoader
loader = PyPDFLoader(pdf_path)

# Load the PDF and split it into pages
documents = loader.load()

# Print the content of each page
for i, doc in enumerate(documents):
    print(f"Page {i   1} Content:")
    print(doc.page_content)
    print("\n")


5. Embeddings


Text embeddings transform textual data into numerical vectors for similarity-based retrieval. Beyond text embeddings:

  • Image embeddings: Used in multimodal RAG applications.
  • Multi-modal embeddings: Combine text, image, and other data types for complex tasks.

Here are the embedding models across providers:

OpenAI Embeddings

  • Latest models: text-embedding-3-small (lower cost) and text-embedding-3-large (higher accuracy)
  • Features: Dynamic dimension adjustment (e.g., 256-3072 dim), multilingual support, optimized for search/RAG (see the dimension-control sketch below)
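As a quick illustration of the dimension control, the langchain-openai wrapper exposes a dimensions parameter on the v3 models. A minimal sketch, assuming OPENAI_API_KEY is set in the environment:

from langchain_openai import OpenAIEmbeddings

# text-embedding-3 models can return shortened vectors at request time
embed_small_dims = OpenAIEmbeddings(model="text-embedding-3-large", dimensions=256)

vec = embed_small_dims.embed_query("retrieval-augmented generation")
print(len(vec))  # 256 instead of the model's default 3072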

Cohere Embed v3

  • Specializes in document quality ranking and noisy data handling
  • Models: English/multilingual variants (1024/384 dim), compression-aware training for cost efficiency

Nomic Embed v2

  • Open-source MoE architecture (305M active params) with Matryoshka embeddings
  • Multilingual (100 languages), outperforms models 2x its size on MTEB/BEIR benchmarks

Gemini Embedding

  • Experimental model (gemini-embedding-exp-03-07) with 8K token input and 3K dimensions
  • MTEB leaderboard leader (68.32 mean score), supports 100 languages

Ollama Embeddings

  • Hosts models like mxbai-embed-large and custom variants (e.g., suntray-embedding)
  • Designed for RAG workflows with local inference and ChromaDB integration

BGE (BAAI)

  • BERT-based models (large/base/small-en-v1.5) for retrieval/RAG
  • Open-source, supports instruction tuning (e.g., “Represent this sentence…”; see the sketch below)
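A minimal sketch of using BGE through LangChain's community wrapper; the query_instruction below follows BGE's recommended retrieval prompt, and the model downloads from Hugging Face on first use:

from langchain_community.embeddings import HuggingFaceBgeEmbeddings

# BGE prepends the instruction to queries only, not to documents
bge_embeddings = HuggingFaceBgeEmbeddings(
    model_name="BAAI/bge-small-en-v1.5",
    encode_kwargs={"normalize_embeddings": True},  # cosine-ready vectors
    query_instruction="Represent this sentence for searching relevant passages:",
)

query_vec = bge_embeddings.embed_query("what is a vector database?")
doc_vecs = bge_embeddings.embed_documents(["Vector databases store embeddings."])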

Mixedbread

  • The mxbai-embed-large-v1 model by Mixedbread AI is a state-of-the-art sentence embedding solution designed for multilingual and multimodal retrieval tasks.
  • It supports advanced techniques like Matryoshka Representation Learning (MRL) and binary quantization, enabling efficient memory usage and cost reduction at scale. With strong performance across diverse tasks, it rivals larger proprietary models while maintaining open-source accessibility.


Splitting the PDF content into chunks

from langchain.document_loaders import PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def create_simple_chunks(file_path, chunk_size=3500, chunk_overlap=200):
    loader = PyMuPDFLoader(file_path)
    doc_pages = loader.load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    return splitter.split_documents(doc_pages)

from glob import glob

pdf_files = glob('./rag_docs/*.pdf')

# Process PDF files
paper_docs = []
for fp in pdf_files:
    paper_docs.extend(create_simple_chunks(file_path=fp))

Output

Loading pages: ./rag_docs/cnn_paper.pdf
Chunking pages: ./rag_docs/cnn_paper.pdf
Finished processing: ./rag_docs/cnn_paper.pdf
Loading pages: ./rag_docs/attention_paper.pdf
Chunking pages: ./rag_docs/attention_paper.pdf
Finished processing: ./rag_docs/attention_paper.pdf
Loading pages: ./rag_docs/vision_transformer.pdf
Chunking pages: ./rag_docs/vision_transformer.pdf
Finished processing: ./rag_docs/vision_transformer.pdf
Loading pages: ./rag_docs/resnet_paper.pdf
Chunking pages: ./rag_docs/resnet_paper.pdf
Finished processing: ./rag_docs/resnet_paper.pdf

Creating the Embeddings

from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Initialize embedding model
openai_embed_model = OpenAIEmbeddings(model='text-embedding-3-small')

# Combine documents (wiki_docs_processed comes from an earlier preprocessing step)
total_docs = wiki_docs_processed + paper_docs

# Create and save the vector database
chroma_db = Chroma.from_documents(documents=total_docs,
                                  collection_name='my_db',
                                  embedding=openai_embed_model,
                                  collection_metadata={"hnsw:space": "cosine"},
                                  persist_directory="./my_db")

6. Vector Databases


Vector databases store embeddings (numerical representations of text or other data), enabling efficient retrieval of semantically similar chunks. Examples include:

  • Pinecone: A managed vector database platform designed for high-performance and scalable applications, enabling efficient storage and retrieval of high-dimensional vector embeddings.
  • Chroma DB: An open-source AI-native embedding database that includes features like vector search, document storage, full-text search, and metadata filtering, facilitating seamless retrieval in AI applications.
  • Qdrant: An open-source vector database and search engine written in Rust, offering fast and scalable vector similarity search services with extended filtering support, suitable for neural-network or semantic-based matching.
  • Milvus DB: An open-source vector database built for scalable similarity search, capable of handling large-scale and dynamic vector data, and supporting various index types for efficient retrieval.
  • Weaviate: An open-source vector database that stores both objects and vectors, allowing for combining vector search with structured filtering, and is modular, cloud-native, and real-time.


Example of Vector Database for RAG Building

Note: We already created the embeddings above; now we store them in the vector database.

Using Chroma DB to store the embeddings

from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Initialize embedding model
openai_embed_model = OpenAIEmbeddings(model='text-embedding-3-small')

# Combine documents (wiki_docs_processed comes from an earlier preprocessing step)
total_docs = wiki_docs_processed + paper_docs

# Create and save the vector database
chroma_db = Chroma.from_documents(documents=total_docs,
                                  collection_name='my_db',
                                  embedding=openai_embed_model,
                                  collection_metadata={"hnsw:space": "cosine"},
                                  persist_directory="./my_db")

Loading the Vector database

chroma_db = Chroma(persist_directory="./my_db",
                   collection_name='my_db',
                   embedding_function=openai_embed_model)

Retrieving the information and getting the output

similarity_retriever = chroma_db.as_retriever(search_type="similarity", search_kwargs={"k": 5})

# Query for semantic similarity
query = "What is machine learning?"
top_docs = similarity_retriever.invoke(query)

# Display results
from IPython.display import display, Markdown

def display_docs(docs):
    for doc in docs:
        print('Metadata:', doc.metadata)
        print('Content Brief:')
        display(Markdown(doc.page_content[:1000]))
        print()

display_docs(top_docs)


7. Rerankers


Rerankers refine the retrieval process by improving the relevance of retrieved documents.

They operate in a two-stage retrieval pipeline:

  1. Initial recall retrieves a broad set of candidates from the vector database.
  2. Rerankers prioritize the most relevant documents based on additional scoring mechanisms like semantic similarity or contextual relevance.

This approach significantly enhances the precision of RAG systems.

By integrating rerankers into the stack, developers can ensure higher-quality responses tailored to user queries while optimizing retrieval efficiency.


Also read: Comprehensive Guide on Reranker for RAG

Example of Rerankers for RAG Building

%pip install --upgrade --quiet  cohere

Set up Cohere and the ContextualCompressionRetriever

from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank
from langchain_community.llms import Cohere
from langchain.chains import RetrievalQA

llm = Cohere(temperature=0)

# Stage 2: rerank the candidates returned by the base retriever
# (e.g., the similarity_retriever built in the vector database section)
compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

chain = RetrievalQA.from_chain_type(
    llm=llm, retriever=compression_retriever
)
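A quick usage sketch (the question is illustrative); RetrievalQA returns a dict whose answer is stored under the "result" key:

result = chain.invoke({"query": "What is machine learning?"})
print(result["result"])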


8. Evaluation


Evaluation ensures the accuracy and relevance of RAG systems:

  • Giskard: A library for testing machine learning pipelines.
  • Ragas: Specifically designed to evaluate RAG pipelines by analyzing retrieval quality and generated outputs.
  • Arize Phoenix: An open-source observability library for evaluating, troubleshooting, and improving LLM outputs with features like model drift detection and cohort analysis.
  • Comet Opik: A fully open-source platform for evaluating, testing, and monitoring LLM applications with tools for observability, automated scoring, and unit testing across the development lifecycle.
  • DeepEval: DeepEval offers three LLM evaluation metrics to evaluate retrievals:
    • ContextualPrecisionMetric: evaluates whether the reranker in your retriever ranks more relevant nodes in your retrieval context higher than irrelevant ones.
    • ContextualRecallMetric: evaluates whether the embedding model in your retriever is able to accurately capture and retrieve relevant information based on the context of the input.
    • ContextualRelevancyMetric: evaluates whether the text chunk size and top-K of your retriever are able to retrieve information without much irrelevancy.

Example of Evaluation for RAG Building

from tqdm import tqdm
from datasets import load_dataset
from qdrant_client import QdrantClient
from langchain.docstore.document import Document as LangchainDocument
from langchain_text_splitters import RecursiveCharacterTextSplitter
from openai import OpenAI
import deepeval

# Get your key from https://platform.openai.com/api-keys
OPENAI_API_KEY = "<openai_api_key>"

# Get your Confident AI API key from https://app.confident-ai.com
CONFIDENT_AI_API_KEY = "<confident_ai_api_key>"

# Get a FREE forever cluster at https://cloud.qdrant.io/
# More info: https://qdrant.tech/documentation/cloud/create-cluster/
QDRANT_URL = "<qdrant_url>"
QDRANT_API_KEY = "<qdrant_api_key>"
COLLECTION_NAME = "qdrant-deepeval"

EVAL_SIZE = 10
RETRIEVAL_SIZE = 3

dataset = load_dataset("atitaarora/qdrant_doc", split="train")

langchain_docs = [
    LangchainDocument(
        page_content=doc["text"], metadata={"source": doc["source"]}
    )
    for doc in tqdm(dataset)
]

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=50,
    add_start_index=True,
    separators=["\n\n", "\n", ".", " ", ""],
)

docs_processed = []
for doc in langchain_docs:
    docs_processed += text_splitter.split_documents([doc])

client = QdrantClient(url=QDRANT_URL, api_key=QDRANT_API_KEY)

docs_contents, docs_metadatas = [], []

for doc in docs_processed:
    if hasattr(doc, "page_content") and hasattr(doc, "metadata"):
        docs_contents.append(doc.page_content)
        docs_metadatas.append(doc.metadata)
    else:
        print(
            "Warning: Some documents do not have 'page_content' or 'metadata' attributes."
        )

# Uses FastEmbed - https://qdrant.tech/documentation/fastembed/
# To generate embeddings for the documents
# The default model is `BAAI/bge-small-en-v1.5`
client.add(
    collection_name=COLLECTION_NAME,
    metadata=docs_metadatas,
    documents=docs_contents,
)

openai_client = OpenAI(api_key=OPENAI_API_KEY)


def query_with_context(query, limit):

    search_result = client.query(
        collection_name=COLLECTION_NAME, query_text=query, limit=limit
    )

    contexts = [
        "document: " + r.document + ",source: " + r.metadata["source"]
        for r in search_result
    ]
    prompt_start = """ You're assisting a user who has a question based on the documentation.
        Your goal is to provide a clear and concise response that addresses their query while referencing relevant information
        from the documentation.
        Remember to:
        Understand the user's question thoroughly.
        If the user's query is general (e.g., "hi," "good morning"),
        greet them normally and avoid using the context from the documentation.
        If the user's query is specific and related to the documentation, locate and extract the pertinent information.
        Craft a response that directly addresses the user's query and provides accurate information
        referring the relevant source and page from the 'source' field of fetched context from the documentation to support your answer.
        Use a friendly and professional tone in your response.
        If you cannot find the answer in the provided context, do not pretend to know it.
        Instead, respond with "I don't know".

        Context:\n"""

    prompt_end = f"\n\nQuestion: {query}\nAnswer:"

    prompt = prompt_start + "\n\n---\n\n".join(contexts) + prompt_end

    res = openai_client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=prompt,
        temperature=0,
        max_tokens=636,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
        stop=None,
    )

    return (contexts, res.choices[0].text)


qdrant_qna_dataset = load_dataset("atitaarora/qdrant_doc_qna", split="train")


def create_deepeval_dataset(dataset, eval_size, retrieval_window_size):
    test_cases = []
    for i in range(eval_size):
        entry = dataset[i]
        question = entry["question"]
        answer = entry["answer"]
        context, rag_response = query_with_context(
            question, retrieval_window_size
        )
        test_case = deepeval.test_case.LLMTestCase(
            input=question,
            actual_output=rag_response,
            expected_output=answer,
            retrieval_context=context,
        )
        test_cases.append(test_case)
    return test_cases


test_cases = create_deepeval_dataset(
    qdrant_qna_dataset, EVAL_SIZE, RETRIEVAL_SIZE
)

deepeval.login_with_confident_api_key(CONFIDENT_AI_API_KEY)

deepeval.evaluate(
    test_cases=test_cases,
    metrics=[
        deepeval.metrics.AnswerRelevancyMetric(),
        deepeval.metrics.FaithfulnessMetric(),
        deepeval.metrics.ContextualPrecisionMetric(),
        deepeval.metrics.ContextualRecallMetric(),
        deepeval.metrics.ContextualRelevancyMetric(),
    ],
)

9. Open LLMs Access


Platforms enabling local or API-based access to open LLMs include:

  • Ollama: Allows running open LLMs locally.
  • Groq, Hugging Face, Together AI: Provide API integrations for open LLMs.

Example of Open LLMs Access for RAG Building

Download Ollama from ollama.com, or install it with:

curl -fsSL https://ollama.com/install.sh | sh

After this, pull DeepSeek-R1 1.5B using:

ollama pull deepseek-r1:1.5b

Install the required libraries

!pip install langchain==0.3.11
!pip install langchain-openai==0.2.12
!pip install langchain-community==0.3.11
!pip install langchain-chroma==0.1.4

OpenAI Embedding Models

from langchain_openai import OpenAIEmbeddings
openai_embed_model = OpenAIEmbeddings(model='text-embedding-3-small')

Create a Vector DB and persist on the disk

from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader('AgenticAI.pdf')
pages = loader.load_and_split()
texts = [doc.page_content for doc in pages]

from langchain_chroma import Chroma
chroma_db = Chroma.from_texts(
    texts=texts,
    collection_name='db_docs',
    collection_metadata={"hnsw:space": "cosine"},  # Set distance function to cosine
    embedding=openai_embed_model
)

Build a RAG Chain

from langchain_core.prompts import ChatPromptTemplate
prompt = """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If no context is present or if you don't know the answer, just say that you don't know.
Do not make up the answer unless it is there in the provided context.
Keep the answer concise and to the point with regard to the question.
Question:
{question}
Context:
{context}
Answer:
"""
prompt_template = ChatPromptTemplate.from_template(prompt)

Load Connection to LLM

from langchain_community.llms import Ollama
deepseek = Ollama(model="deepseek-r1:1.5b")
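The RAG chain below expects a retriever named similarity_threshold_retriever, which the original snippet does not define. A plausible construction over the Chroma store built above (the k and score_threshold values are illustrative assumptions):

similarity_threshold_retriever = chroma_db.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 3, "score_threshold": 0.3}  # illustrative values; tune per use case
)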

LangChain Syntax for RAG Chain

from langchain.chains import RetrievalQA

rag_chain = RetrievalQA.from_chain_type(llm=deepseek,
                                        chain_type="stuff",
                                        retriever=similarity_threshold_retriever,
                                        chain_type_kwargs={"prompt": prompt_template})

query = "Tell the Leaders’ Perspectives on Agentic AI"
rag_chain.invoke(query)


Conclusion

Building effective RAG applications isn’t just about plugging in a language model—it’s about choosing the right RAG Developer stack across the board, from frameworks and embeddings to vector databases and retrieval tools. When these components are thoughtfully integrated, they enable intelligent, scalable systems that can chat with PDFs, pull relevant facts in real time, and generate context-aware responses. As the ecosystem continues to evolve, staying agile with your tools and grounded in solid architecture will be key to building reliable, future-proof AI solutions.

