How to Build RAG Systems and AI Agents with Qwen3
Qwen just released 8 new models as part of its latest family, Qwen3, and they show promising capabilities. The flagship model, Qwen3-235B-A22B, outperformed most other models, including DeepSeek-R1, OpenAI's o1 and o3-mini, Grok 3, and Gemini 2.5 Pro, on standard benchmarks. Meanwhile, the small Qwen3-30B-A3B outperformed QwQ-32B, which has approximately 10 times as many activated parameters as the new model. With such capabilities, these models are a strong choice for a wide range of applications. In this article, we will explore the features of the Qwen3 models and learn how to use them to build RAG systems and AI agents.
Table of Contents
- What is Qwen3?
- Key Features of Qwen3
- How to Access Qwen3 Models via API
- Using Qwen3 to Power Your AI Solutions
- Prerequisites
- Building an AI Agent using Qwen3
- Building a RAG System using Qwen3
- Applications of Qwen3
- Conclusion
- Frequently Asked Questions
What is Qwen3?
Qwen3 is the latest series of large language models (LLMs) in the Qwen family, consisting of 8 different models: Qwen3-235B-A22B, Qwen3-30B-A3B, Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, and Qwen3-0.6B. All of these models are released under the Apache 2.0 license, making them freely available to individuals, developers, and enterprises.
While 6 of these models are dense, meaning they use all of their parameters during inference and training, the other 2 are Mixture-of-Experts (MoE) models that activate only a fraction of their parameters per token:
- Qwen3-235B-A22B: A large MoE model with 235 billion total parameters, of which 22 billion are activated.
- Qwen3-30B-A3B: A smaller MoE with 30 billion total parameters and 3 billion activated parameters.
Here’s a detailed comparison of all 8 Qwen3 models:
| Models | Layers | Heads (Q/KV) | Tie Embedding | Context Length |
|---|---|---|---|---|
| Qwen3-0.6B | 28 | 16/8 | Yes | 32K |
| Qwen3-1.7B | 28 | 16/8 | Yes | 32K |
| Qwen3-4B | 36 | 32/8 | Yes | 32K |
| Qwen3-8B | 36 | 32/8 | No | 128K |
| Qwen3-14B | 40 | 40/8 | No | 128K |
| Qwen3-32B | 64 | 64/8 | No | 128K |
| Qwen3-30B-A3B | 48 | 32/4 | No | 128K |
| Qwen3-235B-A22B | 94 | 64/4 | No | 128K |
Here’s what the table tells us:
- Layers: The number of transformer blocks stacked sequentially in the model. Each block includes a multi-head self-attention mechanism, a feed-forward network, layer normalization, and residual connections, along with positional encoding. So when we say Qwen3-30B-A3B has 48 layers, it means the model stacks 48 such transformer blocks.
- Heads (Q/KV): Transformers use multi-head attention, which splits the attention mechanism into several heads, each learning a different aspect of the data. Here, Q/KV represents:
  - Q (Query heads): The total number of attention heads used for generating queries.
  - KV (Key/Value heads): The number of key/value heads per attention block. When KV is smaller than Q, several query heads share each key/value head (grouped-query attention), as the sketch below illustrates.
- Tie Embedding: Whether the input embedding matrix and the output projection share their weights, a common way to save parameters in smaller models.
Note: These attention head counts are distinct from the key, query, and value vectors that each head computes during self-attention.
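To make the Q/KV split concrete, here is a minimal grouped-query attention sketch in plain PyTorch. It is illustrative only (shapes and head counts, not Qwen3's actual implementation); the 32 query heads and 4 key/value heads match the Qwen3-30B-A3B row above, and the head dimension of 128 is an assumption for the example:

```python
import torch

# Illustrative grouped-query attention shapes (not Qwen3's real code).
# Qwen3-30B-A3B: 32 query heads share 4 key/value heads (8 queries per KV head).
n_q_heads, n_kv_heads, head_dim, seq_len = 32, 4, 128, 16

q = torch.randn(seq_len, n_q_heads, head_dim)   # one query projection per head
k = torch.randn(seq_len, n_kv_heads, head_dim)  # only 4 distinct key heads
v = torch.randn(seq_len, n_kv_heads, head_dim)  # only 4 distinct value heads

# Repeat each KV head 8 times so every query head has a matching K/V.
group = n_q_heads // n_kv_heads                 # 32 / 4 = 8
k = k.repeat_interleave(group, dim=1)           # -> (seq_len, 32, head_dim)
v = v.repeat_interleave(group, dim=1)

scores = torch.einsum("qhd,khd->hqk", q, k) / head_dim ** 0.5
attn = torch.softmax(scores, dim=-1)
out = torch.einsum("hqk,khd->qhd", attn, v)     # (seq_len, 32, head_dim)
print(out.shape)
```

Sharing KV heads this way shrinks the key/value cache by 8x at inference time, which is one reason the MoE models can serve long contexts efficiently.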
Also Read: Qwen3 Models: How to Access, Performance, Features, and Applications
Key Features of Qwen3
Here are some of the key features of the Qwen3 models:
- Pre-training: The pre-training process consisted of three stages:
  - In the first stage, the model was pretrained on over 30 trillion tokens with a context length of 4K tokens. This taught the model basic language skills and general knowledge.
  - In the second stage, the data quality was improved by increasing the proportion of knowledge-intensive data such as STEM, coding, and reasoning tasks. The model was then trained on an additional 5 trillion tokens.
  - In the final stage, high-quality long-context data was used and the context length was increased to 32K tokens, ensuring the model can handle longer inputs effectively.
- Post-training: To develop a hybrid model capable of both step-by-step reasoning and rapid responses, a four-stage training pipeline was implemented, consisting of:
  - Long chain-of-thought (CoT) cold start
  - Reasoning-based reinforcement learning (RL)
  - Thinking mode fusion
  - General RL
- Hybrid Thinking Modes: Qwen3 models take a hybrid approach to problem solving, featuring two modes (see the sketch after this list):
  - Thinking Mode: The model takes its time, breaking a complex problem statement into small, procedural steps before answering.
  - Non-Thinking Mode: The model gives quick results, which is mostly suitable for simpler questions.
- Multilingual Support: Qwen3 models support 119 languages and dialects, helping users from all around the world benefit from them.
- Improved Agentic Capabilities: Qwen has optimized the Qwen3 models for better coding and agentic capabilities, with support for the Model Context Protocol (MCP) as well.
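Here is a minimal sketch of how the mode switch surfaces when running the weights locally, based on the Qwen3 model cards on Hugging Face: the chat template accepts an enable_thinking flag, and "/think" / "/no_think" tags in a user message act as per-turn soft switches. Exact behavior may vary across serving providers:

```python
from transformers import AutoTokenizer

# Per the Qwen3 model cards, enable_thinking is a hard switch in the chat
# template; "/think" and "/no_think" inside a user message are soft switches.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Why is the sky blue?"}],
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # skip the <think>...</think> reasoning block
)
print(prompt)
```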
How to Access Qwen3 Models via API
To use the Qwen3 models, we will access them via the OpenRouter API. Here’s how to do it:
- Create an account on OpenRouter and use the model search bar to find the model you need.
- Select the model of your choice and click on ‘Create API key’ on its page to generate a new API key.
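Once you have a key, a quick way to sanity-check access is the OpenAI-compatible client, since OpenRouter exposes an OpenAI-style endpoint. A minimal sketch, assuming your key is stored in the OPENROUTER_API_KEY environment variable:

```python
import os
from openai import OpenAI

# OpenRouter is OpenAI-compatible, so the standard client works as-is.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

reply = client.chat.completions.create(
    model="qwen/qwen3-235b-a22b:free",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(reply.choices[0].message.content)
```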
Using Qwen3 to Power Your AI Solutions
In this section, we’ll go through the process of building AI applications using Qwen3. We will first create an AI-powered travel planner agent using the model, and then a Q/A RAG bot using LangChain.
Prerequisites
Before building real-world AI solutions with Qwen3, we first need to cover a few basic prerequisites:
- Familiarity with the command prompt or terminal and the ability to run commands in it.
- Ability to set up environment variables.
- Python must be installed: https://www.python.org/downloads/
- Knowledge of the basics of LangChain: https://www.langchain.com/
Building an AI Agent using Qwen3
In this section, we’ll use Qwen3 to create an AI-powered travel agent that suggests the major tourist spots for the city or place you are visiting. We will also enable the agent to search the internet for up-to-date information, and add a tool for currency conversion.
Step 1: Setting up Libraries and Tools
First, we will be installing and importing the necessary libraries and tools required to build the agent.
```python
!pip install langchain langchain-community openai duckduckgo-search

from langchain.chat_models import ChatOpenAI
from langchain.agents import Tool, initialize_agent, AgentType
from langchain.tools import DuckDuckGoSearchRun

llm = ChatOpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your_api_key",
    model="qwen/qwen3-235b-a22b:free"
)

# Web search tool
search = DuckDuckGoSearchRun()

# Tool for destination recommendations
def get_destinations(destination):
    return search.run(f"Top 3 tourist spots in {destination}")

DestinationTool = Tool(
    name="Destination Recommender",
    func=get_destinations,
    description="Finds top places to visit in a city"
)

# Tool for currency conversion
def convert_usd_to_inr(query):
    amount = [float(s) for s in query.split() if s.replace('.', '', 1).isdigit()]
    if amount:
        return f"{amount[0]} USD = {amount[0] * 83.2:.2f} INR"
    return "Couldn't parse amount."

CurrencyTool = Tool(
    name="Currency Converter",
    func=convert_usd_to_inr,
    description="Converts USD to INR based on a static rate"
)
```
- search: DuckDuckGoSearchRun() enables the agent to use web search to get real-time information about popular tourist spots.
- DestinationTool: Wraps the get_destinations() function, which uses the search tool to get the top 3 tourist spots in any given city.
- CurrencyTool: Wraps the convert_usd_to_inr() function to convert prices from USD to INR. You can change ‘INR’ in the function to convert to a currency of your choice.
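Before wiring these into an agent, you can call the tools directly to verify they behave as expected. A quick check (the destination snippet depends on live DuckDuckGo results, and 83.2 is the static rate hard-coded above):

```python
# Direct tool calls for a quick sanity check
print(CurrencyTool.run("Convert 100 USD to INR"))  # -> "100.0 USD = 8320.00 INR"
print(DestinationTool.run("Jaipur")[:200])         # live search snippet, truncated
```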
Also Read: Build a Travel Assistant Chatbot with HuggingFace, LangChain, and MistralAI
Step 2: Creating the Agent
Now that we have initialized all the tools, let’s create an agent that will use them to give us a plan for the trip.
```python
tools = [DestinationTool, CurrencyTool]

agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

def trip_planner(city, usd_budget):
    dest = get_destinations(city)
    inr_budget = convert_usd_to_inr(f"{usd_budget} USD to INR")
    return f"""Here is your travel plan:

*Top spots in {city}*:
{dest}

*Budget*: {inr_budget}

Enjoy your day trip!"""
```
- initialize_agent: This function creates a LangChain agent that reasons with a zero-shot ReAct approach, choosing tools based on their descriptions.
- AgentType.ZERO_SHOT_REACT_DESCRIPTION: This agent type lets the LLM decide which tool to use in a given situation without prior examples, relying only on the tool descriptions and the input.
- verbose: Enables logging of the agent’s thought process, so we can monitor each decision the agent makes, including all the interactions and tools invoked.
- trip_planner: This is a Python function that calls the tools directly instead of relying on the agent, which is useful when you already know which tool fits the problem. You can call it as shown below.
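Since trip_planner bypasses the agent loop entirely, you can invoke it directly as a quick sanity check; the destination section of the output will vary with live search results:

```python
# Deterministic path: no agent reasoning, just direct tool calls
print(trip_planner("Jaipur", 250))
```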
Step 3: Initializing the Agent
In this section, we’ll run the agent and observe its response.
```python
# Inputs for the planner
city = "Delhi"
usd_budget = 8500

# Run the agent planner
response = agent.run(f"Plan a day trip to {city} with a budget of {usd_budget} USD")

from IPython.display import Markdown, display
display(Markdown(response))
```
- Invocation of the agent: agent.run() passes the user’s intent via the prompt, and the agent plans the trip.
Output
Building a RAG System using Qwen3
In this section, we’ll create a RAG bot that answers any query from the relevant input document in the knowledge base, generating informative responses with qwen/qwen3-235b-a22b. The system also uses LangChain to produce accurate, context-aware responses.
Step 1: Setting up the Libraries and Tools
First, we will be installing and importing the necessary libraries and tools required to build the RAG system.
```python
!pip install langchain langchain-community langchain-core openai tiktoken chromadb sentence-transformers duckduckgo-search

from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

# Load your document
loader = TextLoader("/content/my_docs.txt")
docs = loader.load()
```
- Loading Documents: LangChain’s TextLoader class loads a plain-text file for Q/A retrieval (other LangChain loaders handle formats like PDF and Word). Here, I’ve uploaded my_docs.txt.
- Selecting the Vector Store: I have used ChromaDB to store and search the embeddings in our vector database for the Q/A process.
Step 2: Creating the Embeddings
Now that we’ve loaded our document, let’s create embeddings from it, which will make the retrieval process efficient.
```python
# Split into chunks
splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# Embed with a HuggingFace model
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
db = Chroma.from_documents(chunks, embedding=embeddings)

# Set up the Qwen LLM from OpenRouter
llm = ChatOpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_API_KEY",
    model="qwen/qwen3-235b-a22b:free"
)

# Create the RAG chain
retriever = db.as_retriever(search_kwargs={"k": 2})
rag_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
```
- Document Splitting: CharacterTextSplitter() splits the text into smaller chunks, which helps in two ways: it makes retrieval more precise, and the chunk_overlap of 50 characters retains context across neighboring chunks. Note that chunk_size=300 means each chunk holds roughly 300 characters; it does not set the embedding size.
- Embedding Documents: The embedding model converts each chunk into a fixed-size vector capturing its contextual meaning; all-MiniLM-L6-v2 produces 384-dimensional vectors regardless of chunk length (see the check below).
- RAG Chain: The RAG chain combines the ChromaDB retriever with the LLM to form the RAG pipeline, letting us get contextually grounded answers drawn from the document as well as the model’s own knowledge.
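To confirm that the vector size comes from the embedding model rather than from chunk_size, you can embed a sample string and inspect its length:

```python
# The embedding dimension is fixed by the model, not by chunk_size
vec = embeddings.embed_query("What is Qwen3?")
print(len(vec))  # 384 for all-MiniLM-L6-v2
```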
Step 3: Initializing the RAG System
```python
# Ask a question
response = rag_chain.invoke({"query": "How can I use Qwen with MCP? Please give me a stepwise guide along with the necessary code snippets"})
display(Markdown(response['result']))
```
- Query Execution: The rag_chain.invoke() method sends the user’s query to the RAG system, which retrieves the relevant chunks from the document store (vector DB) and generates a context-aware answer.
Output
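If you also want to see which chunks the retriever pulled in for a given answer, RetrievalQA can return them alongside the result. A small variation on the chain built earlier:

```python
# Rebuild the chain so it also returns the retrieved chunks
rag_chain_src = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True,
)
result = rag_chain_src.invoke({"query": "How can I use Qwen with MCP?"})
print(result["result"][:300])
for doc in result["source_documents"]:
    print("---", doc.page_content[:120])
```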
You can find the complete code here.
Applications of Qwen3
Here are some more applications of Qwen3 across industries:
- Automated Coding: Qwen3 can generate, debug, and document code, helping developers resolve errors without manual effort. Its flagship model, with 22 billion activated parameters, excels at coding, with performance comparable to models like DeepSeek-R1, Gemini 2.5 Pro, and OpenAI’s o3-mini.
- Education and Research: Qwen3 achieves high accuracy in math, physics, and logical reasoning problems. It rivals Gemini 2.5 Pro while outperforming models such as OpenAI’s o1, o3-mini, DeepSeek-R1, and Grok 3 Beta.
- Agent-Based Tool Integration: Qwen3 also leads in AI agent tasks by allowing the use of external tools, APIs, and MCPs for multi-step and multi-agentic workflows with its tool-calling template, which further simplifies the agentic interaction.
- Advanced Reasoning Tasks: Qwen3 uses an extensive thinking capability to deliver optimal and accurate responses. The model uses chain-of-thought reasoning for complex tasks and a non-thinking mode for optimized speed.
Conclusion
In this article, we learned how to build Qwen3-powered agentic AI and RAG systems. Qwen3’s high performance, multilingual support, and advanced reasoning capabilities make it a strong choice for knowledge retrieval and agent-based tasks. By integrating Qwen3 into RAG and agentic pipelines, we can get accurate, context-aware responses, making it a strong contender for real-world, AI-powered applications.
Frequently Asked Questions
Q1. How does Qwen3 differ from other LLMs for RAG?
A. Qwen3 has a hybrid reasoning capability that lets it adapt its responses dynamically, optimizing RAG workflows for both fast retrieval and complex analysis.
Q2. What are the tools needed to build a RAG system?
A. The main components are a vector database, an embedding model, a LangChain workflow, and an API to access the model.
Q3. Can Qwen3 do multi-step tool chaining in agent workflows?
A. Yes. With Qwen-Agent’s built-in tool-calling templates, it can parse and run sequential tool operations like web searching, data analysis, and report generation.
Q4. How can I reduce latency in Qwen3 agent responses?
A. Latency can be reduced in several ways, for example:
1. Using MoE models like Qwen3-30B-A3B, which activates only 3 billion parameters.
2. Using GPU-optimized inference.

Q5. What are some common errors when building agents with Qwen3?
A. Common errors include:
1. MCP server initialization failures, such as JSON formatting and INIT issues.
2. Tool response pairing errors.
3. Context window overflow.