Table of Contents
Introducing DeepSeek R1 Models
DeepSeek-R1-Zero
DeepSeek-R1
DeepSeek Distillation
Fine-Tuning DeepSeek R1: A Practical Guide
1. Setup
2. Loading the Model and Tokenizer
3. Pre-Fine-tuning Inference
4. Loading and Processing the Dataset
5. Setting up the Model
6. Model Training
7. Post-Fine-tuning Inference
8. Saving and Pushing the Model
9. Deployment and Conclusion

Fine-Tuning DeepSeek R1 (Reasoning Model)

Mar 01, 2025, 09:08 AM

DeepSeek's groundbreaking AI models challenge OpenAI's dominance. These advanced reasoning models are freely available, democratizing access to powerful AI. Learn how to fine-tune DeepSeek R1 in the step-by-step walkthrough below.

This tutorial fine-tunes the DeepSeek-R1-Distill-Llama-8B model using the Hugging Face Medical Chain-of-Thought Dataset. This distilled model, derived from Llama 3.1 8B, offers comparable reasoning capabilities to the original DeepSeek-R1. New to LLMs and fine-tuning? Consider our Introduction to LLMs in Python course.

(Image by Author)

Introducing DeepSeek R1 Models

DeepSeek AI has open-sourced DeepSeek-R1 and DeepSeek-R1-Zero, rivaling OpenAI's o1 in reasoning tasks (math, coding, logic). Explore our comprehensive DeepSeek R1 guide for details.

DeepSeek-R1-Zero

This pioneering model uses large-scale reinforcement learning (RL), bypassing initial supervised fine-tuning (SFT). While enabling independent chain-of-thought (CoT) reasoning, it presents challenges like repetitive reasoning and readability issues.

DeepSeek-R1

Addressing DeepSeek-R1-Zero's limitations, DeepSeek-R1 incorporates cold-start data before RL. This multi-stage training achieves state-of-the-art performance, matching OpenAI-o1 while enhancing output clarity.

DeepSeek Distillation

DeepSeek also offers distilled models, balancing power and efficiency. These smaller models (1.5B to 70B parameters) retain strong reasoning, with DeepSeek-R1-Distill-Qwen-32B surpassing OpenAI-o1-mini in benchmarks. This highlights the effectiveness of the distillation process.

(Figure: benchmark comparison of the DeepSeek-R1 distilled models. Source: deepseek-ai/DeepSeek-R1)

Learn more about DeepSeek-R1's features, development, distilled models, access, pricing, and OpenAI o1 comparison in our blog post: "DeepSeek-R1: Features, o1 Comparison, Distilled Models & More".

Fine-Tuning DeepSeek R1: A Practical Guide

Follow these steps to fine-tune your DeepSeek R1 model:

1. Setup

We utilize Kaggle's free GPU access. Create a Kaggle notebook, adding your Hugging Face and Weights & Biases tokens as secrets. Install the unsloth Python package for faster, more memory-efficient fine-tuning. See our "Unsloth Guide: Optimize and Speed Up LLM Fine-Tuning" for details.

<code>%%capture
# Install Unsloth, then force-reinstall the latest build from GitHub
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git</code>

Authenticate with the Hugging Face CLI and Weights & Biases (wandb):

<code>from huggingface_hub import login
from kaggle_secrets import UserSecretsClient

# Read the tokens stored as Kaggle notebook secrets
user_secrets = UserSecretsClient()

hf_token = user_secrets.get_secret("HUGGINGFACE_TOKEN")
login(hf_token)  # authenticate with the Hugging Face Hub

import wandb

wb_token = user_secrets.get_secret("wandb")

wandb.login(key=wb_token)  # authenticate with Weights & Biases
run = wandb.init(
    project='Fine-tune-DeepSeek-R1-Distill-Llama-8B on Medical COT Dataset',
    job_type="training",
    anonymous="allow"
)</code>

2. Loading the Model and Tokenizer

Load the Unsloth version of DeepSeek-R1-Distill-Llama-8B using 4-bit quantization for optimized performance:

<code>from unsloth import FastLanguageModel

max_seq_length = 2048  # context length for training and inference
dtype = None           # auto-detect (bfloat16 on supported GPUs, else float16)
load_in_4bit = True    # 4-bit quantization to cut memory usage

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    token=hf_token,
)</code>

3. Pre-Fine-tuning Inference

Define a prompt style with placeholders for the question and response. This guides the model's step-by-step reasoning:

<code>prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 

### Question:
{}

### Response:
<think>{}"""</think></code>

Test the model with a sample medical question:

<code>question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"


FastLanguageModel.for_inference(model) 
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])</code>

Observe the model's pre-fine-tuning reasoning and identify areas for improvement through fine-tuning.
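
The distilled R1 models wrap their reasoning in <think> tags, which makes it easy to separate the chain of thought from the final answer when comparing outputs. A minimal sketch, assuming the model closes its reasoning with a </think> tag (which these models normally do):

<code># Separate the chain of thought from the final answer (assumes a closing </think> tag)
full_output = response[0].split("### Response:")[1]
thinking, _, answer = full_output.partition("</think>")
print("Reasoning:\n", thinking.strip())
print("Answer:\n", answer.strip())</code>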

4. Loading and Processing the Dataset

Modify the prompt style to add a third placeholder for the dataset's complex chain of thought, between the question and the final response:

<code>train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 

### Question:
{}

### Response:
<think>
{}
</think>
{}"""</code>

Create a function to format the dataset:

<code>EOS_TOKEN = tokenizer.eos_token  # each training example must end with the EOS token


def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for question, cot, answer in zip(inputs, cots, outputs):
        text = train_prompt_style.format(question, cot, answer) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }</code>

Load and process the dataset:

<code>from datasets import load_dataset

dataset = load_dataset(
    "FreedomIntelligence/medical-o1-reasoning-SFT",
    "en",
    split="train[0:500]",
    trust_remote_code=True,
)
dataset = dataset.map(formatting_prompts_func, batched=True)
dataset["text"][0]</code>

5. Setting up the Model

Configure the model using LoRA:

<code>model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank: dimension of the low-rank update matrices
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,  # scaling factor applied to the LoRA updates
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)</code>
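To see how small the trainable footprint is, PEFT adapter models expose a helper that reports trainable versus total parameter counts. A quick check, assuming Unsloth's PEFT wrapper exposes the standard method (it builds on the peft library):

<code># Print trainable vs. total parameters (LoRA trains only a small fraction)
model.print_trainable_parameters()</code>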

Set up the trainer:

<code>from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",  # checkpoint directory (name chosen here)
    ),
)</code>

6. Model Training

Train the model:

<code>trainer_stats = trainer.train()</code>

The training loss is printed every 10 steps (per logging_steps) and is also streamed to the Weights & Biases run initialized earlier.
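
When the run finishes, you can inspect the aggregate statistics returned by trainer.train() and close the W&B run so all metrics are flushed; a brief sketch:

<code># Aggregate statistics returned by trainer.train() (runtime, final loss, etc.)
print(trainer_stats.metrics)

# Close the Weights & Biases run so all logged metrics are flushed
wandb.finish()</code>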

7. Post-Fine-tuning Inference

Compare results by querying the fine-tuned model with the same question as before. Observe the improvement in reasoning and response conciseness.
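
The original post omits this cell, but re-running the earlier inference code against the fine-tuned model looks like this (same question and prompt style as in step 3):

<code># Query the fine-tuned model with the identical prompt used before fine-tuning
FastLanguageModel.for_inference(model)  # re-enable inference mode after training
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])</code>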


8. Saving and Pushing the Model

Save the model locally and push it to the Hugging Face Hub:

<code>new_model_local = "DeepSeek-R1-Medical-COT"

# Save the LoRA adapter and tokenizer locally
model.save_pretrained(new_model_local)
tokenizer.save_pretrained(new_model_local)

# Merge the adapter into the base weights and save in 16-bit precision
model.save_pretrained_merged(new_model_local, tokenizer, save_method="merged_16bit")

new_model_online = "kingabzpro/DeepSeek-R1-Medical-COT"

# Push the adapter and tokenizer, then the merged 16-bit model, to the Hub
model.push_to_hub(new_model_online)
tokenizer.push_to_hub(new_model_online)

model.push_to_hub_merged(new_model_online, tokenizer, save_method="merged_16bit")</code>
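Because the merged 16-bit weights were pushed, anyone can reload the model with plain transformers. A hedged consumer-side sketch (repo name taken from above; loading an 8B model needs ample RAM or a GPU):

<code># Reload the merged model from the Hub to verify the upload (illustrative check)
from transformers import AutoModelForCausalLM, AutoTokenizer

check_tokenizer = AutoTokenizer.from_pretrained("kingabzpro/DeepSeek-R1-Medical-COT")
check_model = AutoModelForCausalLM.from_pretrained("kingabzpro/DeepSeek-R1-Medical-COT")</code>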


9. Deployment and Conclusion

To put the model into production, you can deploy it to the cloud with BentoML or convert it locally to GGUF format for tools such as llama.cpp and Ollama. Open-source LLMs are becoming increasingly important, and OpenAI has responded with counter-moves such as o3 and Operator AI.
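
For the local route, Unsloth ships a GGUF export helper; a minimal sketch, where the output directory and quantization preset are illustrative choices:

<code># Export the fine-tuned model to GGUF for local inference (e.g., llama.cpp, Ollama)
# "q4_k_m" is a common 4-bit quantization preset; pick one that fits your hardware
model.save_pretrained_gguf("DeepSeek-R1-Medical-COT-GGUF", tokenizer, quantization_method="q4_k_m")</code>

Either path gives you a deployable artifact straight from the notebook you just built.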
