Fine-Tuning DeepSeek R1 (Reasoning Model)
DeepSeek's groundbreaking AI models challenge OpenAI's dominance. These advanced reasoning models are freely available, democratizing access to powerful AI. Learn how to fine-tune DeepSeek with our video tutorial:
This tutorial fine-tunes the DeepSeek-R1-Distill-Llama-8B model using the Hugging Face Medical Chain-of-Thought Dataset. This distilled model, derived from Llama 3.1 8B, offers comparable reasoning capabilities to the original DeepSeek-R1. New to LLMs and fine-tuning? Consider our Introduction to LLMs in Python course.
Image by Author
Introducing DeepSeek R1 Models
DeepSeek AI has open-sourced DeepSeek-R1 and DeepSeek-R1-Zero, rivaling OpenAI's o1 in reasoning tasks (math, coding, logic). Explore our comprehensive DeepSeek R1 guide for details.
DeepSeek-R1-Zero
This pioneering model uses large-scale reinforcement learning (RL), bypassing initial supervised fine-tuning (SFT). While enabling independent chain-of-thought (CoT) reasoning, it presents challenges like repetitive reasoning and readability issues.
DeepSeek-R1
Addressing DeepSeek-R1-Zero's limitations, DeepSeek-R1 incorporates cold-start data before RL. This multi-stage training achieves state-of-the-art performance, matching OpenAI-o1 while enhancing output clarity.
DeepSeek Distillation
DeepSeek also offers distilled models, balancing power and efficiency. These smaller models (1.5B to 70B parameters) retain strong reasoning, with DeepSeek-R1-Distill-Qwen-32B surpassing OpenAI-o1-mini in benchmarks. This highlights the effectiveness of the distillation process.
Source: deepseek-ai/DeepSeek-R1
Learn more about DeepSeek-R1's features, development, distilled models, access, pricing, and OpenAI o1 comparison in our blog post: "DeepSeek-R1: Features, o1 Comparison, Distilled Models & More".
Fine-Tuning DeepSeek R1: A Practical Guide
Follow these steps to fine-tune your DeepSeek R1 model:
1. Setup
We utilize Kaggle's free GPU access. Create a Kaggle notebook, adding your Hugging Face and Weights & Biases tokens as secrets. Install the unsloth
Python package for faster, more memory-efficient fine-tuning. See our "Unsloth Guide: Optimize and Speed Up LLM Fine-Tuning" for details.
<code>%%capture !pip install unsloth !pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git</code>
Authenticate with the Hugging Face CLI and Weights & Biases (wandb):
<code>from huggingface_hub import login from kaggle_secrets import UserSecretsClient user_secrets = UserSecretsClient() hf_token = user_secrets.get_secret("HUGGINGFACE_TOKEN") login(hf_token) import wandb wb_token = user_secrets.get_secret("wandb") wandb.login(key=wb_token) run = wandb.init( project='Fine-tune-DeepSeek-R1-Distill-Llama-8B on Medical COT Dataset', job_type="training", anonymous="allow" )</code>
2. Loading the Model and Tokenizer
Load the Unsloth version of DeepSeek-R1-Distill-Llama-8B using 4-bit quantization for optimized performance:
<code>from unsloth import FastLanguageModel max_seq_length = 2048 dtype = None load_in_4bit = True model, tokenizer = FastLanguageModel.from_pretrained( model_name = "unsloth/DeepSeek-R1-Distill-Llama-8B", max_seq_length = max_seq_length, dtype = dtype, load_in_4bit = load_in_4bit, token = hf_token, )</code>
3. Pre-Fine-tuning Inference
Define a prompt style with placeholders for the question and response. This guides the model's step-by-step reasoning:
<code>prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response. ### Instruction: You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. Please answer the following medical question. ### Question: {} ### Response: <think>{}"""</think></code>
Test the model with a sample medical question:
<code>question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?" FastLanguageModel.for_inference(model) inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda") outputs = model.generate( input_ids=inputs.input_ids, attention_mask=inputs.attention_mask, max_new_tokens=1200, use_cache=True, ) response = tokenizer.batch_decode(outputs) print(response[0].split("### Response:")[1])</code>
Observe the model's pre-fine-tuning reasoning and identify areas for improvement through fine-tuning.
4. Loading and Processing the Dataset
Modify the prompt style to include a placeholder for the complex chain of thought:
<code>train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response. ### Instruction: You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. Please answer the following medical question. ### Question: {} ### Response: <think> {} </think> {}"""</code>
Create a function to format the dataset:
<code>EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN def formatting_prompts_func(examples): inputs = examples["Question"] cots = examples["Complex_CoT"] outputs = examples["Response"] texts = [] for input, cot, output in zip(inputs, cots, outputs): text = train_prompt_style.format(input, cot, output) + EOS_TOKEN texts.append(text) return { "text": texts, }</code>
Load and process the dataset:
<code>from datasets import load_dataset dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT","en", split = "train[0:500]",trust_remote_code=True) dataset = dataset.map(formatting_prompts_func, batched = True,) dataset["text"][0]</code>
5. Setting up the Model
Configure the model using LoRA:
<code>model = FastLanguageModel.get_peft_model( model, r=16, target_modules=[ "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", ], lora_alpha=16, lora_dropout=0, bias="none", use_gradient_checkpointing="unsloth", # True or "unsloth" for very long context random_state=3407, use_rslora=False, loftq_config=None, )</code>
Set up the trainer:
<code>from trl import SFTTrainer from transformers import TrainingArguments from unsloth import is_bfloat16_supported trainer = SFTTrainer( model=model, tokenizer=tokenizer, train_dataset=dataset, dataset_text_field="text", max_seq_length=max_seq_length, dataset_num_proc=2, args=TrainingArguments( per_device_train_batch_size=2, gradient_accumulation_steps=4, # Use num_train_epochs = 1, warmup_ratio for full training runs! warmup_steps=5, max_steps=60, learning_rate=2e-4, fp16=not is_bfloat16_supported(), bf16=is_bfloat16_supported(), logging_steps=10, optim="adamw_8bit", weight_decay=0.01, lr_scheduler_type="linear", seed=3407, output_, ), )</code>
6. Model Training
Train the model:
<code>trainer_stats = trainer.train()</code>
(Note: The original response included images of training progress; these are omitted here as image reproduction is not possible.)
7. Post-Fine-tuning Inference
Compare results by querying the fine-tuned model with the same question as before. Observe the improvement in reasoning and response conciseness.
(Note: The original response included the improved model output; this is omitted here for brevity.)
8. Saving and Pushing the Model
Save the model locally and push it to the Hugging Face Hub:
<code>new_model_local = "DeepSeek-R1-Medical-COT" model.save_pretrained(new_model_local) tokenizer.save_pretrained(new_model_local) model.save_pretrained_merged(new_model_local, tokenizer, save_method = "merged_16bit",) new_model_online = "kingabzpro/DeepSeek-R1-Medical-COT" model.push_to_hub(new_model_online) tokenizer.push_to_hub(new_model_online) model.push_to_hub_merged(new_model_online, tokenizer, save_method = "merged_16bit")</code>
(Note: The original response included images showing successful model saving and pushing; these are omitted here.)
9. Deployment and Conclusion
The tutorial concludes by suggesting deployment options using BentoML or local conversion to GGUF format. It emphasizes the growing importance of open-source LLMs and highlights OpenAI's counter-moves with o3 and Operator AI. The links to those resources are preserved.
The rewritten response maintains the core information while simplifying the structure and removing unnecessary repetitions. The code blocks are retained for completeness. The images are referenced but not reproduced.
The above is the detailed content of Fine-Tuning DeepSeek R1 (Reasoning Model). For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics











Hey there, Coding ninja! What coding-related tasks do you have planned for the day? Before you dive further into this blog, I want you to think about all your coding-related woes—better list those down. Done? – Let’

Introduction OpenAI has released its new model based on the much-anticipated “strawberry” architecture. This innovative model, known as o1, enhances reasoning capabilities, allowing it to think through problems mor

SQL's ALTER TABLE Statement: Dynamically Adding Columns to Your Database In data management, SQL's adaptability is crucial. Need to adjust your database structure on the fly? The ALTER TABLE statement is your solution. This guide details adding colu

Introduction Mistral has released its very first multimodal model, namely the Pixtral-12B-2409. This model is built upon Mistral’s 12 Billion parameter, Nemo 12B. What sets this model apart? It can now take both images and tex

While working on Agentic AI, developers often find themselves navigating the trade-offs between speed, flexibility, and resource efficiency. I have been exploring the Agentic AI framework and came across Agno (earlier it was Phi-

Troubled Benchmarks: A Llama Case Study In early April 2025, Meta unveiled its Llama 4 suite of models, boasting impressive performance metrics that positioned them favorably against competitors like GPT-4o and Claude 3.5 Sonnet. Central to the launc

Can a video game ease anxiety, build focus, or support a child with ADHD? As healthcare challenges surge globally — especially among youth — innovators are turning to an unlikely tool: video games. Now one of the world’s largest entertainment indus

Unlock the Power of Embedding Models: A Deep Dive into Andrew Ng's New Course Imagine a future where machines understand and respond to your questions with perfect accuracy. This isn't science fiction; thanks to advancements in AI, it's becoming a r
