Running DeepSeek-R1 in the Browser: A Comprehensive Guide
As artificial intelligence technology continues to develop, it is becoming increasingly feasible to run complex machine learning models directly in the browser. This guide shows how to load and run the DeepSeek-R1 model in the browser with JavaScript, walking through the implementation details of the example code below.
Why run NLP models in the browser?
Traditionally, natural language processing (NLP) models are deployed server-side, requiring an internet connection to send requests and receive responses. With the advancement of technologies such as WebGPU and ONNX Runtime Web, it is now possible to run advanced models such as DeepSeek-R1 directly in the browser. The advantages of this approach include:
- Enhanced Privacy: User data never leaves their device.
- Reduced Latency: Eliminates delays associated with server communication.
- Offline Availability: Works even without an internet connection.
About DeepSeek-R1
DeepSeek-R1 is a reasoning-focused language model. The variant used here, a distilled 1.5B-parameter ONNX build (onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX), provides high-quality text generation while keeping a small footprint, making it practical for on-device inference in browser environments.
Set up your project
Prerequisites
To run the DeepSeek-R1 model in your browser, you need:
- A modern browser with WebGPU support (the example below targets the WebGPU backend).
- The @huggingface/transformers library (Transformers.js) for running Transformer models in JavaScript; a minimal installation sketch follows this list.
- A worker script containing the model-loading and inference logic (shown in the implementation section below).
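A minimal installation sketch, assuming an npm-based project; the CDN import is shown as an alternative, and its URL should be pinned to a specific version in practice:

```javascript
// Install the library in an npm project (run in a terminal, not in JavaScript):
//   npm install @huggingface/transformers
// Then import what you need in an ES module or module worker:
import { AutoTokenizer, AutoModelForCausalLM } from "@huggingface/transformers";

// Without a bundler, you can instead import from a CDN (pin a version in practice):
// import { AutoTokenizer, AutoModelForCausalLM } from "https://cdn.jsdelivr.net/npm/@huggingface/transformers";
```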
Implementation details
The core of the implementation is a web worker script that loads the model, runs inference, and streams results back to the main thread:
```javascript
import {
  AutoTokenizer,
  AutoModelForCausalLM,
  TextStreamer,
  InterruptableStoppingCriteria,
} from "@huggingface/transformers";

/**
 * Helper function to perform WebGPU feature detection
 */
async function check() {
  try {
    const adapter = await navigator.gpu.requestAdapter();
    if (!adapter) {
      throw new Error("WebGPU is not supported (no adapter found)");
    }
  } catch (e) {
    self.postMessage({
      status: "error",
      data: e.toString(),
    });
  }
}

/**
 * This class uses the singleton pattern to enable lazy loading of the model
 */
class TextGenerationPipeline {
  static model_id = "onnx-community/DeepSeek-R1-Distill-Qwen-1.5B-ONNX";

  static async getInstance(progress_callback = null) {
    if (!this.tokenizer) {
      this.tokenizer = await AutoTokenizer.from_pretrained(this.model_id, {
        progress_callback,
      });
    }
    if (!this.model) {
      this.model = await AutoModelForCausalLM.from_pretrained(this.model_id, {
        dtype: "q4f16",
        device: "webgpu",
        progress_callback,
      });
    }
    return [this.tokenizer, this.model];
  }
}

const stopping_criteria = new InterruptableStoppingCriteria();
let past_key_values_cache = null;

async function generate(messages) {
  // Retrieve the text-generation pipeline.
  const [tokenizer, model] = await TextGenerationPipeline.getInstance();

  const inputs = tokenizer.apply_chat_template(messages, {
    add_generation_prompt: true,
    return_dict: true,
  });

  const [START_THINKING_TOKEN_ID, END_THINKING_TOKEN_ID] = tokenizer.encode(
    "<think></think>",
    { add_special_tokens: false },
  );

  let state = "thinking"; // 'thinking' or 'answering'
  let startTime;
  let numTokens = 0;
  let tps;

  const token_callback_function = (tokens) => {
    startTime ??= performance.now();
    if (numTokens++ > 0) {
      tps = (numTokens / (performance.now() - startTime)) * 1000;
    }
    if (tokens[0] === END_THINKING_TOKEN_ID) {
      state = "answering";
    }
  };

  const callback_function = (output) => {
    self.postMessage({
      status: "update",
      output,
      tps,
      numTokens,
      state,
    });
  };

  const streamer = new TextStreamer(tokenizer, {
    skip_prompt: true,
    skip_special_tokens: true,
    callback_function,
    token_callback_function,
  });

  // Tell the main thread we have started
  self.postMessage({ status: "start" });

  const { past_key_values, sequences } = await model.generate({
    ...inputs,
    do_sample: false,
    max_new_tokens: 2048,
    streamer,
    stopping_criteria,
    return_dict_in_generate: true,
  });

  past_key_values_cache = past_key_values;

  const decoded = tokenizer.batch_decode(sequences, {
    skip_special_tokens: true,
  });

  // Send the output back to the main thread
  self.postMessage({
    status: "complete",
    output: decoded,
  });
}

async function load() {
  self.postMessage({
    status: "loading",
    data: "Loading model...",
  });

  // Load the pipeline and save it for future use.
  const [tokenizer, model] = await TextGenerationPipeline.getInstance((x) => {
    self.postMessage(x);
  });

  self.postMessage({
    status: "loading",
    data: "Compiling shaders and warming up the model...",
  });

  // Run the model with a dummy input to compile the shaders
  const inputs = tokenizer("a");
  await model.generate({ ...inputs, max_new_tokens: 1 });
  self.postMessage({ status: "ready" });
}

// Listen for messages from the main thread
self.addEventListener("message", async (e) => {
  const { type, data } = e.data;

  switch (type) {
    case "check":
      check();
      break;
    case "load":
      load();
      break;
    case "generate":
      stopping_criteria.reset();
      generate(data);
      break;
    case "interrupt":
      stopping_criteria.interrupt();
      break;
    case "reset":
      past_key_values_cache = null;
      stopping_criteria.reset();
      break;
  }
});
```
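The script above is designed to run inside a Web Worker. Here is a minimal sketch of the main-thread side that drives it; the file name worker.js, the example prompt, and the console logging are assumptions for illustration, not part of the original example:

```javascript
// Main-thread sketch. Assumes the worker code above is saved as "worker.js"
// and served in a context where module workers are allowed.
const worker = new Worker("worker.js", { type: "module" });

worker.addEventListener("message", (e) => {
  const msg = e.data;
  switch (msg.status) {
    case "loading":
      console.log(msg.data); // e.g. "Loading model..." or the shader-warmup notice
      break;
    case "ready":
      // Model is loaded and warmed up; send a chat request.
      worker.postMessage({
        type: "generate",
        data: [{ role: "user", content: "Explain WebGPU in one sentence." }],
      });
      break;
    case "update":
      // Streamed partial output, tokens-per-second, and thinking/answering state.
      console.log(msg.output, `(${msg.tps?.toFixed(1)} tok/s, state: ${msg.state})`);
      break;
    case "complete":
      console.log("Final output:", msg.output);
      break;
    case "error":
      console.error("Worker error:", msg.data);
      break;
  }
});

// Kick off feature detection and model loading.
worker.postMessage({ type: "check" });
worker.postMessage({ type: "load" });
```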
Key points
- Feature detection: The `check` function performs feature detection to confirm WebGPU support before anything else runs.
- Singleton pattern: The `TextGenerationPipeline` class ensures the tokenizer and model are loaded only once, avoiding redundant initialization.
- Model loading: The `getInstance` method loads the tokenizer and model from their pretrained sources and supports progress callbacks.
- Inference: The `generate` function processes the input messages and produces text output, streaming tokens through a `TextStreamer`.
- Communication: The worker listens for messages from the main thread and performs the corresponding action based on the message type ("check", "load", "generate", "interrupt", or "reset"); a sketch of the interrupt and reset paths follows this list.
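As an illustration of the interrupt and reset paths, here is a small main-thread sketch; the worker variable and the stop-button element are assumptions carried over from the usage example above:

```javascript
// Stop an in-flight generation when the user clicks "Stop".
// (Assumes `worker` is the Worker instance from the earlier sketch and that
// a <button id="stop-button">Stop</button> element exists in the page.)
document.getElementById("stop-button").addEventListener("click", () => {
  worker.postMessage({ type: "interrupt" });
});

// Drop the cached key/value state before starting a fresh conversation.
function resetConversation() {
  worker.postMessage({ type: "reset" });
}
```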
Conclusion
Running NLP models like DeepSeek-R1 in the browser marks significant progress in enhancing user experience and protecting data privacy. With just a few lines of JavaScript code and the power of the @huggingface/transformers library, you can develop responsive and powerful applications. Whether you’re building interactive tools or intelligent assistants, browser-based NLP has the potential to be a game-changer.
Explore the potential of DeepSeek-R1 in the browser and start creating smarter front-end applications today!
This guide provides a comprehensive overview of how to load and use the DeepSeek-R1 model in a browser environment, with detailed code examples. For more specific implementation details, please refer to the linked GitHub repository.