


Ollama on Windows: A new tool for running large language models (LLMs) locally
Recently, both OpenAI Translator and NextChat have added support for large language models running locally via Ollama, giving beginners and enthusiasts a new way to experiment.
Moreover, the launch of Ollama on Windows (preview) transforms how AI development is done on Windows devices, charting a clear path for both AI practitioners and everyday tinkerers.
What is Ollama?
Ollama is a groundbreaking artificial intelligence (AI) and machine learning (ML) tool platform that dramatically simplifies the development and use of AI models.
In the technical community, configuring hardware and setting up environments for AI models has always been a thorny problem, and Ollama emerged to address exactly these needs:
- It provides a set of tools that are not only comprehensive but also intuitive and efficient to use. Whether you are an AI professional or a newcomer to the field, you will find the support you need in Ollama.
- Beyond ease of use, Ollama means access to advanced AI models and computing resources is no longer limited to a select few. For the AI and ML communities, the birth of Ollama is a milestone: it furthers the popularization of AI technology and lets more people try out and realize their own AI ideas.
Why does Ollama stand out?
Among the many AI tools available, Ollama stands out for the following key advantages. These features not only highlight its uniqueness, but also solve the most common problems encountered by AI developers and enthusiasts:
- Automatic hardware acceleration: Ollama can automatically identify and make full use of optimal hardware resources in Windows systems. Whether you are equipped with an NVIDIA GPU or a CPU that supports advanced instruction sets such as AVX and AVX2, Ollama can achieve targeted optimization to ensure that the AI model runs more efficiently. With it, you no longer have to worry about complex hardware configuration issues, and you can focus more time and energy on the project itself.
- No need for virtualization: AI development used to require setting up a virtual machine or configuring a complex software environment. With Ollama, none of that stands in your way; you can start developing AI projects directly, keeping the whole process simple and fast. This convenience lowers the barrier to entry for individuals and organizations who want to try AI technology.
- Access to the complete Ollama model library: Ollama provides users with a rich AI model library, including advanced image recognition models like LLaVA and Google's latest Gemma model. With such a comprehensive "arsenal", we can easily try and apply various open source models without having to spend time and effort searching for integrations ourselves. Whether you want to perform text analysis, image processing, or other AI tasks, Ollama's model library can provide strong support.
- Ollama’s resident API: In today’s interconnected world of software, integrating AI capabilities into your own applications is extremely valuable. Ollama's resident API greatly simplifies this process, running silently in the background, ready to seamlessly connect powerful AI capabilities to your project without the need for additional complicated setup. With it, Ollama's rich AI capabilities will be ready at any time and can be naturally integrated into your development process to further improve work efficiency.
Through these carefully designed features, Ollama not only solves common problems in AI development, but also allows more people to easily access and apply advanced AI technology, greatly expanding the application prospects of AI.
Using Ollama on Windows
Welcome to the new era of AI and ML! Next, we'll take you through every step of getting started, and we'll also provide some practical code and command examples to make sure you have a smooth journey.
Step 1: Download and Install
1. Visit the Ollama Windows Preview page and download the OllamaSetup.exe installer.
2. Double-click the file and click "Install" to begin the installation.
3. Once the installation completes, you can start using Ollama on Windows. Simple, isn't it?
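To verify the installation, open a terminal (PowerShell or Command Prompt) and query the version; if a version number is printed, Ollama is installed correctly:

ollama -v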
Step 2: Start Ollama and get the model
To launch Ollama and get an open source AI model from the model library, follow these steps:
1. Click the Ollama icon in the Start menu. Once it is running, an icon will sit in the taskbar tray.
2. Right-click the taskbar icon and select "View log" to open a command line window.
3. Execute the following command to run Ollama and load a model:

ollama run [modelname]
After executing the above command, Ollama will start to initialize and automatically pull and load the selected model from the Ollama model library. Once it's ready, you can send it instructions and it will understand and respond using the chosen model.
Remember to replace modelname with the name of the model you want to run. Commonly used models include:
Model | Parameters | Size | Install command | Publisher |
---|---|---|---|---|
Llama 2 | 7B | 3.8GB | ollama run llama2 | Meta |
Code Llama | 7B | 3.8GB | ollama run codellama | Meta |
Llama 2 13B | 13B | 7.3GB | ollama run llama2:13b | Meta |
Llama 2 70B | 70B | 39GB | ollama run llama2:70b | Meta |
Mistral | 7B | 4.1GB | ollama run mistral | Mistral AI |
Mixtral | 8x7B | 26GB | ollama run mixtral:8x7b | Mistral AI |
Phi-2 | 2.7B | 1.7GB | ollama run phi | Microsoft Research |
LLaVA | 7B | 4.5GB | ollama run llava | Microsoft Research / Columbia University / University of Wisconsin |
Gemma 2B | 2B | 1.4GB | ollama run gemma:2b | Google |
Gemma 7B | 7B | 4.8GB | ollama run gemma:7b | Google |
Qwen 4B | 4B | 2.3GB | ollama run qwen:4b | Alibaba |
Qwen 7B | 7B | 4.5GB | ollama run qwen:7b | Alibaba |
Qwen 14B | 14B | 8.2GB | ollama run qwen:14b | Alibaba |
Running a 7B model requires at least 8GB of RAM, and running a 13B model requires at least 16GB.
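If you prefer to download a model ahead of time without starting a chat session, ollama pull fetches it separately; a quick sketch using a model from the table above:

# Download only, then start chatting later
ollama pull gemma:2b
ollama run gemma:2b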
Step 3: Use the models
As mentioned earlier, Ollama supports a wide variety of open source models for different tasks. Here is how to use them.
- Text-based models: Once a text model is loaded, you can type directly at the command line to start "chatting" with it, for example with Alibaba's Qwen (see the session sketch after this list).
- Image-based models: To use an image-capable model such as LLaVA 1.6, load it with the following command:

ollama run llava

Ollama will analyze the image with your chosen model and return results, such as the image's content and classification, whether it has been modified, or other analyses (depending on the model used).
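As a hedged sketch of both interaction styles (model replies will vary, the image path below is hypothetical, and /bye exits an interactive session):

ollama run qwen:7b
>>> Why is the sky blue?
(the model's answer streams here)
>>> /bye

ollama run llava
>>> Describe this image: C:\Users\me\Pictures\photo.jpg
(the model describes the image)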
Step 4: Connect to the Ollama API
You won't always work from the command line alone; connecting applications to the Ollama API is an important step. It lets you integrate AI capabilities into your own software, or call them from front-end tools such as OpenAI Translator and NextChat.
Here is how to connect to and use the Ollama API:
- Default address and port: The Ollama API listens at http://localhost:11434 by default and can be called directly on the system where Ollama is installed.
- Changing the API listening address and port: If you want to serve the API over the network, you can change the address and port it listens on:
1. Right-click the taskbar tray icon and select "Quit Ollama" to stop Ollama running in the background.
2. Press Windows + R to open the "Run" dialog, enter the following command, then press Ctrl + Shift + Enter to open the environment variables editor with administrator privileges:

C:\Windows\system32\rundll32.exe sysdm.cpl,EditEnvironmentVariables

3. To change the listening address and port, add the following environment variable:
- Variable name: OLLAMA_HOST
- Variable value (port): :8000

Specifying only a port number makes Ollama listen on port 8000 on all IPv4 and IPv6 addresses.
Using IPv6 requires Ollama 0.0.20 or later.
4. If you keep multiple models, the OLLAMA_MODELS environment variable lets you specify the directory where downloaded models are stored.
5. After making the changes, restart Ollama, then test access from a browser to verify that the change took effect.
6. Example API call: To use the Ollama API, send HTTP requests from your own program. Here is an example of using the curl command in a terminal to send a text prompt to the Gemma model:

curl http://192.168.100.10:8000/api/generate -d '{ "model": "gemma:7b", "prompt": "Why is the sky blue?" }'

Currently, responses are returned only in JSON format.
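Note that by default /api/generate streams the answer back as a series of JSON objects, one per generated chunk. If your program prefers a single complete reply, the API also accepts a "stream": false field; a hedged sketch reusing the address configured above:

curl http://192.168.100.10:8000/api/generate -d '{ "model": "gemma:7b", "prompt": "Why is the sky blue?", "stream": false }'

The answer then arrives as one JSON object whose response field contains the full generated text.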
Common Ollama commands include:

# Check the Ollama version
ollama -v

# List installed models
ollama list

# Remove a model
ollama rm [modelname]

# Models are stored by default under:
# C:\Users\<username>\.ollama\models
By following the steps above and referring to the command examples, you can enjoy the full power of Ollama on Windows. Whether you issue instructions directly at the command line, integrate AI models into your own software through the API, or use a front-end wrapper, Ollama's door is open to you.
Best practices for Ollama on Windows
To get the most out of Ollama on Windows, keep the following best practices and tips in mind. They will help you optimize performance and troubleshoot common issues:
Optimize Ollama performance:
- Check hardware configuration: Make sure your device meets Ollama's recommended hardware requirements, especially when running large models. If you have an NVIDIA GPU, Ollama's automatic hardware acceleration greatly improves computing speed (see the check after this list).
- Update drivers: Keep your graphics card drivers up to date to ensure compatibility and optimal performance with Ollama.
- Free up system resources: When running large models or performing complex tasks, close unnecessary programs to free up system resources.
- Choose the appropriate model: Select a model based on the task at hand. Models with more parameters may be more accurate, but they also demand more computing power; for simple tasks, smaller models are more efficient.
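To confirm the GPU is actually being used, you can watch utilization with NVIDIA's nvidia-smi tool while a model is answering; note this assumes the NVIDIA driver is installed and is not an Ollama-specific command:

nvidia-smi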
Ollama FAQ
Installation issues
- Make sure your Windows system is the latest version.
- Make sure you have the necessary permissions to install the software.
- Try running the installer as administrator.
Model loading error
- Check whether the entered command is correct.
- Confirm that the model name matches the name in the Ollama model library.
- Check your Ollama version and update it if needed.
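When a model fails to load, a reasonable troubleshooting sequence is to list what is installed to verify the exact name, then re-pull the model from the library; a sketch using a model from the table above:

ollama list
ollama pull llama2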
Ollama API connection issue
- Make sure Ollama is running.
- Check the listening address and port, especially whether the port is occupied by other applications.
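A quick way to confirm the service is reachable is to request the root URL with curl; a running instance on the default port replies with a short "Ollama is running" status message:

curl http://localhost:11434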
In this tutorial, we learned how to use Ollama on Windows: installing it, executing basic commands, working with the model library, and connecting to Ollama through the API. I recommend digging into Ollama and trying out a variety of models.
Ollama has unlimited potential, and with it, you can achieve more!