


How to use the scikit-learn machine learning library in Python
Preface
scikit-learn is one of the most popular machine learning libraries in Python. It provides a variety of machine learning algorithms and tools for classification, regression, clustering, dimensionality reduction, and more.
The advantages of scikit-learn are:
Easy to use: The scikit-learn interface is simple and easy to understand, so users can get started with machine learning quickly.
Unified API: The scikit-learn API is highly consistent. Most algorithms are used in essentially the same way, which makes the library easier to learn and use.
Implements a large number of machine learning algorithms: scikit-learn implements many classic machine learning algorithms and provides a wealth of tools and functions that make debugging and tuning more convenient.
Open source and free: scikit-learn is completely open source and free, and anyone can use and modify its code.
Efficient and stable: scikit-learn's implementations are efficient, can handle large data sets, and perform well in terms of stability and reliability.
scikit-learn is well suited to beginners in machine learning because its API is so consistent and its models are relatively simple. My recommendation is to study it alongside the official documentation, which describes where each model applies and also provides code samples.
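The unified API can be illustrated with a minimal sketch. The data here is synthetic and made up for illustration, and DecisionTreeRegressor is chosen only as a second estimator for comparison; the point is that both models go through the same fit, predict and score calls.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (made up for illustration): y is roughly 3x + 2 plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.5, size=50)

# Two different estimators, driven through the same fit / predict / score calls
models = [LinearRegression(), DecisionTreeRegressor(random_state=0)]
scores = {}
for model in models:
    model.fit(X, y)                 # train
    preds = model.predict(X[:3])    # predict for the first three rows
    name = type(model).__name__
    scores[name] = model.score(X, y)  # R2 on the training data
    print(name, preds.shape, round(scores[name], 3))
```

Swapping one model for another usually means changing only the line that constructs the estimator.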
Linear Regression Model: LinearRegression
LinearRegression is a model based on linear regression and is suitable for predicting continuous variables. The basic idea is to model the relationship between the independent variable and the dependent variable as a straight line, use the training data to fit that line and find the coefficients of the linear equation, and then use the equation to make predictions for new data.
LinearRegression is suitable for problems where the relationship between the independent variables and the dependent variable is linear, such as housing price prediction, sales prediction, and user behavior prediction. When the relationship is nonlinear, LinearRegression will perform poorly. In that case, polynomial regression can be used to capture the curvature. Ridge regression and Lasso regression are regularized variants of linear regression; they help with overfitting and correlated features rather than with nonlinearity itself.
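As a rough illustration of the nonlinear case, the sketch below fits a straight line and a degree-2 polynomial to synthetic quadratic data. The data is made up, and combining PolynomialFeatures with LinearRegression in a pipeline is one common way to do polynomial regression in scikit-learn, not the only one.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic quadratic data (made up): y = x^2 plus a little noise
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 60).reshape(-1, 1)
y = X[:, 0] ** 2 + rng.normal(0, 0.2, size=60)

# A plain straight-line fit
linear = LinearRegression().fit(X, y)

# Polynomial regression: expand x into [1, x, x^2], then fit a linear model
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

linear_r2 = linear.score(X, y)
poly_r2 = poly.score(X, y)
print("straight line R2:", round(linear_r2, 3))
print("degree-2 polynomial R2:", round(poly_r2, 3))
```

The straight line cannot follow the curve, so its R2 stays low, while the polynomial pipeline fits the same data closely.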
Prepare the data set
Setting other factors aside, there is a roughly linear relationship between study time and grades: as study time increases, grades tend to increase as well. Study time here means effective study time. So we prepare a data set of study times and grades. Part of the data set is shown below:
Learning time, score
0.5,15
0.75,23
1.0,14
1.25,42
1.5,21
1.75,28
1.75,35
2.0,51
2.25,61
2.5,49
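To get a feel for these rows before modeling, they can be loaded into a pandas DataFrame. This is a small sketch that uses only the ten rows shown above; the English column names study_time and score are assumptions for readability and are not the headers of the original file.

```python
import io

import pandas as pd

# The ten sample rows shown above; the column names are hypothetical
csv_text = """study_time,score
0.5,15
0.75,23
1.0,14
1.25,42
1.5,21
1.75,28
1.75,35
2.0,51
2.25,61
2.5,49
"""

sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)
print(sample.describe())

# Pearson correlation between study time and score
corr = sample["study_time"].corr(sample["score"])
print("correlation:", round(corr, 3))
```

A clearly positive correlation on the sample supports trying a linear model first.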
Use LinearRegression
Determine the features and target
Of study time and grade, study time is the feature (the independent variable) and grade is the label (the dependent variable), so we need to extract the features and labels from the prepared data set.
import pandas as pd
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Read the CSV file of study times and scores
data = pd.read_csv('data/study_time_score.csv')
# Extract the feature: study time (the column header in the CSV is '学习时间')
X = data['学习时间']
# Extract the target (label): score (the column header in the CSV is '分数')
Y = data['分数']
Divide the training set and the test set
Once the feature and label data are ready, use scikit-learn's train_test_split to divide the data set into a training set and a test set before training LinearRegression.
"""
Split the feature data and target data into a training set and a test set.
test_size=0.25 assigns 25 percent of the data to the test set.
"""
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)
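The effect of test_size and random_state can be checked on a small stand-in data set. The 20 values below are made up for illustration and are not the article's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 20 study times and matching scores
X = np.linspace(0.5, 5.25, 20)
Y = 20 * X + 3

# test_size=0.25 puts a quarter of the rows in the test set
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)
print(len(X_train), len(X_test))

# Using the same random_state again reproduces the same split
X_train2, X_test2, _, _ = train_test_split(X, Y, test_size=0.25, random_state=0)
same_split = np.array_equal(X_test, X_test2)
print("same split:", same_split)
```

Fixing random_state makes results repeatable between runs, which helps when comparing models.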
Select the model and fit the data
With the test set and training set prepared, we can choose an appropriate model and fit it to the training set so that it can predict the target for other feature values.
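# Choose the model: LinearRegression
model = LinearRegression()
# In scikit-learn, the feature input to a model must be a two-dimensional array,
# so the one-dimensional array has to be converted to a two-dimensional one first.
x_train = X_train.values.reshape(-1, 1)
# Fit the model
model.fit(x_train, Y_train)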
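The reshape step is easy to get wrong, so here is a small sketch of what it does. The frame and its column names are hypothetical; selecting the column with double brackets is shown as an alternative way to obtain two-dimensional input.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# A small made-up frame; the column names are hypothetical
df = pd.DataFrame({
    "study_time": [0.5, 1.0, 1.5, 2.0],
    "score": [12.0, 22.0, 32.0, 42.0],
})

one_d = df["study_time"].values    # 1-D array, shape (4,)
two_d = one_d.reshape(-1, 1)       # 2-D array, shape (4, 1): n rows, 1 column
also_two_d = df[["study_time"]]    # double brackets keep a 2-D DataFrame

print(one_d.shape, two_d.shape, also_two_d.shape)

# Fitting works with the 2-D input
model = LinearRegression().fit(two_d, df["score"])

# Passing the 1-D array as X is rejected with a ValueError
try:
    LinearRegression().fit(one_d, df["score"])
    raised = False
except ValueError:
    raised = True
print("1-D input rejected:", raised)
```

In reshape(-1, 1), the -1 tells NumPy to infer the number of rows, and the 1 fixes a single feature column.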
Get the model parameters
Since the data set contains only two variables, study time and grade, this is a very simple linear model. The formula behind it is y = ax + b, where the dependent variable y is the grade and the independent variable x is the study time.
"""
Print the key model parameters
Intercept: the intercept, i.e. b
Coefficients: the variable weights, i.e. a
"""
print('Intercept:', model.intercept_)
print('Coefficients:', model.coef_)
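To connect coef_ and intercept_ with the formula y = ax + b, the sketch below fits only the ten sample rows listed earlier and compares the result with the closed-form least-squares solution for one feature. Because it uses only part of the data, its a and b will differ from those of the full data set, and the new_x value is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# The ten sample rows listed in the data set section (only part of the data)
x = np.array([0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 1.75, 2.0, 2.25, 2.5])
y = np.array([15, 23, 14, 42, 21, 28, 35, 51, 61, 49])

model = LinearRegression().fit(x.reshape(-1, 1), y)
a = model.coef_[0]      # slope
b = model.intercept_    # intercept

# Closed-form least squares for a single feature
a_manual = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b_manual = y.mean() - a_manual * x.mean()

# Predicting by hand with y = ax + b matches model.predict
new_x = 3.0  # hypothetical study time
manual_pred = a * new_x + b
model_pred = model.predict([[new_x]])[0]

print("a:", a, "b:", b)
print("manual:", manual_pred, "model:", model_pred)
```

This shows that, for a single feature, predict simply evaluates the fitted line.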
Backtest
The model above was fitted using only the training set. Next, we use the test set to backtest the fit. Having fitted the model on the training set, we can predict targets for the test-set features and compare the predictions with the actual target values to measure how well the model fits.
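# Convert to a two-dimensional array with n rows and 1 column
x_test = X_test.values.reshape(-1, 1)
# Predict on the test set
Y_pred = model.predict(x_test)
# Print the test feature data
print(x_test)
# Print the predictions for the test features
print(Y_pred)
# Compare the predictions with the actual target values to measure goodness of fit
# R2 (R-squared): goodness of fit. It is at most 1, and the closer to 1, the better the model fits the data.
# It can be negative when the model fits worse than always predicting the mean.
print("R2:", r2_score(Y_test, Y_pred))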
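The R2 score can also be computed by hand, which makes clear what r2_score measures. The actual and predicted values below are made up for illustration; mean_squared_error, which the article imports but does not use, is included as a second metric.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up actual and predicted scores
y_true = np.array([20.0, 35.0, 50.0, 65.0, 80.0])
y_pred = np.array([22.0, 33.0, 51.0, 60.0, 84.0])

# R2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# Mean squared error: the average squared residual
mse = mean_squared_error(y_true, y_pred)

# Predictions worse than always predicting the mean give a negative R2
bad_pred = y_true[::-1]
r2_bad = r2_score(y_true, bad_pred)

print("R2 (manual):", r2_manual)
print("R2 (sklearn):", r2_score(y_true, y_pred))
print("MSE:", mse)
print("R2 for bad predictions:", r2_bad)
```

R2 is unitless, while MSE is in squared units of the target, so the two are often reported together.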
Program running results
The code above tells us how well the LinearRegression model fits, that is, whether a linear model is appropriate for this data. The program output is as follows:
Prediction results: [47.43726068 33.05457106 49.83437561 63.41802692 41.84399249 37.84880093
 23.46611131 37.84880093 26.66226456 71.40841004 18.67188144 88.9872529
 63.41802692 42.6430308 21.86803469 69.81033341 66.61418017 33.05457106
 58.62379705 50.63341392 18.67188144 41.044954]
R2: 0.8935675710322939