


What are the common algorithms for supervised learning? How are they applied?
What is supervised learning?
Supervised learning is a subset of machine learning. In supervised learning, the input data given to the model is labeled, and the model is trained on it. As a result, the trained model can predict outputs for new inputs as accurately as possible.
The concept behind supervised learning can also be found in real life, such as teachers tutoring children. Suppose the teacher wants to teach children to recognize images of cats and dogs. S/he will tutor the child by continuously showing the child an image of a cat or a dog while informing the child whether the image is a dog or a cat.
Showing the images and naming them can be thought of as labeling data. In the same way, during training the machine learning model is told which category each piece of data belongs to.
What is the use of supervised learning?
Supervised learning can be used for both regression and classification problems. Classification models determine which group a given piece of data belongs to. Examples include True/False, Dog/Cat, etc.
Because regression models can predict future values based on historical data, they can be used to predict employee wages or real estate sales prices.
In this article, we will list some common algorithms used for supervised learning, as well as practical tutorials on such algorithms.
Linear Regression
Linear regression is a supervised learning algorithm that predicts an output value based on a given input value. Linear regression is used when the target (output) variable returns a continuous value.
There are two main types of linear algorithms, simple linear regression and multiple linear regression.
Simple linear regression uses only one independent (input) variable. An example is predicting a child's age given a height.
On the other hand, multiple linear regression can use multiple independent variables to predict its final outcome. An example is predicting the price of a given property based on its location, size, demand, etc.
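The difference between the two types can be sketched with scikit-learn. The data below is made up for illustration (a hypothetical price that depends on size and a location score); it is not the article's data set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical, noise-free data: price = 50 + 3 * size + 10 * location_score
size = np.array([50.0, 60.0, 80.0, 100.0, 120.0])
location_score = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
price = 50 + 3 * size + 10 * location_score

# Simple linear regression: one independent variable (size only)
simple = LinearRegression().fit(size.reshape(-1, 1), price)

# Multiple linear regression: two independent variables
features = np.column_stack([size, location_score])
multiple = LinearRegression().fit(features, price)

print("simple R^2:", simple.score(size.reshape(-1, 1), price))
print("multiple coefficients:", multiple.coef_, "intercept:", multiple.intercept_)
```

Because the made-up price depends on both inputs, the model that sees both features recovers the coefficients, while the single-feature model can only approximate the relationship.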
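The following is the simple linear regression formula, where y is the predicted output, x is the input, b0 is the intercept, and b1 is the slope:
y = b0 + b1 * x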
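For a single input x, the fitted line has the form y = b0 + b1 * x. As a minimal sketch, assuming ordinary least squares is used to choose the intercept b0 and slope b1, they can be computed by hand on a small made-up data set:

```python
import numpy as np

# Made-up points that lie exactly on y = 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Ordinary least squares estimates for slope (b1) and intercept (b0)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

print(b0, b1)  # prints 1.0 2.0
```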
For the Python example, we will use linear regression to predict the y value relative to a given x value.
The data set we are given contains only two columns: x and y. Note that the y result will return continuous values.
The following is a screenshot of the given data set:
Example of linear regression model using Python
1. Import the necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import linear_model
from sklearn.model_selection import train_test_split
import os
2. Reading and sampling our data set
To simplify the data set, we take a random sample of 50 rows and round the data values to 2 decimal places.
Please note that you should import the given dataset before completing this step.
df = pd.read_csv("../input/random-linear-regression/train.csv")
df = df.sample(50)
df = round(df, 2)
3. Filter Null and Infinite values
If the data set contains null values and infinite values, an error may occur. Therefore, we will use the clean_dataset function to clean the dataset of these values.
def clean_dataset(df):
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    df.dropna(inplace=True)
    # Keep only the rows that contain no NaN or infinite values
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
    return df[indices_to_keep].astype(np.float64)

df = clean_dataset(df)
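To see what this cleaning step does, here is a small self-contained sketch. The helper is a non-mutating variant of the function above, and the four-row frame is made up for illustration.

```python
import numpy as np
import pandas as pd

def clean_dataset(df):
    # Drop rows with missing values, then rows with infinite values
    df = df.dropna()
    keep = ~df.isin([np.inf, -np.inf]).any(axis=1)
    return df[keep].astype(np.float64)

# Hypothetical data: row 1 contains inf, row 2 contains NaN
raw = pd.DataFrame({"x": [1.0, 2.0, np.nan, 4.0],
                    "y": [2.0, np.inf, 6.0, 8.0]})
cleaned = clean_dataset(raw)
print(cleaned)
```

Only the rows at index 0 and 3 survive, since the other two contain a value the model cannot be fitted on.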
4. Choose the dependent and independent variables
Please note that we convert the data to DataFrame format. The DataFrame data type is a two-dimensional structure that aligns our data into rows and columns.
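The code for this step is not shown in the text. A plausible sketch, assuming the columns are named x and y as described earlier and that the later steps expect variables called X and Y, is the following. The small df here is only a stand-in for the cleaned data set.

```python
import pandas as pd

# Stand-in for the cleaned data set produced by the previous steps
df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [2.1, 3.9, 6.2]})

# Double brackets keep each selection as a two-dimensional DataFrame,
# which is the shape scikit-learn expects for its inputs
X = df[["x"]]  # independent variable
Y = df[["y"]]  # dependent variable

print(type(X).__name__, X.shape)  # prints DataFrame (3, 1)
```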
5. Split the data set
We divide the data set into a training part and a test part. The test data set size was chosen to be 20% of the total data set.
Please note that by setting random_state=1, the same data split will occur every time the model is run, resulting in the exact same training and test data sets.
This is useful in situations where you want to further tune the model.
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=1)
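The effect of random_state can be checked directly. This sketch uses a small made-up array (the names features and targets are illustrative) and splits it twice with the same seed:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 rows, 2 columns
features = np.arange(20).reshape(10, 2)
targets = np.arange(10)

# Two splits with the same random_state
a_train, a_test, _, _ = train_test_split(features, targets, test_size=0.2, random_state=1)
b_train, b_test, _, _ = train_test_split(features, targets, test_size=0.2, random_state=1)

print(len(a_train), len(a_test))       # prints 8 2
print(np.array_equal(a_test, b_test))  # prints True
```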
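6. Build the linear regression model
Using the imported linear_model module, we can apply the linear regression algorithm directly, passing in the x and y training variables we obtained above.
lm = linear_model.LinearRegression()
lm.fit(x_train, y_train)
7. Plot our data as a scatter plot
df.plot(kind="scatter", x="x", y="y")
8. Plot our linear regression line
plt.plot(X, lm.predict(X), color="red")
The blue dots represent the data points, while the red line is the best-fit linear regression line drawn by the model. The linear model algorithm always tries to draw the best-fit line so that it predicts the outcome as accurately as possible.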
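The CSV file used above may not be at hand, so here is a self-contained sketch of the same workflow on synthetic data. The data (roughly y = 2x + 1 plus noise) and the names x_values and y_values are assumptions made for illustration. It also shows how the fitted line can be inspected and scored on the held-out test data.

```python
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the x/y data set: y is roughly 2x + 1 plus noise
rng = np.random.default_rng(0)
x_values = rng.uniform(0, 100, size=(50, 1))
y_values = 2.0 * x_values[:, 0] + 1.0 + rng.normal(0, 1.0, size=50)

x_train, x_test, y_train, y_test = train_test_split(
    x_values, y_values, test_size=0.2, random_state=1
)

lm = linear_model.LinearRegression()
lm.fit(x_train, y_train)

slope = lm.coef_[0]                   # estimated b1
intercept = lm.intercept_             # estimated b0
r_squared = lm.score(x_test, y_test)  # R^2 on the held-out test data
print(slope, intercept, r_squared)
```

For a regression model, score returns the coefficient of determination (R^2), so a value close to 1 means the line explains most of the variation in the test data.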
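Logistic Regression
Like linear regression, logistic regression predicts an output value based on input variables. The main difference between the two algorithms is that the output of logistic regression is a categorical (discrete) variable.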
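A minimal sketch of that difference, using a made-up data set (hours studied versus pass/fail): predict returns a discrete class label, while predict_proba exposes the class probabilities the label is derived from.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied and whether the exam was passed (1) or not (0)
hours = np.array([[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression()
clf.fit(hours, passed)

new_students = np.array([[1.0], [4.0]])
labels = clf.predict(new_students)               # discrete class labels
probabilities = clf.predict_proba(new_students)  # one probability per class

print(labels)  # prints [0 1]
print(probabilities)
```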
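For the Python example, we will use logistic regression to classify flowers into different categories/species. The given data set includes multiple features of different flowers.
The purpose of the model is to identify a given flower as Iris-setosa, Iris-versicolor, or Iris-virginica.
The following is a screenshot of the given data set:
Example of logistic regression model using Python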
1. Import the necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')
2. Import the data set
data = pd.read_csv('../input/iris-dataset-logistic-regression/iris.csv')
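3. Choose the dependent and independent variables
For the independent variables (X), we include all available columns except the type column. For the dependent variable (y), we include only the type column.
X = data[['x0','x1','x2','x3','x4']]
y = data[['type']]
4. Split the data set
We split the data set into two parts: 80% for the training data set and 20% for the test data set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)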
5. Run the logistic regression model
We import the logistic regression algorithm from the linear_model module. Then we can fit the X and y training data to the logistic model.
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(random_state=0)
model.fit(X_train, y_train)
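6. Evaluate the model's performance
print(model.score(X_test, y_test))
The reported return value is 0.9845128775509371, which indicates that our model performs well.
Note that the higher the test score, the better the model performs.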
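For a classifier, score returns the mean accuracy on the data it is given. The sketch below checks this against accuracy_score. It uses scikit-learn's built-in iris data rather than the CSV file from the example, so the exact number will differ from the one reported above; max_iter=1000 is an assumption made to avoid convergence warnings.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Built-in iris data, used here as a stand-in for the article's CSV file
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# score() is the fraction of test samples that were classified correctly
score = model.score(X_test, y_test)
manual = accuracy_score(y_test, model.predict(X_test))
print(score, manual)
```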
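7. Plot the graph
import matplotlib.pyplot as plt
# Jupyter notebook magic; omit this line outside a notebook
%matplotlib inline
# Predict the species of the test samples so they can be plotted
pred = model.predict(X_test)
plt.plot(range(len(X_test)), pred, 'o', c='r')
Output graph:
In the logistic plot, the red dots represent the given data points. The points are clearly divided into 3 classes: the virginica, versicolor, and setosa flower species.
Using this technique, the logistic regression model can easily classify flower types based on their position on the graph.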
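Support Vector Machine
The support vector machine (SVM) algorithm is another well-known supervised machine learning model, created by Vladimir Vapnik, that can solve both classification and regression problems. In practice, it is used more often for classification problems.
The SVM algorithm can separate given data points into different groups. After plotting the data, the algorithm draws the most suitable line to divide the data into multiple categories, and in doing so analyzes the relationships in the data.
As shown in the figure below, the drawn line perfectly divides the data set into 2 different groups, blue and green.
The SVM model can draw a line or a hyperplane, depending on the dimensionality of the data. A line can only be used for two-dimensional data sets, meaning data sets with only 2 columns.
If multiple features are used to predict the outcome, more dimensions are needed. When the data set has more than 2 dimensions, the support vector machine model draws a hyperplane.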
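For a two-column data set, the separating line can be read off a fitted model. This is a minimal sketch on made-up points. It sets kernel="linear" so that the line's coefficients are available, which is an assumption for illustration and differs from the default kernel used later in the example.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical two-column data set with two well-separated groups
points = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
                   [4.0, 4.0], [4.0, 5.0], [5.0, 4.0]])
groups = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel makes the separating line directly readable
svm = SVC(kernel="linear")
svm.fit(points, groups)

# The boundary is the set of points where w[0]*x1 + w[1]*x2 + b == 0
w = svm.coef_[0]
b = svm.intercept_[0]

predicted = svm.predict([[0.0, 0.0], [6.0, 6.0]])
print(w, b, predicted)
```

With more than two feature columns, coef_ simply has more entries, and the same equation describes a hyperplane instead of a line.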
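In the support vector machine Python example, we will classify 3 different flower types by species. Our independent variables include all of the flower's features, while the dependent variable is the species the flower belongs to.
The flower species include Iris-setosa, Iris-versicolor, and Iris-virginica.
The following is a screenshot of the data set:
Example of support vector machine model using Python
1. Import the necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
2. Read the given data set
Please note that you should import the data set before performing this step.
data = pd.read_csv('../input/iris-flower-dataset/IRIS.csv')
3. Split the data columns into dependent and independent variables
We take the X values as the independent variables, which contain all columns except the species column.
The dependent variable y contains only the species column, which the model predicts.
X = data.drop('species', axis=1)
y = data['species']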
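4. Split the data set into training and test data sets
We divide the data set into two parts, putting 80% of the data into the training data set and 20% into the test data set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
5. Import the SVM and run the model
We import the support vector machine algorithm. Then we run it using the X and y training data sets obtained in the step above.
from sklearn.svm import SVC
model = SVC()
model.fit(X_train, y_train)
6. Test the model's performance
model.score(X_test, y_test)
To evaluate the model's performance, we use the score function. The X and y test values created in step 4 are passed to the score method.
The returned value is 0.9666666666667, which indicates that the model performs well.
Note that the higher the test score, the better the model performs.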
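A single score does not show which species are confused with each other. As a sketch, a confusion matrix breaks the same result down by class. This uses scikit-learn's built-in iris data as a stand-in for the CSV file, so the numbers are not guaranteed to match the value reported above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Built-in iris data, used here as a stand-in for IRIS.csv
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

model = SVC()
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
# Rows are the true species, columns are the predicted species
matrix = confusion_matrix(y_test, model.predict(X_test), labels=[0, 1, 2])
print(accuracy)
print(matrix)
```

The diagonal of the matrix counts correct predictions, so the sum of the diagonal divided by the total equals the score.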
Other popular supervised machine learning algorithms
Although the linear regression, logistic regression, and SVM algorithms are very reliable, we will also mention some other supervised machine learning algorithms.
1. Decision Tree
Decision Tree Algorithm is a supervised machine learning model that uses a tree structure to make decisions. Decision trees are often used in classification problems where a model can decide which group a given item in a data set belongs to.
Please note that the tree is drawn in an inverted format, with the root at the top and the branches growing downward.
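The tree structure can be printed as text to see the decisions a fitted model makes. This is a minimal sketch using scikit-learn's built-in iris data; max_depth=2 is an assumption chosen only to keep the printed tree short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the printed rules readable
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Each indented level is one step down from the root of the tree
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```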
2. Random Forest
Considered a more complex algorithm, the random forest algorithm achieves its goal by building a large number of decision trees.
This means building multiple decision trees simultaneously, each returning its own result, which are then combined to get a better result.
For classification problems, the random forest model will generate multiple decision trees and classify a given object based on the classification group predicted by the majority of the trees.
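The voting idea can be sketched by asking each tree in a fitted forest for its own prediction and counting the votes. This uses scikit-learn's built-in iris data, and n_estimators=25 is an arbitrary choice. Note that scikit-learn's RandomForestClassifier combines trees by averaging their predicted class probabilities rather than by a strict majority vote, so the two results usually agree but are not guaranteed to.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

forest = RandomForestClassifier(n_estimators=25, random_state=0)
forest.fit(X_train, y_train)

# Ask every individual tree to classify the first test sample
sample = X_test[:1]
votes = np.array([tree.predict(sample)[0] for tree in forest.estimators_]).astype(int)

# Count the votes per class and pick the most common one
vote_counts = np.bincount(votes, minlength=len(forest.classes_))
majority_class = forest.classes_[vote_counts.argmax()]

print(vote_counts, majority_class, forest.predict(sample)[0])
```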
A random forest can reduce the overfitting problem caused by a single tree. The random forest algorithm can also be used for regression, although it may lead to undesirable results.
3. K-Nearest Neighbors
The K-Nearest Neighbors (KNN) algorithm is a supervised machine learning method that sorts the given data into separate groups. This grouping is based on common characteristics between different individuals. The KNN algorithm can be used for both classification and regression problems.
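As a minimal sketch of KNN classification, the example below uses made-up numeric features (weight and height) instead of images, and n_neighbors=3 is an arbitrary choice. A new sample is assigned to the group that most of its 3 closest training samples belong to.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical animals described by [weight_kg, height_cm]
features = np.array([[4.0, 25.0], [5.0, 23.0], [3.5, 24.0],
                     [30.0, 60.0], [25.0, 55.0], [35.0, 65.0]])
labels = np.array(["cat", "cat", "cat", "dog", "dog", "dog"])

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(features, labels)

# Each new animal gets the label shared by most of its 3 nearest neighbors
predictions = knn.predict([[4.5, 26.0], [28.0, 58.0]])
print(predictions)  # prints ['cat' 'dog']
```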
A classic KNN example is classifying animal images into different groups.
Summary
This article introduced supervised machine learning and the two types of problems it can solve. It explained classification and regression problems and gave some examples of each output data type. It then explained in detail what linear regression is and how it works, and provided a specific Python example that predicts the Y value based on the independent X variable. Next, it introduced the logistic regression model and gave an example of a classification model that classifies a given flower into a specific species. The support vector machine algorithm was likewise used to predict which of 3 different flower species a given flower belongs to. Finally, it listed other well-known supervised machine learning algorithms, such as decision trees, random forests, and the K-nearest neighbors algorithm.
Whether you are reading this article for study, for work, or for fun, we think that understanding these algorithms is a good start to getting into the field of machine learning. If you are interested and want to learn more about the field of machine learning, we recommend that you study more deeply how such algorithms work and how such models can be tuned to further improve their performance.
Translator introduction
Cui Hao, 51CTO community editor and senior architect, has 18 years of software development and architecture experience and 10 years of distributed architecture experience. Formerly a technical expert at HP. He is willing to share and has written many popular technical articles with more than 600,000 reads. Author of "Principles and Practice of Distributed Architecture".
Original title: Primary Supervised Learning Algorithms Used in Machine Learning, Author: Kevin Vu