Table of Contents
LightGBM algorithm
Principle of LightGBM
Features of LightGBM
Efficiency
Accuracy
Scalability
Ease of use
Import library
Load data
Divided data
Parameter settings
Random search parameter adjustment
Use the best parameter modeling

LightGBM in practice + random search parameter tuning: 96.67% accuracy

Jun 08, 2024 pm 10:45 PM

Hello everyone, I am Peter~

LightGBM is a classic machine learning algorithm whose background, principles, and characteristics are well worth studying. It is known for high efficiency, scalability, and high accuracy. This article briefly introduces the principles and characteristics of LightGBM, then walks through a worked example that combines LightGBM with random search for parameter tuning.

LightGBM algorithm

In the field of machine learning, Gradient Boosting Machines (GBMs) are a class of powerful ensemble learning algorithms. They add weak learners (usually decision trees) one at a time, with each new learner fitted to reduce the error left by the learners before it, which amounts to minimizing the residuals or, more generally, a loss function. The result is a strong model built from many weak ones.
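The idea of adding weak learners to shrink the remaining error can be sketched in a few lines. This is a minimal illustration for squared-error regression on a single feature, not LightGBM's implementation; the helper names (`fit_stump`, `predict_stump`, `boost`) and the synthetic data are assumptions made for the example.

```python
import numpy as np

def fit_stump(x, residual):
    """Fit a one-split regression stump to the residuals of a 1-D feature."""
    best = None
    for t in np.unique(x)[:-1]:  # skip the max so the right side is never empty
        left, right = residual[x <= t], residual[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]  # (threshold, left_value, right_value)

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Add stumps one by one, each fitted to the current residuals."""
    pred = np.full(len(y), y.mean(), dtype=float)  # start from a constant model
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred  # negative gradient of the squared error
        stump = fit_stump(x, residual)
        pred += learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return stumps, pred

# Synthetic data (an assumption for the demo): a noisy sine curve.
rng = np.random.default_rng(0)
x = rng.uniform(0, 6, size=200)
y = np.sin(x) + rng.normal(0, 0.1, size=200)

stumps, pred = boost(x, y)
mse_start = np.mean((y - y.mean()) ** 2)
mse_end = np.mean((y - pred) ** 2)
print(f"MSE before boosting: {mse_start:.4f}, after: {mse_end:.4f}")
```

Each round only needs the residuals of the current model, which is why the training error cannot increase from one round to the next in this setup.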

In the era of big data, the size of data sets has grown dramatically, and traditional GBMs are difficult to scale effectively due to their high computing and storage costs.

  • For example, a level-wise tree growth strategy produces balanced trees but often weakens the model's discriminative power, while a leaf-wise growth strategy can improve accuracy but overfits easily.
  • In addition, most GBM implementations need to traverse the entire data set to calculate gradients in each iteration, which is inefficient when the amount of data is huge. Therefore, an algorithm that can efficiently process large-scale data while maintaining model accuracy is needed.

To solve these problems, Microsoft developed LightGBM (Light Gradient Boosting Machine), described in a 2017 paper, as a gradient boosting framework that trains faster, uses less memory, and delivers strong performance.

Official documentation: https://lightgbm.readthedocs.io/en/stable/

Principle of LightGBM

1. Decision tree algorithm based on histogram:

  • Principle: LightGBM uses histogram optimization to discretize continuous feature values into a fixed number of bins (the buckets of the histogram), reducing the amount of data that needs to be processed when a node is split.
  • Advantages: This method can increase calculation speed while reducing memory usage.
  • Implementation details: For each feature, the algorithm maintains a histogram to record the statistical information of the feature in different buckets. When performing node splitting, the information of these histograms can be directly utilized without traversing all the data.
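The histogram idea can be illustrated with NumPy. The sketch below is a simplification under stated assumptions: equal-width bins, a gain based only on gradient sums and counts, and a hypothetical function name `best_split_from_histogram`. It is meant to show why the scan runs over bins rather than samples, not to reproduce LightGBM's internals.

```python
import numpy as np

def best_split_from_histogram(feature, grad, n_bins=16):
    """Find a split threshold by scanning bin statistics instead of samples."""
    # Discretize the continuous feature into n_bins equal-width buckets.
    edges = np.linspace(feature.min(), feature.max(), n_bins + 1)
    bin_idx = np.clip(np.digitize(feature, edges[1:-1]), 0, n_bins - 1)

    # One pass over the data builds the per-bin gradient sums and counts.
    grad_sum = np.bincount(bin_idx, weights=grad, minlength=n_bins)
    count = np.bincount(bin_idx, minlength=n_bins)

    total_grad, total_count = grad_sum.sum(), count.sum()
    best_gain, best_bin = -np.inf, None
    left_grad, left_count = 0.0, 0

    # Candidate splits are evaluated per bin boundary: O(n_bins), not O(n_samples).
    for b in range(n_bins - 1):
        left_grad += grad_sum[b]
        left_count += count[b]
        right_grad = total_grad - left_grad
        right_count = total_count - left_count
        if left_count == 0 or right_count == 0:
            continue
        gain = (left_grad ** 2 / left_count
                + right_grad ** 2 / right_count
                - total_grad ** 2 / total_count)
        if gain > best_gain:
            best_gain, best_bin = gain, b

    return edges[best_bin + 1], best_gain

# Demo data (an assumption): gradients change sign around feature value 4.
rng = np.random.default_rng(0)
feature = rng.uniform(0, 10, size=1000)
grad = np.where(feature < 4.0, -1.0, 1.0)

threshold, gain = best_split_from_histogram(feature, grad)
print(f"chosen threshold: {threshold:.3f}, gain: {gain:.1f}")
```

The chosen threshold can only land on a bin edge, which is the price paid for the speed-up: more bins give finer thresholds at higher cost.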

2. Leaf-wise tree growth strategy with depth restriction:

  • Principle: Unlike traditional level-wise growth, the leaf-wise strategy always selects, among all current leaf nodes, the one with the largest split gain and splits it.
  • Advantages: This strategy lets the tree concentrate on the regions of the data where the error is largest, which usually leads to better accuracy.
  • Disadvantages: It can easily lead to overfitting, especially when there is noise in the data.
  • Improvement measures: LightGBM limits overfitting by capping the tree depth (max_depth) and the number of leaves (num_leaves).
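A toy version of leaf-wise growth with a depth cap is sketched below. It works on one feature, uses variance reduction as the split gain, and keeps the splittable leaves in a priority queue so the most profitable one is always split next. The function names and the synthetic data are assumptions for illustration; only the parameter names `num_leaves` and `max_depth` are borrowed from LightGBM.

```python
import heapq
import itertools
import numpy as np

def best_split(x, y):
    """Return (gain, threshold) of the best variance-reducing split, or None."""
    if len(x) < 2:
        return None
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n, total = len(ys), ys.sum()
    left_sum = np.cumsum(ys)[:-1]
    left_n = np.arange(1, n)
    gain = left_sum ** 2 / left_n + (total - left_sum) ** 2 / (n - left_n) - total ** 2 / n
    valid = xs[:-1] < xs[1:]  # cannot split between identical values
    if not valid.any():
        return None
    gain = np.where(valid, gain, -np.inf)
    i = int(np.argmax(gain))
    return gain[i], (xs[i] + xs[i + 1]) / 2

def grow_leaf_wise(x, y, num_leaves=8, max_depth=4):
    """Grow a tree leaf-wise; returns the final leaves as (depth, sample_indices)."""
    finished = []            # leaves that can no longer be split
    heap = []                # splittable leaves, ordered by descending gain
    tie = itertools.count()  # tie-breaker so arrays are never compared

    def push(idx, depth):
        split = best_split(x[idx], y[idx]) if depth < max_depth else None
        if split is None:
            finished.append((depth, idx))
        else:
            heapq.heappush(heap, (-split[0], next(tie), depth, idx, split[1]))

    push(np.arange(len(x)), 0)
    while heap and len(finished) + len(heap) < num_leaves:
        _, _, depth, idx, threshold = heapq.heappop(heap)  # leaf with the largest gain
        push(idx[x[idx] <= threshold], depth + 1)
        push(idx[x[idx] > threshold], depth + 1)

    finished.extend((depth, idx) for _, _, depth, idx, _ in heap)
    return finished

# Demo data (an assumption): a noisy step function.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=300)
y = np.where(x < 3, 0.0, np.where(x < 7, 5.0, 1.0)) + rng.normal(0, 0.2, size=300)

leaves = grow_leaf_wise(x, y, num_leaves=8, max_depth=4)
for depth, idx in sorted(leaves, key=lambda leaf: x[leaf[1]].min()):
    print(f"depth={depth} samples={len(idx)} mean={y[idx].mean():.2f}")
```

Because the next split always goes to the most profitable leaf, the tree can become unbalanced; the depth cap is what stops one branch from growing without bound.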

3. Gradient-based One-Side Sampling (GOSS):

  • Principle: GOSS keeps all samples with large gradients and randomly samples only a portion of the samples with small gradients, reweighting the sampled small-gradient data so the overall distribution is not distorted too much. This reduces the amount of calculation while ensuring that too much information is not lost.
  • Advantages: This method can speed up training without significant loss of accuracy.
  • Application scenarios: Especially suitable for situations with serious data skew.
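The sampling step of GOSS can be sketched as follows. The function `goss_sample` is hypothetical; the names `top_rate` and `other_rate` mirror LightGBM's parameter names, and the reweighting factor (1 - top_rate) / other_rate follows the description in the LightGBM paper. Treat it as an illustration rather than the library's code.

```python
import numpy as np

def goss_sample(grad, top_rate=0.2, other_rate=0.1, seed=0):
    """Return the indices kept by GOSS and the weight applied to each of them."""
    rng = np.random.default_rng(seed)
    n = len(grad)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)

    order = np.argsort(-np.abs(grad))  # largest |gradient| first
    top_idx = order[:n_top]            # keep every large-gradient sample
    # Randomly keep only a fraction of the small-gradient samples.
    other_idx = rng.choice(order[n_top:], size=n_other, replace=False)

    weights = np.ones(n_top + n_other)
    # Up-weight the sampled small-gradient rows to compensate for the ones dropped.
    weights[n_top:] = (1.0 - top_rate) / other_rate
    return np.concatenate([top_idx, other_idx]), weights

# Demo gradients (an assumption): standard normal noise.
grad = np.random.default_rng(1).normal(size=1000)
idx, weights = goss_sample(grad)
print(f"kept {len(idx)} of {len(grad)} samples")
```

With the default rates used here, 30% of the rows are kept for split finding in each iteration.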

4. Exclusive Feature Bundling (EFB):

  • Principle: EFB is a technique that reduces the number of features and improves computational efficiency. It combines mutually exclusive features (i.e. features that are never non-zero at the same time) to reduce feature dimensionality.
  • Advantages: Improved memory usage efficiency and training speed.
  • Implementation details: Mutually exclusive features are merged into a single bundled feature, with each original feature assigned its own range of values inside the bundle, so the algorithm has fewer features to process without losing information.
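A small sketch of the bundling step for two features is shown below. It assumes the features hold non-negative integer bin indices where 0 means "no value", and the function name `bundle_exclusive` is hypothetical. The second feature is shifted past the value range of the first so both remain distinguishable inside the bundle.

```python
import numpy as np

def bundle_exclusive(f_a, f_b):
    """Merge two features that are never non-zero in the same row into one column."""
    if np.any((f_a != 0) & (f_b != 0)):
        raise ValueError("features conflict: both are non-zero in the same row")
    offset = f_a.max()  # shift f_b past the value range of f_a
    return np.where(f_b != 0, f_b + offset, f_a)

a = np.array([1, 0, 3, 0, 0])
b = np.array([0, 2, 0, 0, 1])
bundled = bundle_exclusive(a, b)
print(bundled)  # [1 5 3 0 4]: values 1..3 come from a, values 4..5 from b
```

In practice features are rarely perfectly exclusive, so a real implementation has to decide how many conflicting rows to tolerate; this sketch simply rejects any conflict.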

5. Support parallel and distributed learning:

  • Principle: LightGBM supports multi-threaded learning and can use multiple CPUs for parallel training.
  • Advantages: Significantly improves the training speed on multi-core processors.
  • Scalability: It also supports distributed learning and can use multiple machines to jointly train models.

6. Cache optimization:

  • Principle: Data access patterns are optimized to make better use of the CPU cache and speed up data exchange.
  • Advantages: Especially on large data sets, cache optimization can significantly improve performance.

7. Supports multiple loss functions:

  • Features: In addition to commonly used regression and classification loss functions, LightGBM also supports custom loss functions to meet different business needs.
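A custom loss for a gradient boosting library is usually supplied as a function that returns the first and second derivatives of the loss with respect to the prediction. The sketch below defines such a function for an asymmetric squared error; the factory name `make_asymmetric_mse` and the penalty value are assumptions. LightGBM's scikit-learn estimators accept a two-argument callable of this shape through their `objective` parameter, but the block itself only uses NumPy so it can be run on its own.

```python
import numpy as np

def make_asymmetric_mse(under_penalty=3.0):
    """Build an objective that penalizes under-prediction more than over-prediction."""
    def objective(y_true, y_pred):
        residual = y_pred - y_true
        # Rows where the model predicts too low get a larger weight.
        weight = np.where(residual < 0, under_penalty, 1.0)
        grad = 2.0 * weight * residual  # d(loss)/d(pred) for loss = weight * residual**2
        hess = 2.0 * weight             # second derivative
        return grad, hess
    return objective

objective = make_asymmetric_mse(under_penalty=3.0)
grad, hess = objective(np.array([1.0, 1.0]), np.array([0.0, 2.0]))
print(grad, hess)  # [-6.  2.] [6. 2.]
```

The factory keeps the callable at exactly two arguments, which matters because extra positional parameters could be interpreted by the library as sample weights or group information.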

8. Regularization and pruning:

  • Principle: L1 and L2 regularization terms are provided to control model complexity and avoid overfitting.
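  • Implementation: Tree growth is further constrained by parameters such as the minimum gain required for a split (min_gain_to_split) and the minimum number of samples per leaf (min_data_in_leaf), which also helps prevent overfitting.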
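The effect of L1 and L2 terms on a single leaf can be shown with the standard second-order formulation used in gradient boosting: L1 soft-thresholds the gradient sum and L2 enlarges the denominator. The function below is a sketch of that general formula, an assumption about the technique rather than a transcript of LightGBM's source; `lambda_l1` and `lambda_l2` are named after LightGBM's parameters.

```python
def leaf_value(grad_sum, hess_sum, lambda_l1=0.0, lambda_l2=0.0):
    """Regularized leaf output from the summed gradients and Hessians in the leaf."""
    shrunk = max(abs(grad_sum) - lambda_l1, 0.0)  # L1: soft-threshold the gradient sum
    sign = 1.0 if grad_sum > 0 else -1.0
    return -sign * shrunk / (hess_sum + lambda_l2)  # L2: enlarge the denominator

print(leaf_value(-10.0, 5.0))                 # 2.0, no regularization
print(leaf_value(-10.0, 5.0, lambda_l2=5.0))  # 1.0, L2 shrinks the output
print(leaf_value(-10.0, 5.0, lambda_l1=4.0))  # 1.2, L1 reduces the gradient sum
print(leaf_value(-3.0, 5.0, lambda_l1=4.0))   # 0.0, L1 zeroes out a weak leaf
```

Both terms pull leaf outputs toward zero, so each tree contributes a more conservative correction.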

9. Model interpretability:

  • Features: Because it is a model based on decision trees, LightGBM has good model interpretability and can understand the decision-making logic of the model through feature importance and other methods.

Features of LightGBM

Efficiency

  • Speed advantage: Through histogram optimization and the leaf-wise growth strategy, LightGBM greatly improves training speed while maintaining accuracy.
  • Memory usage: LightGBM requires less memory than other GBM implementations, which allows it to handle larger data sets.

Accuracy

  • Best-first growth strategy: The leaf-wise growth strategy adopted by LightGBM can fit the data more closely and usually achieves better accuracy than level-wise growth.
  • Methods to avoid overfitting: By limiting tree depth and constraining which splits are allowed, LightGBM can avoid overfitting while improving model accuracy.

Scalability

  • Parallel and distributed learning: LightGBM is designed to support multi-threading and distributed computing, which allows it to fully utilize the computing power of modern hardware.
  • Multi-platform support: LightGBM can run on multiple operating systems such as Windows, macOS, and Linux, and supports multiple programming languages such as Python, R, and Java.

Ease of use

  • Parameter tuning: LightGBM provides a wealth of parameter options to facilitate users to adjust according to specific problems.
  • Continued training: Users can continue training from an existing model (for example through the init_model argument) to speed up their modeling process.
  • Model interpretation tools: LightGBM provides feature importance evaluation tools to help users understand the decision-making process of the model.

Import library

In [1]:

import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
import warnings
warnings.filterwarnings("ignore")

Load data

Load the public iris data set:

In [2]:

# Load the dataset
data = load_iris()
X, y = data.data, data.target
y = [int(i) for i in y]  # convert the labels to plain Python integers

In [3]:

X[:3]

Out[3]:

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2]])

In [4]:

y[:10]

Out[4]:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Divided data

In [5]:

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

Also create a LightGBM Dataset (the scikit-learn style code below works on the arrays directly, so this object is not used afterwards):

In [6]:

lgb_train = lgb.Dataset(X_train, label=y_train)

Parameter settings

In [7]:

# Define the parameter search space
param_dist = {
    'boosting_type': ['gbdt', 'dart'],  # boosting type: gradient boosting decision trees (gbdt) or Dropouts meet Multiple Additive Regression Trees (dart)
    'objective': ['binary', 'multiclass'],  # objective: binary or multiclass classification
    'num_leaves': range(20, 150),  # number of leaves
    'learning_rate': [0.01, 0.05, 0.1],  # learning rate
    'feature_fraction': [0.6, 0.8, 1.0],  # fraction of features sampled
    'bagging_fraction': [0.6, 0.8, 1.0],  # fraction of rows sampled
    'bagging_freq': range(0, 80),  # bagging frequency
    'verbose': [-1]  # training log verbosity; -1 suppresses the output
}

Random search parameter adjustment

In [8]:

# Initialize the model
model = lgb.LGBMClassifier()

# Tune the parameters with random search
random_search = RandomizedSearchCV(
    estimator=model,
    param_distributions=param_dist,  # parameter search space
    n_iter=100,
    cv=5,  # 5-fold cross-validation
    verbose=2,
    random_state=42,
    n_jobs=-1
)

# Train the model
random_search.fit(X_train, y_train)

Fitting 5 folds for each of 100 candidates, totalling 500 fits
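To see what random search actually draws, scikit-learn's `ParameterSampler` can be used on its own. The sketch below samples five candidate combinations from a reduced version of the search space; the reduced dictionary and the choice of five draws are assumptions made to keep the output short.

```python
from sklearn.model_selection import ParameterSampler

# A reduced search space, for illustration only.
param_dist = {
    "num_leaves": range(20, 150),
    "learning_rate": [0.01, 0.05, 0.1],
    "feature_fraction": [0.6, 0.8, 1.0],
}

# Each draw is one parameter combination that the search would cross-validate.
candidates = list(ParameterSampler(param_dist, n_iter=5, random_state=42))
for params in candidates:
    print(params)
```

With n_iter=100 and cv=5, as in the cell above, the search evaluates 100 such combinations on 5 folds each, which matches the 500 fits reported in the log.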

Output the best parameter combination:

In [9]:

# Print the best parameters
print("Best parameters found: ", random_search.best_params_)

Best parameters found:  {'verbose': -1, 'objective': 'multiclass', 'num_leaves': 87, 'learning_rate': 0.05, 'feature_fraction': 0.6, 'boosting_type': 'gbdt', 'bagging_freq': 22, 'bagging_fraction': 0.6}

Use the best parameter modeling

In [10]:

# Train the model with the best parameters
best_model = random_search.best_estimator_
best_model.fit(X_train, y_train)

# Predict; LGBMClassifier.predict already returns class labels, so no rounding is needed
y_pred = best_model.predict(X_test)

# Evaluate the model
print('Accuracy: %.4f' % accuracy_score(y_test, y_pred))

Accuracy: 0.9667
