Detailed explanation of FP-Growth algorithm in Python

Jun 09, 2023 pm 08:24 PM
python algorithm fp-growth

The FP-Growth algorithm is a classic frequent pattern mining algorithm. It efficiently finds sets of items that frequently occur together in a data set. This article explains the principle of the FP-Growth algorithm in detail and walks through a Python implementation.

1. Basic Principle of FP-Growth Algorithm

The basic idea of the FP-Growth algorithm is to compress the data set into an FP-Tree (frequent pattern tree) and then mine the frequent itemsets directly from that tree. The FP-Tree is a compact data structure that keeps all the information needed to count supports, so frequent itemsets can be mined without generating candidate itemsets.

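An FP-Tree consists of a root node and tree nodes. The root node holds no item, while every other node stores an item name and a count of how many transactions share the path from the root down to that node. Nodes that carry the same item are chained together by "node links" (link pointers), and a header table stores the first node of each item's chain, so all occurrences of an item in the tree can be visited quickly.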
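The node structure and node links can be sketched in a few lines of Python. This is a minimal illustration, not the article's implementation: the names `Node`, `insert` and `support`, and the three toy transactions, are assumptions, and each transaction is assumed to be already filtered and sorted by descending support.

```python
class Node:
    """One FP-Tree node: item name, count, parent, children and node link."""

    def __init__(self, name, count, parent):
        self.name = name
        self.count = count
        self.parent = parent
        self.children = {}     # item name -> child Node
        self.node_link = None  # next node in the tree carrying the same item


def insert(root, items, header):
    """Insert one already-sorted transaction below root, sharing prefixes."""
    node = root
    for item in items:
        child = node.children.get(item)
        if child is None:
            child = Node(item, 0, node)
            node.children[item] = child
            # Thread the new node onto the end of the item's node-link chain
            if item in header:
                last = header[item]
                while last.node_link is not None:
                    last = last.node_link
                last.node_link = child
            else:
                header[item] = child
        child.count += 1
        node = child


def support(header, item):
    """Sum the counts of every node reachable through the item's node links."""
    total, node = 0, header.get(item)
    while node is not None:
        total += node.count
        node = node.node_link
    return total


root, header = Node(None, 0, None), {}
for trans in (["a", "e", "p"], ["a", "e"], ["a", "p"]):
    insert(root, trans, header)

print(root.children["a"].count)  # 3
print(support(header, "p"))      # 2
```

All three transactions share the node for `a`, while `p` ends up on two different branches, so its support is obtained by following the node link from the first `p` node to the second.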

The FP-Growth algorithm consists of two parts: building the FP-Tree and mining frequent itemsets.

  1. Building FP-Tree:

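First, scan the data set once to count the support of each item, i.e. the number of transactions that contain it. Then, for each transaction, delete the infrequent items (those whose support is below the minimum support) and sort the remaining items in descending order of support.

Next, insert each processed transaction into the FP-Tree as a path starting at the root, following that sorted order. If a node for the next item already exists under the current node, increase its count; otherwise, create a new node and add it to that item's node-link chain. Transactions that share a prefix therefore share the same nodes.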

  2. Mining frequent itemsets:

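Frequent itemsets are mined from the FP-Tree as follows:

Start from the least frequent item, at the bottom of the FP-Tree. Follow the item's node links and, for each of its nodes, collect the path from that node's parent up to the root, weighted by the node's count. These prefix paths form the item's conditional pattern base: the part of the data set that occurs together with the item. A new FP-Tree, the conditional FP-Tree, is then built recursively from the conditional pattern base, and the frequent itemsets in that tree are searched.

In each conditional FP-Tree, items below the minimum support are removed again. Every remaining item is combined with the current itemset to form a new frequent itemset, and its own conditional FP-Tree is mined recursively. The recursion stops when a conditional FP-Tree is empty, at which point all frequent itemsets have been found.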
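The two steps above can be traced by hand on plain Python lists. The sketch below is a simplified illustration rather than the article's implementation: the helper names `order_transactions` and `conditional_pattern_base` and the threshold `min_support = 2` are assumptions. It collects the conditional pattern base straight from the sorted transactions, which yields the same weighted prefix paths that following the item's node links through the FP-Tree would produce.

```python
from collections import Counter

dataset = [['v', 'a', 'p', 'e', 's'],
           ['b', 'a', 'k', 'e'],
           ['a', 'p', 'p', 'l', 'e', 's'],
           ['d', 'i', 'n', 'n', 'e', 'r']]
min_support = 2  # assumed absolute threshold, for illustration only


def order_transactions(transactions, min_support):
    """Step 1: drop infrequent items and sort each transaction by descending support."""
    # Support = number of transactions containing the item (repeats count once)
    support = Counter(item for trans in transactions for item in set(trans))
    frequent = {item for item, count in support.items() if count >= min_support}
    # Ties are broken alphabetically so every transaction uses the same order
    return [sorted(set(trans) & frequent, key=lambda item: (-support[item], item))
            for trans in transactions]


def conditional_pattern_base(ordered, item):
    """Step 2: the prefix paths that precede `item`, with their counts.

    In an FP-Tree these are the paths from each `item` node up to the root;
    here identical prefixes are merged with a Counter instead of a tree.
    """
    base = Counter()
    for trans in ordered:
        if item in trans:
            prefix = tuple(trans[:trans.index(item)])
            if prefix:
                base[prefix] += 1
    return base


ordered = order_transactions(dataset, min_support)
print(ordered)
# [['e', 'a', 'p', 's'], ['e', 'a'], ['e', 'a', 'p', 's'], ['e']]
print(conditional_pattern_base(ordered, 's'))
# Counter({('e', 'a', 'p'): 2})
```

Only e, a, p and s reach the minimum support. The conditional pattern base of s is the single path e, a, p with count 2, so every frequent itemset containing s is mined from a conditional FP-Tree built on that one path.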

2. Implementation of FP-Growth algorithm

The FP-Growth algorithm can be implemented in Python. The following simple example demonstrates one such implementation.

First, define a data set in which each inner list is one transaction, for example:

dataset = [['v', 'a', 'p', 'e', 's'],
           ['b', 'a', 'k', 'e'],
           ['a', 'p', 'p', 'l', 'e', 's'],
           ['d', 'i', 'n', 'n', 'e', 'r']]

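Then, write a function that counts the support of each item and returns the frequent items in descending order of support, for example:

def create_ordered_items(dataset, min_support):
    # dataset is a list of (items, count) pairs. Count the total weight of
    # the transactions that contain each item; an item repeated inside one
    # transaction (such as 'p' in 'apples') is counted only once.
    item_dict = {}
    for trans, count in dataset:
        for item in set(trans):
            item_dict[item] = item_dict.get(item, 0) + count

    # Keep the frequent items and sort them by descending support
    # (ties are broken by item name so that the order is deterministic)
    frequent = [(item, c) for item, c in item_dict.items() if c >= min_support]
    frequent.sort(key=lambda x: (-x[1], x[0]))
    return [item for item, _ in frequent]

The create_ordered_items function counts how many transactions contain each item, discards the items whose support (an absolute number of transactions) is below min_support, and returns the rest from most to least frequent. Each transaction is passed as an (items, count) pair so that the same function can later be reused on conditional pattern bases, where every path carries the count of the node it came from.

Next, define the tree node and the functions that build the FP-Tree:

class TreeNode:
    def __init__(self, name, count, parent):
        self.name = name          # item name ('Null Set' for the root)
        self.count = count        # number of transactions sharing this path
        self.parent = parent
        self.children = {}        # item name -> child TreeNode
        self.node_link = None     # next node carrying the same item

    def increase_count(self, count):
        self.count += count

def create_tree(dataset, min_support):
    # Frequent items in descending order of support, and the rank of each
    ordered_items = create_ordered_items(dataset, min_support)
    rank = {item: i for i, item in enumerate(ordered_items)}

    # Create the root node
    root_node = TreeNode('Null Set', 0, None)

    # Build the FP-Tree; head_table maps each item to the first node of its node-link chain
    head_table = {}
    for trans, count in dataset:
        # Drop infrequent and repeated items, then sort by descending support
        filtered_items = sorted({item for item in trans if item in rank}, key=rank.get)
        # A transaction without any frequent item adds nothing to the tree
        if filtered_items:
            insert_tree(filtered_items, root_node, head_table, count)

    return root_node, head_table

def insert_tree(items, node, head_table, count):
    first = items[0]
    if first in node.children:
        # The node already exists: increase its count
        node.children[first].increase_count(count)
    else:
        # The node does not exist: create a new child node
        new_node = TreeNode(first, count, node)
        node.children[first] = new_node
        # Append the new node to the end of the item's node-link chain
        if first not in head_table:
            head_table[first] = new_node
        else:
            current_node = head_table[first]
            while current_node.node_link is not None:
                current_node = current_node.node_link
            current_node.node_link = new_node

    if len(items) > 1:
        # Insert the remaining items below this node
        insert_tree(items[1:], node.children[first], head_table, count)

The create_tree function builds the FP-Tree and returns its root node together with the header table (head_table), which maps each frequent item to the first node in its node-link chain.

Finally, write the functions that mine frequent itemsets:

def find_freq_items(head_table, prefix, freq_items, min_support):
    # The support of an item is the sum of the counts along its node-link chain
    supports = {}
    for item, node in head_table.items():
        total = 0
        while node is not None:
            total += node.count
            node = node.node_link
        supports[item] = total

    # Process the items from least to most frequent
    for item in sorted(supports, key=lambda x: (supports[x], x)):
        # Adding the item to the prefix gives a new frequent itemset
        freq_set = prefix + [item]
        freq_items.append((freq_set, supports[item]))
        # Build the item's conditional pattern base
        cond_pat_base = get_cond_pat_base(head_table[item])
        # Build the conditional FP-Tree and mine it recursively
        _, sub_head_table = create_tree(cond_pat_base, min_support)
        if sub_head_table:
            find_freq_items(sub_head_table, freq_set, freq_items, min_support)

def get_cond_pat_base(tree_node):
    # For every node of the item, collect the path from its parent up to
    # (but excluding) the root, weighted by the node's count
    cond_pat_base = []
    while tree_node is not None:
        path = []
        curr = tree_node.parent
        while curr.parent is not None:
            path.append(curr.name)
            curr = curr.parent
        if path:
            cond_pat_base.append((path, tree_node.count))
        tree_node = tree_node.node_link
    return cond_pat_base

def mine_fp_tree(dataset, min_support):
    freq_items = []
    # Build the FP-Tree; every raw transaction has a weight of 1
    root_node, head_table = create_tree([(trans, 1) for trans in dataset], min_support)
    # Mine the frequent itemsets
    find_freq_items(head_table, [], freq_items, min_support)
    return freq_items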

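The mine_fp_tree function ties the steps together: it builds the FP-Tree from the raw transactions, mines it, and returns every frequent itemset as an (itemset, support) pair.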
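Because an FP-Growth implementation is easy to get subtly wrong, it helps to compare its output with a naive reference on small inputs. The sketch below is not FP-Growth: `brute_force_frequent_itemsets` is a hypothetical helper that simply counts the support of every combination of frequent single items, so its cost grows exponentially and it is only suitable for tiny data sets such as the one above.

```python
from itertools import combinations


def brute_force_frequent_itemsets(dataset, min_support):
    """Naive reference: count the support of every combination of frequent items.

    Exponential in the number of frequent single items; for tiny inputs only.
    """
    transactions = [set(trans) for trans in dataset]
    items = sorted({item for trans in transactions for item in trans})
    # An item below the threshold cannot be part of any frequent itemset
    items = [i for i in items if sum(i in t for t in transactions) >= min_support]
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = set(combo)
            count = sum(candidate <= trans for trans in transactions)
            if count >= min_support:
                result[frozenset(combo)] = count
    return result


dataset = [['v', 'a', 'p', 'e', 's'],
           ['b', 'a', 'k', 'e'],
           ['a', 'p', 'p', 'l', 'e', 's'],
           ['d', 'i', 'n', 'n', 'e', 'r']]
expected = brute_force_frequent_itemsets(dataset, 2)
print(len(expected))  # 15
```

With a minimum support of 2, every non-empty subset of {e, a, p, s} is frequent, because the first and third transactions both contain all four items. A correct FP-Growth implementation should return the same 15 itemsets and supports, which can be checked by converting its output with `{frozenset(s): c for s, c in mine_fp_tree(dataset, 2)}`.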

3. Summary

The FP-Growth algorithm is an efficient frequent pattern mining algorithm: by building an FP-Tree, it mines frequent itemsets without generating candidate itemsets. Python is well suited to implementing the FP-Growth algorithm, which lets us implement it quickly and use it in practice to mine frequent itemsets. I hope this article helps you better understand the principle and implementation of the FP-Growth algorithm.
