Collecting Bilibili live-stream danmaku with the asyncio asynchronous coroutine framework

Mar 28, 2017, 03:32 PM
asyncio

This article presents a simple system, built on the asyncio asynchronous coroutine framework, that collects danmaku (barrage) comments from Bilibili live streams. The source code is attached for anyone who needs it.

Preface

Although the title suggests site-wide collection, the system currently collects danmaku around the clock only for the 100 highest-level live rooms.

The danmaku collection system is adapted from my earlier Python version of the Bilibili live Danmakuji. For the detailed protocol analysis, see that previous article.

The live danmaku protocol runs directly over TCP, so it is probably hard for Bilibili to take countermeasures against behavior like mine, although there may well be detection techniques I am not aware of.

I tried connecting to 100 rooms at the same time, and connecting to a single room 100 times, without any problem. With more than 150 connections, the connections get closed.

Selection of live broadcast rooms

For now the selection of live rooms is simple: the system takes the 100 highest-level rooms directly.

This part will change in the future: the system will periodically check http://live.bilibili.com/all for newly started live rooms and add tasks dynamically.

Asynchronous tasks and danmaku storage

The collection system still uses the asyncio asynchronous coroutine framework. For each live room, two tasks are added to the event loop as follows.

danmuji = bilibiliClient(url, self.lock, self.commentq, self.numq)
task1 = asyncio.ensure_future(danmuji.connectServer())
task2 = asyncio.ensure_future(danmuji.HeartbeatLoop())

Starting the heartbeat task HeartbeatLoop from inside connectServer would make the code more elegant. I keep the two separate because I need to maintain a task list, which is described later.

I spent some time choosing how to store the danmaku.

Writing to a database is synchronous I/O, so an insert would block the collection tasks. Asynchronous interfaces such as aiomysql exist, but configuring a database is too much trouble; I want this small system to be easy to deploy.

In the end I chose the built-in sqlite3. Because sqlite3 cannot perform parallel operations, a dedicated thread handles database storage. In another thread, the 100 * 2 tasks collect all danmaku and viewer-count information and put it into the queues commentq and numq. The storage thread wakes every 10 seconds, writes the queued data into sqlite3, and empties the queues.

With threads and coroutines working together, the network tasks are never blocked by database writes.
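As a rough illustration of the two-tasks-per-room pattern, here is a minimal sketch that runs without any network access. `StubClient` and `run_rooms` are hypothetical names invented for this example; the stub only imitates the `connectServer` and `HeartbeatLoop` coroutines of the real `bilibiliClient`, whose internals are not shown in this article.

```python
import asyncio


class StubClient:
    """Hypothetical stand-in for bilibiliClient; performs no network I/O."""

    def __init__(self, url):
        self.url = url
        self.received = 0
        self.heartbeats = 0

    async def connectServer(self):
        # Pretend to receive three danmaku packets, then return.
        for _ in range(3):
            await asyncio.sleep(0.01)
            self.received += 1

    async def HeartbeatLoop(self):
        # Pretend to send heartbeats until the task is cancelled.
        while True:
            await asyncio.sleep(0.005)
            self.heartbeats += 1


async def run_rooms(urls):
    clients = {url: StubClient(url) for url in urls}
    tasks = {}
    for url, client in clients.items():
        # Two independent tasks per room, recorded as a pair.
        tasks[url] = [
            asyncio.ensure_future(client.connectServer()),
            asyncio.ensure_future(client.HeartbeatLoop()),
        ]
    # Wait for every receive task, then stop the matching heartbeats.
    await asyncio.gather(*(pair[0] for pair in tasks.values()))
    for pair in tasks.values():
        pair[1].cancel()
    await asyncio.gather(*(pair[1] for pair in tasks.values()),
                         return_exceptions=True)
    return clients, tasks
```

Calling `asyncio.run(run_rooms(["room/1", "room/2"]))` drives both rooms concurrently on one event loop. Because each task is a separate object, either one can be inspected or cancelled on its own.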
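The storage thread might look roughly like the sketch below. It is not the original code: the table layout, the function names `drain` and `storage_loop`, the `stop` event, and the use of `queue.Queue` for `commentq` are all assumptions made for illustration.

```python
import queue
import sqlite3
import threading


def drain(q):
    """Remove and return everything currently in the queue."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


def storage_loop(db_path, commentq, stop, interval=10.0):
    # The connection is created and used only inside this thread.
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comment (room TEXT, user TEXT, text TEXT)"
    )
    while True:
        # Sleep for `interval` seconds, but wake early on shutdown.
        stopping = stop.wait(interval)
        rows = drain(commentq)
        if rows:
            with conn:  # commit the whole batch as one transaction
                conn.executemany("INSERT INTO comment VALUES (?, ?, ?)", rows)
        if stopping:
            break
    conn.close()
```

The collection tasks only call `commentq.put(...)`, which returns immediately on an unbounded queue, so the event loop never waits on disk I/O. Writing one batch per wake-up also keeps the number of sqlite3 transactions small.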

Handling possible connection failures

The danmaku protocol runs directly over TCP, and the fields of each packet depend closely on one another at the byte level. Once parsing goes wrong, an exception is easily thrown (TCP is a reliable transport, but I think the Bilibili server itself may still send bad data). An automatic reconnection mechanism is therefore necessary.

The asyncio documentation says:

Done means either that a result / exception are available, or that the future was cancelled.

Whether the coroutine returns normally, raises an exception, or is cancelled, its task finishes. You can check this with done().

Each live room has two tasks. The parsing task is the most likely to fail, but its failure does not stop the heartbeat task, so the matching heartbeat task must be found and ended explicitly.
When the tasks are created, a dictionary records the two tasks of each room:
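A small sketch of the three ways a task can end, compared with one that is still running. The coroutine names are made up for this example.

```python
import asyncio


async def returns_normally():
    return "ok"


async def raises():
    raise ValueError("bad packet")


async def runs_forever():
    while True:
        await asyncio.sleep(3600)


async def demo():
    t_ok = asyncio.ensure_future(returns_normally())
    t_err = asyncio.ensure_future(raises())
    t_cancel = asyncio.ensure_future(runs_forever())
    t_running = asyncio.ensure_future(runs_forever())

    await asyncio.sleep(0)  # let every task start
    t_cancel.cancel()
    # Wait until the first three tasks have finished, whatever the outcome.
    await asyncio.gather(t_ok, t_err, t_cancel, return_exceptions=True)

    states = {
        "ok": t_ok.done(),
        "err": t_err.done(),
        "cancelled": t_cancel.done(),
        "running": t_running.done(),
    }
    error = t_err.exception()        # the exception raised inside the task
    was_cancelled = t_cancel.cancelled()
    t_running.cancel()               # clean up before returning
    return states, error, was_cancelled
```

`done()` is true in all three finished cases and false only for the task that is still running, which is why the periodic check needs nothing more than `done()` to notice a dead connection.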

self.tasks[url] = [task1, task2]

While the system runs, the tasks are checked every 10 seconds.

# Restart any room whose receive or heartbeat task has finished.
for url in self.tasks:
  task1, task2 = self.tasks[url]
  if task1.done() or task2.done():
    # Cancel whichever task of the pair is still running.
    if not task1.done():
      task1.cancel()
    if not task2.done():
      task2.cancel()
    # Reconnect with a fresh client and replace the recorded pair.
    danmuji = bilibiliClient(url, self.lock, self.commentq, self.numq)
    task11 = asyncio.ensure_future(danmuji.connectServer())
    task22 = asyncio.ensure_future(danmuji.HeartbeatLoop())
    self.tasks[url] = [task11, task22]

In practice I have seen only one task failure scenario: the streamer's room had been banned, so the live room could not be entered.
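The periodic check can be tried out without a real connection. In the sketch below, `FlakyClient`, `start_room`, `watchdog`, and `main` are hypothetical names; the stub's receive task fails once with a simulated parse error so that the restart path is exercised, and the intervals are shortened from 10 seconds to keep the run fast.

```python
import asyncio


class FlakyClient:
    """Hypothetical stand-in for bilibiliClient: the first connection fails."""

    failures_left = 1  # class-level counter shared by all instances

    async def connectServer(self):
        await asyncio.sleep(0.01)
        if FlakyClient.failures_left > 0:
            FlakyClient.failures_left -= 1
            raise ValueError("simulated parse error")
        await asyncio.sleep(3600)  # a healthy connection stays open

    async def HeartbeatLoop(self):
        while True:
            await asyncio.sleep(0.01)


def start_room(tasks, url):
    client = FlakyClient()
    tasks[url] = [
        asyncio.ensure_future(client.connectServer()),
        asyncio.ensure_future(client.HeartbeatLoop()),
    ]


async def watchdog(tasks, interval, rounds):
    restarts = 0
    for _ in range(rounds):
        await asyncio.sleep(interval)
        for url, pair in list(tasks.items()):
            if any(t.done() for t in pair):
                for t in pair:
                    if not t.done():
                        t.cancel()      # end the surviving partner task
                    elif not t.cancelled():
                        t.exception()   # mark the error as retrieved
                start_room(tasks, url)
                restarts += 1
    return restarts


async def main():
    tasks = {}
    start_room(tasks, "room/1")
    restarts = await watchdog(tasks, interval=0.05, rounds=3)
    healthy = not any(t.done() for t in tasks["room/1"])
    for pair in tasks.values():
        for t in pair:
            t.cancel()
    return restarts, healthy
```

Running `asyncio.run(main())` restarts the room exactly once, after which both replacement tasks keep running. Iterating over `list(tasks.items())` takes a snapshot, so replacing a room's entry inside the loop is safe.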

Conclusion

  1. Bilibili's viewer count is computed from the number of connections to the danmaku server. By manipulating the number of connections, you can inflate a room's viewer count instantly. Is there a business opportunity here?

  2. Over the past few days of running, I found that even when most rooms are not streaming, they still show more than 5 viewers, even in the early morning. I can only guess that other people, like me, are collecting danmaku 24 hours a day.

  3. The top 100 rooms average about 40M of danmaku data per day.

  4. What can the collected danmaku be used for? I have not decided yet; perhaps user behavior analysis -_^
