Detailed explanation on how to collect historical message pages of WeChat public accounts

Jul 07, 2018 pm 05:48 PM
WeChat public account

This article explains how to obtain the history message page of a WeChat public account, which is the entry point for collecting the account's articles. Readers who need it may find it a useful reference.

Collecting WeChat articles works like collecting content from any website: you start from a list page. For WeChat articles, the list page is the "view history messages" page of the official account. Many WeChat collectors found online rely on Sogou search instead. That approach is much simpler, but the content it returns is incomplete, so we still collect from the official account's history message page, which is the most standard and complete source.

Because of WeChat's restrictions, the link that can be copied from the client is incomplete and shows no content when opened in a browser. We therefore use anyproxy, following the method introduced in the previous article, to capture the complete link address of an official account's history message page:

http://mp.weixin.qq.com/mp/getmasssendmsg?__biz=MjM5NDAwMTA2MA==&uin=NzM4MTk1ODgx&key=bf9387c4d02682e186a298a18276d8e0555e3ab51d81ca46de339e6082eb767343bef610edd80c9e1bfda66c2b62751511f7cc091a33a029709e94f0d1604e11220fc099a27b2e2d29db75cc0849d4bf&devicetype=android-17&version=26031c34&lang=zh_CN&nettype=WIFI&ascene=3&pass_ticket=Iox5ZdpRhrSxGYEeopVJwTBP7kZj51GYyEL24AT5Zyx+BoEMdPDBtOun1F/9ENSz&wx_header=1

As mentioned in the previous article, the __biz parameter is the ID of the official account and uin is the user's ID. As far as we can currently tell, a user's uin is the same across all official accounts. The other two important parameters, key and pass_ticket, are additional parameters appended by the WeChat client.
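As a minimal sketch of how these parameters can be pulled out of a captured link, the following PHP splits the query string by hand. The function name `parseHistoryUrl` and the shortened sample URL (with placeholder key and pass_ticket values) are assumptions made for illustration, not part of the original method.

```php
<?php
declare(strict_types=1);

/**
 * Split the query string of a history-page URL into its parameters.
 * parse_str() is avoided on purpose: it decodes "+" as a space, which
 * would corrupt a pass_ticket that contains a literal "+".
 */
function parseHistoryUrl(string $url): array
{
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query) || $query === '') {
        return [];
    }

    $params = [];
    foreach (explode('&', $query) as $pair) {
        // Limit to two parts so "=" padding inside a value (as in __biz) survives.
        $parts = explode('=', $pair, 2);
        $params[$parts[0]] = rawurldecode($parts[1] ?? '');
    }
    return $params;
}

// Shortened sample URL; the key and pass_ticket values are placeholders.
$url = 'http://mp.weixin.qq.com/mp/getmasssendmsg?__biz=MjM5NDAwMTA2MA==&uin=NzM4MTk1ODgx'
     . '&key=PLACEHOLDER_KEY&pass_ticket=abc+def/ghi';
$params = parseHistoryUrl($url);

foreach (['__biz', 'uin', 'key', 'pass_ticket'] as $name) {
    echo $name, ' = ', $params[$name] ?? '(missing)', PHP_EOL;
}
```

Splitting by hand keeps the "+" and "/" characters of pass_ticket intact, which matters if the value is later re-sent to WeChat unchanged.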

So, before this address expires, we can open it in a browser and view the page source to get the article list of history messages. To analyze the content automatically, we can also write a program that accepts a link address whose key and pass_ticket have not yet expired, and then fetches the article list, for example with a PHP program.
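A program of that kind could start from a small fetch helper like the sketch below. The function name `fetchHistoryPage`, the timeout, and the User-Agent string are assumptions for illustration; the article does not say which request headers WeChat actually checks.

```php
<?php
declare(strict_types=1);

/**
 * Download the history-message page for a link address whose key and
 * pass_ticket have not yet expired, and return the raw response body.
 */
function fetchHistoryPage(string $url, int $timeoutSeconds = 10): string
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            'timeout' => $timeoutSeconds,
            // Hypothetical User-Agent; adjust to match the captured request.
            'header'  => "User-Agent: Mozilla/5.0 (Linux; Android 4.2) MicroMessenger\r\n",
        ],
    ]);

    // Suppress the PHP warning and report failure through an exception instead.
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        throw new RuntimeException('Could not fetch: ' . $url);
    }
    return $body;
}

// Usage, with a complete link address captured through anyproxy:
// $html = fetchHistoryPage($capturedUrl);
```

Note that an expired key does not necessarily make the request fail; the response body still has to be checked for the article list.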

Recently, a friend told me that his collection target is a single public account, which makes the batch collection method from the previous article unnecessary. So let's look at how to get the article list from the history message page. By analyzing that list, we can obtain the link address of every article of the official account and then collect the content.

If the certificate is configured correctly, the anyproxy web interface can display https content. The address of the web interface is http://localhost:8002, where localhost can be replaced with your own IP address or domain name. Find the record starting with getmasssendmsg in the list and click it; the details of the record are displayed on the right:

The part in the red box is the complete link address. Prepend the domain name of the WeChat public platform and it can be opened in a browser.

Then scroll down to the end of the HTML content, where a JSON variable holds the list of history message articles:

Copy the value of the msgList variable and run it through a JSON formatting tool. The JSON has the following structure:

{
  "list": [
    {
      "app_msg_ext_info": {
        "author": "",
        "content": "",
        "content_url": "http://mp.weixin.qq.com/s?__biz=MzA5MzEzNDg3MQ==&mid=2652767427&idx=1&sn=37da0d7208283bf90e9a4a536e0af0ea&chksm=8b882dbbbcffa4ad2f0b8a141cc988d16bace564274018e68e5c53ee6f354f8ad56c9b98bade&scene=4#wechat_redirect",
        "copyright_stat": 100,
        "cover": "http://mmbiz.qpic.cn/mmbiz/MofBAcBsJ6X0xGrQ2XK5yQjzwb2eswxkRNBTgLtcqGziaFqwibzvtZAHCDkMeJU1fGZHpjoeibanPJ8rziaq68Akkg/0?wx_fmt=jpeg",
        "digest": "擦亮双眼,远离谣言。",
        "fileid": 505283695,
        "is_multi": 1,
        "multi_app_msg_item_list": [
          {
            "author": "",
            "content": "",
            "content_url": "http://mp.weixin.qq.com/s?__biz=MzA5MzEzNDg3MQ==&mid=2652767427&idx=2&sn=449ef1a874a37fed2429e14f724b56ef&chksm=8b882dbbbcffa4ade48a7932cda4263687e34fca8ea3a5a6233d2589d448b9f6130d3890ce93&scene=4#wechat_redirect",
            "copyright_stat": 100,
            "cover": "http://mmbiz.qpic.cn/mmbiz_png/MofBAcBsJ6XyaIn0qEDSSicBUBZbMYHYrhibia89ZnksCsUiaia2TLI1fyqjclibGa1hw3icP6oXeSpaWMjiabaghHl7yw/0?wx_fmt=png",
            "digest": "12月28日,广州亚运城综合体育馆,内附购票入口~",
            "fileid": 0,
            "source_url": "http://wechat.show.wepiao.com/detail/ff764b0731b7465db03b56b998e1f2b8?detailReferrer=1&from=groupmessage&isappinstalled=0",
            "title": "2017微信公开课Pro版即将召开"
          },
         ... // remaining items omitted
        ],
        "source_url": "",
        "subtype": 9,
        "title": "谣言热榜 | 十一月朋友圈十大谣言"
      },
      "comm_msg_info": {
        "content": "",
        "datetime": 1480933315,
        "fakeid": "3093134871",
        "id": 1000000010,
        "status": 2,
        "type": 49 // a type of 49 indicates an image-text (article) message
      }
    },
   ... // remaining items omitted
  ]
}

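A brief analysis of this JSON (only the important fields are covered; the rest are omitted):

"list": [ // outermost key; appears only once and contains everything else
  { // each pair of braces here is one multi-article or single-article message; put simply, one day's mass send
    "app_msg_ext_info": { // extended information of the image-text message
      "content_url": "link address of the image-text message",
      "cover": "cover image",
      "digest": "summary",
      "is_multi": "whether the message contains multiple articles; the value is 1 or 0",
      "multi_app_msg_item_list": [ // the articles from the second one onward; empty if is_multi=0
        {
          "content_url": "link address of the image-text message",
          "cover": "cover image",
          "digest": "summary",
          "source_url": "address of the Read Original Text link",
          "title": "title of the sub-article"
        },
        ... // remaining items omitted
      ],
      "source_url": "address of the Read Original Text link",
      "title": "headline title"
    },
    "comm_msg_info": { // basic information of the image-text message
      "datetime": "publish time, as a Unix timestamp",
      "type": 49 // a type of 49 indicates an image-text (article) message
    }
  },
  ... // remaining items omitted
]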
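Given the structure above, walking the decoded array to list every article might look like the following sketch. The function name `flattenArticles`, the output keys, and the sample data are hypothetical; only the field names (`list`, `app_msg_ext_info`, `multi_app_msg_item_list`, `comm_msg_info`, `type`, `datetime`, `title`, `content_url`) come from the JSON shown here.

```php
<?php
declare(strict_types=1);

/**
 * Turn a decoded msgList array into a flat list of articles,
 * each with its title, link address and publish time (UTC).
 */
function flattenArticles(array $msgList): array
{
    $articles = [];
    foreach ($msgList['list'] ?? [] as $message) {
        $info = $message['comm_msg_info'] ?? [];
        $ext  = $message['app_msg_ext_info'] ?? null;

        // Only type 49 (image-text messages) carries article data.
        if ((int) ($info['type'] ?? 0) !== 49 || !is_array($ext)) {
            continue;
        }

        $publishedAt = gmdate('Y-m-d H:i:s', (int) ($info['datetime'] ?? 0));

        // Headline article first, then the rest of the same mass send.
        $items = array_merge([$ext], $ext['multi_app_msg_item_list'] ?? []);
        foreach ($items as $item) {
            $articles[] = [
                'title'        => $item['title'] ?? '',
                'url'          => $item['content_url'] ?? '',
                'published_at' => $publishedAt,
            ];
        }
    }
    return $articles;
}

// Made-up sample data in the same shape as the JSON above.
$msgList = [
    'list' => [
        [
            'app_msg_ext_info' => [
                'title'       => 'Headline',
                'content_url' => 'http://mp.weixin.qq.com/s?idx=1',
                'is_multi'    => 1,
                'multi_app_msg_item_list' => [
                    ['title' => 'Second', 'content_url' => 'http://mp.weixin.qq.com/s?idx=2'],
                ],
            ],
            'comm_msg_info' => ['datetime' => 1480933315, 'type' => 49],
        ],
        // A message of another type, which is skipped.
        ['comm_msg_info' => ['datetime' => 1480933316, 'type' => 1]],
    ],
];

foreach (flattenArticles($msgList) as $article) {
    echo $article['published_at'], ' ', $article['title'], ' ', $article['url'], PHP_EOL;
}
```

The headline article lives directly in `app_msg_ext_info`, so it is merged in front of `multi_app_msg_item_list` rather than expected inside it.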

One more thing: to obtain older history messages, you need to scroll the page down on your phone or emulator. When you reach the bottom, WeChat automatically loads the next page. The link address of the next page, like that of the history message page, starts with getmasssendmsg, but the response is JSON only, not HTML, so you can parse the JSON directly.

At this point you can use the method from the previous article: have anyproxy match the msgList variable value with a regular expression and submit it to the server asynchronously, then use PHP's json_decode on the server to parse the JSON into an array. Looping over the array gives the title and link address of each article.

If you only need to collect a single public account, you can capture the complete link address, including key and pass_ticket, through anyproxy after each day's mass send. Then write your own program and submit the address to it manually. Use a language such as PHP to match msgList with a regular expression and then parse the JSON. This way there is no need to modify anyproxy's rules, nor to build a collection queue and redirect pages.
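The match-then-parse step can be sketched in PHP as follows. This assumes the page embeds the list as `msgList = '...';` with the JSON HTML-escaped inside single quotes; that format and the function name `extractMsgList` are assumptions, so adjust the pattern to what the captured page really contains.

```php
<?php
declare(strict_types=1);

/**
 * Extract the msgList variable from the history page HTML and decode it.
 * Returns the decoded array, or null if the variable is missing or invalid.
 */
function extractMsgList(string $html): ?array
{
    // Assumed page format: msgList = '<HTML-escaped JSON>';
    if (!preg_match("/msgList\s*=\s*'(.*?)';/s", $html, $matches)) {
        return null;
    }

    // Undo HTML escaping such as &quot; before handing the text to json_decode.
    $json = html_entity_decode($matches[1], ENT_QUOTES);
    $data = json_decode($json, true);

    return is_array($data) ? $data : null;
}

// Made-up page fragment in the assumed format.
$html = '<script>var msgList = \'{&quot;list&quot;:[{&quot;comm_msg_info&quot;:'
      . '{&quot;datetime&quot;:1480933315,&quot;type&quot;:49}}]}\';</script>';

$msgList = extractMsgList($html);
echo 'messages found: ', count($msgList['list'] ?? []), PHP_EOL;
```

The lazy `.*?` stops at the first `';`, which is enough for a sketch but would cut the value short if that sequence ever appeared inside an article title or digest.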
