一个采集得到信息不全的有关问题
求助一个采集得到信息不全的问题
我要采集这个网站
http://www.tvmao.com/drama/MGxYWA==/episode/0
刚开始的时候,得到的信息是全的,
当采集到一定时候的时候,采集得到的信息只有半了,少了一些文字。
(我然后拿到其它地方用IE打开看的时候,发现先加载了一半文字,过一小会,在加载一半的文字)
(用本地浏览器打开,只有一半的文字)
还请问一下,怎么处理一下。才能获取全部信息。
------解决方案--------------------
有可能这个网站作了防采集处理,同一IP如果访问过频,针对此IP就启动防采集了,这也符合你说的刚开始可以完整采集,时间一长就不行的情况。不过这个还好了,有的网站变态到每次1K字节的间隔输出呢
------解决方案--------------------
------解决方案--------------------
防止采集:
1:用户登录才能访问网站内容
2:利用脚本语言做分页(隐藏分页)
3:防盗链办法(只许可通过本站页面连接查看,如:Request.ServerVariables(“HTTP_REFERER“) )
4:全flash、图片或者pdf来浮现网站内容
5:网站随机接纳不同模版
6:接纳动态不规则的html标签
一旦要同时搜索引擎爬虫和采集器,这是很让人无奈的工作,因为搜索引擎第一步就是采集目标网页内容,这跟采集器原理同样,所以很多防止采集的方法同时也阻碍了搜索引擎对网站的收录,无奈,是吧?以上10条建议虽然不能百分之百防采集,可是几种方法一起适用已经拒绝了一大部分采集器了。

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics











HTTP status code 520 means that the server encountered an unknown error while processing the request and cannot provide more specific information. Used to indicate that an unknown error occurred when the server was processing the request, which may be caused by server configuration problems, network problems, or other unknown reasons. This is usually caused by server configuration issues, network issues, server overload, or coding errors. If you encounter a status code 520 error, it is best to contact the website administrator or technical support team for more information and assistance.

HTTP status code 403 means that the server rejected the client's request. The solution to http status code 403 is: 1. Check the authentication credentials. If the server requires authentication, ensure that the correct credentials are provided; 2. Check the IP address restrictions. If the server has restricted the IP address, ensure that the client's IP address is restricted. Whitelisted or not blacklisted; 3. Check the file permission settings. If the 403 status code is related to the permission settings of the file or directory, ensure that the client has sufficient permissions to access these files or directories, etc.

Understand the meaning of HTTP 301 status code: common application scenarios of web page redirection. With the rapid development of the Internet, people's requirements for web page interaction are becoming higher and higher. In the field of web design, web page redirection is a common and important technology, implemented through the HTTP 301 status code. This article will explore the meaning of HTTP 301 status code and common application scenarios in web page redirection. HTTP301 status code refers to permanent redirect (PermanentRedirect). When the server receives the client's

HTTP Status Code 200: Explore the Meaning and Purpose of Successful Responses HTTP status codes are numeric codes used to indicate the status of a server's response. Among them, status code 200 indicates that the request has been successfully processed by the server. This article will explore the specific meaning and use of HTTP status code 200. First, let us understand the classification of HTTP status codes. Status codes are divided into five categories, namely 1xx, 2xx, 3xx, 4xx and 5xx. Among them, 2xx indicates a successful response. And 200 is the most common status code in 2xx

How to use NginxProxyManager to implement automatic jump from HTTP to HTTPS. With the development of the Internet, more and more websites are beginning to use the HTTPS protocol to encrypt data transmission to improve data security and user privacy protection. Since the HTTPS protocol requires the support of an SSL certificate, certain technical support is required when deploying the HTTPS protocol. Nginx is a powerful and commonly used HTTP server and reverse proxy server, and NginxProxy

Use the http.PostForm function to send a POST request with form data. In the http package of the Go language, you can use the http.PostForm function to send a POST request with form data. The prototype of the http.PostForm function is as follows: funcPostForm(urlstring,dataurl.Values)(resp*http.Response,errerror)where, u

Quick Application: Practical Development Case Analysis of PHP Asynchronous HTTP Download of Multiple Files With the development of the Internet, the file download function has become one of the basic needs of many websites and applications. For scenarios where multiple files need to be downloaded at the same time, the traditional synchronous download method is often inefficient and time-consuming. For this reason, using PHP to download multiple files asynchronously over HTTP has become an increasingly common solution. This article will analyze in detail how to use PHP asynchronous HTTP through an actual development case.

How to use the urllib.request.urlopen() function in Python3.x to send a GET request. In network programming, we often need to obtain data from a remote server by sending an HTTP request. In Python, we can use the urllib.request.urlopen() function in the urllib module to send an HTTP request and get the response returned by the server. This article will introduce how to use
