Home Backend Development PHP Tutorial 一个采集得到信息不全的有关问题

一个采集得到信息不全的有关问题

Jun 13, 2016 am 10:17 AM
http referer request

求助一个采集得到信息不全的问题
我要采集这个网站
http://www.tvmao.com/drama/MGxYWA==/episode/0

刚开始的时候,得到的信息是全的,

当采集到一定时候的时候,采集得到的信息只有半了,少了一些文字。

(我然后拿到其它地方用IE打开看的时候,发现先加载了一半文字,过一小会,在加载一半的文字)
(用本地浏览器打开,只有一半的文字)
还请问一下,怎么处理一下。才能获取全部信息。
















------解决方案--------------------
有可能这个网站作了防采集处理,同一IP如果访问过频,针对此IP就启动防采集了,这也符合你说的刚开始可以完整采集,时间一长就不行的情况。不过这个还好了,有的网站变态到每次1K字节的间隔输出呢
------解决方案--------------------

探讨

这样啊,我该怎么做一下,才能不被防采集呢?
引用:

有可能这个网站作了防采集处理,同一IP如果访问过频,针对此IP就启动防采集了,这也符合你说的刚开始可以完整采集,时间一长就不行的情况。不过这个还好了,有的网站变态到每次1K字节的间隔输出呢

------解决方案--------------------
防止采集:
1:用户登录才能访问网站内容
2:利用脚本语言做分页(隐藏分页)
3:防盗链办法(只许可通过本站页面连接查看,如:Request.ServerVariables(“HTTP_REFERER“) )
4:全flash、图片或者pdf来浮现网站内容
5:网站随机接纳不同模版
6:接纳动态不规则的html标签
一旦要同时搜索引擎爬虫和采集器,这是很让人无奈的工作,因为搜索引擎第一步就是采集目标网页内容,这跟采集器原理同样,所以很多防止采集的方法同时也阻碍了搜索引擎对网站的收录,无奈,是吧?以上10条建议虽然不能百分之百防采集,可是几种方法一起适用已经拒绝了一大部分采集器了。
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Nordhold: Fusion System, Explained
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Mandragora: Whispers Of The Witch Tree - How To Unlock The Grappling Hook
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Java Tutorial
1670
14
PHP Tutorial
1274
29
C# Tutorial
1256
24
What does http status code 520 mean? What does http status code 520 mean? Oct 13, 2023 pm 03:11 PM

HTTP status code 520 means that the server encountered an unknown error while processing the request and cannot provide more specific information. Used to indicate that an unknown error occurred when the server was processing the request, which may be caused by server configuration problems, network problems, or other unknown reasons. This is usually caused by server configuration issues, network issues, server overload, or coding errors. If you encounter a status code 520 error, it is best to contact the website administrator or technical support team for more information and assistance.

What is http status code 403? What is http status code 403? Oct 07, 2023 pm 02:04 PM

HTTP status code 403 means that the server rejected the client's request. The solution to http status code 403 is: 1. Check the authentication credentials. If the server requires authentication, ensure that the correct credentials are provided; 2. Check the IP address restrictions. If the server has restricted the IP address, ensure that the client's IP address is restricted. Whitelisted or not blacklisted; 3. Check the file permission settings. If the 403 status code is related to the permission settings of the file or directory, ensure that the client has sufficient permissions to access these files or directories, etc.

Understand common application scenarios of web page redirection and understand the HTTP 301 status code Understand common application scenarios of web page redirection and understand the HTTP 301 status code Feb 18, 2024 pm 08:41 PM

Understand the meaning of HTTP 301 status code: common application scenarios of web page redirection. With the rapid development of the Internet, people's requirements for web page interaction are becoming higher and higher. In the field of web design, web page redirection is a common and important technology, implemented through the HTTP 301 status code. This article will explore the meaning of HTTP 301 status code and common application scenarios in web page redirection. HTTP301 status code refers to permanent redirect (PermanentRedirect). When the server receives the client's

HTTP 200 OK: Understand the meaning and purpose of a successful response HTTP 200 OK: Understand the meaning and purpose of a successful response Dec 26, 2023 am 10:25 AM

HTTP Status Code 200: Explore the Meaning and Purpose of Successful Responses HTTP status codes are numeric codes used to indicate the status of a server's response. Among them, status code 200 indicates that the request has been successfully processed by the server. This article will explore the specific meaning and use of HTTP status code 200. First, let us understand the classification of HTTP status codes. Status codes are divided into five categories, namely 1xx, 2xx, 3xx, 4xx and 5xx. Among them, 2xx indicates a successful response. And 200 is the most common status code in 2xx

How to use Nginx Proxy Manager to implement automatic jump from HTTP to HTTPS How to use Nginx Proxy Manager to implement automatic jump from HTTP to HTTPS Sep 26, 2023 am 11:19 AM

How to use NginxProxyManager to implement automatic jump from HTTP to HTTPS. With the development of the Internet, more and more websites are beginning to use the HTTPS protocol to encrypt data transmission to improve data security and user privacy protection. Since the HTTPS protocol requires the support of an SSL certificate, certain technical support is required when deploying the HTTPS protocol. Nginx is a powerful and commonly used HTTP server and reverse proxy server, and NginxProxy

Send POST request with form data using http.PostForm function Send POST request with form data using http.PostForm function Jul 25, 2023 pm 10:51 PM

Use the http.PostForm function to send a POST request with form data. In the http package of the Go language, you can use the http.PostForm function to send a POST request with form data. The prototype of the http.PostForm function is as follows: funcPostForm(urlstring,dataurl.Values)(resp*http.Response,errerror)where, u

Quick Application: Practical Development Case Analysis of PHP Asynchronous HTTP Download of Multiple Files Quick Application: Practical Development Case Analysis of PHP Asynchronous HTTP Download of Multiple Files Sep 12, 2023 pm 01:15 PM

Quick Application: Practical Development Case Analysis of PHP Asynchronous HTTP Download of Multiple Files With the development of the Internet, the file download function has become one of the basic needs of many websites and applications. For scenarios where multiple files need to be downloaded at the same time, the traditional synchronous download method is often inefficient and time-consuming. For this reason, using PHP to download multiple files asynchronously over HTTP has become an increasingly common solution. This article will analyze in detail how to use PHP asynchronous HTTP through an actual development case.

How to use the urllib.request.urlopen() function to send a GET request in Python 3.x How to use the urllib.request.urlopen() function to send a GET request in Python 3.x Jul 30, 2023 am 11:28 AM

How to use the urllib.request.urlopen() function in Python3.x to send a GET request. In network programming, we often need to obtain data from a remote server by sending an HTTP request. In Python, we can use the urllib.request.urlopen() function in the urllib module to send an HTTP request and get the response returned by the server. This article will introduce how to use

See all articles