Home Web Front-end HTML Tutorial 当python爬虫遇到10060错误_html/css_WEB-ITnose

当python爬虫遇到10060错误_html/css_WEB-ITnose

Jun 21, 2016 am 08:50 AM

相信做过网站爬虫工作的同学都知道,python的urllib2用起来很方便,使用以下几行代码就可以轻松拿到某个网站的源码:

#coding=utf-8import urllibimport urllib2import reurl = "http://wetest.qq.com"request = urllib2.Request(url)page = urllib2.urlopen(url)html = page.read()print html
Copy after login

最后通过一定的正则匹配,解析返回的响应内容即可拿到你想要的东东。

但这样的方式在办公网和开发网下,处理部分外网站点时则会行不通。

比如: http://tieba.baidu.com/p/2460150866 ,执行时一直报10060的错误码,提示连接失败。

#coding=utf-8import urllibimport urllib2import reurl = "http://tieba.baidu.com/p/2460150866"request = urllib2.Request(url)page = urllib2.urlopen(url)html = page.read()print html
Copy after login

执行后,错误提示截图如下:

为了分析这一问题的原因,撸主采用了如下过程:

1、在浏览器里输入,可以正常打开,说明该站点是可以访问的。

2、同样的脚本放在公司的体验网上运行OK,说明脚本本身没有问题。

通过以上两个步骤,初步判断是公司对于外网的访问策略限制导致的。于是查找了下如何给urllib2设置ProxyHandler代理 ,将代码修改为如下:

#coding=utf-8import urllibimport urllib2import re# The proxy address and port:proxy_info = { 'host' : 'web-proxy.oa.com','port' : 8080 }# We create a handler for the proxyproxy_support = urllib2.ProxyHandler({"http" : "http://%(host)s:%(port)d" % proxy_info})# We create an opener which uses this handler:opener = urllib2.build_opener(proxy_support)# Then we install this opener as the default opener for urllib2:urllib2.install_opener(opener)url = "http://tieba.baidu.com/p/2460150866"request = urllib2.Request(url)page = urllib2.urlopen(url)html = page.read()print html
Copy after login

再次运行,可以拿到所要的Html页面了。到这里就完了么?没有啊!撸主想拿到贴吧里的各种美图,保存在本地,上代码吧:

#coding=utf-8import urllibimport urllib2import re# The proxy address and port:proxy_info = { 'host' : 'web-proxy.oa.com','port' : 8080 }# We create a handler for the proxyproxy_support = urllib2.ProxyHandler({"http" : "http://%(host)s:%(port)d" % proxy_info})# We create an opener which uses this handler:opener = urllib2.build_opener(proxy_support)# Then we install this opener as the default opener for urllib2:urllib2.install_opener(opener)url = "http://tieba.baidu.com/p/2460150866"request = urllib2.Request(url)page = urllib2.urlopen(url)html = page.read()#正则匹配reg = r'src="(.+?\.jpg)" pic_ext'imgre = re.compile(reg)imglist = re.findall(imgre,html)print 'start dowload pic'x = 0for imgurl in imglist:urllib.urlretrieve(imgurl,'pic\\%s.jpg' % x)x = x+1
Copy after login

再次运行,发现还是有报错!尼玛!又是10060报错,我设置了urllib2的代理了啊,为啥还是报错!

于是撸主继续想办法,一定要想拿到贴吧里的各种美图。既然通过正则匹配可以拿到贴吧里的图片的url,为何不手动去调用urllib2.urlopen去打开对应的url,获得对应的response,然后read出对应的图片二进制数据,然后保存图片到本地文件。于是有了下面的代码:

#coding=utf-8import urllibimport urllib2import re# The proxy address and port:proxy_info = { 'host' : 'web-proxy.oa.com','port' : 8080 }# We create a handler for the proxyproxy_support = urllib2.ProxyHandler({"http" : "http://%(host)s:%(port)d" % proxy_info})# We create an opener which uses this handler:opener = urllib2.build_opener(proxy_support)# Then we install this opener as the default opener for urllib2:urllib2.install_opener(opener)url = "http://tieba.baidu.com/p/2460150866"request = urllib2.Request(url)page = urllib2.urlopen(url)html = page.read()#正则匹配reg = r'src="(.+?\.jpg)" pic_ext'imgre = re.compile(reg)imglist = re.findall(imgre,html)x = 0print 'start'for imgurl in imglist:print imgurlresp = urllib2.urlopen(imgurl)respHtml = resp.read()picFile = open('%s.jpg' % x, "wb")picFile.write(respHtml)picFile.close()x = x+1print 'done'
Copy after login

再次运行,发现图片的url按预期的打印出来,并且图片也被保存下来了:

至此,已完成撸主原先要做的目的。哈哈,希望总结的东东对其他小伙伴也有用。

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Nordhold: Fusion System, Explained
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Mandragora: Whispers Of The Witch Tree - How To Unlock The Grappling Hook
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Java Tutorial
1670
14
PHP Tutorial
1276
29
C# Tutorial
1256
24
HTML: The Structure, CSS: The Style, JavaScript: The Behavior HTML: The Structure, CSS: The Style, JavaScript: The Behavior Apr 18, 2025 am 12:09 AM

The roles of HTML, CSS and JavaScript in web development are: 1. HTML defines the web page structure, 2. CSS controls the web page style, and 3. JavaScript adds dynamic behavior. Together, they build the framework, aesthetics and interactivity of modern websites.

The Future of HTML, CSS, and JavaScript: Web Development Trends The Future of HTML, CSS, and JavaScript: Web Development Trends Apr 19, 2025 am 12:02 AM

The future trends of HTML are semantics and web components, the future trends of CSS are CSS-in-JS and CSSHoudini, and the future trends of JavaScript are WebAssembly and Serverless. 1. HTML semantics improve accessibility and SEO effects, and Web components improve development efficiency, but attention should be paid to browser compatibility. 2. CSS-in-JS enhances style management flexibility but may increase file size. CSSHoudini allows direct operation of CSS rendering. 3.WebAssembly optimizes browser application performance but has a steep learning curve, and Serverless simplifies development but requires optimization of cold start problems.

HTML: Building the Structure of Web Pages HTML: Building the Structure of Web Pages Apr 14, 2025 am 12:14 AM

HTML is the cornerstone of building web page structure. 1. HTML defines the content structure and semantics, and uses, etc. tags. 2. Provide semantic markers, such as, etc., to improve SEO effect. 3. To realize user interaction through tags, pay attention to form verification. 4. Use advanced elements such as, combined with JavaScript to achieve dynamic effects. 5. Common errors include unclosed labels and unquoted attribute values, and verification tools are required. 6. Optimization strategies include reducing HTTP requests, compressing HTML, using semantic tags, etc.

The Future of HTML: Evolution and Trends in Web Design The Future of HTML: Evolution and Trends in Web Design Apr 17, 2025 am 12:12 AM

The future of HTML is full of infinite possibilities. 1) New features and standards will include more semantic tags and the popularity of WebComponents. 2) The web design trend will continue to develop towards responsive and accessible design. 3) Performance optimization will improve the user experience through responsive image loading and lazy loading technologies.

HTML vs. CSS vs. JavaScript: A Comparative Overview HTML vs. CSS vs. JavaScript: A Comparative Overview Apr 16, 2025 am 12:04 AM

The roles of HTML, CSS and JavaScript in web development are: HTML is responsible for content structure, CSS is responsible for style, and JavaScript is responsible for dynamic behavior. 1. HTML defines the web page structure and content through tags to ensure semantics. 2. CSS controls the web page style through selectors and attributes to make it beautiful and easy to read. 3. JavaScript controls web page behavior through scripts to achieve dynamic and interactive functions.

HTML vs. CSS and JavaScript: Comparing Web Technologies HTML vs. CSS and JavaScript: Comparing Web Technologies Apr 23, 2025 am 12:05 AM

HTML, CSS and JavaScript are the core technologies for building modern web pages: 1. HTML defines the web page structure, 2. CSS is responsible for the appearance of the web page, 3. JavaScript provides web page dynamics and interactivity, and they work together to create a website with a good user experience.

HTML: Is It a Programming Language or Something Else? HTML: Is It a Programming Language or Something Else? Apr 15, 2025 am 12:13 AM

HTMLisnotaprogramminglanguage;itisamarkuplanguage.1)HTMLstructuresandformatswebcontentusingtags.2)ItworkswithCSSforstylingandJavaScriptforinteractivity,enhancingwebdevelopment.

What is the difference between <strong>, <b> tags and <em>, <i> tags? What is the difference between <strong>, <b> tags and <em>, <i> tags? Apr 28, 2025 pm 05:42 PM

The article discusses the differences between HTML tags , , , and , focusing on their semantic vs. presentational uses and their impact on SEO and accessibility.

See all articles