Home Backend Development Python Tutorial python scan proxy and how to get the available proxy IP example sharing

python scan proxy and how to get the available proxy IP example sharing

Aug 07, 2017 pm 03:36 PM
proxy python how

The following editor will bring you an example of python scanning proxy and obtaining the available proxy IP. The editor thinks it’s pretty good, so I’ll share it with you now and give it as a reference. Let’s follow the editor and take a look.

Today we will write a very practical tool, which is to scan and obtain available proxies

First of all , I first found a website on Baidu: www.xicidaili.com As an example

This website publishes many proxy IPs and ports available at home and abroad

We still proceed as usual. For analysis, let’s scan all domestic proxies first.

Click on the domestic part to review and find that the domestic proxy and directory are the following url:

www.xicidaili.com/nn/x

This x has almost more than 2,000 pages, so it seems that thread processing is required again. . .

As usual, we try to see if we can get the content directly with the simplest requests.get()

returns 503, then we add a simple headers

and return 200, OK

Okay, let’s first analyze the content of the web page and get the content we want

We found that the content containing IP information is within the tag, so we can easily Use bs to obtain the tag content

But we later found that the contents of ip, port, and protocol were in the 2nd, 3rd, and 6th tags of the extracted tags.

So we started to try to write, here is the writing idea:

When processing the page, we first extract the tr tag, and then add the tr tag to the page. Extracting the td tag in the tag

Therefore, two bs operations are used, and str processing is required when using the bs operation for the second time

Because after we obtain tr, we need 2 of them, Things No. 3 and 6,

But when we use the i output by a for loop, we cannot perform group operations

So we simply perform a second operation on the soup of each td separately Then directly extract 2,3,6

After extraction, just add .string to extract the content


r = requests.get(url = url,headers = headers)
 soup = bs(r.content,"html.parser")
 data = soup.find_all(name = 'tr',attrs = {'class':re.compile('|[^odd]')})
 for i in data:

  soup = bs(str(i),'html.parser')
  data2 = soup.find_all(name = 'td')
  ip = str(data2[1].string)
  port = str(data2[2].string)
  types = str(data2[5].string).lower() 

  proxy = {}
  proxy[types] = '%s:%s'%(ip,port)
Copy after login

In this way, we can get it every time in the loop Generate the corresponding proxy dictionary so that we can use the

dictionary to verify the IP availability. One thing to note here is that we have an operation to change the type to lowercase, because it is written in proxies in the get method. The protocol name should be in lowercase, and the webpage captures the content in uppercase, so a case conversion is performed

So what is the idea of ​​verifying the availability of the IP

It is very simple, we use get, add Go to our proxy and request the website:

http://1212.ip138.com/ic.asp

This is a magical website that can return your external network IP


url = 'http://1212.ip138.com/ic.asp'
r = requests.get(url = url,proxies = proxy,timeout = 6)
Copy after login

Here we need to add timeout to remove those agents that wait too long. I set it to 6 seconds

We use one IP Try and analyze the returned page

The returned content is as follows:


<html>

<head>

<meta xxxxxxxxxxxxxxxxxx>

<title> 您的IP地址 </title>

</head>

<body style="margin:0px"><center>您的IP是:[xxx.xxx.xxx.xxx] 来自:xxxxxxxx</center></body></html>
Copy after login

Then we only need to extract the [] The content can be

If our proxy is available, the proxy’s IP

will be returned (the returned address here will still be our local external network IP, although I am not the same) It's very clear, but I excluded this situation. The proxy should still be unavailable)

Then we can make a judgment. If the returned ip is the same as the ip in the proxy dictionary, then the ip is considered available. Agent and write it to the file

This is our idea. Finally, we can process the queue and threading threads

The code above:


#coding=utf-8

import requests
import re
from bs4 import BeautifulSoup as bs
import Queue
import threading 

class proxyPick(threading.Thread):
 def __init__(self,queue):
  threading.Thread.__init__(self)
  self._queue = queue

 def run(self):
  while not self._queue.empty():
   url = self._queue.get()

   proxy_spider(url)

def proxy_spider(url):
 headers = {
   .......
  }

 r = requests.get(url = url,headers = headers)
 soup = bs(r.content,"html.parser")
 data = soup.find_all(name = &#39;tr&#39;,attrs = {&#39;class&#39;:re.compile(&#39;|[^odd]&#39;)})

 for i in data:

  soup = bs(str(i),&#39;html.parser&#39;)
  data2 = soup.find_all(name = &#39;td&#39;)
  ip = str(data2[1].string)
  port = str(data2[2].string)
  types = str(data2[5].string).lower() 


  proxy = {}
  proxy[types] = &#39;%s:%s&#39;%(ip,port)
  try:
   proxy_check(proxy,ip)
  except Exception,e:
   print e
   pass

def proxy_check(proxy,ip):
 url = &#39;http://1212.ip138.com/ic.asp&#39;
 r = requests.get(url = url,proxies = proxy,timeout = 6)

 f = open(&#39;E:/url/ip_proxy.txt&#39;,&#39;a+&#39;)

 soup = bs(r.text,&#39;html.parser&#39;)
 data = soup.find_all(name = &#39;center&#39;)
 for i in data:
  a = re.findall(r&#39;\[(.*?)\]&#39;,i.string)
  if a[0] == ip:
   #print proxy
   f.write(&#39;%s&#39;%proxy+&#39;\n&#39;)
   print &#39;write down&#39;
   
 f.close()

#proxy_spider()

def main():
 queue = Queue.Queue()
 for i in range(1,2288):
  queue.put(&#39;http://www.xicidaili.com/nn/&#39;+str(i))

 threads = []
 thread_count = 10

 for i in range(thread_count):
  spider = proxyPick(queue)
  threads.append(spider)

 for i in threads:
  i.start()

 for i in threads:
  i.join()

 print "It&#39;s down,sir!"

if __name__ == &#39;__main__&#39;:
 main()
Copy after login

In this way we can write all the available proxy IPs provided on the website into the file ip_proxy.txt

The above is the detailed content of python scan proxy and how to get the available proxy IP example sharing. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Nordhold: Fusion System, Explained
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Mandragora: Whispers Of The Witch Tree - How To Unlock The Grappling Hook
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Java Tutorial
1670
14
PHP Tutorial
1276
29
C# Tutorial
1256
24
PHP and Python: Different Paradigms Explained PHP and Python: Different Paradigms Explained Apr 18, 2025 am 12:26 AM

PHP is mainly procedural programming, but also supports object-oriented programming (OOP); Python supports a variety of paradigms, including OOP, functional and procedural programming. PHP is suitable for web development, and Python is suitable for a variety of applications such as data analysis and machine learning.

Choosing Between PHP and Python: A Guide Choosing Between PHP and Python: A Guide Apr 18, 2025 am 12:24 AM

PHP is suitable for web development and rapid prototyping, and Python is suitable for data science and machine learning. 1.PHP is used for dynamic web development, with simple syntax and suitable for rapid development. 2. Python has concise syntax, is suitable for multiple fields, and has a strong library ecosystem.

How to run sublime code python How to run sublime code python Apr 16, 2025 am 08:48 AM

To run Python code in Sublime Text, you need to install the Python plug-in first, then create a .py file and write the code, and finally press Ctrl B to run the code, and the output will be displayed in the console.

PHP and Python: A Deep Dive into Their History PHP and Python: A Deep Dive into Their History Apr 18, 2025 am 12:25 AM

PHP originated in 1994 and was developed by RasmusLerdorf. It was originally used to track website visitors and gradually evolved into a server-side scripting language and was widely used in web development. Python was developed by Guidovan Rossum in the late 1980s and was first released in 1991. It emphasizes code readability and simplicity, and is suitable for scientific computing, data analysis and other fields.

Python vs. JavaScript: The Learning Curve and Ease of Use Python vs. JavaScript: The Learning Curve and Ease of Use Apr 16, 2025 am 12:12 AM

Python is more suitable for beginners, with a smooth learning curve and concise syntax; JavaScript is suitable for front-end development, with a steep learning curve and flexible syntax. 1. Python syntax is intuitive and suitable for data science and back-end development. 2. JavaScript is flexible and widely used in front-end and server-side programming.

Golang vs. Python: Performance and Scalability Golang vs. Python: Performance and Scalability Apr 19, 2025 am 12:18 AM

Golang is better than Python in terms of performance and scalability. 1) Golang's compilation-type characteristics and efficient concurrency model make it perform well in high concurrency scenarios. 2) Python, as an interpreted language, executes slowly, but can optimize performance through tools such as Cython.

Where to write code in vscode Where to write code in vscode Apr 15, 2025 pm 09:54 PM

Writing code in Visual Studio Code (VSCode) is simple and easy to use. Just install VSCode, create a project, select a language, create a file, write code, save and run it. The advantages of VSCode include cross-platform, free and open source, powerful features, rich extensions, and lightweight and fast.

How to run python with notepad How to run python with notepad Apr 16, 2025 pm 07:33 PM

Running Python code in Notepad requires the Python executable and NppExec plug-in to be installed. After installing Python and adding PATH to it, configure the command "python" and the parameter "{CURRENT_DIRECTORY}{FILE_NAME}" in the NppExec plug-in to run Python code in Notepad through the shortcut key "F6".

See all articles