Python crawler [1]: Download girl pictures in batches
The girl pictures section on jandan.net has very high-quality pictures. Today I will share a method for downloading these pictures in batches with Python.
The knowledge and tools you will need:
1. Required: understand basic Python syntax. For this article you only need to know how to work with lists, for...in loops, and how to define functions. You can learn the functions for downloading web pages, parsing them, and saving files as you go.
2. You need to install the third-party library BeautifulSoup4. Installing it with pip is very convenient, and recent versions of Python ship with the pip tool. On Windows, press the Windows+X shortcut to open Command Prompt (Admin) and enter
pip install beautifulsoup4
then press Enter to run it. A message such as "Successfully installed ..." indicates that the installation is complete.
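You can also confirm the installation from Python itself; this quick check is my own suggestion (bs4 is the import name of BeautifulSoup4):
# If this import raises no error, BeautifulSoup4 is installed.
import bs4
print(bs4.__version__)  # prints the installed version number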
No knowledge of HTML is required, but you will still need a browser that can view page source and inspect elements, such as Chrome or Firefox. (If you do not have pip, search for how to install pip.)
To download all the images on more than two thousand web pages, you must first learn to download a single web page. :)
1. Download the web page
The practice URL is: jandan.net/ooxx/page-2397#comments. Use Chrome or Firefox; after opening the page, right-click and choose View Page Source. The web pages we see are rendered by the browser from source code written in HTML, JS, CSS and so on. The image addresses are contained in that source code, so the first step is to download this HTML code.
import urllib.request
url = 'http://jandan.net/ooxx/page-2397#comments'
res = urllib.request.urlopen(url)
What does the function urllib.request.urlopen() do? As its name suggests, it opens a URL. It accepts either a str (what we passed) or a Request object. The return value of this function is always an object that can be used as a context manager and that has its own methods such as geturl(), info(), and getcode(). In fact, we do not have to worry about all of that; we only need to remember that the function takes a URL and returns an object containing all of that URL's information, and that we then work with this object.
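If you are curious about this object, here is a minimal sketch of my own that calls the methods mentioned above on the same URL:
import urllib.request

res = urllib.request.urlopen('http://jandan.net/ooxx/page-2397#comments')
print(res.geturl())   # the URL that was actually opened
print(res.getcode())  # the HTTP status code, e.g. 200
print(res.info())     # the response headers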
Now read the HTML code out of the res object and assign it to the variable html, using the res.read() method:
html = res.read()
Try print(html)
# Part of the output omitted.
At this point you find that the result differs from what you see when you right-click and view the page source. It turns out that read() returns bytes... what on earth is that? Actually, we can parse this return value as it is and get the image addresses from it. But if you want HTML code identical to what you see in the browser, you can change the previous line of code to
html = res.read().decode('utf-8')
Then print(html)
# Part of the output omitted.
OK! Now it looks the same. That is because decode('utf-8') decodes the bytes returned by read() as UTF-8 text. We will still use html = res.read(), though, because it also contains the information we need. So far we have used only four lines of Python code to download the HTML code of the page http://jandan.net/ooxx/page-2397#comments and store it in the variable html, as follows:
# Download the web page
import urllib.request
url = 'http://jandan.net/ooxx/page-2397#comments'
res = urllib.request.urlopen(url)
html = res.read()
2. Parse the address
Next, use BeautifulSoup4 to parse the HTML. How do you find the HTML code that corresponds to a particular picture? Right-click on the page and choose Inspect. The left half of the screen then shows the original web page, and the right half shows the HTML code along with a set of function buttons.
src="//wx2.sinaimg.cn/mw600/66b3de17gy1fdrf0wcuscj20p60zktad.jpg" part is the address of this picture, and src is the source. The style after src is its style, don't worry about it. You can try it out at this time, add http: before src, visit http://wx2.sinaimg.cn/mw600/66b3de17gy1fdrf0wcuscj20p60zktad.jpg and you should be able to see the original picture.
Attributes such as src and max-width come in key-value pairs, which matters for the method we use later to extract the image address.
Look at the code corresponding to the other pictures and you will see that the format is the same: they are all contained in <img> tags.
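To see what "key-value" means in practice, here is a small sketch of my own; the HTML snippet is made up for illustration, not taken from the real page:
from bs4 import BeautifulSoup

# A made-up <img> tag in the same format as the ones on the page.
snippet = '<img src="//wx2.sinaimg.cn/mw600/example.jpg" style="max-width: 480px;">'
tag = BeautifulSoup(snippet, 'html.parser').img
print(tag.attrs)       # all attributes as a dict of key-value pairs
print(tag.get('src'))  # the value stored under the key 'src'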
soup = BeautifulSoup(html, 'html.parser')
This line of code parses html into a soup object that is easy to work with. For example, we can pick out just the <img> tags:
result = soup.find_all('img')
This uses the find_all() method. print(result) shows that the result is a list and that each element contains a src/picture-address key-value pair, but each element also contains other content besides the address.
Use the get() method to extract the address inside the double quotes, and add http: at the beginning.
links = []
for content in result:
    links.append('http:' + content.get('src'))
content.get('src') gets the value stored under the key src in content, that is, the address inside the double quotes. links.append() is the usual way to add an element to a list.
print(links) shows that each element of this list is an image address in quotation marks, as shown below:
# Part of the output omitted.
Open any of these addresses in a browser and you will see the corresponding picture! YO! That means we are down to the final step: downloading them!
The address extraction part is completed. The code is also quite concise, as follows:
# Parse the web page
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
result = soup.find_all('img')
links = []
for content in result:
    links.append('http:' + content.get('src'))
3. Download the pictures
The last step is to visit the addresses in links one by one and download the pictures!
At the beginning of the script, add
import os
First create a photo folder to store the downloaded pictures. The following code creates the photo folder in the directory where this program's .py file is located.
if not os.path.exists('photo'):
    os.makedirs('photo')
We know that links is a list, so it is best to use a loop to download, name, and store the pictures one by one.
i = 0
for link in links:
    i += 1
    filename = 'photo\\' + 'photo' + str(i) + '.png'
    with open(filename, 'w') as file:
        urllib.request.urlretrieve(link, filename)
i is the counter variable, and i += 1 increments it on every pass through the loop.
filename names the picture; in fact it first creates a file with this name and then writes the picture into it. As you can see from the assignment to filename, 'photo\\' means the file goes into the photo folder, and the following 'photo' + str(i) keeps the files in order: after the download finishes they will be named photo1, photo2, photo3, and so on. '.png' is the file extension. Joining strings with the + sign is also common practice in Python.
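As a side note (my own suggestion, not part of the original code), 'photo\\' only works as a path separator on Windows; os.path.join builds the same path in a way that also works on other systems:
import os

i = 1
filename = os.path.join('photo', 'photo' + str(i) + '.png')
print(filename)  # photo\photo1.png on Windows, photo/photo1.png elsewhere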
The last two lines of the loop fetch the image that the address in link points to and store it in filename.
open(filename, 'w') opens the file filename; 'w' means it is opened for writing. In other words, open() here takes two arguments: the file name (a file path) and the mode.
urllib.request.urlretrieve(link, filename) visits the URL link, retrieves a copy, and saves it to filename.
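As another aside, the same download step can also be written without urlretrieve(); this is a minimal sketch of my own (the example address is the one shown earlier), reading the image bytes and writing them out in binary ('wb') mode:
import urllib.request

# Example address from earlier in the article.
link = 'http://wx2.sinaimg.cn/mw600/66b3de17gy1fdrf0wcuscj20p60zktad.jpg'
filename = 'photo1.png'
with urllib.request.urlopen(link) as response, open(filename, 'wb') as file:
    file.write(response.read())  # copy the downloaded bytes into the file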
After writing part 3, click Run! You will then find the photo folder in the directory where the .py file is located, full of the pictures we downloaded~
The complete code is as follows:
import urllib.request
from bs4 import BeautifulSoup
import os

# Download the web page
url = 'http://jandan.net/ooxx/page-2397#comments'
res = urllib.request.urlopen(url)
html = res.read()

# Parse the web page
soup = BeautifulSoup(html, 'html.parser')
result = soup.find_all('img')
links = []
for content in result:
    links.append('http:' + content.get('src'))

# Download and store the pictures
if not os.path.exists('photo'):
    os.makedirs('photo')
i = 0
for link in links:
    i += 1
    filename = 'photo\\' + 'photo' + str(i) + '.png'
    with open(filename, 'w') as file:
        urllib.request.urlretrieve(link, filename)
This small program is written in a procedural style: it runs from top to bottom, with no functions defined, which may be easier for newcomers to understand.
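For comparison, here is a purely illustrative sketch of how the same logic could be wrapped in a function (the name download_page is my own, not from the original):
import os
import urllib.request
from bs4 import BeautifulSoup

def download_page(url, folder='photo'):
    # Download and parse one page, then save every <img> it contains.
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    links = ['http:' + img.get('src') for img in soup.find_all('img') if img.get('src')]
    if not os.path.exists(folder):
        os.makedirs(folder)
    for i, link in enumerate(links, start=1):
        filename = os.path.join(folder, 'photo' + str(i) + '.png')
        urllib.request.urlretrieve(link, filename)

download_page('http://jandan.net/ooxx/page-2397#comments')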
The girl picture links
In http://jandan.net/ooxx/page-2397#comments, only the middle number changes, ranging from 1 to 2XXX.
url = 'http://jandan.net/ooxx/page-'+str(i)+'#comments'
Just change the value of i to download in batches. However, some commenters say that visiting this website too frequently may get your IP blocked. I have not looked into that, so please try it for yourself!
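Here is a sketch of such a loop (the page range and the 2-second pause are my own choices; pausing between requests may lower the risk of being blocked, though I cannot promise it):
import time

for i in range(2390, 2398):
    url = 'http://jandan.net/ooxx/page-' + str(i) + '#comments'
    print('Downloading', url)
    # ...download and parse this page as shown above...
    time.sleep(2)  # wait two seconds before the next page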