
Python crawler code example for choosing a baby name

May 10, 2017 am 11:42 AM
python crawler

Everyone runs into something like this at some point: a matter you never think about until it arrives, and then it turns out to be extremely important and demands a major decision within a short time. That matter is naming your newborn baby. This article shows how to use a Python crawler to help choose a good name for your child. Readers who need it can use it as a reference.

Preface

I believe every parent has been through this: a child has to be named within two weeks of birth, because the name is needed to apply for the birth certificate. Many people are probably like me. I was confused at first. With so many Chinese characters available, I assumed I could pick almost any of them to make a name, but I soon realized it could not be done casually. Whatever I came up with seemed inappropriate, so I searched dictionaries and the Internet and read Tang and Song poetry, the Book of Songs, and even martial arts novels. Yet the names I had spent a long time on often met objections from my family, for example that they were hard to pronounce or sounded the same as a relative's name. I ended up in a cycle of searching and rejecting that only became more confusing.

So we went back to the Internet and searched again, and found many articles such as "A complete list of good baby boy names". These articles list hundreds or thousands of names at once, too many to take in. There are also many websites and apps that test names: you enter a name and get a BaZi (eight characters) score or a WuGe (five grids) score. This is a useful reference, but these sites and apps either require entering names one at a time, offer very few names, cannot meet needs such as requiring a particular character, or start charging. In the end we could not find one that was useful.

So I wanted to write a program like this:

  1. Its main function is to provide names in batches for reference. The names are scored using the baby's birth date and time (BaZi, the eight characters);

  2. The name library can be extended. For example, if you find a batch of good names from the Book of Songs online and want to see how they score, you can add them;

  3. The characters used in the name can be restricted. For example, some family trees impose a generation character: if the current generation is "国", the name must contain the character "国";

  4. Each name in the list is given a score, so that after sorting in descending order you can read the names from high score to low;

In this way you get a list of names that match your child's birth date, your family tree restrictions, and your own preferences, with scores for reference. From there you can go through the names one by one to find one you like. If you have new ideas, you can add new names to the dictionary at any time and recalculate.

Code structure of the program

Code introduction:

  • /chinese-name-score Code root directory

  • /chinese-name-score/main Code directory

  • /chinese-name-score/main/dicts Dictionary file directory

  • /chinese-name-score/main/dicts/names_boys_double.txt Dictionary file, two-character given names for boys

  • /chinese-name-score/main/dicts/names_boys_single.txt Dictionary file, single characters for boys

  • /chinese-name-score/main/dicts/names_girls_single.txt Dictionary file, single characters for girls

  • /chinese-name-score/main/dicts/names_grils_double.txt Dictionary file, two-character given names for girls

  • /chinese-name-score/main/outputs Output data directory

  • /chinese-name-score/main/outputs/names_girls_source_wxy.txt Sample output file

  • /chinese-name-score/main/scripts Some scripts for preprocessing dictionary files

  • /chinese-name-score/main/scripts/unique_file_lines.py Removes duplicate names and blank lines from a dictionary file

  • /chinese-name-score/main/sys_config.py The system configuration of the program, including the target URL to crawl and the dictionary file paths

  • /chinese-name-score/main/user_config.py The user configuration of the program, including the baby's birth year, month, day, and time, gender, and other settings

  • /chinese-name-score/main/get_name_score.py Entry point of the program
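
The preprocessing step described for unique_file_lines.py (removing duplicate names and blank lines) could be sketched roughly as below. This is a minimal sketch, not the repository's actual script: the function name unique_lines is hypothetical, and it is written for Python 3, whereas the original project targets Python 2.

```python
def unique_lines(lines):
    """Return stripped, non-blank lines with duplicates removed.

    The first occurrence of each name is kept, so the original order
    of the dictionary file is preserved.
    """
    seen = set()
    result = []
    for line in lines:
        name = line.strip()
        # Skip blank lines and names that were already collected
        if name and name not in seen:
            seen.add(name)
            result.append(name)
    return result
```

To clean a file, the lines read from it (for example with open(path, encoding="gb18030"), assuming the dictionary files share the GB18030 encoding of the scripts) would be passed through unique_lines and written back one name per line.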


How to use the code:

  1. If there is no required character, open the dictionary files names_boys_double.txt and names_grils_double.txt. You can add name lists you have found yourself here, one name per line, appended at the end;

  2. If there is a required character, open the dictionary files names_boys_single.txt and names_girls_single.txt and add your favorite single characters, one per line, at the end;

  3. Open user_config.py and configure it. See the next section for the configuration items;

  4. Run the script get_name_score.py;

  5. Find your output file in the outputs directory. It can be copied into Excel for sorting and other operations;

Configuration of the program

Program The configuration entry

The configuration of the program is as follows:

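# coding:GB18030
 
"""
Write your configuration here
"""
 
setting = {}
 
# Required character; if set, the single-character dictionary is used, otherwise the two-character dictionary
setting["limit_world"] = "国"
# Family name
setting["name_prefix"] = "李"
# Gender, either 男 (male) or 女 (female)
setting["sex"] = "男"
# Province
setting["area_province"] = "北京"
# City
setting["area_region"] = "海淀"
# Year of birth (Gregorian calendar)
setting['year'] = "2017"
# Month of birth (Gregorian calendar)
setting['month'] = "1"
# Day of birth (Gregorian calendar)
setting['day'] = "11"
# Hour of birth
setting['hour'] = "11"
# Minute of birth
setting['minute'] = "11"
# Name of the output file
setting['output_fname'] = "names_girls_source_xxx.txt"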

The program uses the configuration item setting["limit_world"] to decide automatically whether to use the single-character dictionary or the two-character dictionary:

  1. If this item is set, for example to "国", the program combines it with every character in the single-character dictionary to form names for scoring. Both orders are tried, so both Guohao and Haoguo are calculated;

  2. If this item is left as an empty string, the program only reads the two-character dictionary, *_double.txt.
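
The two rules above can be sketched as a small function. This is only an illustration under stated assumptions: build_candidates and its parameters are hypothetical names, the dictionaries are assumed to be already loaded into lists, and the code is Python 3 rather than the Python 2 of the original project.

```python
def build_candidates(limit_word, single_chars, double_names):
    """Return the given-name candidates that should be scored.

    With a required character, it is paired with every single character
    in both orders; without one, the two-character names are used as is.
    """
    if not limit_word:
        return list(double_names)

    seen = set()
    candidates = []
    for ch in single_chars:
        # Try the required character first and last, e.g. "国X" and "X国"
        for given in (limit_word + ch, ch + limit_word):
            if given not in seen:
                seen.add(given)
                candidates.append(given)
    return candidates
```

The family name from setting["name_prefix"] would then be prepended to each candidate to form the full name that is submitted for scoring.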

Principle of the program

This is a simple crawler. Open the website life.httpcn.com/xingming.asp to take a look: it is a form submitted by POST. Fill in the required parameters and click submit, and a results page opens. The bottom of the results page contains the BaZi (eight characters) score and the WuGe (five grids) score.

To get the scores, the crawler needs to do two things: submit the form automatically and fetch the results page, then extract the scores from the results page.

The first is very simple and can be done with urllib2 (the code is in /chinese-name-score/main/get_name_score.py):

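 # Python 2: requires "import urllib" and "import urllib2" at the top of the file
 post_data = urllib.urlencode(params)
 # Passing a data argument makes urlopen send a POST request
 req = urllib2.urlopen(sys_config.REQUEST_URL, post_data)
 content = req.read()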

Here params is a dict of parameters. Passing it as data submits a POST request, and the result page is then read into content.
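
urllib2 exists only in Python 2. For readers on Python 3, the same POST could be sketched with urllib.request as below. This is an approximation, not the author's code: build_request and fetch_result_page are hypothetical names, the URL is copied from this article rather than read from sys_config.REQUEST_URL, and encoding the form values as GB18030 is an assumption based on the "# coding:GB18030" header of the original scripts.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# URL as given in the article; the real program reads it from sys_config.REQUEST_URL
REQUEST_URL = "http://life.httpcn.com/xingming.asp"


def build_request(params, url=REQUEST_URL):
    """Build a POST request whose body is the form-encoded params dict."""
    # Assumption: the site expects GB18030-encoded form values
    body = urlencode(params, encoding="gb18030").encode("ascii")
    # Supplying data makes urllib send the request as a POST
    return Request(url, data=body)


def fetch_result_page(params, timeout=10):
    """Submit the form and return the raw bytes of the results page."""
    with urlopen(build_request(params), timeout=timeout) as resp:
        return resp.read()
```

Keeping the request construction separate from the network call makes it possible to inspect the encoded body without contacting the site.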

The parameters of params are set as follows:

 params = {}
 
 # Date type: 0 = Gregorian calendar, 1 = lunar calendar
 params['data_type'] = "0"
 params['year'] = "%s" % str(user_config.setting["year"])
 params['month'] = "%s" % str(user_config.setting["month"])
 params['day'] = "%s" % str(user_config.setting["day"])
 params['hour'] = "%s" % str(user_config.setting["hour"])
 params['minute'] = "%s" % str(user_config.setting["minute"])
 params['pid'] = "%s" % str(user_config.setting["area_province"])
 params['cid'] = "%s" % str(user_config.setting["area_region"])
 # Favorable five elements: 0 = automatic analysis, 1 = user-specified
 params['wxxy'] = "0"
 params['xing'] = "%s" % (user_config.setting["name_prefix"])
 params['ming'] = name_postfix
 # Gender: 0 = female, 1 = male
 if user_config.setting["sex"] == "男":
  params['sex'] = "1"
 else:
  params['sex'] = "0"
  
 params['act'] = "submit"
 params['isbz'] = "1"

The second thing is to extract the required scores from the web page. BeautifulSoup4 can do this, and its syntax is also very simple:

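 # Requires "import re" and "from bs4 import BeautifulSoup"; the page is GB18030-encoded
 soup = BeautifulSoup(content, 'html.parser', from_encoding="GB18030")
 full_name = get_full_name(name_postfix)
 
 # print soup.find(string=re.compile(u"姓名五格评分"))
 # Each score sits in a <p class="chaxun_b"> block, in a <b> tag after its label
 for node in soup.find_all("p", class_="chaxun_b"):
  node_cont = node.get_text()
  # 姓名五格评分 is the label of the WuGe (five grids) score
  if u'姓名五格评分' in node_cont:
   name_wuge = node.find(string=re.compile(u"姓名五格评分"))
   result_data['wuge_score'] = name_wuge.next_sibling.b.get_text()
  
  # 姓名八字评分 is the label of the BaZi (eight characters) score
  if u'姓名八字评分' in node_cont:
   name_bazi = node.find(string=re.compile(u"姓名八字评分"))
   result_data['bazi_score'] = name_bazi.next_sibling.b.get_text()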

In this way the HTML is parsed and the BaZi (eight characters) and WuGe (five grids) scores are extracted.

Example of running results

1/1287 李国锦 姓名八字评分=61.5 姓名五格评分=78.6 总分=140.1
2/1287 李国铁 姓名八字评分=61 姓名五格评分=89.7 总分=150.7
3/1287 李国晶 姓名八字评分=21 姓名五格评分=81.6 总分=102.6
4/1287 李鸣国 姓名八字评分=21 姓名五格评分=90.3 总分=111.3
5/1287 李柔国 姓名八字评分=64 姓名五格评分=78.3 总分=142.3
6/1287 李国经 姓名八字评分=21 姓名五格评分=89.8 总分=110.8
7/1287 李国蒂 姓名八字评分=22 姓名五格评分=87.2 总分=109.2
8/1287 李国登 姓名八字评分=21 姓名五格评分=81.6 总分=102.6
9/1287 李略国 姓名八字评分=21 姓名五格评分=83.7 总分=104.7
10/1287 李国添 姓名八字评分=21 姓名五格评分=81.6 总分=102.6
11/1287 李国天 姓名八字评分=22 姓名五格评分=83.7 总分=105.7
12/1287 李国田 姓名八字评分=22 姓名五格评分=93.7 总分=115.7

With these scores we can sort the names, which makes the list a very practical reference.
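
As an alternative to sorting in Excel, the output lines could be sorted directly in Python. The sketch below assumes every line of the output file has exactly the format shown in the sample above; LINE_RE and sort_by_total are hypothetical names, and the code is Python 3.

```python
import re

# Matches lines such as:
# "1/1287 李国锦 姓名八字评分=61.5 姓名五格评分=78.6 总分=140.1"
LINE_RE = re.compile(
    r"^\d+/\d+\s+(?P<name>\S+)\s+"
    r"姓名八字评分=(?P<bazi>[\d.]+)\s+"
    r"姓名五格评分=(?P<wuge>[\d.]+)\s+"
    r"总分=(?P<total>[\d.]+)"
)


def sort_by_total(lines):
    """Return (name, bazi, wuge, total) tuples, highest total score first."""
    rows = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:  # lines that do not match the expected format are skipped
            rows.append((
                m["name"],
                float(m["bazi"]),
                float(m["wuge"]),
                float(m["total"]),
            ))
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

Sorting by row[1] or row[2] instead would rank the names by the BaZi or WuGe score alone.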

Friendly reminder

  1. The scores depend on many factors, such as the time of birth, the required character, and the stroke count of the required character. These conditions mean that some names can never score high, so do not be discouraged by this; just look for the names with relatively high scores;

  2. Currently the program can only crawl one website, at http://life.httpcn.com/xingming.asp;

  3. This list is for reference only. I have read articles pointing out that many celebrities and great figures in history had names with very low scores yet still made great achievements. A name does have some influence, but sometimes a catchy name is the best choice;

  4. After selecting a name from the list, you can search for it on Baidu, Renren, and similar sites, in case a notorious person has the same name or too many people share it;

  5. BaZi (eight characters) scoring is a Chinese tradition, while WuGe (five grids) scoring was invented in Japan in modern times. You can also try the Western zodiac naming method. Strangely, the BaZi and WuGe scores differ greatly between websites, which further shows that they are for reference only;

The code for this article has been uploaded to GitHub.
