Design a code statistics tool using Python

不言

Apr 04, 2018 pm 04:57 PM

python code

This article shows how to design a code statistics tool in Python that reports the number of files, lines of code, comment lines, and blank lines in a project.

Question

Design a program that counts the lines of code in a project, reporting the number of files, lines of code, comment lines, and blank lines. Keep the design flexible so that different parameters select the language to count, for example:

# --type specifies the file type
python counter.py --type python

Output:

files: 10
code_lines: 200
comments: 100
blanks: 20

Analysis

This problem looks simple but is a little tricky to solve. We can simplify it: once we can correctly count the lines in a single file, counting a whole directory is straightforward. The trickiest part is multi-line comments. Taking Python as an example, comment lines come in the following forms:

1. Single-line comments starting with a pound sign

# Single-line comments

2. Multi-line comment delimiters that open and close on the same line

"""This is a multi-line comment"""
'''This is also a multi-line comment'''

3. Multi-line comment delimiters that span several lines

"""
These 3 lines all count as comment lines
"""

Our approach parses the file line by line. Multi-line comments need an extra flag, in_multi_comment, that records whether the current line is inside a multi-line comment. It defaults to False, is set to True when a multi-line comment starts, and is set back to False when the next multi-line comment delimiter is encountered. Every line between the opening delimiter and the next closing delimiter counts as a comment line.

Knowledge points

How to read files correctly, common methods of string processing when reading files, etc.

Simplified version

We iterate step by step. First, implement a simplified version that only handles a single Python file and ignores multi-line comments; anyone who has started learning Python can write it. The key point is that after reading each line, you first call strip() to remove the whitespace and line breaks on both sides of the string.

# -*- coding: utf-8 -*-
"""
Counts a .py file that contains only single-line comments
"""
def parse(path):
    comments = 0
    blanks = 0
    codes = 0
    with open(path, encoding='utf-8') as f:
        for line in f:
            # strip() removes surrounding whitespace and the trailing line break
            line = line.strip()
            if line == "":
                blanks += 1
            elif line.startswith("#"):
                comments += 1
            else:
                codes += 1
    return {"comments": comments, "blanks": blanks, "codes": codes}
if __name__ == '__main__':
    print(parse("xxx.py"))
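To make the strip() step concrete, here is a minimal sketch that classifies a few raw lines the way the simplified version does. The sample lines and the helper name classify are assumptions made for illustration; they are not part of the original program.

```python
# Sample raw lines as they might arrive when iterating over a file
# (illustrative data, not taken from the article).
raw_lines = ["    x = 1\n", "\t# a comment\r\n", "   \n", "\n"]


def classify(line):
    """Classify one raw line as 'blank', 'comment' or 'code'.

    Hypothetical helper: handles single-line comments only.
    """
    stripped = line.strip()  # drops spaces, tabs, '\r' and '\n' on both sides
    if stripped == "":
        return "blank"
    if stripped.startswith("#"):
        return "comment"
    return "code"


kinds = [classify(line) for line in raw_lines]
print(kinds)
```

Without strip(), the indented comment would not start with "#" and a line holding only spaces would not compare equal to the empty string.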

Multi-line comment version

A tool that only handles single-line comments is of little use. Only by also counting multi-line comments does it become a real code counter.

# -*- coding: utf-8 -*-
"""
Can count py files containing multi-line comments
"""
def parse(path):
    in_multi_comment = False  # flag: are we inside a multi-line comment?
    comments = 0
    blanks = 0
    codes = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Blank lines inside a multi-line comment are treated as comments
            if line == "" and not in_multi_comment:
                blanks += 1
            # There are 4 kinds of comment lines
            # 1. Single-line comments starting with a pound sign
            # 2. Multi-line comment delimiters on the same line
            # 3. Lines between multi-line comment delimiters
            elif line.startswith("#") or \
                    (line.startswith('"""') and line.endswith('"""') and len(line) > 3) or \
                    (line.startswith("'''") and line.endswith("'''") and len(line) > 3) or \
                    (in_multi_comment and not (line.startswith('"""') or line.startswith("'''"))):
                comments += 1
            # 4. The start line and end line of a multi-line comment
            elif line.startswith('"""') or line.startswith("'''"):
                in_multi_comment = not in_multi_comment
                comments += 1
            else:
                codes += 1
    return {"comments": comments, "blanks": blanks, "codes": codes}
if __name__ == '__main__':
    print(parse("xxx.py"))

In the fourth case above, when a multi-line comment delimiter is encountered, the key operation is to negate the in_multi_comment flag rather than simply set it to True or False. The first """ flips it to True. The second """ closes the multi-line comment, so negating gives False. A third """ opens a new comment, and negating gives True again, and so on.

So does every other language need its own parsing function? Look closely and the four comment cases reduce to four conditions, because most languages have both single-line and multi-line comments and differ only in the symbols they use.
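The toggling can be traced in isolation. The following is a small sketch, using made-up sample lines, that records the value of the flag after each line is processed; only the delimiter check is kept, and the counting is left out.

```python
# Illustrative input: two multi-line comments with one code line between them.
lines = ['"""', "first comment line", "", '"""', "x = 1", "'''", "second block", "'''"]

in_multi_comment = False
states = []
for line in lines:
    # A line that starts with a delimiter flips the flag; all other lines leave it alone.
    if line.startswith('"""') or line.startswith("'''"):
        in_multi_comment = not in_multi_comment
    states.append(in_multi_comment)

for line, state in zip(lines, states):
    print(repr(line), state)
```

The flag is True after each opening delimiter and False again after the matching closing one, which is why negation is enough and no separate "open" and "close" branches are needed.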

CONF = {"py": {"start_comment": ['"""', "'''"], "end_comment": ['"""', "'''"], "single": "#"},
        "java": {"start_comment": ["/*"], "end_comment": ["*/"], "single": "//"}}
start_comment = CONF.get(extension).get("start_comment")
end_comment = CONF.get(extension).get("end_comment")
# The checks below run for each stripped line inside parse()
cond2 = False
cond3 = False
cond4 = False
for index, item in enumerate(start_comment):
    cond2 = line.startswith(item) and line.endswith(end_comment[index]) and len(line) > len(item)
    if cond2:
        break
for item in end_comment:
    if line.startswith(item):
        cond3 = True
        break
for item in start_comment + end_comment:
    if line.startswith(item):
        cond4 = True
        break
if line == "" and not in_multi_comment:
    blanks += 1
# There are 4 kinds of comment lines
# 1. Single-line comments starting with the single-line symbol
# 2. Multi-line comment delimiters on the same line
# 3. Lines between multi-line comment delimiters
elif line.startswith(CONF.get(extension).get("single")) or cond2 or \
        (in_multi_comment and not cond3):
    comments += 1
# 4. When the delimiters are spread over several lines, the start line and end line
elif cond4:
    in_multi_comment = not in_multi_comment
    comments += 1
else:
    codes += 1

A single configuration constant is enough to record the single-line and multi-line comment symbols of every language; the four conditions (the single-line check plus cond2 to cond4) are derived from it. The remaining task is to parse multiple files, which can be done with os.walk.
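Put together, the configuration-driven checks might look like the following self-contained sketch. It works on a list of lines instead of a file so it can be run directly; the function name count_lines and both sample inputs are assumptions for illustration, while CONF and the four conditions follow the fragment above.

```python
CONF = {
    "py": {"start_comment": ['"""', "'''"], "end_comment": ['"""', "'''"], "single": "#"},
    "java": {"start_comment": ["/*"], "end_comment": ["*/"], "single": "//"},
}


def count_lines(lines, extension):
    """Count comment, blank and code lines using the symbols configured for extension."""
    conf = CONF[extension]
    start_comment, end_comment = conf["start_comment"], conf["end_comment"]
    in_multi_comment = False
    comments = blanks = codes = 0
    for raw in lines:
        line = raw.strip()
        # cond2: a multi-line comment that opens and closes on this line
        cond2 = any(line.startswith(s) and line.endswith(e) and len(line) > len(s)
                    for s, e in zip(start_comment, end_comment))
        # cond3: the line starts with a closing delimiter
        cond3 = any(line.startswith(e) for e in end_comment)
        # cond4: the line starts with any opening or closing delimiter
        cond4 = any(line.startswith(d) for d in start_comment + end_comment)
        if line == "" and not in_multi_comment:
            blanks += 1
        elif line.startswith(conf["single"]) or cond2 or (in_multi_comment and not cond3):
            comments += 1
        elif cond4:
            in_multi_comment = not in_multi_comment
            comments += 1
        else:
            codes += 1
    return {"comments": comments, "blanks": blanks, "codes": codes}


# Illustrative inputs, not taken from the article.
java_source = ["/*", " * Adds two numbers.", " */", "",
               "int add(int a, int b) {", "    // single-line comment",
               "    return a + b;", "}"]
py_source = ['"""Module docstring."""', "", "# setup", "x = 1", "'''", "notes", "", "'''"]

print(count_lines(java_source, "java"))
print(count_lines(py_source, "py"))
```

Like the original, this only recognizes delimiters at the start of a line, so a comment that begins after code on the same line is counted as code.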

import os

def counter(path):
    """
    Count either a directory or a single file
    :param path: path of a directory or a file
    :return: dict with the comments, blanks and codes totals
    """
    if os.path.isdir(path):
        comments, blanks, codes = 0, 0, 0
        list_dirs = os.walk(path)
        for root, dirs, files in list_dirs:
            for f in files:
                file_path = os.path.join(root, f)
                stats = parse(file_path)
                comments += stats.get("comments")
                blanks += stats.get("blanks")
                codes += stats.get("codes")
        return {"comments": comments, "blanks": blanks, "codes": codes}
    else:
        return parse(path)

Of course, there is still a lot of work to do to complete this program, including command-line parsing and counting only the language specified by a parameter.
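One possible way to handle the --type option from the question is the standard-library argparse module. The sketch below is only an assumption about how it could be wired up: the mapping TYPE_TO_EXT, the helpers build_parser and wanted, and the optional positional path argument are all hypothetical and do not appear in the original program.

```python
import argparse

# Hypothetical mapping from the --type value to a file extension.
TYPE_TO_EXT = {"python": "py", "java": "java"}


def build_parser():
    """Build a parser for: python counter.py [path] --type python"""
    parser = argparse.ArgumentParser(description="Count code, comment and blank lines.")
    parser.add_argument("path", nargs="?", default=".",
                        help="file or directory to scan (assumed argument)")
    parser.add_argument("--type", dest="file_type", choices=sorted(TYPE_TO_EXT),
                        default="python", help="language of the files to count")
    return parser


def wanted(filename, file_type):
    """Return True if filename has the extension configured for file_type."""
    return filename.endswith("." + TYPE_TO_EXT[file_type])


# Parse an explicit argument list here; a real script would call parse_args()
# with no arguments so that sys.argv is used.
args = build_parser().parse_args(["--type", "python"])
print(args.path, args.file_type, wanted("counter.py", args.file_type))
```

A directory walk could then skip every file for which wanted() returns False before handing the rest to parse().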

Supplement:

Python implementation of code line counting tool

We often want to count the lines of code in a project, but a more complete counting tool is not that simple to build. Today we look at how to implement a line-counting tool in Python.

Idea:

First get all the files, then count the number of lines of code in each file, and finally add the number of lines.

Functions implemented:

Count the number of lines in each file;
Count the total number of lines;
Measure the running time;
Support specifying the file types to count, excluding file types that should not be counted;
Recursively count the lines of files in a folder, including its subfolders;
Exclude empty lines;

# coding=utf-8
import os
import time
basedir = '/root/script'
filelists = []
# File types to count
whitelist = ['php', 'py']
# Walk the files; os.walk already recurses into every subfolder
def getFile(basedir):
    global filelists
    for parent, dirnames, filenames in os.walk(basedir):
        for filename in filenames:
            ext = filename.split('.')[-1]
            # Only count the specified file types; skip log and cache files
            if ext in whitelist:
                filelists.append(os.path.join(parent, filename))
# Count the lines of one file
def countLine(fname):
    count = 0
    with open(fname) as f:
        for file_line in f:
            if file_line != '' and file_line != '\n':  # filter out empty lines
                count += 1
    print(fname + '----', count)
    return count
if __name__ == '__main__':
    startTime = time.perf_counter()
    getFile(basedir)
    totalline = 0
    for filelist in filelists:
        totalline = totalline + countLine(filelist)
    print('total lines:', totalline)
    print('Done! Cost Time: %0.2f second' % (time.perf_counter() - startTime))

Result:

[root@pythontab script]# python countCodeLine.py
/root/script/test /gametest.php---- 16
/root/script/smtp.php---- 284
/root/script/gametest.php---- 16
/root/script/countCodeLine.py---- 33
/root/script/sendmail.php---- 17
/root/script/test/gametest.php---- 16
total lines: 382
Done! Cost Time: 0.00 second
[root@pythontab script]#

Only php and Python files are counted, which is very convenient.
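The whitelist check in the script takes the text after the last dot, so a file literally named "py" with no extension would also pass. A slightly stricter variant, sketched here under the assumption that case-insensitive matching is acceptable, uses os.path.splitext. The helper name has_wanted_extension is hypothetical.

```python
import os

whitelist = ["php", "py"]


def has_wanted_extension(filename, allowed=whitelist):
    """Return True if the file's extension (without the dot) is in allowed."""
    # splitext returns "" as the extension for names without a dot
    # and for dotfiles such as ".py".
    ext = os.path.splitext(filename)[1]
    return ext.lstrip(".").lower() in allowed


for name in ["smtp.php", "countCodeLine.py", "error.log", "py", "Report.PY"]:
    print(name, has_wanted_extension(name))
```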
