Home Backend Development Python Tutorial Collection | Comprehensive collection of 100 common Pandas basic instructions

Collection | Comprehensive collection of 100 common Pandas basic instructions

Aug 10, 2023 pm 02:42 PM
python pandas


This issue is a summary of the common operating instructions of pandas, mainly involving reading data and saving data,data details Information,Data processing,Data slicing,FilteringSort, Group, Statistics, Table style, etc. A few parts of the content , hope it will be helpful to you.
##1. Import module

import pandas as pd
import numpy as np
Copy after login


2. Read data and save data

2.1 Read data from CSV file, encoding 'gbk'

##2.2 Read the first 6 lines, when the amount of data is relatively large, it can be read-only Take the first n rows
2.3 The first column is used as the row index, ignore the column index
2.4 Ignore the 1/3/5th when reading lines and the last two lines
2.5 Read data from a file or text qualified with delimiters (',')

2.6 Save data

# 2.1 从CSV文件读取数据,编码'gbk'
pd.read_csv(filename, encoding='gbk')

# 2.2 读取前6行,当数据量比较大时,可以只读取前n行
pd.read_csv(filename, encoding='gbk', nrows = 6)

# 2.3 第一列作为行索引,忽略列索引
pd.read_csv(filename, encoding='gbk', header=None, index_col=0)

# 2.4 读取时忽略第1/3/5行和最后两行
pd.read_csv(filename, encoding='gbk', skiprows=[1,3,5], skipfooter=2, engine='python')

# 2.5 从限定分隔符(',')的文件或文本读取数据
pd.read_table(filename, sep=',', encoding='gbk')

# 2.6 保存数据
# 保存为csv文件
df.to_csv('test_ison.csv')
# 保存为xlsx文件
df.to_excel('test_xlsx.xlsx',index=False)
# 保存为ison文件
df.to_json('test_json.txt')
Copy after login

3. 查看数据信息

3.1 查看前n行

3.2 查看后n行
3.3 查看行数和列数
3.4 查看列索引
3.5 查看行索引
3.6 查看索引、数据类型和内存信息

3.7 查看数值型列的汇总统计

3.8 查看每一列的唯一值和计数
# 3.1 查看前n行
df.head(3)

# 3.2 查看后n行
df.tail(3)

# 3.3 查看行数和列数
df.shape

# 3.4查看列索引
df.columns

# 3.5 查看行索引
df.index

# 3.6 查看索引、数据类型和内存信息
df.info()

# 3.7 查看数值型列的汇总统计
df.describe()

# 3.8 查看每一列的唯一值和计数
df.apply(pd.Series.value_counts)
Copy after login


4. 数据处理

4.1 重命名列名

4.2 选择性更改列名

4.3 批量更改索引

4.4 批量更改列名

4.5 Set the name column as row index

4.6 Check which columns contain missing values

4.7 Count the null values ​​in each column

4.8 Delete rows with null values ​​in this column

4.9 Keep only rows with null values ​​in this column

4.10 Remove a row

4.11 Remove a column

4.12 Delete all rows containing null values

4.13 Delete rows with all null values

4.14 Delete all columns containing null values

4.15 Keep rows with at least 6 non-null values

##4.16 Keep columns with at least 11 non-null values

4.17 Row data is filled downward

4.18 Column data is filled right

4.19 Replace all null values ​​with 0

4.20 Cast the data type

4.21 See how many differences there are City

4.22 Single value replacement

4.23 Multi-value replacement

4.24 Multi-value replacement of single value

4.25 To replace a certain column, the display needs to add inplace= True

4.26 Split a column and generate a new Dataframe

4.27 某一列类型转换,注意该列类型要一致,包括(NaN)
# 4.1 重命名列名
df.columns = ['姓名','性别','语文','数学','英语','城市','省份']

# 4.2 选择性更改列名
df.rename(columns={'姓名': '姓--名','性别': '性--别'},inplace=True)

# 4.3 批量更改索引
df.rename(lambda x: x + 10)

# 4.4 批量更改列名
df.rename(columns=lambda x: x + '_1')

# 4.5 设置姓名列为行索引
df.set_index('姓名')

# 4.6 检查哪些列包含缺失值
df.isnull().any()

# 4.7 统计各列空值
df.isnull().sum()

# 4.8 删除本列中空值的行
df[df['数学'].notnull()]
df[~df['数学'].isnull()]

# 4.9 仅保留本列中是空值的行
df[df['数学'].isnull()]
df[~df['数学'].notnull()]

# 4.10 去掉某行
df.drop(0, axis=0)

# 4.11 去掉某列
df.drop('英语', axis=1)

# 4.12 删除所有包含空值的行
df.dropna()

# 4.13 删除行里全都是空值的行
df.dropna(how = 'all')

# 4.14 删除所有包含空值的列
df.dropna(axis=1)

# 4.15 保留至少有6个非空值的行
df.dropna(thresh=6)

# 4.16 保留至少有11个非空值的列
df.dropna(axis=1,thresh=11)

# 4.17 行数据向下填充
df.fillna(method = 'ffill')

# 4.18 列数据向右填充
df.fillna(method = 'ffill',axis=1)

# 4.19 用0替换所有的空值
df.fillna(0)

# 4.20 强制转换数据类型
df_t1 = df.dropna()
df_t1['语文'].astype('int')

# 4.21 查看有多少不同的城市
df['城市'].unique()

# 4.22 单值替换
df.replace('苏州', '南京')

# 4.23 多值替换
df.replace({'苏州':'南京','广州':'深圳'})
df.replace(['苏州','广州'],['南京','深圳'])

# 4.24 多值替换单值
df.replace(['深圳','广州'],'东莞')

# 4.25 替换某列,显示需要加inplace=True
df['城市'] = df['城市'].replace('苏州', '南京')

# 4.26 拆分某列,生成新的Dataframe
df1 = df['姓名'].str.split('-',expand=True)
df1.columns = ['学号','姓名']

# 4.27 某一列类型转换,注意该列类型要一致,包括(NaN)
df1.dropna(inplace = True)
df1['语文'] = df1['语文'].apply(int)
Copy after login
5. 数据切片、筛选

5.1 输出城市为上海

5.2 输出城市为上海或广州

5.3 输出城市名称中含有‘海’字的行

5.4 输出城市名称以‘海’字开头的行

5.5 输出城市名称以‘海’字结尾的行

5.6 Output all names, fill missing values ​​with Null

5.7 Reset index

5.8 First two lines

5.9 The last two lines

5.10 Lines 2-8

5.11 Read every 3 lines

5.12 Lines 2-8, the step size is 2, that is, lines 2/4/6

5.13 Select 'Chinese',' Mathematics', 'English' columns

5.14 Row with student number '001', all columns

5.15 The rows where the student ID is '001' or '003', all columns

5.16 The rows where the student ID is '001' to '009', all columns

5.17 Column index is 'Name', all rows

##5.18 Column index is 'Name' to 'City', all rows

5.19 Lines with Chinese scores greater than 80

5.20 Student numbers and names of people with Chinese scores greater than 80

5.21 Output the scores of each subject of 'Zhaosi' and 'Zhou Qi'

5.22 Select the 2nd row

5.23 Before selection 3 lines

5.24 选取第2行、第4行、第6行

5.25 选取第2列

5.26 选取前3列

5.27 选取第3行的第3列

5.28 选取第1列、第3列和第4列

5.29 选取第2行的第1列、第3列、第4列

5.30 选取前3行的前3列
# 5.1 输出城市为上海
df[df['城市']=='上海']

# 5.2 输出城市为上海或广州
df[df['城市'].isin(['上海','广州'])]

# 5.3 输出城市名称中含有‘海’字的行
df[df['城市'].str.contains("海", na=False)]

# 5.4 输出城市名称以‘海’字开头的行
df[df['城市'].str.startswith("海", na=False)]

# 5.5 输出城市名称以‘海’字结尾的行
df[df['城市'].str.endswith("海", na=False)]

# 5.6 输出所有姓名,缺失值用Null填充
df['姓名'].str.cat(sep='、',na_rep='Null')

# 5.7 重置索引
df2 = df1.set_index('学号')

# 5.8 前两行
df2[:2]

# 5.9 后两行
df2[-2:]

# 5.10 2-8行
df2[2:8]

# 5.11 每隔3行读取
df2[::3]

# 5.12 2-8行,步长为2,即第2/4/6行
df2[2:8:2]

# 5.13 选取'语文','数学','英语'列
df2[['语文','数学','英语']]

# df.loc[]只能使用标签索引,不能使用整数索引,通过便签索引切边进行筛选时,前闭后闭
# 5.14 学号为'001'的行,所有列
df2.loc['001', :]

# 5.15 学号为'001'或'003'的行,所有列
df2.loc[['001','003'], :]

# 5.16 学号为'001'至'009'的行,所有列
df2.loc['001':'009', :]

# 5.17 列索引为'姓名',所有行
df2.loc[:, '姓名']

# 5.18 列索引为'姓名'至‘城市’,所有行
df2.loc[:, '姓名':'城市']

# 5.19 语文成绩大于80的行
df2.loc[df2['语文']>80,:]
df2.loc[df2.loc[:,'语文']>80, :]
df2.loc[lambda df2:df2['语文'] > 80, :]

# 5.20 语文成绩大于80的人的学号和姓名
df2.loc[df2['语文']>80,['姓名','城市']]

# 5.21 输出'赵四'和'周七'的各科成绩
df2.loc[df2['姓名'].isin(['赵四','周七']),['姓名','语文','数学','英语']]

# # df.iloc[]只能使用整数索引,不能使用标签索引,通过整数索引切边进行筛选时,前闭后开
# 5.22 选取第2行
df2.iloc[1, :]

# 5.23 选取前3行
df2.iloc[:3, :]

# 5.24 选取第2行、第4行、第6行
df2.iloc[[1,3,5],:]

# 5.25 选取第2列
df2.iloc[:, 1]

# 5.26 选取前3列
df2.iloc[:, 0:3]

# 5.27 选取第3行的第3列
df2.iloc[3, 3]

# 5.28 选取第1列、第3列和第4列
df2.iloc[:, [0,2,3]]

# 5.29 选取第2行的第1列、第3列、第4列
df2.iloc[1, [0,2,3]]

# 5.30 选取前3行的前3列
df2.iloc[:3, :3]
Copy after login


6. Data sorting

6.1 Reset index

6.2 Sort by Chinese scores in ascending order, default ascending order

6.3 Sort by math scores in descending order

6.4 Arrange first in ascending order of Chinese scores, then in descending order of math scores

6.5 Chinese scores of 80 and above

6.6 Chinese language score of 80 and above and mathematics score of 90 and above

6.7 Chinese language score of 80 and above or mathematics score of 90 and above

6.8 输出成绩100的行和列号

6.9 增加一列“省份-城市”

6.10 增加一列总分

6.11 按照总分、语文、数学、英语成绩依次排序

6.12 新增一列表示学生语文成绩等级的列(优秀、良好、中等、不及格)
# 6.1 重置索引
df_last = df1.reset_index(drop=True)

# 6.2 按照语文成绩升序排序,默认升序排列
df_last.sort_values('语文')

# 6.3 按照数学成绩降序排序
df_last.sort_values('数学', ascending=False)

# 6.4 先按语文成绩升序排列,再按数学成绩降序排列
df_last.sort_values(['语文','数学'], ascending=[True,False])

# 6.5 语文成绩80及以上
df_last[df_last['语文']>=80]
df_last.query('语文 > 80')

# 6.6 语文成绩80及以上以及数学成绩90分及以上
df_last[(df_last['语文']>=80) & (df_last['数学']>=90)]

# 6.7 语文成绩80及以上或数学成绩90分及以上
df_last[(df_last['语文']>=80) | (df_last['数学']>=90)]

# 6.8 输出成绩100的行和列号
row, col = np.where(df_last.values == 100)

# 6.9 增加一列“省份-城市”
df_last['省份-城市'] = df_last['省份'] + '-' + df_last['城市']

# 6.10 增加一列总分
df_last['总分'] = df_last[['语文','数学','英语']].sum(axis = 1)

# 6.11 按照总分、语文、数学、英语成绩依次排序
df_last.sort_values(by =['总分','语文','数学','英语'],ascending=False )

# 6.12 新增一列表示学生语文成绩等级的列(优秀、良好、中等、不及格)
def get_letter_grade(score):
    '''
    定义一个函数,根据分数返回相应的等级
    '''
    if score>=90:
        return '优秀'
    elif score>=80:
        return '良好'
    elif score>=60:
        return '中等'
    else:
        return '不及格'
    
df_last['语文等级'] = df_last['语文'].apply(lambda score: get_letter_grade(score))
Copy after login


7. 数据分组

7.1 一列分组

7.2 多列分组

7.3 Statistical data of each group (horizontal display)

7.4 Statistical data of each group (vertical display)

7.5 View the statistical information of the specified column

7.6 Group size

7.7 Maximum grouping score

7.8 Minimum group score

7.9 Total group score

7.10 Group average score

7.11 Group by province, calculate total English score and average score

7.12 按省份、城市分组计算平均成绩

7.13 不同列不同的计算方法

7.14 性别分别替换为1/0

7.15 增加一列按省份分组的语文平均分

7.16 输出语文成绩最高的男生和女生(groupby默认会去掉空值)

7.17 按列省份、城市进行分组,计算语文、数学、英语成绩最大值的透 视表
# 7.1 一列分组
df2.groupby('省份').groups

# 7.2 多列分组
df2.groupby(['省份','城市']).groups

# 7.3 每组的统计数据(横向显示)
df2.groupby('省份').describe()

# 7.4 每组的统计数据(纵向显示)
df2.groupby('省份').describe().unstack()

# 7.5 查看指定列的统计信息
df2.groupby('省份').describe()['语文']

# 7.6 分组大小
df2.groupby('省份').count()
df2.groupby('省份').agg(np.size)

# 7.7 分组成绩最大值
df2.groupby('省份').max()
df2.groupby('省份').agg(np.max)

# 7.8 分组成绩最小值
df2.groupby('省份').min()
df2.groupby('省份').agg(np.min)

# 7.9 分组成绩总和
df2.groupby('省份').sum()
df2.groupby('省份').agg(np.sum)

# 7.10 分组平均成绩
df2.groupby('省份').mean()
df2.groupby('省份').agg(np.mean)

# 7.11 按省份分组,计算英语成绩总分和平均分
df2.groupby('省份')['英语'].agg([np.sum, np.mean])

# 7.12 按省份、城市分组计算平均成绩
df2.groupby(['省份','城市']).agg(np.mean)

# 7.13 不同列不同的计算方法
df2.groupby('省份').agg({'语文': sum, # 总和
                        '数学': 'count', # 总数
                        '英语':'mean'}) # 平均

# 7.14 性别分别替换为1/0
df2 = df2.dropna()
df2['性别'] = df2['性别'].map({'男':1, '女':0})

# 7.15 增加一列按省份分组的语文平均分
df2['语文平均分'] = df2.groupby('省份')['语文'].transform('mean')

# 7.16 输出语文成绩最高的男生和女生(groupby默认会去掉空值)
def get_max(g):
    df = g.sort_values('语文',ascending=True)
    print(df)
    return df.iloc[-1,:]

df2.groupby('性别').apply(get_max)

# 7.17 按列省份、城市进行分组,计算语文、数学、英语成绩最大值的透 视表
df.pivot_table(index=['省份','城市'], values=['语文','数学','英语'], aggfunc=max)
Copy after login


8. Statistics

##8.1 Data summary statistics

8.2 Number of non-null values ​​in the column

8.3 Minimum value of the column

8.4 Column maximum value

8.5 Column mean

8.6 Column median

8.7 Correlation coefficient between columns

8.8 列的标准差

8.9 语文成绩指标

8.10 三个科目的指标
# 8.1 数据汇总统计
df.describe()

# 8.2 列中非空值的个数
df.count()

# 8.3 列最小值
df.min()

# 8.4 列最大值
df.max()

# 8.5 列均值
df.mean()

# 8.6 列中位数
df.median()

# 8.7 列与列之间的相关系数
df.corr()

# 8.8 列的标准差
df.std()

# 8.9 语文成绩指标
# 对语文列求和
sum0 = df_last['语文'].sum()
# 语文成绩方差
var = df_last['语文'].var()
# 语文成绩标准差
std = df_last['语文'].std()
# 语文平均分
mean = df_last['语文'].mean()

print('语文总分:',sum0)
print('语文平均分:',mean)
print('语文成绩标准差:',std)
print('语文成绩方差:',var)

# 8.10 三个科目的指标
mean = df_last[['语文','数学','英语']].mean()
var  = df_last[['语文','数学','英语']].var()
total = df_last[['语文','数学','英语']].sum()
std = df_last[['语文','数学','英语']].std()
rows = [total,mean,var,std]
# 索引列表
index = ['总分','平均分','方差','标准差']
# 根据指定索引和行构造 DataFrame 对象
df_tmp = pd.DataFrame(rows,index=index)
Copy after login


9. 表格样式

9.1 示例数据

9.2 列重命名

9.3 设置空值背景红色

9.4 Maximum data highlighting

9.5 Minimum data highlighting

9.6 Partial column maximum data highlighting

9.7 Partial column data highlighting (Dataframe is all data)

9.8 Display red if 95 points or more

9.9 Mixed

9.10 Set the background highlight of float type column data greater than 80.0

9.11 Set the background height of rows with math scores greater than 80.0 Bright

9.12 Set the color of row data with math scores greater than 95.0 to red

9.13 显示热度图
# 9.1 示例数据
df = pd.read_csv(filename, encoding='gbk')

# 9.2 列重命名
df.columns = ['姓名','性别','语文','数学','英语','城市','省份']

# 9.3 设置空值背景红色
df.style.highlight_null(null_color = 'red')

# 9.4 最大数据高亮
df.style.highlight_max()

# 9.5 最小数据高亮
df.style.highlight_min()

# 9.6 部分列最大数据高亮
df.style.apply(highlight_max, subset=['语文', '数学'])

# 9.7 部分列数据高亮(Dataframe全为数据)
df3 = df[['语文','数学','英语']]
def highlight_max(s):
    is_max = s == s.max()
    return ['background-color: yellow' if v else '' for v in is_max]

df3.style.apply(highlight_max)

# 9.8 95分以上显示红色
def color_negative_red(val):
    color = 'red' if val > 95.0 else 'black'
    return 'color: %s' % color

df3.style.applymap(color_negative_red)

# 9.9 混合
df3.style.applymap(color_negative_red).apply(highlight_max)

# 9.10 设置float类型列数据大于80.0的背景高亮
yellow_css = 'background-color: yellow'
sfun = lambda x: yellow_css if type(x) == float and x > 80.0 else ''
df3.style.applymap(sfun)

# 9.11 设置数学成绩大于80.0分的行背景高亮
yellow_css = 'background-color: yellow'
sfun = lambda x: [yellow_css]*len(x) if x.数学 > 80.0 else ['']*len(x)
df3.style.apply(sfun, axis=1)

# 9.12 设置数学成绩大于95.0的行数据颜色为红色
def row_color(s):
    if s.数学 > 95:
        return ['color: red']*len(s)
    else:
        return ['']*len(s)

df3.style.apply(row_color, axis=1)

# 9.13 显示热度图
import seaborn as sns
cm = sns.light_palette("green", as_cmap=True)
df3.style.background_gradient(cmap=cm)
Copy after login

The above is the detailed content of Collection | Comprehensive collection of 100 common Pandas basic instructions. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Nordhold: Fusion System, Explained
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Java Tutorial
1664
14
PHP Tutorial
1269
29
C# Tutorial
1248
24
PHP and Python: Different Paradigms Explained PHP and Python: Different Paradigms Explained Apr 18, 2025 am 12:26 AM

PHP is mainly procedural programming, but also supports object-oriented programming (OOP); Python supports a variety of paradigms, including OOP, functional and procedural programming. PHP is suitable for web development, and Python is suitable for a variety of applications such as data analysis and machine learning.

Choosing Between PHP and Python: A Guide Choosing Between PHP and Python: A Guide Apr 18, 2025 am 12:24 AM

PHP is suitable for web development and rapid prototyping, and Python is suitable for data science and machine learning. 1.PHP is used for dynamic web development, with simple syntax and suitable for rapid development. 2. Python has concise syntax, is suitable for multiple fields, and has a strong library ecosystem.

How to run sublime code python How to run sublime code python Apr 16, 2025 am 08:48 AM

To run Python code in Sublime Text, you need to install the Python plug-in first, then create a .py file and write the code, and finally press Ctrl B to run the code, and the output will be displayed in the console.

PHP and Python: A Deep Dive into Their History PHP and Python: A Deep Dive into Their History Apr 18, 2025 am 12:25 AM

PHP originated in 1994 and was developed by RasmusLerdorf. It was originally used to track website visitors and gradually evolved into a server-side scripting language and was widely used in web development. Python was developed by Guidovan Rossum in the late 1980s and was first released in 1991. It emphasizes code readability and simplicity, and is suitable for scientific computing, data analysis and other fields.

Python vs. JavaScript: The Learning Curve and Ease of Use Python vs. JavaScript: The Learning Curve and Ease of Use Apr 16, 2025 am 12:12 AM

Python is more suitable for beginners, with a smooth learning curve and concise syntax; JavaScript is suitable for front-end development, with a steep learning curve and flexible syntax. 1. Python syntax is intuitive and suitable for data science and back-end development. 2. JavaScript is flexible and widely used in front-end and server-side programming.

Golang vs. Python: Performance and Scalability Golang vs. Python: Performance and Scalability Apr 19, 2025 am 12:18 AM

Golang is better than Python in terms of performance and scalability. 1) Golang's compilation-type characteristics and efficient concurrency model make it perform well in high concurrency scenarios. 2) Python, as an interpreted language, executes slowly, but can optimize performance through tools such as Cython.

Where to write code in vscode Where to write code in vscode Apr 15, 2025 pm 09:54 PM

Writing code in Visual Studio Code (VSCode) is simple and easy to use. Just install VSCode, create a project, select a language, create a file, write code, save and run it. The advantages of VSCode include cross-platform, free and open source, powerful features, rich extensions, and lightweight and fast.

How to run python with notepad How to run python with notepad Apr 16, 2025 pm 07:33 PM

Running Python code in Notepad requires the Python executable and NppExec plug-in to be installed. After installing Python and adding PATH to it, configure the command "python" and the parameter "{CURRENT_DIRECTORY}{FILE_NAME}" in the NppExec plug-in to run Python code in Notepad through the shortcut key "F6".

See all articles