DataFrame query methods in pandas
This article walks through the DataFrame query methods in pandas and the points to watch out for when using them, based on a practical example.
pandas provides several ways to slice a DataFrame, and they are easy to confuse if you are not familiar with them. The following examples illustrate each of these slicing methods.
Data introduction
First, generate a set of random data:

In [5]: import random
   ...: import datetime as dt
   ...: import pandas as pd
   ...:
   ...: rnd_1 = [random.randrange(1, 20) for x in range(1000)]
   ...: rnd_2 = [random.randrange(1, 20) for x in range(1000)]
   ...: rnd_3 = [random.randrange(1, 20) for x in range(1000)]
   ...: fecha = pd.date_range('2012-4-10', '2015-1-4')
   ...:
   ...: data = pd.DataFrame({'fecha': fecha, 'rnd_1': rnd_1, 'rnd_2': rnd_2, 'rnd_3': rnd_3})

In [6]: data.describe()
Out[6]:
             rnd_1        rnd_2        rnd_3
count  1000.000000  1000.000000  1000.000000
mean      9.946000     9.825000     9.894000
std       5.553911     5.559432     5.423484
min       1.000000     1.000000     1.000000
25%       5.000000     5.000000     5.000000
50%      10.000000    10.000000    10.000000
75%      15.000000    15.000000    14.000000
max      19.000000    19.000000    19.000000
[] slicing method
Square brackets slice a DataFrame in a way that resembles Python's list slicing. Depending on what you pass, you can select rows, columns, or a block of both.
# Row selection
In [7]: data[1:5]
Out[7]:
       fecha  rnd_1  rnd_2  rnd_3
1 2012-04-11      1     16      3
2 2012-04-12      7      6      1
3 2012-04-13      2     16      7
4 2012-04-14      4     17      7

# Column selection
In [10]: data[['rnd_1', 'rnd_3']]
Out[10]:
     rnd_1  rnd_3
0        8     12
1        1      3
2        7      1
3        2      7
4        4      7
5       12      8
6        2     12
7        9      8
8       13     17
9        4      7
10      14     14
11      19     16
12       2     12
13      15     18
14      13     18
15      13     11
16      17      7
17      14     10
18       9      6
19      11     15
20      16     13
21      18      9
22       1     18
23       4      3
24       6     11
25       2     13
26       7     17
27      11      8
28       3     12
29       4      2
..     ...    ...
970      8     14
971     19      5
972     13      2
973      8     10
974      8     17
975      6     16
976      3      2
977     12      6
978     12     10
979     15     13
980      8      4
981     17      3
982      1     17
983     11      5
984      7      7
985     13     14
986      6     19
987     13      9
988      3     15
989     19      6
990      7     11
991     11      7
992     19     12
993      2     15
994     10      4
995     14     13
996     12     11
997     11     15
998     17     14
999      3      8

[1000 rows x 2 columns]

# Block selection
In [11]: data[:7][['rnd_1', 'rnd_2']]
Out[11]:
   rnd_1  rnd_2
0      8     17
1      1     16
2      7      6
3      2     16
4      4     17
5     12     19
6      2      7
However, you cannot select multiple columns with a 1:5-style slice the way you select rows. Writing a slice inside a list literal is a syntax error:
In [12]: data[['rnd_1':'rnd_3']]
  File "<ipython-input-13-6291b6a83eb0>", line 1
    data[['rnd_1':'rnd_3']]
                 ^
SyntaxError: invalid syntax
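A range of columns can still be selected by label if the slice goes through loc instead of []. The sketch below is only an illustration: it uses a five-row stand-in for the article's data (same column names, first five rows of values) rather than the full random set.

```python
import pandas as pd

# Small stand-in frame with the same column names as the article's data.
data = pd.DataFrame({
    "fecha": pd.date_range("2012-04-10", periods=5),
    "rnd_1": [8, 1, 7, 2, 4],
    "rnd_2": [17, 16, 6, 16, 17],
    "rnd_3": [12, 3, 1, 7, 7],
})

# A label slice on the column axis; both end labels are included.
cols = data.loc[:, "rnd_1":"rnd_3"]
print(cols.columns.tolist())  # ['rnd_1', 'rnd_2', 'rnd_3']
```

The first `:` keeps every row, and the second slice picks every column from rnd_1 through rnd_3 in the order they appear in the frame.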
loc
loc allows you to select rows and columns based on index.
In [13]: data.loc[1:5]
Out[13]:
       fecha  rnd_1  rnd_2  rnd_3
1 2012-04-11      1     16      3
2 2012-04-12      7      6      1
3 2012-04-13      2     16      7
4 2012-04-14      4     17      7
5 2012-04-15     12     19      8
Note the difference from the [] method: a loc slice is based on labels and includes the end label, so the row labeled 5 is returned as well, whereas data[1:5] is positional, excludes the end, and stops at row 4.
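The difference in end-point handling can be checked on a tiny frame. This is a minimal sketch with a made-up single-column frame and the default integer index, not the article's full data set.

```python
import pandas as pd

# Default RangeIndex 0..6, so labels and positions happen to coincide.
data = pd.DataFrame({"rnd_1": [8, 1, 7, 2, 4, 12, 2]})

by_position = data[1:5]    # positional slice, end excluded
by_label = data.loc[1:5]   # label slice, end included

print(by_position.index.tolist())  # [1, 2, 3, 4]
print(by_label.index.tolist())     # [1, 2, 3, 4, 5]
```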
In [14]: data.loc[2:4, ['rnd_2', 'fecha']]
Out[14]:
   rnd_2      fecha
2      6 2012-04-12
3     16 2012-04-13
4     17 2012-04-14
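loc can also select the data between two specific dates once the dates are set as the index, and both end dates are included in the result. If the index is not sorted, both dates must be present in the index. With a sorted DatetimeIndex, as in this example, pandas also accepts bounds that fall between index entries.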
In [15]: data_fecha = data.set_index('fecha')
    ...: data_fecha.head()
Out[15]:
            rnd_1  rnd_2  rnd_3
fecha
2012-04-10      8     17     12
2012-04-11      1     16      3
2012-04-12      7      6      1
2012-04-13      2     16      7
2012-04-14      4     17      7

In [16]: # Create two specific dates
    ...: fecha_1 = dt.datetime(2013, 4, 14)
    ...: fecha_2 = dt.datetime(2013, 4, 18)
    ...:
    ...: # Slice the data between them
    ...: data_fecha.loc[fecha_1: fecha_2]
Out[16]:
            rnd_1  rnd_2  rnd_3
fecha
2013-04-14     17     10      5
2013-04-15     14      4      9
2013-04-16      1      2     18
2013-04-17      9     15      1
2013-04-18     16      7     17
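Date strings can be used as slice bounds as well as datetime objects. The following sketch assumes a small stand-in frame with a daily DatetimeIndex and placeholder values; only the slicing behavior is the point.

```python
import pandas as pd

# Ten consecutive days: 2013-04-10 .. 2013-04-19 (placeholder values 0..9).
idx = pd.date_range("2013-04-10", periods=10)
data_fecha = pd.DataFrame({"rnd_1": range(10)}, index=idx)

# String bounds are parsed as dates; both end dates are included.
subset = data_fecha.loc["2013-04-14":"2013-04-18"]
print(len(subset))       # 5
print(subset.index[0])   # 2013-04-14 00:00:00
print(subset.index[-1])  # 2013-04-18 00:00:00
```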
Update: Unless you have a specific reason not to, prefer loc and use [] as little as possible. loc selects rows and columns in a single indexing operation on the DataFrame, which avoids chained-indexing problems during assignment. When you assign through chained [] calls, pandas is likely to raise a SettingWithCopyWarning.
For details, please refer to the official documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
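As a rough illustration of the assignment case, the sketch below sets values through a single loc call instead of chaining two [] lookups. The frame and the threshold are made up for the example.

```python
import pandas as pd

data = pd.DataFrame({
    "rnd_1": [8, 1, 7, 2, 4],
    "rnd_2": [17, 16, 6, 16, 17],
})

# Chained form to avoid:
#     data[data["rnd_1"] > 5]["rnd_2"] = 0
# The first [] may return a copy, so the assignment can land on a
# temporary object and leave `data` unchanged.

# Single loc call: row mask and column label in one indexing operation.
mask = data["rnd_1"] > 5
data.loc[mask, "rnd_2"] = 0

print(data["rnd_2"].tolist())  # [0, 16, 0, 16, 17]
```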
iloc
Where loc selects by the value of the index (the label), iloc selects by position in the index. iloc ignores the index labels entirely and only looks at position, so the square brackets accept only integer positions, integer slices, or lists of integers.
# Row selection (data_fecha.iloc[10:15] returns the same rows)
In [17]: data_fecha[10: 15]
Out[17]:
            rnd_1  rnd_2  rnd_3
fecha
2012-04-20     14      6     14
2012-04-21     19     14     16
2012-04-22      2      6     12
2012-04-23     15      8     18
2012-04-24     13      8     18

# Column selection
In [18]: data_fecha.iloc[:,[1,2]].head()
Out[18]:
            rnd_2  rnd_3
fecha
2012-04-10     17     12
2012-04-11     16      3
2012-04-12      6      1
2012-04-13     16      7
2012-04-14     17      7

# Slice selection (specific rows and columns)
In [19]: data_fecha.iloc[[1,12,34],[0,2]]
Out[19]:
            rnd_1  rnd_3
fecha
2012-04-11      1      3
2012-04-22      2     12
2012-05-14     17     10
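To make the "position, not label" point concrete, here is a small sketch on a date-indexed stand-in frame (the first five rows of the article's values). The integers passed to iloc have no relation to the date labels.

```python
import pandas as pd

idx = pd.date_range("2012-04-10", periods=5)
data_fecha = pd.DataFrame(
    {
        "rnd_1": [8, 1, 7, 2, 4],
        "rnd_2": [17, 16, 6, 16, 17],
        "rnd_3": [12, 3, 1, 7, 7],
    },
    index=idx,
)

# Rows at positions 1 and 2 (the end position 3 is excluded).
rows = data_fecha.iloc[1:3]

# First and last row, first and third column.
block = data_fecha.iloc[[0, 4], [0, 2]]

print(rows["rnd_1"].tolist())  # [1, 7]
print(block.values.tolist())   # [[8, 12], [4, 7]]
```

Unlike a loc label slice, an iloc slice follows the usual Python convention and excludes the end position.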
at
at is used much like loc, but it accesses data faster than loc and can only access a single element. It cannot return multiple elements.
In [20]: timeit data_fecha.at[fecha_1,'rnd_1']
The slowest run took 3783.11 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 11.3 µs per loop

In [21]: timeit data_fecha.loc[fecha_1,'rnd_1']
The slowest run took 121.24 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 192 µs per loop

In [22]: data_fecha.at[fecha_1,'rnd_1']
Out[22]: 17
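Besides reading a single value, at can also write one. The sketch below uses a three-row stand-in for data_fecha; the replacement value 5 is arbitrary.

```python
import datetime as dt
import pandas as pd

idx = pd.date_range("2013-04-14", periods=3)
data_fecha = pd.DataFrame({"rnd_1": [17, 14, 1]}, index=idx)
fecha_1 = dt.datetime(2013, 4, 14)

# Read one cell by row label and column label.
value = data_fecha.at[fecha_1, "rnd_1"]
print(value)  # 17

# Write one cell the same way.
data_fecha.at[fecha_1, "rnd_1"] = 5
print(data_fecha["rnd_1"].tolist())  # [5, 14, 1]
```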
iat
iat is to iloc what at is to loc: a faster, position-based accessor. Like at, it can only access a single element.
In [23]: data_fecha.iat[1,0]
Out[23]: 1

In [24]: timeit data_fecha.iat[1,0]
The slowest run took 6.23 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.77 µs per loop

In [25]: timeit data_fecha.iloc[1,0]
10000 loops, best of 3: 158 µs per loop
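Outside IPython, the same comparison can be run with the standard-library timeit module. This is a sketch on a small stand-in frame; the absolute numbers depend on the machine and the pandas version, so no expected timings are shown.

```python
import timeit
import pandas as pd

idx = pd.date_range("2012-04-10", periods=5)
data_fecha = pd.DataFrame(
    {"rnd_1": [8, 1, 7, 2, 4], "rnd_2": [17, 16, 6, 16, 17]},
    index=idx,
)

# Row position 1, column position 0.
value = data_fecha.iat[1, 0]
print(value)  # 1

# Time 10,000 scalar lookups with each accessor.
t_iat = timeit.timeit(lambda: data_fecha.iat[1, 0], number=10_000)
t_iloc = timeit.timeit(lambda: data_fecha.iloc[1, 0], number=10_000)
print(f"iat: {t_iat:.4f}s, iloc: {t_iloc:.4f}s for 10,000 lookups")
```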
ix
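The methods above generally expect the requested labels to be in the index, or the requested positions to lie within the length of the axis. ix lets you slice with values that are not themselves in the DataFrame index. Note that ix was deprecated in pandas 0.20.0 and removed in pandas 1.0, so the example below only runs on older versions.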
In [28]: date_1 = dt.datetime(2013, 1, 10, 8, 30)
    ...: date_2 = dt.datetime(2013, 1, 13, 4, 20)
    ...:
    ...: # Slice the data between them
    ...: data_fecha.ix[date_1: date_2]
Out[28]:
            rnd_1  rnd_2  rnd_3
fecha
2013-01-11     19     17     19
2013-01-12     10      9     17
2013-01-13     15      3     10
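As the example shows, January 10, 2013 is not selected: each index entry is treated as 0:00 on that day, which is earlier than the 8:30 lower bound. January 13 is included for the same reason, since 0:00 on that day is earlier than the 4:20 upper bound.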
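On pandas 1.0 and later, where ix is not available, the same kind of slice can be written with loc as long as the DatetimeIndex is sorted. The sketch below uses a six-day stand-in frame: the values for January 11 to 13 are taken from the output above, and the other three are placeholders.

```python
import datetime as dt
import pandas as pd

# 2013-01-09 .. 2013-01-14; 3, 5 and 8 are placeholder values.
idx = pd.date_range("2013-01-09", periods=6)
data_fecha = pd.DataFrame({"rnd_1": [3, 5, 19, 10, 15, 8]}, index=idx)

# Neither bound matches an index entry exactly.
date_1 = dt.datetime(2013, 1, 10, 8, 30)
date_2 = dt.datetime(2013, 1, 13, 4, 20)

subset = data_fecha.loc[date_1:date_2]
print(subset.index.strftime("%Y-%m-%d").tolist())
# ['2013-01-11', '2013-01-12', '2013-01-13']
```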