
Python For Data Analysis learning path

Jun 23, 2017 pm 04:25 PM

The introductory chapter works through an example that processes the MovieLens 1M data set. According to the book, the data set comes from GroupLens Research, whose site provides various rating data sets collected from the MovieLens website. You can download the corresponding compressed package there, and the MovieLens 1M data set we need is among them.

The downloaded and decompressed folder contains three .dat tables, users.dat, ratings.dat and movies.dat, all of which the example uses.

The Chinese edition (PDF) of "Python For Data Analysis" that I read is the first edition, from 2014. All of its examples were written for Python 2.7 and pandas 0.8.2, whereas I installed Python 3.5.2 and pandas 0.20.2, so some functions and methods behave quite differently: some had their parameters changed in the newer version, and others were deprecated. As a result, running the book's sample code as written produces a number of errors and warnings. With the same environment as mine, the MovieLens 1M example code runs into the following problems.

  • When reading dat data into a pandas DataFrame object, the code given in the book is:

    import pandas as pd

    unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
    users = pd.read_table('ml-1m/users.dat', sep='::', header=None, names=unames)
    
    rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
    ratings = pd.read_table('ml-1m/ratings.dat', sep='::', header=None, names=rnames)
    
    mnames = ['movie_id', 'title', 'genres']
    movies = pd.read_table('ml-1m/movies.dat', sep='::', header=None, names=mnames)

    Running this directly produces a warning:

    F:/python/HelloWorld/DataAnalysisByPython-1.py:4: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
      users = pd.read_table('ml-1m/users.dat', sep='::', header=None, names=unames)
    F:/python/HelloWorld/DataAnalysisByPython-1.py:7: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
      ratings = pd.read_table('ml-1m/ratings.dat', sep='::', header=None, names=rnames)
    F:/python/HelloWorld/DataAnalysisByPython-1.py:10: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
      movies = pd.read_table('ml-1m/movies.dat', sep='::', header=None, names=mnames)

    Although the code still runs, I am a perfectionist and wanted to get rid of the warning. It means that the 'c' engine does not support this separator, so pandas falls back to the 'python' engine. pandas.read_table happens to have an engine parameter that selects the parsing engine, with 'c' and 'python' as the two options. Since the 'c' engine does not support the '::' separator, we only need to set engine to 'python'.

    users = pd.read_table('ml-1m/users.dat', sep='::', header=None, names=unames, engine='python')
    
    rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
    ratings = pd.read_table('ml-1m/ratings.dat', sep='::', header=None, names=rnames, engine='python')
    
    mnames = ['movie_id', 'title', 'genres']
    movies = pd.read_table('ml-1m/movies.dat', sep='::', header=None, names=mnames, engine='python')

  • The book then uses the pivot_table method on the merged data to compute each movie's mean rating by gender. The code given in the book is:

    mean_ratings = data.pivot_table('rating', rows='title', cols='gender', aggfunc='mean')

    Running it directly raises an error, and the code does not run:

    Traceback (most recent call last):
      File "F:/python/HelloWorld/DataAnalysisByPython-1.py", line 19, in <module>
        mean_ratings = data.pivot_table('rating', rows='title', cols='gender', aggfunc='mean')
    TypeError: pivot_table() got an unexpected keyword argument 'rows'

    The TypeError indicates that 'rows' is not a keyword argument the method accepts. What is going on? I checked the pandas API documentation on the official website and found that the keyword arguments of pandas.pivot_table in version 0.20.2 differ from those used in the book. To get the same result, replace rows with index. Likewise, there is no cols parameter, so use columns instead.

    mean_ratings = data.pivot_table('rating', index='title', columns='gender', aggfunc='mean')

  • To find the movies that female viewers liked most, the book sorts the DataFrame by column F in descending order. The sample code in the book is:

    top_female_ratings = mean_ratings.sort_index(by='F', ascending=False)

    This only produces a warning and does not stop the program:

    F:/python/HelloWorld/DataAnalysisByPython-1.py:32: FutureWarning: by argument to sort_index is deprecated, pls use .sort_values(by=...)
      top_female_ratings = mean_ratings.sort_index(by='F', ascending=False)

    The warning means that the by argument of sort_index is deprecated and may be removed in a future version of the library, and it recommends sort_values instead. In the API documentation, pandas.DataFrame.sort_index is described as "Sort object by labels (along an axis)", while pandas.DataFrame.sort_values is described as "Sort by the values along either axis". Here both produce the same result, so I simply switched to sort_values. sort_index is used again later in "Calculate Rating Difference", where it can be replaced with sort_values in the same way.

    top_female_ratings = mean_ratings.sort_values(by='F', ascending=False)

  • The last error is also related to sorting. In "Calculate Rating Difference", after computing the standard deviation of the ratings, the book sorts the filtered Series in descending order. The code in the book is:

    print(rating_std_by_title.order(ascending=False)[:10])

    The error here is:

    Traceback (most recent call last):
      File "F:/python/HelloWorld/DataAnalysisByPython-1.py", line 47, in <module>
        print(rating_std_by_title.order(ascending=False)[:10])
      File "E:\Program Files\Python35\lib\site-packages\pandas\core\generic.py", line 2970, in __getattr__
        return object.__getattribute__(self, name)
    AttributeError: 'Series' object has no attribute 'order'

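    It turns out that the order method no longer exists, so I had to look for a replacement in the API documentation. There are two candidates, sort_index and sort_values, the same as the DataFrame methods. To be safe, I chose sort_values: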

    print(rating_std_by_title.sort_values(ascending=False)[:10])

    The result matches the output shown in the book, so it can be used with confidence.
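
As a minimal sketch, the four fixes above can be tried end to end on a tiny in-memory sample instead of the real files. The sample rows and movie titles are made up, the read_dat helper is hypothetical, and the merge step and the unfiltered standard deviation are assumptions about how the book builds data and rating_std_by_title. It also assumes a pandas version that provides index/columns and sort_values, as 0.20.2 does.

```python
import io

import pandas as pd

# Made-up miniature stand-ins for users.dat, ratings.dat and movies.dat,
# using the same '::' separator as the real files.
USERS = "1::F::1::10::48067\n2::M::56::16::70072\n3::F::25::15::55117\n"
RATINGS = (
    "1::10::5::978300760\n"
    "2::10::3::978302109\n"
    "3::10::4::978301968\n"
    "1::20::1::978300275\n"
    "2::20::5::978824291\n"
    "3::20::3::978302268\n"
)
MOVIES = "10::Movie A (1995)::Comedy\n20::Movie B (1995)::Drama\n"

unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
mnames = ['movie_id', 'title', 'genres']


def read_dat(text, names):
    """Hypothetical helper: parse '::'-separated text into a DataFrame."""
    # Fix 1: request engine='python' explicitly, because the multi-character
    # '::' separator is not supported by the 'c' engine.
    return pd.read_table(io.StringIO(text), sep='::', header=None,
                         names=names, engine='python')


users = read_dat(USERS, unames)
ratings = read_dat(RATINGS, rnames)
movies = read_dat(MOVIES, mnames)

# Assumed merge step: join on the shared user_id and movie_id columns.
data = pd.merge(pd.merge(ratings, users), movies)

# Fix 2: index/columns replace the old rows/cols keywords.
mean_ratings = data.pivot_table(values='rating', index='title',
                                columns='gender', aggfunc='mean')

# Fix 3: sort_values(by=...) replaces sort_index(by=...).
top_female_ratings = mean_ratings.sort_values(by='F', ascending=False)

# Fix 4: Series.sort_values replaces the removed Series.order.
# (No filtering of rarely rated titles here, to keep the sketch short.)
rating_std_by_title = data.groupby('title')['rating'].std()

print(top_female_ratings)
print(rating_std_by_title.sort_values(ascending=False)[:10])
```

With this sample, Movie A (1995) comes first in top_female_ratings, and Movie B (1995) has the larger standard deviation. To run against the real data, replace the io.StringIO(text) argument with the file path, as in the read_table calls above.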

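The differences between versions of a third-party library can be quite noticeable. My suggestion is to use the latest version and to consult the API documentation on the official website while working, which makes it easy to solve problems like these.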
