FireDucks: Get performance beyond pandas with zero learning cost!

Oct 03, 2024, 06:23 AM

Pandas is one of the most popular Python libraries. While looking for an easier way to speed it up, I discovered FireDucks and became interested in it!

Comparison with pandas: Why FireDucks?

A pandas program can run into serious performance problems depending on how it is written. As a data scientist, however, I want to spend my time analyzing data rather than tuning code. It would be great if something could reorder operations automatically and speed up the program: for example, if Process A => Process B is slow, run Process B => Process A instead (provided, of course, that the result is guaranteed to be the same). It is said that data scientists spend about 45% of their time preparing data, and while looking for a way to speed that up, I came across a module called FireDucks.
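To make the reordering idea concrete, here is a minimal sketch in plain pandas. The data and column names are made up for illustration, and this only shows why such a swap is safe; it is not a description of how FireDucks decides what to reorder internally.

```python
import pandas as pd

# Hypothetical toy data; the column names are invented for this example.
df = pd.DataFrame({
    "user": ["a", "b", "c", "d", "e", "f"],
    "score": [50, 10, 40, 30, 60, 20],
})

# Process A => Process B: sort every row first, then keep rows with score >= 30.
sorted_df = df.sort_values("score")
sorted_then_filtered = sorted_df[sorted_df["score"] >= 30]

# Process B => Process A: filter first, so the sort only touches surviving rows.
filtered_then_sorted = df[df["score"] >= 30].sort_values("score")

# Both orders produce the same rows in the same order.
same_result = sorted_then_filtered.equals(filtered_then_sorted)
print(same_result)
```

Filtering first means the sort handles fewer rows, which is the kind of saving an automatic optimizer can look for.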

According to the FireDucks documentation, it appears to support Linux only. Since my main machine runs Windows, I tried it under WSL2 (Windows Subsystem for Linux), which lets you run a Linux environment on Windows.

The environment I tried is as follows.

  • OS: Microsoft Windows 11 Pro
  • Version: 10.0.22631 Build 22631
  • System Model: Z690 Pro RS
  • System Type: x64-based PC
  • Processor: 12th Gen Intel(R) Core(TM) i3-12100, 3300 MHz, 4 Cores, 8 Logical Processors
  • Baseboard Product: Z690 Pro RS
  • Platform Role: Desktop
  • Installed Physical Memory (RAM): 64.0 GB

Installing and Configuring FireDucks

Install WSL

WSL was installed by following Microsoft's documentation; the Linux distribution is Ubuntu 22.04.1 LTS.

Install FireDucks

Next, install FireDucks itself. Installation is very easy:
pip install fireducks

It will take a few minutes to install FireDucks (along with pyarrow, pandas and other libraries).

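When I ran the code below, loading was remarkably fast: pandas took 4 sec, while FireDucks took only 74.5 ns. (FireDucks evaluates lazily, so this figure reflects the load being deferred rather than the data actually being read.)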

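import pandas as pd  # with FireDucks: import fireducks.pandas as pd

# df is assumed to be already loaded as a DataFrame

# 1. analysis based on time period and creative duration
# convert timestamp to date/time object
df['timestamp_converted'] = pd.to_datetime(df['timestamp'], unit='s')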

# define time period 
def get_part_of_day(hour): 
  if 5 <= hour < 12: 
    return 'morning'
  elif 12 <= hour < 17:
    return 'afternoon'
  else: 
    return 'evening'

# Add time period in new column 
df['part_of_day'] = df['timestamp_converted'].apply(lambda x: get_part_of_day(x.hour))

# Calculate average creative duration by time period 
df_duration_by_time = df.groupby('part_of_day')['creative_duration'].mean()
print(df_duration_by_time)

# 2. campaign performance per different advertiser 
df_campaigns_per_advertiser = df.groupby('advertiser_id')['campaign_id'].nunique()
df_creatives_per_advertiser = df.groupby('advertiser_id')['creatives_id'].nunique()
print(df_campaigns_per_advertiser) 
print(df_creatives_per_advertiser)

# 3. language and website association 
df_common_website_per_language = df.groupby('placement_language')['website_id'].apply(lambda x: x.mode()[0])
print(df_common_website_per_language) 

# 4. Analyze referrer information 
def extract_domain(referrer): 
  # if referrer is a float (e.g. NaN), return empty string 
  if isinstance(referrer, float): 
    return '' 
  # otherwise, extract domain name
  return referrer.split('/')[0] 

df['referrer_domain'] = df['referrer_deep_three'].apply(extract_domain) 
df_referrer_distribution = df['referrer_domain'].value_counts() 
print(df_referrer_distribution)
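The listing above expects a DataFrame named df to exist already. To try step 1 without the original dataset, a tiny synthetic frame is enough. The rows below are invented purely for illustration; only the column names come from the article.

```python
import pandas as pd

# Synthetic rows for illustration only; not the dataset used in this article.
base = 1696118400  # 2023-10-01 00:00:00 UTC as a Unix timestamp
df = pd.DataFrame({
    "timestamp": [base + h * 3600 for h in (6, 13, 20, 8)],
    "creative_duration": [10, 20, 30, 30],
})

def get_part_of_day(hour):
    # Same boundaries as in the article's code.
    if 5 <= hour < 12:
        return "morning"
    elif 12 <= hour < 17:
        return "afternoon"
    else:
        return "evening"

# Convert to datetimes, then label each row by its hour of day.
df["timestamp_converted"] = pd.to_datetime(df["timestamp"], unit="s")
df["part_of_day"] = df["timestamp_converted"].dt.hour.apply(get_part_of_day)

# Average creative duration per time period.
df_duration_by_time = df.groupby("part_of_day")["creative_duration"].mean()
print(df_duration_by_time)
```

With these four rows, the two morning entries average to 20, and the afternoon and evening groups each contain a single row.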



All of this preprocessing and analysis took around 8 seconds in pandas, whereas FireDucks completed it within 4 seconds: almost a 2x speedup.
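For anyone who wants to reproduce this kind of comparison, a minimal timing sketch might look like the following. The timed helper and the workload function are hypothetical stand-ins, not the script used for the numbers above.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

def workload(n):
    # Stand-in for the preprocessing steps; replace with the real pipeline.
    return sum(i * i for i in range(n))

result, seconds = timed(workload, 100_000)
print(f"workload finished in {seconds:.4f} s")
```

With a lazily evaluated library, the timed function should also consume its result (for example by printing it), otherwise the measurement may cover only the deferred setup and not the actual computation.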

Improved performance

One of the most stressful things about using pandas is waiting for large data sets to load, and then waiting again for complex operations such as groupby. FireDucks, on the other hand, uses lazy evaluation: loading itself takes almost no time, and processing is done only where the result is actually needed. For me this meant a significant reduction in total waiting time.
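The idea of lazy evaluation can be illustrated with a toy model in plain Python. This is only a sketch of the general technique (record the steps, run them on demand); the LazyColumn class is invented here and says nothing about how FireDucks is implemented.

```python
class LazyColumn:
    """Toy model of deferred execution: records steps, runs them on demand."""

    def __init__(self, loader):
        self._loader = loader   # callable that produces the data
        self._steps = []        # transformations recorded but not yet run

    def map(self, fn):
        # Return a new lazy object with one more recorded step.
        new = LazyColumn(self._loader)
        new._steps = self._steps + [fn]
        return new

    def evaluate(self):
        # Only here is the data loaded and every recorded step applied.
        data = self._loader()
        for fn in self._steps:
            data = [fn(x) for x in data]
        return data

calls = []

def load_numbers():
    calls.append("load")  # record that the expensive load really ran
    return [1, 2, 3]

lazy = LazyColumn(load_numbers).map(lambda x: x * 10).map(lambda x: x + 1)
loads_before = len(calls)   # nothing has been loaded yet
values = lazy.evaluate()    # the load and both steps run here
loads_after = len(calls)
print(loads_before, values, loads_after)
```

Building the pipeline is nearly instant because no data is touched until evaluate() is called, which is why a "load" step can appear to take almost no time.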

As for performance more broadly, the organization behind FireDucks officially reports speedups of up to 16 times over pandas. (I will compare its performance with various competing libraries next time.)

Zero learning cost

Being able to use exactly the same pandas notation without having to think about anything is a huge advantage. There are other DataFrame acceleration libraries besides FireDucks, but their APIs take effort to learn and are easy to forget.

For example, to add a column with Polars, you have to write something like this.

# pandas
df["new_col"] = df["A"] + 1
# polars
df = df.with_columns((pl.col("A") + 1).alias("new_col"))

Almost no need to change existing code

I have several ETL pipelines and other projects that use pandas, and it would be nice to get a performance improvement just by installing FireDucks and replacing the import statement.
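A minimal sketch of that import swap is shown below. The try/except fallback is my own addition so the same script still runs where FireDucks is not installed (for example on plain Windows); the rest of the code is ordinary pandas syntax.

```python
try:
    # Intended as a drop-in replacement for "import pandas as pd".
    import fireducks.pandas as pd
    backend = "fireducks"
except ImportError:
    # Fallback added for illustration: use plain pandas if FireDucks is missing.
    import pandas as pd
    backend = "pandas"

# Everything below is unchanged pandas-style code.
df = pd.DataFrame({"A": [1, 2, 3]})
df["new_col"] = df["A"] + 1
new_col_values = df["new_col"].tolist()
print(backend, new_col_values)
```

Keeping the alias pd means the rest of an existing project does not need to be touched.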

If you have anything to add, feel free to comment below.
