
Why use Python for data analysis (when you have Excel or Google Sheets)

Nov 17, 2024, 4:58 PM

TL;DR: While spreadsheets are perfect for many data tasks, Python becomes essential when you need to handle large datasets, create advanced visualizations, automate workflows, or use machine learning models. The key is knowing when to leverage each tool's strengths for your specific data analysis needs.

While Python is often considered essential for data work, spreadsheets remain the most practical tool for many analysts' daily needs – and that's perfectly fine. But knowing when to graduate beyond them is crucial for advancing your data capabilities.

If you look at any data analyst or data scientist curriculum, you'll find the same core tools: spreadsheets, SQL, Python, and various Business Intelligence (BI) solutions. Yet when I talk with data practitioners and leaders, a common question emerges: "Why switch to Python when spreadsheets handle most of my needs?"

As someone who co-founded a company built on SQL, Python, and AI, my stance might surprise you: if a spreadsheet can do the job, use it. These tools have endured since the 1970s for good reason – they're intuitive, flexible, and excellent for explaining your work to others.

But they have their limits.

But once you start conducting more ad hoc or exploratory data analysis, or dealing with larger volumes of data in the enterprise, you’ll quickly run into a few issues:

  • They struggle with large datasets
  • They offer limited visualization and dashboarding capabilities
  • They make it difficult to build automated data pipelines
  • They lack advanced statistical and machine learning capabilities
  • They don't support version control, making it hard to follow engineering best practices

Below, I’ll break down why spreadsheets remain invaluable for many tasks, and when Python becomes the necessary next step in your data journey.

Why use Excel or Google Sheets?

At their core, spreadsheets are powerful because they put you in complete control of your data workspace. Like having your own custom-built dashboard, they let you instantly manipulate, visualize, and analyze data exactly how you want.

There are two main reasons that folks gravitate toward spreadsheets:

1. Spreadsheets are flexible and personalized

Let’s start with the most obvious reasons why data practitioners, regardless of skill level, love spreadsheets: They’re incredibly flexible and customizable.

In a spreadsheet, you’re working in your own environment, and you have full control over it. Want to highlight specific rows and create a quick chart? Easy. Want to add some conditional formatting to highlight a specific pattern? No problem. Want to add a row or column for some extra inputs? Go right ahead.


As a user, you’re in full control, even in shared workspace environments like Google Sheets. This is really powerful, especially in contrast with traditional BI solutions, where you can’t edit the data directly inline in the same way, nor call out specific pieces of data without slicing it into smaller subsets, which can quickly get out of hand. In fact, some newer BI solutions such as Sigma are capitalizing on this idea, with a spreadsheet-like interface as their main pitch.

All in all, there’s something deeply intuitive about the user experience of a spreadsheet. We learn math from a young age, and spreadsheets offer a nicely structured way of looking at data and understanding how all the numbers add up.

2. Spreadsheets are reactive & explainable

Reactivity in spreadsheets means that when you change one number, everything connected to it updates automatically. This instant feedback makes them perfect for understanding how different pieces of data affect each other.

For example, let’s say you have cells that are connected like:

C1 = A1 * B2

Reactivity means that when you update A1 or B2, C1 is automatically updated. There’s effectively a DAG which tracks the dependencies, or lineage, between all cells. This is an incredibly powerful concept, because, unlike with code, you don’t have to “run” the spreadsheet. You can simply create a model of the world and adjust inputs and see how the results react to that change.
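This dependency-driven recomputation can be sketched in a few lines of Python. This is a toy model for illustration only (real spreadsheets cache results and invalidate them along the DAG), with made-up cell names and values:

```python
# Toy model of spreadsheet reactivity: a cell holds either a constant value
# or a formula (a function of the sheet), and reads follow the dependency
# chain on demand.

class Sheet:
    def __init__(self):
        self.cells = {}  # cell name -> constant value or formula function

    def set(self, name, value):
        self.cells[name] = value

    def get(self, name):
        v = self.cells[name]
        # A formula is recomputed from its inputs every time it's read
        return v(self) if callable(v) else v

sheet = Sheet()
sheet.set("A1", 10)
sheet.set("B2", 5)
sheet.set("C1", lambda s: s.get("A1") * s.get("B2"))  # C1 = A1 * B2

print(sheet.get("C1"))  # 50
sheet.set("A1", 20)     # change an input...
print(sheet.get("C1"))  # ...and C1 "reacts": 100
```

Changing `A1` requires no explicit "run" step: the next read of `C1` simply reflects the new input, which is exactly the model-of-the-world behavior described above.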

This reactivity is also, in large part, what makes a spreadsheet so easy to understand. I can view an easy-to-understand formula, click on it to highlight the cells it depends on, and adjust those cells to see how the number I’m looking at reacts and relates to them.

[Image: tracing the dependent cells of a Net Income Before Tax formula in a spreadsheet]

As you can see in the image above, if you want to know which numbers contribute most to Net Income Before Tax, you can simply click on the cell, view the dependent cells, and immediately understand which variables drive it.

For these reasons, if you’re able to do your work in a spreadsheet, it’s probably a good idea.

Why use Python

While spreadsheets excel at many tasks, Python opens up a whole new world of possibilities for data work. From handling massive datasets to creating complex visualizations and automating repetitive tasks, there are five reasons why Python is a powerful tool for your data workflows.

1. Python easily tackles large amounts of data

The first and most obvious reason to use Python is when you're dealing with large datasets. Excel supports roughly 1M rows by 16k columns, and Google Sheets supports roughly 10M cells. This may sound like a lot, and in many cases it's plenty, but chances are you'll eventually run up against these limits. In contrast, Python on a powerful machine can handle many orders of magnitude more data. This is especially true if you leverage newer technologies like Polars and DuckDB.

We may see an increase in limits with spreadsheets over time, but Python (especially in tandem with SQL) will always be able to handle more.

2. Python supports advanced & customized visualizations

Spreadsheets can offer some pretty powerful visuals, but it’s only a small fraction of what you can do with Python. I’m a big believer that bar charts, line charts, and maps cover the vast majority of cases, but telling a story with data often requires breaking from the mundane and creating an engaging canvas.

For example, I love a good Sankey diagram to tell the story of how data flows from point A to point B. Or perhaps you want to create a radar plot to compare attributes from different categories.

These can be incredibly easy to build in Python with libraries like plotly, seaborn or bokeh.

To give you an example, let’s go back to our Superdope example from previous posts and say you want to compare product performance on a sunburst plot like the one below:

[Image: sunburst chart of sales by region, category, and product]

Generating this chart with code using a library such as plotly is rather straightforward:


```python
import plotly.express as px

# df_sunburst: a DataFrame with Region, Category, Product, and Sales columns
# (prepared earlier in the analysis)

# Create the sunburst plot
fig = px.sunburst(
    df_sunburst,
    path=['Region', 'Category', 'Product'],
    values='Sales',
    color='Region',
    title='Sales Distribution by Region, Category, and Product',
    width=800,
    height=450
)

# Update layout
fig.update_layout(
    margin=dict(t=50, l=0, r=0, b=0)
)

# Show the plot
fig.show()
```

And this code can be generated by AI in about 3 seconds. Building something similar in a spreadsheet would require a lot more time and effort.


3. Python helps you automate data pipelines & cleaning

When working with data, you’ll oftentimes find yourself doing repetitive data transformation tasks. Say, for example, you work in an industry where your clients regularly send you CSV or Excel files and you have to clean up and format the data, and turn it into a report or prepare it for another step. This is a perfect task for Python. If you’re managing your own server and are resourceful, you can write a script and schedule it to run using a Cron job, or if you would like to go with managed solutions that work out of the box and handle orchestration and more complex jobs, you can use a solution like Dagster or Airflow.
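The cleanup step in that kind of pipeline often boils down to a short, reusable function. Here is a hedged sketch with pandas; the file name `clients.csv` and the `email`/`signup_date` columns are hypothetical, so you would adapt them to your clients' actual files:

```python
import pandas as pd

def clean(path):
    """Load a client file and apply routine cleanup steps."""
    df = pd.read_csv(path)
    # Normalize messy headers: "Signup Date " -> "signup_date"
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    # Drop exact duplicate rows
    df = df.drop_duplicates()
    # Parse dates, turning unparseable values into NaT instead of raising
    df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")
    # Drop rows missing the key identifier
    df = df.dropna(subset=["email"])
    return df

# clean("clients.csv").to_excel("report.xlsx", index=False)
```

A scheduler (Cron, Dagster, or Airflow, as discussed below) would then call this function on each new file as it arrives.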

As a general rule, these days it's usually best to avoid home-grown Cron jobs unless you know exactly what you're doing. Keeping them up and running, with proper logging, monitoring, and orchestration, can quickly turn into a lot of work.

Note: If you’re simply looking for a lightweight and quick way to build data pipelines, Fabi.ai may be a good option for you. We can help you easily set up a data wrangling and cleaning pipeline from and to any source, including CSV files and Excel files, in a matter of minutes.

4. Python supports complex data analysis & machine learning

You can do a lot in a spreadsheet, but building and using more advanced statistical and machine learning models is generally not one of them. If you're simply doing univariate analysis and some simple calculations like distributions and averages, a spreadsheet should get the job done. But if you want to venture into more advanced multivariate analysis, or perhaps even clustering, forecasting, and churn prediction, Python is equipped with a rich suite of tools that work out of the box.

Here are a few examples of the types of analysis you may want to do along with the corresponding Python package:

  • Buyer or customer grouping using clustering: sklearn.cluster (e.g. KMeans)
  • Sales or marketing pipeline time series forecasting: Prophet or statsmodels (e.g. ARIMA)
  • Customer churn prediction: scikit-survival

These are all advanced machine learning and statistical models implemented by some of the best engineers and researchers in the world, available for free and immediately ready to use in Python.
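To make the first bullet concrete, here is a hedged sketch of customer grouping with scikit-learn's KMeans. The features and numbers are invented for illustration (in practice you would also scale features, e.g. with StandardScaler, before clustering):

```python
import numpy as np
from sklearn.cluster import KMeans

# Each row is one customer: [orders_per_month, avg_order_value]
customers = np.array([
    [1, 20], [2, 25], [1, 22],   # occasional, low spend
    [10, 30], [12, 28],          # frequent, low spend
    [2, 400], [3, 405],          # occasional, high spend
])

# Fit 3 clusters; random_state makes the result reproducible
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(km.labels_)  # cluster assignment for each customer
```

Each label identifies a segment, which you could then name (e.g. "high-value occasional buyers") and target separately.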

5. Python helps you follow code versioning & engineering best practices

Finally, in a lot of cases, it’s good practice to ensure that your work is traceable and reproducible.

In practice, what this means is that when someone else (or perhaps yourself at a later date), looks at your analysis, this individual should be able to understand:

  • Where the data came from
  • How the data was manipulated and how you got to your results
  • How to reproduce the same results independently

So, if working in a spreadsheet means exporting data and manipulating it somewhere disconnected from the original source, the results can become very hard to reproduce. It also means the steps you take during your analysis aren't version controlled: as you make adjustments, the exact steps may not get recorded. This can set you up for a tough situation we've all been in at least once: you built a beautiful analysis in a spreadsheet, shared it with some co-workers, came back at a later date, and noticed that the data was different. You dig through the change history to understand what happened, to no avail.

Using a version control system like GitHub or GitLab and committing changes to the underlying code as you conduct your analysis can help you avoid this type of situation.
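In practice, that workflow is only a few commands. The repository name, identity, and file below are hypothetical; the point is that every committed step of the analysis becomes traceable:

```shell
# Create a repo for the analysis (names and identity are placeholders)
git init -q sales-analysis
cd sales-analysis
git config user.email "analyst@example.com"
git config user.name "Analyst"

# First version of the analysis script
echo 'df = load_and_clean("clients.csv")' > analysis.py
git add analysis.py
git commit -q -m "First pass: load and clean the client file"

# After each meaningful change, commit again; the full history is then:
git log --oneline
```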

Verdict: for large datasets, advanced analysis and visualization, and automation, Python wins

If you’re looking to do complex ad hoc or exploratory data analysis, use advanced machine learning techniques, or build complex visualizations, Python is one of the best and most powerful tools for the job.

Yes, spreadsheets are incredibly popular for good reason. If you’re dealing with relatively small datasets, in a one-off analysis that doesn’t need to be automated, Excel or Google Sheets are great tools.

However, Python performs exceptionally well with large datasets that would be an issue for Excel or Google Sheets. Python is also commonly used to automate data pipelines, especially ones that require some form of data transformation and cleaning.

Like most things, there’s a time and place to use certain tools to make the most of their strengths. We built Fabi.ai to act as the bridge between all the tools, so you can have the best of both worlds.

We make it incredibly easy to connect to any data source, including spreadsheets and files and build lightweight data pipelines. Our built-in SQL and Python interface, augmented with AI, makes it incredibly easy to leverage advanced machine learning and statistical models, regardless of prior experience. If you’re interested in checking us out, you can get started for free today in less than 2 minutes.

The above is the detailed content of Why use Python for data analysis (when you have Excel or Google Sheets). For more information, please follow other related articles on the PHP Chinese website!
