Home Backend Development Python Tutorial Building a video insights generator using Gemini Flash

Building a video insights generator using Gemini Flash

Nov 26, 2024 pm 08:24 PM

Video understanding or video insights are crucial across various industries and applications due to their multifaceted benefits. They enhance content analysis and management by automatically generating metadata, categorizing content, and making videos more searchable. Moreover, video insights provide critical data that drive decision-making, enhance user experiences, and improve operational efficiencies across diverse sectors.

Google’s Gemini 1.5 model brings significant advancements to this field. Beyond its impressive improvements in language processing, this model can handle an enormous input context of up to 1 million tokens. To further its capabilities, Gemini 1.5 is trained as a multimodal model, natively processing text, images, audio, and video. This powerful combination of varied input types and extensive context size opens up new possibilities for processing long videos effectively.

In this article, we will dive into how Gemini 1.5 can be leveraged for generating valuable video insights, transforming the way we understand and utilize video content across different domains.

Getting Started

Table of contents

  • What is Gemini 1.5
  • Prerequisites
  • Installing dependencies
  • Setting up the Gemini API key
  • Setting up the environment variables
  • Importing the libraries
  • Initializing the project
  • Saving uploaded files
  • Generating insights from videos
  • Upload a video to the Files API
  • Get File
  • Response Generation
  • Delete File
  • Combining the stages
  • Creating the interface
  • Creating the streamlit app

What is Gemini 1.5

Google’s Gemini 1.5 represents a significant leap forward in AI performance and efficiency. Building upon extensive research and engineering innovations, this model features a new Mixture-of-Experts (MoE) architecture, enhancing both training and serving efficiency. Available in public preview, Gemini 1.5 Pro and 1.5 Flash offer an impressive 1 million token context window through Google AI Studio and Vertex AI.

Building a video insights generator using Gemini Flash

Google Gemini updates: Flash 1.5, Gemma 2 and Project Astra (blog.google)
The 1.5 Flash model, the newest addition to the Gemini family, is the fastest and most optimized for high-volume, high-frequency tasks. It is designed for cost-efficiency and excels in applications such as summarization, chat, image and video captioning, and extracting data from extensive documents and tables. With these advancements, Gemini 1.5 sets a new standard for performance and versatility in AI models.

Prerequisites

  • Python 3.9 (https://www.python.org/downloads)
  • google-generativeai
  • streamlit

Installing dependencies

  • Create and activate a virtual environment by executing the following command.
python -m venv venv
source venv/bin/activate #for ubuntu
venv/Scripts/activate #for windows
Copy after login
Copy after login
  • Install google-generativeai, streamlit, python-dotenv library using pip. Note that generativeai requires python 3.9 version to work.
pip install google-generativeai streamlit python-dotenv
Copy after login
Copy after login

Setting up the Gemini API key

To access the Gemini API and begin working with its functionalities, you can acquire a free Google API Key by registering with Google AI Studio. Google AI Studio, offered by Google, provides a user-friendly, visual-based interface for interacting with the Gemini API. Within Google AI Studio, you can seamlessly engage with Generative Models through its intuitive UI, and if desired, generate an API Token for enhanced control and customization.

Follow the steps to generate a Gemini API key:

  • To initiate the process, you can either click the link (https://aistudio.google.com/app) to be redirected to Google AI Studio or perform a quick search on Google to locate it.
  • Accept the terms of service and click on continue.
  • Click on Get API key link from the sidebar and Create API key in new project button to generate the key.
  • Copy the generated API key.

Building a video insights generator using Gemini Flash

Setting up the environment variables

Begin by creating a new folder for your project. Choose a name that reflects the purpose of your project.
Inside your new project folder, create a file named .env. This file will store your environment variables, including your Gemini API key.
Open the .env file and add the following code to specify your Gemini API key:

GOOGLE_API_KEY=AIzaSy......
Copy after login
Copy after login

Importing the libraries

To get started with your project and ensure you have all the necessary tools, you need to import several key libraries as follows.

import os
import time
import google.generativeai as genai
import streamlit as st
from dotenv import load_dotenv
Copy after login
Copy after login
  • google.generativeai as genai: Imports the Google Generative AI library for interacting with the Gemini API.
  • streamlit as st: Imports Streamlit for creating web apps.
  • from dotenv import load_dotenv: Loads environment variables from a .env file.

Initializing the project

To set up your project, you need to configure the API key and create a directory for temporary file storage for uploaded files.

Define the media folder and configure the Gemini API key by initializing the necessary settings. Add the following code to your script:

python -m venv venv
source venv/bin/activate #for ubuntu
venv/Scripts/activate #for windows
Copy after login
Copy after login

Saving uploaded files

To store uploaded files in the media folder and return their paths, define a method called save_uploaded_file and add the following code to it.

pip install google-generativeai streamlit python-dotenv
Copy after login
Copy after login

Generating insights from videos

Generating insights from videos involves several crucial stages, including uploading, processing, and response generation.

1. Upload a video to the Files API

The Gemini API directly accepts video file formats. The File API supports files up to 2GB in size and allows storage of up to 20GB per project. Uploaded files remain available for 2 days and cannot be downloaded from the API.

GOOGLE_API_KEY=AIzaSy......
Copy after login
Copy after login

2. Get File

After uploading a file, you can verify that the API has successfully received it by using the files.get method. This method allows you to view the files uploaded to the File API that are associated with the Cloud project linked to your API key. Only the file name and the URI are unique identifiers.

import os
import time
import google.generativeai as genai
import streamlit as st
from dotenv import load_dotenv
Copy after login
Copy after login

3. Response Generation

After the video has been uploaded, you can make GenerateContent requests that reference the File API URI.

MEDIA_FOLDER = 'medias'

def __init__():
    # Create the media directory if it doesn't exist
    if not os.path.exists(MEDIA_FOLDER):
        os.makedirs(MEDIA_FOLDER)

    # Load environment variables from the .env file
    load_dotenv()

    # Retrieve the API key from the environment variables
    api_key = os.getenv("GEMINI_API_KEY")

    # Configure the Gemini API with your API key
    genai.configure(api_key=api_key)
Copy after login

4. Delete File

Files are automatically deleted after 2 days or you can manually delete them using files.delete().

def save_uploaded_file(uploaded_file):
    """Save the uploaded file to the media folder and return the file path."""
    file_path = os.path.join(MEDIA_FOLDER, uploaded_file.name)
    with open(file_path, 'wb') as f:
        f.write(uploaded_file.read())
    return file_path
Copy after login

5. Combining the stages

Create a method called get_insights and add the following code to it. Instead print(), use streamlit write() method to see the messages on the website.

video_file = genai.upload_file(path=video_path)
Copy after login

Creating the interface

To streamline the process of uploading videos and generating insights within a Streamlit app, you can create a method named app. This method will provide an upload button, display the uploaded video, and generate insights from it.

import time

while video_file.state.name == "PROCESSING":
    print('Waiting for video to be processed.')
    time.sleep(10)
    video_file = genai.get_file(video_file.name)

if video_file.state.name == "FAILED":
  raise ValueError(video_file.state.name)
Copy after login

Creating the streamlit app

To create a complete and functional Streamlit application that allows users to upload videos and generate insights using the Gemini 1.5 Flash model, combine all the components into a single file named app.py.

Here is the final code:

# Create the prompt.
prompt = "Describe the video. Provides the insights from the video."

# Set the model to Gemini 1.5 Flash.
model = genai.GenerativeModel(model_name="models/gemini-1.5-flash")

# Make the LLM request.
print("Making LLM inference request...")
response = model.generate_content([prompt, video_file],
                                  request_options={"timeout": 600})
print(response.text)
Copy after login

Running the application

Execute the following code to run the application.

genai.delete_file(video_file.name)
Copy after login

You can open the link provided in the console to see the output.

Building a video insights generator using Gemini Flash

Thanks for reading this article !!

If you enjoyed this article, please click on the heart button ♥ and share to help others find it!

The full source code for this tutorial can be found here,

GitHub - codemaker2015/video-insights-generator

The above is the detailed content of Building a video insights generator using Gemini Flash. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Java Tutorial
1653
14
PHP Tutorial
1251
29
C# Tutorial
1224
24
Python vs. C  : Applications and Use Cases Compared Python vs. C : Applications and Use Cases Compared Apr 12, 2025 am 12:01 AM

Python is suitable for data science, web development and automation tasks, while C is suitable for system programming, game development and embedded systems. Python is known for its simplicity and powerful ecosystem, while C is known for its high performance and underlying control capabilities.

How Much Python Can You Learn in 2 Hours? How Much Python Can You Learn in 2 Hours? Apr 09, 2025 pm 04:33 PM

You can learn the basics of Python within two hours. 1. Learn variables and data types, 2. Master control structures such as if statements and loops, 3. Understand the definition and use of functions. These will help you start writing simple Python programs.

Python: Games, GUIs, and More Python: Games, GUIs, and More Apr 13, 2025 am 12:14 AM

Python excels in gaming and GUI development. 1) Game development uses Pygame, providing drawing, audio and other functions, which are suitable for creating 2D games. 2) GUI development can choose Tkinter or PyQt. Tkinter is simple and easy to use, PyQt has rich functions and is suitable for professional development.

The 2-Hour Python Plan: A Realistic Approach The 2-Hour Python Plan: A Realistic Approach Apr 11, 2025 am 12:04 AM

You can learn basic programming concepts and skills of Python within 2 hours. 1. Learn variables and data types, 2. Master control flow (conditional statements and loops), 3. Understand the definition and use of functions, 4. Quickly get started with Python programming through simple examples and code snippets.

Python vs. C  : Learning Curves and Ease of Use Python vs. C : Learning Curves and Ease of Use Apr 19, 2025 am 12:20 AM

Python is easier to learn and use, while C is more powerful but complex. 1. Python syntax is concise and suitable for beginners. Dynamic typing and automatic memory management make it easy to use, but may cause runtime errors. 2.C provides low-level control and advanced features, suitable for high-performance applications, but has a high learning threshold and requires manual memory and type safety management.

Python: Exploring Its Primary Applications Python: Exploring Its Primary Applications Apr 10, 2025 am 09:41 AM

Python is widely used in the fields of web development, data science, machine learning, automation and scripting. 1) In web development, Django and Flask frameworks simplify the development process. 2) In the fields of data science and machine learning, NumPy, Pandas, Scikit-learn and TensorFlow libraries provide strong support. 3) In terms of automation and scripting, Python is suitable for tasks such as automated testing and system management.

Python and Time: Making the Most of Your Study Time Python and Time: Making the Most of Your Study Time Apr 14, 2025 am 12:02 AM

To maximize the efficiency of learning Python in a limited time, you can use Python's datetime, time, and schedule modules. 1. The datetime module is used to record and plan learning time. 2. The time module helps to set study and rest time. 3. The schedule module automatically arranges weekly learning tasks.

Python: The Power of Versatile Programming Python: The Power of Versatile Programming Apr 17, 2025 am 12:09 AM

Python is highly favored for its simplicity and power, suitable for all needs from beginners to advanced developers. Its versatility is reflected in: 1) Easy to learn and use, simple syntax; 2) Rich libraries and frameworks, such as NumPy, Pandas, etc.; 3) Cross-platform support, which can be run on a variety of operating systems; 4) Suitable for scripting and automation tasks to improve work efficiency.

See all articles