


Building a Chrome Extension from Scratch with AI/ML API, Deepgram Aura, and IndexedDB Integration
Introduction
Building a Chrome extension that leverages AI technologies can significantly enhance user experience by adding powerful features directly into the browser.
In this tutorial, we'll cover the entire process of building a Chrome extension from scratch with AI/ML API, Deepgram Aura, and IndexedDB, from setup to deployment. We'll start by setting up our development environment, including installing the necessary tools and configuring our project. Then we'll dive into the core components of our Chrome extension: manifest.json, which contains basic metadata about the extension; scripts.js, which defines how the extension behaves; and styles.css, which adds some styling. We'll explore how to integrate these components with Deepgram Aura through AI/ML API, and use IndexedDB as temporary storage for the generated audio file. Along the way, we'll discuss best practices for building Chrome extensions, handling user queries, and saving data in the database. By the end of this tutorial, you'll have a solid foundation in building Chrome extensions and be well equipped to build your own AI-powered ones.
Let's start with a brief overview of the technologies we're going to use.
AI/ML API
AI/ML API is a game-changing platform for developers and SaaS entrepreneurs looking to integrate cutting-edge AI capabilities into their products. AI/ML API offers a single point of access to over 200 state-of-the-art AI models, covering everything from NLP to computer vision.
Key Features for Developers:
- Extensive Model Library: Over 200 pre-trained models for rapid prototyping and deployment
- Customization Options: Fine-tune models to fit your specific use case
- Developer-Friendly Integration: RESTful APIs and SDKs for seamless incorporation into your stack
- Serverless Architecture: Focus on coding, not infrastructure management
Deep dive into the AI/ML API documentation: https://docs.aimlapi.com/
Chrome Extension
A Chrome extension is a small software program that modifies or enhances the functionality of the Google Chrome web browser. Extensions are built using web technologies such as HTML, CSS, and JavaScript, and are designed to serve a single purpose, making them easy to understand and use.
Browse the Chrome Web Store: https://chromewebstore.google.com/
Deepgram Aura
Deepgram Aura is the first text-to-speech (TTS) AI model designed for real-time, conversational AI agents and applications. It delivers human-like voice quality with unparalleled speed and efficiency, making it a game-changer for building responsive, high-throughput voice AI experiences.
Learn more about the technical details: https://aimlapi.com/models/aura
IndexedDB
IndexedDB is a low-level API for client-side storage of significant amounts of structured data, including files/blobs. IndexedDB is a JavaScript-based object-oriented database.
Learn more about key concepts and usage: https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API
Getting Started with Chrome Extension
Building a Chrome extension involves understanding its structure, permissions, and how it interacts with web pages. We'll start by setting up our development environment and creating the foundational files required for our extension.
Setting Up Your Development Environment
Before we begin coding, ensure you have the following:
- Chrome Browser: The browser where we'll load and test our extension.
- Text Editor or IDE: Tools like Visual Studio Code, Sublime Text, or Atom are suitable for editing code. We'll use Visual Studio Code in this tutorial.
- Basic Knowledge of HTML, CSS, and JavaScript: Familiarity with these technologies is essential for building Chrome extensions.
Creating the Project Structure
Our extension uses three files (strictly speaking, Chrome only requires manifest.json):
- manifest.json: Contains metadata and configuration for the extension.
- scripts.js: Holds the JavaScript code that defines the extension's behavior.
- styles.css: Includes any styling for the extension's UI elements.
Let's create a directory for our project and set up these files.
Step 1: Create a New Directory
Open your terminal and run the following commands to create a new folder for your extension:
```bash
mkdir my-first-chrome-extension
cd my-first-chrome-extension
```
Step 2: Create Essential Files
Within the new directory, create the necessary files:
```bash
touch manifest.json
touch scripts.js
touch styles.css
```
Understanding manifest.json
The manifest.json file is the heart of your Chrome extension. It tells the browser about your extension, what it does, and what permissions it needs. Let's delve into configuring this file properly.
```json
{
  "manifest_version": 3,
  "name": "Read Aloud",
  "version": "1.0",
  "description": "Read Aloud anything in any tab",
  "host_permissions": [
    "*://*.aimlapi.com/*"
  ],
  "permissions": [
    "activeTab"
  ],
  "content_scripts": [
    {
      "matches": ["<all_urls>"],
      "js": ["scripts.js"],
      "css": ["styles.css"]
    }
  ],
  "icons": {
    "16": "icons/icon.png",
    "48": "icons/icon.png",
    "128": "icons/icon.png"
  }
}
```
Essential Fields in manifest.json
At a minimum, manifest.json must include:
- manifest_version: Specifies the version of the manifest file format. Chrome currently uses version 3.
- name: The name of your extension, as it will appear to users.
- version: The version number of your extension, written as one to four dot-separated integers (for example, 1.0).
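Chrome is fairly strict about these fields. Its documentation describes the version string as one to four dot-separated integers, each between 0 and 65535, with no leading zeros and not all zero. As a quick sanity check before loading the extension, a small helper along the following lines could validate a parsed manifest object. The names REQUIRED_FIELDS, isValidExtensionVersion, and findManifestProblems are made up for this sketch and are not part of any Chrome API.

```javascript
// Hypothetical helpers for sanity-checking a parsed manifest.json object.
const REQUIRED_FIELDS = ['manifest_version', 'name', 'version'];

// True if `version` is one to four dot-separated integers (0-65535 each),
// with no leading zeros and at least one non-zero part.
function isValidExtensionVersion(version) {
  if (typeof version !== 'string') return false;
  const parts = version.split('.');
  if (parts.length > 4) return false;
  const wellFormed = parts.every(
    (part) => /^(0|[1-9]\d*)$/.test(part) && Number(part) <= 65535
  );
  return wellFormed && parts.some((part) => Number(part) !== 0);
}

// Returns a list of human-readable problems; an empty list means the
// essential fields look fine.
function findManifestProblems(manifest) {
  const problems = REQUIRED_FIELDS
    .filter((field) => !(field in manifest))
    .map((field) => `missing required field "${field}"`);
  if ('version' in manifest && !isValidExtensionVersion(manifest.version)) {
    problems.push(`invalid version "${manifest.version}"`);
  }
  return problems;
}

// The essential fields from our manifest pass the check.
console.log(findManifestProblems({ manifest_version: 3, name: 'Read Aloud', version: '1.0' })); // → []
```

This only covers the three essential fields; Chrome itself performs the authoritative validation when you load the extension.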
Adding Metadata and Permissions
Beyond the essential fields, we'll add:
- description: A brief summary of what your extension does.
- host_permissions: Specifies which domains the extension can access. For our integration with the AI/ML API, we'll need access to *.aimlapi.com.
- permissions: Declares special permissions needed, such as accessing the active tab.
- content_scripts: Defines scripts and styles to inject into web pages.
- icons: Provides icons for the extension at various sizes.
Explanation of Key Fields
- manifest_version: Set to 3 to use the latest Chrome extension features.
- name: We'll name our extension "Read Aloud", reflecting its functionality.
- version: Starting with "1.0" indicates the initial release.
- description: "Read Aloud anything in any tab" informs users about the extension's purpose.
- host_permissions: The wildcard *://*.aimlapi.com/* allows the extension to communicate with any subdomain of aimlapi.com, necessary for API calls.
- permissions: "activeTab" allows the extension to interact with the content of the current tab.
- content_scripts: Specifies that scripts.js and styles.css should be injected into all web pages ("<all_urls>").
- icons: References icon files for the extension (ensure you have appropriate icon files in an icons directory).
Generating an Icon
Open your browser and go to chatgpt.com to generate an icon for our Chrome extension. We'll use a single icon for all the sizes, which is perfectly fine.
Enter the following prompt:
Generate black and white icon for my "Read Aloud" Chrome extension. This extension allows users to highlight the specific text in the website and listen to it. It's AI-powered Chrome extension. The background should be in white and solid.
Wait a couple of seconds until ChatGPT generates the icon image. Click download, rename the file to icon.png, and put it inside the icons folder.
Finalizing manifest.json
With all fields properly defined, your manifest.json will enable the browser to understand and correctly load your extension.
Developing scripts.js
The scripts.js file contains the logic that controls how your extension behaves. We'll outline the key functionalities your script needs to implement.
Variables and Initialization
Start by setting up necessary variables:
- API Key: You'll need an API key from the AI/ML API platform to authenticate your requests.
- Overlay Elements: Create DOM elements for the overlay and the "Read Aloud" button.
- Selection Variables: Store information about the user's selected text and its position.
```javascript
// Set your AI/ML API key
const AIML_API_KEY = ''; // Replace with your AI/ML API key

// Create the overlay
const overlay = document.createElement('div');
overlay.id = 'read-aloud-overlay';

// Create the "Read Aloud" button
const askButton = document.createElement('button');
askButton.id = 'read-aloud-button';
askButton.innerText = 'Read Aloud';

// Append the button to the overlay
overlay.appendChild(askButton);

// Variables to store selected text and range
let selectedText = '';
let selectedRange = null;
```
Handling Text Selection
Your extension should detect when a user selects text on a webpage:
- Event Listener: Attach a mouseup event listener to the document to detect when the user finishes selecting text.
```javascript
document.addEventListener('mouseup', (event) => {
  console.log('mouseup event: ', event);
  // ...code
});
```
- Selection Detection: Check if the selected text is not empty and store it.
```javascript
const selection = window.getSelection();
const text = selection.toString().trim();
if (text !== '') {
  const range = selection.getRangeAt(0);
  const rect = range.getBoundingClientRect();
  // ...code
}
```
- Overlay Positioning: Calculate where to place the overlay so it's near the selected text.
```javascript
// Set the position of the overlay
overlay.style.top = `${window.scrollY + rect.top - 50}px`; // Adjust as needed
overlay.style.left = `${window.scrollX + rect.left + rect.width / 2 - 70}px`; // Adjust to center the overlay

selectedText = text;
selectedRange = range;
```
- Overlay Management: Ensure that any existing overlay is removed before adding a new one.
```javascript
if (text !== '') {
  // ...code

  // Remove existing overlay if any
  const existingOverlay = document.getElementById('read-aloud-overlay');
  if (existingOverlay) {
    existingOverlay.remove();
  }

  // Append the overlay to the document body
  document.body.appendChild(overlay);
} else {
  // Remove overlay if no text is selected
  const existingOverlay = document.getElementById('read-aloud-overlay');
  if (existingOverlay) {
    existingOverlay.remove();
  }
}
```
Full Code:
```javascript
// Handle text selection
document.addEventListener('mouseup', (event) => {
  console.log('mouseup event: ', event);
  const selection = window.getSelection();
  const text = selection.toString().trim();
  if (text !== '') {
    const range = selection.getRangeAt(0);
    const rect = range.getBoundingClientRect();

    // Set the position of the overlay
    overlay.style.top = `${window.scrollY + rect.top - 50}px`; // Adjust as needed
    overlay.style.left = `${window.scrollX + rect.left + rect.width / 2 - 70}px`; // Adjust to center the overlay

    selectedText = text;
    selectedRange = range;

    // Remove existing overlay if any
    const existingOverlay = document.getElementById('read-aloud-overlay');
    if (existingOverlay) {
      existingOverlay.remove();
    }

    // Append the overlay to the document body
    document.body.appendChild(overlay);
  } else {
    // Remove overlay if no text is selected
    const existingOverlay = document.getElementById('read-aloud-overlay');
    if (existingOverlay) {
      existingOverlay.remove();
    }
  }
});
```
Interacting with the AI/ML API
When the user clicks the "Read Aloud" button:
- Input Validation: Check if the selected text meets any length requirements.
```javascript
if (selectedText.length > 200) {
  // ...code
}
```
- Disable Button: Prevent multiple clicks by disabling the button during processing.
```javascript
// Disable the button
askButton.disabled = true;
askButton.innerText = 'Loading...';
```
- API Request: Send a POST request to the AI/ML API with the selected text for text-to-speech conversion.
```javascript
// Send the selected text to the AI/ML API for TTS
const response = await fetch('https://api.aimlapi.com/tts', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${AIML_API_KEY}`, // Replace with your actual API key
  },
  body: JSON.stringify({
    model: '#g1_aura-asteria-en', // Replace with your specific model if needed
    text: selectedText
  })
});
```
- Error Handling: Handle any errors that occur during the API request gracefully.
```javascript
try {
  // ...code
  if (!response.ok) {
    throw new Error('API request failed');
  }
  // ...code
} catch (error) {
  console.error('Error:', error);
  askButton.disabled = false;
  askButton.innerText = 'Read Aloud';
  alert('An error occurred while fetching the audio.');
}
```
- Audio Playback: Once the audio is received, play it back to the user.
```javascript
// Play the audio
audio.play();
```
Using IndexedDB for Storage
To manage audio files efficiently:
- Open Database: Create or open an IndexedDB database to store audio blobs.
```javascript
// Open IndexedDB
const db = await openDatabase();
const audioId = 'audio_' + Date.now(); // Generate a unique ID for the audio
```
- Save Audio: Store the audio blob in IndexedDB after receiving it from the API.
```javascript
// Save audio blob to IndexedDB
await saveAudioToIndexedDB(db, audioId, audioBlob);
```
- Retrieve Audio: Fetch the audio blob from IndexedDB for playback.
- Delete Audio: Remove the audio blob from the database after playback to free up space.
Cleanup and User Experience
- Overlay Removal: Remove the overlay if the user clicks elsewhere or deselects the text.
- Re-enable Button: Ensure the "Read Aloud" button is re-enabled after processing is complete.
- User Feedback: Provide visual cues, like changing button text to "Loading…", to inform the user that processing is underway.
Full code: see the GitHub repository linked at the end of this tutorial.
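As a rough illustration of how the steps in the last three sections fit together, here is a minimal sketch of the click flow as a single function. It is not the author's implementation: the function name readSelectionAloud and everything on the deps object are hypothetical stand-ins, injected so the flow can also be run outside the browser. The sketch also assumes the API returns the audio directly in the response body, so that response.blob() yields the audio blob.

```javascript
// Sketch of the "Read Aloud" click flow. Everything on `deps` is a
// hypothetical stand-in:
//   fetchImpl   - fetch-compatible function (window.fetch in the extension)
//   saveAudio / loadAudio / deleteAudio - storage helpers keyed by audio ID
//   playAudio   - plays a blob and resolves when playback has finished
async function readSelectionAloud(text, button, apiKey, deps) {
  // Give visual feedback and block repeated clicks while we work.
  button.disabled = true;
  button.innerText = 'Loading...';
  try {
    const response = await deps.fetchImpl('https://api.aimlapi.com/tts', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model: '#g1_aura-asteria-en', text }),
    });
    if (!response.ok) {
      throw new Error('API request failed');
    }
    // Assumption: the response body is the audio itself.
    const audioBlob = await response.blob();

    const audioId = 'audio_' + Date.now();
    await deps.saveAudio(audioId, audioBlob);
    try {
      const storedBlob = await deps.loadAudio(audioId);
      await deps.playAudio(storedBlob);
    } finally {
      // Free the stored audio even if playback fails.
      await deps.deleteAudio(audioId);
    }
    return audioId;
  } finally {
    // Runs on success and on failure, so the button never stays stuck.
    button.disabled = false;
    button.innerText = 'Read Aloud';
  }
}
```

In the extension itself, playAudio could wrap `new Audio(URL.createObjectURL(blob))` and resolve on the audio element's `ended` event, while the three storage helpers would delegate to the IndexedDB functions. Using try/finally rather than resetting the button in each branch keeps the re-enable logic in one place.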
Implementing IndexedDB Functions
IndexedDB is a powerful client-side storage system that allows us to store large amounts of data, including files and blobs.
Functionality to Implement
You'll need to create four primary functions to interact with IndexedDB:
- openDatabase(): Opens a connection to the database and creates an object store if it doesn't exist.
- saveAudioToIndexedDB(): Saves the audio blob with a unique ID.
- getAudioFromIndexedDB(): Retrieves the audio blob using its ID.
- deleteAudioFromIndexedDB(): Deletes the audio blob after playback.
Key Concepts
- Transactions: All interactions with IndexedDB occur within a transaction. Ensure you specify the correct transaction mode (readonly or readwrite).
- Object Stores: Similar to tables in SQL databases, object stores hold the data. We'll use an object store named "audios".
- Error Handling: Always handle errors for database operations to prevent unexpected behavior.
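The four functions and the concepts above can be sketched as follows. This is one possible shape, not the author's code: the database name, the keyPath, the requestToPromise helper, and the optional idbFactory parameter (which defaults to the browser's global indexedDB) are all assumptions made for this example. Only the function names and the "audios" store name come from the tutorial.

```javascript
const DB_NAME = 'readAloudAudio'; // assumed database name
const STORE_NAME = 'audios';      // object store name used in this tutorial

// Wrap an IDBRequest in a Promise that settles with its result or error.
function requestToPromise(request) {
  return new Promise((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Open (or create) the database; the object store is created the first
// time the database is opened at this version.
function openDatabase(idbFactory = indexedDB) {
  const request = idbFactory.open(DB_NAME, 1);
  request.onupgradeneeded = () => {
    const db = request.result;
    if (!db.objectStoreNames.contains(STORE_NAME)) {
      db.createObjectStore(STORE_NAME, { keyPath: 'id' });
    }
  };
  return requestToPromise(request);
}

// Save the audio blob under a unique ID (needs a readwrite transaction).
function saveAudioToIndexedDB(db, id, blob) {
  const store = db.transaction(STORE_NAME, 'readwrite').objectStore(STORE_NAME);
  return requestToPromise(store.put({ id, blob }));
}

// Read the blob back; resolves to undefined if the ID is unknown.
async function getAudioFromIndexedDB(db, id) {
  const store = db.transaction(STORE_NAME, 'readonly').objectStore(STORE_NAME);
  const record = await requestToPromise(store.get(id));
  return record ? record.blob : undefined;
}

// Remove the blob after playback to free up space.
function deleteAudioFromIndexedDB(db, id) {
  const store = db.transaction(STORE_NAME, 'readwrite').objectStore(STORE_NAME);
  return requestToPromise(store.delete(id));
}
```

Each function opens its own short-lived transaction with the narrowest mode it needs, and every request is wrapped so that failures surface as rejected promises that the caller can catch.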
Styling with styles.css
To provide a seamless user experience, your extension should have a clean and intuitive interface.
Styling the Overlay and Button
Define styles for:
- Overlay Positioning: Absolute positioning to place the overlay near the selected text.
- Button Appearance: Styling the "Read Aloud" button to match the overlay and be easily clickable.
- Hover Effects: Enhance user interaction with hover effects on the button.
- Disabled State: Visually indicate when the button is disabled.
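A starting point for styles.css covering the four items above might look like the following. The two selectors match the element IDs assigned in scripts.js; every color, size, and the z-index value is a placeholder chosen for this sketch, so adjust them to taste.

```css
/* Assumed styles for the elements created in scripts.js. */
#read-aloud-overlay {
  position: absolute; /* top and left are set from scripts.js */
  z-index: 9999;      /* keep the overlay above page content */
  padding: 6px;
  border-radius: 8px;
  background: #ffffff;
  box-shadow: 0 2px 8px rgba(0, 0, 0, 0.25);
}

#read-aloud-button {
  padding: 8px 16px;
  border: none;
  border-radius: 6px;
  background: #1a73e8;
  color: #ffffff;
  font-size: 14px;
  cursor: pointer;
}

/* Hover effect */
#read-aloud-button:hover {
  background: #1558b0;
}

/* Disabled state; declared after :hover so it wins while loading */
#read-aloud-button:disabled {
  background: #9e9e9e;
  cursor: not-allowed;
}
```

Because content-script styles are injected into every page, ID selectors with a distinctive prefix like these reduce the chance of clashing with the host page's own CSS.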
Obtaining and Setting Your API Key
To interact with the AI/ML API and Deepgram Aura model, you'll need an API key.
Steps to Obtain Your API Key
- Visit the AI/ML API Platform: Navigate to aimlapi.com.
- Sign In: Click "Get API Key" and sign in using your Google account.
- Access the Dashboard: After signing in, you'll be redirected to your dashboard.
- Create an API Key: Go to the "Key Management" tab and click "Create API Key."
- Copy the API Key: Once generated, copy your API key.
Setting the API Key in Your Extension
- Security Note: Never hardcode your API key into your scripts if you plan to distribute your extension. Consider using environment variables or prompting the user to enter their API key. Note that a .env file won't work out of the box: using one in a Chrome extension requires extra build configuration, which we'll cover in an upcoming tutorial.
- For Testing: In your scripts.js, assign your API key to the AIML_API_KEY constant, which is sent in the Authorization header of each API request.
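One way to follow the "prompt the user" suggestion is sketched below. It assumes a storage object that behaves like chrome.storage.local (promise-based get and set under Manifest V3), which would also require adding "storage" to the permissions list in manifest.json. The function name getApiKey and the storage key aimlApiKey are invented for this example.

```javascript
// Hypothetical helper: returns a stored API key, or asks the user once and
// stores the answer. `storageArea` is expected to behave like
// chrome.storage.local, and `askUser` is any function that returns the
// typed key (or null if the user cancels).
async function getApiKey(storageArea, askUser) {
  const stored = await storageArea.get('aimlApiKey'); // assumed storage key
  if (stored.aimlApiKey) return stored.aimlApiKey;

  const entered = (askUser() || '').trim();
  if (!entered) throw new Error('No API key provided');
  await storageArea.set({ aimlApiKey: entered });
  return entered;
}
```

In the extension this could be called as `getApiKey(chrome.storage.local, () => window.prompt('Enter your AI/ML API key'))`. This keeps the key out of the source code, although it is still stored unencrypted in the user's browser profile.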
Running and Testing the Extension
With all components in place, it's time to load your extension into Chrome and see it in action.
Loading the Extension
- Open Extensions Page: In Chrome, navigate to chrome://extensions/.
- Enable Developer Mode: Toggle the "Developer mode" switch in the top right corner.
- Load Unpacked Extension: Click "Load unpacked" and select your my-first-chrome-extension folder. (In my case, the folder is named aimlapi-tutorial-one.)
- Verify Installation: The extension should now appear in the list with its name and icon.
Testing Functionality
- Navigate to a Webpage: Open a webpage with textual content, such as an article or blog post.
- Select Text: Highlight a paragraph or sentence.
- Interact with the Overlay: The "Read Aloud" button should appear above the selected text. Click it; the button text changes to "Loading…" while the text-to-speech request is processed.
- Listen: After a brief processing period, you should hear the text read aloud by the AI voice.
Troubleshooting Tips
- Overlay Doesn't Appear: Check if content_scripts are correctly specified in manifest.json.
- No Audio Playback: Verify your API key is correctly set and that API requests are successful.
- Console Errors: Use the browser's developer tools to inspect any JavaScript errors or network issues.
Project Summary
In this tutorial, we've:
- Set Up the Development Environment: Created the necessary project structure and files for a Chrome extension.
- Configured manifest.json: Defined essential metadata and permissions, understanding the importance of each field.
- Developed scripts.js: Outlined the logic for handling text selection, interacting with the AI/ML API, and managing audio playback.
- Implemented IndexedDB Integration: Learned how to use IndexedDB for storing and retrieving audio files locally.
- Styled the Extension with styles.css: Applied CSS to enhance the user interface and improve user experience.
- Obtained and Set Up an API Key: Acquired an API key from the AI/ML API platform and integrated it securely into our extension.
- Loaded and Tested the Extension: Deployed the extension in Chrome and validated its functionality on live web pages.
- Discussed Best Practices: Emphasized the importance of security, user experience, and error handling in extension development.
Next Steps
With a solid foundation, you can enhance your extension further:
- Add Customization Options: Allow users to choose different voices or languages.
- Improve Error Handling: Provide user-friendly messages and fallback options if the API is unavailable.
- Optimize Performance: Implement caching strategies or optimize API requests to reduce latency.
- Publish Your Extension: Share your creation with others by publishing it on the Chrome Web Store.
Conclusion
Congratulations on building a Chrome extension that integrates advanced AI capabilities! This project showcases how combining web technologies with powerful APIs can create engaging and accessible user experiences. You're now equipped with the knowledge to develop and expand upon this extension or create entirely new ones that leverage AI/ML APIs.
The full implementation is available on GitHub: https://github.com/TechWithAbee/Building-a-Chrome-Extension-from-Scratch-with-AI-ML-API-Deepgram-Aura-and-IndexDB-Integration
Should you have any questions or need further assistance, don't hesitate to reach out via email at abdibrokhim@gmail.com.