Fine-Tuning GPT-4o Mini for Financial Sentiment Analysis
Sentiment analysis in finance is a powerful tool for understanding market trends and investor behavior. However, general sentiment analysis models often fall short when applied to financial texts due to their complexity and nuanced nature. This project proposes a solution by fine-tuning GPT-4o mini, a lightweight language model. By utilizing the TRC2 dataset, a collection of Reuters financial news articles labeled with sentiment classes by the expert model FinBERT, we aim to enhance GPT-4o mini’s ability to capture financial sentiment nuances.
This project provides an efficient and scalable approach to financial sentiment analysis, opening the door for more nuanced sentiment-based analysis in finance. By the end, we demonstrate that GPT-4o mini, when fine-tuned with domain-specific data, can serve as a viable alternative to more complex models like FinBERT in financial contexts.
Learning Outcomes
- Understand the process of fine-tuning GPT-4o mini for financial sentiment analysis using domain-specific data.
- Learn how to preprocess and format financial text data for model training in a structured and scalable manner.
- Gain insights into the application of sentiment analysis for financial texts and its impact on market trends.
- Discover how to leverage labels from expert models like FinBERT to improve model performance in financial sentiment analysis.
- Explore the practical deployment of a fine-tuned GPT-4o mini model in real-world financial applications such as market analysis and automated news sentiment tracking.
This article was published as a part of the Data Science Blogathon.
Table of contents
- Exploring the Dataset: Essential Data for Sentiment Analysis
- Research Methodology: Steps to Analyze Financial Sentiment
- Fine-Tuning GPT-4o Mini for Financial Sentiment Analysis
- Conclusion
- Frequently Asked Questions
Exploring the Dataset: Essential Data for Sentiment Analysis
For this project, we use the TRC2 (Thomson Reuters Text Research Collection) dataset, a corpus of financial news articles curated by Reuters and made available through the National Institute of Standards and Technology (NIST). The TRC2 dataset includes a comprehensive selection of Reuters financial news articles and is often used in financial language modeling because of its wide coverage and relevance to financial events.
Accessing the TRC2 Dataset
To obtain the TRC2 dataset, researchers and organizations need to request access through NIST. The dataset is available at NIST TREC Reuters Corpus, which provides details on licensing and usage agreements. You will need to:
- Visit the NIST TREC Reuters Corpus page.
- Follow the dataset request process specified on the website.
- Ensure compliance with the licensing requirements to use the dataset in research or commercial projects.
Once you obtain the dataset, preprocess and segment it into sentences for sentiment analysis, allowing you to apply FinBERT to generate expert-labeled sentiment classes.
Research Methodology: Steps to Analyze Financial Sentiment
The methodology for fine-tuning GPT-4o mini with sentiment labels derived from FinBERT consists of the following main steps:
Step 1: FinBERT Labeling
To create the fine-tuning dataset, we leverage FinBERT, a financial language model pre-trained on the financial domain. We apply FinBERT to each sentence in the TRC2 dataset, generating expert sentiment labels across three classes: Positive, Negative, and Neutral. This process produces a labeled dataset where each sentence from TRC2 is associated with a sentiment, thus providing a foundation for training GPT-4o mini with reliable labels.
Step 2: Data Preprocessing and JSONL Formatting
The labeled data is then preprocessed and formatted into a JSONL structure suitable for OpenAI’s fine-tuning API. We format each data point with the following structure:
- A system message specifying the assistant’s role as a financial expert.
- A user message containing the financial sentence.
- An assistant response that states the predicted sentiment label from FinBERT.
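For instance, a single JSONL record takes the following shape (the sentence shown here is illustrative, not taken from TRC2; the real entries use TRC2 sentences and FinBERT's labels):
{"messages": [{"role": "system", "content": "The assistant is a financial expert."}, {"role": "user", "content": "The company's quarterly profit rose 20%, beating analyst estimates."}, {"role": "assistant", "content": "positive"}]}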
After labeling, we perform additional preprocessing steps, such as converting labels to lowercase for consistency and stratifying the data to ensure balanced label representation. We also split the dataset into training and validation sets, reserving 80% of the data for training and 20% for validation, which helps assess the model’s generalization ability.
Step 3: Fine-Tuning GPT-4o Mini
Using OpenAI’s fine-tuning API, we fine-tune GPT-4o mini with the pre-labeled dataset. Fine-tuning settings, such as learning rate, batch size, and number of epochs, are optimized to achieve a balance between model accuracy and generalizability. This process enables GPT-4o mini to learn from domain-specific data and improves its performance on financial sentiment analysis tasks.
Step 4: Evaluation and Benchmarking
After training, the model’s performance is evaluated using common sentiment analysis metrics such as accuracy and F1-score, allowing a direct comparison with FinBERT’s performance on the same data. This benchmarking shows how well GPT-4o mini generalizes sentiment classification within the financial domain and confirms whether it can consistently outperform FinBERT in accuracy.
Step 5: Deployment and Practical Application
Upon confirming superior performance, GPT-4o mini is ready for deployment in real-world financial applications, such as market analysis, investment advisory, and automated news sentiment tracking. This fine-tuned model provides an efficient alternative to more complex financial models, offering robust, scalable sentiment analysis capabilities suitable for integration into financial systems.
If you want to learn the basics of Sentiment Analysis, check out our article on Sentiment Analysis using Python!
Fine-Tuning GPT-4o Mini for Financial Sentiment Analysis
Follow this structured, step-by-step approach to seamlessly navigate through each stage of the process. Whether you’re a beginner or experienced, this guide ensures clarity and successful implementation from start to finish.
Step 1: Initial Setup
Load Required Libraries and Configure the Environment.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import pandas as pd
from tqdm import tqdm

# Load the FinBERT tokenizer and classification model
tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")
model = AutoModelForSequenceClassification.from_pretrained("ProsusAI/finbert")

# Run on GPU if one is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
Step 2: Define a Function to Generate Sentiment Labels with FinBERT
- This function accepts text input, tokenizes it, and uses FinBERT to predict sentiment labels.
- Label Mapping: FinBERT outputs three classes—Positive, Negative, and Neutral.
def get_sentiment(text):
    # Tokenize the sentence and move the tensors to the same device as the model
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(device)
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    # FinBERT's class indices map to: 0 = Positive, 1 = Negative, 2 = Neutral
    sentiment = torch.argmax(logits, dim=1).item()
    sentiment_label = ["Positive", "Negative", "Neutral"][sentiment]
    return sentiment_label
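As a quick sanity check, you can call the function on a sample sentence (the headline below is purely illustrative):
# Illustrative usage; any short financial sentence works
print(get_sentiment("The company reported a sharp decline in quarterly revenue."))
# Prints one of: Positive, Negative, Neutral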
Step 3: Data Preprocessing and Sampling the TRC2 Dataset
You must carefully preprocess the TRC2 dataset to retain only relevant sentences for fine-tuning. The following steps outline how to read, clean, split, and filter the data from the TRC2 dataset.
Given the constraints of non-disclosure, this section provides a high-level overview of the data preprocessing workflow with pseudocode.
- Load and Extract Data: The dataset, provided in a compressed format, was loaded and extracted using standard text handling methods. Relevant sections of each document were isolated to focus on key textual content.
- Text Cleaning and Sentence Segmentation: After isolating content sections, each document was cleaned to remove extraneous characters and ensure consistency in formatting. This prepared the content for splitting into sentences or smaller text units, which enhances model performance by providing manageable segments for sentiment analysis.
- Structured Data Storage: To facilitate streamlined processing, the data was organized into a structured format where each row represents an individual sentence or text segment. This setup allows for efficient processing, filtering, and labeling, making it suitable for fine-tuning language models.
- Filter and Screen for Relevant Text Segments: To maintain high data quality, we applied various criteria to filter out irrelevant or noisy text segments. These criteria included eliminating overly short segments, removing those with specific patterns indicative of non-sentiment-bearing content, and excluding segments with excessive special characters or specific formatting characteristics.
- Final Preprocessing: Only the segments that met predefined quality standards were retained for model training. The filtered data was saved as a structured file for easy reference in the fine-tuning workflow.
# Load the compressed dataset from file
open compressed_file as file:
    # Read the contents of the file into memory
    data = read_file(file)

# Extract relevant sections of each document
for each document in data:
    extract document_id
    extract date
    extract main_text_content

# Define a function to clean and segment text content
function clean_and_segment_text(text):
    # Remove unwanted characters and whitespace
    cleaned_text = remove_special_characters(text)
    cleaned_text = standardize_whitespace(cleaned_text)
    # Split the cleaned text into sentences or text segments
    sentences = split_into_sentences(cleaned_text)
    return sentences

# Apply the cleaning and segmentation function to each document's content
for each document in data:
    sentences = clean_and_segment_text(document['main_text_content'])
    save sentences to structured format

# Create a structured data storage for individual sentences
initialize empty list structured_data
for each sentence in sentences:
    # Append sentence to structured data
    structured_data.append(sentence)

# Define a function to filter out unwanted sentences based on specific criteria
function filter_sentences(sentence):
    if sentence is too short:
        return False
    if sentence contains specific patterns (e.g., dates or excessive symbols):
        return False
    if sentence matches unwanted formatting characteristics:
        return False
    return True

# Apply the filter to the structured data
filtered_data = [sentence for sentence in structured_data if filter_sentences(sentence)]

# Further filter the sentences based on minimum length or other criteria
final_data = [sentence for sentence in filtered_data if meets_minimum_length(sentence)]

# Save the final data structure for model training
save final_data as structured_file
- Load the dataset and sample 1,000,000 sentences randomly to ensure a manageable dataset size for fine-tuning.
- Store the sampled sentences in a DataFrame to enable structured handling and easy processing.
# df holds the preprocessed TRC2 sentences, one sentence per row in a 'sentence' column
df_sampled = df.sample(n=1000000, random_state=42).reset_index(drop=True)
Step 4: Generate Labels and Prepare JSONL Data for Fine-Tuning
- Loop through the sampled sentences, use FinBERT to label each sentence, and format it as JSONL for GPT-4o mini fine-tuning.
- Structure for JSONL: Each entry includes a system message, user content, and the assistant’s sentiment response.
import json

jsonl_data = []
for _, row in tqdm(df_sampled.iterrows(), total=df_sampled.shape[0]):
    content = row['sentence']
    # Label each sentence with FinBERT
    sentiment = get_sentiment(content)

    # Build one chat-formatted training example per sentence
    jsonl_entry = {
        "messages": [
            {"role": "system", "content": "The assistant is a financial expert."},
            {"role": "user", "content": content},
            {"role": "assistant", "content": sentiment}
        ]
    }
    jsonl_data.append(jsonl_entry)

# Write one JSON object per line (JSONL)
with open('finetuning_data.jsonl', 'w') as jsonl_file:
    for entry in jsonl_data:
        jsonl_file.write(json.dumps(entry) + '\n')
Step 5: Convert Labels to Lowercase
- Ensure label consistency by converting sentiment labels to lowercase, aligning with OpenAI’s formatting for fine-tuning.
with open('finetuning_data.jsonl', 'r') as jsonl_file:
    data = [json.loads(line) for line in jsonl_file]

# The assistant message (index 2) holds the sentiment label
for entry in data:
    entry["messages"][2]["content"] = entry["messages"][2]["content"].lower()

with open('finetuning_data_lowercase.jsonl', 'w') as new_jsonl_file:
    for entry in data:
        new_jsonl_file.write(json.dumps(entry) + '\n')
Step 6: Shuffle and Split the Dataset into Training and Validation Sets
- Shuffle the Data: Randomize the order of entries to eliminate ordering bias.
- Split into 80% Training and 20% Validation Sets.
import random

random.seed(42)
random.shuffle(data)

# 80% training / 20% validation split
split_ratio = 0.8
split_index = int(len(data) * split_ratio)
training_data = data[:split_index]
validation_data = data[split_index:]

with open('training_data.jsonl', 'w') as train_file:
    for entry in training_data:
        train_file.write(json.dumps(entry) + '\n')

with open('validation_data.jsonl', 'w') as val_file:
    for entry in validation_data:
        val_file.write(json.dumps(entry) + '\n')
Step 7: Perform Stratified Sampling and Save the Reduced Dataset
- To further optimize, perform stratified sampling to create a reduced dataset while maintaining label proportions.
- Use Stratified Sampling: Ensure equal distribution of labels across both training and validation sets for balanced fine-tuning.
from sklearn.model_selection import train_test_split

# Flatten the chat entries into a DataFrame of (content, label) pairs
data_df = pd.DataFrame({
    'content': [entry["messages"][1]["content"] for entry in data],
    'label': [entry["messages"][2]["content"] for entry in data]
})

# Keep a stratified 10% sample to reduce dataset size while preserving label proportions
df_sampled, _ = train_test_split(data_df, stratify=data_df['label'], test_size=0.9, random_state=42)

# Split the reduced sample into 80% training and 20% validation, again stratified by label
train_df, val_df = train_test_split(df_sampled, stratify=df_sampled['label'], test_size=0.2, random_state=42)

def df_to_jsonl(df, filename):
    jsonl_data = []
    for _, row in df.iterrows():
        jsonl_entry = {
            "messages": [
                {"role": "system", "content": "The assistant is a financial expert."},
                {"role": "user", "content": row['content']},
                {"role": "assistant", "content": row['label']}
            ]
        }
        jsonl_data.append(jsonl_entry)
    with open(filename, 'w') as jsonl_file:
        for entry in jsonl_data:
            jsonl_file.write(json.dumps(entry) + '\n')

df_to_jsonl(train_df, 'reduced_training_data.jsonl')
df_to_jsonl(val_df, 'reduced_validation_data.jsonl')
Step 8: Fine-Tune GPT-4o Mini Using OpenAI’s Fine-Tuning API
- With your prepared JSONL files, follow OpenAI’s documentation to fine-tune GPT-4o mini on the prepared training and validation datasets.
- Upload Data and Start Fine-Tuning: Upload the JSONL files to OpenAI’s platform and follow their API instructions to initiate the fine-tuning process.
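The exact commands are left to OpenAI’s documentation, so the snippet below is only a minimal sketch of what the upload and job creation might look like with the legacy (pre-1.0) openai Python SDK, the same style used in the evaluation script later; the model snapshot name and epoch count are assumptions you should verify against OpenAI’s current fine-tuning docs.
import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

# Upload the stratified training and validation files produced in Step 7
train_file = openai.File.create(file=open("reduced_training_data.jsonl", "rb"), purpose="fine-tune")
val_file = openai.File.create(file=open("reduced_validation_data.jsonl", "rb"), purpose="fine-tune")

# Start a fine-tuning job on GPT-4o mini
# NOTE: the snapshot name and n_epochs below are assumptions; check OpenAI's docs for current values
job = openai.FineTuningJob.create(
    training_file=train_file["id"],
    validation_file=val_file["id"],
    model="gpt-4o-mini-2024-07-18",
    hyperparameters={"n_epochs": 3},
)
print(job["id"])  # keep this ID to poll progress with openai.FineTuningJob.retrieve(job_id)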
Step 9: Model Testing and Evaluation
To evaluate the fine-tuned GPT-4o mini model’s performance, we tested it on a labeled financial sentiment dataset available on Kaggle. This dataset contains 5,843 labeled sentences in financial contexts, which allows for a meaningful comparison between the fine-tuned model and FinBERT.
FinBERT scored an accuracy of 75.81%, while the fine-tuned GPT-4o mini model achieved 76.46%, demonstrating a slight improvement.
Here’s the code used for testing:
import pandas as pd
import os
import openai
from dotenv import load_dotenv

# Load the CSV file
csv_file_path = 'data.csv'  # Replace with your actual file path
df = pd.read_csv(csv_file_path)

# Convert DataFrame to text format
with open('sentences.txt', 'w', encoding='utf-8') as f:
    for index, row in df.iterrows():
        sentence = row['Sentence'].strip()  # Clean sentence
        sentiment = row['Sentiment'].strip().lower()  # Ensure sentiment is lowercase and clean
        f.write(f"{sentence} @{sentiment}\n")

# Load environment variables and set your OpenAI API key
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")  # Ensure OPENAI_API_KEY is set in your environment variables

# Path to the dataset text file containing sentences and labels
file_path = 'sentences.txt'

# Read sentences and true labels from the dataset
sentences = []
true_labels = []
with open(file_path, 'r', encoding='utf-8') as file:
    lines = file.readlines()

# Extract sentences and labels
for line in lines:
    line = line.strip()
    if '@' in line:
        sentence, label = line.rsplit('@', 1)
        sentences.append(sentence.strip())
        true_labels.append(label.strip())

# Function to get predictions from the fine-tuned model
def get_openai_predictions(sentence, model="your_finetuned_model_name"):  # Replace with your model name
    try:
        response = openai.ChatCompletion.create(
            model=model,
            messages=[
                {"role": "system", "content": "You are a financial sentiment analysis expert."},
                {"role": "user", "content": sentence}
            ],
            max_tokens=50,
            temperature=0.5
        )
        return response['choices'][0]['message']['content'].strip()
    except Exception as e:
        print(f"Error generating prediction for sentence: '{sentence}'. Error: {e}")
        return "unknown"

# Generate predictions for the dataset
predicted_labels = []
for sentence in sentences:
    prediction = get_openai_predictions(sentence)
    # Normalize the predictions to 'positive', 'neutral', 'negative'
    if 'positive' in prediction.lower():
        predicted_labels.append('positive')
    elif 'neutral' in prediction.lower():
        predicted_labels.append('neutral')
    elif 'negative' in prediction.lower():
        predicted_labels.append('negative')
    else:
        predicted_labels.append('unknown')

# Calculate the model's accuracy
correct_count = sum([pred == true for pred, true in zip(predicted_labels, true_labels)])
accuracy = correct_count / len(sentences)
print(f'Accuracy: {accuracy:.4f}')  # Expected output: 0.7646
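The methodology also calls for an F1-score comparison, which the script above does not compute; a short sketch with scikit-learn, reusing the true_labels and predicted_labels lists it builds, could look like this:
from sklearn.metrics import f1_score, classification_report

labels = ["positive", "negative", "neutral"]

# Macro-averaged F1 over the three sentiment classes ('unknown' predictions simply count as misses)
print("Macro F1:", f1_score(true_labels, predicted_labels, labels=labels, average="macro"))

# Per-class precision, recall, and F1
print(classification_report(true_labels, predicted_labels, labels=labels, zero_division=0))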
Conclusion
By combining the expertise of FinBERT’s financial domain labels with the flexibility of GPT-4o mini, this project achieves a high-performance financial sentiment model that surpasses FinBERT in accuracy. This guide and methodology pave the way for replicable, scalable, and interpretable sentiment analysis, specifically tailored to the financial industry.
Key Takeaways
- Fine-tuning GPT-4o mini with domain-specific data enhances its ability to capture nuanced financial sentiment, outperforming models like FinBERT in accuracy.
- The TRC2 dataset, curated by Reuters, provides high-quality financial news articles for effective sentiment analysis training.
- Preprocessing and labeling with FinBERT enable GPT-4o mini to generate more accurate sentiment predictions for financial texts.
- The approach demonstrates the scalability of GPT-4o mini for real-world financial applications, offering a lightweight alternative to complex models.
- By leveraging OpenAI’s fine-tuning API, this method optimizes GPT-4o mini for efficient and effective financial sentiment analysis.
Frequently Asked Questions
Q1. Why use GPT-4o mini instead of FinBERT for financial sentiment analysis? A. GPT-4o mini provides a lightweight, flexible alternative and, with fine-tuning, can outperform FinBERT on specific tasks. Fine-tuned on domain-specific data, it captures nuanced sentiment patterns in financial texts while being computationally efficient and easy to deploy.
Q2. How do I request access to the TRC2 dataset? A. To access the TRC2 dataset, submit a request through the National Institute of Standards and Technology (NIST) website. Review the instructions to complete the licensing and usage agreements, which are typically required for both research and commercial use.
Q3. Can I use a different dataset for financial sentiment analysis? A. You can also use other datasets such as the Financial PhraseBank or custom datasets of labeled financial texts. The TRC2 dataset suits sentiment-model training particularly well because it consists of financial news content and covers a wide range of financial topics.
Q4. How does FinBERT generate the sentiment labels? A. FinBERT is a domain-specific language model pre-trained on financial data and fine-tuned for sentiment analysis. When applied to the TRC2 sentences, it classifies each sentence as Positive, Negative, or Neutral based on the language used in financial contexts.
Q5. Why do we need to convert the labels to lowercase in JSONL? A. Fine-tuning treats the assistant's responses as literal text, so mixed-case labels ("Positive" vs. "positive") would be learned and evaluated as different strings. Converting all labels to lowercase keeps the JSONL dataset uniform and prevents case mismatches during evaluation.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.