Machine Learning in PHP: Build a News Classifier Using Rubix ML

Nov 03, 2024 am 03:33 AM

Introduction

Machine learning is everywhere—recommending movies, tagging images, and now even classifying news articles. Imagine if you could do that within PHP! With Rubix ML, you can bring the power of machine learning to PHP in a way that’s straightforward and accessible. This guide will walk you through building a simple news classifier that sorts articles into categories like “Sports” or “Technology.” By the end, you’ll have a working classifier that can predict categories for new articles based on their content.

This project is perfect for beginners who want to dip their toes into machine learning using PHP, and you can follow along with the complete code on GitHub.

Table of Contents

  1. What is Rubix ML?
  2. Setting Up the Project
  3. Creating the News Classification Class
  4. Training the Model
  5. Predicting New Samples
  6. Final Thoughts

What is Rubix ML?

Rubix ML is a machine learning library for PHP that brings ML tools and algorithms into a PHP-friendly environment. Whether you’re working on classification, regression, clustering, or even natural language processing, Rubix ML has you covered. It allows you to load and preprocess data, train models, and evaluate performance—all in PHP.

Rubix ML supports a wide range of machine learning tasks, such as:

  • Classification: Categorizing data, like labeling emails as spam or not spam.
  • Regression: Predicting continuous values, like housing prices.
  • Clustering: Grouping data without labels, like finding customer segments.
  • Natural Language Processing (NLP): Working with text data, such as tokenizing and transforming it into usable formats for ML.

Let’s dive into how you can use Rubix ML to build a simple news classifier in PHP!

Setting Up the Project

We’ll start by setting up a new PHP project with Rubix ML and configuring autoloading.

Step 1: Initialize the Project Directory

Create a new project directory and navigate into it:

mkdir NewsClassifier
cd NewsClassifier

Step 2: Install Rubix ML with Composer

Make sure you have Composer installed, then add Rubix ML to your project by running:

composer require rubix/ml

Step 3: Configure Autoloading in composer.json

To autoload classes from our project’s src directory, open or create a composer.json file and add the following configuration:

{
    "autoload": {
        "psr-4": {
            "NewsClassifier\\": "src/"
        }
    },
    "require": {
        "rubix/ml": "^2.5"
    }
}

This tells Composer to autoload any classes within the src folder under the NewsClassifier namespace.
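
To make that mapping concrete, here is a minimal sketch of the lookup a PSR-4 autoloader performs for the NewsClassifier prefix. The function name resolvePsr4Path is hypothetical and exists only for illustration; Composer generates its own, more involved autoloader, so you never write this yourself.

```php
<?php

// Hypothetical helper (not part of Composer): maps a fully qualified
// class name to a file path using a single PSR-4 prefix.
function resolvePsr4Path(string $class, string $prefix, string $baseDir): ?string
{
    // Only classes that start with the configured namespace prefix are handled.
    if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
        return null;
    }

    // Strip the prefix, then turn namespace separators into directory separators.
    $relative = substr($class, strlen($prefix));

    return rtrim($baseDir, '/') . '/' . str_replace('\\', '/', $relative) . '.php';
}

echo resolvePsr4Path('NewsClassifier\\Classification', 'NewsClassifier\\', 'src/'), "\n";
// prints "src/Classification.php"
```

Note the doubled backslash in the prefix: it is needed in PHP string literals and in composer.json for the same reason, since a single backslash would be read as an escape character.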

Step 4: Run Composer Autoload Dump

After adding the autoload configuration, run the following command to regenerate Composer’s autoloader:

composer dump-autoload

Step 5: Directory Structure

Your project directory should look like this:

NewsClassifier/
├── src/
│   ├── Classification.php
│   └── train.php
├── storage/
├── vendor/
├── composer.json
└── composer.lock

  • src/: Contains your PHP scripts.
  • storage/: Where the trained model will be saved. Composer does not create this directory, so create it yourself (mkdir storage) before training.
  • vendor/: Contains dependencies installed by Composer.

Creating the News Classification Class

In src/, create a file called Classification.php. This file will contain the methods for training the model and predicting news categories.

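<?php

namespace NewsClassifier;

use Rubix\ML\Classifiers\KNearestNeighbors;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;
use Rubix\ML\PersistentModel;
use Rubix\ML\Pipeline;
use Rubix\ML\Tokenizers\Word;
use Rubix\ML\Transformers\TfIdfTransformer;
use Rubix\ML\Transformers\WordCountVectorizer;
use Rubix\ML\Persisters\Filesystem;

class Classification
{
    private $modelPath;

    public function __construct($modelPath)
    {
        $this->modelPath = $modelPath;
    }

    public function train()
    {
        // Sample data and corresponding labels
        $samples = [
            ['The team played an amazing game of soccer'],
            ['The new programming language has been released'],
            ['The match between the two teams was incredible'],
            ['The new tech gadget has been launched'],
        ];

        $labels = [
            'sports',
            'technology',
            'sports',
            'technology',
        ];

        // Create a labeled dataset
        $dataset = new Labeled($samples, $labels);

        // Set up the pipeline with a text transformer and K-Nearest Neighbors classifier
        $estimator = new Pipeline([
            new WordCountVectorizer(10000, 1, 1, new Word()),
            new TfIdfTransformer(),
        ], new KNearestNeighbors(4));

        // Train the model
        $estimator->train($dataset);

        // Save the model
        $this->saveModel($estimator);

        echo "Training completed and model saved.\n";
    }

    private function saveModel($estimator)
    {
        $persister = new Filesystem($this->modelPath);
        $model = new PersistentModel($estimator, $persister);
        $model->save();
    }

    public function predict(array $samples)
    {
        // Load the saved model
        $persister = new Filesystem($this->modelPath);
        $model = PersistentModel::load($persister);

        // Predict categories for new samples
        $dataset = new Unlabeled($samples);
        return $model->predict($dataset);
    }
}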

This Classification class contains methods to:

  • Train: Create and train a pipeline-based model.
  • Save the Model: Save the trained model to the specified path.
  • Predict: Load the saved model and predict the category for new samples.
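
If you're curious what the text-transforming stages of such a pipeline do, here's a minimal hand-rolled sketch of word counting followed by TF-IDF weighting. It assumes the textbook weighting (term count times log of N over document frequency); the function names tokenize and tfIdf are hypothetical, and Rubix ML's own tokenizer and transformers may split and weight terms differently.

```php
<?php

// Illustrative only: a hand-rolled bag-of-words + TF-IDF weighting.
// Function names are hypothetical and not part of Rubix ML.

function tokenize(string $text): array
{
    // Lowercase, then split on anything that is not a letter or digit.
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

function tfIdf(array $documents): array
{
    $counts = [];        // per-document term counts
    $docFrequency = [];  // number of documents each term appears in

    foreach ($documents as $i => $text) {
        $counts[$i] = array_count_values(tokenize($text));
        foreach (array_keys($counts[$i]) as $term) {
            $docFrequency[$term] = ($docFrequency[$term] ?? 0) + 1;
        }
    }

    $n = count($documents);
    $weights = [];

    foreach ($counts as $i => $termCounts) {
        foreach ($termCounts as $term => $count) {
            // Textbook weighting: term count times log(N / document frequency).
            $weights[$i][$term] = $count * log($n / $docFrequency[$term]);
        }
    }

    return $weights;
}

$weights = tfIdf([
    'The team played an amazing game of soccer',
    'The new programming language has been released',
]);

// "the" occurs in both documents, so its weight is log(2 / 2) = 0.
// "soccer" occurs in only one document, so its weight is log(2 / 1).
var_dump($weights[0]['the'] === 0.0);                   // bool(true)
var_dump(abs($weights[0]['soccer'] - log(2)) < 1e-12);  // bool(true)
```

The takeaway is that words shared by every article (like "the") contribute nothing, while words that are distinctive to one article carry the weight the classifier works with.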

Training the Model

Create a script called train.php in src/ to train the model.

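<?php

require __DIR__ . '/../vendor/autoload.php';

use NewsClassifier\Classification;

// Define the model path
$modelPath = __DIR__ . '/../storage/model.rbx';

// Initialize the Classification object
$classifier = new Classification($modelPath);

// Train the model and save it
$classifier->train();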

Run this script to train the model:

php src/train.php

If successful, you’ll see:

Training completed and model saved.

Predicting New Samples

Create another script, predict.php, in src/ to classify new articles based on the trained model.

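<?php

require __DIR__ . '/../vendor/autoload.php';

use NewsClassifier\Classification;

// Define the model path (must match the path used in train.php)
$modelPath = __DIR__ . '/../storage/model.rbx';

// Initialize the Classification object
$classifier = new Classification($modelPath);

// Example articles to classify (illustrative samples, one string per sample)
$samples = [
    ['The team won the soccer match'],
    ['A new tech gadget has been released'],
];

// Load the saved model and predict a category for each sample
$predictions = $classifier->predict($samples);

// Print each sample text with its predicted category
foreach ($predictions as $index => $prediction) {
    echo 'Sample: ' . $samples[$index][0] . "\n";
    echo 'Prediction: ' . $prediction . "\n\n";
}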

Run the prediction script to classify the samples:

php src/predict.php

The output should show each sample text with its predicted category.
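
For intuition about how a K-Nearest Neighbors classifier arrives at a category, here's a minimal sketch using Euclidean distance and a plain majority vote over hand-made two-dimensional vectors. The function name knnPredict and the toy data are hypothetical; Rubix ML's implementation works on the TF-IDF vectors produced by the pipeline and may differ in details such as distance kernel and vote weighting.

```php
<?php

// Illustrative only: a tiny K-Nearest Neighbors classifier with Euclidean
// distance and an unweighted majority vote. Names and data are hypothetical.

function knnPredict(array $trainX, array $trainY, array $query, int $k): string
{
    $distances = [];

    foreach ($trainX as $i => $point) {
        $sum = 0.0;
        foreach ($point as $d => $value) {
            $sum += ($value - $query[$d]) ** 2;
        }
        $distances[$i] = sqrt($sum);
    }

    // Sort by distance while keeping the original indices.
    asort($distances);

    // Count the labels of the k closest training points.
    $votes = [];
    foreach (array_slice(array_keys($distances), 0, $k) as $i) {
        $votes[$trainY[$i]] = ($votes[$trainY[$i]] ?? 0) + 1;
    }

    // The label with the most votes wins.
    arsort($votes);

    return (string) array_key_first($votes);
}

// Toy feature vectors: [sports-word weight, tech-word weight].
$trainX = [[0.9, 0.1], [0.8, 0.0], [0.1, 0.9], [0.0, 0.7]];
$trainY = ['sports', 'sports', 'technology', 'technology'];

echo knnPredict($trainX, $trainY, [0.7, 0.2], 3), "\n"; // prints "sports"
```

With a training set as small as the four articles in this guide, predictions are very sensitive to the choice of k, so treat the results as a demonstration rather than a reliable classifier.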

Final Thoughts

With this guide, you’ve successfully built a simple news classifier in PHP using Rubix ML! This demonstrates how PHP can be more versatile than you might think, bringing in machine learning capabilities for tasks like text classification, recommendation systems, and more. The full code for this project is available on GitHub.

Experiment with different algorithms or data to expand the classifier. Who knew PHP could do machine learning? Now you do.
Happy coding!
