Home Backend Development Golang How to Build Your Own Distributed KV Storage System Using the etcd Raft Library

How to Build Your Own Distributed KV Storage System Using the etcd Raft Library

Jul 17, 2024 am 10:55 AM

How to Build Your Own Distributed KV Storage System Using the etcd Raft Library

Introduction

raftexample is an example provided by etcd that demonstrates the use of the etcd raft consensus algorithm library. raftexample ultimately implements a distributed key-value storage service that provides a REST API.

This article will read and analyze the code of raftexample, hoping to help readers better understand how to use the etcd raft library and the implementation logic of the raft library.

Architecture

The architecture of raftexample is very simple, with the main files as follows:

  • main.go: Responsible for organizing the interaction between the raft module, the httpapi module, and the kvstore module;
  • raft.go: Responsible for interacting with the raft library, including submitting proposals, receiving RPC messages that need to be sent, and performing network transmission, etc.;
  • httpapi.go: Responsible for providing the REST API, serving as the entry point for user requests;
  • kvstore.go: Responsible for persistently storing committed log entries, equivalent to the state machine in the raft protocol.

The Processing Flow of a Write Request

A write request arrives in the ServeHTTP method of the httpapi module via an HTTP PUT request.

curl -L http://127.0.0.1:12380/key -XPUT -d value
Copy after login

After matching the HTTP request method via switch, it enters the PUT method processing flow:

  • Read the content from the HTTP request body (i.e., the value);
  • Construct a proposal through the Propose method of the kvstore module (adding a key-value pair with key as key and value as value);
  • Since there is no data to return, respond to the client with 204 StatusNoContent;

The proposal is submitted to the raft algorithm library through the Propose method provided by the raft algorithm library.

The content of a proposal can be adding a new key-value pair, updating an existing key-value pair, etc.

// httpapi.go
v, err := io.ReadAll(r.Body)
if err != nil {
    log.Printf("Failed to read on PUT (%v)\n", err)
    http.Error(w, "Failed on PUT", http.StatusBadRequest)
    return
}
h.store.Propose(key, string(v))
w.WriteHeader(http.StatusNoContent)
Copy after login

Next, let's look into the Propose method of the kvstore module to see how a proposal is constructed and processed.

In the Propose method, we first encode the key-value pair to be written using gob, and then pass the encoded content to proposeC, a channel responsible for transmitting proposals constructed by the kvstore module to the raft module.

// kvstore.go
func (s *kvstore) Propose(k string, v string) {
    var buf strings.Builder
    if err := gob.NewEncoder(&buf).Encode(kv{k, v}); err != nil {
        log.Fatal(err)
    }
    s.proposeC <- buf.String()
}
Copy after login

The proposal constructed by kvstore and passed to proposeC is received and processed by the serveChannels method in the raft module.

After confirming that proposeC has not been closed, the raft module submits the proposal to the raft algorithm library for processing using the Propose method provided by the raft algorithm library.

// raft.go
select {
    case prop, ok := <-rc.proposeC:
    if !ok {
        rc.proposeC = nil
    } else {
        rc.node.Propose(context.TODO(), []byte(prop))
    }
Copy after login

After a proposal is submitted, it follows the raft algorithm process. The proposal will eventually be forwarded to the leader node (if the current node is not the leader and you allow followers to forward proposals, controlled by the DisableProposalForwarding configuration). The leader will add the proposal as a log entry to its raft log and synchronize it with other follower nodes. After being deemed committed, it will be applied to the state machine and the result will be returned to the user.

However, since the etcd raft library itself does not handle communication between nodes, appending to the raft log, applying to the state machine, etc., the raft library only prepares the data required for these operations. The actual operations must be performed by us.

Therefore, we need to receive this data from the raft library and process it accordingly based on its type. The Ready method returns a read-only channel through which we can receive the data that needs to be processed.

It should be noted that the received data includes multiple fields, such as snapshots to be applied, log entries to be appended to the raft log, messages to be transmitted over the network, etc.

Continuing with our write request example (leader node), after receiving the corresponding data, we need to persistently save snapshots, HardState, and Entries to handle issues caused by server crashes (e.g., a follower voting for multiple candidates). HardState and Entries together comprise the Persistent state on all servers as mentioned in the paper. After persistently saving them, we can apply the snapshot and append to the raft log.

Since we are currently the leader node, the raft library will return MsgApp type messages to us (corresponding to AppendEntries RPC in the paper). We need to send these messages to the follower nodes. Here, we use the rafthttp provided by etcd for node communication and send the messages to follower nodes using the Send method.

// raft.go
case rd := <-rc.node.Ready():
    if !raft.IsEmptySnap(rd.Snapshot) {
        rc.saveSnap(rd.Snapshot)
    }
    rc.wal.Save(rd.HardState, rd.Entries)
    if !raft.IsEmptySnap(rd.Snapshot) {
        rc.raftStorage.ApplySnapshot(rd.Snapshot)
        rc.publishSnapshot(rd.Snapshot)
    }
    rc.raftStorage.Append(rd.Entries)
    rc.transport.Send(rc.processMessages(rd.Messages))
    applyDoneC, ok := rc.publishEntries(rc.entriesToApply(rd.CommittedEntries))
    if !ok {
        rc.stop()
        return
    }
    rc.maybeTriggerSnapshot(applyDoneC)
    rc.node.Advance()
Copy after login

Next, we use the publishEntries method to apply the committed raft log entries to the state machine. As mentioned earlier, in raftexample, the kvstore module acts as the state machine. In the publishEntries method, we pass the log entries that need to be applied to the state machine to commitC. Similar to the earlier proposeC, commitC is responsible for transmitting the log entries that the raft module has deemed committed to the kvstore module for application to the state machine.

// raft.go
rc.commitC <- &commit{data, applyDoneC}
Copy after login

In the readCommits method of the kvstore module, messages read from commitC are gob-decoded to retrieve the original key-value pairs, which are then stored in a map structure within the kvstore module.

// kvstore.go
for commit := range commitC {
    ...
    for _, data := range commit.data {
        var dataKv kv
        dec := gob.NewDecoder(bytes.NewBufferString(data))
        if err := dec.Decode(&dataKv); err != nil {
            log.Fatalf("raftexample: could not decode message (%v)", err)
        }
        s.mu.Lock()
        s.kvStore[dataKv.Key] = dataKv.Val
        s.mu.Unlock()
    }
    close(commit.applyDoneC)
}
Copy after login

Returning to the raft module, we use the Advance method to notify the raft library that we have finished processing the data read from the Ready channel and are ready to process the next batch of data.

Earlier, on the leader node, we sent MsgApp type messages to the follower nodes using the Send method. The follower node's rafthttp listens on the corresponding port to receive requests and return responses. Whether it's a request received by a follower node or a response received by a leader node, it will be submitted to the raft library for processing through the Step method.

raftNode implements the Raft interface in rafthttp, and the Process method of the Raft interface is called to handle the received request content (such as MsgApp messages).

// raft.go
func (rc *raftNode) Process(ctx context.Context, m raftpb.Message) error {
    return rc.node.Step(ctx, m)
}
Copy after login

The above describes the complete processing flow of a write request in raftexample.

Summary

This concludes the content of this article. By outlining the structure of raftexample and detailing the processing flow of a write request, I hope to help you better understand how to use the etcd raft library to build your own distributed KV storage service.

If there are any mistakes or issues, please feel free to comment or message me directly. Thank you.

References

  • https://github.com/etcd-io/etcd/tree/main/contrib/raftexample

  • https://github.com/etcd-io/raft

  • https://raft.github.io/raft.pdf

The above is the detailed content of How to Build Your Own Distributed KV Storage System Using the etcd Raft Library. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Roblox: Bubble Gum Simulator Infinity - How To Get And Use Royal Keys
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Nordhold: Fusion System, Explained
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Mandragora: Whispers Of The Witch Tree - How To Unlock The Grappling Hook
3 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Java Tutorial
1673
14
PHP Tutorial
1278
29
C# Tutorial
1257
24
Golang vs. Python: Performance and Scalability Golang vs. Python: Performance and Scalability Apr 19, 2025 am 12:18 AM

Golang is better than Python in terms of performance and scalability. 1) Golang's compilation-type characteristics and efficient concurrency model make it perform well in high concurrency scenarios. 2) Python, as an interpreted language, executes slowly, but can optimize performance through tools such as Cython.

Golang and C  : Concurrency vs. Raw Speed Golang and C : Concurrency vs. Raw Speed Apr 21, 2025 am 12:16 AM

Golang is better than C in concurrency, while C is better than Golang in raw speed. 1) Golang achieves efficient concurrency through goroutine and channel, which is suitable for handling a large number of concurrent tasks. 2)C Through compiler optimization and standard library, it provides high performance close to hardware, suitable for applications that require extreme optimization.

Getting Started with Go: A Beginner's Guide Getting Started with Go: A Beginner's Guide Apr 26, 2025 am 12:21 AM

Goisidealforbeginnersandsuitableforcloudandnetworkservicesduetoitssimplicity,efficiency,andconcurrencyfeatures.1)InstallGofromtheofficialwebsiteandverifywith'goversion'.2)Createandrunyourfirstprogramwith'gorunhello.go'.3)Exploreconcurrencyusinggorout

Golang vs. C  : Performance and Speed Comparison Golang vs. C : Performance and Speed Comparison Apr 21, 2025 am 12:13 AM

Golang is suitable for rapid development and concurrent scenarios, and C is suitable for scenarios where extreme performance and low-level control are required. 1) Golang improves performance through garbage collection and concurrency mechanisms, and is suitable for high-concurrency Web service development. 2) C achieves the ultimate performance through manual memory management and compiler optimization, and is suitable for embedded system development.

Golang vs. Python: Key Differences and Similarities Golang vs. Python: Key Differences and Similarities Apr 17, 2025 am 12:15 AM

Golang and Python each have their own advantages: Golang is suitable for high performance and concurrent programming, while Python is suitable for data science and web development. Golang is known for its concurrency model and efficient performance, while Python is known for its concise syntax and rich library ecosystem.

Golang and C  : The Trade-offs in Performance Golang and C : The Trade-offs in Performance Apr 17, 2025 am 12:18 AM

The performance differences between Golang and C are mainly reflected in memory management, compilation optimization and runtime efficiency. 1) Golang's garbage collection mechanism is convenient but may affect performance, 2) C's manual memory management and compiler optimization are more efficient in recursive computing.

The Performance Race: Golang vs. C The Performance Race: Golang vs. C Apr 16, 2025 am 12:07 AM

Golang and C each have their own advantages in performance competitions: 1) Golang is suitable for high concurrency and rapid development, and 2) C provides higher performance and fine-grained control. The selection should be based on project requirements and team technology stack.

Golang vs. Python: The Pros and Cons Golang vs. Python: The Pros and Cons Apr 21, 2025 am 12:17 AM

Golangisidealforbuildingscalablesystemsduetoitsefficiencyandconcurrency,whilePythonexcelsinquickscriptinganddataanalysisduetoitssimplicityandvastecosystem.Golang'sdesignencouragesclean,readablecodeanditsgoroutinesenableefficientconcurrentoperations,t

See all articles