Algorithm of Massively Parallel Networking in C++

Vladislav Shpilevoy

2025

🦆

Plan

Networking in C++

Existing solutions

New scheduler

Examples

Benchmarks

Networking in C++

Networking in C++

Motivation

System is already C++

Super expertise in C++

Need ultra performance

Available inventory

Networking in C++

socket(), send(), recv(), connect(),

accept(), bind(), listen()

epoll

io_uring

IOCP

IoRing

kqueue
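
For scale, below is a sketch of what this raw inventory looks like when used directly: a minimal blocking echo server on plain BSD sockets (illustration only; POSIX-specific, port 12345 is arbitrary, error handling omitted). The event mechanisms listed above - epoll, io_uring, IOCP, IoRing, kqueue - exist so that a server does not need one blocked thread per connection sitting in a loop like this.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int
main()
{
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(12345);
    bind(srv, (sockaddr*)&addr, sizeof(addr));
    listen(srv, 128);
    while (true)
    {
        // Serves one blocking client at a time.
        int peer = accept(srv, nullptr, nullptr);
        char buf[512];
        ssize_t len;
        while ((len = recv(peer, buf, sizeof(buf), 0)) > 0)
            send(peer, buf, len, 0); // Echo the data back.
        close(peer);
    }
    return 0;
}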

Existing solutions

Existing solutions

boost::asio

thrift, gRPC

libev, libuv, libevent

userver

seastar

New scheduler

Motivation

New scheduler

boost::asio

thrift, gRPC

 - unreadable

userver

 - no Windows

seastar

 - no Windows, no MacOS

 - enforce their protocol

Your company lives in its own framework

libev, libuv, libevent

 - too C, single threaded

Requirements

New scheduler

Fairness

Coroutines

Events

Task Scheduler

Scheduler Kingdom

The Battle For The Key

The Front

Tavern

The Waiting Prison

The Map

The Ready

Arena

The Many Tasks

The Strong Workers
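
To decode the metaphor before the scheduling example: one possible reading of the kingdom as plain data structures is sketched below. This is an interpretation for illustration only, not the library's real types; all names here are invented.

#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct Task; // The many tasks: units of work with a callback and a deadline.

struct TaskSchedulerSketch
{
    // "The Front": the intake where Post() lands. The real scheduler keeps
    // this lock-free; a plain queue is shown only to keep the sketch short.
    std::queue<Task*> myFront;

    // "The Waiting Prison" + "The Map": tasks sleeping until a deadline or a
    // signal, ordered so the earliest deadline is found first.
    std::priority_queue<std::pair<uint64_t, Task*>,
        std::vector<std::pair<uint64_t, Task*>>, std::greater<>> myWaiting;

    // "The Ready": tasks ready to run, consumed by the workers.
    std::queue<Task*> myReady;

    // "The Strong Workers" battle for "The Key": only one worker at a time
    // acts as the scheduler (drains the front, releases expired waiters into
    // the ready queue), while all the others simply execute ready tasks.
};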

Example of scheduling

Task Scheduler

Example of scheduling

Task Scheduler

Wakeup

Task Scheduler

Wakeup

Task Scheduler

The Kernel

Castle

The King Calls To Battle
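
The wakeup above, stripped of the metaphor: an idle worker sleeps inside the kernel, and posting a task has to wake exactly one of them. A minimal sketch with a condition variable is below; it illustrates the idea only, the real scheduler avoids the lock and enters the kernel as rarely as possible.

#include <condition_variable>
#include <mutex>

struct WakeupSketch
{
    std::mutex myMutex;
    std::condition_variable myCond;
    bool myHaveWork = false;

    void
    WorkerSleep()
    {
        std::unique_lock<std::mutex> lock(myMutex);
        // The worker blocks inside the kernel ("the castle") until called.
        myCond.wait(lock, [this]() { return myHaveWork; });
        myHaveWork = false;
    }

    void
    Post()
    {
        {
            std::lock_guard<std::mutex> lock(myMutex);
            myHaveWork = true;
        }
        // "The King Calls To Battle": the kernel wakes one sleeping worker.
        myCond.notify_one();
    }
};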

Implementation

Task Scheduler

Implementation

Task Scheduler

Less than 2k lines*

Lock-free*

Low memory

Simple code

Formally verified in TLA+

clang, gcc, msvc

MacOS, Windows, Linux

epoll, io_uring, kqueue, IOCP

64 bit: x86, ARM

>= C++17

Examples

Simple task

Examples

Simple task

int
main()
{
    mg::sch::TaskScheduler sched("tst",
        1, // Thread count.
        5  // Subqueue size.
    );
    sched.Post(new mg::sch::Task([&](mg::sch::Task *self) {
        std::cout << "Executed in scheduler!\n";
        delete self;
    }));
    return 0;
}

Create the scheduler

Post a task

The body is a lambda function

Examples

Multistep task

Examples

Multistep task [1]

class MyTask : public mg::sch::Task
{
public:
    MyTask() : Task([this](mg::sch::Task* aSelf) {
        TaskSendRequest(aSelf);
    }) {}

private:
    // Step 1
    void
    TaskSendRequest(
        mg::sch::Task* aSelf);

    // Step 2
    void
    TaskRecvResponse(
        mg::sch::Task* aSelf);

    // Step 3
    void
    TaskFinish(
        mg::sch::Task* aSelf);
};

Yield between the steps.

Can inherit to add context

Examples

Multistep task [2]

    void
    TaskSendRequest(mg::sch::Task* aSelf)
    {
        std::cout << "Send\n";
        aSelf->SetCallback([this](mg::sch::Task* aSelf) {
            TaskRecvResponse(aSelf);
        });
        mg::sch::TaskScheduler::This().Post(aSelf);
    }

    void
    TaskRecvResponse(mg::sch::Task* aSelf)
    {
        std::cout << "Receive\n";
        aSelf->SetCallback([this](mg::sch::Task *aSelf) {
            TaskFinish(aSelf);
        });
        mg::sch::TaskScheduler::This().Post(aSelf);
    }

    void
    TaskFinish(mg::sch::Task* aSelf)
    {
        std::cout << "Finish\n";
    }

1. Do something.

2. Prepare next step.

3. Post self.

1. Do something.

2. Prepare next step.

3. Post self.

1. Finish the work.

2. Delete/destroy/reuse.

Examples

Multistep task [3]

int
main()
{
    MyTask task;
    mg::sch::TaskScheduler scheduler("tst",
        1, // Thread count.
        5  // Subqueue size.
    );
    scheduler.Post(&task);
    return 0;
}

Examples

Coroutine task

Examples

Coroutine task

int
main()
{
    mg::sch::Task task;
    task.SetCallback([](
        mg::sch::Task& aSelf) -> mg::box::Coro {

        std::cout << "Sending request ...\n";

        co_await aSelf.AsyncYield();

        std::cout << "Received response!\n";

        co_await aSelf.AsyncYield();

        std::cout << "Finish\n";
        co_return;
    }(task));

    mg::sch::TaskScheduler scheduler("tst",
        1, // Thread count.
        5  // Subqueue size.
    );
    scheduler.Post(&task);
    return 0;
}

Coroutine enabler

Reschedule self. Let other tasks work

Examples

Interacting tasks

Examples

Interacting tasks [1]

static void
TaskSubmitRequest(
    mg::sch::Task& aSender)
{
    mg::sch::Task* worker = new mg::sch::Task();

    worker->SetCallback([](
        mg::sch::Task& aSelf,
        mg::sch::Task& aSender) -> mg::box::Coro {

        aSelf.SetDelay(1000);
        co_await aSelf.AsyncYield();

        aSender.PostSignal();

        co_await aSelf.AsyncExitDelete();
    }(*worker, aSender));

    mg::sch::TaskScheduler::This().Post(worker);
}

Temporary worker task

"Work" for 1 sec, wakeup the owner

Examples

Interacting tasks [2]

int
main()
{
    mg::sch::Task task;
    mg::sch::TaskScheduler scheduler("tst",
        1, // Thread count.
        5  // Subqueue size.
    );
    task.SetCallback([](
        mg::sch::Task& aSelf) -> mg::box::Coro {

        TaskSubmitRequest(aSelf);
        do {
            aSelf.SetWait();
        } while (!co_await aSelf.AsyncReceiveSignal());

        co_return;
    }(task));

    scheduler.Post(&task);
    return 0;
}

Start the work and wait for its completion

Examples

Simple TCP

Examples

Simple TCP [1]

int
main()
{
    mg::aio::IOCore core;
    core.Start(3 /* threads */);

    MyServer server(core);
    uint16_t port = server.Bind();
    server.Start();

    for (int i = 0; i < theClientCount; ++i)
        new MyClient(core, port);

    core.WaitEmpty();
    return 0;
}

IO task scheduler

Task as a server

Tasks as clients

Examples

Simple TCP [2]

class MyServer final
    : private mg::aio::TCPServerSubscription
{
public:
    MyServer(
        mg::aio::IOCore& aCore)
        : myServer(mg::aio::TCPServer::NewShared(aCore)) {}

    uint16_t
    Bind()
    {
        myServer->Bind(mg::net::HostMakeAllIPV4(0));
        return myServer->GetPort();
    }

    void
    Start()
    {
        myServer->Listen(this);
    }

    // ... Some methods ...

    mg::aio::TCPServer::Ptr myServer;
};

Server receives task-events via the "subscription"

Server socket is created attached to IOCore

Bind + get the resulting port if it was random

After Listen(subscription) the socket is active, runs in IOCore, and delivers events

Examples

Simple TCP [3]

    void
    MyServer::OnAccept(
        mg::net::Socket aSock,
        const mg::net::Host& aPeerAddress) final
    {
        new MyPeer(myServer->GetCore(), aSock);
    }

Invoked by IOCore workers

Spawn a new task to handle the peer socket

Examples

Simple TCP [4]

class MyClient final
    : private mg::aio::TCPSocketSubscription
{
public:
    MyClient(
        mg::aio::IOCore& aCore,
        uint16_t aPort)
        : mySock(new mg::aio::TCPSocket(aCore))
    {
        mySock->Open({});
        mg::aio::TCPSocketConnectParams connParams;
        connParams.myEndpoint = mg::net::HostMakeLocalIPV4(aPort).ToString();

        mySock->PostConnect(connParams, this);
    }

    // ... Some methods ...

    mg::aio::TCPSocket* mySock;
};

Client receives task-events via the "subscription"

Will work in the given IOCore

Choose connection parameters

Enter the IOCore with the async connect, start receiving events

Examples

Simple TCP [5]

    void
    MyClient::OnConnect() final
    {
        const char* msg = "hello handshake";
        mySock->SendRef(msg, mg::box::Strlen(msg) + 1);
        mySock->Recv(1);
    }

    void
    MyClient::OnRecv(
        mg::net::BufferReadStream& aStream) final
    {
        MG_BOX_ASSERT(aStream.GetReadSize() > 0);
        mySock->PostClose();
    }

    void
    MyClient::OnClose() final
    {
        mySock->Delete();
        delete this;
    }

Async send "handshake" on connect and read an ack

Async close on confirmation

Delete self, when close is finished

Examples

Pipeline

Examples

Pipeline [1]

class CalcClient final
    : private mg::aio::TCPSocketSubscription
{
public:
    void
    Submit(
        char aOp,
        int64_t aArg1,
        int64_t aArg2,
        std::function<void(int64_t)>&& aOnComplete);

    mg::aio::TCPSocket* mySock;
};

Calculator frontend, operations executed in a "remote microservice"

Client for the remote calculator

Submit async operation
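
The slides do not show how Submit() could look inside. Below is one possible shape of it: a sketch with an invented wire format ("<op> <arg1> <arg2>\n") and invented members (Pending, myMutex, myPending). The only call taken from the earlier slides is SendRef(), and the sketch assumes the referenced buffer must stay alive until the response arrives, which is why the request string is stored next to its callback.

    // Assumed members, not shown on the slides:
    //   struct Pending {
    //       std::string myRequest;
    //       std::function<void(int64_t)> myOnComplete;
    //   };
    //   std::mutex myMutex;
    //   std::deque<Pending> myPending;

    void
    CalcClient::Submit(
        char aOp,
        int64_t aArg1,
        int64_t aArg2,
        std::function<void(int64_t)>&& aOnComplete)
    {
        Pending req;
        req.myRequest = std::string(1, aOp) + ' ' + std::to_string(aArg1) +
            ' ' + std::to_string(aArg2) + '\n';
        req.myOnComplete = std::move(aOnComplete);

        std::lock_guard<std::mutex> lock(myMutex);
        myPending.push_back(std::move(req));
        // OnRecv() would pop the front entry, parse the int64 result out of
        // the response, and invoke myOnComplete with it.
        mySock->SendRef(myPending.back().myRequest.data(),
            myPending.back().myRequest.size());
    }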

Examples

Pipeline [2]

class MyRequest
{
public:
    MyRequest(
        CalcClient& aCalcClient,
        mg::sch::TaskScheduler& aSched)
        : myCalcClient(aCalcClient)
        , myTask(Execute(this))
    {
        aSched.Post(&myTask);
    }

private:
    mg::box::Coro
    Execute(
        MyRequest* aSelf);

    CalcClient& myCalcClient;
    int64_t myID = 0; // ID of this request, used as an operand in Execute().
    mg::sch::Task myTask;
};

User request

It has a client to the calculator host, and a scheduler to work in

Start execution

Body

Examples

Pipeline [3]

    mg::box::Coro
    MyRequest::Execute(
        MyRequest* aSelf)
    {
        int64_t res = 0;
        myCalcClient.Submit('+', 10, myID, [aSelf, &res](int64_t aRes) {
            res = aRes;
            aSelf->myTask.PostSignal();
        });

        while (!co_await aSelf->myTask.AsyncReceiveSignal())
            aSelf->myTask.SetWait();

        std::cout << "Result: " << res << std::end;

        // ... Do whatever else or delete the request.
    }

Submit async request to the calculator client

Async wait for completion

On completion save the result and wake the request up

Benchmarks

Benchmarks

Task Scheduler vs Thread Pool

Debian 11, 2.5GHz, 32 cores

Task scheduler throughput, speedup over the thread pool in parentheses:

Empty tasks:
1 thread:   18.7 mln / sec (x5.0)
2 threads:   3.8 mln / sec (x1.4)
10 threads:  4.3 mln / sec (x7.7)
50 threads:  8.2 mln / sec (x168.5)

Micro tasks:
<=3 threads: on par (==)
5 threads:   1.8 mln / sec (x1.8)
10 threads:  2.8 mln / sec (x4.0)
50 threads:  7.5 mln / sec (x118.8)

Benchmarks

IOCore vs boost::asio

Linux, epoll

Ubuntu 22.04.4, 0.8-4.8GHz, 24 cores

IOCore throughput, speedup over boost::asio in parentheses:

5 threads, 100 clients, 128 b:   962'630 msg / sec (x1.54)
1 thread, 100 clients, 128 b:    330'200 msg / sec (x1.45)
5 threads, 200 clients, 100 KB:   51'100 msg / sec (x4.44)

Benchmarks

IOCore vs boost::asio

Linux, io_uring

Ubuntu 24.04.1, 0.8-4.8GHz, 24 cores

3 threads, 100 clients, 128 b:   218'500 msg / sec (x1.49)
1 thread, 100 clients, 128 b:    245'470 msg / sec (x1.27)
3 threads, 200 clients, 100 KB:   13'254 msg / sec (x1.48)

Benchmarks

IOCore vs boost::asio

Windows, IOCP

Windows 11 Home, 3.70GHz, 12 cores

3 threads, 100 clients, 128 b:   195'430 msg / sec (x1.10)
1 thread, 100 clients, 128 b:     80'510 msg / sec (x1.20)
3 threads, 200 clients, 100 KB:   28'368 msg / sec (x2.11)

Benchmarks

IOCore vs boost::asio

MacOS, kqueue

macOS Big Sur 11.7.10, 2.5GHz, 4 cores

3 threads, 100 clients, 128 b:   146'930 msg / sec (x1.50)
1 thread, 100 clients, 128 b:    101'800 msg / sec (x1.41)
3 threads, 200 clients, 100 KB:   12'750 msg / sec (x1.45)

Benchmarks

epoll vs io_uring

epoll: get notified that a socket is ready, then perform each operation on each socket individually.

io_uring: post many operations on many sockets at once, get notified when they complete.

Why x2 faster?

Linux 6.8.0, March 2024
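
For context, the two models differ roughly as in the sketch below (illustration only, not the library's code; assumes liburing is available, the sockets are already connected and registered, and a real program would use a separate buffer per socket).

#include <cstddef>
#include <cstdint>
#include <liburing.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <vector>

// Readiness model (epoll): the kernel says which sockets are ready, then the
// application issues one recv() syscall per ready socket.
void
EpollStep(int aEpollFd, char* aBuf, size_t aSize)
{
    epoll_event events[128];
    int count = epoll_wait(aEpollFd, events, 128, -1);
    for (int i = 0; i < count; ++i)
        recv(events[i].data.fd, aBuf, aSize, 0); // One syscall per socket.
}

// Completion model (io_uring): the application batches many recv operations,
// submits them with a single syscall, and later reaps the completions.
void
UringStep(io_uring* aRing, const std::vector<int>& aSocks, char* aBuf, size_t aSize)
{
    for (int sock : aSocks)
    {
        // Real code would also handle a full submission queue.
        io_uring_sqe* sqe = io_uring_get_sqe(aRing);
        io_uring_prep_recv(sqe, sock, aBuf, aSize, 0);
        io_uring_sqe_set_data(sqe, (void*)(intptr_t)sock);
    }
    io_uring_submit(aRing); // One syscall for the whole batch.
    io_uring_cqe* cqe;
    while (io_uring_peek_cqe(aRing, &cqe) == 0)
    {
        // cqe->res holds the recv result for the socket kept in user data.
        io_uring_cqe_seen(aRing, cqe);
    }
}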

C++ Russia 2025: Algorithm of Massively Parallel Networking in C++

By Vladislav Shpilevoy

My talk is about the performance of C++ servers, using an alternative to boost::asio for massively parallel networking. Boost::asio is the de facto standard for networking code in C++, but in rare cases it is unavailable or insufficient for various reasons. Drawing on experience from high-performance projects, I developed a new task scheduling algorithm, built a networking library on top of it, and present both in this talk. The most interesting details are fair CPU load distribution, support for C++ coroutines, formal verification in TLA+, and reproducible benchmarks showing an N-fold speedup over boost::asio. The project is open source: https://github.com/Gerold103/serverbox.
