Computing (2023) 105:2195–2229

https://doi.org/10.1007/s00607-023-01179-5

REGULAR PAPER

A Docker-based federated learning framework design and deployment for multi-modal data stream classification

Arijit Nandi1,2 · Fatos Xhafa1 · Rohit Kumar2

Received: 14 November 2022 / Accepted: 19 April 2023 / Published online: 11 May 2023
© The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature 2023

Abstract
In the high-performance computing (HPC) domain, federated learning has gained immense popularity, especially in emotional and physical health analytics and experimental facilities. Federated learning (FL) is one of the most promising distributed machine learning frameworks because it supports data privacy and security by sharing the clients' local models instead of their data. In federated learning, many clients explicitly train their machine learning/deep learning models (local training) before their models are aggregated into a global model at the global server. However, an FL framework is difficult to build and deploy across multiple distributed clients due to its heterogeneous nature. We developed Docker-enabled federated learning (DFL) by utilizing client-agnostic technologies, namely Docker containers, to simplify the deployment of FL frameworks for data stream processing on heterogeneous clients. In the DFL, the clients and the global server are written using TensorFlow, and the lightweight Message Queuing Telemetry Transport (MQTT) protocol is used to communicate between the clients and the global server in the IoT environment. Furthermore, the DFL's effectiveness, efficiency, and scalability are evaluated in a test case scenario where real-time emotion state classification is performed on distributed multi-modal physiological data streams under various practical configurations.

Keywords Federated learning · High performance computing · Multi-modal data streaming · Docker-container · Real-time emotion classification

* Arijit Nandi
arijit.nandi@eurecat.org
* Fatos Xhafa
fatos@cs.upc.edu
Rohit Kumar
rohit.kumar@eurecat.org
1 Department of CS, Universitat Politècnica de Catalunya, 08034 Barcelona, Spain
2 Eurecat, Centre Tecnològic de Catalunya, 08005 Barcelona, Spain


Mathematics Subject Classification 68W15 · 94A16 · 68M20 · 68T05 · 68T07 · 68P27

1 Introduction

Artificial intelligence (AI) has emerged as the de-facto technology for a wide range of applications (such as smart industry, healthcare, and Unmanned Aerial Vehicle (UAV) applications), coinciding with the growth of Cloud Computing (CC) [1]. More applications are migrating from private infrastructures to cloud data centers to reap benefits such as scalability, elasticity, agility, and cost efficiency [2]. Bringing AI to the cloud is advantageous because computing resources are efficiently utilized, and costs for application deployment and operation are minimized by appropriately distributing physical resources in clouds, such as CPU, memory, storage, and network resources, to various cloud applications [3]. With the recent advances in technology, the number of connected Internet of Things (IoT) devices has increased enormously, and according to CISCO, this number could exceed 75 billion by 2025, which is 2.5 times the number of connected devices in 2020 (i.e., 31 billion) [4]. So, managing enormous, continuous, diverse, and dispersed IoT data appears hard while offering services at a specific performance level using cloud infrastructure. However, in traditional AI systems with CC enabled, data producers (such as IoT devices) most frequently transfer and exchange data with other parties (such as cloud servers) in order to train their models (such as Deep Learning (DL) or Machine Learning (ML) models) to improve the performance of the system. This design pattern is infeasible because of the high bandwidth requirements, legality, and privacy risks.
For that, the Federated Learning (FL) concept has recently emerged as a promising solution for mitigating the problems of data privacy and legal compliance, because FL distributes the computing load and trains on sensitive data locally, without the need to transfer it to a primary server, for privacy considerations [5]. In FL, each client (e.g., a mobile, IoT, or vehicular device) trains its local deep neural network (DNN) model with local data, which is then aggregated into a shared global model by the centralized server [6]. This scenario is repeated for multiple rounds for better results and convergence. Having said that, Edge Computing (EC), which brings Cloud Computing (CC) services closer to the data sources, is a revolutionary architecture that lowers latency and bandwidth costs while enhancing network resilience and availability. As a result, an EC-enabled architecture may satisfy the needs of time-critical applications with specific Service Level Agreement (SLA) requirements [4]. Additionally, FL in the EC is a promising technique to handle IoT data by benefiting from distributed heterogeneous computing resources and relieving numerous clients' privacy concerns, because the generated raw data are not exposed to third parties (cloud servers) [4].
It is seen that FL with EC solves most of these problems, such as data privacy, legal compliance, lowering latency, and minimizing bandwidth, which is an outstanding achievement. However, it is also seen that, with the huge number of IoT devices, the data have high velocity because of the high data generation rate and


sequential arrival in time, i.e., they form data streams [7]. Each data tuple needs to be processed and analyzed as soon as it arrives, because operating at the edge has many limitations, such as limited computing resources, intermittent/denied network connectivity, etc. [8]. Apart from that, the underlying DL/ML model should be able to adapt to the changes from the continuous data stream in the dynamic environment.
Nowadays, containers are easy-to-deploy software packages, and containerized applications are easily distributed, making them a natural fit for EC with FL solutions [9]. Edge containers can be deployed in parallel to geographically diverse points of presence (PoPs) to achieve higher levels of availability when compared to a traditional cloud container [10]. The edge containers are located at the edge of the network, much closer to the end user. With the introduction of the container and microservice design, it is now possible to increase the scalability and elasticity of application deployment and delivery [10, 11].
There is a plethora of different FL platforms and frameworks from academia and industry. These are complex to use, and deeper knowledge of FL is required [12]. Existing FL systems from academia are mostly research-oriented, such as LEAF [13], TFF [14], and PySyft [15]. These are not straightforward for people interested in FL and lack support for building an FL prototype and running it in production. Industrial FL frameworks such as FATE [16] and PaddleFL [17] are not friendly to beginners and researchers due to the complex environment setup and heavy system design [12]. Apart from the complex development and deployability of these FL frameworks, the local dataset is distributed over different clients in order to build the local model. That means data does not arrive sequentially, the way it arrives in the case of data streams. Hence, these FL frameworks are incapable of processing data streams in real time to adapt to the changes in today's dynamic environment.
To address these issues, in this paper, we design and deploy the Docker-enabled Federated Learning framework (DFL) by taking advantage of Docker containers, with the capability of processing data streams in real-time. The DFL simplifies the deployment of FL among numerous clients by utilizing Docker containers, and it can handle multi-modal data streams to make classifications in real time. In particular, in the proposed DFL, each client and the global server are implemented with TensorFlow and installed in a Docker container. Also, the communication between the clients and the server is done by the lightweight MQTT protocol, which enables DFL to be used in the IoT environment. Additionally, the DFL's efficacy, efficiency, and scalability are assessed in a test case scenario, which involves real-time emotion state classification from distributed multi-modal physiological data streams in a variety of real-world setups.
The main contributions of our work are as follows:

• We propose a Docker-enabled Federated Learning framework (DFL) which simplifies the deployment of FL among numerous clients by utilizing the Docker-container solution to integrate multi-modal data stream processing, along with the lightweight MQTT protocol for the IoT environment.


• We deploy multi-modal data streaming in an HPC system to implement the proposed DFL and leverage the docker-container solution to guarantee the scalability of the framework.
• A real-time emotion classification from multi-modal physiological data streams use case is adapted to the DFL framework to process the high-velocity data streams. The experimental results verify its feasibility, scalability and privacy preservation in multi-modal data processing and online ML applications on cloud computing infrastructure.

The rest of the paper is structured as follows: Sect. 2 presents the related work. Sect. 3 briefly introduces multi-modal data stream classification, docker technology and federated learning. The proposed DFL architecture is explained and illustrated in Sect. 4. Following this, the deployment in real infrastructure, the experimental method and the performance evaluation of the proposed DFL are presented in Sect. 5, along with the experimental results and discussion. Finally, the paper ends with the conclusion in Sect. 6.

2 Related work

This section presents the previous literature on data stream classification, FL and
the Docker-based deployment in the EC paradigm. Furthermore, the pitfalls of data
stream classification approaches and the existing FL systems are mentioned.
The digital revolution is characterized by an ongoing data collection process, which results in data being fed into the corresponding machine learning algorithms across various applications [18]. In the context of distributed machine learning, continuous data collection provides training data in the form of a data stream for each "node" in the distributed system. Given continuous data delivery, "(full) batch processing" is realistically impractical, and distributed training of models on streaming data needs to be single-pass [18]. Streaming data is typically processed in the following two settings: (1) in the master-worker learning architecture, the data stream arrives at a single master node and is then dispersed among a total of N worker nodes to distribute the computational load and speed up training time; (2) the objective of the FL and EC frameworks is to develop a machine learning model utilizing data from all of the nodes in a collection of N geographically scattered nodes, each of which receives its own independent stream of data, without sharing the data with other parties [18]. In the literature, most data stream classification approaches follow the first, master-worker approach. These approaches range from different model ensembles to feature fusion in order to increase the overall performance of the predictive system; examples include the Accuracy Weighted Ensemble classifier (AWE) [19], the Adaptive Random Forest classifier (ARF) [20], the Dynamic Weighted Majority ensemble classifier (DWM) [21], the Learn++.NSE ensemble classifier [22], the Learn++ ensemble classifier [23], the Streaming Random Patches ensemble classifier (SRPE) [24] and many more.
There have been several approaches developed to address the complexity of
FL integration. For instance, Google, the original FL developer [6], intends to
improve their TensorFlow engine to allow distributed learning with TensorFlow


Federated [14]. The framework includes crucial functions such as the ability to subscribe to a small number of events to track the execution stages and enable third-party integration. The framework also includes the tools needed for a quick start, as well as the datasets that are often utilized. But because the framework only works with TensorFlow models, it is more challenging for researchers to combine FL with models created using different training engines. There are also FL frameworks that are engine-specific, such as PaddleFL [17], which supports the PaddlePaddle engine, and FedTorch [25], which exclusively supports PyTorch models. In contrast to the previously stated frameworks, FedML [26] offers a comprehensive framework that originally served as a fair benchmarking tool for federated learning methods. From client selection through aggregation and validation, the whole federated workflow is integrated. Their architecture, which is essential for parallel client simulation and enables executions to take place within or outside of a single host, offers message-passing-interface-based (MPI) settings, in contrast to other systems where tests only run locally [26]. Also, MNIST handwritten digits and CIFAR images are the popular benchmark datasets used in the literature to test their performance [27]. Also, the industrial FL frameworks like FATE [16] and PaddleFL [17] are not user- and researcher-friendly because of the complicated environment setup and sophisticated system design [28].
There is a plethora of master-worker-based approaches for data stream processing. The major disadvantages of these approaches are as follows: (1) privacy and security issues, because data streams need to be sent to the master server; (2) sending the data to the master server causes a high bandwidth overload; and (3) most of the approaches process the data in batch mode, hence they are incapable of processing high-velocity data streams. In contrast, FL with EC is more reliable in this regard, fulfilling all the needs (as mentioned in the introduction).
In addition to the complicated design and deployment of these FL frameworks, the local dataset is distributed across several clients to construct the local model, indicating that data does not arrive sequentially over time, as it does in data streams. As a result, these FL frameworks cannot handle data streams in real-time to respond to the changes in today's dynamic environment. Also, the analysis of the computation resources needed to run the models on the client side is missing in all the available FL systems, making them even harder to use in a real-world scenario.
Regarding the deployment of FL on edge devices, Docker containers are the most suitable option, as they are simple-to-install software packages and containerized applications are simple to distribute [29]. Docker enables quick deployment, has a tiny footprint, and has high performance, making it a potentially viable Edge Computing platform [30]. Also, the CPU, memory, storage and network overhead is minimal compared to Virtual Machines (VMs) running on a hypervisor [31]. Docker containers run 26× faster than VMs and can run on small devices with lower resources, hence the best fit for FL in EC [32]. So, Docker-based FL deployment is a solution to run efficiently on edge devices.


Fig. 1  Multi-modal data stream classification in online mode

3 Background

In this section, we present a brief introduction to multi-modal stream classification, docker technology and federated learning.

3.1 Multi‑modal stream classification

Classifying continuous incoming unbounded data tuples produced from multiple sources/sensors in real-time is called multi-modal stream classification [33]. In this case, the base classifier learns and updates itself progressively (called online learning) from the data stream [19]. As a consequence, the online classifier performs poorly at the initial stage but improves gradually as it sees more data tuples. The progressive (online) learning steps of a stream classifier are as follows [19]:

1. Receive an incoming data instance or tuple ($x_t$) without the actual label ($y_t$), where $t$ is the time of arrival.
2. Predict the class $\hat{y}_t = M_t(x_t)$ for $x_t$, where $M_t$ is the current model.
3. Immediately after classifying $x_t$, receive the corresponding actual label $y_t$.
4. Train the model $M_t$ with the actual label $y_t$ and update the performance metrics using $(\hat{y}_t, y_t)$.
5. Proceed to the next data tuple.
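To make these steps concrete, the following minimal sketch implements the interleaved test-then-train loop with a generic incremental classifier; the synthetic stream generator, feature dimension and the use of scikit-learn's SGDClassifier are illustrative assumptions rather than the exact classifier used later in the DFL framework.

```python
# Minimal sketch of the interleaved test-then-train loop (steps 1-5 above),
# assuming an incremental classifier exposing predict()/partial_fit().
# The stream generator and feature dimension are placeholders.
import numpy as np
from sklearn.linear_model import SGDClassifier

classes = np.arange(9)                  # e.g., 9 discrete emotion labels
model = SGDClassifier()                 # online-capable base classifier
correct, seen = 0, 0

def data_stream():
    """Placeholder generator yielding one (x_t, y_t) tuple at a time."""
    rng = np.random.default_rng(0)
    for _ in range(1000):
        yield rng.normal(size=(1, 16)), rng.integers(0, 9, size=1)

for x_t, y_t in data_stream():
    if seen > 0:                        # step 2: predict before training
        y_hat = model.predict(x_t)
        correct += int(y_hat[0] == y_t[0])      # step 4: update metrics
    model.partial_fit(x_t, y_t, classes=classes)  # step 4: train on the tuple
    seen += 1                           # step 5: proceed to the next tuple

print("prequential accuracy:", correct / max(seen - 1, 1))
```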


Now, in the case of multi-modal data stream classification, the extracted features of the received data tuples from each modality are fused (using the concatenation approach) and fed to the stream classifier for the class prediction, which follows the previously mentioned progressive learning steps. In Fig. 1 the process of multi-modal data stream classification is depicted. The main noticeable point in Fig. 1 is the time ($t$) dimension, where different data tuples arrive in stream mode at different times ($t_1, t_2, \dots$) from different modalities.
In this work, the following assumptions are made to classify multi-modal data
stream:

• The arrival of data tuples is sequential (one at a time).
• The base model first receives the unlabeled data tuple of the stream, and immediately after the class prediction, the true/actual class label of the corresponding data tuple arrives. Hence it is a supervised stream classification approach.
• The model is tested first and then trained based on the arrived true class label, also known as the interleaved test-then-train approach.
• The model can see each received data tuple at most once; hence multiple passes through the data are not possible.
• Only temporary storage for the data stream is available, meaning no looping back through the received data tuples.

In this research study, we have considered the feature fusion approach (shown in Fig. 1) for multi-modal data stream classification. The feature fusion approach includes (1) feature extraction from the data stream of every single modality; (2) performing feature fusion to create a single feature representation; and (3) performing the data stream classification and updating the model progressively (online learning).
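As an illustration of this three-step pipeline, the sketch below extracts wavelet features from two modality windows and fuses them by concatenation, roughly mirroring the Db4 wavelet features used later in Sect. 5.3; the window length, the summary statistics per sub-band and the random input data are illustrative assumptions.

```python
# Sketch of feature extraction + concatenation-based feature fusion for two
# modalities (e.g., EDA and RB windows). Uses PyWavelets with 'db4' and
# 3 decomposition levels (cf. Sect. 5.3); the per-band statistics are assumed.
import numpy as np
import pywt

def wavelet_features(window, wavelet="db4", level=3):
    """Decompose one modality window and summarize each sub-band."""
    coeffs = pywt.wavedec(window, wavelet=wavelet, level=level)
    feats = []
    for band in coeffs:                          # approximation + detail bands
        feats.extend([band.mean(), band.std(), np.abs(band).max()])
    return np.array(feats)

def fuse(eda_window, rb_window):
    """Step (2): concatenate per-modality features into one representation."""
    return np.concatenate([wavelet_features(eda_window),
                           wavelet_features(rb_window)])

# Example: two 30-second windows sampled at 128 Hz (DEAP's preprocessed rate).
eda = np.random.randn(30 * 128)
rb = np.random.randn(30 * 128)
fused = fuse(eda, rb)                            # fed to the online classifier
print(fused.shape)
```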

3.2 Docker technology

Docker is a well-known, open-source and very popular containerization framework in the software industry [34]. It automates the development and deployment of application containers by allowing us to bundle an application with its run-time requirements into a container image and then execute that image on a host system [11].
The Docker Engine is installed on the host machine and is presently supported by the majority of operating systems. Containers are an abstraction at the application layer that groups together code and dependencies. Users can easily acquire binaries or libraries by specifying the operating system release. The important part is that there is no need to install a new operating system; containers just use the kernel of the host operating system, which may be shared among containers. In comparison to a new system installation, which can take tens of gigabytes and boot slowly, container images are often tens of megabytes in size and start quickly.
Docker images are lightweight, stand-alone, compact and executable packages that include all the necessary requirements to run software, including code, libraries, environment variables and configuration files [1]. Due to this property, an image is highly customizable with only small changes to its contents. For that, users create


Fig. 2  Overview of docker ecosystem

a Dockerfile with a simple syntax for defining the steps needed to build an image and run it. In summary, a Dockerfile is the builder of an image, and a container is a runnable instance of an image. In Fig. 2 the Docker ecosystem is presented.

3.3 Federated learning

The Federated Learning ([6, 35]) approach intends to provide system support for cooperatively training machine learning (ML) or deep learning (DL) models using distributed data silos while maintaining privacy and model performance. The main system design goal is to support training models "in-place", which differs from traditional ML/DL model training, in which data is collected and managed centrally over a fully controlled distributed cluster [36]. The main advantage of such an "in-place" training system (FL) is that it facilitates privacy and security protections, which have led to new legislation such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA), which prohibit transferring users' private data to a centralized location [36].
Based on the scale of the federation, FL is divided into two types: cross-silo and cross-device. In the case of cross-silo, the number of clients is usually small, but each has


Fig. 3  System overview of cross-device federated learning

huge computational ability [37]. On the other hand, when it comes to cross-device, the number of clients is enormous and each has low computation power. From the model training and availability aspect, the availability of an organization (cross-silo) is not the same as that of mobile devices (cross-device) [28]. In cross-device FL, poor networks and limited computation resources can hamper the devices' availability. In this work, we have considered the cross-device federation approach for our DFL development. In Fig. 3, the system overview of cross-device FL is presented.
FL has three major components, as follows [7]:

• Clients These are the stakeholders in FL. Only clients have data accessibility because data is generated at their end. The main responsibilities of clients are local model training on the data generated at their end and sharing their model parameters with the global server. The important point is that no raw data is shared with other parties anywhere.
• Server The server/global server is usually a strong computation node. It handles the communications between the clients and itself to create the global model based on the received local models from the clients.
• Aggregation framework FL's aggregation framework performs two tasks: computation and communication activities. Computation occurs on both the client and the global server, and communication occurs between the clients and the global server. The computational component is utilized for model training, while the communication

component is used to communicate model parameters. Federated Averaging (FedAvg) is FL's most popularly used aggregation framework.

3.4 Realtime multimodal emotion classification system

ReMECS (Realtime Multimodal Emotion Classification System) is our previously developed emotion classification system using multimodal physiological data streams in real-time (see [33] for details). In ReMECS, the multimodal data stream arrives, and feature extraction is done for the corresponding stream modality. Then the extracted features are sent to the corresponding modality classifier for classification. The base classifier of ReMECS was a 3-layer feed-forward neural network, and Stochastic Gradient Descent (SGD) was used to train the classifier in an online fashion. In the end, the emotion class predictions from the corresponding classifier for each sensor modality are collected, and a decision fusion (Dynamic Weighted Majority voting) is performed to produce the final emotion prediction. However, in this study, feature fusion has been taken into consideration rather than decision fusion to lessen the complexity of the real-time classifier inside the DFL framework. As a result, our ReMECS is altered in this regard, but the system's fundamental design and operation have remained the same.
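For reference, a minimal sketch of such a 3-layer feed-forward base classifier in TensorFlow/Keras, compiled with SGD and categorical cross-entropy (as described in Sect. 5.3) and updated one tuple at a time, is given below; the layer widths and the fused-feature dimension are illustrative assumptions, not the exact ReMECS configuration.

```python
# Sketch of a 3-layer FFNN base classifier trained online with SGD and
# categorical cross-entropy (cf. Sects. 3.4 and 5.3). Layer sizes and the
# fused-feature dimension are assumptions for illustration only.
import numpy as np
import tensorflow as tf

NUM_CLASSES = 9          # discrete emotion labels (Table 2)
FEATURE_DIM = 24         # length of the fused EDA+RB feature vector (assumed)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(FEATURE_DIM,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="categorical_crossentropy", metrics=["accuracy"])

# Online (per-tuple) update: predict first, then train on the labelled tuple.
x_t = np.random.randn(1, FEATURE_DIM).astype("float32")
y_t = tf.keras.utils.to_categorical([3], num_classes=NUM_CLASSES)
y_hat = model.predict(x_t, verbose=0)        # interleaved test-then-train
model.train_on_batch(x_t, y_t)               # single-step SGD update
```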

4 DFL architecture

In this section, the overall architecture of the proposed DFL framework is presented. The motivation of this work is to ease the development and deployment of a federated learning framework using the cloud-native Docker-container solution, which is capable of processing real-time multi-modal data streams and handling scalability (multiple clients). The DFL architecture is presented in Fig. 4.
From Fig. 4, it can be seen that the DFL consists of one global server/node and multiple client nodes. At the global server, there are two layers (the global model creation layer and the model transfer layer) running inside the docker container. These are coupled together, hence a multi-layer structure. The model transfer layer mainly runs the MQTT broker, whose main function is to receive the local models from the client nodes, pass them to the global model creation layer, and send the created global model to each client node for the next FL round. The global model creation layer, as the name suggests, is responsible for performing federated averaging on all the local models collected at time t and creating the global model to send to the MQTT broker. However, selecting the appropriate protocol is totally application-dependent; MQTT and CoAP are widely used in IoT contexts. The four key reasons for using MQTT over CoAP in our DFL development are as follows:

• Message queuing is supported in MQTT for disconnected subscribers, but not in CoAP.


Fig. 4  The proposed DFL’s architecture

• The maximum message size in MQTT is 256 MB, whereas it is 1152 Bytes (1.1 KB) in CoAP. The message (the model communicated by clients to the server) size for our DFL framework was 25 KB; however, this varies depending on the neural network design.
• MQTT works best as a live data (data in motion) communications bus. Our DFL framework collects data in a stream and shares local models with the global server in real-time.
• MQTT has many-to-many support, whereas CoAP is one-to-one.

Each client node runs inside a docker container (one container each) and connects to the global server/node. The client node and the global server/node


communicate via the MQTT protocol. In each client node, there are three layers: the first is the data access layer, the second is the data transfer layer, and the third is the online ML layer. The layers are connected to each other, making a hierarchical multi-layer structure, as shown in Fig. 4. The data access layer is responsible for the data stream acquisition and for decoding the data streams. The data transfer layer temporarily stores the data tuples received from the data access layer in a buffer (message queue) and then sends them to the online ML layer for real-time processing. The online ML layer does the real-time processing, which involves model testing (prediction), training and sharing the model weights with the global server for federated averaging. Inside the online ML layer, the model update functionality is mainly responsible for sharing the model parameters (weights) with the global server, receiving the global model from the global server, and updating each client's local model weights with the received global model.
Last but not least, to the best of our knowledge, DFL is the first distributed cloud-native federated learning framework that integrates both real-time data stream processing applications and online ML pipelines to explore the innovative analysis of data streams. Another speciality of DFL is that it is also suitable for the IoT environment. The source code and implementation details of our proposed DFL framework can be found on GitHub,1 and the images for the global server side (fed-server) and client side (fed-clients) can be found on DockerHub.2,3,4

5 Experimental materials and methods

To evaluate the feasibility of our proposed DFL framework in data stream processing, we have considered a use-case scenario where real-time emotion classification is done using multi-modal physiological data streams. In this section, we discuss the orchestration and management of the proposed DFL in the real infrastructure, the dataset considered for the multi-modal data stream, the experimental study and the steps involved in the DFL framework. Along with these, the experimental setup and the considered performance metrics are presented at the end.

5.1 Deployment in real infrastructure

The proposed DFL is deployed in Eurecat's High-Performance Computing (HPC) system, Datura. The reason for deploying the DFL in the Datura infrastructure is to test the DFL framework's behaviour in a large distributed infrastructure. That means each component of the DFL framework can be analyzed and monitored individually regarding performance, reliability, and scalability.

1 DFL's source code: https://github.com/officialarijit/Fed-ReMECS-Docker.
2 DockerHub: https://hub.docker.com/.
3 Fed-clients: https://hub.docker.com/repository/docker/arijitnandi/fedclient.
4 Fed-server: https://hub.docker.com/repository/docker/arijitnandi/fedserver.


Fig. 5  High level architecture of Eurecat’s Datura HPC

Datura is an Infrastructure as a Service (IaaS) platform providing cloud computing services to our internal data analytics projects with high computation requirements at Eurecat. The Datura cloud consists of huge computing resources of 5.5 Terabytes (TB) of RAM and 5 Petabytes (PB) of storage with high-speed internal bandwidth. The platform is managed using the Red Hat OpenStack platform to provide and manage the required infrastructure support. In Fig. 5 the high-level architecture of the Datura HPC platform is presented.
The deployment of DFL in the real infrastructure is divided into two categories: global server and MQTT broker integration, and application integration on the client side. On both the global server and the client, visualization of incoming results is available.
Global server and MQTT broker integration: The global server and MQTT broker are integrated together so that multiple clients can connect to the server for the FL. On the global server, the federated averaging (FedAvg) script runs, which takes all the local model weights as input and produces a global model. The FedAvg is developed using Python 3.7 and TensorFlow 2.0.5 For the MQTT broker, we have used an open-source and distributed IoT message broker framework called EMQ X broker.6 The reason for choosing EMQ X as the MQTT broker is that it is the most scalable MQTT broker for IoT and connects 100 M+ IoT devices in 1 cluster at 1 ms latency (as mentioned on the official website7). Having said that, both the global server and the EMQ X broker of DFL run inside a docker container at Datura

5 https://www.tensorflow.org/.
6 https://github.com/emqx/emqx.
7 https://www.emqx.io/.


Fig. 6  Global server-side CLI output

HPC. The detailed overview of this integration is pictorially shown in Fig. 4 (see the "Global server" part). A snippet of the global server-side Command Line Interface (CLI) output, accessible only to the DFL maintainer (not accessible from clients), is presented in Fig. 6.
Application integration on the client side: This integration runs on the client side when the end users (clients) connect. The details of this integration are mentioned in the DFL architecture (see Fig. 4, especially the client node). For this implementation, we have kept it very simple, and it is shown in the CLI. In the CLI, the current data stream classifier's performance details are shown, along with the classifier's real-time prediction versus the actual class of the current data tuple in the stream. A sample view of the CLI on the client side is shown in Fig. 7. The admin-side CLI view (Fig. 6) is for code debugging and error handling. On the other hand, the client-side CLI (Fig. 7) is just for visualizing the current emotional state. Later, we will show this in a GUI instead of the CLI for better visualization; this is just an add-on to the existing DFL framework, a cosmetic change.

5.2 Dataset description

To assess the feasibility and effectiveness of our proposed DFL framework in real-
time emotion classification, we have used the most popular and widely used bench-
mark dataset DEAP [38]. The following is a brief description of the DEAP8 [38]
(Database for Emotion Analysis using Physiological Signals) dataset. In the DEAP
dataset, Electrodermal Activity (EDA) signal is available in channel no. 37 and

8 DEAP dataset link: https://www.eecs.qmul.ac.uk/mmv/datasets/deap/.


Fig. 7  Client’s side CLI view

Respiratory Belt (RB) signal is in channel no. 38. In this experiment, the EDA
and RB signals are considered for the multi-modal physiological data stream (see
Table 1 for a brief description).

5.3 Experimental study

The following steps constitute the experimental study of our proposed DFL framework for real-time emotion classification using multi-modal physiological data streams (for example, EDA+RB data streams for the DEAP dataset):


Table 1  Brief description of the DEAP dataset

Description                 Details
No. participants            32 (16 male, 16 female)
Stimuli                     40 music videos of length 60 s each
Physiological signals       EEG, EDA, RB
No. channels                Total of 47 channels (32 channels for EEG, 12 peripheral channels, 3 unused)
Saved data file system      .dat files for a total of 32 participants
Data representation         3D matrix representation 40 × 40 × 8064 (video/trial × channel × data)
Label representation        2D matrix representation (40 × 4)
Available emotion labels    Valence, arousal, dominance and liking

Table 2  Discrete emotion mapping using the valence-arousal state

Valence–Arousal level    Discrete emotion
LV-LA                    Sad (0)
LV-MA                    Miserable (1)
LV-HA                    Angry (2)
MV-LA                    Tired (3)
MV-MA                    Neutral (4)
MV-HA                    Tense (5)
HV-LA                    Calm (6)
HV-MA                    Happy (7)
HV-HA                    Excited (8)
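The mapping of Table 2 can be expressed as a simple lookup, as in the sketch below; DEAP ratings are on a 1-9 scale, and the low/mid/high bin thresholds shown are illustrative assumptions (the experiment follows the mapping of our earlier Fed-ReMECS work [7]).

```python
# Sketch of the valence-arousal to discrete-emotion mapping of Table 2.
# The low/mid/high bin edges are illustrative assumptions.
EMOTIONS = {
    ("LV", "LA"): ("Sad", 0),      ("LV", "MA"): ("Miserable", 1),
    ("LV", "HA"): ("Angry", 2),    ("MV", "LA"): ("Tired", 3),
    ("MV", "MA"): ("Neutral", 4),  ("MV", "HA"): ("Tense", 5),
    ("HV", "LA"): ("Calm", 6),     ("HV", "MA"): ("Happy", 7),
    ("HV", "HA"): ("Excited", 8),
}

def level(rating, prefix):
    """Bin a 1-9 rating into low/mid/high (assumed thresholds)."""
    if rating < 4:
        return "L" + prefix
    if rating < 7:
        return "M" + prefix
    return "H" + prefix

def map_emotion(valence, arousal):
    return EMOTIONS[(level(valence, "V"), level(arousal, "A"))]

print(map_emotion(7.5, 2.0))   # -> ('Calm', 6)
```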

– Data set consideration and data rearrangement: The multi-modal data stream is created using the pre-processed multi-modal DEAP data. The DEAP data is stored in 3D matrix form, so a data rearrangement is conducted to transform the data into 1D matrix form for a simpler understanding of the EDA and RB data. The representation is as follows:
[participant, video, data, valence class, arousal class]

In the experiment with the DEAP dataset, the EDA and RB multi-modal data streams are utilized to classify discrete emotion states based on valence-arousal measures. While streaming from the DEAP dataset, an automated mapping of the valence (V) and arousal (A) values to 0–1 is performed. Following our previous experiment Fed-ReMECS [7], we used the same V-A mapping and the subsequent discrete emotion conversion in the experiment. In Table 2 the discrete emotion labels (where L-Low, M-Middle and H-High) are presented.


– Stream reading: For streaming each participant's data from the multi-modal (DEAP) dataset, a non-overlapping sliding window protocol is used. The physiological data recordings in DEAP are 60 s long [38]; therefore, the sliding window size can be at most 60 s. However, for this experiment, we set it to 30 s (taken from previous literature [39, 40]). The multi-modal data stream rate for the DEAP data is approximately 9 Mb/30 s.
– Feature extraction and fusion: The wavelet feature extraction approach is employed in this experiment to extract features from the multi-modal signal streams (EDA and RB signals from the DEAP dataset). The wavelet Daubechies 4 (Db4) is the base function used for feature extraction. In our experiment, we decompose EDA and RB into three levels. A feature fusion technique (concatenation approach) combines the collected features from the EDA and RB modalities. The fused features are subsequently passed to the client-side emotion classifier.
– Emotion classifier: A three-layer Feed Forward Neural Network (FFNN) is used as the base classifier to categorize the discrete emotion labels (in Table 2) in real-time from the multi-modal input streams (EDA and RB). The effectiveness of the FFNN classifier for real-time emotion classification from multi-modal physiological data streams has already been established in our previous work [33]. The 9 different discrete emotion classes are listed in Table 2.
– Local model test-train: The FFNN model is trained in an online fashion using Stochastic Gradient Descent (SGD). As mentioned before, the interleaved test-then-train technique is used to evaluate the base classifier [19]. It validates the model before training and then updates the performance metrics using the received data tuple. That means the base classifier is always evaluated on the newly received data tuples. The base classifier initially performs poorly, but as it encounters more data tuples from the multi-modal data streams, it stabilizes and its performance increases.
– Local model sending and global model receiving: The required time to send the local model and receive the global model varies from problem to problem and also depends on the design of the experiment. In this experiment, we base the sending and receiving times on the DEAP dataset's experimental design. In the DEAP dataset, each participant views one 60-second video at a time, and the local model is constructed during that time. After each 60-second video is completed, the local model is transferred to the MQTT broker, and then to the global server for global model creation. Once the global model is built, the global server transmits it to the MQTT broker, which then delivers it to all of the clients involved in the federated learning. Finally, all of the clients update their local models using the received global model (see Fig. 8 for a better understanding).
– Global model creation: The global server is in charge of constructing the global model by performing Federated Averaging (FedAvg) on all of the received local models at a certain point in time. The FedAvg formula is as follows (Eq. (1)):


Fig. 8  The proposed DFL framework

$$w_t^g = \frac{1}{|nT|} \sum_{i=1}^{|nT|} w_{t-1,i}^l \qquad (1)$$

where $w_t^g$ is the global model created at time $t$, $|nT|$ is the total number of local models received at the global server, and $w_{t-1,i}^l$ is the local model received from client $i$. For this work, we assumed full participation of all available clients in FedAvg, but clients can join and leave the FL process at any point in time; hence the framework is highly asynchronous (a code sketch of one aggregation-and-update round is given at the end of this subsection).
– Local model update: When each client receives the global model, its local model is updated with the global model, and the next federated learning iteration begins on each client side. The following Eq. (2) is used to update the weights of each local model.


Fig. 9  The DFL framework’s sequence diagram for real-time emotion classification from multi-modal
physiological signals

$$w_{t+1,i}^l \leftarrow w_t^g - \lambda \nabla_{w_t^g} L(w_t^g) \qquad (2)$$

where $L$ is the loss function, $\nabla_{w_t^g} L$ is the local model gradient of each client, and $\lambda$ is the learning rate. It is worth noting that the categorical cross-entropy loss function is utilized to train the FFNN base classifier. The mathematical formula of the categorical cross-entropy loss function is given in Eq. (3):

$$L = -\sum_{c=1}^{|nC|} y_{o,c} \log(p_{o,c}) \qquad (3)$$

where $|nC|$ is the number of classes (in our experiment, there are a total of 9 emotion class labels), $y_{o,c}$ is the binary indicator (0 or 1) of the class label $c$ for the observation $o$, and $p_{o,c}$ is the predicted probability that observation $o$ belongs to class $c$ [7].
Figures 8 and 9 show the proposed DFL architecture and the sequence diagram,
respectively, while real-time emotion classification from multi-modal physiological
data stream is performed on Eurecat’s Datura HPC platform.
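The sketch below illustrates one such aggregation-and-update round (cf. Eqs. (1) and (2)) with plain NumPy over Keras-style lists of weight arrays; the number of clients and the weight shapes are illustrative, and full client participation is assumed as in the experiments.

```python
# Sketch of one FedAvg round: the server averages the received local weights
# (Eq. (1)) and each client replaces its local weights with the result before
# its next SGD steps (cf. Eq. (2)). Shapes and client count are illustrative.
import numpy as np

def fedavg(local_models):
    """Element-wise average of |nT| lists of per-layer weight arrays."""
    n = len(local_models)
    return [sum(layers) / n for layers in zip(*local_models)]

# Assume 3 clients, each holding a model with two weight tensors.
rng = np.random.default_rng(42)
local_models = [[rng.normal(size=(24, 9)), rng.normal(size=(9,))]
                for _ in range(3)]

global_weights = fedavg(local_models)          # performed at the global server

# On each client: overwrite the local model with the global weights, then
# continue online training (e.g., model.set_weights(global_weights) in Keras).
for i in range(len(local_models)):
    local_models[i] = [w.copy() for w in global_weights]
```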

Table 3  Hardware configurations and software specifications of the DFL testbed

Features               IoT broker                          Client nodes                          Global server
Node types             One x86 computer                    One x86 computer                      One x86 computer
CPU specifications     Core: 16, Arch: vCPU-8192           Core: 16, Arch: vCPU-8192             Core: 16, Arch: vCPU-8192
Resources              RAM: 16GB, Storage: 1TB             RAM: 16GB, Storage: 1TB               RAM: 16GB, Storage: 1TB
Operating system       Ubuntu 20.04                        Ubuntu 20.04                          Ubuntu 20.04
Software components    EMQX: v5.0.15, Paho-mqtt: v1.6.1    Docker: v20.10.21, Tensorflow: v2.3   Docker: v20.10.21, Tensorflow: v2.3

5.3.1 Experiment and parameter setup

In the DFL deployment, we used two servers. On one, the global server and the EMQ X broker run, and on the other server, the clients are created. The hardware configurations and software specifications for the DFL framework are presented in Table 3.

5.3.2 Performance metrics

For the classifiers' performance evaluation, accuracy (Acc) and the F1micro score are used. The metrics are calculated as follows [41]:

$$Acc = \frac{1}{|nC|}\sum_{i=1}^{|nC|} \frac{TP_i + TN_i}{TP_i + FN_i + FP_i + TN_i} \qquad (4)$$

$$F1_{micro} = 2 \cdot \frac{Pre_{micro} \times Rec_{micro}}{Pre_{micro} + Rec_{micro}} \qquad (5)$$

where $|nC|$ is the number of classes and $TP_i$, $TN_i$, $FP_i$ and $FN_i$ are the true positives, true negatives, false positives and false negatives for class $i$, respectively. The F1micro is computed from the micro-averaged Precision ($Pre_{micro}$) and Recall ($Rec_{micro}$). Therefore, this score takes both false positives and false negatives into account.
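As a quick illustration, the snippet below computes the overall accuracy and the micro-averaged F1 with scikit-learn over a synthetic sequence of (prediction, label) pairs; note that it uses the standard overall accuracy rather than the per-class averaged form of Eq. (4), and it also shows why, in single-label multi-class classification, the micro-averaged F1 coincides with the overall accuracy (as observed in Sect. 5.4.1).

```python
# Sketch: computing overall accuracy and micro-averaged F1 over sequentially
# collected (prediction, true-label) pairs. Labels 0-8 correspond to the 9
# emotion classes of Table 2; the example predictions are synthetic.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(7)
y_true = rng.integers(0, 9, size=500)                 # actual labels y_t
y_pred = np.where(rng.random(500) < 0.8, y_true,      # ~80% correct predictions
                  rng.integers(0, 9, size=500))

acc = accuracy_score(y_true, y_pred)
f1_micro = f1_score(y_true, y_pred, average="micro")

# In single-label multi-class classification every prediction contributes one
# TP or one (FP, FN) pair, so micro precision = micro recall = micro F1 = Acc.
print(f"Acc={acc:.4f}  F1_micro={f1_micro:.4f}")
```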

5.4 Results, analysis and discussion

In this section, we present the experimental results of our proposed DFL framework for real-time emotion classification from a multi-modal physiological data stream. For the multi-modal physiological data stream, the popular DEAP benchmark dataset is used. The proposed DFL is tested under different numbers of clients running in parallel. We have considered 6 different client settings (5, 10, 15, 20, 25, 32), meaning the first experiment is conducted with 5 clients running in parallel, the second with 10 clients running in parallel, and so on. Running clients in parallel means that at each client side the data reading and sending for processing is done using the ReMECS approach (see Sect. 3 for more details), but with the twist that, instead of decision fusion, we used feature fusion in the DFL framework to reduce the computation. That means that when clients connect to the server, they run ReMECS at their end.
The performance of our proposed DFL framework is examined in two different
ways (1) Scalability vs performance test (see Sect. 5.4.1) and (2) Memory-CPU con-
sumption test (see Sect. 5.4.2).
Apart from the different client settings comparison, we have further compared
(see Sect. 5.4.3 for more details) the proposed DFL framework with the existing lit-
erature based on the following criteria:


Table 4  Average testing accuracy and F1micro score of the global model

Number of clients    Avg. accuracy      Avg. F1micro
5                    0.8153 (±0.33)     0.8153 (±0.33)
10                   0.8230 (±0.31)     0.8230 (±0.31)
15                   0.8128 (±0.29)     0.8128 (±0.29)
20                   0.8188 (±0.26)     0.8188 (±0.26)
25                   0.7940 (±0.24)     0.7940 (±0.24)
32                   0.8215 (±0.20)     0.8215 (±0.20)

Fig. 10  The overall performance of the global model in terms of accuracy and F1micro while real-time
emotion classification under different numbers of clients

• Infrastructure-based (Centralised vs Distributed) and Training mode (Batched vs Online (streaming/real-time)) works: In this comparison, we have considered state-of-the-art (SOTA) studies that utilize the same DEAP dataset and an ML/DL-based classifier for emotion classification using multi-modal physiological data. Additionally, the effectiveness of the emotion classifiers in distributed and centralized modes is compared. Furthermore, we have compared batch-mode vs. online model training methods with the same objective.

5.4.1 Scalability vs performance test

In this test, we have evaluated the scalability versus the overall performance of our DFL framework under different numbers of client configurations. For the performance measure, the average testing accuracy and F1micro score of the global model are reported in Table 4, and Fig. 10 shows the performance changes over the real-time emotion classification process.
classification process.
In the real-time emotion classification from multi-modal physiological data streaming using the DFL framework, the classification is done at each client's end. That means the generated data is strictly accessible only to each client; there is no way to access the data from the global server. That is why the global model's testing accuracy and F1micro are calculated by taking the average of all local models' performance after updating each model with the current global model weights. Now, from


Table 4, we can see that the DFL framework is capable of classifying emotions in real-time with adequate average accuracy and F1micro. It is also worth noticing that the DFL framework is capable of handling multiple clients running in parallel, hence proving its scalability.
Now, from Fig. 10, the global model's testing performance in terms of accuracy (left, Fig. 10a) and F1micro (right, Fig. 10b) changes over time. The reason is the diverse local models received from different clients. The diversity in local models arises because different clients' physiological responses are different, resulting in different characteristics in the data (EDA+RB) streams, even though the same sensors are used. However, in some rounds, the global model's accuracy and F1micro have dropped because the global model performance is calculated by taking an average of the local models' performance, so a large drop in one of the local models' performance can cause a significant drop in the global model performance. One interesting point to notice here is that the F1-score and the accuracy are similar because, in our DFL framework, the data tuples arrive sequentially (online scenario) and every data tuple is classified into exactly one class out of the 9 emotion classes. The performance metrics are updated sequentially based on every data tuple's arrival. Also, we have considered micro-averaging of the F1-score for our model performance evaluation, and our classification is multi-class; hence the micro-averaged precision, recall, and F1 are all identical to the accuracy.
Nevertheless, from the average testing accuracy and F1micro scores, along with the overall performance of our proposed DFL framework in real-time emotion classification using multi-modal data streams without accessing the sensitive data streams, we can conclude that it has the capability of handling multiple clients in parallel while still maintaining adequate performance. Also, DFL is capable of preserving privacy by not accessing the data streams while still developing a powerful global model for real-time emotion classification.

5.4.2 Memory and CPU usage test

In this test, we have calculated the CPU and memory usage of each component in the DFL framework. The calculation is done using the default docker stats functionality, which provides all these details. By this test, we can confirm the overall computation cost and power consumption required to run the proposed DFL. In Fig. 11, the memory and CPU consumption of the docker containers while running different numbers of clients is presented. Figure 11 helps us understand each docker container's resource consumption when doing real-time emotion classification using multi-modal data streams on the client side. Having said that, it can also give us an idea of the feasibility of running these containers on low-powered IoT devices. Also, in Table 5, the CPU and memory consumption of the different client containers is presented for a better understanding of Fig. 11.
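A minimal sketch of how such per-container figures can be collected programmatically with the Docker SDK for Python is shown below; the container name and the sampling loop are illustrative assumptions, and the CPU percentage follows the same delta formula that docker stats itself reports.

```python
# Sketch: sampling a container's memory (MiB) and CPU (%) like `docker stats`,
# using the Docker SDK for Python. The container name is an assumption.
import time
import docker

client = docker.from_env()
container = client.containers.get("fed-client-1")     # assumed container name

for _ in range(5):                                     # take a few samples
    s = container.stats(stream=False)                  # one stats snapshot
    mem_mib = s["memory_stats"]["usage"] / (1024 ** 2)

    cpu_delta = (s["cpu_stats"]["cpu_usage"]["total_usage"]
                 - s["precpu_stats"]["cpu_usage"]["total_usage"])
    sys_delta = (s["cpu_stats"]["system_cpu_usage"]
                 - s["precpu_stats"].get("system_cpu_usage", 0))
    n_cpus = s["cpu_stats"].get("online_cpus", 1)
    cpu_pct = (cpu_delta / sys_delta) * n_cpus * 100.0 if sys_delta > 0 else 0.0

    print(f"mem={mem_mib:.1f} MiB  cpu={cpu_pct:.1f} %")
    time.sleep(1)
```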
As we can see from Fig. 11, the memory usage of each client container at the beginning is about 200 Mebibytes (MiB) (≈209 MB, where 1 MiB = 1,048,576 bytes; see Table 5), which equals approximately 1.3% of the total memory available, and it takes a maximum of 250 MiB (≈260 MB ≈ 1.6%, see Table 5) out of the total memory available in the server (mentioned in Table 3).


Fig. 11  The memory and CPU consumption of the docker-containers while different numbers of clients
running in parallel

Also, the memory usage depends on the incoming data tuples and on how frequently model sharing happens during the FL process. On the other side, the CPU usage of each client-side container reaches a maximum of approximately 53%. The maximum CPU usage occurs when the emotion classification and the model update take place. Another point worth noticing in the CPU usage plots is that there is initially a spike in the CPU usage. From the plots, we can see that this initial usage is very high. We checked and re-ran the test over and over again and it stayed the same, but after some troubleshooting, we found out that the spike was caused by the docker containers themselves and not by the processes running inside the client containers.


Table 5  Summary of the CPU and memory consumption of the different client containers in different settings

Number of clients  Client  Min CPU (%)  Max CPU (%)  Avg CPU (%)  Min memory (MiB)  Max memory (MiB)  Avg memory (MiB)

5 client1 0.0 100.58 0.66 0.0 275.3 228.28


client2 0.0 100.27 0.65 0.0 276.0 229.05
client3 0.0 100.42 0.68 0.0 274.7 227.72
client4 0.0 45.21 0.58 0.0 274.2 227.59
client5 0.0 38.65 0.31 0.0 275.7 228.35
10 client1 0.0 98.64 0.59 0.0 275.6 218.1
client2 0.0 104.45 0.67 0.0 275.0 217.01
client3 0.0 97.74 0.46 0.0 275.6 217.98
client4 0.0 29.65 0.38 0.0 275.3 217.33
client5 0.0 53.43 0.42 0.0 275.4 217.45
client6 0.0 99.29 0.58 0.0 275.9 218.18
client7 0.0 99.65 0.45 0.0 275.0 216.97
client8 0.0 100.77 0.66 0.0 275.8 218.42
client9 0.0 29.76 0.41 0.0 275.1 217.55
client10 0.0 29.19 0.38 0.0 276.3 218.42
15 client1 0.0 99.03 0.5 0.0 274.9 159.66
client2 0.0 99.46 0.37 0.0 276.1 160.44
client3 0.0 99.9 0.53 0.0 275.1 160.09
client4 0.0 93.54 0.37 0.0 274.9 159.89
client5 0.0 41.59 0.32 0.0 276.0 160.57
client6 0.0 95.62 0.38 0.0 276.5 160.76
client7 0.0 102.47 0.34 0.0 276.3 160.6
client8 0.0 96.85 0.39 0.0 274.8 159.85
client9 0.0 99.15 0.44 0.0 274.9 160.21
client10 0.0 99.6 0.49 0.0 275.0 160.24
client11 0.0 30.41 0.28 0.0 276.3 160.92
client12 0.0 31.33 0.33 0.0 276.5 160.9
client13 0.0 93.24 0.4 0.0 274.9 160.07
client14 0.0 98.04 0.4 0.0 275.0 160.01
client15 0.0 99.77 0.49 0.0 274.3 159.9


20 client1 0.0 98.97 0.52 0.0 275.2 194.39


client2 0.0 99.4 0.6 0.0 275.1 194.5
client3 0.0 99.67 0.52 0.0 276.0 194.9
client4 0.0 99.22 0.53 0.0 275.4 194.45
client5 0.0 94.08 0.41 0.0 276.7 195.33
client6 0.0 96.64 0.51 0.0 275.1 194.39
client7 0.0 28.99 0.33 0.0 276.9 195.42
client8 0.0 69.51 0.47 0.0 275.2 194.42
client9 0.0 99.99 0.52 0.0 275.3 194.54
client10 0.0 100.32 0.54 0.0 276.4 195.43
client11 0.0 60.03 0.33 0.0 275.0 194.35
client12 0.0 29.55 0.33 0.0 276.4 195.2
client13 0.0 65.15 0.5 0.0 276.4 195.62
client14 0.0 96.33 0.53 0.0 275.3 194.8
client15 0.0 99.75 0.51 0.0 275.4 195.03
client16 0.0 109.99 0.44 0.0 276.4 195.69
client17 0.0 79.86 0.5 0.0 275.2 194.98
client18 0.0 99.98 0.55 0.0 276.8 196.17
client19 0.0 96.52 0.6 0.0 275.2 195.05
client20 0.0 37.23 0.38 0.0 276.3 196.01


25 client1 0.0 98.26 0.42 0.0 276.3 235.42


client2 0.0 94.92 0.47 0.0 275.9 235.03
client3 0.0 99.82 0.47 0.0 275.1 234.09
client4 0.0 96.77 0.56 0.0 275.6 234.71
client5 0.0 61.57 0.55 0.0 276.4 235.22
client6 0.0 89.55 0.71 0.0 275.3 234.55
client7 0.0 96.4 0.64 0.0 275.4 234.67
client8 0.0 96.61 0.57 0.0 275.0 234.4
client9 0.0 99.01 0.49 0.0 276.5 235.83
client10 0.0 95.78 0.56 0.0 276.4 235.56
client11 0.0 28.98 0.48 0.0 276.2 235.37
client12 0.0 30.39 0.42 0.0 277.1 236.39
client13 0.0 94.05 0.58 0.0 274.9 234.71
client14 0.0 69.43 0.46 0.0 275.2 234.82
client15 0.0 98.83 0.54 0.0 276.4 235.97
client16 0.0 89.5 0.69 0.0 275.4 234.88
client17 0.0 96.98 0.52 0.0 276.6 236.55
client18 0.0 30.64 0.35 0.0 275.1 235.14
client19 0.0 54.65 0.41 0.0 275.3 235.3
client20 0.0 100.42 0.77 0.0 275.3 235.42
client21 0.0 72.31 0.54 0.0 276.4 236.2
client22 0.0 99.02 0.54 0.0 276.5 236.73
client23 0.0 99.63 0.59 0.0 275.7 236.0
client24 0.0 99.78 0.73 0.0 274.9 235.44
client25 0.0 100.17 0.65 0.0 275.2 235.57


32 client1 0.0 97.2 0.45 0.0 274.9 134.14


client2 0.0 99.68 0.4 0.0 274.5 133.94
client3 0.0 100.1 0.35 0.0 276.5 135.04
client4 0.0 99.58 0.39 0.0 275.7 134.48
client5 0.0 30.16 0.26 0.0 275.2 134.17
client6 0.0 47.14 0.26 0.0 276.3 134.63
client7 0.0 100.46 0.37 0.0 274.9 133.97
client8 0.0 98.59 0.39 0.0 276.2 134.89
client9 0.0 99.08 0.39 0.0 275.0 134.28
client10 0.0 29.45 0.25 0.0 276.0 134.74
client11 0.0 95.69 0.31 0.0 276.1 134.68
client12 0.0 95.39 0.38 0.0 275.5 134.57
client13 0.0 41.23 0.26 0.0 276.8 135.19
client14 0.0 28.64 0.24 0.0 276.7 135.05
client15 0.0 89.53 0.29 0.0 275.2 134.34
client16 0.0 97.38 0.35 0.0 275.6 134.68
client17 0.0 98.1 0.43 0.0 275.2 134.6
client18 0.0 35.31 0.25 0.0 275.5 134.63
client19 0.0 94.11 0.32 0.0 275.5 134.8
client20 0.0 97.75 0.33 0.0 275.7 134.85
client21 0.0 98.88 0.41 0.0 275.4 134.89
client22 0.0 92.33 0.35 0.0 276.8 135.24
client23 0.0 32.25 0.25 0.0 274.6 134.29
client24 0.0 52.54 0.26 0.0 275.2 134.63
client25 0.0 96.13 0.33 0.0 275.1 134.82
client26 0.0 126.41 0.43 0.0 275.9 135.08
client27 0.0 42.65 0.25 0.0 275.4 134.84
client28 0.0 42.95 0.29 0.0 275.0 134.62
client29 0.0 80.54 0.27 0.0 274.1 134.11
client30 0.0 99.27 0.42 0.0 276.5 135.64
client31 0.0 98.96 0.39 0.0 274.0 134.4
client32 0.0 29.55 0.23 0.0 275.5 135.01


Fig. 12  The memory and CPU consumption of the EMQ X MQTT Broker in different numbers of clients
setting

Fig. 13  The memory and CPU consumption of the FedAvg component in different numbers of clients
setting

Nevertheless, from the overall comparison, we can see that the docker container on the client side requires little computation to do the emotion classification; hence it is suitable for integration into IoT devices (low-powered devices).
On the other side, the memory and CPU consumption of the global server is also calculated using the docker stats functionality. As the global server has two major components running (the MQTT broker and the FedAvg component), we have calculated each component's consumption separately. In Figs. 12 and 13, the memory and CPU consumption of the MQTT broker and the FedAvg component are presented, respectively. In these plots (Figs. 12 and 13), the X-axis shows the time (in seconds), and the Y-axis shows the change in memory consumption (in MiB) and in CPU consumption over time, respectively. Here the time (in seconds) means how long the DFL framework runs for the emotion classification test case.
From the memory and CPU consumption of the EMQ X MQTT broker shown in Fig. 12, we can see that the memory usage (see Fig. 12a) increases as the number of clients taking part in the federated learning increases. In our experiment, the highest


number of clients is 32, and in this case the memory usage is slightly above 150 MiB. Similarly, in the CPU usage plot (see Fig. 12b), we can see that the maximum usage is 6% and otherwise it stays below 4%.
For the FedAvg component at the global server, the memory consumption is around 210 MiB
(Fig. 13a), and it grows as the number of clients in the FL increases. This is because
FedAvg aggregates all the local models received up to a given point in time, and the
FedAvg component uses a queue to hold the incoming local models before aggregation.
Similarly, the CPU usage of the FedAvg component peaks at about 37% (Fig. 13b): it rises
when local models arrive for global model creation and stays below 1% the rest of the time.
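To make the queue-and-aggregate behaviour concrete, the following minimal sketch averages the local weight sets drained from such a queue into a new global model. It uses an unweighted mean for brevity (FedAvg [6] weights clients by their sample counts), and the queue, layer layout, and function names are illustrative assumptions rather than the exact implementation.

# Minimal sketch (assumption): drain the queue of local models received since the last
# round and average them layer-wise into a new global model.
import queue
import numpy as np

incoming_models = queue.Queue()   # filled by the MQTT on_message callback (not shown)

def aggregate_round(global_weights):
    # Collect every local weight set received since the last aggregation.
    local_sets = []
    while not incoming_models.empty():
        local_sets.append(incoming_models.get())
    if not local_sets:
        return global_weights      # no contributions this round: keep the previous global model
    # Each element is a list of per-layer numpy arrays (e.g. Keras get_weights()).
    return [np.mean([w[layer] for w in local_sets], axis=0)
            for layer in range(len(local_sets[0]))]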

5.4.3 Comparison with state‑of‑the‑art works

In [42], the authors developed a 1D Convolutional Neural Network (CNN) Auto Encoder (AE)
model (1D-CNNAE) for real-time emotion classification (2-class, i.e. valence and arousal)
using photoplethysmogram (PPG) and galvanic skin response (GSR) signals; the efficiency
of the 1D-CNNAE model is evaluated on the DEAP dataset. In another recent work [43],
researchers developed an emotion-aware healthcare system utilizing multi-modal
physiological signals (PPG, RB and fingertip temperature (FTT) sensors). To accomplish
the multi-modal emotion classification, the authors used decision-level fusion with
Random Forest (RF) as the base emotion classifier. In another interesting work [44], the
authors proposed an emotion-based music recommendation framework that gauges a user's
mood from wearable physiological sensor signals. To recognize emotions, they used
decision tree (DT), RF, SVM and k-nearest neighbors (k-NN) algorithms, with and without
feature fusion of GSR and PPG. The authors in [45] used an unsupervised deep belief
network (DBN) for deep-level feature extraction from the fused observations of EDA, PPG
and Zygomaticus Electromyography (zEMG) sensor signals. A feature-fusion vector is then
created by combining the DBN-produced features with statistical features from EDA, PPG,
and zEMG, and this fused feature vector is used to classify five basic emotions, namely
Happy, Relaxed, Disgust, Sad and Neutral, with a Fine Gaussian Support Vector Machine
(FGSVM) using a radial basis function kernel. Similarly, in [46] the authors proposed a
substructure-based joint probability domain adaptation algorithm (SSJPDA) to combat the
impact of noise in physiological signals; this approach avoids both the noise
susceptibility of sample-level matching and the excessive coarseness of domain-level
matching. In Table 6, the comparison between our proposed DFL and the selected
state-of-the-art literature is presented.
From the comparison presented in Table 6, we can see that our proposed DFL approach
performs better at classifying more granular emotion labels than the other considered
works, except the work presented in [45]. Even so, the proposed DFL compares favourably
with [45] because it is real-time and distributed, and its base classifier is less
complex than the DBN. In addition, the DFL approach provides the advantage of data
diversity by gathering data from more subjects if required, and it also secures the
sensitive data better than the centralized approaches by training the models locally,
where the data is accessible only to the corresponding end user and not to other parties.

Table 6  Comparison with the selected state-of-the-art works for emotion classification using multi-modal physiological signals

Research | ML/DL model | Sensor modality (multi-modal) | Training mode (offline vs. online/streaming) | Infrastructure (centralized vs. distributed) | Emotion classes | Performance (accuracy)
[42] | 1D-CNNAE | PPG and GSR | Offline | Centralized | 2 | Valence 81.33%, Arousal 80.25%
[43] | RF | RB, PPG and FTT | Offline | Centralized | 2 | Valence 73.08%, Arousal 72.18%
[46] | SSJPDA | EEG, ECG and GSR | Offline | Centralized | 2 | Valence 63.6%, Arousal 64.4%
[44] | RF, k-NN and DT | PPG and GSR | Offline | Centralized | 2 | Valence 72.06%, Arousal 71.05%
[45] | DBN | GSR, PPG and zEMG | Offline | Centralized | 5 | 89.53%
Our proposed DFL | 3-layer FFNN | GSR and RB | Online | Distributed | 9 | 82.15%

6 Conclusion and future work

In this paper, we have presented and analyzed the easy development and deployment of a
federated learning framework, called DFL, using cloud-native solutions such as Docker
containers in an HPC environment. We mainly emphasize easy deployment of the federated
learning framework with Docker while ensuring scalability, low hardware resource
consumption, privacy preservation, and suitability for IoT environments. We have deployed
our proposed DFL on real infrastructure at Eurecat's HPC system (Datura), using the
benchmark DEAP dataset for real-time emotion classification from multi-modal physiological
data streams. An extensive experimental study was conducted on efficiency, memory usage,
and CPU consumption with varying numbers of clients running in parallel. The results show
that the DFL can handle multiple clients running in parallel, and the overall performance
is good in terms of average accuracy and F1micro when classifying real-time emotions from
multi-modal data streams. Moreover, the DFL ensures privacy preservation while developing
a robust global model, since the generated data streams are accessible only to their
respective clients.
In our future work, we plan to extend the DFL framework into an application back end by
adding GUI functionality and database storage at the clients' end and by supporting other
ML/DL models. We also plan to add communication protocols other than MQTT and evaluate
their efficiency. With these additional functions, we plan to release DFL as an
open-source project so that other researchers can use it.
Acknowledgements Arijit Nandi is a fellow of Eurecat’s "Vicente López" PhD grant program. This study
has been partially funded by ACCIÓ, Spain (Pla d’Actuació de Centres Tecnológics 2021) under the pro-
ject TutorIA. We would like to thank the authors of the DEAP dataset [38] for sharing it with us.

Author contributions All authors contributed to designing the model and the computational framework,
implementation, analysis of the results and writing of the manuscript.

Funding Project TutorIA, ACCIÓ, Generalitat de Catalunya, Spain.

Availability of data and materials Publicly available DEAP dataset [38].

Code availability https://github.com/officialarijit/Fed-ReMECS-Docker.

Declarations
Conflict of interest Not applicable.

Ethics approval Not applicable.

Consent to participate Not applicable.

Consent for publication All authors have agreed on the publication.


References
1. Kim J, Kim D, Lee J (2021) Design and implementation of kubernetes enabled federated learning
platform. In: 2021 international conference on information and communication technology conver-
gence (ICTC), pp. 410–412. https://​doi.​org/​10.​1109/​ICTC5​2510.​2021.​96209​86
2. Shivadekar S, Mangalagiri J, Nguyen P, Chapman D, Halem M, Gite R (2021) An intelligent paral-
lel distributed streaming framework for near real-time science sensors and high-resolution medi-
cal images. In: 50th international conference on parallel processing workshop. ICPP Workshops
’21. Association for computing machinery, New York, NY, USA. https://​doi.​org/​10.​1145/​34587​44.​
34740​39
3. Chen Z, Liao W, Hua K, Lu C, Yu W (2021) Towards asynchronous federated learning for hetero-
geneous edge-powered internet of things. Digital Commun Netw 7(3):317–326. https://​doi.​org/​10.​
1016/j.​dcan.​2021.​04.​001
4. Abreha HG, Hayajneh M, Serhani MA (2022) Federated learning in edge computing: a systematic
survey. Sensors. https://​doi.​org/​10.​3390/​s2202​0450
5. Wan X, Guan X, Wang T, Bai G, Choi B-Y (2018) Application deployment using microservice and
docker containers: Framework and optimization. J Netw Comput Appl 119:97–109. https://​doi.​org/​
10.​1016/j.​jnca.​2018.​07.​003
6. McMahan B, Moore E, Ramage D, Hampson S, Arcas BAy (2017) Communication-efficient learn-
ing of deep networks from decentralized data. In: Singh A, Zhu J (eds.) Proceedings of the 20th
international conference on artificial intelligence and statistics. Proceedings of machine learning
research, vol 54, pp 1273–1282. https://​proce​edings.​mlr.​press/​v54/​mcmah​an17a.​html
7. Nandi A, Xhafa F (2022) A federated learning method for real-time emotion state classification
from multi-modal streaming. Methods 204:340–347. https://​doi.​org/​10.​1016/j.​ymeth.​2022.​03.​005
8. Novakouski M, Lewis G (2021) Operating at the edge. Carnegie Mellon University’s Software
Engineering Institute Blog. Accessed 2023 Jan 24 (2021). http://​insig​hts.​sei.​cmu.​edu/​blog/​opera​
ting-​at-​the-​edge/
9. Pitstick K, Ratzlaff J (2022) Containerization at the Edge. Carnegie Mellon University’s Software
Engineering Institute Blog. Accessed 24 Jan 2023 (2022). http://​insig​hts.​sei.​cmu.​edu/​blog/​conta​
ineri​zation-​at-​the-​edge/
10. Damián Segrelles Quilis J, López-Huguet S, Lozano P, Blanquer I (2023) A federated cloud archi-
tecture for processing of cancer images on a distributed storage. Futur Gen Comput Syst 139:38–52.
https://​doi.​org/​10.​1016/j.​future.​2022.​09.​019
11. Zou Z, Xie Y, Huang K, Xu G, Feng D, Long D (2022) A docker container anomaly monitoring sys-
tem based on optimized isolation forest. IEEE Trans Cloud Comput 10(1):134–145. https://​doi.​org/​
10.​1109/​TCC.​2019.​29357​24
12. Zhuang W, Gan X, Wen Y, Zhang S (2022) Easyfl: a low-code federated learning platform for dum-
mies. IEEE Internet Things J 9(15):13740–13754. https://​doi.​org/​10.​1109/​JIOT.​2022.​31438​42
13. Caldas S, Duddu SMK, Wu P, Li T, Konečnỳ J, McMahan HB, Smith V, Talwalkar A (2018) Leaf: a
benchmark for federated settings. arXiv preprint arXiv:​1812.​01097
14. Tensorflow Federated. https://​www.​tenso​rflow.​org/​feder​ated
15. Ryffel T, Trask A, Dahl M, Wagner B, Mancuso J, Rueckert D, Passerat-Palmbach J (2018) A
generic framework for privacy preserving deep learning. arXiv preprint arXiv:​1811.​04017
16. FederatedAI: Federatedai/Fate: An Industrial Grade Federated Learning Framework. https://​github.​
com/​Feder​atedAI/​FATE
17. Ma Y, Yu D, Wu T, Wang H (2019) Paddlepaddle: an open-source deep learning platform from
industrial practice. Front Data Comput 1(1):105–115
18. Nokleby M, Raja H, Bajwa WU (2020) Scaling-up distributed processing of data streams for
machine learning. Proc IEEE 108(11):1984–2012. https://​doi.​org/​10.​1109/​JPROC.​2020.​30213​81
19. Bifet A, Holmes G, Kirkby R, Pfahringer B (2010) Moa: Massive online analysis. J Mach Learn Res
11:1601–1604
20. Gomes HM, Bifet A, Read J, Barddal JP, Enembreck F, Pfharinger B, Holmes G, Abdessalem T
(2017) Adaptive random forests for evolving data stream classification. Mach Learn 106:1469–1495
21. Kolter JZ, Maloof MA (2007) Dynamic weighted majority: an ensemble method for drifting con-
cepts. J Mach Learn Res 8(91):2755–2790
22. Elwell R, Polikar R (2011) Incremental learning of concept drift in nonstationary environments.
IEEE Trans Neural Netw 22(10):1517–1531. https://​doi.​org/​10.​1109/​TNN.​2011.​21604​59


23. Polikar R, Upda L, Upda SS, Honavar V (2001) Learn++: an incremental learning algorithm for
supervised neural networks. IEEE Trans Syst Man Cybernet Part C (Appl Rev) 31(4):497–508.
https://​doi.​org/​10.​1109/​5326.​983933
24. Gomes HM, Read J, Bifet A (2019) Streaming random patches for evolving data stream classifica-
tion. In: 2019 IEEE international conference on data mining (ICDM), pp 240–249. https://​doi.​org/​
10.​1109/​ICDM.​2019.​00034
25. Haddadpour F, Kamani MM, Mokhtari A, Mahdavi M (2020) Federated learning with compression:
unified analysis and sharp guarantees. arXiv preprint arXiv:​2007.​01154
26. He C, Li S, So J, Zhang M, Wang H, Wang X, Vepakomma P, Singh A, Qiu H, Shen L, Zhao P,
Kang Y, Liu Y, Raskar R, Yang Q, Annavaram M, Avestimehr S (2020) Fedml: a research library
and benchmark for federated machine learning. arXiv:​2007.​13518
27. Abdulrahman S, Tout H, Ould-Slimane H, Mourad A, Talhi C, Guizani M (2021) A survey on
federated learning: the journey from centralized to distributed on-site learning and beyond. IEEE
Internet Things J 8(7):5476–5497. https://​doi.​org/​10.​1109/​JIOT.​2020.​30300​72
28. Arafeh M, Otrok H, Ould-Slimane H, Mourad A, Talhi C, Damiani E (2023) Modularfed: lev-
eraging modularity in federated learning frameworks. Internet of Things 22:100694. https://​doi.​
org/​10.​1016/j.​iot.​2023.​100694
29. Ismail BI, Mostajeran Goortani E, Ab Karim MB, Ming Tat W, Setapa S, Luke JY, Hong Hoe
O (2015) Evaluation of docker as edge computing platform. In: 2015 IEEE conference on open
systems (ICOS), pp 130–135. https://​doi.​org/​10.​1109/​ICOS.​2015.​73772​91
30. Anderson C (2015) Docker [software engineering]. IEEE Softw 32(3):102–3. https://​doi.​org/​10.​
1109/​MS.​2015.​62
31. Ismail BI, Jagadisan D, Khalid MF (2011) Determining overhead, variance & isolation metrics
in virtualization for iaas cloud. In: Lin SC, Yen E (eds) Data driven e-Science. Springer, New
York, NY, pp 315–330
32. Felter W, Ferreira A, Rajamony R, Rubio J (2015) An updated performance comparison of vir-
tual machines and linux containers. In: 2015 IEEE International Symposium on Performance
Analysis of Systems and Software (ISPASS), pp. 171–172. https://​doi.​org/​10.​1109/​ISPASS.​
2015.​70958​02
33. Nandi A, Xhafa F, Subirats L, Fort S (2021) Real-time multimodal emotion classification system
in e-learning context. In: Proceedings of the 22nd engineering applications of neural networks
conference, pp 423–435
34. Wan Z, Zhang Z, Yin R, Yu G (2022) Kfiml: Kubernetes-based fog computing iot platform for
online machine learning. IEEE Internet Things J 9(19):19463–19476. https://​doi.​org/​10.​1109/​
JIOT.​2022.​31680​85
35. Zhang Y, Jiang C, Yue B, Wan J, Guizani M (2022) Information fusion for edge intelligence: a
survey. Inf Fusion 81:171–186
36. Zawad S, Yan F, Anwar A (2022) In: Ludwig, H., Baracaldo, N. (eds.) Introduction to federated
learning systems, pp. 195–212. Springer, Cham. https://​doi.​org/​10.​1007/​978-3-​030-​96896-0_9
37. Chahoud M, Otoum S, Mourad A (2023) On the feasibility of federated learning towards on-
demand client deployment at the edge. Inf Process Manag 60(1):103150. https://​doi.​org/​10.​
1016/j.​ipm.​2022.​103150
38. Koelstra S, Muhl C, Soleymani M, Lee J-S, Yazdani A, Ebrahimi T, Pun T, Nijholt A, Patras
I (2012) Deap: a database for emotion analysis;using physiological signals. IEEE Trans Affect
Comput 3(1):18–31
39. Ayata D, Yaslan Y, Kamaşak M (2016) Emotion recognition via random forest and galvanic skin
response: comparison of time based feature sets, window sizes and wavelet approaches. In: Med-
ical technologies national congress, pp 1–4
40. Candra H, Yuwono M, Chai R, Handojoseno A, Elamvazuthi I, Nguyen HT, Su S (2015) Investi-
gation of window size in classification of EEg-emotion signal with wavelet entropy and support
vector machine. In: 37th annual international conference of the IEEE EMBS, pp 7250–7253
41. Nandi A, Jana ND, Das S (2020) Improving the performance of neural networks with an ensem-
ble of activation functions. In: 2020 international joint conference on neural networks (IJCNN),
pp 1–7. https://​doi.​org/​10.​1109/​IJCNN​48605.​2020.​92072​77
42. Kang D-H, Kim D-H (2022) 1d convolutional autoencoder-based ppg and gsr signals for real-
time emotion classification. IEEE Access 10:91332–91345. https://​doi.​org/​10.​1109/​ACCESS.​
2022.​32013​42


43. Ayata D, Yaslan Y, Kamasak EM (2020) Emotion recognition from multimodal physiological
signals for emotion aware healthcare systems. J Med Biol Eng 149–157
44. Ayata D, Yaslan Y, Kamasak ME (2018) Emotion based music recommendation system using
wearable physiological sensors. IEEE Trans Consum Electron 64(2):196–203. https://​doi.​org/​10.​
1109/​TCE.​2018.​28447​36
45. Hassan MM, Alam MGR, Uddin MZ, Huda S, Almogren A, Fortino G (2019) Human emo-
tion recognition using deep belief network architecture. Inf Fusion 51:10–18. https://​doi.​org/​10.​
1016/j.​inffus.​2018.​10.​009
46. Fu Z, Zhang B, He X, Li Y, Wang H, Huang J (2022) Emotion recognition based on multi-modal
physiological signals and transfer learning. Front Neurosci. https://​doi.​org/​10.​3389/​fnins.​2022.​
10007​16

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under
a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of such publishing agreement and
applicable law.
