A Framework For Monitoring Network Node Failure Using Mobile Agents
ISSN No:-2456-2165
Abstract:- Fault detection is an essential aspect of fault diagnosis in computer networks. Fault diagnosis comprises two phases: fault detection and fault localization. Using mobile agents to detect faulty nodes on a network is an approach aimed at ensuring that networks function properly. This research designs a fault detection framework for a network system using a mobile agent. A Light Weight Agent (LWA) travels among the nodes to detect nodes that are down on the network, and returns true or false, along with other information, as the status of each node visited. The system is designed using software agents. Its subsystems include the Agent Controller, the Server Agent, the Client Agent, Check Status and the database. The Agent Controller allocates and determines the agent functions using a unique identification number. The Server Agent controls the activities of the Client Agents by monitoring the migration of each probing agent to each node on the network. The system is implemented on the Java Agent DEvelopment Framework (JADE) platform. It was tested on a network of twenty nodes for five hours per day over twenty days. Node reliability rates ranged from 100% at the highest to 47% at the lowest. This research will be beneficial for testing the reliability of a networking system to ensure optimal functioning. Future research will focus on using mobile agents to diagnose faulty nodes on a network.

Keywords:- Mobile Agent, System Reliability, Computer Network, JADE, Fault Detection.

I. INTRODUCTION

Computer networks are now becoming very large, covering most geographical locations. Network Management Applications (NMA), which handle network tasks such as maintenance and administration, were designed to manage traditional client/server networks. However, as computer networks expand, the size and complexity of client/server models create problems of scalability and flexibility [6].

Researchers in the field of software mobile agents are now focusing their attention on network management systems. When a malfunction occurs, the problem of information overwhelming the network becomes especially severe, particularly since a quick solution is imperative. Swift diagnosis and resolution of the problem, either through automated means or by informing and guiding a human operator on the appropriate course of action, therefore becomes crucial. Devices such as routers, hubs and servers are monitored by the manager, and when faults occur within the network, the application manager notifies the network manager in real time.

Operators working with large networks must interact remotely with numerous devices from their management workstation. To cater for the diverse range of network components, management applications feature a plethora of interfaces and tools. However, network management systems are often designed as large monoliths, which makes them challenging to maintain.

Automatic discovery is a crucial aspect of network management systems, and its objectives depend on the scope of the system. At its most basic level, discovery aims to locate all devices present in the network. An expanded version of this function, however, constructs detailed views that include additional information, such as the services offered by each device that meets specific criteria. As the process of identifying the problem becomes more complex, it becomes harder to implement with traditional client/server methods.

This research emerged from the need to detect network faults and failures using intelligent, decision-making agents. It was also motivated by the work of previous researchers such as [6] on how to provide a complete recovery mechanism in case of fault or failure within a network without simulation. The study by Jian Hu et al. (2008) enables users to define their own Management Information Base (MIB) tables, but this also increases system complexity, as mobile agents must communicate directly with the managed system, which may affect system compatibility. The primary objective of this research is to leverage mobile agents in managing today's large and diverse networks. Mobile agents are autonomous software objects that can move from one node to another, carrying logic and data to perform tasks on behalf of the user. The network management software objects based on mobile agents will be equipped with agents possessing network management capabilities that will enable
In this section, the reviewed literature is grouped by topic: mobile agents in [1], [2], [3], [4] and [40]; network fault detection in [9], [10], [11], [12], [34], [40], [43] and [49]; system reliability in [11], [17], [45] and [48]; characteristics of mobile agents in [4], [6] and [30]; network reliability in [12], [15], [28], [29], [31] and [33]; and network management and monitoring in [33], [34] and [35].

Fault Identification
Fault identification is used to understand the basic failure mode, determine the extent of the fault and find its root cause. Fault identification methods may differ, but the steps to follow are largely the same.

A physical fault is a type of network failure related to hardware issues.
Port faults typically fall into two categories: unstable ports and port failures.
When switches or routers break down, it is often due to equipment damage that results in abnormal network behavior.
Network card faults are considered a type of host hardware failure and are a frequent cause of network problems.

Fault Detection
Fault detection is the process of establishing that a fault exists in a network before it manifests as network failure and breakdown. It is the most important stage of network fault detection (NFD), as all subsequent processes depend on its accuracy. If the equipment is unable to identify the proper failure mode (or if detection is incorrect and triggers false alarms), then the

Mobile agents are programs designed to move automatically from node to node. They can perform a task on behalf of users and allow difficult tasks to be shared among agents [1], [2], [3], [4]. The primary goal of using mobile agents in the management of telecommunication networks is to reduce network traffic through load balancing and to build scalable and reliable distributed network management systems. Some of the advantages of using agent technology in telecommunication networks are as follows:

Addresses the handling of large volumes of data that agents can explore, gather and filter.
Facilitates the use of more intelligent techniques to manage a network, integrate different services into value-added services, and negotiate quality of service.
Promotes higher-level communication and organization within a network.
Demonstrates reactivity, as agents can respond promptly to local events such as link failures.
Exhibits robustness, as agents can still perform their duties to some extent even when parts of the network are temporarily inaccessible. This is particularly important in mobile computing, where links can be expensive and unstable.
Distributes management code to Simple Network Management Protocol (SNMP) agents to reduce bandwidth consumption in a wireless network.
Decentralizes network management functions by allowing mobile agents to carry out administrative tasks autonomously and proactively, thereby reducing the amount of management traffic required.
Dynamically adjusts network policies, as mobile agents can periodically modify the underlying rules of network management.
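The abstract describes the Light Weight Agent (LWA) visiting each node and returning true or false, with other information, as that node's status. A minimal Python sketch of that probing loop follows. It only simulates the itinerary in plain Python and does not use the JADE API; the names `LightWeightAgent`, `ProbeResult`, `down_nodes`, the `check_status` callback and the `attempts` parameter are hypothetical. Re-checking a node before reporting it down is an added assumption, included to show one simple way of limiting the false alarms mentioned under fault detection.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProbeResult:
    node: str
    up: bool   # True if the node answered, False if it is considered down
    info: str  # extra detail returned alongside the true/false status


@dataclass
class LightWeightAgent:
    """Hypothetical probing agent that visits each node on its itinerary in turn."""
    itinerary: List[str]
    results: List[ProbeResult] = field(default_factory=list)

    def run(self, check_status: Callable[[str], bool], attempts: int = 2) -> List[ProbeResult]:
        for node in self.itinerary:  # stands in for migrating to the next node
            up, info = False, "no response"
            for _ in range(attempts):  # re-check before declaring the node down
                try:
                    if check_status(node):
                        up, info = True, "reachable"
                        break
                except Exception as exc:  # a failing check counts as no response
                    info = f"check error: {exc}"
            self.results.append(ProbeResult(node, up, info))
        return self.results


def down_nodes(results: List[ProbeResult]) -> List[str]:
    """What a server-side collector might report: the nodes found to be down."""
    return [r.node for r in results if not r.up]


# Usage with a simulated twenty-node network in which node8 and node11 are down.
offline = {"node8", "node11"}
agent = LightWeightAgent([f"node{i}" for i in range(1, 21)])
results = agent.run(lambda node: node not in offline)
print(down_nodes(results))  # ['node8', 'node11']
```

In a real deployment `check_status` would wrap whatever reachability test the Check Status subsystem performs; here it is just a callable so the loop can be exercised without a network.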
Table 2 shows the cumulative figures for all the tests performed on the twenty nodes of the network over twenty days. It gives the overall percentage reliability of the nodes on the network and the reliability of the framework after the twenty days of testing.
Discussion: The ServerAgent and the ClientAgents are connected to a network of twenty nodes. The ServerAgent is loaded on a single node, while the remaining nodes are loaded with ClientAgents. The system follows a client/server architecture: each client node receives probes from the ServerAgent, which continuously monitors all the LWAs sent to the client nodes. Figure 4 is a chart of data obtained from table 3.2, which covers the twenty-day test period on the twenty-node network. The failure frequency (f) of each node per day over the twenty days of the test is recorded, together with the nodes that failed during the test period. Node2 had two failures, while node3, node4, node5, node7, node9, node10, node12, node14, node15 and node17 each failed only once within the twenty-day test period. Node8 failed four times, node11 nine times, node18 five times, node19 three times and node20 six times.
Fig 4 Number of Failures with Corresponding Nodes for Twenty Days Test
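The failure counts listed above can be tallied as a quick consistency check. In this Python sketch the nodes that the paragraph does not mention (node1, node6, node13 and node16) are assumed to have recorded no failures, which agrees with the nodes later reported with a reliability of 1.

```python
# Failure counts per node over the twenty-day test, as listed in the text.
# Assumption: nodes not mentioned in the text recorded zero failures.
failures = {f"node{i}": 0 for i in range(1, 21)}
failures["node2"] = 2
for n in (3, 4, 5, 7, 9, 10, 12, 14, 15, 17):
    failures[f"node{n}"] = 1  # nodes reported as failing only once
failures.update({"node8": 4, "node11": 9, "node18": 5, "node19": 3, "node20": 6})

total_failures = sum(failures.values())
never_failed = [node for node, f in failures.items() if f == 0]
most_failures = max(failures, key=failures.get)

print(total_failures)  # 39
print(never_failed)    # ['node1', 'node6', 'node13', 'node16']
print(most_failures)   # node11
```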
Discussion: The failure rate of a system is the frequency at which it fails or malfunctions over a given time, and it is usually expressed as the number of failures per unit of time. How it is measured depends on the type of system and the data available; the data is usually obtained by monitoring the system over a period of time and recording the number of failures that occur. Figure 4 above shows the failure rate of each of the nodes on the network. The system was tested using twenty nodes for twenty days, and the failure rate of each node was calculated from the failure frequency data in table 3.2. Figure 4.8 shows the failure rate of each node, ranging from λ = 0.011 for the nodes with the lowest failure rate to λ = 0.15 for the node(s) with the highest failure rate.
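The discussion defines failure rate as failures per unit of time but does not give the exact formula behind the λ values. A common textbook model, assumed here rather than taken from the paper, treats the failure rate as constant: λ = f / T for f failures over T operating hours, and the reliability over a mission time t is R(t) = e^(-λt). The Python sketch below implements that model; the function names are hypothetical.

```python
import math


def failure_rate(failures: int, operating_hours: float) -> float:
    """Constant failure rate: lambda = number of failures / operating time."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    return failures / operating_hours


def reliability(rate: float, mission_hours: float) -> float:
    """Exponential model: probability of no failure over the mission, exp(-lambda * t)."""
    return math.exp(-rate * mission_hours)


# Highest failure rate quoted in the text, over one five-hour daily test session.
print(round(reliability(0.15, 5), 3))  # 0.472

# A node that never fails has lambda = 0 and therefore R = 1.
print(reliability(failure_rate(0, 100), 5))  # 1.0
```

Under this assumed model, λ = 0.15 over a five-hour session gives R ≈ 0.472, which matches the lowest node reliability (node11) reported in the next discussion. The paper does not state that its other values were derived this way, so the model should be read as one possible interpretation.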
Discussion: Measuring the reliability of a system is important for ensuring that it performs its intended function consistently and for identifying potential problems before they occur. Figure 5 shows the reliability of each node in the system. The chart is based on table 1, which lists each node tested over the twenty days together with its calculated reliability rate. Nodes without any failure have a reliability rate of 1 (100%); the reliability rate is the probability that a node will not fail. The test gives the reliability rates of the nodes as node1 = 1 (100%), node2 = 0.979 (97%), node3 = 0.99 (99%), node4 = 0.99 (99%), node5 = 0.99 (99%), node6 = 1 (100%), node7 = 0.99 (99%), node8 = 0.791 (79%), node9 = 0.99 (99%), node10 = 0.99 (99%), node11 = 0.472 (47%), node12 = 0.99 (99%), node13 = 1 (100%), node14 = 0.99 (99%), node15 = 0.99 (99%), node16 = 1 (100%), node17 = 0.99 (99%), node18 = 0.731 (73%), node19 = 0.846 (84%) and node20 = 0.67 (67%). The highest reliability rate is therefore 100%, while the lowest, as indicated above, is 47%.
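The per-node values listed above can be summarised programmatically. This Python sketch uses only the reported numbers; the unweighted mean it prints is an illustrative aggregate and is not claimed to be how the paper computes the overall reliability of the framework.

```python
# Reliability values per node exactly as reported in the discussion above.
reported = {
    "node1": 1.0, "node2": 0.979, "node3": 0.99, "node4": 0.99, "node5": 0.99,
    "node6": 1.0, "node7": 0.99, "node8": 0.791, "node9": 0.99, "node10": 0.99,
    "node11": 0.472, "node12": 0.99, "node13": 1.0, "node14": 0.99, "node15": 0.99,
    "node16": 1.0, "node17": 0.99, "node18": 0.731, "node19": 0.846, "node20": 0.67,
}

lowest = min(reported, key=reported.get)
perfect = [node for node, r in reported.items() if r == 1.0]
# Simple unweighted mean across nodes: an illustrative aggregate only.
average = sum(reported.values()) / len(reported)

print(lowest, reported[lowest])  # node11 0.472
print(perfect)                   # ['node1', 'node6', 'node13', 'node16']
print(round(average, 3))         # 0.919
```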
Discussion: The system was tested for five hours every day for twenty days; the results of the nodes from all the days are summed and their average is computed. This is shown in figure 4.10, which gives the total failure rate, reliability and mean time between failures (MTBF). It also shows the total test hours, the total number of failed nodes and the total number of working nodes. Figure 5 therefore shows how reliable the system is after the twenty-day test period of five hours daily.
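The cumulative figures described above (total test hours, total failures, failure rate and MTBF) can be sketched with the usual definition MTBF = total operating time / number of failures. The Python block below is an illustration under stated assumptions, not a reproduction of figure 4.10: it treats all twenty nodes as operating for the full 5 h × 20 days window, ignores downtime after a failure, and uses 39 failures, the sum of the per-node counts reported in the discussion of Fig 4. The constant and function names are hypothetical.

```python
import math

NODES = 20
HOURS_PER_DAY = 5
DAYS = 20
# Sum of the per-node failure counts listed in the discussion of Fig 4.
TOTAL_FAILURES = 39


def mtbf(total_operating_hours: float, failures: int) -> float:
    """Mean time between failures: total operating time divided by the failure count."""
    if failures == 0:
        return math.inf  # no failures observed in the period
    return total_operating_hours / failures


# Illustration only: assumes every node operated for the whole test window,
# i.e. downtime after a failure is ignored.
node_hours = NODES * HOURS_PER_DAY * DAYS      # total node-hours under test
system_mtbf = mtbf(node_hours, TOTAL_FAILURES)
system_rate = TOTAL_FAILURES / node_hours      # failures per node-hour

print(node_hours)             # 2000
print(round(system_mtbf, 2))  # 51.28
print(round(system_rate, 4))  # 0.0195
```

Subtracting each node's downtime from its operating hours would raise the failure rate and shorten the MTBF, so these figures are best read as a bound under the no-downtime assumption.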