Project For WIN Wireless Intelligent Networks

1) Fault management (FM) is the process of reducing network outages through problem recognition, notification, diagnosis, corrective actions, and restoration. It involves collecting performance data, notifying outages, identifying faults, and resolving them remotely or through on-site fixes. 2) FM processes massive amounts of alarm data using big data analysis techniques like Hadoop since networks are now too large and complex for manual management. Real-time big data frameworks help reduce reaction times. 3) The FM process involves fault detection through monitoring, notification through alarms, diagnosis by further inspecting critical alarms, and resolution by taking corrective actions to restore service. Management staff oversee the process while help desk staff handle customer issues

Uploaded by

slo
Project for WIN « Wireless Intelligent Networks »

Introduction:
In a market where networks and services are quite similar between operators, a differentiator is the
way in which operators manage their networks to increase user satisfaction.

A key goal in this process is to reduce the impact of network outages, which is the role of Fault
Management (FM).

FM includes problem recognition, problem notification, problem diagnosis, initiating corrective
actions, and restoring the original settings after fixing the problem. Some of these tasks are labor
intensive.

FM begins by collecting performance statistics from network devices and links in real time, and by
notifying outages and/or service degradation through different sequences of alarms.

Once the fault is identified, network administrators attempt to resolve the fault remotely.
Traditionally, this process has been carried out by a group of experts located in the Network
Operation Center (NOC), that is, one or more sites from which a computer, telecommunications or
satellite network is monitored and controlled. These experts use their experience and intuition to
troubleshoot, locate and resolve faults by checking all the alarms collected in the different
segments of the network.

Thus, FM is currently the most time-consuming and expert-demanding task of all network
management processes.

The size and complexity of cellular networks, where equipment from different suppliers and
technologies coexist, leads to a large number of alarms.

This makes FM extremely unreliable when done manually: such a volume of alarms can no longer be
managed effectively by human administrators.

With recent advances in information technology, it is now possible to process massive volumes of
information with Big Data Analysis (BDA) techniques. "Big Data" refers to data that cannot be
processed by traditional means due to its volume, velocity and variety (e.g., alarms).

By analyzing this data, operators are more aware of the current state of the network, enabling
corrective actions in a proactive manner (Gupta & Jha, 2015).

One of the most widely pursued objectives is to make the data available in (almost) real time to
reduce reaction time (Bange, Grosser and Janoschek, 2015).

Thus, an increasing number of frameworks have emerged for real-time BDA (for example, Hadoop
Online, Storm, Spark ...).

These frameworks take advantage of the enormous computing power provided by cloud computing.

1) Fault management process :
In this section, the FM process in cellular networks is described.

Firstly, a basic network management architecture is presented to understand system structure and
components.

Secondly, the main roles involved in FM are introduced.

Finally, a generic FM methodology is presented.

1.1) Network management architecture :


A mobile network management architecture basically consists of five components: manager, agent,
network management protocol, management information base and communication model.

*) Manager :

A manager is a tool with a user interface, generally located in the NOC, that allows staff to manage
the different network elements in a mobile network. Its main functions are:

(a) to monitor managed devices and collect data reported by them, used for further analysis by
management staff;

(b) to request information from managed devices and receive their responses;

(c) to configure managed devices by setting variables and thresholds.

*) Agent :

An agent is a program embedded in a network element (e.g., router, switch, server...) responsible for
monitoring it and communicating with the manager. The agent provides information about the status of
a network element to the manager, either asynchronously or after a query.

*) Management information base (MIB):

A MIB is a database that collects management information that describes the network element
parameters. This information is provided by the agent and shared with the manager. In a fault
management process, this information is usually requested for troubleshooting purposes and further
analysis.

*) Network management protocol:

The main role of a network management protocol is to define a standard language that the agent
and the manager use to exchange information between all the elements of the network. The most widely
adopted protocols for network management are:

(a) the Simple Network Management Protocol (SNMP), used for monitoring fault and performance data;
being stateless, SNMP also works well for status polling and for determining the operational state
of specific functionality;

(b) Remote Network MONitoring (RMON), an extension of SNMP that defines a set of statistics and
functions that can be exchanged between controllers;

(c) the Common Management Interface Protocol (CMIP), more complex than the previous protocols and
only used by some information technology service providers.

*) Communication model:

The main patterns of information exchange between manager and agent in a network management
system (NMS) are polling and push.

The polling model is based on the request/response paradigm: the manager requests data and the
agent responds by sending the requested information. Thus, the data flow is always initiated by the
manager, and polling can be automatic or triggered by the user. In the push model, the data flow is
initially configured by the user, and the agents then individually take the initiative to push the
data to the manager, either on a schedule or asynchronously.
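
The two exchange patterns can be illustrated with a minimal Python sketch. The `Agent` and `Manager` classes, the `cpu_load` variable and the callback mechanism are hypothetical names chosen for illustration; they do not correspond to SNMP or to any real NMS API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Agent:
    """Hypothetical agent embedded in one network element."""
    name: str
    status: Dict[str, float] = field(default_factory=dict)
    subscribers: List[Callable[[str, str, float], None]] = field(default_factory=list)

    def get(self, variable: str) -> float:
        # Polling: the agent only answers a request issued by the manager.
        return self.status[variable]

    def update(self, variable: str, value: float) -> None:
        # Push: on a local change the agent itself notifies each subscriber.
        self.status[variable] = value
        for notify in self.subscribers:
            notify(self.name, variable, value)


class Manager:
    """Hypothetical manager that records everything it receives."""

    def __init__(self) -> None:
        self.received: List[Tuple[str, str, str, float]] = []

    def poll(self, agent: Agent, variable: str) -> float:
        value = agent.get(variable)
        self.received.append(("poll", agent.name, variable, value))
        return value

    def subscribe(self, agent: Agent) -> None:
        # The user configures the push flow once; the agent drives it afterwards.
        agent.subscribers.append(self.on_push)

    def on_push(self, agent_name: str, variable: str, value: float) -> None:
        self.received.append(("push", agent_name, variable, value))


manager = Manager()
router = Agent("router-1", {"cpu_load": 0.42})

polled = manager.poll(router, "cpu_load")   # flow initiated by the manager
manager.subscribe(router)
router.update("cpu_load", 0.97)             # flow initiated by the agent
```

In the polling call the manager decides when data moves; after `subscribe`, the agent decides, which is why push suits asynchronous fault reports.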

The components described above can be arranged in different network management architectures,
which can be classified as centralized, distributed and hierarchical. In a centralized architecture,
shown in Figure 1, there is only one manager controlling the entire network and one agent per
managed device.

In contrast, distributed solutions divide the network into segments and a manager is deployed in
each segment, with no interaction between them.

Alternatively, hierarchical solutions combine the two approaches, where each manager locally
manages a subset of network elements and a higher level manager acts as the central controller.

1.2) Roles in fault management :


During the FM process, NOC staff are responsible for monitoring every network element managed by
the NMS, as well as for making decisions and performing corrective actions. This staff is usually
composed of several actors and roles that work together to ensure optimal network performance
and productivity. The main roles in an FM process are:

*) Management staff:

This is the primary role in the outage management process, responsible for resolving network outages
and restoring service as quickly as possible. Its main tasks are to analyze alerts to detect failures,
to identify and prioritize failures, and to initiate corrective actions to restore service.

*) Help desk staff:

This is a secondary role that serves as the customer's point of contact for issues and queries.
Help desk staff focus on receiving and recording customer or employee inquiries (by phone call or
email) that report an incident which may not be signaled by an alarm but is detected by the end user.

*) Other roles:

Additional groups may be needed for an effective FM process. For example, field engineers are
responsible for site visits and for carrying out the corrective actions that cannot be performed
remotely (for example, hardware replacement, software updates ...).

1.3) Basic fault management methodology :

As shown in Figure 2, FM is broken down into four stages: fault detection, fault notification, fault
diagnosis and fault resolution.

*) Fault detection:

Fault detection allows the NMS to detect and report faults. To this end, all network elements report
their status to the NOC.

Two different fault detection modes can be configured in the NMS: passive and active.

In passive mode, agents inform the manager when a predefined condition, configured by
management personnel, is met.

Note that if the agent stops working, no notification is generated and fault detection does not work.

On the other hand, in active mode, the manager checks the status of each agent by sending request
messages.

Thus, if an agent does not provide the required information, the manager can handle the missing
response by generating the corresponding alarm, to be studied later.
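
The difference between the two detection modes can be sketched in a few lines of Python. The function names, the threshold condition and the use of `TimeoutError` to model a silent agent are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Optional


def passive_report(value: float, threshold: float) -> Optional[str]:
    """Agent side (passive mode): notify only when the configured condition is met."""
    if value > threshold:
        return f"value {value} exceeds threshold {threshold}"
    return None  # nothing is sent; a crashed agent would look exactly the same


def active_check(agents: Dict[str, Callable[[], str]]) -> List[str]:
    """Manager side (active mode): query every agent, alarm on silence or bad status."""
    alarms = []
    for name, query in agents.items():
        try:
            status = query()
        except TimeoutError:
            # The agent did not answer: the manager itself generates the alarm.
            alarms.append(f"{name}: no response")
            continue
        if status != "ok":
            alarms.append(f"{name}: {status}")
    return alarms


def healthy() -> str:
    return "ok"


def link_down() -> str:
    return "link-down"


def crashed() -> str:
    raise TimeoutError("agent did not answer")


alarms = active_check({"bs-1": healthy, "bs-2": link_down, "bs-3": crashed})
```

Here `bs-3` is only noticed because the manager asked; in passive mode its silence would be indistinguishable from a healthy element.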

*) Fault notification:

Once a failure occurs, the failure information is transmitted by the NMS to the manager, which
verifies it by comparing it against a set of predefined rules.

If a rule matches, the manager generates a notification message, which is sent to the management
staff via email and instant message.

This notification is generally defined as an alarm, consisting of a brief description of the fault in a
specific format defined by the equipment supplier. Such a description includes the device or service
generating the fault, a plain-text description of the problem, the fault class (equipment,
communication, environment, quality of service, etc.), the severity of the alarm (notification, minor,
critical ...) and information associated with the management process (fault identifier, time of
creation of the fault, resolution status, etc.).
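
As a rough illustration, the alarm fields listed above could be represented as follows. This is a sketch under assumptions: the field names, the two example rules and the sample values are hypothetical, real formats are supplier-specific, and only the three severity levels named in the text are modeled.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import IntEnum
from typing import Callable, List


class Severity(IntEnum):
    # Only the levels named in the text; real systems define more.
    NOTIFICATION = 0
    MINOR = 1
    CRITICAL = 2


@dataclass(frozen=True)
class Alarm:
    fault_id: str          # management information: fault identifier
    source: str            # device or service generating the fault
    description: str       # plain-text description of the problem
    fault_class: str       # equipment, communication, environment, ...
    severity: Severity
    created_at: datetime   # time of creation of the fault
    resolved: bool = False # resolution status


Rule = Callable[[Alarm], bool]


def should_notify(alarm: Alarm, rules: List[Rule]) -> bool:
    """Return True when at least one predefined rule matches the alarm."""
    return any(rule(alarm) for rule in rules)


# Hypothetical rule set configured by management staff.
rules: List[Rule] = [
    lambda a: a.severity >= Severity.CRITICAL,
    lambda a: a.fault_class == "communication" and a.severity >= Severity.MINOR,
]

alarm = Alarm(
    fault_id="F-0001",
    source="bs-17",
    description="Backhaul link lost",
    fault_class="communication",
    severity=Severity.MINOR,
    created_at=datetime(2020, 1, 1, 12, 0),
)
notify = should_notify(alarm, rules)
```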

All of these alarms are displayed in real time and monitored by management personnel in the Alarm
Management (AM) process.

Then, the alarms are classified and prioritized in order to identify the most critical alarms.

Level 1 (L1) technicians check a large number of alarms in real time to identify those requiring
further analysis or corrective actions.

In this process, the alarm information is usually enriched with the MIB data on the managed device
where the error occurred.

Once the most critical alarms are detected, a trouble ticket is generated and sent to the Level 2
(L2) group.

At this point the alarm has been isolated and further inspection is required for fault resolution.
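
The alarm management steps above (prioritize, enrich with MIB data, open a ticket for the most critical alarms) can be sketched as follows. The rule "critical severity opens a ticket" is a simplifying assumption; in the process described here that decision is made by L1 technicians. All names and sample data are hypothetical.

```python
from typing import Dict, List, Tuple

# Lower rank means higher priority; only the severities named in the text.
SEVERITY_RANK = {"critical": 0, "minor": 1, "notification": 2}


def triage(
    alarms: List[dict],
    mib: Dict[str, dict],
    ticket_severities: Tuple[str, ...] = ("critical",),
) -> Tuple[List[dict], List[dict]]:
    """Return (alarms ordered by priority, trouble tickets for the L2 group)."""
    ordered = sorted(alarms, key=lambda a: SEVERITY_RANK[a["severity"]])
    tickets = []
    for alarm in ordered:
        if alarm["severity"] in ticket_severities:
            # Enrich the alarm with MIB data about the device where it occurred.
            tickets.append({**alarm, "device_info": mib.get(alarm["source"], {})})
    return ordered, tickets


alarms = [
    {"id": "a1", "source": "bs-1", "severity": "minor"},
    {"id": "a2", "source": "bs-2", "severity": "critical"},
    {"id": "a3", "source": "bs-1", "severity": "notification"},
    {"id": "a4", "source": "bs-3", "severity": "critical"},
]
mib = {"bs-2": {"vendor": "vendor-A", "software": "1.0"}}

ordered, tickets = triage(alarms, mib)
```

Because `sorted` is stable, alarms of equal severity keep their arrival order.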

*) Fault diagnosis:

Once the trouble ticket is generated, the L2 group starts a ticket management (TM) process, in which
a root cause analysis is performed to diagnose the cause of the fault.

The remaining alarms, which do not lead to a trouble ticket, usually do not affect the service and
eventually clear by themselves.

*) Fault resolution:

Once the root cause of the problem has been identified, corrective actions are initiated.

In some cases, the error can be resolved remotely and no further action is required to restore the
service.

However, in some other cases, resolution may involve physical action requiring an on-site visit by a
field engineer.

In these cases, the L2 group generates a work order, which is managed by the Work Order
Management (WOM) process. Once a work order is created, a dispatch notification is sent to the
corresponding field engineer, who must accept this notification and initiate the corrective action.

Once the fault is resolved and service is restored, the work order information is completed with the
actions performed by the field engineer and the trouble ticket is closed.
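
The work order flow described above can be read as a small state machine. The sketch below is one possible reading, not a description of any WOM product: the state names, the `WorkOrder` class and the ticket dictionary are all hypothetical.

```python
from enum import Enum


class State(Enum):
    CREATED = 1      # work order generated by the L2 group
    DISPATCHED = 2   # dispatch notification sent to the field engineer
    ACCEPTED = 3     # engineer accepted and started the corrective action
    COMPLETED = 4    # fault resolved, service restored


# Each state has exactly one legal successor in this simplified flow.
NEXT_STATE = {
    State.CREATED: State.DISPATCHED,
    State.DISPATCHED: State.ACCEPTED,
    State.ACCEPTED: State.COMPLETED,
}


class WorkOrder:
    def __init__(self, ticket: dict, engineer: str) -> None:
        self.ticket = ticket
        self.engineer = engineer
        self.state = State.CREATED
        self.actions = []

    def advance(self) -> State:
        if self.state not in NEXT_STATE:
            raise ValueError("work order is already completed")
        self.state = NEXT_STATE[self.state]
        return self.state

    def complete(self, actions) -> State:
        """Record the engineer's actions, finish the order and close the ticket."""
        if self.state is not State.ACCEPTED:
            raise ValueError("only an accepted work order can be completed")
        self.actions = list(actions)
        self.ticket["closed"] = True
        return self.advance()


ticket = {"id": "T-42", "closed": False}
order = WorkOrder(ticket, engineer="field-engineer-1")
order.advance()                                   # CREATED -> DISPATCHED
order.advance()                                   # DISPATCHED -> ACCEPTED
final_state = order.complete(["replaced power supply unit"])
```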

1.4) Model for alarm prioritization :


A new predictive model is presented here to prioritize alarms based on the need for specialist
personnel (i.e., the higher the priority, the higher the need for a specialist).

The input to the model is alarm data produced by faults in managed network elements.

The output of the model is a prediction of whether an alarm will generate a trouble ticket, in which
case it should be prioritized.

During model building and evaluation, this output is checked against actual trouble ticket data
generated by L1 technicians.
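
Checking predictions against actual trouble tickets amounts to a binary classification evaluation. A minimal sketch follows; the metric choice (accuracy, precision, recall) and the sample vectors are illustrative assumptions, not results from this project.

```python
from typing import Dict, Sequence


def evaluate(predicted: Sequence[bool], actual: Sequence[bool]) -> Dict[str, float]:
    """Compare predicted 'will open a ticket' flags with the tickets actually opened."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # prioritized, ticket opened
    fp = sum(1 for p, a in pairs if p and not a)      # prioritized, no ticket
    fn = sum(1 for p, a in pairs if not p and a)      # missed ticket
    tn = sum(1 for p, a in pairs if not p and not a)  # correctly ignored
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up example: one entry per alarm.
predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]
metrics = evaluate(predicted, actual)
```

Since most alarms never lead to a ticket, recall (how many real tickets were anticipated) is usually more informative than accuracy alone.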

The construction of the model is based on the CRISP-DM (Cross-Industry Standard Process for Data
Mining) methodology, composed of six stages: business understanding, data understanding, data
preparation, modeling, evaluation and deployment.

Figure 3 shows the different constituent elements of the proposed model when it is implemented in
a commercial data mining tool.

The model consists of interconnected nodes covering the different stages of the CRISP-DM
methodology.

The circles represent the import nodes that retrieve the input data; hexagons are nodes for basic
operations, such as data merge, field derivation, or data balancing; stars are "super-nodes" used to
combine basic nodes into more complex operations; pentagons are nodes for data analysis operations,
such as feature selection, decision trees, or artificial neural networks; gold diamonds represent the
output of each model once trained; and the squares are used for evaluation purposes. More details
on these operations are given below.
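
Three of the basic operations named above (data merge, field derivation and data balancing) can be sketched in plain Python. This is an assumption-laden illustration rather than the configuration of the data mining tool: the field names, the ticket lookup by alarm id and the choice of random undersampling are all hypothetical.

```python
import random


def label_alarms(alarms, ticketed_ids):
    """Data merge and field derivation: add a boolean 'ticket' field to each alarm."""
    return [{**alarm, "ticket": alarm["id"] in ticketed_ids} for alarm in alarms]


def undersample(rows, label="ticket", seed=0):
    """Data balancing: keep every minority row plus an equal-sized majority sample."""
    positives = [row for row in rows if row[label]]
    negatives = [row for row in rows if not row[label]]
    minority, majority = sorted((positives, negatives), key=len)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return minority + rng.sample(majority, len(minority))


# Made-up data: ten alarms, two of which led to a trouble ticket.
alarms = [{"id": i, "severity": "minor"} for i in range(10)]
labeled = label_alarms(alarms, ticketed_ids={2, 7})
balanced = undersample(labeled)
```

Balancing matters here because alarms that end in a ticket are rare; a model trained on the raw data could reach high accuracy by never predicting a ticket.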

Business understanding :
1) Problem statement :
Network management systems play an important role in dealing with the large size and complexity of
today's cellular networks.

Operators and suppliers focus a large part of their efforts on the development of new techniques and
tools for network management. One of the most critical processes in network management is fault
management, because a failure in a network element can have a significant impact on user
satisfaction due to degradation of service.
Unfortunately, cellular networks generate thousands of alarms daily, which must be verified
manually by the operating personnel.

With the latest advances in Big Data analysis, various methods of reducing the number of alarms to
monitor have been proposed in the literature.

In recent years, the number of users and services in cellular networks has increased
considerably.
By 2021, a ten-fold increase in mobile traffic is expected and around fifty billion devices will be
connected to cellular networks (Cisco Systems Inc, 2017a).

Likewise, the deployment of new radio access technologies (for example, 5G) in the coming years will
pave the way for new use cases. Such diversity will increase the complexity of cellular networks,
creating new problems in the management of networks.

Self-Organizing Network (SON) techniques can be divided into three main categories, depending on the
stage of the network life cycle: self-configuration, self-optimization and self-healing.

Self-configuration defines the process by which the Base Station (BS) configuration parameters are
automatically set when a new base station is deployed.

Once the system is properly configured, self-optimization ensures optimum network performance through
continuous monitoring and tuning of system parameters to cope with changes in the environment.

Finally, self-healing is triggered whenever a defect or failure is detected, in order to diagnose the
cause (i.e., root cause analysis) and execute the appropriate compensation mechanisms.

2) Goals :
Our goal is to predict out-of-service alarms. Based on the prediction results, we will take the
required actions to solve the underlying problems.

To achieve out-of-service alarm prediction, we need out-of-service alarm data and KPIs (key
performance indicators).
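
One way to turn alarm history into training samples for this goal is a sliding window: count recent minor alarms as a feature and label each time step by whether an out-of-service alarm follows within a horizon. The sketch below is an assumption about how such samples might be built; the event format, the window and horizon parameters and the sample data are hypothetical, and KPI features are omitted for brevity.

```python
from typing import List, Tuple

Event = Tuple[int, str]  # (hour, alarm kind) for a single base station


def build_samples(
    events: List[Event], window: int, horizon: int, end: int
) -> List[Tuple[int, int, bool]]:
    """Return (hour, minor alarms in the past window, out-of-service within horizon)."""
    samples = []
    for t in range(window, end - horizon + 1):
        # Feature: minor alarms observed in [t - window, t).
        minor = sum(1 for ts, kind in events if kind == "minor" and t - window <= ts < t)
        # Label: does an out-of-service alarm occur in [t, t + horizon)?
        label = any(
            kind == "out_of_service" and t <= ts < t + horizon for ts, kind in events
        )
        samples.append((t, minor, label))
    return samples


# Made-up history: two minor alarms, then an out-of-service alarm at hour 4.
events = [(1, "minor"), (2, "minor"), (4, "out_of_service")]
samples = build_samples(events, window=2, horizon=2, end=6)
```

The feature only looks backwards and the label only looks forwards, which keeps future information out of the model inputs.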

3) Objectives :
- Intelligent network operation and maintenance achieves fault prediction through data analysis
and continuous self-learning, which can effectively improve processing efficiency and accuracy. We
can build an AI model that uses minor alarms to predict the probability of future out-of-service
alarms, which will help maintenance personnel deal with faults in advance and effectively prevent
the base station from going out of service.

- We will try several models to find the best prediction result.

- The categories used to classify our dataset are: error id, alarm type, probable causes,
defense actions …

- Predictive maintenance makes it possible to manage breakdowns properly, improve staff
autonomy, increase safety and productivity, save time and increase profit. Maintenance costs can be
reduced by 25%-35%, equipment life can be extended by 10%-15%, and the number of subscribers can grow.

Model optimization : Gradient Boosting Decision Tree (GBDT)

The GBDT model gives us:

✓ Fast training.

✓ High accuracy.

✓ Low memory usage.

- We aim to reach an accuracy of at least 0.7.

- Model training only takes 3-5 minutes.

- Two years of operating data from thousands of base stations are collected to train the model.
- Multi-dimensional predictions can be made by adding features such as power consumption.
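
To make the boosting idea concrete, here is a minimal pure-Python sketch of gradient boosting with depth-1 trees (stumps) and squared-error loss, thresholded at 0.5 for classification. It is a teaching simplification, not the project's model: production GBDT libraries use deeper trees, a logistic loss and many optimizations. The two features (minor alarm count and a KPI value) and the six rows of data are made up.

```python
from statistics import mean


def fit_stump(X, residuals):
    """Find the single split (feature, threshold) that best fits the residuals."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            lv, rv = mean(left), mean(right)
            error = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or error < best[0]:
                best = (error, j, threshold, lv, rv)
    if best is None:
        raise ValueError("no valid split: all feature values are identical")
    return best[1:]


def predict_stump(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_gbdt(X, y, rounds=20, learning_rate=0.3):
    base = mean(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [target - p for target, p in zip(y, predictions)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        predictions = [
            p + learning_rate * predict_stump(stump, row)
            for p, row in zip(predictions, X)
        ]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


# Made-up rows: [minor alarms in the last day, KPI value]; label 1 = out of service.
X = [[0, 50], [1, 48], [4, 30], [6, 25], [0, 51], [5, 20]]
y = [0, 0, 1, 1, 0, 1]
model = fit_gbdt(X, y)
accuracy = mean(int((predict(model, row) > 0.5) == bool(t)) for row, t in zip(X, y))
```

Each round fits a small tree to what the current ensemble still gets wrong, and the learning rate limits how much any single tree can change the prediction.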

Example of alarm database:
