
DATA AND ANALYTICS FOR

IOT AND
SECURING IOT
Module – 4
DATA AND ANALYTICS
FOR IOT
Topics Covered
• An Introduction to Data Analytics for IoT
• Structured Versus Unstructured Data
• Data in Motion Versus Data at Rest
• IoT Data Analytics Overview
• IoT Data Analytics Challenges
• Machine Learning
• Machine Learning Overview
• Supervised Learning
• Unsupervised Learning
• Neural Networks
• Machine Learning and Getting Intelligence from Big Data
• Predictive Analytics
Topics Covered
• Big Data Analytics Tools and Technology
• Massively Parallel Processing Databases
• NoSQL Databases
• Hadoop
• YARN
• The Hadoop Ecosystem
• Apache Kafka
• Lambda Architecture
• Edge Streaming Analytics
• Comparing Big Data and Edge Analytics
• Edge Analytics Core Functions
• Distributed Analytics Systems
• Network Analytics
• Flexible NetFlow Architecture
• FNF Components
• Flexible NetFlow in Multiservice IoT Networks
An Introduction to Data Analytics for
IoT
• IoT data on its own is just a curiosity; it becomes useful only if it is handled and analyzed correctly.
• Traditional data management systems are simply unprepared to handle IoT data, so a new approach, known as "big data," is needed.
An Introduction to Data Analytics for
IoT
• The IoT creates massive amounts of data from sensors, and the biggest challenges come not only from a transport perspective but also from a data management standpoint.
• E.g :
• Commercial Aviation Industry
• A modern jumbo jet is equipped with some 10,000 sensors, so petabytes (PB) of data can be generated per day per commercial airplane.
• Analyzing this amount of data in the most efficient manner requires data analytics.
An Introduction to Data Analytics for
IoT
• Modern jet engines
• are fitted with thousands of sensors that generate about 10 GB of data per second.
An Introduction to Data Analytics for
IoT
• A few key concepts related to data:
• Not all data is the same; data can be categorized, and how it is categorized determines how it can be analyzed.
• Two important categorizations from an IoT perspective
are:
• Data is structured or unstructured
• Data is in motion or at rest.
Structured Versus Unstructured
Data
• Structured data and unstructured data require different toolsets for analytics.
Structured Versus Unstructured
Data
• Structured data is well represented and organized, so it can be handled by a traditional relational database management system (RDBMS).
• Structured data is easily formatted, stored, queried, and
processed.
• IoT sensor data uses structured values, such as
temperature, pressure, humidity, and so on, which are all
sent in a known format.
Structured Versus Unstructured
Data
• Unstructured data means any data that does not fit neatly
into a predefined data model.
• E.g.
• Data types include text, speech, images, and video.
• Data analytics methods such as cognitive computing and machine learning can be applied to unstructured data.
• Machine learning applications
• E.g.
• Natural language processing (NLP) to decode speech.
• Image/facial recognition applications to extract critical information from
still images and video.
Structured Versus Unstructured
Data
• Smart objects in IoT networks generate both structured
and unstructured data.
• Structured data is more easily managed and processed
• Unstructured data can be harder to deal with and typically requires
very different analytics tools for processing the data
• After data is classified (as structured or unstructured), the appropriate data analytics solution can be applied.
Data in Motion Versus Data at Rest
• Data in motion
• Data is in transit
• E.g.
• client/server exchanges, such as web browsing and file transfers, and email
• At the edge level (data is filtered and deleted, or forwarded to a fog node or the data center)
• The edge processes data in real time, while the data is still in motion
• Real-time streaming analysis tools exist in the Hadoop ecosystem
• Data at rest
• held or stored
• E.g.
• Data saved to a hard drive, storage array, or USB drive
• At the fog level
• Data rests in IoT brokers or in some sort of storage array at the data center
• Hadoop not only helps with data processing but also with data storage
IoT Data Analytics Overview
• There are four types of data analysis results:
IoT Data Analytics Overview
• Descriptive:
• Descriptive data analysis tells you what is happening, either now or in
the past.
• E.g.
• A thermometer in a truck engine reports temperature values every second, so we can see the temperature at any moment.
• A temperature that is too high may indicate a problem with the cooling system, or that the engine is experiencing too much load.
• Diagnostic:
• When you are interested in the “why,” diagnostic data analysis can
provide the answer.
• E.g.
• Using the temperature sensor data from the truck engine, you might determine why the truck engine failed.
• Diagnostic analysis might show that the temperature of the engine was
too high, and the engine overheated.
IoT Data Analytics Overview
• Predictive:
• Predictive analysis aims to foretell problems or issues before they
occur.
• E.g.
• With historical values of temperatures for the truck engine, predictive analysis could provide an estimate of the remaining life of certain components in the engine (replace parts, change oil, or perform engine cooling maintenance).
• Prescriptive:
• Prescriptive analysis goes a step beyond predictive and
recommends solutions for upcoming problems.
• E.g.
• A prescriptive analysis of the temperature data from a truck engine might calculate various alternatives to maintain the truck cost-effectively (for example, the frequency of oil changes or upgrading to a new engine).
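• As a minimal illustration of the four result types, the sketch below walks one hypothetical temperature series through descriptive, diagnostic, predictive, and prescriptive steps; the readings, the 105 C limit, and the trend rule are invented for illustration.

```python
# Hypothetical engine-temperature readings (one per minute), illustrating
# descriptive, diagnostic, predictive, and prescriptive steps on one series.
temps = [88, 90, 91, 94, 97, 101, 104]  # degrees Celsius (made-up values)
LIMIT = 105  # assumed safe operating limit

# Descriptive: what is happening now?
print(f"Current temperature: {temps[-1]} C")

# Diagnostic: why did the alarm fire? Look back for the largest jump.
jumps = [b - a for a, b in zip(temps, temps[1:])]
print(f"Largest minute-to-minute rise: {max(jumps)} C")

# Predictive: extrapolate the recent trend to estimate time until the limit.
rate = (temps[-1] - temps[-4]) / 3  # average rise per minute, last 3 steps
if rate > 0:
    minutes_left = (LIMIT - temps[-1]) / rate
    print(f"Estimated minutes until {LIMIT} C limit: {minutes_left:.1f}")

# Prescriptive: recommend an action based on the prediction.
if rate > 0 and minutes_left < 10:
    print("Recommendation: reduce load and schedule cooling-system service")
```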
IoT Data Analytics Overview
• Application of Value and Complexity Factors to the Types
of Data Analysis
IoT Data Analytics Challenges
• IoT data places two specific challenges on a relational database:
• Scaling problems:
• Due to the huge volume of data from smart objects, relational databases can grow incredibly large very quickly, which affects performance.
• Volatility of data:
• With relational databases, it is critical that the schema be designed correctly from the beginning.
• Changing it later can slow or stop the database from operating
• Less flexible
• To deal with the scaling and data volatility problems, a different type of database, known as NoSQL, is being used.
• Structured Query Language (SQL) is the computer language used to communicate with an RDBMS.
• A NoSQL database is a database that does not use SQL.
• NoSQL databases are not set up in the traditional tabular form, do not enforce a strict schema, and support complex, evolving data models.
IoT Data Analytics Challenges
• In IoT, a main challenge is analyzing streaming data in real time.
• Major cloud analytics providers are Google, Microsoft, and IBM.
• Another challenge in IoT is network analytics
• Data flows must be effectively managed, monitored, and secured
• Network analytics tools such as Flexible NetFlow and IPFIX provide the capability to detect irregular patterns or other problems.
Machine Learning
• Machine Learning (ML):
• how to make sense of the data that is generated
• tools and algorithms are needed to find relationships in the data
• Data collected by smart objects needs to be analyzed,
and intelligent actions (even in real time) need to be taken
based on these analyses
• E.g.
• Self-driving vehicles and abnormal pattern recognition in a crowd are applications done using ML
Machine Learning Overview
• ML is part of Artificial Intelligence (AI)
• E.g.
• Finding a parked car from its GPS location
• Simple if-then-else logic
• ML is concerned with any process where the computer needs to
receive a set of data that is processed to help perform a task with
more efficiency.
• E.g.
• A "dictation program" is configured to recognize the audio pattern of each word in a dictionary, but it does not know your voice's specifics: your accent, tone, speed, and so on.
• You need to record a set of predetermined sentences to help the tool match
well-known words to the sounds you make when you say the words
• ML is divided in two main categories:
• Supervised Learning
• Unsupervised Learning
Supervised Learning
• In supervised learning, the machine is trained with input for which
there is a known correct answer.
• E.g.
• Training a system to recognize when there is a human in a mine tunnel.
• A sensor equipped with a basic camera can capture shapes and return them to
a computing system
• Training set:
• With supervised learning techniques, hundreds or thousands of images are fed into
the machine, and each image is labeled (human or nonhuman in this case).
• Classification:
• The captured image is compared against the training data set, with the help of an algorithm, to find any deviation.
• Supervised learning is efficient only with a large training set; larger
training sets usually lead to higher accuracy in the prediction
• Regression: Train the machine with measured values so that the machine can predict future values.
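• A minimal supervised-learning sketch of the training and classification phases, assuming scikit-learn is available; the two shape features and all labeled values are invented stand-ins for real labeled camera images.

```python
# Supervised learning: train on labeled examples, then classify new input.
# Features here are hypothetical (e.g., shape height and aspect ratio
# extracted from a frame); real systems use thousands of labeled images.
from sklearn.neighbors import KNeighborsClassifier

X_train = [
    [1.7, 0.35], [1.6, 0.30], [1.8, 0.40],  # human-like shapes
    [0.9, 1.20], [1.1, 1.50], [0.8, 1.10],  # non-human shapes
]
y_train = ["human", "human", "human", "nonhuman", "nonhuman", "nonhuman"]

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)       # training phase (the "training set")

new_shape = [[1.65, 0.33]]      # shape captured in the mine tunnel
print(clf.predict(new_shape))   # classification phase -> ['human']
```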
Unsupervised Learning
• Hundreds or thousands of parameters are computed, and
small cumulated deviations in multiple dimensions are
used to identify the exception.
• In unsupervised learning there is not a “good” or “bad”
answer known in advance.
• It is the variation from a group behavior that allows the
computer to learn that something is different.
• E.g.
• Finding defect from thousands of small engine keeping the many
parameter like sound, pressure, temperature of key parts and so
on.
• Each parameter need to compare each other to find the defect in
engine.
Unsupervised Learning
• Graph of grouping and deviation identification logic.
Unsupervised Learning
• Graph of grouping and deviation identification logic.
• Three parameters are graphed (components 1, 2, and 3), and four
distinct groups (clusters) are found.
• In the graph, some points (shown in green) are far from their respective groups.
• Individual devices that display such “out of cluster” characteristics
should be examined more closely individually.
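• A sketch of the grouping and deviation logic described above, assuming scikit-learn; the synthetic engine parameters and the 99th-percentile cutoff for "far from the group" are illustrative assumptions.

```python
# Unsupervised learning: cluster unlabeled engine readings (three
# parameters), then flag points far from every cluster center.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: four natural groups of engines, plus one deviating engine.
X, _ = make_blobs(n_samples=300, centers=4, n_features=3, random_state=0)
X = np.vstack([X, [[25, 25, 25]]])            # far from all four groups

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
dist_to_center = km.transform(X).min(axis=1)  # distance to nearest center

threshold = np.percentile(dist_to_center, 99) # assumed cutoff for "far"
suspects = np.where(dist_to_center > threshold)[0]
print("Engines to inspect individually:", suspects)  # includes index 300
```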
Neural Networks
• Supervised learning requires training the machines, which was often deemed too expensive and complicated because it needs a lot of computing power; the Internet has made this easier, but training a machine to make fine distinctions requires more than basic parameters.
• E.g.
• Making a machine able to distinguish a human from another mammal is much more difficult
• Neural networks come into the picture
Neural Networks
• Neural networks are ML methods that mimic the way the
human brain works.
• E.g.
• When you look at a human figure, multiple zones of your brain are activated to recognize colors, movements, facial expressions, and so on. Your brain combines these elements to conclude that the shape you are seeing is human.
• In a neural network, the information goes through different algorithms (called units), each of which processes an aspect of the information.
• The resulting value of one unit computation can be used
directly or fed into another unit for further processing to occur.
• The great efficiency of neural networks is that each unit
processes a simple test, and therefore computation is quite
fast.
Neural Networks
Neural Network Example:
How Neural Network Recognize
the Dog in a Photo
Neural Networks
• Neural networks vary in the number of units and layers, the type of data processed at each layer, and the type and combination of algorithms used to process the data, making processing more efficient for specific applications.
• Unlike supervised learning
• In neural networks, information is divided into key components, and each component is assigned a weight.
• The weights, compared together, decide the classification of the information (no straight lines + face + smile = human).
• When the result of a layer is fed into another layer, the process is
called deep learning (“deep” because the learning process has more
than a single layer).
• Advantage of deep learning is that having more layers allows for richer
intermediate processing and representation of the data. At each layer, the
data can be formatted to be better utilized by the next layer.
• This process increases the efficiency of the overall result.
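• A minimal sketch of a multi-layer ("deep") network using scikit-learn's MLPClassifier; the two hidden layers, the ten synthetic features, and the label rule are illustrative assumptions, not a real recognition model.

```python
# Neural network: each layer's units compute simple weighted tests, and
# the output of one layer feeds the next (multiple layers = "deep").
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 10))           # 10 hypothetical image features
y = (X[:, 0] + X[:, 3] > 0).astype(int)  # synthetic human/nonhuman label

net = MLPClassifier(hidden_layer_sizes=(16, 8),  # two hidden layers
                    activation="relu",
                    max_iter=2000,
                    random_state=1)
net.fit(X, y)
print("Training accuracy:", net.score(X, y))
```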
Machine Learning and Getting
Intelligence from Big Data
• Determining the right algorithm and the right learning model
• Organize ML operations into two broad subgroups:
• Local learning:
• In this group, data is collected and processed locally, either in the
sensor itself (the edge node) or in the gateway (the fog node).
• Remote learning:
• In this group, data is collected and sent to a central computing unit
(typically the data center or in the cloud), where it is processed.
• Regardless of the location where data is processed, common
applications of ML for IoT revolve around four major domains:
• Monitoring
• Behavior control
• Operations optimization
• Self-healing, self-optimizing
Machine Learning and Getting
Intelligence from Big Data
• Monitoring
• Smart objects monitor the environment where they operate. Data is
processed to better understand the conditions (external factors,
such as air temperature, humidity) of operations.
• ML can be used with monitoring to detect early failure conditions
(K-means deviations showing out-of-range behavior) or to better
evaluate the environment (shape recognition for a robot).
• Behavior control
• When a given set of parameters reaches a target threshold, defined in advance (Supervised Learning) or learned dynamically through deviation from mean values (Unsupervised Learning), monitoring functions generate an alarm.
• Corrective actions are then taken based on those thresholds; a sketch of both threshold styles follows below.
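• The sketch below contrasts the two threshold styles, with invented pressure readings; the fixed limit and the mean-plus-three-sigma rule are illustrative assumptions.

```python
# Behavior control: raise an alarm when a monitored parameter crosses a
# threshold that is either fixed in advance (supervised) or learned from
# the observed mean/deviation of the group (unsupervised).
import statistics

readings = [70.1, 69.8, 70.4, 70.0, 69.9, 74.2]  # hypothetical pressures

FIXED_LIMIT = 73.0  # supervised case: threshold defined in advance
if readings[-1] > FIXED_LIMIT:
    print("Alarm: fixed threshold exceeded -> take corrective action")

# Unsupervised case: threshold learned as mean + 3 standard deviations.
baseline = readings[:-1]
mean = statistics.mean(baseline)
sigma = statistics.stdev(baseline)
if readings[-1] > mean + 3 * sigma:
    print("Alarm: reading deviates from learned group behavior")
```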
Machine Learning and Getting
Intelligence from Big Data
• Operations optimization
• Analyzing data can also lead to changes that improve the overall
process.
• E.g.
• A water purification plant in a smart city operates based on which chemical is used, at what temperature, and with what stirring mechanism.
• Neural networks can combine multiple units (chemical, temperature, stirring), in one or several layers, to estimate the best chemical and stirring mix for a target temperature.
• Self-healing, self-optimizing
• ML-based monitoring triggers changes in machine behavior (the
change is monitored), and operations optimizations.
• The system becomes self-learning and self-optimizing.
• It also detects new K-means deviations that result in the predetection of new potential defects, allowing the system to self-heal and self-optimize.
Machine Learning and Getting
Intelligence from Big Data
• A weather sensor provides information about the local pollution level. (Edge Level)
• At the scale of the entire city, the system can monitor moving pollution clouds (Fog Level) and the global and local effects of mist or humidity, pressure, and terrain. (Cloud Level)
• All this information can be combined with traffic data to
globally regulate traffic light patterns (ML), reduce
emissions from industrial pollution sources, or increase
the density of mass transit vehicles along the more
affected axes. (ML and NN)
Predictive Analytics
• In the advanced stages of these models, the network can self-diagnose and self-optimize.
• When data from multiple systems is combined and
analyzed together, predictions can be made about the
state of the system.
• E.g.
• Transportation
• weight on each wheel
• multiple engine parameters are measured and analyzed
• data processing center in the cloud that can re-create a virtual twin of
each locomotive
• Detecting failures
• sensor values combined with big data can reveal defects or issues, in advance, in vehicles operating in mines or in manufacturing machines
Big Data Analytics Tools and
Technology
• The terms big data and Hadoop are often used interchangeably.
• Hadoop is at the core of today’s big data implementations
• Big data analytics can consist of many different software pieces
that together collect, store, manipulate, and analyze all different
data types.
• “Three Vs” to categorize big data
• Velocity: Velocity refers to how quickly data is being collected and
analyzed.
• E.g. the Hadoop Distributed File System (HDFS)
• Variety: Variety refers to different types of data. Often you see data
categorized as structured, semi-structured, or unstructured.
• E.g. Hadoop can collect and store all three types.
• Volume: Volume refers to the scale of the data (from GB to PB to EB (exabytes))
• E.g. big data
Big Data Analytics Tools and
Technology
• The characteristics of big data are defined by the sources and types of data.
• IoT device data is typically unstructured.
• Transactional data is high volume and structured.
• Social data sources are typically high volume and structured.
• Enterprise data is lower in volume and very structured.
• Hence big data consists of data from all these separate sources.
• Data ingest
• is the layer that connects data sources to storage.
• It preprocesses, validates, extracts, and stores data temporarily for further processing.
• Two distinct database types:
• Relational databases :
• are good for transactional, or process, data. Their benefit is being able to analyze
complex data relationships on data that arrives over a period of time.
• E.g. Oracle and Microsoft SQL Server
• Historian databases:
• are optimized for time-series data from systems and processes.
• They are built with speed of storage and retrieval of data at their core, recording each data point in a series with the pertinent information about the system being logged.
Big Data Analytics Tools and
Technology
• Three most popular categories of database technologies
are:
• Massively parallel processing systems
• NoSQL
• Hadoop
Big Data Analytics Tools and
Technology
Massively parallel processing systems
• Relational databases are grouped into a broad data
storage category called “data warehouses”.
• Massively parallel processing (MPP) databases were built
on the concept of the relational data warehouses but are
designed to be much faster, to be efficient, and to support
reduced query times.
• To accomplish this, MPP databases take advantage of multiple
nodes (computers) designed in a scale-out architecture such that
both data and processing are distributed across multiple systems.
• MPPs are also known as analytic databases because they are designed to allow for fast query processing and often have built-in analytic functions.
Big Data Analytics Tools and
Technology
Massively parallel processing systems
Big Data Analytics Tools and
Technology
Massively parallel processing systems
• An MPP architecture typically contains
• A single master node that is responsible for the coordination of all
the data storage and processing across the cluster.
• It operates in a “shared-nothing” fashion, with each node
containing local processing, memory, and storage and operating
independently.
• Data storage is optimized across the nodes in a structured
SQL-like format that allows data analysts to work with the
data using common SQL tools and applications
Big Data Analytics Tools and
Technology
NoSQL Databases
• NoSQL ("not only SQL") is a class of databases that support semi-structured and unstructured data, in addition to the structured data handled by data warehouses and MPPs.
• NoSQL is not a specific database technology; it encompasses several different types of databases, including the following:
• Document stores
• Key-value stores
• Wide-column stores
• Graph stores
Big Data Analytics Tools and
Technology
NoSQL Databases
• Document stores:
• stores semi-structured data, such as XML or JSON.
• Document stores generally have query engines and indexing features
that allow for many optimized queries.
• Key-value stores:
• Stores associative arrays where a key is paired with an associated
value.
• Easy to build and easy to scale.
• Wide-column stores:
• Similar to a key-value store, but the formatting of the values can vary from row to row, even in the same table.
• Graph stores:
• Organized based on the relationships between elements.
• Graph stores are commonly used for social media or natural language
processing, where the connections between data are very relevant.
Big Data Analytics Tools and
Technology
NoSQL Databases
• NoSQL was developed to support the high-velocity,
urgent data requirements of modern web applications that
typically do not require much repeated use.
• Expanding NoSQL databases to other nodes is similar to
expansion in other distributed data systems, where
additional hosts are managed by a master node or
process.
• Among the database types that fit under the NoSQL category, key-value stores and document stores tend to be the best fit for what is considered "IoT data."
• When the database schema needs to change quickly,
• NoSQL document databases tend to be more flexible than key-value store databases.
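• A small illustration of that flexibility, using plain Python dictionaries as a stand-in for a real document store; the field names and values are invented.

```python
# Document-store style storage: each reading is a self-describing JSON
# document, so new fields can appear without altering any table schema.
import json

collection = []  # stands in for a document-store collection

# Older sensor model: temperature only.
collection.append({"device": "s-100", "ts": 1700000000, "temp_c": 21.4})

# Newer sensor model adds humidity and battery fields; no migration needed.
collection.append({"device": "s-200", "ts": 1700000060,
                   "temp_c": 21.6, "humidity": 48, "battery_pct": 91})

# A query engine would index these; here we just filter in Python.
warm = [doc for doc in collection if doc["temp_c"] > 21.5]
print(json.dumps(warm, indent=2))
```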
Big Data Analytics Tools and
Technology
Hadoop
• Hadoop's two key elements are:
• Hadoop Distributed File System (HDFS): A system for storing data
across multiple nodes
• MapReduce: A distributed processing engine that splits a large
task into smaller ones that can be run in parallel
• In the Hadoop architecture
• nodes use local processing, memory, and storage to distribute tasks and provide a scalable storage system for data.
• Both MapReduce and HDFS take advantage of this distributed
architecture to store and process massive amounts of data and are
thus able to leverage resources from all nodes in the cluster
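• A pure-Python miniature of the MapReduce idea (map, shuffle, reduce) for one task, finding the maximum temperature per sensor; real Hadoop distributes these phases across the cluster's nodes, and the record format here is an assumption for illustration.

```python
# MapReduce in miniature: split a large task (max temperature per sensor)
# into a map phase, a shuffle/group phase, and a reduce phase.
from collections import defaultdict

records = [
    ("sensor-a", 20.1), ("sensor-b", 30.5),
    ("sensor-a", 22.7), ("sensor-b", 29.9), ("sensor-c", 18.0),
]

# Map: emit (key, value) pairs. In Hadoop each node maps its local blocks.
mapped = [(sensor, temp) for sensor, temp in records]

# Shuffle: group values by key across all mappers.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: combine each key's values; reducers run in parallel in Hadoop.
result = {key: max(values) for key, values in groups.items()}
print(result)  # {'sensor-a': 22.7, 'sensor-b': 30.5, 'sensor-c': 18.0}
```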
Big Data Analytics Tools and
Technology
Hadoop
Big Data Analytics Tools and
Technology
Hadoop
• For HDFS, this capability is handled by specialized nodes in the cluster:
• NameNodes
• DataNodes
Big Data Analytics Tools and
Technology
Hadoop
• NameNodes
• All interaction with HDFS is coordinated through the primary
(active) NameNode,
• A secondary (standby) NameNode is kept notified of the changes so it can take over in the event of a failure of the primary.
• The NameNode takes write requests from clients and distributes
those files across the available nodes in configurable block sizes,
usually 64 MB or 128 MB blocks.
Big Data Analytics Tools and
Technology
Hadoop
• DataNodes
• These are the servers where the data is stored at the direction of
the NameNode.
• Many DataNodes in a Hadoop cluster to store the data.
• Data blocks are distributed across several nodes and often are
replicated three, four, or more times across nodes for redundancy.
• Once data is written to one of the DataNodes, the DataNode selects two
(or more) additional nodes, based on replication policies, to ensure data
redundancy across the cluster.
Big Data Analytics Tools and
Technology
Hadoop
• Diagram shows relationship between NameNodes and DataNodes
and how data blocks are distributed across the cluster.
Big Data Analytics Tools and
Technology
Hadoop
• Batch processing is the process of running a scheduled or
ad hoc query across historical data stored in the HDFS.
• A query is broken down into smaller tasks and distributed
across all the nodes running MapReduce in a cluster.
• Depending on how much data is being queried and the complexity of the query, the result could take seconds or minutes to return.
• This makes batch processing unsuitable for real-time processing
Big Data Analytics Tools and
Technology
Hadoop – YARN
• YARN (Yet Another Resource Negotiator) was introduced in version 2.0 of Hadoop.
• It was designed to enhance the functionality of MapReduce.
• YARN took over resource negotiation and job/task tracking, allowing MapReduce to be responsible only for data processing.
Big Data Analytics Tools and
Technology
Hadoop – The Hadoop Ecosystem
• Hadoop may have had modest beginnings as a system for distributed storage and processing, but it has since grown into a robust collection of projects that, combined, create a very complete data management and analytics framework (a complete ecosystem).
Big Data Analytics Tools and
Technology
Hadoop – Apache Kafka
• Apache Kafka is a distributed publisher-subscriber
messaging system that is built to be scalable and fast.
• It is composed of topics, or message brokers, where producers write data and from which consumers read that data.
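• A minimal producer/consumer sketch, assuming the third-party kafka-python client, a broker reachable at localhost:9092, and a hypothetical topic named telemetry.

```python
# Kafka publish/subscribe: producers write to a topic, consumers read it.
from kafka import KafkaProducer, KafkaConsumer  # pip install kafka-python

producer = KafkaProducer(bootstrap_servers="localhost:9092")
# A smart object (producer) publishes a reading to the "telemetry" topic.
producer.send("telemetry", b'{"device": "s-100", "temp_c": 21.4}')
producer.flush()

# A real-time processing engine (consumer) reads from the same topic.
consumer = KafkaConsumer("telemetry",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest",
                         consumer_timeout_ms=5000)
for message in consumer:
    print(message.value)
```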
Big Data Analytics Tools and
Technology
Hadoop – Apache Kafka
Big Data Analytics Tools and
Technology
Hadoop – Apache Kafka
• Diagram shows the data flow from the smart objects
(producers), through a topic in Kafka, to the real-time
processing engine.
• Due to the distributed nature of Kafka, it can run in a clustered configuration that handles many producers and consumers simultaneously and exchanges information between nodes, allowing topics to be distributed over multiple nodes.
• The goal of Kafka is to provide a simple way to connect to
data sources and allow consumers to connect to that data
in the way they would like.
Big Data Analytics Tools and
Technology
Hadoop – Lambda Architecture
• Lambda is a data management system that consists of
two layers for ingesting data (Batch and Stream) and one
layer for providing the combined data (Serving).
• These layers allow Spark and MapReduce, to operate on
the data independently, focusing on the key attributes for
which they are designed and optimized.
• Data is taken from a message broker, commonly Kafka,
and processed by each layer in parallel, and the resulting
data is delivered to a data store where additional
processing or queries can be run.
Big Data Analytics Tools and
Technology
Hadoop – Lambda Architecture
Big Data Analytics Tools and
Technology
Hadoop – Lambda Architecture
• Stream layer:
• This layer is responsible for near-real-time processing of events.
• Technologies such as Spark Streaming, Storm, or Flink are used to
quickly ingest, process, and analyze data on this layer.
• Batch layer:
• The Batch layer consists of a batch-processing engine and data
store.
• Serving layer:
• The Serving layer is a data store and mediator that decides which
of the ingest layers to query based on the expected result or view
into the data.
• If real time analytics is needed, it may invoke the Stream layer.
• The Serving layer is often used by the data consumers to access
both layers simultaneously.
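• A toy sketch of the Serving layer's job: merging a precomputed batch view with the Stream layer's fresh but partial view to answer a query; the data structures and counts are invented for illustration.

```python
# Lambda architecture serving layer: combine the batch view (complete but
# stale) with the stream view (fresh but partial) to answer queries.
batch_view = {"sensor-a": 1_000, "sensor-b": 750}  # up to last batch run
stream_view = {"sensor-a": 12, "sensor-c": 3}      # since last batch run

def serve_count(sensor: str) -> int:
    """Answer a query by merging both ingest layers."""
    return batch_view.get(sensor, 0) + stream_view.get(sensor, 0)

print(serve_count("sensor-a"))  # 1012: batch result plus real-time delta
print(serve_count("sensor-c"))  # 3: only seen since the last batch run
```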
Edge Streaming Analytics
• Nowadays every large technology company sells software and services from the cloud, and this includes data analytics systems.
• E.g.
• Sophisticated data analytics systems to enhance Formula One
racing strategy.
• Racing strategy data is kept in a data center, which may be far away from the track.
• The time it takes (the latency) to collect and analyze racing data as a batch process in a distant part of the world is not only inefficient but can mean the difference between a successful race strategy that adapts to changing conditions and one that lacks the flexibility and agility to send meaningful instructions to the drivers.
Comparing Big Data and Edge
Analytics
SECURING IOT
Topics Covered
• A Brief History of OT Security
• Common Challenges in OT Security
• Erosion of Network Architecture
• Pervasive Legacy Systems
• Insecure Operational Protocols
• The following are industrial protocols
• Modbus
• DNP3 (Distributed Network Protocol)
• ICCP (Inter-Control Center Communications Protocol)
• OPC (OLE for Process Control)
• International Electrotechnical Commission (IEC) Protocols
• Other Protocols
• Device Insecurity
• Dependence on External Vendors
• Security Knowledge
Topics Covered
• How IT and OT Security Practices and Systems Vary
• The Purdue Model for Control Hierarchy
• OT Network Characteristics Impacting Security
• Security Priorities: Integrity, Availability, and Confidentiality
• Security Focus
• Formal Risk Analysis Structures
• OCTAVE
• FAIR
• The Phased Application of Security in an Operational
Environment
• Secured Network Infrastructure and Assets
• Deploying Dedicated Security Appliances
• Higher-Order Policy Convergence and Network Monitoring
Introduction
• As IoT brings more and more systems together under different network connectivity, security has never been more important.
• The history of cyber attacks on OT systems is much shorter than that of IT systems.
• Security in the OT world also addresses a wider scope
than in the IT world.
A Brief History of OT Security
• Cyber security incidents in industrial environments can result in
physical consequences that can cause threats to human lives as well
as damage to equipment, infrastructure, and the environment.
• A well-known example of physical damage caused by a cyber security attack is the Stuxnet malware, which damaged uranium enrichment systems.
• Beyond physical damage, operational interruptions have also occurred in OT environments due to cyber security incidents
• Attackers were skilled individuals with deep knowledge of technology and the
systems they were attacking
• Attack methods are changing and will be increasingly difficult to defend against and respond to.
• The isolation between industrial networks and the traditional IT business
networks has been referred to as an “air gap,” suggesting that there are no
links between the two.
• Unlike in IT-based enterprises, OT-deployed solutions commonly have no reason to change, as they are designed to meet specific (and often single-use) functions and have no requirements or incentives to be upgraded.
• The priority in OT is system uptime and high availability, so changes are typically made only to fix faults
A Brief History of OT Security
• History of Vulnerability Disclosures in Industrial Control
Systems Since 2010
A Brief History of OT Security
• Given the slow rate of change and extended upgrade
cycles of most OT environments, the investment in
security for industrial communication and compute
technologies has historically lagged behind the investment
in securing traditional IT enterprise environments.
Common Challenges in OT Security
Erosion of Network Architecture
• Two of the major challenges in securing industrial
environments have been initial design and ongoing
maintenance.
• The challenge, and the biggest threat to network security, is
standards and best practices either being misunderstood or the
network being poorly maintained.
• The uncontrolled or poorly controlled OT network evolutions have, over time, led to weak or inadequate network and systems security.
• In many industries, the control systems consist of packages, skids, or components that are self-contained and may be integrated as semi-autonomous portions of the network.
• These packages may not be as fully or tightly integrated into the overall control system, network management tools, or security applications, resulting in potential risk.
Common Challenges in OT Security
Pervasive Legacy Systems
• Due to the static nature and long lifecycles of equipment
in industrial environments, many operational systems may
be deemed legacy systems.
• For example, a power utility environment may have racks of old mechanical equipment still operating alongside modern intelligent electronic devices (IEDs).
• The communication infrastructure and shared centralized
compute resources are often not built to comply with
modern standards
Common Challenges in OT Security
Insecure Operational Protocols
• Many industrial control protocols, particularly those that
are serial based, were designed without inherent strong
security requirements.
• In supervisory control and data acquisition (SCADA) systems, some common issues are a frequent lack of authentication between communication endpoints, no means of securing and protecting data at rest or in motion, and insufficient granularity of control to properly specify recipients or avoid default broadcast approaches.
Common Challenges in OT Security
Insecure Operational Protocols
• The following sections discuss some common industrial
protocols and their respective security concerns:
• Modbus
• DNP3 (Distributed Network Protocol)
• ICCP (Inter-Control Center Communications Protocol)
• OPC (OLE for Process Control)
• International Electrotechnical Commission (IEC) Protocols
• Other Protocols
Common Challenges in OT Security
Insecure Operational Protocols
• Modbus
• Maintained by the Modbus Organization.
• The security challenges that have existed with Modbus are not unusual. Authentication of communicating endpoints is not a default operation, which allows an inappropriate source to send improper commands to the recipient; see the sketch after this list.
• DNP3 (Distributed Network Protocol)
• In DNP3, participants allow for unsolicited responses, which could trigger an undesired response. The missing security element here is the ability to establish trust in the system's state and thus the ability to trust the veracity of the information being presented.
• ICCP (Inter-Control Center Communications Protocol)
• One key vulnerability is that the system did not require authentication
for communication. Second, encryption across the protocol was not
enabled as a default condition, thus exposing connections to man-in-
the-middle (MITM) and replay attacks.
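• To make the missing-authentication point concrete, the sketch below hand-builds a complete, valid Modbus/TCP "read holding registers" request with nothing but a TCP socket; no field in the frame carries credentials, so any host that can reach port 502 can issue it. The target address is a hypothetical lab device; do not probe systems you do not own.

```python
# Modbus/TCP request built by hand: MBAP header + PDU. Note that no field
# carries credentials -- the protocol itself has no authentication.
import socket
import struct

# MBAP header: transaction id, protocol id (0), remaining length, unit id.
mbap = struct.pack(">HHHB", 1, 0, 6, 1)
# PDU: function 0x03 (read holding registers), start address 0, quantity 2.
pdu = struct.pack(">BHH", 0x03, 0, 2)

with socket.create_connection(("192.0.2.10", 502), timeout=3) as s:  # lab IP
    s.sendall(mbap + pdu)        # accepted with no login or session setup
    reply = s.recv(256)
print(reply.hex())
```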
Common Challenges in OT Security
Insecure Operational Protocols
• OPC (OLE for Process Control)
• OPC is based on the Microsoft interoperability methodology Object
Linking and Embedding (OLE).
• Many of the Windows devices in the operational space are old, not fully patched, and at risk due to a plethora of well-known vulnerabilities. Dependence on OPC may reinforce this dependence on legacy Windows systems.
• International Electrotechnical Commission (IEC) Protocols
• Three message types were initially defined: MMS (Manufacturing
Message Specification), GOOSE (Generic Object Oriented
Substation Event), and SV (Sampled Values).
• Both GOOSE and SV operate via a publisher/subscriber model,
with no reliability mechanism to ensure that data has been
received.
Common Challenges in OT Security
Device Insecurity
• Correlation of Industrial Black Hat Presentations with
Discovered Industrial Vulnerabilities
• To understand the nature of the device insecurity, it is
important to review the history of what vulnerabilities were
discovered and what types of devices were affected.
• It is not difficult to understand why many systems are
frequently found vulnerable.
• First, many of the systems utilize software packages that can be
easily downloaded and worked against. Second, they operate on
common hardware and standard operating systems, such as
Microsoft Windows.
• There is little need to develop new tools or techniques when those that have long been in place are adequate to breach the target's defenses.
Common Challenges in OT Security
Dependence on External Vendors
• While modern IT environments may be outsourcing
business operations or relegating certain processing or
storage functions to the cloud, it is less common for the
original equipment manufacturers of the IT hardware
assets to be required to operate the equipment. However,
that level of vendor dependence is not uncommon in
some industrial spaces.
• Such vendor dependence and control are not limited to remote access. Onsite management by non-employees who are to be granted compute and network access is also required, but again, control conditions and shared responsibility statements are yet to be observed.
Common Challenges in OT Security
Security Knowledge
• Another relevant challenge in terms of OT security
expertise is the comparatively higher age of the industrial
workforce.
• Average age gap between manufacturing workers and other non-
farm workers
• New connectivity technologies are being introduced in OT industrial
environments that require up-to-date skills, such as TCP/IP,
Ethernet, and wireless that are quickly replacing serial-based
legacy technologies
• Due to the importance of security in the industrial space,
all likely attack surfaces are treated as unsafe.
How IT and OT Security Practices and
Systems Vary
• The differences between an enterprise IT environment
and an industrial-focused OT deployment are important to
understand because they have a direct impact on the
security practice applied to them.
How IT and OT Security Practices and Systems
Vary
The Purdue Model for Control Hierarchy
• To ensure end-to-end security, the coordination between IT and OT must improve.
• E.g
• Use of firewalls and intrusion prevention systems (IPS)
• To understand the security and networking requirements for a
control system, the use of a logical framework to describe the
basic composition and function is needed.
• The Purdue Model for Control Hierarchy is the most widely used framework across industrial environments globally and is used in manufacturing, oil and gas, and many other industries.
• It segments devices and equipment by hierarchical function
levels and areas and has been incorporated into the ISA99/IEC
62443 security standard.
How IT and OT Security Practices and Systems
Vary
The Purdue Model for Control Hierarchy
• The Logical Framework Based on the Purdue Model for
Control Hierarchy
How IT and OT Security Practices and Systems
Vary
The Purdue Model for Control Hierarchy
• The enterprise and operational domains are separated into
different zones and kept in strict isolation via an industrial
demilitarized zone (DMZ):
• Enterprise zone
• Level 5: Enterprise network:
• Corporate-level applications include Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), and document management.
• Services such as Internet access and VPN entry from the outside world exist at this level.
• Level 4: Business planning and logistics network:
• IT services include scheduling systems, material flow applications, optimization and planning systems, and local IT services such as phone, email, printing, and security monitoring.
• Industrial demilitarized zone
• DMZ:
• The DMZ provides a buffer zone where services and data can be shared
between the operational and enterprise zones.
How IT and OT Security Practices and Systems
Vary
The Purdue Model for Control Hierarchy
• Operational zone
• Level 3: Operations and control:
• This level includes the functions involved in managing the workflows to
produce the desired end products and for monitoring and controlling the
entire operational system.
• Level 2: Supervisory control:
• This level includes zone control rooms, controller status, control system
network/application administration, and other control-related applications.
• Level 1: Basic control:
• At this level, controllers and IEDs, dedicated HMIs, and other applications
may talk to each other to run part or all of the control function.
• Level 0: Process:
• This is where devices such as sensors and actuators and machines such as
drives, motors, and robots communicate with controllers or IEDs.
• Safety zone
• Safety-critical: This level includes devices, sensors, and other
equipment used to manage the safety functions of the control system.
How IT and OT Security Practices and Systems
Vary
OT Network Characteristics Impacting Security
• While IT and OT networks are beginning to converge, they still
maintain many divergent characteristics in terms of how they operate
and the traffic they handle.
• Compare the nature of how traffic flows across IT and OT networks:
• IT Network:
• Data frequently traverses the network through layers of switches and eventually makes its way to a set of local or remote servers, which it may connect to directly. Data in the form of email, file transfers, or print services will likely all make its way to the central data center, where it is responded to or triggers actions in more local services, such as a printer.
• OT networks:
• By comparison, in an OT environment (Levels 0–3), there are typically two
types of operational traffic.
• The first is local traffic that may be contained within a specific package or area to
provide local monitoring and closed-loop control. This is the traffic that is used for
realtime (or near-real-time) processes.
• The second type of traffic is used for monitoring and control of areas or zones or the
overall system.
How IT and OT Security Practices and Systems Vary
Security Priorities: Integrity, Availability, and Confidentiality
• In IT, the most critical element and the target of attacks has been information.
• In OT, the critical assets are the process participants: workers and equipment.
• Security priorities diverge based on those differences
• In IT, security focuses on the confidentiality, integrity, and availability of the data.
• In OT, losing a device due to a security vulnerability means production stops, and the company cannot perform its basic operation.
• In OT, the safety and continuity of the process participants are considered the most critical concerns
How IT and OT Security Practices and Systems
Vary
Security Focus
• Security focus is frequently driven by the history of
security impacts that an organization has experienced.
• In an IT environment, the history is one of critical data being extracted or corrupted.
• In OT, the history is one of loss due to external factors such as human error
Formal Risk Analysis Structures:
OCTAVE and FAIR
• In OT and industrial environments, a number of standards, guidelines, and best practices have been adopted.
• The key in any industrial environment is that security needs to be addressed holistically, not just with a focus on technology
• Two such risk assessment frameworks:
• OCTAVE (Operationally Critical Threat, Asset and Vulnerability
Evaluation) from the Software Engineering Institute at Carnegie
Mellon University
• FAIR (Factor Analysis of Information Risk) from The Open Group
• For risk assessment, the FAIR model is often adopted compared to OCTAVE.
• Like OCTAVE, FAIR allows for non-malicious actors as a potential cause of harm, but it goes to greater lengths to emphasize the point
Formal Risk Analysis Structures:
OCTAVE
• The OCTAVE Allegro steps and phases
Formal Risk Analysis Structures:
OCTAVE
• The first step of the OCTAVE Allegro methodology is to establish risk measurement criteria.
• OCTAVE provides a fairly simple means of doing this with an
emphasis on impact, value, and measurement.
• The second step is to develop an information asset profile.
• This profile is populated with assets, a prioritization of assets, and attributes associated with each asset, including owners, custodians, people, explicit security requirements, and technology assets
• The third step is to identify information asset containers.
• This is the range of transports and possible locations where the
information might reside.
• The fourth step is to identify areas of concern.
• At this point, the methodology departs from a data flow, touch, and attribute focus to one where judgments are made by mapping security-related attributes to more business-focused use cases
Formal Risk Analysis Structures:
OCTAVE
• The fifth step is where threat scenarios are identified.
• Threats are broadly (and properly) identified as potential undesirable
events.
• At the sixth step, risks are identified.
• Risk is the possibility of an undesired outcome.
• The seventh step is risk analysis, with the effort placed on qualitative evaluation of the impacts of the risk.
• Mitigation is applied at the eighth step.
• There are three outputs or decisions to be taken at this stage
• First, accept a risk and do nothing, other than document the situation,
potential outcomes, and reasons for accepting the risk.
• The second is to mitigate the risk with whatever control effort is
required.
• The final possible action is to defer a decision, meaning risk is neither
accepted nor mitigated.
Formal Risk Analysis Structures:
FAIR
• FAIR (Factor Analysis of Information Risk) is a technical
standard for risk definition from The Open Group
• FAIR places emphasis on both unambiguous definitions and
the idea that risk and associated attributes are measurable.
• FAIR has a definition of risk as the probable frequency and
probable magnitude of loss.
• With this definition, a clear hierarchy of sub-elements emerges, with
one side of the taxonomy focused on frequency and the other on
magnitude.
• Loss event frequency (LEF) is the result of a threat agent acting on an asset with a resulting loss to the organization.
• This happens with a given frequency called the threat event frequency (TEF), in which a specified time window becomes a probability
• The other side of the risk taxonomy is the probable loss magnitude (PLM), which begins to quantify the impacts, with the emphasis again being on measurable metrics
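• A back-of-the-envelope sketch of FAIR's measurable definition (probable frequency times probable magnitude of loss); the TEF, vulnerability, and loss figures are invented for illustration.

```python
# FAIR-style quantification: risk as probable frequency times probable
# magnitude of loss, rolled up to an annualized loss expectancy.
threat_event_frequency = 4.0      # assumed threat events per year (TEF)
vulnerability = 0.25              # assumed fraction of events causing loss
probable_loss_magnitude = 80_000  # assumed loss per event, dollars (PLM)

loss_event_frequency = threat_event_frequency * vulnerability  # LEF/year
annualized_loss = loss_event_frequency * probable_loss_magnitude

print(f"Loss events per year: {loss_event_frequency}")
print(f"Annualized loss expectancy: ${annualized_loss:,.0f}")
```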
The Phased Application of Security in
an Operational Environment
• Many of the processes used by IT security practitioners
still have validity and can be used in an OT environment.
• The following slides present a phased approach to
introduce modern network security into largely preexisting
legacy industrial networks
Secured Network Infrastructure and Assets
• The physical layout largely defines the operational process, and this phased approach to introducing modern network security begins there.
• We need to analyze and secure the basic network design.
• Most automated process systems or even hierarchical energy
distribution systems have a high degree of correlation between the
network design and the operational design.
• Functions should be segmented into zones (cells) and
that communication crossing the boundaries of those
zones should be secured and controlled through the
concept of conduits
Secured Network Infrastructure and Assets
• Security Between Levels and Zones in the Process
Control Hierarchy Model
Secured Network Infrastructure and Assets
• The network discovery process may require manual
inspection of physical connections, starting from the
highest accessible aggregation point and working all the
way down to the last access layer.
• This discovery activity must include a search for wireless
access points.
• In geographically distributed environments, it may not be possible to trace the network; in such cases, the long-haul connections may not be physical or may be carried by an outside communication provider.
• For those sections of the operational network, explicit
partnering with other entities is required.
Deploying Dedicated Security
Appliances
• The goal is to provide visibility, safety, and security for traffic
within the network.
• Visibility provides an understanding of application and communication
behavior.
• The level of visibility is typically achieved with deep packet inspection (DPI)
technologies such as intrusion detection/prevention systems (IDS/IPS).
• Application-specific protocols are also detectable by IDS/IPS systems.
• Modern DPI implementations can work out-of-band, from a SPAN port or tap.
• Visibility and an understanding of network connectivity uncover the
information necessary to initiate access control activity.
• Safety is a particular benefit as application controls can be managed at
the cell/zone edge through an IDS/IPS.
• Safety and security are closely related linguistically (for example, in
German, the same word, Sicherheit, can be used for both), but for a
security practitioner, security is more commonly associated with
threats.
Higher-Order Policy Convergence and
Network Monitoring
• Finding network professionals with experience performing such
functions or even training those without prior experience is not
difficult.
• Another security practice that adds value to a networked industrial
space is convergence, which is the adoption and integration of
security across operational boundaries.
• This means coordinating security on both the IT and OT sides of the
organization.
• There are advanced enterprise-wide practices related to access
control, threat detection, and many other security mechanisms that
could benefit OT security.
• Two areas where IT and OT environments overlap are remote access and threat detection.
• For remote access, most large industrial organizations backhaul
communication through the IT network.
• This requires the OT security practitioner to coordinate access control policies from the remote initiator, across the Internet-facing security layers, through the core network, to a handoff point at the industrial demarcation, and deeper toward the IoT assets.
Higher-Order Policy Convergence and
Network Monitoring
• Network security monitoring (NSM) is a process of finding
intruders in a network.
• It is achieved by collecting and analyzing indicators and warnings
to prioritize and investigate incidents with the assumption that there
is, in fact, an undesired presence.
• The practice of NSM is not new, yet it is not implemented often or
thoroughly enough even within reasonably mature and large
organizations.
• It is important to note that NSM is inherently a process in which
discovery occurs through the review of evidence and actions that
have already happened.
• NSM is the discipline that will most likely discover the extent of the attack process and, in turn, define the scope for its remediation.
