Industrial Cybersecurity: Case Studies and Best Practices
Reviews
“This author definitely has long term and recent real world experience and is not a typical cybersecurity
academic. New and experienced people will benefit from taking the time to read this.”
“If you are looking for a resource in ICS, this book is very thorough.”
“One of the better books on OT security, the writer shows an in-depth understanding of the various
topics covered. If OT security is your profession, I suggest everyone to read it.”
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by
any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written
permission of the publisher.
ISA
P.O. Box 12277
Research Triangle Park, NC 27709
Acknowledgments
About the Author
Chapter 1 Introduction
About this Book
Terminology
Intended Audience
Chapter 2 What Makes Industrial Cybersecurity Different?
Introduction
What Are the Differences between OT and IT?
Relative Priorities
The Golden Triangle
The Significance of Technology
The Significance of Culture
Consequences
Mitigations
Foundations of Industrial Cybersecurity Management
Frameworks, Regulations, Standards, and Guides
The Difference between Frameworks, Regulations, Standards, and Guides
National Institute of Standards and Technology Cybersecurity Framework
ISA/IEC 62443
NIST Special Publication 800 Series
Others
Summary
Chapter 3 Creating Effective Policy
Introduction
Establish the Governance Infrastructure
Assign Senior Management Representation
Allocate Resources and Assign Clear Ownership
Establish Good Oversight
Reporting Cybersecurity Management System Effectiveness
Tracking and Managing Cybersecurity Risk
Monitoring Changes
Communicate to the Organization
Regular Reports on Lagging and Leading Indicators
Prompt Reporting of Cybersecurity Incidents
Reporting Cybersecurity Observations or Near Misses
Reporting Cybersecurity Incidents to Employees
Monitoring Compliance and Benchmarking
Monitoring Against Policy
Monitoring Against Industry Standards
Summary
Chapter 4 Measure to Manage Risk
Introduction
A Brief Overview of Risk Management
The Importance of Risk Management
Defining Safety Risk
Defining Cybersecurity Risk
Industrial Cybersecurity Risk
As Low as Reasonably Practicable
Security Process Hazard Analysis
Quantifying Risks with Statistics
Monte Carlo Simulation
Bayes’s Theorem
Cybersecurity Safeguards
Using ISA/IEC 62443 Standards to Define Safeguards
Responsibility for Defense-in-Depth Measures
Simplified Assessment and Definition of Safeguards
The Future for Industrial Cybersecurity Risk Management
Summary
Chapter 5 Standardized Design and Vendor Certification
Introduction
Benefits of Standardizing Designs
Essential Elements of a Standardized Design
Secure Network Design
System Hardening
Hardening Wi-Fi Networks
Physical Access Control
Electronic Access Control
Secure Remote Access
Network Monitoring
Cybersecurity Incident Response Plan
Backup and Recovery Procedures
Manual Procedures
System Availability
Specifying System Availability
Designing for System Availability
Other Considerations
Internet Protocol Addressing
Encryption
ISASecure
Summary
Chapter 6 Pitfalls of Project Delivery
Introduction
Secure Senior Project Leadership Support
Embed Cybersecurity Throughout the Project
Feasibility
Engineering
Construction
Commissioning
Start-Up
Handover and Closeout
Embed Cybersecurity Requirements in All Contracts
Raise Awareness Within the Project Team
Implement a Rigorous Oversight Process
Verification of Requirements
Risk and Issue Management
Performance Management
Summary
Chapter 7 What We Can Learn from the Safety Culture
Introduction
The Importance of Awareness
Underestimating Risk
Human Error
Supporting the Right Behaviors
The Safety Culture
The First Line of Defense
Training and Competency
Continuous Evaluation
Summary
Chapter 8 Safeguarding Operational Support
Introduction
Making Cybersecurity a Key Factor
Barrier Model Analysis and Visualization
People Management
Background Checks
Separation of Duties
Joiners, Movers, and Leavers
Manual Procedures
Inventory Management
Creating an Inventory for New Facilities
Creating an Inventory for Existing Facilities
Maintaining and Auditing the Inventory
Incident Response
Suppliers, Vendors, and Subcontractors
Insurance
Summary
Chapter 9 People, Poetry, and Next Steps
Bibliography
Appendix A: Resources
Index
Acknowledgments
To Elizabeth Selvina and Jason Schurmann for all the experiences, doubles,
and pies we shared in La Vertiente, Port of Spain, Reading, and Chinchilla.
To Lauren Goodwin, for your support, and the Clase Azul Reposado.
To Ken Nguyen, for the opportunity to work on one of the most exciting and
fulfilling projects of my career, and to all my friends on the digital team for
making it fun along the way.
To Steve Huffman, Steve Pflantz, Leo Staples, and Mike Marlowe, for your
friendship, mentorship, support, and chicken pot pie.
To Blair Traynor, Nicky Jones, John Flynn, and Paul Holland for your
invaluable comments on drafts of this book.
To Liegh Elrod, for your never-ending support and never-failing belief that I
would one day finish this book, and to all the ISA publications team for all your
hard work turning the material into a professional product.
To the ISA staff, for working together with the member community to create a
better world through automation.
To Bill Furlow, for painstakingly reviewing and fixing my writing, while
simultaneously being an expert mixologist, part-time bon vivant, and full-time
BoJack Horseman fan.
Finally, to David Boyle, my friend and colleague for too many years to count.
A true friend accepts who you are, but also helps you become who you should
be. Thank you for helping me be better.
About the Author
With no malicious intent, Bob Thomas created the first computer worm, called
Creeper. That was quickly followed by Ray Tomlinson’s Reaper, designed to
find Creeper and shut it down; in essence, it was the first antivirus program. This
was the early 1970s, 15 years before Windows 1.0 was released and 19 years
before Tim Berners-Lee coined the term World Wide Web.
Windows would eventually come to dominate the operating system market and,
as a result, be the primary target for malicious attacks. During this period,
control system vendors began moving their software from operating systems
such as Unix to Windows. This move allowed them to benefit from
standardization and improved time to market.
By 2000, the benefits of integrating control systems with the enterprise were
being realized, and the ISA-95 standard (“Enterprise-Control System
Integration”) articulated these benefits with a clear definition of how to achieve
them. At this point, Google had been around for two years, Amazon was only six
years old, and e-commerce in the United States made up just 1% of retail sales.
That same year, the ILOVEYOU worm infected an estimated 50 million
computers, causing more than $5.5 billion in damage.
From the early 2000s, when Vitek Boden used a stolen laptop and a radio to
wreak havoc at a sewage treatment plant in Queensland, Australia; through
2010, when the Stuxnet malware disrupted production at an Iranian nuclear
enrichment facility; to 2018, when attackers gained access to safety systems and
shut down a Middle East refinery, the threats of malware and cyberattacks have
increased in lockstep with advances in industrial automation.
Some industry sectors have progressed further than others. Some sectors, such as
oil and gas, have invested heavily in industrial cybersecurity. Other sectors, such
as water and wastewater, remain behind the curve on addressing their
cybersecurity. In all cases, asset owners/operators have largely developed their
own solutions and systems in isolation. This results in similar approaches with
varying degrees of success.
Addressing industrial cybersecurity risks includes several key elements, but the
foundation is good system design. This book will provide guidance to define
secure additions and modifications for brownfield sites as well as secure-by-
design solutions for greenfield sites.
Poor project delivery can negate some or all of the benefits of secure designs.
This can take the form of poor execution or oversight. It might entail the
introduction of new vulnerabilities that are not properly identified or addressed.
It can even be seen in poor practices during the development or commissioning
of a system. This book will provide some guidance on effective oversight
methods.
The need to raise and maintain awareness among personnel is not unique to
industrial cybersecurity. It applies to everyone on staff, from senior management,
who provide the funding and own the risk, to the frontline workers, who are most
likely to be involved in either causing or avoiding an incident. This book will
offer tips on raising awareness of the risks and strategies to manage them.
Finally, this book will consider the role operational support plays in industrial
cybersecurity. That includes day-to-day activities such as operating system
patching and system backups, as well as preparation for and response to
cybersecurity incidents.
There are three core elements of cybersecurity: people, process, and technology.
Industrial cybersecurity is distinct from its counterpart in the IT world,
encompassing not just technology, but also the broader elements related to
people and processes.
With this in mind, the aim of this book is to provide an understanding of the
objectives, and how to achieve them, without being prescriptive about technical details.
Terminology
Cybersecurity, like many technical subjects, comes with its own lexicon and,
with that, many confusing and interchangeable terms.
There are ongoing debates about the most appropriate and inclusive terms for the
subject matter of this book. I have used the term industrial cybersecurity, but I
acknowledge that some sectors or specialisms consider themselves excluded. For
example, many building automation system providers and users do not typically
consider themselves as “industrial.”
Operational technology, or OT, is another term that has been created to attempt
to distinguish industrial environments from information technology, or IT,
environments. Even this term generates discussion, and what is included or
excluded often depends on interpretation. Some even suggest that there is no
distinction; there is only technology.
Even the definition of automation systems is hotly debated. The ISA/IEC 62443
Series of Standards, which will be referenced throughout this book, is titled
Security for Industrial Automation and Control Systems. The abbreviation for
industrial automation and control systems, IACSs, is frequently used
interchangeably with ICSs (industrial control systems). Cyber-physical systems
is another term that is gaining traction.
The series title brings up a final terminology question: Should the term used be
“cybersecurity” or simply “security”? Some believe that the cyber distinction
leads to incorrect assumptions such as this: ownership of the risk lies with an
organization’s chief information security officer (CISO). Others have accepted
that the term cybersecurity has been adopted sufficiently and that changing it
would lead to further confusion.
Throughout this book I will use many of these terms, sometimes
interchangeably. However, I believe the ideas in this book apply equally well to
any system or electronic and computing parts used to monitor and control
physical processes, whether they be in an industrial facility, a commercial
building, a vehicle, or anywhere else.
Intended Audience
This book is intended for anyone involved in industrial automation and control
systems cybersecurity, including operators, technicians, engineers, and managers
within asset-owner and asset-operator organizations; product vendors; system
integrators; and consultants.
____________
1 George Dalakov, “The First Computer Virus of Bob Thomas (Complete History),” accessed July 25,
2021, https://history-computer.com/inventions/the-first-computer-virus-of-bob-thomas-complete-
history/.
2 RSA is an acronym made up of the first letters of the last names of the three company co-founders:
Ron Rivest, Adi Shamir, and Leonard Adleman.
3 RSAC Contributor, “The Future of Companies and Cybersecurity Spending,” accessed June 21, 2021,
https://www.rsaconference.com/library/Blog/the-future-of-companies-and-cybersecurity-spending.
4 Gartner, “Gartner Forecasts Worldwide Security and Risk Management Spending to Exceed $150
Billion in 2021,” May 17, 2021, accessed June 21, 2021,
https://www.gartner.com/en/newsroom/press-releases/2021-05-17-gartner-forecasts-worldwide-
security-and-risk-managem.
5 Finances Online, “119 Impressive Cybersecurity Statistics: 2020/2021 Data & Market Analysis,”
accessed June 21, 2021, https://financesonline.com/cybersecurity-statistics/.
6 RiskBased Security, “2020 Year End Report: Data Breach QuickView,” accessed June 21, 2021,
https://pages.riskbasedsecurity.com/en/en/2020-yearend-data-breach-quickview-report.
7 Jack Evans, “Someone Tried to Poison Oldsmar’s Water Supply during Hack, Sheriff Says,” Tampa
Bay Times, February 9, 2021, accessed June 21, 2021,
https://www.tampabay.com/news/pinellas/2021/02/08/someone-tried-to-poison-oldsmars-water-
supply-during-hack-sheriff-says/.
8 Chris Young, “A 22-Year-Old Logged in and Compromised Kansas’s Water System Remotely,”
Interesting Engineering website, April 6, 2021, accessed June 21, 2021,
https://interestingengineering.com/a-22-year-old-logged-in-and-compromised-kansas-water-system-
remotely.
9 Ellen Nakashima, Yeganeh Torbati, and Will Englund, “Ransomware Attack Leads to Shutdown of
Major US Pipeline System,” Washington Post, May 8, 2021, accessed June 21, 2021,
https://www.washingtonpost.com/business/2021/05/08/cyber-attack-colonial-pipeline/.
10 Jacob Bunge, “JBS Paid $11 Million to Resolve Ransomware Attack,” Wall Street Journal, June 9,
2021, accessed June 21, 2021, https://www.wsj.com/articles/jbs-paid-11-million-to-resolve-
ransomware-attack-11623280781.
2
What Makes Industrial Cybersecurity Different?
Introduction
Information technology (IT) cybersecurity is concerned with information
security, personal information, and financial transactions. Operational
technology (OT) cybersecurity is concerned with operational system availability
and safety.
Availability is also a factor for IT systems, which creates some confusion about
its relative importance. To address possible misunderstandings, a fourth element,
safety, began appearing in the OT priorities, extending the C-I-A triad to C-I-A/S.
The C-I-A/S triad is a helpful tool. However, more clarity is needed to improve
the understanding of what sets industrial cybersecurity apart.
Table 2-1 summarizes the key differences between IT and OT environments for
these three elements.
Table 2-1. Key differences between IT and OT environments.
People
• IT – Primary focus is the service provision; underlying technology is the majority of the service; control and management of data; many skilled professionals.
• OT – Primary focus is safety, then production; underlying technology is a means to an end; control and management of physical processes; limited pool of skilled professionals.
To date, the focus on IT/OT differences has been on the technology element.
Many books and presentations have discussed similar lists of differences, and
these differences continue to be important.
The term safety culture was first introduced in 1986 after the Chernobyl nuclear
accident. In this incident, the core of reactor number 4 at the Chernobyl Nuclear
Power Plant in Pripyat, Ukraine, ruptured in a steam explosion, releasing 5% of
the core’s radioactive material into the environment.13 The Chernobyl accident is
one of only two nuclear accidents rated at the maximum severity on the
International Nuclear Event Scale.14 The Chernobyl disaster killed 30 people
within weeks and caused an estimated 6,500 cancer cases. It also required the
evacuation of 350,000 people. The accident occurred during a safety test that
involved simulating a power outage. It resulted from a combination of a flaw in
the reactor design and inadequately trained operators.
The UK Health and Safety Commission defines safety culture as follows:
The product of individual and group values, attitudes, perceptions, competencies, and patterns of
behavior that determine the commitment to, and the style and proficiency of, an organization’s health
and safety management.
As already noted, the C-I-A triad greatly simplifies the relative concerns for IT
and OT systems; a more realistic view considers a broader range of potential consequences.
The cybersecurity attacks on OT systems to date have demonstrated that the first
three layers can be compromised. This means that organizations depend on
additional layers of protection to save them from catastrophe:
• Basic automation layer – In attacks on three energy distribution
companies in Ukraine, hackers remotely seized control of the
supervisory control and data acquisition (SCADA) system, switching
substations off. Up to 73 MWh of electricity (or 0.015% of daily
electricity consumption in Ukraine) was interrupted, leaving
customers without power for up to six hours.
• Plant personnel intervention layer – In the Stuxnet attack, malware
was able to push the centrifuges outside their normal operating envelope,
while reporting to operators that conditions were normal. The resulting
damage to the centrifuges set back the Iranian uranium enrichment
program several years.
• Safety system layer – In the TRISIS attack, bad actors attempted to
replace the code in a safety controller. The attempt was thwarted when
the safety system, operating as designed, failed safe, shutting in the plant.
Although there was no loss of primary containment or harm to
individuals, a plant shut-in is highly undesirable.
The risk assessment of the facility or process is based on the assumption that all
layers of protection are in place and will operate on demand. This means
organizations must take seriously the threat of a cybersecurity incident on these
systems.
Consider the example of a gas turbine control system. Gas turbines are used
extensively in industry for critical processes, such as power generation, gas
compression, and water injection. A gas turbine is shown in Figure 2-4. A typical
gas turbine may cost $6 million (£4.3 million), weigh 20,000 lb (9000 kg), and
operate at up to 10,000 psi (69,000 kPa).
Figure 2-4. Gas turbine used for power generation, gas compression, and water injection.
A control system is required to safely operate the turbine and shut it down in the
event of a serious situation. Figure 2-5 shows a simplified block diagram of this
system.
Figure 2-5. Simplified gas turbine control system showing potential cybersecurity risks.
A programmable logic controller (PLC) is the basis for the control functions that
provide the basic automation layer; a connected human-machine interface
(HMI) enables operators to observe the system status and make set-point
changes (the plant personnel intervention layer).
The safety functions that form the safety system layer may include a safety
controller that focuses on turbine protection, and a fire and gas controller that
interfaces with the safety controller to shut down the turbine if required. These
functions operate independently of the control functions and react immediately,
and automatically, to contain or mitigate a hazard.
To provide warranty and support for the end user, the gas turbine vendor collects
process data from the system. This data typically travels over a secure
connection between the vendor’s operations center and the facility. This
connection enables the vendor to analyze turbine performance and determine
maintenance actions.
The gas turbine control system is vulnerable to the same incidents discussed
earlier:
Consider the example of the gas turbine control system. External access is
limited to a secure connection with the vendor’s operations center. There is no
particular concern regarding sensitive data exfiltration; however, note the
following:
• The PLC, HMI, safety controller, and fire and gas controller may be
accessed by anyone in the facility. This flaw enables an unauthorized
individual to reprogram these systems or deploy malware to the HMI.
Although the specialist skills and knowledge to work on these systems
can be hard to find, there are several examples of hackers with no
industrial control systems experience identifying and exploiting
vulnerabilities in those systems. In one example, two hackers with no
prior product experience identified three previously unknown
vulnerabilities in a major automation vendor’s product and presented
them at RootedCON 2014.23,24
• OT facilities have a variety of physical controls, such as lockable
cabinets and rooms, and strict procedures for accessing and working in
these areas. Nevertheless, personnel may bypass some of these controls,
for example, by leaving cabinets unlocked.
• The turbine control system network may be isolated from the wider
network (aside from the vendor connection). This means automated
monitoring and updates of Windows equipment may need to be done
manually.
• The secure connection provides some protection, but the effectiveness of
this control depends on the awareness, training, policies, procedures, and
physical security behaviors of the vendor’s personnel.
Based on these considerations, the focus on mitigations for the gas turbine
control system is distinct from that for mitigations for an IT system in several
ways:
• Physical and electronic security – Limiting physical and electronic
access to the control system components to authorized individuals only.
This is accomplished by such actions as locking doors and cabinets and
protecting usernames and passwords.
• Strict enforcement of procedures – For instance, limiting, or banning,
the use of removable media and maintaining security updates, antivirus
software, and signatures (or using application control) on Windows
equipment.
As noted earlier, the focus is more on the people and the processes than the
technology. The misuse or insecure use of technology in the OT environment can
create significant vulnerabilities. A safety culture with well-trained people
following strict processes and procedures is essential in the OT environment.
Figure 2-6. Governance is the foundation for effective industrial cybersecurity management.
Chapter 3, “Creating Effective Policy,” will address this subject in more detail.
The book Mission Critical Operations Primer26 provides more detail on the
primary function of regulatory and standards bodies. Although the book focuses
on US regulatory and standards bodies, similar organizations in other countries
perform the same function.
The CSF is structured into five core functions, each of which includes categories
and subcategories. This format enables those unfamiliar with the requirements of
cybersecurity management to navigate the subject and drill into detail as needed.
The CSF overview is illustrated in Figure 2-7. It shows the five core functions,
Identify, Protect, Detect, Respond, and Recover, with their respective categories
(e.g., Asset Management, Identity Management, and Access Control).
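To make the hierarchy concrete, the following minimal sketch in Python represents the five functions mapped to example categories. The category lists are abbreviated and illustrative, not the complete set defined in the CSF:

```python
# Abbreviated sketch of the CSF hierarchy: core functions mapped to example
# categories. The category lists are illustrative, not the complete set.
CSF = {
    "Identify": ["Asset Management", "Risk Assessment"],
    "Protect": ["Identity Management and Access Control", "Data Security"],
    "Detect": ["Anomalies and Events", "Security Continuous Monitoring"],
    "Respond": ["Response Planning", "Mitigation"],
    "Recover": ["Recovery Planning", "Improvements"],
}

for function, categories in CSF.items():
    print(f"{function}: {', '.join(categories)}")
```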
As noted previously, as a framework, the CSF does not provide any detailed
guidance. Instead, the document refers to standards and guides. This format
helps readers who are unfamiliar with the standards and guides to navigate the
documents.28
For industrial cybersecurity, the CSF refers to the ISA/IEC 62443 Series of
Standards and to NIST 800 series guides such as SP 800-82 for its specific
guidance. These sources focus specifically on industrial cybersecurity.
ISA/IEC 62443
The ISA/IEC 62443 Series of Standards addresses the security of industrial
automation and control systems (IACSs) throughout their life cycle. These
standards and technical reports were initially developed for the industrial process
sector but have since been applied to the building automation, medical device,
and transportation sectors. Figure 2-8 provides an overview of the family of
standards.
There are four tiers in the series of standards. The first two focus on people and
processes. The last two focus on technology (systems and components). At the
time of writing, some documents are still in development. Key documents in the
family include the following:
• Part 2-1 – Establishing an IACS security program. This helps
organizations plan and implement a cybersecurity management system
focused on industrial cybersecurity.
• Part 3-2 – Security risk assessment, system partitioning, and security
levels. This describes the requirements for addressing the cybersecurity
risks in an IACS, including the use of zones and conduits as well as
security levels. These are key aspects of industrial cybersecurity design.
• Part 3-3 – System security requirements and security levels. This
document describes the requirements for an IACS system based on a
specified security level. It helps organizations quantify their requirements
in universally understood terms.
• Part 4-1 – Product security development life-cycle requirements. This
describes the requirements for a product developer’s security
development lifecycle.
• Part 4-2 – Technical security requirements for IACS components. This
addresses the requirements for IACS components based on the required
security level. Components include devices and applications.
NIST Special Publication 800 Series
As noted previously, these documents are guides, not standards. They do not
benefit from the consensus and rigor of standards, such as the ISA/IEC 62443
series, and should be used appropriately.
Others
There are many other frameworks, standards, guides, and regulations that relate
to cybersecurity. These resources may be required for an organization’s
industrial cybersecurity management system or may need to be understood when
developing a system that interacts with an IT cybersecurity management system.
Table 2-2 provides some examples. Several are US-specific but typically have
equivalents in other countries.
• Regulation: Title 10 CFR – Energy. Nuclear Regulatory Commission (NRC) regulation for the US nuclear industry.
• Regulation: Critical Infrastructure Protection (CIP). North American Electric Reliability Corporation (NERC) regulation for the North American electricity generation and distribution industries, including:
– CIP-002-5.1a – Cyber Security – BES Cyber System Categorization
– CIP-003-6 – Cyber Security – Security Management Controls
– CIP-004-6 – Cyber Security – Personnel & Training
– CIP-005-5 – Cyber Security – Electronic Security Perimeter(s)
– CIP-006-6 – Cyber Security – Physical Security of BES Cyber Systems
– CIP-007-6 – Cyber Security – System Security Management
– CIP-008-5 – Cyber Security – Incident Reporting and Response Planning
– CIP-009-6 – Cyber Security – Recovery Plans for BES Cyber Systems
– CIP-010-2 – Cyber Security – Configuration Change Management and Vulnerability Assessments
– CIP-011-2 – Cyber Security – Information Protection
– CIP-014-2 – Physical Security
• Regulation: Title 21 CFR Part 11 – Electronic Records; Electronic Signatures – Scope and Application. US Food and Drug Administration (FDA) regulation on businesses producing food, tobacco products, medications, biopharmaceuticals, blood transfusions, medical devices, electromagnetic radiation emitting devices, cosmetics, and animal feed and veterinary products.
• Regulation: 6 CFR Part 27 – Chemical Facility Anti-Terrorism Standards (CFATS). US Department of Homeland Security (DHS) regulation for chemical facilities in the United States.
• Standard: IEC 61511:2016 – Functional Safety – Safety Instrumented Systems for the Process Industry Sector. International standard that defines practices in the engineering of systems that ensure the safety of an industrial process through the use of instrumentation. It includes an explicit requirement to conduct a security risk assessment (IEC 61511, Part 1, Clause 8.2.4).
• Standard: ISO/IEC 27001:2013 – Information Technology – Security Techniques – Information Security Management Systems – Requirements. International standard for information security. Although specific to IT systems, there are some overlaps that may need to be considered when developing an industrial cybersecurity management system.
• Guide: Center for Internet Security (CIS) Critical Security Controls. Simple guide to the top 20 security controls that should be implemented in IT and OT systems.
• Framework: COBIT 5 – Control Objectives for Information and Related Technology. Developed by the Information Systems Audit and Control Association (ISACA) to define a set of generic processes for the management of IT.
Summary
The aim of this chapter was to differentiate OT and IT cybersecurity and show
why these differences are important to an effective cybersecurity management
system.
For some time, the difference between OT and IT was explained using the C-I-A
triad. C-I-A shows the priority for IT cybersecurity is confidentiality (C),
whereas the priority for OT cybersecurity is availability (A). This explanation is
too simplistic and requires elaboration to provide a more complete picture.
____________
11 Jon Hemmerdinger, “Boeing Asked FAA in 2017 to Strip MCAS from Max Training Report,”
FlightGlobal website, October 18, 2019, accessed June 21, 2021,
https://www.flightglobal.com/airframers/boeing-asked-faa-in-2017-to-strip-mcas-from-max-training-
report/134896.article.
12 S. Lucchini, “I Thought I Had the Right Roadmap for Implementing a Safety System!,” (white paper
presented at the Texas A&M Engineering Experiment Station, 20th Annual International Symposium,
Mary Kay O’Connor Process Safety Center, Texas A&M University, 2017).
13 World Nuclear Association, “Chernobyl Accident 1986,” accessed June 21, 2021, https://www.world-
nuclear.org/information-library/safety-and-security/safety-of-plants/chernobyl-accident.aspx.
14 The other occurred at the Fukushima Daiichi Nuclear Power Plant in Ōkuma, Japan, and was caused
by an earthquake and subsequent tsunami.
15 The vendor, Fazio Mechanical Services, provided heating, ventilation, and air conditioning (HVAC)
services to Target. It was the subject of a phishing attack that resulted in the exfiltration of credentials
for Target’s billing system. The attackers used this system to gain access to the rest of Target’s
network. Because HVAC was indirectly involved, many mistakenly believe that this attack was the
result of ingress via a less-secure HVAC network. Brian Krebs, “Target Hackers Broke in Via HVAC
Company,” KrebsOnSecurity blog, February 5, 2014, accessed June 21, 2021,
https://krebsonsecurity.com/2014/02/target-hackers-broke-in-via-hvac-company/.
16 Kevin McCoy, “Target to Pay $18.5M for 2013 Data Breach that Affected 41 Million Consumers,”
USA Today, updated May 23, 2017, accessed June 21, 2021,
https://www.usatoday.com/story/money/2017/05/23/target-pay-185m-2013-data-breach-affected-
consumers/102063932/.
17 ”2017 Equifax Data Breach,” Wikipedia, accessed June 21, 2021,
https://en.wikipedia.org/wiki/2017_Equifax_data_breach.
18 ”Order Granting Final Approval of Settlement, Certifying Settlement Class, and Awarding Attorney’s
Fees, Expenses, and Service Awards,” Equifax Data Breach Settlement, accessed June 21, 2021,
https://www.equifaxbreachsettlement.com/admin/services/connectedapps.cms.extensions/1.0.0.0/927686a8-
4491-4976-bc7b-83cccaa34de0_1033_EFX_Final_Approval_Order_(1.13.2020).pdf.
19 Mark Thompson, “Iranian Cyber Attack on New York Dam Shows Future of War,” Time, March 24,
2016, accessed June 21, 2021, https://time.com/4270728/iran-cyber-attack-dam-fbi/.
20 ”FireEye Responds to Wave of Destructive Cyber Attacks in Gulf Region,” FireEye blog, December
1, 2016, accessed June 21, 2021, https://www.fireeye.com/blog/threat-
research/2016/11/fireeye_respondsto.html.
21 Thomas Brewster, “Warnings as Destructive ‘Shamoon’ Cyber Attacks Hit Middle East Energy
Industry,” Forbes, December 13, 2018, accessed June 21, 2021,
https://www.forbes.com/sites/thomasbrewster/2018/12/13/warnings-as-destructive-shamoon-cyber-
attacks-hit-middle-east-energy-industry/#53fe71893e0f.
22 Joby Warrick and Ellen Nakashima, “Officials: Israel Linked to a Disruptive Cyberattack on Iranian
Port Facility,” Washington Post, May 18, 2020, accessed June 21, 2021,
https://www.washingtonpost.com/national-security/officials-israel-linked-to-a-disruptive-cyberattack-
on-iranian-port-facility/2020/05/18/9d1da866-9942-11ea-89fd-28fb313d1886_story.html.
23 Brian Prince, “Researchers Detail Critical Vulnerabilities in SCADA Product,” Security Week, March
13, 2014, accessed June 21, 2021, https://www.securityweek.com/researchers-detail-critical-
vulnerabilities-scada-product.
24 Juan Vazquez and Julián Vilas, “A patadas con mi SCADA! [Rooted CON 2014],” YouTube,
accessed June 21, 2021, https://www.youtube.com/watch?
v=oEwxm8EwtYA&list=PLUOjNfYgonUsrFhtONP7a18451psKNv4I&index=23.
25 Dr. Saul McLeod, “Maslow’s Hierarchy of Needs,” updated December 29, 2020, accessed June 21,
2021, https://www.simplypsychology.org/maslow.html.
26 Steve Mustard, Mission Critical Operations Primer (Research Triangle Park, NC: ISA [International
Society of Automation], 2018).
27 ”S.I. No. 360/2018 – European Union (Measures for a High Common Level of Security of Network
and Information Systems) Regulations 2018,” electronic Irish Statute Book, accessed June 21, 2021,
http://www.irishstatutebook.ie/eli/2018/si/360/made/en.
28 NIST, “Components of the Cybersecurity Framework,” presentation, July 2018,
https://www.nist.gov/cyberframework/online-learning/components-framework.
29 SP 800-82 Rev. 2, Guide to Industrial Control Systems (ICS) Security (Gaithersburg, MD: NIST
[National Institute of Standards and Technology], 2015), accessed June 21, 2021,
https://csrc.nist.gov/publications/detail/sp/800-82/rev-2/final.
3
Creating Effective Policy
Introduction
As mentioned in Chapter 2, “What Makes Industrial Cybersecurity Different?,”
governance is the foundation of a cybersecurity management system. Without
effective governance, policies and procedures may be overlooked or go
unenforced. Training may be ineffective if it lacks the weight of the
organization’s leadership, and investment in technical controls may be poorly
managed, leading to disappointing results. Despite these clear shortcomings,
many organizations implement elements of a cybersecurity management system
without good governance in place.
In simple terms, IT is responsible for the systems and networks in the office
environment (known colloquially as “the carpeted area”). OT is responsible for
the systems and networks in the production or manufacturing environment (“the
noncarpeted area”). The physical division between the two is referred to as the
demilitarized zone (DMZ). The DMZ represents an interface between the two
environments, designed to secure communications between them.
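As a minimal illustration of the DMZ principle, the sketch below permits a network flow only if it stays within one zone or terminates in the DMZ, so no traffic passes directly between the carpeted and noncarpeted areas. The zone and host names are hypothetical, for illustration only:

```python
# Minimal sketch of DMZ-mediated flow checking. Zone and host names are
# hypothetical, for illustration only.
ZONES = {
    "it": {"erp-server", "office-pc"},            # the "carpeted area"
    "dmz": {"historian-replica", "patch-relay"},  # interface between IT and OT
    "ot": {"scada-server", "plc-gateway"},        # the "noncarpeted area"
}

def zone_of(host: str) -> str:
    """Return the zone a host belongs to."""
    for zone, hosts in ZONES.items():
        if host in hosts:
            return zone
    raise ValueError(f"unknown host: {host}")

def flow_allowed(src: str, dst: str) -> bool:
    """Permit a flow only if it stays in one zone or terminates in the DMZ."""
    src_zone, dst_zone = zone_of(src), zone_of(dst)
    return src_zone == dst_zone or "dmz" in (src_zone, dst_zone)

assert flow_allowed("scada-server", "historian-replica")  # OT to DMZ: allowed
assert flow_allowed("erp-server", "historian-replica")    # IT to DMZ: allowed
assert not flow_allowed("office-pc", "plc-gateway")       # IT to OT directly: blocked
```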
In health and safety management, lagging and leading indicators are critically
important. Figure 3-5 shows the relationship between leading and lagging
indicators for health and safety in the form of the accident triangle, also known
as Heinrich’s triangle or Bird’s triangle. The accident triangle is widely used in
industries with health and safety risks. It is often adapted to include regional or
organization-specific terms (e.g., HiPo or High Potential Accident instead of
Serious Accidents) or to provide further categorization (e.g., showing Days
Away from Work Cases and Recordable Injuries as types of Minor Accidents).
Figure 3-5. The accident triangle with lagging and leading indicators.
The accident triangle clearly shows the relationship between the earlier, more
minor incidents and the later, more serious ones. This relationship can also be
expressed in terms of leading and lagging indicators. In the accident triangle in
Figure 3-5, unsafe acts or conditions and near misses are leading indicators of
health and safety issues. As these numbers increase, so does the likelihood of
more serious incidents. These more serious incidents are the lagging indicators.
To minimize serious health and safety incidents, organizations monitor leading
indicators such as unsafe acts or near misses.
Monitoring near misses through safety observations (and encouraging employees
to report them), coupled with regular audits to check that employees are
following their training and procedures, creates leading indicators that can be
adjusted. For example, employees can be prompted to complete assigned safety
training. Observations and audits will show if this training must be adjusted. The
goal is to improve safety as reflected in lagging indicators.
Most cybersecurity metrics used today are lagging indicators. They are
outcomes, such as the number of cybersecurity incidents experienced, or the
time to detect, identify, contain, or resolve an incident. To manage cybersecurity,
an organization must identify leading indicators that influence the lagging
indicators. A sample security triangle, based on the safety triangle, is shown in
Figure 3-6.
Figure 3-6. A simple security triangle with lagging and leading indicators.
1. Impact of regulatory change and scrutiny on operational resilience, products, and services – 6.38 (prior year: 6.24, ranked 3)
2. Economic conditions impacting growth – 6.34 (prior year: 5.93, ranked 11)
3. Succession challenges; ability to attract and retain top talent – 6.27 (prior year: 6.34, ranked 2)
4. Ability to compete with “born digital” and other competitors – 6.23 (prior year: 6.35, ranked 1)
5. Resistance to change operations – 6.15 (prior year: 6.17, ranked 5)
6. Cyber threats – 6.09 (prior year: 6.18, ranked 4)
7. Privacy/identity management and information security – 6.06 (prior year: 6.13, ranked 7)
8. Organization’s culture may not sufficiently encourage timely identification and escalation of risk issues – 5.83 (prior year: 5.99, ranked 9)
9. Sustaining customer loyalty and retention – 5.82 (prior year: 5.95, ranked 10)
10. Adoption of digital technologies may require new skills or significant efforts to upskill/reskill existing employees – 5.71 (new in 2020)
*Scores are based on a 10-point scale, with “10” representing that the risk issue will have an extensive impact on the organization. Each entry shows the current score, with the prior year’s score and rank in parentheses.
While this survey is encouraging, it indicates that, for most organizations, the
focus will be on information security and privacy issues. Industrial cybersecurity
is still not well understood by boards, although production availability, safety,
and environmental harm are. Chapter 4, “Measure to Manage Risk,” will discuss
methods to leverage this awareness when presenting cybersecurity risk to senior
management.
Monitoring Changes
A key role for the governance board is to monitor the changing circumstances
that impact the organization’s risk and, potentially, the cybersecurity
management system.
• Review and approval of operational support decisions relating to
industrial cybersecurity – Often decisions made at the operational level
have a major impact on an organization’s cybersecurity preparedness
(e.g., the decision whether to purchase spare parts, such as workstations,
PLC cards, and network devices).
• Review of changes to risk assessment – The cybersecurity risk
assessment is subject to constant change. This is in response to external
threats, new vulnerabilities, and organizational changes.
• Review of changes to the cybersecurity management system – In
addition to organizational changes resulting from operational support
decisions or risk assessments, it may be necessary to make changes to the
cybersecurity management system. For example, audits may highlight
gaps that must be closed, incident investigations may identify
improvements, or benchmarking may identify new best practices.
These considerations will be reviewed in more detail later in this book. For now,
it is enough to highlight that decision-making at the governance board level
leads to greater consistency across the organization and helps manage everyone’s
expectations. These three change monitoring items should be included in a
standing agenda for the governance board.
All information should be shared within the organization and with any relevant
external parties, such as the Department of Homeland Security in the United
States and the National Cyber Security Centre (NCSC) in the United Kingdom.
Sharing the information, good or bad, helps to make all employees feel they are
part of the results. Sharing with relevant external parties helps build a clearer
picture of the situation that all critical infrastructure operators face. Reports
should be issued regularly (in line with health and safety reporting) to reinforce
that the organization takes its cybersecurity responsibilities seriously.
• Pitfall: Sharing only positive reports or messages, or downplaying bad news to senior management.
Consequences: (1) Funding is cut because it appears not to be needed. (2) There is significant surprise and disappointment when an incident occurs.
Mitigation: Share information openly and honestly.
• Pitfall: Overselling the value of technical controls to the organization.
Consequences: (1) There is a perception that everything is under control; therefore, personnel do not need to be vigilant.
Mitigation: Ensure everyone is aware that people and process are always the weakest links, regardless of technical controls.
• Pitfall: Declaring the details of cybersecurity incidents to be classified, and thus limiting the sharing of these incidents within the organization.
Consequences: (1) Lessons are not learned by those who are most likely to be involved in similar incidents. (2) Lack of incident reporting creates a sense that cybersecurity is not a major problem.
Mitigation: Censor and redact details as necessary, but ensure that all incidents, including near misses, are promptly reported. Reporting improves awareness and encourages improvement, and prompt reporting inspires a sense of urgency in addressing issues.
• Pitfall: IT departments acting independently of OT departments when engaging on cybersecurity.
Consequences: (1) Incomplete view of cybersecurity risk in the organization or with regulators and other authorities. (2) Lack of investment as a result of an incomplete view of risk.
Mitigation: The OT department defines and manages risk; the IT department supports the OT department with services to help manage that risk.
Collecting the data to track these indicators can involve significant cost and
effort. In some cases, an organization may lack access to the data needed. There
is also an infrastructure requirement to achieve this data collection. Once the
data is collected, it must be analyzed and presented in a suitable format.
Cybersecurity vendors offer solutions, including reporting dashboards, and
organizations with the capability can develop their own in-house reporting
solutions. The key considerations for cybersecurity metrics and reporting are
the same as for any other type of reporting.
Like the safety triangle, the security triangle, as shown in Figure 3-7, clearly
shows the relationship between the leading and lagging indicators. The focus
must be on managing the leading indicators. A more detailed security dashboard
will be required to generate the data in the security triangle, but the security
triangle provides a snapshot of cybersecurity performance.
Figure 3-7. Security triangle showing real data.
For cybersecurity observations, the safety observation card in Figure 3-9 could
be modified as follows:
• Replace “Safe Behavior/Unsafe Behavior/Unsafe Condition” with
“Secure Behavior/Insecure Behavior/Insecure Condition.”
• Replace the list under “Hierarchy of Controls” with cybersecurity
controls employed in the organization, for example, “Antivirus
Protection/Operating System Patching/Access Control/Backup and
Recovery/Removable Media.”
Such reports are issued quickly upon conclusion of incident investigations. This
minimizes the chance of recurrence.
• Concern: Publicizing new controls designed to avoid future incidents. Response: Like vulnerabilities, cybersecurity controls are well understood, and it is unlikely that an organization has a proprietary control that it cannot risk being disclosed.
• Concern: Regulatory punishments. Response: Regulatory reporting cannot and should not be circumvented.
• Concern: Public disclosure could harm reputation. Response: The impact on reputation is much greater if disclosure is withheld.
The arguments in Table 3-3 could apply to health and safety incidents. However,
the advent of “safety culture” has demonstrated that the benefits of open
reporting are greater than any potential downside.
• Security Policy
• Risk Assessment and Management
• Organizational Security Policies and Procedures
• Physical, Environmental, Personnel, and Vendor Security
• Continuous Improvement and Maturity Management
The radar chart shows how the organization is performing against the policy,
comparing the percentage of compliance for each section against a target.
The target for each section will be determined by the governance body. It may be
based on comparison with other organizations or other data.
To determine the percentage of compliance, the organization tracks whether it is
meeting the individual requirements of the section. Table 3-4 shows a simplified
example of a section with four requirements. For each requirement, the response
options are:
• F – Fully compliant
• P – Partially compliant
• N – Noncompliant
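A minimal sketch of the calculation follows. The F = 1.0 / P = 0.5 / N = 0.0 weighting is an assumption for illustration; an organization would define its own scoring rules:

```python
# Minimal sketch of section compliance scoring. The F/P/N weighting below is
# an assumption for illustration; an organization would define its own rules.
WEIGHTS = {"F": 1.0, "P": 0.5, "N": 0.0}

def section_compliance(responses):
    """Percentage compliance for one policy section's requirements."""
    return 100.0 * sum(WEIGHTS[r] for r in responses) / len(responses)

# A hypothetical section with four requirements, as in Table 3-4:
# two fully compliant, one partially compliant, one noncompliant.
print(f"{section_compliance(['F', 'F', 'P', 'N']):.1f}%")  # 62.5%
```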
Figure 3-10 shows a hypothetical organization’s compliance against the five core
functions of the NIST CSF. This can be calculated in a similar manner to the
organization’s own policy, described previously.
Figure 3-10. Hypothetical compliance assessment against NIST CSF core functions.
• Level 1 – No Security Awareness Program: No attempt is made to train and educate the organization. People do not know or understand organizational policies and procedures, do not realize they are a target, and are highly vulnerable to most human-based attacks.
• Level 2 – Compliance Focused: The awareness program is designed primarily to meet specific compliance or audit requirements. Training is limited to an annual or ad hoc basis, such as an on-site presentation once a year or quarterly newsletters. There is no attempt to change behavior. Employees are unsure of organizational policies, their role in protecting their organization’s information assets, and how to prevent, identify, or report a security incident.
• Level 3 – Promoting Awareness and Change: The awareness program identifies the training topics that have the greatest impact on supporting the organization’s mission and focuses on those key topics. The program goes beyond annual training and includes continual reinforcement throughout the year. Content is communicated in an engaging and positive manner that encourages behavior change at work, at home, and while traveling. As a result, employees are aware of policies and processes and actively prevent, recognize, and report incidents.
• Level 4 – Long-Term Sustainment: Long-term sustainment builds on an existing program that is promoting awareness and change, adding the processes and resources for a long-term life cycle, including, at a minimum, an annual review and update of both training content and communication methods. As a result, the program becomes an established part of the organization’s culture and is always current and engaging.
• Level 5 – Metrics: A security awareness program that has metrics in place to track progress and measure impact. As a result, the program is continuously improving and able to demonstrate a return on investment.
Summary
The most common failure in industrial cybersecurity governance arises from a
failure to properly execute one or more of the following tasks:
• Establishing the governance infrastructure
• Assigning senior management representation
• Allocating resources and assigning clear ownership
• Establishing good oversight
• Communicating to the organization
• Monitoring compliance and benchmarking
4
Measure to Manage Risk
Introduction
There are many books on the subject of risk management. This chapter is not
intended to cover the basic principles described in those books. Instead, the aim
is to explain how industrial cybersecurity risks are different, and how those risks
can be quantified and managed. This chapter will look beyond how industrial
cybersecurity risks are currently managed to propose more effective approaches.
The recommended reading list provides several resources that describe the basics
of risk management in more detail. These resources offer more insight into how
probability and statistics can be used in cybersecurity.
The Cullen Report41 made more than 100 recommendations to improve safety.
One of the most significant recommendations was that responsibility for
identification of major accident risks should be transferred from the legislator and
safety inspectorate to the operating company. Since then, safety has been
established as the number one priority for companies operating in high-hazard
environments. The management of safety risk is now a key activity embedded
into every process.
Defining Safety Risk
In safety, the definition of risk is: “a measure of human injury, environmental
damage, or economic loss in terms of both the incident likelihood and the
magnitude of the loss or injury.”42
ALARP is widely used in safety standards and legislation throughout the world.
IEC 6150849 defines the following for risks:
Figure 4-4 shows a visual representation of ALARP. Initially, minimal cost (or
effort, equated to cost) is expended to mitigate significant or intolerable risk. As
risk is addressed, the cost to mitigate the remaining risk increases. The ALARP
point is reached when the cost to mitigate additional risk becomes
disproportionate to the risk reduction.
The ALARP concept can be applied equally well to managing cybersecurity risk,
and especially to industrial cybersecurity risk. Consider the fact mentioned in
Chapter 1, that investment in cybersecurity spending is forecasted to reach $172
billion in 202250. Despite this massive investment, organizations are still being
impacted by cybersecurity incidents that, with hindsight, could have been easily
prevented. The following examples were discussed previously: the fuel pipeline
company that was incapacitated by ransomware because of an inadequately
secured remote access connection; the water treatment plant that was tampered
with by a disgruntled former employee who still had access to systems; the
compromise of a safety controller that could have been prevented by turning a
physical key to a different position.
These incidents illustrate the fact that too little effort is made to address
intolerable risks that, ironically, involve minimal cost. While it is unclear if the
current investment is appropriate, it is clearly not correctly targeted.
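The ALARP judgment can be framed as a simple cost-benefit test. The sketch below is illustrative only; the disproportion factor and dollar figures are hypothetical assumptions, not values from any standard:

```python
# Minimal sketch of an ALARP-style cost-benefit test. The disproportion
# factor and all dollar figures are hypothetical.
def alarp_justified(risk_reduction_per_year, annualized_cost,
                    disproportion_factor=3.0):
    """A safeguard is reasonably practicable unless its cost is grossly
    disproportionate to the risk it removes (both in dollars per year)."""
    return annualized_cost <= disproportion_factor * risk_reduction_per_year

# Securing a remote access connection: low cost, large expected-loss reduction.
print(alarp_justified(risk_reduction_per_year=500_000, annualized_cost=20_000))    # True
# Replacing an entire plant network for a marginal reduction in expected loss.
print(alarp_justified(risk_reduction_per_year=10_000, annualized_cost=2_000_000))  # False
```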
The process hazard analysis (PHA) methodology is used to assess hazards within
industrial processes. In the United States, the Occupational Safety and Health
Administration (OSHA) regulates high-risk facilities using process safety
management (PSM), which, among other things, requires that a PHA be
conducted every five years. There are several methods for performing a PHA. Of
these, the hazard and operability study (HAZOP) is the most comprehensive,
systematic, and commonly used method. A variation of the HAZOP, called
CHAZOP (for control hazard and operability study), was developed to assess the
safety impact of control system failures.
There have been attempts to align industrial cybersecurity risk assessment with
these safety methods, with cyber PHA being the most common. CHAZOP may
also incorporate cybersecurity risks. However, Marszal and McGlone note that
cyber PHA and CHAZOP are more akin to failure modes and effects analysis
(FMEA) than HAZOP. Whereas a HAZOP focuses on the hazards in the process,
cyber PHA and CHAZOP focus on control system and network equipment
failure. This is not ideal because of the following:
• The frequency of cyberattack is not random like other equipment failures
modeled in the FMEA. Although it is possible to apply a frequency for
the purposes of analysis, there is no statistical basis for it, unlike the
random hardware and human failures in non-cybersecurity assessments.
The output of such an analysis is therefore misleading. This could be
addressed by rigorous collection of data, but that takes time, and this
issue cannot wait.
• With the focus on control system and network equipment failure, the
identification of safeguards is limited to the control system and the
network, whereas the overall process analysis will identify other
safeguards (such as mechanical protection).
Figures 4-5 and 4-6 are bowtie diagrams related to safety and cybersecurity risk,
respectively. Bowtie diagrams are used in safety management to help visualize
the relationship between causes, hazards, and consequences.53
Figure 4-5 is a bowtie diagram showing a single initiating cause (flow valve fails
open) resulting in a hazard (overfill/overpressure of free water knockout vessel)
that causes an event (loss of primary containment) that can lead to a
consequence (fire/explosion). On the left side of the diagram are preventive
actions, those that are in place to stop the event (e.g., opening of a pressure
valve); on the right side of the diagram are mitigating actions, those that are in
place to reduce the impact of the event (e.g., emergency shutdown).
Figure 4-6 shows a typical cybersecurity bowtie. In this case, the initiating cause
(malware deployed on system) leads to a generic hazard (cybersecurity threat),
event (cybersecurity incident), and ultimately a consequence (loss of control),
which is system, rather than process, focused. As a result, the preventive and
mitigating actions are focused on the system rather than the process.
Figure 4-7 shows a simplified overview of the security PHA review (SPR) process. The beauty of the
process is that the SPR is either part of an overall PHA study or uses the output
of a PHA study. This elevates cybersecurity in the overall process, where it can
be properly addressed. The company’s safety organization must understand that
cybersecurity risks can contribute to process hazards, and they should not be
treated as unrelated issues to be managed by others in the company. This is
easier said than done: Plant-based OT personnel may not have the time or
resources to address these issues. IT personnel may not have the domain
knowledge to properly appreciate the issues.
Figure 4-7. Overview of the SPR process.
Table 4-1 shows a simplified list of causes, consequences, and safeguards from a
gas turbine PHA. In this case, the loss of turbine wash-water feed-pump suction
and loss of lube oil cooling would be hackable because the safeguards relate to
computer-based elements, and the overpressure of a high-pressure (HP)
separator would not, because its safeguard is mechanical.
Table 4-1. Simplified causes, consequences, and safeguards from a gas turbine
PHA.
Cause Consequence Safeguard
Monte Carlo simulation is already widely used in organizations that are required
to estimate risk at a specific level of confidence, so the concept should be
familiar:
• Major projects use Monte Carlo simulation to analyze cost and schedule
and to produce risk-based confidence estimates, such as P10, P50, or
P90, where P stands for percentile. Many organizations, such as the UK
Ministry of Defense, require P10, P50, and P90 confidence forecasts to
be provided.56
• The US Securities and Exchange Commission (SEC) defines oil and gas
reserves in terms of P10, P50, and P90 ranges.57
Using the same ranges to quantify cybersecurity risk should provide a familiar
basis for probability.
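A minimal Monte Carlo sketch follows, in the spirit of Hubbard and Seiersen's approach. The scenario probabilities and 90% confidence loss ranges are hypothetical inputs, not data from this book:

```python
# Minimal Monte Carlo sketch producing P10/P50/P90 annual-loss estimates.
# The scenario probabilities and 90% confidence loss ranges are hypothetical.
import math
import random

random.seed(1)
TRIALS = 100_000

SCENARIOS = [
    (0.50, 20_000, 500_000),      # e.g., malware via removable media
    (0.30, 100_000, 5_000_000),   # e.g., ransomware outage at one site
    (0.10, 500_000, 20_000_000),  # e.g., safety-system compromise
]

def sample_loss(p, low, high):
    """Loss for one scenario in one simulated year (zero if no event)."""
    if random.random() >= p:
        return 0.0
    # Lognormal parameters chosen so ~90% of losses fall between low and high.
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * 1.645)
    return random.lognormvariate(mu, sigma)

annual_totals = sorted(
    sum(sample_loss(*s) for s in SCENARIOS) for _ in range(TRIALS)
)
for p in (10, 50, 90):
    print(f"P{p} annual loss: ${annual_totals[TRIALS * p // 100]:,.0f}")
```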
Hubbard and Seiersen’s58 method is well suited to estimating the likely financial
impact of a cybersecurity incident. The question is whether it can work with
other consequences, such as death or injury, harm to the environment, equipment
damage, loss of production, regulatory violations, and brand damage. Because
these consequences all have a financial impact, one option is to estimate that
impact using the loss exceedance method. The results of this method can also be
used to calibrate the results from the security PHA method.
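A loss exceedance view can be derived directly from such simulations. The sketch below, with hypothetical sample data, reports, for each loss threshold, the fraction of simulated years whose total loss exceeds it; in practice, the input would be Monte Carlo totals like those above:

```python
# Minimal sketch of a loss exceedance curve: for each threshold, the fraction
# of simulated years whose total loss exceeds it. The sample data here is
# hypothetical; in practice, feed in Monte Carlo annual totals.
def exceedance_curve(annual_totals, thresholds):
    n = len(annual_totals)
    return {t: sum(1 for loss in annual_totals if loss > t) / n
            for t in thresholds}

simulated = [0, 0, 0, 40_000, 150_000, 600_000, 2_500_000, 0, 90_000, 12_000_000]
for threshold, prob in exceedance_curve(simulated,
                                        [100_000, 1_000_000, 10_000_000]).items():
    print(f"P(annual loss > ${threshold:,}) = {prob:.0%}")
```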
Bayes’s Theorem60
Another statistical concept recommended by Hubbard and Seiersen is Bayes’s
theorem. Bayes’s theorem is well suited to dealing with situations where data is
limited. The challenge with statistics is that the accuracy of any estimate
depends on the size of the sample, yet for rare events it is impractical (or
impossible) to collect a sample large enough to be accurate. Frequentist
statistics, which involves collecting sample data and estimating the mean and
standard deviation, works well when the data is normally distributed but is less reliable
otherwise. Figure 4-9 shows a simplified example. The expectation based on
frequentist statistics is the normal bell curve. Using the mean and standard
deviation of this curve would indicate a remote probability of an event
occurring. This could provide false reassurance.
Paul Gruhn, a renowned functional safety expert and past president of the
International Society of Automation (ISA), has adopted Bayes’s theorem in his
work for similar reasons: “Frequentist statistics cannot be used to confirm or
justify very rare events,” for instance, the probability that a plant will have a
catastrophic process safety accident in the next year.61
Bayes’s theorem is expressed as

P(A|B) = [P(B|A) × P(A)] / P(B)

where P(A) and P(B) are the probabilities that two events, A and B, occur
independently. P(A|B) means the probability that event A (the event we are
interested in, which is hard to estimate) occurs given that event B has occurred
(an event that can be observed). P(B|A) is the probability that event B occurs
given that event A has occurred. What makes Bayes’s theorem powerful is that
P(A) can start with any prior estimate; with new evidence (an observation of B),
the estimate will improve and can be used as the prior in the next iteration of
the calculation when further evidence is available. To see how Bayes’s theorem
can help, consider
the following example from Hubbard and Seiersen:63
A prior estimate can be based on data (e.g., how many people followed
another procedure), or an educated guess (e.g., a specialist may use their
experience to estimate the number). To be extremely conservative, the
estimate could assume all probabilities are equal. This is shown in Figure 4-
10. All possible probabilities are shown from 0 to 1 (or 0% to 100% in
percentage terms). The uniform distribution shows all probabilities are
equally valid.
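A minimal sketch of this updating process, using a grid of candidate probabilities and a uniform prior as in Figure 4-10; the incident counts below are hypothetical, not taken from Hubbard and Seiersen’s worked example:

```python
import numpy as np
from math import comb

# Candidate probabilities from 0 to 1, with a uniform prior (all equally valid).
p = np.linspace(0, 1, 101)
prior = np.ones_like(p) / p.size

incidents, periods = 2, 10  # hypothetical: 2 incidents observed in 10 periods

# Likelihood of that evidence for each candidate probability (binomial model).
likelihood = comb(periods, incidents) * p**incidents * (1 - p)**(periods - incidents)

# Bayes's theorem: posterior is proportional to likelihood times prior.
posterior = likelihood * prior
posterior /= posterior.sum()

print(f"Posterior mean: {(p * posterior).sum():.3f}")  # ~0.25 with these counts
```

The posterior becomes the prior for the next update, so the estimate sharpens as each new observation arrives.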
Each FR is first defined in terms of what is needed for each SL. For example, for
FR 2, Use Control, the SLs are:
• SL-1 – Restrict use of the IACS according to specified privileges to
protect against casual or coincidental misuse.
• SL-2 – Restrict use of the IACS according to specified privileges to
protect against circumvention by entities using simple means with low
resources, generic skills, and low motivation.
• SL-3 – Restrict use of the IACS according to specified privileges to
protect against circumvention by entities using sophisticated means with
moderate resources, IACS-specific skills, and moderate motivation.
• SL-4 – Restrict use of the IACS according to specified privileges to
protect against circumvention by entities using sophisticated means with
extended resources, IACS-specific skills, and high motivation.
Base Requirement: On all interfaces, the control system shall provide the capability to enforce authorizations assigned to all human users for controlling use of the control system to support segregation of duties and least privilege.

Requirement Enhancements (RE):
• RE (1): Authorization enforcement for all users on all interfaces – The control system shall provide the capability to enforce authorizations assigned to all users (humans, software processes, and devices) for controlling use of the control system to support segregation of duties and least privilege.
• RE (2): Permission mapping to roles – The control system shall provide the capability for an authorized user or role to define and modify the mapping of permissions to roles for all human users.
• RE (3): Supervisor override – The control system shall support supervisor manual override of the current human user authorizations for a configurable time or event sequence.
• RE (4): Dual approval – The control system shall support dual approval where an action can result in serious impact on the industrial process.
Once the system under consideration (SUC) is identified (i.e., what is included
and excluded from the scope), an initial cybersecurity risk assessment is
performed. The SUC is then divided into separate zones (e.g., by vendor or by
functional area). Next, the connections between these zones (the conduits) are
identified. Cybersecurity safeguards are then identified and documented. This is
accomplished using the guidance from ANSI/ISA-62443-3-3 described earlier.
Implementing safeguards based on the ISA/IEC 62443 SRs enables asset owners
to demonstrate traceability to an international standard. In the absence of specific
regulations, such traceability helps the asset owner demonstrate that it has
reduced cybersecurity risks to ALARP, much the same as traceability to
International Electrotechnical Commission (IEC) 61508 requirements
demonstrates management of safety risks. When combined with the SPR
methodology described earlier, this traceability gives an asset owner a powerful
argument for its management of cybersecurity risks.
Table 4-3 shows the typical sharing of responsibilities for the defense-in-depth
measures in the bowtie diagram in Figure 4-8.
Although the asset owner has responsibility for all defense-in-depth measures,
the asset owner depends on other principal roles to perform the tasks. For
example:
• The asset owner can limit physical and electronic access to the system,
but the maintenance service provider must do the same. In 2000, a
disgruntled former contractor working for an integration service provider
was able to gain unauthorized access to a wastewater control system and
release raw sewage into the environment more than 40 times over several
months. The integration service provider did not do enough to prevent
the contractor’s access. It should have, as a minimum, removed the user
from the system and changed all shared account passwords when the
contractor left the project.
• Removable media access, antivirus protection, operating system updates,
and backup and recovery are key cybersecurity defense-in-depth
measures, but they require all principal roles to contribute. The product
supplier must support these features. This means, for example, not
relying on an obsolete operating system that cannot be updated. The
integration service provider must design in these requirements from the
outset and test them before handover. The maintenance service provider
will be required to operate these measures. This could involve taking and
testing backups, as well as applying antivirus and operating system
updates. The asset owner must support the measures with rules such as
forbidding removable media access.
• The asset owner is entirely responsible for the mechanical fail-safes.
These provide one of the last lines of protection in the event of an
incident. The product supplier should not be responsible for the design or
maintenance of these fail-safes. The integration service provider may
design in these fail-safes (depending on its role in the project) but cannot
be responsible for their maintenance and upkeep. The maintenance
service provider may have some responsibility depending on its
arrangement with the asset owner.
In many ways, these key cybersecurity safeguards mimic the lifesaving safety
rules asset owners also mandate. Figure 4-13 shows the International
Association of Oil & Gas Producers (IOGP) lifesaving rules that are adopted by
many in that sector. Those who do not use the IOGP lifesaving rules have their
own set of rules that are very similar.
The lifesaving rules set out simple and clear dos and don’ts. The rules have been
put in place to ensure a consistent safety posture for all workers.
The cybersecurity safeguards achieve a similar result for the organization’s cyber
resources without the need to perform in-depth analysis of every system and
process. If an organization can comply with all cybersecurity safeguards on all
systems, the likelihood of a cybersecurity incident will be greatly reduced. The
following are disadvantages of this one-size-fits-all approach:
• Systems, and the processes they control and monitor, have different
levels of risk and consequence. This approach does not prioritize based
on these factors.
• Systems contain different components, and as a result, it may not be
possible to apply all safeguards equally across all systems.
Chapter 5, “Standardized Design and Vendor Certification,” will address the
issues with this approach in more detail and suggest solutions to improve it.
Figure 4-14 shows that eliminating the hazard is the most effective method,
while personal protective equipment (PPE) is the least effective. If the minimum
cybersecurity safeguards described earlier are mapped to the hierarchy of
controls, the effectiveness of the safeguards can be seen more easily. This is
shown in Table 4-4. For effectiveness, a score of 1 to 5 is used, with 1 being the
least effective and 5 being the most effective.
The cybersecurity risk chain, shown in Figure 4-15, identifies all the stages in
the process from product design to facility operation. This is distinct from other
frameworks that describe the process of a cybersecurity attack, such as Mitre’s
ATT&CK for Industrial Control Systems and Lockheed Martin’s Cyber Kill
Chain.68,69 This cybersecurity risk chain shows how cybersecurity vulnerabilities
in an operational facility are created throughout the entire system life cycle from
initial product design and development to operational use.
Table 4-5 summarizes the key stages in the process and highlights what can be
done to reduce the risk at each stage to reduce overall risk for the asset owner.
One notable observation from this visual representation of risk is the dominance
of people and procedures. As noted throughout this book, although technology is
important, the most significant factors in industrial cybersecurity are people and
processes.
There are already certifications in place for people, products, and systems. There
will be similar certifications in place for facilities in the foreseeable future.
Adoption of these certifications throughout the industrial cybersecurity risk
chain would dramatically reduce cybersecurity risk for asset owners. The main
issue is that asset owners are not currently demanding these certifications.
Instead, asset owners are spending millions of dollars every year performing
their own assessments of systems. Sometimes they assess the same system in
different projects and regions. Despite this investment, the results are
disappointing. As noted earlier, asset owners are satisfied with a set of minimum
cybersecurity safeguards applied around the system.
Some vendors have taken the initiative to obtain certifications for their
development processes, products, and systems. They promote this as a
differentiator. The analogy of hazardous equipment still applies. Asset owners
would not consider buying a product for use in a hazardous area unless it was
certified. The future of industrial cybersecurity will be similar. Asset owners will
only buy certified products and systems delivered by certified professionals.
Summary
This chapter provided details on why industrial cybersecurity risk is different
from its IT counterpart. Although an increasing number of organizations
understand these differences, many still use the same techniques to estimate
industrial cybersecurity risk.
Even without doing a thorough risk analysis, it is still possible to apply some
basic controls and make a significant improvement in cybersecurity posture.
Examples abound: secure network design, hardening of devices, deployment of
antivirus software, ongoing update of operating system patches, maintenance of
system backups, establishment of recovery procedures, establishment of
awareness training for all personnel, and establishment of cybersecurity incident
response plans. These basic controls are similar to the safety rules that many
organizations operate. However, like the safety rules, they rely on people
following procedures, and the people or procedures can fail.
____________
41 Lord William Cullen, The Public Inquiry into the Piper Alpha Disaster (London: Her Majesty’s
Stationery Office, 1990), http://www.hse.gov.uk/offshore/piper-alpha-disaster-public-inquiry.htm.
42 Definition from the American Institute of Chemical Engineers (AIChE) and Center for Chemical
Process Safety (CCPS).
43 A denial-of-service (DoS) attack occurs when legitimate users are unable to access information
systems, devices, or other network resources due to the actions of a malicious cyber threat actor. US
Cybersecurity and Infrastructure Security Agency (CISA), “Security Tip (ST04-015): Understanding
Denial-of-Service Attacks,” revised November 20, 2019, accessed June 21, 2021, https://us-
cert.cisa.gov/ncas/tips/ST04-015.
44 Ralph Langner, To Kill a Centrifuge: A Technical Analysis of What Stuxnet’s Creators Tried to Achieve,
accessed June 21, 2021 (Arlington, VA: The Langner Group, November 2013),
https://www.langner.com/wp-content/uploads/2017/03/to-kill-a-centrifuge.pdf.
45 Nicholas Falliere, Liam O Murchu, and Eric Chen, W32.Stuxnet Dossier Version 1.3 (November
2010), accessed June 21, 2021,
https://www.wired.com/images_blogs/threatlevel/2010/11/w32_stuxnet_dossier.pdf.
46 Amy Krigman, “Cyber Autopsy Series: Ukrainian Power Grid Attack Makes History,” GlobalSign
Blog, October 22, 2020, accessed June 21, 2021, https://www.globalsign.com/en/blog/cyber-autopsy-
series-ukranian-power-grid-attack-makes-history.
47 Dragos, “TRISIS Malware: Analysis of Safety System Targeted Malware,” version 1.20171213,
accessed June 21, 2021, https://www.dragos.com/wp-content/uploads/TRISIS-01.pdf.
48 “ALARP at a glance,” Health and Safety Executive, accessed November 6, 2021,
https://www.hse.gov.uk/managing/theory/alarpglance.htm.
49 IEC 61508-1:2010, Functional Safety of Electrical/Electronic/Programmable Electronic Safety-
Related Systems – Part 1: General Requirements (IEC [International Electrotechnical Commission]).
50 “Cybersecurity Spending Trends for 2022: Investing in the Future,” CSO, accessed February 14, 2022,
https://www.csoonline.com/article/3645091/cybersecurity-spending-trends-for-2022-investing-in-the-
future.html.
51 Edward Marszal and Jim McGlone, Security PHA Review for Consequence-Based Cybersecurity
(Research Triangle Park, NC: ISA [International Society of Automation], 2019).
52 Marszal and McGlone, Security PHA Review for Consequence-Based Cybersecurity, 14.
53 It is not the aim of this book to describe the bowtie diagram in detail. The “Further Reading” section
provides references for more details on this subject.
54 Marszal and McGlone, Security PHA Review for Consequence-Based Cybersecurity, 9.
55 Douglas W. Hubbard and Richard Seiersen, How to Measure Anything in Cybersecurity Risk
(Hoboken, NJ: John Wiley & Sons, 2016), 38.
56 Martin Hopkinson, “Monte Carlo Schedule Risk Analysis—A Process for Developing Rational and
Realistic Risk Models” (white paper, Risk Management Capability, 2011), accessed June 21, 2021,
http://www.rmcapability.com/resources/Schedule+Risk+Analysis+v1.pdf.
57 “Summary Report of Audits Performed by Netherland, Sewell & Associates,” accessed June 21,
2021, https://www.sec.gov/Archives/edgar/data/101778/000119312510042898/dex992.htm.
58 Hubbard and Seiersen, How to Measure Anything in Cybersecurity Risk, 52.
59 European Union Agency for Cybersecurity (ENISA), “ENISA’s Position on the NIS Directive,”
January 2016, accessed June 21, 2021, https://www.enisa.europa.eu/publications/enisa-position-
papers-and-opinions/enisas-position-on-the-nis-directive.
60 This is often written as Bayes’ Theorem. This book uses the Britannica version of the name.
61 Paul Gruhn, “Bayesian Analysis Improves Functional Safety,” InTech, March 31, 2020, accessed June
21, 2021, https://www.isa.org/intech-home/2020/march-april/features/bayesian-analysis-improves-
functional-safety.
62 Hubbard and Seiersen, How to Measure Anything in Cybersecurity Risk, 161–165.
63 Hubbard and Seiersen, 171–174.
64 ANSI/ISA-62443-3-3 (99.03.03)-2013, Security for Industrial Automation and Control Systems –
Part 3-3: System Security Requirements and Security Levels (Research Triangle Park, NC: ISA
[International Society of Automation]).
65 ANSI/ISA-62443-3-2, Security for Industrial Automation and Control Systems – Part 3-2: Security
Risk Assessment for System Design (Research Triangle Park, NC: ISA [International Society of
Automation]).
66 These principal roles are defined in ANSI/ISA-62443-1-1.
67 Intrinsic safety is a design technique applied to electrical equipment for hazardous locations that is
based on limiting energy, electrical and thermal, to a level below that required to ignite a specific
hazardous atmospheric mixture.
68 The Mitre Corporation, “ATT&CK for Industrial Control Systems,” accessed June 21, 2021,
https://collaborate.mitre.org/attackics/index.php/Main_Page.
69 Lockheed Martin Corporation, “The Cyber Kill Chain,” accessed June 21, 2021,
https://www.lockheedmartin.com/en-us/capabilities/cyber/cyber-kill-chain.html.
5
Standardized Design and
Vendor Certification
Introduction
In response to cybersecurity threats to their products, major automation vendors
have begun developing their own secure architectures. In some cases, they
incorporate customized security tools. Several vendors have gone a step further
and obtained third-party certification of these solutions.
Despite this, asset owners collectively spend millions of dollars designing and
reviewing solutions from vendors. This design and review effort is routinely
repeated, even within the same asset-owner organization. In fact, an asset
owner with multiple projects in different regions of the world, using the same
vendor solution, may treat each deployment project as if it were novel and unknown.
Clearly, more standardization would improve the owner’s cybersecurity posture
while reducing the cost of deployment. This chapter will consider the benefits of
standardized designs, identify the elements of a standardized design, and
recommend ways to capture these details and minimize implementation costs.
With these certifications in place, asset owners can select products based on the
certification, confident that the hazardous-area requirements are met. The asset
owner is required to follow standards and vendor instructions for safe
deployment of the product, for instance, selecting the power supply or
connection of external devices. These instructions are sufficiently prescriptive
that there is little room for interpretation. Design documentation must be
produced, but the requirements for this documentation are well defined.
Table 5-1 summarizes the essential elements along with a reference to the
ISA/IEC 62443 FRs75 from which they are derived. The table also shows which
elements relate to the system and which pertain to the environment around the
system.
Table 5-1 shows why it is not enough to implement secure systems and
components. This is consistent with the hazardous-area equipment analogy
already discussed. Even when a certified-hazardous-area product is procured, it
must still be installed in compliance with standards and the vendor’s instructions
to ensure it is safe. In this case, the asset owner must put controls in place around
the systems and components it procures to ensure the overall facility is secure.
Figure 5-2 shows a simplified block diagram of a typical IACS environment for
a facility. This example includes an integrated control and safety system (ICSS).
This could also be a separate, distributed control system (DCS) and safety
instrumented system (SIS), or a wide-area supervisory control and data
acquisition (SCADA) system. The facility also includes a system to monitor and
control the power to the plant and power to the systems themselves. These
systems, along with a turbine control system, an associated vibration monitoring
system, and specific control systems, are provided as part of the packaged plant
system (e.g., wastewater treatment). These systems are typically procured from
different vendors but must work together to achieve the overall objectives for the
asset owner.
Even with vendor support, the asset owner must secure the entire facility, not just
individual systems. Maintaining multiple vendor systems for antivirus and
patching could be cost prohibitive and difficult to resource. Vendors may have
different standards for screening operating system patches or network
monitoring, which may lead to inconsistencies. For this reason, it is essential that
the scope is clearly defined. Security implementation must be facility-wide, not
on a per-system basis. There are many challenges to achieving this clear scope.
These challenges are discussed in more detail in Chapter 6, “Pitfalls of Project
Delivery.”
Figure 5-3 shows the example facility in more detail. This diagram identifies the
key components of each system76 and the connectivity required for operational
purposes. The individual system architectures are based on actual vendor
solutions.
Figure 5-3. Illustrative facility architecture with no environment security controls.
The systems themselves may individually meet the asset owner’s security
requirements, but additional controls are required to operate in this
interconnected manner.
The core of the facility is the ICSS. The other systems communicate key process
data with the ICSS. This provides a facility-wide overview of operations from
one human-machine interface (HMI). Each system provides its own HMI. This
allows for a more detailed view of system operation. For instance, operators may
require a summary of power status on the ICSS HMI. Electrical engineers may
need to view more detailed power management information from that system’s
HMI.
The ancillary systems connect to the ICSS via Ethernet networks. Typically, the
ICSS will poll the ancillary systems using an industrial protocol, such as
Modbus or EtherNet/IP. EtherNet/IP is an open standard based on the Common
Industrial Protocol (CIP); the “IP” stands for Industrial Protocol and is not to
be confused with Transmission Control Protocol/Internet Protocol (TCP/IP). The
ancillary systems will return the data requested by the ICSS.
The content and operation of the packaged plant control systems vary
considerably depending on the package and the vendor. In some cases, the
control system may comprise a programmable logic controller (PLC) and an
HMI. In other cases, it may be a personal computer (PC) connected to specialist
sensors. A third option may be a PLC with no HMI. The connectivity to the
ICSS will also vary. Modern packaged control systems integrate into the same
Ethernet networks as the ancillary systems described earlier. However, some
systems still connect using serial networks (RS-232 or RS-485). In some cases,
this may involve hardwired connections, such as analog or digital inputs or
outputs. These connections represent critical signals in the process.
Purdue Hierarchy
The Purdue hierarchy was developed by a team led by Theodore (“Ted”)
Williams (formerly of Monsanto Chemical Co.) at Purdue University’s
consortium for computer integrated manufacturing and published in 1992.77 The
Purdue reference model is part of a larger concept: the Purdue Enterprise
Reference Architecture (PERA). This concept “provides a way to break down
enterprises into understandable components, to allow staff at all levels to see the
‘20,000-ft view’ as well as to describe the details that they see around them
every day.”78 PERA expert and evangelist Gary Rathwell was a member of the
original development team. He maintains that PERA was ahead of its time and
never achieved the level of adoption that it deserved. Rathwell has successfully
implemented many major automation projects by following the PERA
methodology, but few outside his projects appreciate what can be achieved. Most
automation professionals know only of the Purdue hierarchy, either from the
ISA-95 standard, incorporated into the IEC 62264 standard, Enterprise-Control
System Integration,79 or the ISA-99 standard, incorporated in ISA-62443-1-1,
Security for Industrial Automation and Control Systems,80 and even then, there is
limited understanding of the principles behind it.
Figure 5-4 shows the original Purdue hierarchy, in this case for a continuous
process such as petrochemicals. Another version (with different descriptions)
covered a manufacturing complex.
Figure 5-4. The original Purdue hierarchy.81
The Purdue hierarchy is a foundational element for all automation systems, just
as the Open Systems Interconnection (OSI) model, which defines how network
systems are architected, is fundamental to all networks. An overview of the OSI
model can help one better understand the importance of the Purdue hierarchy.
OSI Model
The basic reference model for OSI is a standard created in 1983 by the
International Organization for Standardization (ISO) and the International
Telegraph and Telephone Consultative Committee (CCITT). The standard is
usually referred to as the Open Systems Interconnection Model, or OSI model for
short.
The OSI model is shown in Figure 5-6. It divides network communications into
seven layers: physical, data link, network, transport, session, presentation,
and application.
A key application for many asset owners and their vendors is condition-based
monitoring. Condition-based monitoring requires data at a relatively low rate
and assumes the condition does not change rapidly. The primary use of
condition-based monitoring is to monitor operational data over time. The goal is
to predict failures, rather than detect issues, in near real time. Figure 5-9 shows
that both of these solutions achieve the condition-based monitoring application
requirement.
Figure 5-9. Conventional and IIoT-based approach to vibration monitoring.
In traditional implementations, the set point is set locally via the operator
console or the supervisor control console. The advent of improved
communications and devices allows a “higher degree of automation.” This
enables some business logic to determine the optimal set points needed to
achieve a particular objective. However, the use of IIoT cannot change where the
closed-loop control is executed, at least not in a resilient solution. The Purdue
levels help explain the order of priority for operation:
• Level 0 – The plant must be able to either operate safely without Level 1
or be shut down in the event of a loss of Level 1 (local control).
• Level 1 and below – The plant must be able to operate without Level 2
(supervisory control).
• Level 2 and below – The plant must be able to operate without Level 3
(historian).
• Level 3 and below – The plant must be able to operate without Level 4
(business logic).
Without plant data logging, any of these scenarios could lead to a loss of
regulatory data, or a site visit to manually record this data. Depending on system
configuration, some scenarios may avoid this situation. For example, if the
SCADA can store data for several days, then loss of communications with the
historian, or failure of the historian, may not present a problem. Nevertheless, a
good design that incorporates the plant data logging function would mitigate this
risk in all cases. The weaker design may cost a bit less, but the savings would
be erased by the fines and overtime costs of a single failure.
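As a rough illustration of the sizing involved, the following back-of-the-envelope calculation estimates the local buffer needed to ride out a multiday historian outage; every figure is an assumption made for the example, not a vendor specification.

```python
# Back-of-the-envelope sizing for local data buffering during a historian
# outage. All figures are illustrative assumptions.
tags = 2_000               # monitored points
sample_period_s = 10       # one sample per tag every 10 seconds
bytes_per_sample = 16      # timestamp + value + quality flags
outage_days = 3            # buffering target while the historian is down

samples = tags * (86_400 // sample_period_s) * outage_days
print(f"Required buffer: {samples * bytes_per_sample / 1e6:.0f} MB")  # ~829 MB
```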
Compare the facility architecture shown in Figure 5-3 with the equivalent
Purdue hierarchy shown in Figure 5-12.
Conduits
Conduits are the connections between two or more zones. Conduits can represent
the following:
The cloud has many advantages, but the loss of cloud connectivity could result
in a facility shutdown. There are many real-world examples where this occurred.
In one anecdotal example, a production facility depended on a printing service to
produce product labels. During the WannaCry outbreak in May 2017,87 the
facility operations team was told to disconnect from the external network to
prevent infection. That is when they discovered the printing service was located
on the other side of the disconnected network. As a result, production halted
until the connection could be restored.
Figure 5-14 is an update of the facility architecture example in Figure 5-3. This
update includes a zone and conduit hierarchy, with centralized services.
Figure 5-14. A potential zone hierarchy for the example facility architecture.
The architecture now includes a DMZ with network time, central domain,
backup, endpoint protection, and remote access services to be shared by all the
systems. Any duplicated equipment, such as Network Time Protocol (NTP) servers,
can be removed. All traffic is directed through the DMZ, avoiding direct access
to any of the systems.
The management of this DMZ is critical to the successful operation of the
facility. In some organizations, management of DMZ equipment may by default
be the responsibility of the information technology (IT) function, with
independent oversight by the operational technology (OT) function. Some
organizations may create an industrial DMZ managed by the OT function with
independent oversight by the IT function. As with all decisions discussed in this
book, the organization must take a risk-based approach to assessing the options,
ensure there are sufficient qualified resources available to administer the
procedures, and apply rigorous oversight to ensure the procedures are followed
and the risks are managed.
When determining conduits, all communications paths into and out of zones
must be considered. These considerations include the following:
• The primary communications path for transferring data to and from the
zone, as well as secondary paths such as:
○ Remote access connections
○ Cellular or other backup connections
○ Dial-up connections used by vendors
• Definitions required for each conduit identified
○ The zones it connects (to and from)
○ The communications medium it uses (e.g., Ethernet, cellular)
○ The protocols it transports (e.g., Modbus, TCP port 502)
○ Any security features required by its connected zones (e.g., encryption,
multifactor authentication).
Once identified, a conduit list should be produced to capture the details. An
example is shown in Table 5-2. At this stage, it is sufficient to list the names of
the traffic/protocols. This list will identify and document the specific ports
needed to create the firewall rules.88,89
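A conduit list can be captured in any format, from a spreadsheet to a structured record. The following sketch shows one possible representation in Python; the zone names, conduit identifier, and security features are hypothetical stand-ins for the kind of entries Table 5-2 would hold:

```python
from dataclasses import dataclass, field

@dataclass
class Conduit:
    """One row of a conduit list, mirroring the definitions above."""
    name: str
    from_zone: str
    to_zone: str
    medium: str                   # e.g., "Ethernet", "cellular"
    protocols: list[str]          # e.g., ["Modbus/TCP (TCP port 502)"]
    security: list[str] = field(default_factory=list)

# Hypothetical entry for the turbine control system's connection to the ICSS.
turbine_to_icss = Conduit(
    name="C-01",
    from_zone="ICSS zone",
    to_zone="Turbine control system zone",
    medium="Ethernet",
    protocols=["Modbus/TCP (TCP port 502)"],
    security=["industrial firewall", "logging of denied traffic"],
)
```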
With the zones and conduits identified, the next step is to segregate the zones
and manage the communications across the conduits. There are several options,
the most common being
• firewall,
• virtual local area network (VLAN), and
• virtual private network (VPN).
Firewall
A firewall controls access to and from a network for the purpose of protecting it
and the associated devices. A firewall connects to two or more networks,
creating separate network zones. A firewall operates at layer 2 (the data link
layer) or layer 3 (the network layer) of the OSI model and filters traffic by
comparing network packets against a ruleset. A rule contains the following details:
• Source address
• Source port
• Destination address
• Destination port
• Protocol: TCP, User Datagram Protocol (UDP), or both
It can also include a time element (e.g., limiting remote access as needed rather
than always). Most firewalls have multiple network interfaces. The firewall rule
will define which interface the rule is configured on.
The firewall ruleset should be defined based on the conduit list produced earlier.
For example, using the conduit list in Table 5-2, the ruleset for the turbine
control system zone would be as shown in Table 5-3.90
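The translation from conduit list to ruleset can be illustrated with a small sketch. The addresses below are invented for the example, and the default-deny behavior reflects good practice rather than any particular firewall product:

```python
from dataclasses import dataclass

@dataclass
class FirewallRule:
    source: str        # source address
    destination: str   # destination address
    dport: int         # destination port
    protocol: str      # "TCP", "UDP", or "BOTH"
    action: str = "ALLOW"

# Hypothetical ruleset for the turbine control system zone: permit only the
# ICSS Modbus/TCP poll (addresses are invented for illustration).
rules = [
    FirewallRule("10.10.20.5", "10.10.30.10", 502, "TCP"),
]

def evaluate(src: str, dst: str, dport: int, proto: str) -> str:
    """Return the action for a packet; anything unmatched is denied by default."""
    for r in rules:
        if (src, dst, dport, proto) == (r.source, r.destination, r.dport, r.protocol):
            return r.action
    return "DENY"

print(evaluate("10.10.20.5", "10.10.30.10", 502, "TCP"))  # ALLOW
print(evaluate("10.10.20.5", "10.10.30.10", 80, "TCP"))   # DENY
```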
Consider the example of the turbine control system interface to the ICSS. This is
Modbus/TCP based. The Modbus protocol is a simple command-response type.
The client sends a command that indicates the operation (read or write, and the
type of data involved). The command also identifies the range of registers
(values) it affects. Modbus includes eight common commands, each with its own
function code and an associated address range. Table 5-5 shows these commonly
used codes.94
The turbine control system uses only two of these commands and a limited
address range for each command. An example is shown in Table 5-6.95
Table 5-6. Modbus commands and address ranges for a turbine control system
interface.
Function Code | Command | Address Range
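An industrial firewall performing this kind of deep packet inspection effectively maintains an allow list of function codes and register ranges. The following sketch shows the idea; the codes and ranges are hypothetical stand-ins for the rows of Table 5-6 (function codes 3, 6, and 16 are the standard Modbus codes for Read Holding Registers, Write Single Register, and Write Multiple Registers):

```python
# Hypothetical allow list standing in for Table 5-6: reads over one limited
# register block, writes over an even narrower one.
ALLOWED = {
    3: range(0, 100),   # Read Holding Registers: registers 0-99 only
    6: range(0, 10),    # Write Single Register: registers 0-9 only
}

def inspect(function_code: int, register: int) -> bool:
    """True if the Modbus command is permitted; otherwise drop and log it."""
    allowed = ALLOWED.get(function_code)
    return allowed is not None and register in allowed

print(inspect(3, 42))   # True  - permitted read
print(inspect(6, 42))   # False - write outside the permitted range
print(inspect(16, 5))   # False - Write Multiple Registers (16) not allowed
```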
Figure 5-16 shows a real example, from an operational facility, where the
industrial firewall was disconnected because “the automation system did not
work when we connected it.” There were no plans to resolve this situation before
it was identified in an audit.
Figure 5-16. A disconnected industrial firewall in an operational facility.
The industrial firewalls are network-based devices and may be used for more
than one conduit. For instance, the firewall could be placed on the uplink of a
switch that connects multiple systems. However, in this example, additional
commands and/or address ranges would need to be configured. From a
segregation and management perspective, it would be better to configure one
industrial firewall per conduit. In this case, the firewall would be placed in line
between each system and the switch. The cost per unit is marginal compared
with ongoing maintenance and management. Also, this approach improves
security.
In a VPN, the computers at each end of the tunnel encrypt the data entering the
tunnel and decrypt it at the other end using encryption keys. Once data is
encrypted, it is practically impossible to read without access to the encryption
keys. IP Security (IPsec) secures the exchange of these encryption keys and
enables secure VPNs to operate. IPsec is a set of protocols developed by the
Internet Engineering Task Force (IETF) to support the secure exchange of data
across the Internet. IPsec has been deployed widely to implement VPNs.
For IPsec to work, the sending and receiving devices must share a public key.
This is accomplished through a protocol known as Internet Security Association
and Key Management Protocol/Oakley (ISAKMP/Oakley), which allows the
receiver to obtain a public key and authenticate the sender using digital
certificates. Digital certificates have additional security benefits. As well as
authenticating a user, they provide
• data integrity assurance, by verifying that data has not been altered in
transit; and
• nonrepudiation, by proving that data was sent by a particular user, based
on their certificate credentials.
There are some scenarios where asset owners approve vendors to provide
technical support via VPN. This technical support may require changes to
system set points or logic. In fact, the COVID-19 pandemic forced many
organizations to adapt to the challenges of restricted travel and site work. In May
2020, Siemens successfully completed the start-up and adjustment of one of its
gas turbines in Russia.96 Although this case involved changes made by personnel
on-site with guidance by remote experts through videoconference, there is a
trend in many organizations toward performing more work remotely. This should
be considered with great caution. One effective control is to limit the availability
of such remote access to only when necessary, and under strict on-site
supervision. Such access should be disabled by default.
System Hardening
Hardening a system means configuring equipment to reduce the likelihood that a
vulnerable program or service can be exploited. Automation systems have a
narrower function than IT systems and are thus better suited to the rigorous
hardening needed to prevent unauthorized access or operation.
Assuming the physical security team has put in place the elements mentioned
earlier, additional considerations for physical security of automation systems
equipment include the following:
• Define who should have access to each facility. This might include who
has keys, or copies of keys; who is programmed into an electronic card
access system; or who has the codes for keypad locks.
• Create a process for taking action when someone leaves. It is common to
share codes or keys with staff and vendors. A standard process should
ensure that locks or codes are changed, or card access systems are
updated, when someone leaves employment.
• Enforce physical security on-site. This includes ensuring that equipment
rooms (e.g., Figure 5-22) and cabinets (e.g., Figure 5-23) are locked
when not in use. Visitors must always be escorted. These controls form part
of the risk assessment performed by the physical security team; those deemed
necessary should always be in place.
Figure 5-22. An equipment room within a secured facility.
Figure 5-23. Inside an equipment room with locked cabinets.
• Documents and storage media (e.g., CDs, USB drives) should be kept in
secure cabinets on-site and should not be left unattended. Documents
may contain sensitive information that can be used in conjunction with a
cybersecurity attack.
• Cabling and equipment ports should be physically secure from
interference. For example, it should not be possible to cut cables or
connect equipment to networks from outside the secure perimeter.
• Equipment should be sited or protected to reduce the risks from
environmental threats and hazards, as well as from unauthorized access.
Figure 5-24 shows an automation system device (a laptop used to log
data from a sensor array and transmit it to a control system) that is not
properly secured.
These points highlight that the skills and knowledge required to administer IACS
environment electronic access are significant, and the time and resources to
undertake this task are also significant. Organizations may choose to allow
their IT function to administer with oversight from the OT function or have
qualified personnel in the OT function administer with IT oversight.
Multifactor Authentication
Multifactor authentication is recommended for an increased level of security
when performing secure activities, such as remote access. Because it is possible
to compromise a username and password combination, multifactor
authentication also requires one or more additional factors. The idea is that an
unauthorized person is unlikely to have all the factors. Two-factor authentication
is the most common. The authentication factors of a two-factor authentication
scheme may include the following:
• A physical object in the possession of the user, such as a USB drive with
a secret token, a bank card, or a key
• A secret known to the user, such as a username, password, or personal
identification number (PIN)
• A physical characteristic of the user such as a fingerprint, an iris, or a
voice
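As a brief illustration of the second factor, the following sketch uses the third-party pyotp package to generate and verify a time-based one-time password (TOTP), the mechanism behind most authenticator apps. It is an illustration only, not a complete authentication flow:

```python
import pyotp  # third-party package (pip install pyotp)

# Enrollment: generate a shared secret and provision it to the user's token
# app or hardware device (the "something you have" factor).
secret = pyotp.random_base32()
totp = pyotp.TOTP(secret)

# At login, after the password check (the "something you know" factor),
# the user supplies the one-time code currently shown on their device.
code = totp.now()
print(totp.verify(code))  # True within the 30-second validity window
```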
Secure Remote Access
Remote access has become a key consideration for automation systems for many
reasons, including the following:
• To provide constant access to plant status regardless of geographic
location
• To reduce the risk to personnel by allowing them to work at remote
locations away from potentially hazardous facilities or processes
• To facilitate more flexible working arrangements for employees
There are various types of remote access in automation systems. The most
common are:
• Use of an automation system application on a laptop, desktop, tablet, or
smartphone to perform normal system functions from a location outside
the main facilities (control room, plant)
• Access to a web-based, read-only view of automation system data, served
from a system separate from the main system
• Access to PLCs, RTUs, or other devices to remotely program or monitor
operation
• Vendor access to process data for equipment maintenance or service
purposes
Technology, such as better communications devices and improved software, has
made remote access more feasible. However, remote access solutions must be
designed carefully to manage both safety and security.
Remote Access Risks
Remote access for legitimate purposes opens up the same connections that
would be used for unauthorized purposes. Remote access can improve efficiency
and responsiveness, but it can also allow a new set of issues to arise. When
providing remote access for vendors, it can be difficult to control who has access
and from where. Many vendors operate globally, and agreements may not
prescribe who is allowed to work on an asset owner’s system and from where.
Also, there may be no agreement on what background checks have been
performed on these individuals. There may be no restrictions on when vendors
can access systems, which systems they can access, and what they can do with
that access.
These risks can be mitigated by strict enforcement of remote access policies and
controls. Asset owners should limit remote access as much as possible. This
includes limiting who can access, what they can access, when they can access,
from where they can access, and what they can do with that access.
Selecting Remote Communications Technology
Selection of technology for remote communications should be based on the
following factors.
Location
Rural sites with limited cellular or broadband coverage may require an alternate
solution such as satellite or radio. The following points should be considered
when selecting the communications medium.
• Broadband should offer the best bandwidth but may be expensive to
install, especially in remote rural areas.
• Cellular, satellite, and radio can be deployed in a wide variety of areas
with minimal infrastructure, but have limitations:
○ Cellular is not currently available everywhere, and the available service
may not provide the bandwidth needed for remote access connections.
○ Radio requires a line of sight to a receiver station. Although repeaters
are available to extend reach, these can be cost prohibitive for a remote
access solution.
○ Satellite needs only a line of sight to the sky, but bandwidth is
currently limited or extremely expensive.
Availability and Redundancy
Before choosing a remote access solution, ensure the requirements and
expectations are clearly defined. A solution that is essential for operational
management, and therefore must be available 24/7/365, will require a different
level of technical support than one that merely supplements normal on-site
procedures.
Key considerations to meet defined availability requirements are
Security
Security is a key aspect of remote access solutions; however, the actual security
controls required vary depending on the remote access requirements. For
instance, read-only access to a separate website is a lower security risk than full
user access to the automation system itself.
Key security considerations for remote access solutions include the following:
Procedural Controls
• All remote access activities involving changes to automation systems or
associated devices (e.g., PLC, RTU) should be conducted only under an
approved permit to work. The permit should identify the planned
activities, the associated risks, and any additional controls required.
• No remote access activity should be permitted if the risk of a remote-
connection failure would leave the facility unsafe or in an out-of-service
state.
• Remote access for particular tasks may require a specific type of
connection. For instance, a cellular connection may be less reliable than a
broadband connection.
• Formal, defined support schedules should be available to all involved.
These document who should be connecting at any particular time.
Network Monitoring
Network monitoring is a broad term that includes
Network monitoring, IDS, and IPS tools are used extensively in IT networks.
However, care must be taken when deploying these tools in automation system
networks.
• Some tools can generate significant additional traffic. This traffic can
affect the operation of automation system equipment that depends on
deterministic or near-real-time responses. Passive tools that listen only to
data are available to alleviate this issue.
• Often, the operation of automation systems is not well understood. The
tools can then produce misleading results and false positives, which may
lead to legitimate functionality being inhibited.
These issues make the use of such tools in automation systems challenging. The
benefits should be weighed against the following challenges:
• Many standard firewall configurations evolve over time and are not well
managed. As a result, there can be obsolete rules, or rules that are
incorrectly implemented. Network monitoring and IDS cannot be fully
effective if firewall rules are not correct.
• Recall Figure 5-16, where the automation system failed when the
industrial firewall was connected, so the firewall was left disconnected.
Network monitoring and IDS may detect unauthorized commands, but such
alerts are of little value if the organization cannot even keep its
firewalls connected and correctly configured.
• There are many connections and associated devices that will not be
detected by network monitoring tools. Figure 5-26 shows a pair of serial
devices that provide a critical operational interface.
Network monitoring, IDS, and IPS tools may have a place in automation
systems, but before they are deployed it is essential that:
• The operation of all automation systems is clearly understood and
documented. This includes defining all protocols, commands, and
registers necessary for operation.
• Other controls are properly implemented. This includes the correct
configuration and testing of all standard firewalls. It also includes
industrial firewall features, such as only allowing specific commands and
registers, and logging all other events.
• All equipment is properly hardened. This includes activities noted earlier
in this chapter, in particular, disabling or removing unnecessary services
or programs that might generate unwanted traffic.
• Procedures are in place to regularly review log files and investigate
suspicious activity.
With these elements in place, it is possible that network monitoring, IDS, and
IPS could be useful aids in the monitoring process.
Cybersecurity Incident Response Plan
When a cybersecurity incident occurs, an incident response (IR) plan must be
initiated. Incident response plans must cover all the failure scenarios considered
in the network design. The incident response plan will define
• recovery objectives;
• roles, responsibilities, and levels of authority;
• communications procedures and contact information;
• locations of emergency equipment and supplies; and
• locations of spares and tools.
The incident response plan must identify the recovery objectives for each
essential function in the automation system. There are two key recovery
objectives to identify:
1. The recovery time objective (RTO) – Defining how long the function
can be out of service
2. The recovery point objective (RPO) – Defining how much data can
be lost in the event of a failure
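A simple way to make these objectives actionable is to compare them against the backup schedule for each function. The following sketch does exactly that; the function names and hour values are invented for the example:

```python
# Illustrative check of a backup schedule against each function's RPO.
functions = {
    # name: (rto_hours, rpo_hours, backup_interval_hours)
    "ICSS operator HMI": (4, 24, 24),
    "Plant historian": (24, 1, 24),
}

for name, (rto, rpo, interval) in functions.items():
    status = "OK" if interval <= rpo else "GAP"
    print(f"{name}: backups every {interval} h vs. RPO of {rpo} h -> {status}")
# The historian line flags a gap: an hourly RPO cannot be met by daily backups.
```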
Near Misses
As discussed in Chapter 3, “Creating Effective Policy,” cybersecurity incidents
are like safety incidents in that near misses occur. These near misses are leading
indicators of issues requiring attention. For example, a vendor uses its own USB
drive to install software on an automation system workstation. This action may
not infect the workstation, but this near miss is a failure to follow correct
procedures. Not recording the near miss can lead to further procedural failures
and, eventually, a cybersecurity incident. Recording the near miss should trigger
a review, which may involve retraining users, issuing warnings to vendors, or
other actions. The process of dealing with the near miss provides feedback on
behavior to help avoid future failings.
Equipment Type | Backup Content | Backup Frequency
Server (DCS, SCADA, historian, etc.) | Disk image | On change
Server (DCS, SCADA, historian, etc.) | Application/database | Based on RPO (e.g., daily/weekly)
Workstation | Disk image | On change
Automation device (RTU, PLC, etc.) | Program/configuration file(s) | On change
Network device (switch, router, firewall, etc.) | Configuration file(s) | On change
Each equipment type should have its own procedure that describes the specific
steps taken to perform the backup. Note that some automated backup may be
available, either at a system level (entire system) or device level (e.g., via a PLC
programming environment).
Backup files can be large. Transferring them over a network can be time-
consuming and interfere with other network operations. It may be necessary to
transfer files during quiet periods.
When backups are taken by vendors or service providers, an asset owner will
need a different set of verification procedures to check that the vendor or service
provider is taking the backups, testing them, and checking them for malware.
Manual Procedures
As noted in Chapter 2, “What Makes Industrial Cybersecurity Different?,”
policies and procedures are a critical element of good cybersecurity
management. Fortunately, personnel at facilities operating automation systems
are accustomed to following procedures. These environments are hazardous, and
following procedures can mean the difference between life and death.
• Require all site visitors to take a cybersecurity induction that covers the
key cybersecurity rules.
• Require all personnel to complete formal training, including ongoing
security awareness, and update this training annually to keep up with
evolving threats, vulnerabilities, and mitigations.
• Require that all changes involve backups of equipment before and after
the change.
• Require that all changes follow a formal change-control procedure that
includes updating and approving all documentation.
• Require that all files are transferred using a secure, approved method.
System Availability
The terms availability, reliability, maintainability, and redundancy are often
used incorrectly or interchangeably.
Maintainability measures the ease with which a product can be maintained and is
an essential element for successful operations.
The level of complexity in the redundancy design can vary considerably. Safety-
critical systems may have triplicated components operating under a “voting”
scheme, in which decisions are made based on the status of two out of the three
components (usually written 2oo3). Some systems are dual redundant but still
have 2oo3 voting from three separate sets of input/output (I/O) and
instruments/actuators.
If the probabilities of the individual events (e.g., primary supply failure) are
known, it is possible to calculate the overall probability of the scenario (e.g.,
loss of view). This probability can then be used to define availability figures
for each scenario.
The fault tree method can be used to model modifications to the system design
(e.g., the addition of a backup communications option) to determine the effect on
availability.
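For intuition, the arithmetic for some common redundancy schemes is straightforward if failures are assumed independent. The sketch below compares a single component, a dual-redundant (1oo2) pair, and 2oo3 voting; the 99.0% component availability is an illustrative figure:

```python
# Availability arithmetic for redundant configurations, assuming independent
# failures and an illustrative per-component availability of 99.0%.
a = 0.99

simplex = a                         # single component
dual = 1 - (1 - a) ** 2             # 1oo2: available if either component works
two_oo_three = 3 * a**2 - 2 * a**3  # 2oo3 voting: at least two of three work

print(f"Simplex:      {simplex:.4%}")
print(f"Dual (1oo2):  {dual:.4%}")
print(f"2oo3 voting:  {two_oo_three:.4%}")
```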
Designing for System Availability
Power
Most automation system sites use an uninterruptible power supply (UPS) to
ensure continuous power to equipment. The UPS monitors incoming power,
detects problems, and automatically switches over to battery backup. The battery
is charged continuously while the primary power supply is available. Larger sites
with bigger demands require stand-alone generators (e.g., diesel) to provide
backup power.
To ensure reliable power to meet system availability targets, consider the
following:
Failure of the primary power supply should trigger a process putting the facility
into a safe state before the backup power supply is engaged. This process may
include some level of support from the automation system and should be
captured in the relevant process control narrative.99
Communications Networks
There are several network topologies that can be deployed to meet various
availability requirements:
Support Contracts
Support contracts can have a significant impact on automation system
availability. Key factors that must be in place for any support contract SLA are
as follows:
Other Considerations
Internet Protocol Addressing
An IP address uniquely identifies a device on an IP network. There are two
standards for IP addressing, IPv4 and IPv6. In IPv4, the address is made up of 32
binary digits, or bits, which are divided into a network portion and a host
portion. The 32 bits are broken into four octets (1 octet = 8 bits). The value in
each octet ranges from 0 to 255 decimal, or 00000000 to 11111111 binary. Each
octet is converted to decimal and separated by a period (dot), for example,
172.16.254.1. This is shown in Figure 5-29.
Figure 5-29. Basic structure of an IP address.
In the early days of the Internet, the network and host portions of the address
format were created to allow for a more fine-grained network design. The first
three bits of the most significant octet of an IP address were defined as the class
of the address. Three classes (A, B, and C) were defined for addressing. As
shown in Figure 5-30, in class A, 24 bits of host addressing allows for
16,777,216 (2^24) unique addresses. In class B, only 16 bits of host addressing
are available, reducing the number of unique addresses to 65,536 (2^16). In
class C, only 256 (2^8) unique addresses are possible because there are only
8 bits for the host address.
Figure 5-30. Classes of IP address.
In IPv4, Classless Inter-Domain Routing (CIDR) notation is written as the first
address of a given network followed by the bit-length of the network portion of
the address. For example, 192.168.1.0/24 means that there is an address range
that starts at 192.168.1.0 and has 256 unique addresses up to 192.168.1.255
(the /24 signifies that the network portion of the address is 24 bits, leaving
8 bits for the host address, which yields 2^8, or 256, addresses).
IPv6 uses a 128-bit address that allows 2^128, or approximately 3.4 × 10^38,
addresses. The CIDR notation for IPv6 addresses is similar to that for IPv4
addresses. For example, the IPv6 address 2001:db8::/32 denotes an address
block starting at 2001:0db8:0000:0000:0000:0000:0000:0000 with 2^96 addresses
(having a 32-bit routing prefix denoted by /32, leaving 96 bits for host addresses).
This is shown in Figure 5-31.
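Python’s standard ipaddress module handles this arithmetic directly and is a convenient way to sanity-check an address plan; the addresses below are examples only:

```python
import ipaddress

net = ipaddress.ip_network("192.168.1.0/24")
print(net.num_addresses)   # 256 (8 host bits)
print(net.netmask)         # 255.255.255.0

# Membership tests are useful when auditing vendor address allocations.
print(ipaddress.ip_address("192.168.1.42") in net)  # True

v6 = ipaddress.ip_network("2001:db8::/32")
print(v6.num_addresses == 2**96)  # True: 96 host bits remain
```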
• Many vendors have their own preferred IP addressing schemes. This can
include the allocation of large blocks of address ranges that are then no
longer available for use in the wider network. In one case, an automation
system vendor insisted on allocating the network 10.0.0.0/8 to one of its
networks. This equates to 16 million unique addresses. This made it easy
for the vendor to allocate addresses to new devices as needed.
Unfortunately, this meant the end user was unable to access the 10.0.0.0
network range for other devices in their facility and was forced to change
to a different scheme. This issue was not addressed in early requirements
specification or contract phases.
• Some systems still use host files100 for the resolution of host names to
IP addresses. This is usually due to legacy factors, in particular the
implementation of automation systems without full IT network features,
such as DNS. It is not a good practice: changes require administrative
file access on the relevant machines, access that could be abused by
unauthorized users.
When implementing a facility-level network, it is essential that an IP address
scheme is defined early and included in requirements with vendors.
Requirements should also specify that obsolete methods, such as the use of host
files, are not allowed.
Encryption
Encryption transforms data so that it is unreadable. Even if someone gains
access to the data, they cannot read it unless they decrypt it. The data to be
encrypted, also called plaintext or cleartext, is transformed using an encryption
key. The encryption key is a value that is combined with the original data to
create the encrypted data, also called ciphertext. The same encryption key is
used at the receiving end to decrypt the ciphertext and obtain the original
cleartext.
In addition to ensuring data is not read by the wrong people, encryption protects
data from being altered in transit and verifies the sender’s identity.
In symmetric encryption, the key used to encrypt and decrypt the message must
remain secure, which explains the alternate name private-key encryption.
Anyone with access to the encryption key can decrypt the data. Using symmetric
encryption, a sender encrypts the data with the key, sends the data, and the
receiver uses the same key to decrypt the data. This is shown in Figure 5-32.
Asymmetric encryption uses two keys, one for encryption and one for
decryption. This is shown in Figure 5-33. The encryption key is known as the
public key. It is freely available to everyone to encrypt messages. This is why
asymmetric encryption is also known as public-key encryption.
Figure 5-33. Asymmetric encryption.
Asymmetric key systems provide a high level of security, but their complexity
makes them slower and computationally more demanding than symmetric
encryption. Hybrid encryption systems combine the advantages of the two: the
secure key exchange of public-key encryption and the speed of symmetric
encryption.
In the hybrid system, a public key is used to safely share the symmetric
encryption system’s private key. The actual message is then encrypted using that
key and sent to the recipient.
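A minimal sketch of this hybrid pattern is shown below using Python’s third-party cryptography package; the algorithms (RSA with OAEP padding, Fernet for the symmetric layer), key size, and message are illustrative assumptions rather than recommendations for any particular system.

    from cryptography.fernet import Fernet
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    # The receiver publishes an RSA public key; the private key stays secret.
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = private_key.public_key()

    # Sender: encrypt the message with a fresh symmetric key, then wrap
    # (encrypt) that symmetric key with the receiver's public key.
    sym_key = Fernet.generate_key()
    ciphertext = Fernet(sym_key).encrypt(b"example plaintext message")
    wrapped_key = public_key.encrypt(sym_key, oaep)

    # Receiver: unwrap the symmetric key, then decrypt the message.
    recovered_key = private_key.decrypt(wrapped_key, oaep)
    plaintext = Fernet(recovered_key).decrypt(ciphertext)
    assert plaintext == b"example plaintext message"

Only the short symmetric key passes through the slower asymmetric operations; the bulk of the data is handled by the faster symmetric cipher.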
ISASecure
The ISA Security Compliance Institute (ISCI) is a nonprofit organization that has
developed several product certification programs for IACSs and the components
of these systems.101 These programs are based on certification around the
ISA/IEC 62443 Series, Security for Industrial Automation and Control Systems.
Although many vendors have SDLA, SSA, and CSA certification, at the time of
this writing, it is still not common for asset owners to demand certified vendors,
systems, or components. As noted at the beginning of this chapter, there are
substantial benefits to building facilities around certified vendors and products,
just as there are with hazardous-area certified equipment.
The main driver for vendors to obtain certification is market pressure. Few
vendors will take the initiative to invest in certification without a business case
for a return on that investment. Unfortunately, many asset owners still do not
fully understand automation systems security. Many security questionnaires in
requests for proposal include questions oriented entirely around information
security, such as the following:102
• Are you certified and/or audited to any information security or quality
standards such as ISO/IEC 27001, ISO 9001, SAS 70, or PCI DSS?
• Will any <asset owner> information be stored, processed, or accessed
from outside of <country>?
• What security controls are in place to keep <asset owner> systems and
data separate from other client data?
• Will access to <asset owner> information held on your systems be able to
be gained via a remote connection?
These questions are important, but without asking for ISA/IEC 62443
certification or any details of automation systems-related controls, there is no
requirement for vendors to learn about or pursue them.
When asset owners finally demand certified automation system vendors and
products, the business case will be clear and vendors will comply. This
compliance will greatly improve the inherent security of automation systems
products.
Summary
Despite the general awareness of cybersecurity risks, many asset owners and
vendors are still not providing or maintaining secure automation systems.
Although some automation vendors have begun developing their own secure
architectures, and some have obtained third-party certification, there is still much
to be done.
Asset owners that are security-aware have developed their own internal
automation systems security standards. They have been designing and reviewing
solutions from vendors. The lack of consistency of approach, even within asset-
owner organizations, introduces additional cost while failing to achieve the most
secure outcome.
Even with the most secure solution feasible, an asset owner will still experience
cyber incidents. Being prepared for these, with proven tested incident response
and disaster recovery plans, supported by backup and recovery processes, will
make the difference between a minor and major outage or incident.
____________
70 The National Electrical Code (NEC) defines hazardous-area classifications in the United States (NEC
Article 500). An NEC hazardous-area classification consists of several parts: the class, group, and
division. Worldwide, outside the United States, IEC standard IEC 60079 defines hazardous-area
classifications using class and zone (this classification method is known as ATEX, an abbreviation of
the French atmosphères explosibles).
71 ANSI/ISA-62443-4-2 defines the requirements for component products; these can be embedded
devices, host devices, network devices, and software applications.
72 ANSI/ISA-62443-3-3 defines the requirements for an IACS system based on security level.
73 ISASecure System Security Assurance (SSA) certifies that products have the capability to meet the
requirements in ANSI/ISA-62443-3-3 and have been developed in accordance with a Security
Development Lifecycle Assurance (SDLA) program. ISASecure Component Security Assurance
(CSA) certifies that component products have the capability to meet the requirements in ANSI/ISA-
62443-4-2 and have been developed in accordance with an SDLA program.
74 ANSI/ISA-62443-3-3 (99.01.01)-2013, Security for Industrial Automation and Control Systems –
Part 3-3: System Security Requirements and Security Levels (Research Triangle Park, NC: ISA
[International Society of Automation]).
75 Listed in ANSI/ISA-62443-3-3.
76 The diagram is for illustrative purposes. The number of components in each system will vary
depending on facility requirements.
77 Theodore J. Williams, The Purdue Enterprise Reference Architecture: A Technical Guide for CIM
Planning and Implementation (Research Triangle Park, NC: Instrument Society of America, 1992).
78 PERA Enterprise Integration (website), Gary Rathwell, accessed June 21, 2021, http://www.pera.net/.
79 IEC 62264-1:2013, Enterprise-Control System Integration (Geneva 20 – Switzerland: IEC
[International Electrotechnical Commission]).
80 ISA-62443-1-1-2007, Security for Industrial Automation and Control Systems – Part 1-1:
Terminology, Concepts, and Models (Research Triangle Park, NC: ISA [International Society of
Automation]).
81 Williams, The Purdue Enterprise Reference Architecture, 146.
82 ANSI/ISA-62443-1-1-2007, Security for Industrial Automation and Control Systems.
83 ISA-62443-1-1-2007, Security for Industrial Automation and Control Systems, 60.
84 “Is the Purdue Model Dead?”
85 ”Industry 4.0,” University of West Florida (website), accessed June 21, 2021,
https://uwf.edu/centers/haas-center/industrial-innovation/industry-40/.
86 Williams, The Purdue Enterprise Reference Architecture, 144.
87 The WannaCry incident involved exploiting a vulnerability in Microsoft Windows and resulted in
over 230,000 computers in 150 countries being infected with ransomware. Timothy B. Lee, “The
WannaCry Ransomware Attack Was Temporarily Halted. But It’s Not Over Yet,” Vox, May 15, 2017,
accessed June 21, 2021, https://www.vox.com/new-money/2017/5/15/15641196/wannacry-
ransomware-windows-xp.
88 As noted earlier, to maintain resilience, a local NTP service is provided, so there is no NTP traffic
required from the DMZ to the Corporate Zone.
89 This book uses the newer convention of server and client. This convention was adopted by the
Modbus Organization on July 9, 2020. See https://www.modbus.org/docs/Client-ServerPR-07-2020-
final.docx.pdf for further details.
90 The specific ports are shown for example only and are not intended to reflect particular products or
solutions or any changes in products or solutions after this book is published.
91 Good firewall configuration procedures require the association of unique names with IP addresses to
improve the readability of a ruleset.
92 FortiGate 7060E chassis. Fortinet, “FortiGate® 7000E Series FG-7060E, FG-7040E, and FG-7030E
Datasheet,” accessed June 28, 2021, https://www.fortinet.com/content/dam/fortinet/assets/data-
sheets/FortiGate_7000_Series_Bundle.pdf.
93 Tofino Argon 100 security appliance. Tofino, “Argon Security Appliance Data Sheet,” DS-TSA-
ARGON, Version 5.0, accessed June 28, 2021, https://www.tofinosecurity.com/sites/default/files/DS-
TSA-ARGON.pdf.
94 There are additional function codes specified in the Modbus protocol. Some vendors have their own
function codes for product-specific features. The commands specified here are commonly used by
most systems.
95 This is an example only and does not reflect any particular vendor solution.
96 Fortum, “Siemens Carried Out First Remote Start-Up and Adjustment Work in Russia at Nyagan
GRES,” accessed June 21, 2021, https://www.fortum.com/media/2020/06/siemens-carried-out-first-
remote-start-and-adjustment-work-russia-nyagan-gres.
97 This approach may lead to inconsistencies, with different vendors approving different patches or
signatures, or not adequately testing against all patches before approving. Application control may
provide a more consistent approach if implemented systematically.
98 Geofencing is the use of location-based services to locate users, and that information is used to make
decisions, in this case to provide remote access. Margaret Rouse, “What Is Geo-Fencing
(geofencing)?” WhatIs.com, accessed June 21, 2021.
99 A process control narrative, or PCN, is a functional statement describing how automation system
components should be configured and programmed to control and monitor a particular process,
process area, or facility.
100 A host file is a clear text file stored on a server or workstation that contains a list of hostnames and
associated IP addresses. It is a simplified, decentralized alternative to DNS, in which network
devices instead query a central service that maintains and updates the list in real time.
101 See https://www.isasecure.org/en-US/About-Us for more details.
102 These questions are similar to a sample from a real questionnaire from an asset owner. Identifying
details have been removed where applicable.
6
Pitfalls of Project Delivery
Introduction
Most cybersecurity literature and training reference the challenge of applying
cybersecurity controls to legacy equipment. Typical of these references are
“There is a large installed base of SCADA [supervisory control and data
acquisition] systems, ranging from current levels of technology back to
technologies from the 1980s (and possibly older),”103 and “In the long run,
however, there will need to be basic changes in the design and construction of
SCADA systems (including the remote terminal units—RTUs) if they are to be
made intrinsically secure.”104
One might think that a project involving a new facility, or an upgrade to an
existing facility, would present the ideal opportunity to resolve this challenge.
Unfortunately, this is not the case. Despite the widespread awareness of the
cybersecurity threat and the availability of standards, certified products, certified
professionals, and collective experience, systems are being deployed that lack
the most basic security controls. In addition, the projects themselves create
additional security vulnerabilities due to poor training, awareness, and oversight
among personnel. In addition, a focus on efficiency and cost reduction means
that many of the duties involved in managing cybersecurity are added to existing
workloads, rather than to dedicated professionals with the right mix of skills and
knowledge.
The key factors required to correct these issues are
• secure senior project leadership support,
• embed cybersecurity throughout the project,
• embed cybersecurity requirements in all contracts,
• raise awareness within the project team, and
• implement rigorous oversight processes.
Asset owners with a strong cybersecurity posture will have adopted standards or
policies that dictate how projects deliver secure solutions. However, even with
this level of control, it is possible that cybersecurity management could be more
effective.
One key factor is project execution. Large infrastructure projects use a form of
contract called engineering, procurement, and construction (EPC). This contract
is between an asset owner and a contractor. Well-known EPC contractors include
Bechtel, Black & Veatch, Burns & McDonnell, Fluor, McDermott, Saipem,
Worley, and Wood. EPC contracts are referred to as turnkey because the
contractor provides a complete facility for the asset owner who only needs to
turn a key to start it up.
Because EPC contracts cover the entirety of the project life cycle and the
delivery of the entire project scope, responsibility for cybersecurity design and
governance should be explicitly included. Unfortunately, at the time of this
writing, EPC contracts typically do not identify cybersecurity as a major
element. This is largely due to the small financial value involved. This oversight
can result in a lack of ownership of issues during the project and creates the
potential for gaps in the final deliverables.
Table 6-1 shows the typical EPC project stages together with key cybersecurity
considerations for each stage.
• Conceptual engineering: cybersecurity risk comparison for high-level logical design options
Feasibility
As previously noted in Chapter 2, EPC projects tend to run for many years. For
the systems and networks being implemented, cybersecurity is more than a
design or assurance issue105. During the feasibility stage, the project team should
consider the risks related to each subsequent phase of the project. This includes
the time to deploy controls, procedural or technical, to manage these risks.
Establishing a solid foundation of good governance, as part of a comprehensive
cybersecurity management system, will reduce delays to a project. Putting this
system in place early may prevent a cybersecurity incident or last-minute
technical issues.
Consider the December 2018 Shamoon 3 cyberattack that targeted service
providers in the Middle East. EPC contractor Saipem suffered significant
disruption at locations in the Middle East, India, Aberdeen, and Italy. According
to a Reuters report, up to 400 servers and 100 workstations were crippled by the
attack.106 The impact was not limited to Saipem. Its customers all over the world
suffered disruption to their projects as they were forced to take action to prevent
being drawn into the attack. Such actions included disabling user accounts and
removing access to systems. These preventive measures forced projects to find
workarounds until the incident was satisfactorily addressed. These workarounds
can introduce new security vulnerabilities that require additional attention to
avoid increasing exposure to attack. Saipem customers invested significant time
and effort investigating the incident to determine their exposure. Sensitive
information may have been exfiltrated, and accounts may have been
compromised. This weeks-long investigation distracted employees and drew
down resources needed for the actual EPC project.
If Saipem and its customers had recognized this risk during the early stages of
the project, they could have developed defenses such as awareness training,
monitoring, and joint incident response plans. Although these mitigations may
not have prevented the attack, they would have reduced its impact on the project.
Engineering
The engineering phase of the project focuses heavily on the design of
construction elements, for example, the fabrication of a vessel or oil and gas
platform, or the construction of a treatment plant.
During this phase, automation system vendors refine the details of their solution
in several ways:
• Defining the list of data, including the instrument type, location, and data
type
• Defining the control strategy
• Conducting or contributing to hazard and operability studies (HAZOP),
including a control system HAZOP (CHAZOP) that focuses specifically
on failures of the control system
• Identifying interfaces with other systems
• Designing the physical network arrangement
• Designing the cabinet arrangement and cabling details
Decisions made in the engineering phase can have significant impacts later in the
project or during operations. For example, there is a misconception that isolating
automation system equipment from other networks addresses cybersecurity
threats to that system. Obviously, this is not the case. Isolated systems are
exposed to many cybersecurity threats, including the use of uncontrolled
removable media. Furthermore, the isolation of automation systems creates
operational challenges that reduce cybersecurity posture. For example, operating
system patches and anti-malware updates must be transferred manually using
removable media, rather than through secure network-based mechanisms. As
discussed, manual processes are vulnerable to failure.
Therefore, cybersecurity should be treated as a key design consideration during
the engineering phase. Issues that must be considered are as follows:
• Updating equipment with patches during operation – This must
include consideration of how vendor-approved patches can be delivered
to the equipment and how they can be applied with minimal operational
disruption.
• Maintaining anti-malware protection – This may involve the use of
application control, or antivirus software. In either case, the vendor will
advise on configuring the equipment to work with the protection.
• Maintaining, testing, and restoring backups – Automation systems do
not typically need frequent backups. Still, backups of machines must be
available in the event of a disaster situation. A proven process for
restoring these backups is also needed. A failure to consider a practical
means for storing and retrieving backups, or a failure to practice the
process, can result in extended periods of downtime.
• Managing user access – Many automation systems have elementary
access control features such as simple Windows Workgroup accounts. In
many cases, these accounts are shared between users. A facility with
multiple automation systems is difficult to manage effectively.
Periodically changing passwords, and updating user accounts for joiners,
movers, and leavers will involve manual processes vulnerable to failure.
• Managing remote access – Many automation system vendors require
some level of remote access. This access might provide a data stream
from the system to allow condition monitoring or to allow remote
diagnosis during failure situations. Many projects only address this
requirement in the operations phase. That is when the facility is added to
the vendor’s support agreement with the asset owner. At that point, there
is likely no physical space to house the necessary equipment, nor cabling
in place to enable the remote access. As a result, compromises are made.
Equipment is stored where space allows, rather than where it should be
located. Cables are run to accommodate this equipment, bypassing the
necessary segregation controls that were put in place for the rest of the
network. Remote access requirements are discussed further in Chapter 5,
“Standardized Design and Vendor Certification.”
A common failure during project design is not standardizing on shared resources.
This is discussed in detail in Chapter 5. Automation system vendors may have
their own solutions for backup, patch management, anti-malware, and user
access. However, in a facility with multiple vendor systems, the asset owner
should define these features and require the vendors to use them. Otherwise, the
asset owner must manage multiple solutions during the operations phase.
Construction
There are two major issues relating to cybersecurity during the construction
phase of a project:
1. Management of change
2. Incident response preparedness
Management of Change
Despite the best efforts of everyone involved in the engineering phase, errors and
omissions will occur that must be corrected. A typical example is the need to run
additional cables to accommodate system connections.
Some requirements are omitted from the project scope entirely. For instance,
equipment required for vendor remote access to its system may be excluded
because it is considered part of a separate maintenance contract. As a result,
changes may be needed to accommodate this equipment later, as well as
additional cabling to provide connectivity.
Not all changes stem from omissions of known requirements. Due to the
long-term nature of the project, new requirements may arise. For example, the
asset owner may incorporate new equipment to support additional production
capacity.
For the automation vendors, this phase of the project can last well over 12
months. During this time, numerous individuals from the asset owner, EPC
contractor, and the automation system vendor will come into contact with the
automation system equipment.
Basic cyber hygiene tasks, such as anti-malware protection, backup, and
electronic access management, should be performed during the construction
phase, but this is not always the case. This negligence can be attributed to poor
cybersecurity awareness, limited oversight, and minimal contractual obligations.
Many automation system vendors assume that because the equipment is not
operational, these tasks are not necessary. These important steps are often seen
as time-consuming and overly cautious for equipment that is still under
development. However, the risk to the project timescale and associated cost of
neglecting basic cyber hygiene is significant. For example:
• A failure to maintain regular backups could result in a loss of several
days, or even weeks, of progress in the event of a cybersecurity incident.
• A failure to maintain rigorous electronic access control, especially with
respect to joiners, movers, and leavers, can lead to a compromise of
systems.
• As with operational automation systems, there is a misunderstanding that
because these systems are not usually directly connected to the Internet,
they are not at risk from external threats. These systems are often
indirectly connected to the Internet,107 for example, through a connection
to the office network allowing developers to work from their desks.
When the automation system vendor is operating from a temporary
facility, there is an even greater chance of indirect connectivity through
poorly managed temporary firewalls.
• Even with no connection to the Internet, there is a major risk that systems
could be compromised from within. This is especially true with poor
management of removable media. This risk is increased when secure file
transfer facilities are not provided. Developers will need a secure means
to transfer files to and from servers and workstations on the automation
system.
The incident response plan must identify incident handling procedures and
categorize these procedures for four stages: preparation; detection and analysis;
containment, eradication, and recovery; and post-incident activity.
Commissioning
The deferral of activities, such as testing, to later in the project may save time up
front. However, putting off testing can lead to problems that may cause delays.
Resulting incidents may require last-minute, high-risk workarounds or changes
that are not properly documented. Better planning and execution of testing
earlier in the project should avoid the need for major changes during
commissioning.
Red-Team Assessment
A red-team assessment is an important tool in the verification of cybersecurity
posture. The assessment gets its name from military wargaming, where conflicts
are simulated between an aggressor, the red team, and a defending force, the blue
team. Red-team assessments in cybersecurity involve experts attempting to
achieve a target, such as access to a certain machine or other resource. The
exercise identifies vulnerabilities that can then be addressed. There are other
methods of identifying vulnerabilities, such as penetration testing. Red-team
assessments, if conducted properly, reflect realistic scenarios that may occur.
These assessments identify vulnerabilities in technology, people, or processes.
Although the commissioning phase is hectic, it is an opportune time to conduct a
red-team assessment. It is likely impractical to conduct such an assessment
earlier in the project. Prior to commissioning, many of the systems and networks
are not fully operational. For similar reasons, the scope of testing security
controls during factory acceptance testing (FAT) may be limited and still not
fully representative of the final facility. For instance, the physical security
element of a red-team assessment is not indicative of the actual controls that will
be in place. However, a red-team assessment also provides realistic training for
the operations personnel acting as the blue team in the exercise.
Typical objectives for an automation system red-team assessment might be as
follows:
• Gain remote access to the safety engineering workstation. This would
test whether this critical device is properly segregated on the network. It
will also indicate if the device is adequately protected by access controls.
These controls include username and password as well as a second factor
that requires physical presence, such as a fingerprint or key card.
• Gain local access to a control system HMI that allows set-point changes.
This would test physical security, including locked rooms and cabinets. It
also tests local access controls, such as key cards. Figure 6-4 shows a red-team
member testing physical security controls as part of an assessment during a
construction project.
Figure 6-4. Red team testing physical security controls during an assessment.
Start-Up
The highest profile milestone in any project is start-up. Start-up is the
culmination of the project and highly symbolic. In some cases, completion of the
project may be strategically significant to the organization. Any delay may have
a negative impact on share price. As a result, there is a great deal of focus on the
start-up date. Start-up is the last chance to eliminate any cumulative delays
created during the project. There will be significant pressure from management
to make up this lost time and not miss the start-up date. As we saw with
commissioning, time pressure can cause important steps to be skipped during the
start-up phase.
Even without such attention, a new facility is vulnerable during the early stages
of operation. With a new facility and new systems, operators and technicians will
not be familiar with normal behavior and will be slower to identify abnormal
situations. Training and incident response exercises throughout a project prepare
personnel for start-up and beyond. The project’s incident response plan will be
updated to reflect changes in circumstance and include new threats.
This pressure at the end of a project creates a significant risk of poor-quality
or incomplete as-built documentation:
• Late and over-budget project teams must cut costs and reduce hours. An
obvious place to start is final deliverables, especially if the savings
outweigh the investment required.
• As the project nears completion, team members begin to disperse,
moving on to new projects or roles. They take with them the knowledge
needed to verify documentation.
There is an international standard associated with asset planning that could help
address this documentation challenge.
ISO 15926 is the standard for data integration, sharing, exchange, and
handover.108 There is an initiative based on this standard, led by the International
Association of Oil & Gas Producers (IOGP), called Capital Facilities
Information Handover Specification (CFIHOS).109 CFIHOS utilizes the ISO
15926 definitions for a common language, format, and exchange of data.
Utilizing the standard and the CFIHOS initiative has the potential to standardize
the sharing of information across industries and projects. Ultimately, this would
mean projects need only specify conformance to this standard. That would be a
vast improvement over the current practice of projects defining their own
standards and methods, many of which EPCs and other stakeholders may not
completely follow.
In some sectors where cybersecurity posture is higher (e.g., oil and gas), asset
owners provide a set of cybersecurity requirements to contractors and then
conduct assessments on deliverables to confirm that these requirements are being
met.
One important consideration for EPC projects is that the EPC will issue its own
contracts to subcontractors and vendors. Asset owners should therefore ensure
that a contract with the EPC stipulates what cybersecurity conditions must be
passed on to subcontractors and others working for the EPC.
Key considerations for cybersecurity in contracts are as follows:
• Explicit milestones, deliverables, and payments related to cybersecurity.
Examples include successful completion of design review(s), successful
red-team assessment, and closure of cybersecurity punch-list items or
actions. Payment terms for these milestones and deliverables must be
significant enough that the EPC or vendor is motivated to complete them.
• Quality assurance of handover documentation. As noted earlier, data
handed over in projects is often of poor quality. At the end of a project,
the EPC or vendor may not be willing to invest the additional time
required to clean up the data.
• Project-related cybersecurity activities. This would include a
cybersecurity incident response plan that addresses how the EPC or
vendor will deal with a cybersecurity incident on the project. The
contract should explicitly state that the EPC or vendor is responsible for
maintaining the cybersecurity posture of all equipment during the project
life cycle. This includes the patch status of the operating system for
servers and workstations, and the awareness of the EPC or vendor’s
employees and contractors.
Some asset owners now specify certified secure products in their contracts. The
popularity of this approach will continue to grow, as a standard provides an
independent means of assessing the security of products.
Chapter 7, “What We Can Learn from the Safety Culture,” covers awareness in
detail.
Implement Rigorous Oversight Processes
Rigorous oversight of project cybersecurity comprises three processes:
• requirement verification,
• risk and issue management, and
• performance management.
Verification of Requirements
Contracts must include key requirements, but including them does not ensure the
requirements will be met. A common verification tool is a cybersecurity
questionnaire completed by the vendor and reviewed by an assessor.
The viability of the questionnaire depends on the quality of the assessor, the
knowledge of the responder, and the availability of the information needed to
answer the questions. These questionnaires are often completed late in the
project life cycle. That means more information is available, but it may be too
late in the project to address nonconformances.
A better approach is to verify that the product vendor has completed the
questionnaire before any contract is let. The vendor can also be asked to provide
responses such as compliant, optional at extra cost, and not compliant.
It would still be necessary to validate that what is delivered meets the original
requirements. That validation is easier to accomplish based on information
provided by the vendor. The National Cyber Security Centre has produced
compliance guidelines for Operators of Essential Services (OES), and Appendix
B of these guidelines provides a helpful checklist that can be used if no specific
questionnaire exists.110
Performance Management
Effectively tracking the performance of a contractor or vendor is critical to the
success of a project. EPC projects typically use S-curves as a visual
representation of planned and actual progress. Figure 6-5 shows a simple
example of planned working hours per month (bars) and cumulative hours (line).
The S-curve gets its name from the fact that the cumulative hours line is S-
shaped. Depending on what is being tracked, the bars and lines might represent
other metrics, such as deliverables (number of HMI screens completed, number
of cabinets assembled, etc.).
Figure 6-5. Example S-curve.
The shape of the S-curve represents what should happen, in terms of progress, at
any stage in the project: a slow start, ramping up to peak activity, followed by a
decline as the remaining work tapers off toward completion.
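As a rough illustration, the Python sketch below plots a planned S-curve from hypothetical monthly hours; the figures are invented for demonstration and do not represent any real project.

    import itertools
    import matplotlib.pyplot as plt

    # Hypothetical planned hours per month for a 12-month activity.
    months = list(range(1, 13))
    planned = [200, 400, 700, 1000, 1200, 1300, 1300, 1200, 1000, 700, 400, 200]
    cumulative = list(itertools.accumulate(planned))

    fig, ax1 = plt.subplots()
    ax1.bar(months, planned, color="lightgray")     # hours per month (bars)
    ax1.set_xlabel("Month")
    ax1.set_ylabel("Planned hours per month")
    ax2 = ax1.twinx()                               # second axis for the line
    ax2.plot(months, cumulative, marker="o", color="black")
    ax2.set_ylabel("Cumulative hours")
    plt.title("Planned progress S-curve")
    plt.show()

The cumulative line starts slowly, steepens through the peak months, and flattens toward completion, producing the characteristic S shape.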
The reality is often quite different. Figure 6-6 shows an example of plan versus
forecast for a project activity. For various reasons, projects take longer to start
than planned. The usual response is to show a forecast where more work is done
later (or backloaded) to achieve the planned target date.
Summary
Today, projects that deliver new automation systems or enhancements to existing
systems routinely introduce new cybersecurity vulnerabilities in organizations.
In addition, the projects themselves contain vulnerabilities that can impact the
organization. The lack of understanding of cybersecurity risks is a major factor,
as is the failure to correctly manage cybersecurity. This chapter has identified the
key factors for successfully managing cybersecurity: secure senior project
leadership support, embed cybersecurity throughout the project, embed
cybersecurity requirements in all contracts, raise awareness within the project
team, and implement rigorous oversight processes.
Organizations can leverage many resources to improve results, including the
standards, certified products, certified professionals, and collective industry
experience referenced throughout this chapter.
____________
103 William T. Shaw, Cybersecurity for SCADA Systems (Tulsa, OK: PennWell Corporation, 2006), 389.
104 Shaw, Cybersecurity for SCADA Systems, 390.
105 Also called front-end engineering design, or FEED.
106 Stephen Jewkes and Jim Finkle, “Saipem Says Shamoon Variant Crippled Hundreds of Computers,”
Reuters, December 12, 2018, accessed June 21, 2021, https://www.reuters.com/article/us-cyber-
shamoon/saipem-says-shamoon-variant-crippled-hundreds-of-computers-idUSKBN1OB2FA.
107 According to a 2019 Dragos report, 66% of incident response cases involved adversaries directly
accessing the Industrial Control System (ICS) network from the Internet. Dragos, “2019 Year in
Review,” accessed June 21, 2021, https://www.dragos.com/wp-
content/uploads/Lessons_Learned_from_the_Front_Lines_of_ICS_Cybersecurity.pdf.
108 POSC Caesar Association, “An Introduction to ISO 15926,” November 2011, accessed June 21,
2021, https://www.posccaesar.org/wiki/ISO15926Primer.
109 Capital Facilities Information Handover Specification, International Association of Oil & Gas
Producers (IOGP), “More About CFIHOS,” accessed June 21, 2021, https://www.jip36-
cfihos.org/more-about-cfihos/.
110 National Cyber Security Centre (NCSC), “NIS Compliance Guidelines for Operators of Essential
Service (OES),” accessed June 21, 2021,
https://www.ncsc.gov.ie/pdfs/NIS_Compliance_Security_Guidelines_for_OES.pdf.
7
What We Can Learn from the
Safety Culture
Introduction
Cybersecurity awareness training is a common tool employed by many
organizations. What constitutes awareness training and who receives it can vary
considerably. Any cybersecurity awareness training is better than none, but
training designed for those in information technology (IT) environments is not
sufficient for those in operational technology (OT) environments. This
distinction is lost in many organizations where training, like other aspects of
cybersecurity, is managed by the IT function. Such generic training neglects the
operational and cultural differences in OT facilities.
This chapter will identify the operational and cultural differences between an IT
and an OT environment. Taking these differences into account, it will explore the
essential elements of cybersecurity awareness training and monitoring required
for an OT environment.
The Importance of Awareness
Visit any OT facility today and you will likely find several obvious cybersecurity
policy violations or bad practices, such as the one shown in Figure 7-1.
Figure 7-1. Control room console with user credentials visible on a permanent label.
Underestimating Risk
Skepticism is a significant factor in poor cybersecurity preparedness. By now,
most individuals are familiar with one or more high-profile cybersecurity
incidents; some even may have been impacted by one. Despite this growing
awareness, it seems many people in OT environments feel cybersecurity is not
their problem. Their views fall into one of two camps:
1. The likelihood of a cyber incident is low because either the
organization is not a target, or it has not happened in the past.
2. The consequence of a cyber incident is low because many layers of
protection are in place.
As noted in Chapter 4, “Measure to Manage Risk,” it is essential
that organizations take a different approach to estimating the likelihood and
consequences of cybersecurity risk in OT environments.
Empirical probability calculates a likelihood based on historical data. It is not
well suited to providing future estimates when the historical data is sparse.
Bayes’s theorem is routinely used to estimate risk in finance (the risk of lending
money to new borrowers) and medicine (the accuracy of a medical test). It can
be used to estimate the likelihood of an event in situations with sparse historical
data.
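As a simple illustration, the sketch below applies Bayes’s theorem to an intrusion-detection alert; the prior, detection rate, and false-alarm rate are assumed values chosen only to demonstrate the calculation.

    # Bayes's theorem: P(A|B) = P(B|A) * P(A) / P(B)
    prior = 0.01            # assumed P(intrusion) before any evidence
    detection_rate = 0.90   # assumed P(alert | intrusion)
    false_alarm = 0.05      # assumed P(alert | no intrusion)

    # Total probability of observing an alert.
    p_alert = detection_rate * prior + false_alarm * (1 - prior)

    # Posterior probability of an intrusion, given that an alert fired.
    posterior = detection_rate * prior / p_alert
    print(f"P(intrusion | alert) = {posterior:.1%}")  # about 15.4%

The prior can be updated as each new piece of evidence arrives, which is what makes the theorem useful where incident history is limited.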
Figure 7-2 shows the typical organization risk matrix from Chapter 4. The
typical underestimate of likelihood and consequence results in risks in the lower
left of the matrix. In cybersecurity risk assessment, likelihood is underestimated
because historical data is sparse and because of the assumption that what has not
happened in the past will not happen in the future. Underestimating
consequence occurs because of the failure to adequately take into account the
process-based risk. Realistic estimates of likelihood, using statistical methods,
and consequence, using process-based risk assessment, will almost certainly
result in risks moving into the lower right of the matrix. The difference is
dramatic, moving from low to extreme according to the definitions in this
particular risk matrix.115
Figure 7-2. Applying realistic estimates of likelihood and consequence changes risk level.
Human Error
In an article titled “The Sorry State of Cybersecurity Imagery,” Eli Sugarman
and Heath Wickline note that images online are “all white men in hoodies
hovering menacingly over keyboards, green Matrix-style 1s and 0s, glowing
locks and server racks, or some random combination of those elements—
sometimes the hoodie-clad men even wear burglar masks. Each of these images
fails to convey anything about either the importance or complexity of the
topic.”116 It is therefore no surprise that most people’s perception of a
cybersecurity incident is limited.
The potential for human error, and the limitations in mitigating this risk,
highlight the importance of people in cybersecurity. No amount of technology or
procedures can completely mitigate the potential for a human-initiated
cybersecurity incident. This is borne out by an analysis of the initiating cause of
high-profile cybersecurity incidents. Almost without exception, a human was
involved, clicking on a link in a phishing email, failing to deploy patches, failing
to follow removable media procedures, and so on.
Unfortunately, many organizations still try to address the cybersecurity challenge
by deploying more and more technology and creating more and more rules. They
do not recognize or address the significance of humans. Strict rules may provide
the appearance that cybersecurity is under control. A bowtie diagram with many
additional barriers will help support this argument. However, this approach can
actually lead to complacency on the part of the individuals. Workers assume that
adequate controls are in place, or that cybersecurity is someone else’s job. Even
individuals aware of the importance of cybersecurity may consider the threat
mitigated by physical security, technical controls, and procedures. This attitude
may result in a more casual approach to other controls, such as limiting
electronic access and the use of removable media. A common example of this in
OT environments is the deployment of universal serial bus (USB) locks on
equipment such as HMIs or servers. Personnel will claim these controls are not
necessary because the equipment is in locked cabinets inside secure rooms. In
fact, it is not unusual to discover these rooms are not adequately secured, and the
cabinets are left with keys in the locks. Even the keys themselves are commonly
available, and the same key can usually open most cabinets from the same
manufacturer.
Jessica Barker is the CEO of Cygenta, a cybersecurity provider specializing in
assessment and awareness services. In her book Confident Cyber Security, she
warns against the phrase “users are the weakest link.” Instead, she argues that we
should do more to understand the challenges users face and identify what causes
them to take these risks.121
To highlight this issue, Barker presents a case study: A finance administrator
receives an email from the company’s chief executive officer (CEO) instructing
him to transfer funds to an account. This transfer is urgently needed to secure the
acquisition of a new business. The finance administrator feels pressured to
respond promptly and transfers the funds without any further validation.122 This
may seem unlikely, yet this CFO fraud, or whaling,123 is very real. Since 2013,
more than $12 billion has been lost to whaling in the United States, United
Kingdom, and Europe.124 In one high-profile incident, a finance executive at the
toy company Mattel transferred $3 million to cybercriminals. Mattel had a
procedure where wire transfers required two signatures. The signature of the
newly appointed CEO had been forged, and the finance executive did not do any
further validation.125
In her book, Barker points to economists Richard Thaler and Daniel Kahneman
for an insight into what drives otherwise rational, intelligent people to make such
glaring mistakes. Thaler and Kahneman have separately won the Nobel Prize in
economics for their research into behavioral economics and decision-making.
Each has identified two systems of thinking in human behavior: Thaler calls
them the Automatic System and Reflective System;126 Kahneman describes them
as Fast and Slow.127 The automatic/fast system is an important element of
thinking. It allows for rapid, autonomous reactions such as stepping out of the
path of an approaching car. The reflective/slow system is more deliberate and
involves complex decision-making. The application of these two systems creates
several heuristics and biases. One that is directly applicable to cybersecurity is
the availability heuristic, where decisions are influenced by how readily similar
events come to mind. Our perception of the consequences of an action is shaped
by whether we can recall those consequences occurring. There are many other
relevant considerations that
help explain why people make bad decisions. These should be factored into
policies, processes, and procedures, as well as cybersecurity awareness training.
It is not enough to expect people to make good decisions. Organizations must be
prepared for poor decisions as well.
Jessica Barker notes that “the burden for security often falls to the end user.” She
goes on to say that it is “not fair to ask people to add security” in the same way
that we “do not ask people to make sure the soft furnishings they buy are fire
resistant or the car they rent has been safety-tested.”129
James Reason says organizations should not think they are safe because there is no
information to say otherwise. This mind-set leads to less concern about poor
work practices or conditions. It may even reduce unease about identified
deficiencies in layers of protection. The same thinking should be applied to
cybersecurity. As noted earlier in this chapter, underestimation of cybersecurity
risk and overconfidence in the layers of protection lead to a lack of concern. Left
unchecked, this lack of concern can be expressed in the acceptance of bad
practices such as use of unapproved removable media, leaving cabinets
unlocked, leaving controllers open to remote programming, and poor account
management.
The more cybersecurity is embedded into the safety culture, the more likely it is
to be adopted as an integral part of operations rather than an afterthought.
The First Line of Defense
The phrase “users are the weakest link” underestimates the challenges users face
in dealing with cybersecurity. Organizations must recognize that badly trained
users operating without adequate procedures, tools, or management support are
likely to initiate most, if not all, cybersecurity incidents.
Users may be the weakest link, but they are the first line of defense for any
organization. To be effective, they must be aware of the critical role they play.
The blocks in the Automation Competency Model (ACM) should not be considered in isolation. Being effective in
automation systems cybersecurity requires the skills and knowledge defined in
other blocks in the ACM. Furthermore, most organizations have a mixture of IT
and OT personnel who have some role to play in automation systems
cybersecurity. Each role will need a minimum standard of competency that
covers a variety of areas.
To demonstrate this, consider Table 7-1. It provides a simplified competency
matrix with a set of competency areas (found in the ACM). These have been
categorized as information technology, operational technology, and emerging
technology. Some generic job roles have been provided to demonstrate the
mapping.
Each organization should produce its own competency matrix. The following are
key considerations when doing so:
Automation system product vendors (as well as system integrators and other
service providers) must understand cybersecurity so that they can design
products securely. Their organization must have secure development procedures
in place to validate that products are secure. These development procedures will
also improve the rigor in development and testing, providing a higher quality,
more reliable solution.
One way to provide assurance of this competence is to purchase independently
certified secure products and systems from certified vendors. Compliance with
ANSI/ISA-62443-4-1 requires that vendors be adequately trained and follow
rigorous secure development processes.
Continuous Evaluation
Training and competency are not one-time exercises. Personnel need continuous
learning to ensure they are aware of changes to policies and procedures, risks,
and mitigating controls. In addition, there must be a system of monitoring to
ensure that training is effective.
The organization can now create a trend line for PICO, which, in this example,
shows a steady increase toward three credible phishing incidents per month. This
number should be more meaningful to individuals. It is similar in structure to
well-known safety HiPo (high-potential) events they might be familiar with.
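Producing such a trend line is straightforward; the sketch below fits one with NumPy over invented monthly counts, taking PICO, per the surrounding text, as the number of credible phishing incidents observed each month.

    import numpy as np

    # Hypothetical monthly counts of credible phishing incidents.
    months = np.arange(1, 13)
    pico = np.array([0, 1, 0, 1, 1, 2, 1, 2, 2, 2, 3, 3])

    # Fit a first-order (linear) trend line.
    slope, intercept = np.polyfit(months, pico, 1)
    print(f"Trend: +{slope:.2f} incidents per month")
    print(f"Projected month 13: {slope * 13 + intercept:.1f} incidents")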
Summary
Cybersecurity is constantly in the news, so it may seem reasonable to believe
that people have a good awareness of the cybersecurity risks their organizations
face. However, evidence indicates otherwise as incidents continue to occur. This
trend is primarily driven by people failing to enforce good cybersecurity
management practices. Even in regulated industries, organizations still fail to
meet cybersecurity management requirements. This is largely due to personnel
not following procedures, and a lack of oversight and enforcement by
management.
A major factor underpinning this problem is that people underestimate their
organization’s cybersecurity risk. This might be because they have no
quantitative means of measuring the likelihood or consequence of an incident. It
can also be a result of complacency, believing that the other layers of protection
will prevent any serious consequences.
Having considered the negative aspects of people and their role in cybersecurity
incidents, note that the same people are also the first line of defense. This is not
limited to personnel within the organization. The entire supply chain is full of
individuals who can be effective controls in the management of cybersecurity, as
long as they are trained to be aware of the following:
• Why cybersecurity is so important
• How a cybersecurity incident can lead to a serious safety or operational
issue
• What cybersecurity controls are in place, and what happens if they fail
• What part each person plays in maintaining a good cybersecurity posture
for the organization
____________
111 See https://www.nerc.com/pa/comp/CE/Pages/Actions_2019/Enforcement-Actions-2019.aspx for
details.
112 Rebecca Smith, “Duke Energy Broke Rules Designed to Keep Electric Grid Safe,” Wall Street
Journal, updated February 1, 2019, accessed June 21, 2021, https://www.wsj.com/articles/duke-
energy-broke-rules-designed-to-keep-electric-grid-safe-11549056238.
113 A “forward-looking statement” is a recognized term in US business law that is used to indicate, for
example, plans for future operations or expectations of future events.
114 Duke Energy News Center, “Duke Energy Reaffirms Capital Investments in Renewables and Grid
Projects to Deliver Cleaner Energy, Economic Growth,” July 5, 2020, accessed June 21, 2021,
https://news.duke-energy.com/releases/releases-20200705-6806042.
115 This is for illustrative purposes only. Specific likelihood, consequence, and risk values will vary, but
the deviation is likely to be as dramatic when more rigorous methods are used.
116 Eli Sugarman and Heath Wickline, “The Sorry State of Cybersecurity Imagery,” July 25, 2019,
accessed May 12, 2022, https://hewlett.org/the-sorry-state-of-cybersecurity-imagery/.
117 Repository of Industrial Security Incidents, “2013 Report on Cyber Security Incidents and Trends
Affecting Industrial Control Systems, Revision 1.0,” June 15, 2013, available by request from RISI,
https://www.risidata.com/
118 It is unclear whether this increase is due to better measurement or more human error, or both.
119 IBM Security, IBM X-Force Threat Intelligence Index 2020, 8, accessed June 21, 2021,
https://www.scribd.com/document/451825308/ibm-x-force-threat-intelligence-index-2020-pdf.
120 IBM Security, IBM X-Force Threat Intelligence Index 2020.
121 Jessica Barker, Confident Cyber Security (London: Kogan Page Limited, 2020), 91.
122 Barker, Confident Cyber Security, 92–93.
123 Whaling is a method of targeting high-profile employees of organizations, not just chief financial
officers, and derives its name from the big catch during phishing.
124 Dante Alighieri Disparte, “Whaling Wars: A $12 Billion Financial Dragnet Targeting CFOs,” Forbes,
December 6, 2018, accessed May 12, 2022,
https://www.forbes.com/sites/dantedisparte/2018/12/06/whaling-wars-a-12-billion-financial-dragnet-
targeting-cfos/?sh=7d0da85a7e52.
125 Darren Pauli, “Barbie-Brained Mattel Exec Phell for Phishing, Sent $3m to China,” The Register,
April 6, 2016, accessed May 12, 2022,
https://www.theregister.com/2016/04/06/chinese_bank_holiday_foils_nearperfect_3_million_mattel_fleecing
126 Richard H. Thaler and Cass R. Sunstein, Nudge: Improving Decisions About Health, Wealth, and
Happiness (New Haven, CT: Yale University Press, 2008).
127 Daniel Kahneman, Thinking, Fast and Slow (New York: Farrar, Straus and Giroux, 2011).
128 A. Ertan and G. Crossland, Everyday Cyber Security in Organizations (Royal Holloway University of
London, 2018), 23.
129 Barker, Confident Cyber Security, 69.
130 James Reason, “Achieving a Safe Culture: Theory and Practice,” Work & Stress: An International
Journal of Work, Health and Organisations 12, no. 3 (1998): 302, accessed June 21, 2021,
https://www.tandfonline.com/doi/abs/10.1080/02678379808256868.
131 E. S. Geller, “10 Leadership Qualities for a Total Safety Culture,” Professional Safety, May 2020,
accessed June 21, 2021,
http://campus.murraystate.edu/academic/faculty/dfender/OSH650/readings/Geller—
10%20Leadership%20Qualities%20for%20a%20Total%20Safety%20Culture.pdf.
132 ”It’s Up to Me” and the extract from “I Chose to Look the Other Way” are reprinted with the
permission of the author, Don Merrell. Contact Don Merrell at [email protected] to inquire
about the use of his poems or to comment on their impact.
133 Career Onestop Competency Model Clearing House, “Automation Competency Model,” accessed
June 21, 2021, https://www.careeronestop.org/competencymodel/competency-
models/automation.aspx.
134 My thanks to Collin Kleypas for the original idea.
135 I am indebted to Don Merrell for providing this modified verse from his poem “It’s Up to Me.” It is
reprinted with the permission of the author, Don Merrell. Contact Don Merrell at
[email protected] to inquire about the use of his poems or to comment on their impact.
8
Safeguarding Operational
Support
Introduction
One of the distinguishing features of operational technology (OT) is the
operational life of the equipment. Information technology (IT) is refreshed every
18 months to 3 years to keep pace with the demands of users and their
applications. Conversely, OT equipment is designed for a specific, limited set of
functions. Once deployed, there is little desire to change it. Recall the adage, “If
it ain’t broke, don’t fix it,” from Chapter 2, “What Makes Industrial
Cybersecurity Different?” In fact, the high-availability environments where OT
exists create a unique operational support culture, one that does not lend itself to
good cybersecurity management. Shortcomings include the following:
Security technologist Bruce Schneier notes, “Things are changing; slowly, but
they’re changing. The risks are increasing, and as a result spending is
increasing.”136
In his blog, Schneier also discusses why vendors “spend so little effort securing
their own products.”
We in computer security think the vendors are all a bunch of idiots, but they’re behaving completely
rationally from their own point of view. The costs of adding good security to software products are
essentially the same ones incurred in increasing network security—large expenses, reduced
functionality, delayed product releases, annoyed users—while the costs of ignoring security are minor:
occasional bad press, and maybe some users switching to competitors’ products. The financial losses to
industry worldwide due to vulnerabilities in the Microsoft Windows operating system are not borne by
Microsoft, so Microsoft doesn’t have the financial incentive to fix them. If the CEO of a major
software company told his board of directors that he would be cutting the company’s earnings per
share by a third because he was going to really—no more pretending—take security seriously, the
board would fire him. If I were on the board, I would fire him. Any smart software vendor will talk big
about security, but do as little as possible, because that’s what makes the most economic sense.140
This too is changing.
In one example, an organization used such a barrier visualization for
communication at four levels within the company.
On its own, this barrier representation can be helpful, especially if the status of
the barriers can be determined by reading data from operational systems.
The real power of this approach comes if the barrier representation shown in
Figure 8-2 is updated to include a cybersecurity barrier. Now the barrier
representation reviewed at all four levels in the aforementioned organization
clearly shows the status of cybersecurity at the facility. The question Are we still
safe to operate? must now include the status of cybersecurity.
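As an illustration of how a live barrier view might be assembled, the sketch below rolls up barrier status from placeholder data sources, with cybersecurity included alongside the traditional process safety barriers; every name, value, and threshold is hypothetical.

    # Placeholder data sources; in practice these would read from
    # operational and security monitoring systems.
    def patch_compliance():
        return 0.97   # fraction of OT hosts with approved patches applied

    def backups_verified():
        return True   # most recent backup restore test succeeded

    # Each barrier maps to a health check.
    barriers = {
        "Process containment": lambda: True,
        "Safety instrumented system": lambda: True,
        "Cybersecurity": lambda: patch_compliance() > 0.95 and backups_verified(),
    }

    for name, check in barriers.items():
        print(f"{name}: {'HEALTHY' if check() else 'DEGRADED'}")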
People Management
As noted throughout this book, technology is not the only facet of the
cybersecurity challenge. People and processes are equally as important as
technology, and in some cases more important.
Many assume the threat of cybersecurity attack comes from individual hackers,
organized crime, or nation-states. In fact, for most organizations, the greatest
threat of cybersecurity incidents comes from inside the organization itself. This
threat comes in two forms:
1. Deliberate acts by malicious or disgruntled insiders
2. Unintentional acts resulting from human error or negligence
Background Checks
Employers need a means of verifying the integrity and honesty of their
employees. In general, people with a history of honesty are more likely to be
honest in the future. Conversely, applicants who lie to obtain a job are more
likely to be dishonest once they have the job. Interviews alone may not be
sufficient to weed out dishonest applicants.
Background checks may help. They can verify information such as past
employment and education. These checks may involve searching relevant public
or private databases, such as driving records, criminal histories, or credit reports.
The depth of the background check should be appropriate to the role being filled.
Background checks must be conducted in accordance with relevant employment
laws. In some jurisdictions, for instance, it is illegal to ask about a criminal
record on an application form. For this reason, background checks should be
performed by professional and competent individuals or organizations.
Separation of Duties
Separation of duties is one means of maintaining employee oversight. Separation
of duties involves ensuring that more than one person is required to complete a
particular task where safety or security might be at risk. This approach reduces
the risk of fraud, theft, and human error. Also known as the four-eyes principle
(each process involves two people), separation of duties typically means that one
person performs a sensitive action and a second person independently reviews or
approves it.
Where it is not possible or practical to separate duties, for example in very small
organizations, alternate controls should be in place. These include
• audit trails to track who took what action, and when, and
• periodic supervisory reviews of audit trails and other records to verify all
tasks are being performed as expected.
The information recorded in audit trails, and the frequency of reviews, should
match the level of risk involved. If the review frequency is too low, it may be too
late to take corrective action: the individuals involved may have left the
organization, or the consequences may be felt before the cause is known.
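To make this concrete, the following Python sketch shows one way an audit trail
and its periodic review might be implemented. The file name, record fields, and
review rule are illustrative assumptions, not a prescribed format.

    import csv
    from datetime import datetime, timezone

    AUDIT_LOG = "audit_trail.csv"

    def record_action(user, action, target, approver=None):
        # Append who did what, to which system, when (UTC), and who
        # provided the second pair of eyes (blank if none).
        with open(AUDIT_LOG, "a", newline="") as f:
            csv.writer(f).writerow([
                datetime.now(timezone.utc).isoformat(),
                user, action, target, approver or ""])

    def review_audit_trail(since):
        # Periodic supervisory review: flag entries newer than 'since'
        # that lack an approver, so a reviewer can verify each action.
        with open(AUDIT_LOG, newline="") as f:
            return [row for row in csv.reader(f)
                    if row[0] >= since and not row[4]]

For a high-risk system, a supervisor might run review_audit_trail weekly; for
lower-risk systems, a quarterly review of a sample of entries may suffice.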
Even with the four-eyes principle in place, periodic reviews are essential. In
2005, the US Food and Drug Administration (FDA):
…carried out an inspection of Able Laboratories, a New Jersey–based generic pharmaceutical
manufacturer between May 2 and July 1, 2005. As a result of finding discrepancies between paper and
electronic records in the analytical laboratory and due to the firm’s failure to investigate out-of-
specification (OOS) results, the company ceased manufacturing operations, recalled 3,184 batches of
product (its entire product line) and withdrew seven Abbreviated New Drug Applications (ANDAs).
The resulting problems and a failure to resolve the issue with the FDA resulted in a USD 100 million
bankruptcy filing in October 2005 and a fire sale of the company’s assets.147
When someone joins an organization, their data access needs should be clear.
There should be a formal process to arrange this access. That process should
incorporate the four-eyes principle to avoid misuse.
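The following Python sketch shows one way such a process might enforce the
four-eyes principle. The function, system names, and role mapping are
illustrative assumptions rather than features of any particular identity
management product.

    # Illustrative mapping of the roles each system permits.
    ALLOWED_ROLES = {
        "scada_hmi": {"viewer", "operator"},
        "historian": {"viewer"},
    }

    def grant_access(user, system, role, requester, approver):
        # The approver must be a second, independent person (four eyes).
        if approver == requester:
            raise PermissionError("approver must be independent of requester")
        # The requested role must be permitted on the target system.
        if role not in ALLOWED_ROLES.get(system, set()):
            raise PermissionError(f"role {role!r} not permitted on {system!r}")
        # The actual provisioning call to the identity system goes here.
        return {"user": user, "system": system, "role": role,
                "requested_by": requester, "approved_by": approver}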
Once someone is in a role, a periodic review process will ensure access is still
required. Changes should be made with immediate effect, and records should be
kept of the review and any actions arising. Accurate records allow for periodic
audits to ensure the processes are being followed.
The greatest risk to any organization is posed by leavers, especially if they are
disgruntled. The cases of Adam Flanagan and Vitek Boden, mentioned earlier in
this chapter, personify this risk. In both cases, the individuals were fired under
acrimonious circumstances, and each had administrator-level access to
business-critical systems. Flanagan had access to radio base stations used by his
former company’s customers, including numerous water authorities and sewage
treatment plants. Boden stole a laptop containing the control system
human-machine interface (HMI) software and a radio; with these, he made at
least 46 attempts to turn on sewage pumps between March and April 2000.
The most significant process failure, in all three cases, was the failure to remove
access rights and change account details in response to a leaver, in particular, a
disgruntled one.
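As a minimal sketch, assuming a hypothetical interface to each business-critical
system, a leaver process might revoke access everywhere and retain evidence
that it did so:

    def offboard_leaver(user, systems):
        # Revoke all access immediately and keep evidence for the record.
        evidence = []
        for system in systems:
            evidence.append(system.disable_account(user))
            evidence.append(system.revoke_remote_access(user))
            # Shared or standard accounts the leaver knew must be rotated.
            if system.has_shared_credentials:
                evidence.append(system.rotate_shared_credentials())
        return evidence

Even with such automation, much of the joiner, mover, and leaver process
remains manual.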
These manual procedures, and their documentation, require time and effort. One
reason manual procedures fail is that organizations do not provide the necessary
time, resources, and training. With constant pressure to streamline operations,
any activity not directly related to profit or productivity is at risk. In many cases,
it may be months or years before the effect of these cutbacks is felt. Ironically,
organizations that put the time and effort into designing and supporting manual
procedures actually become more efficient.
The effort to become agile leads organizations to rethink their processes and
procedures. Although many larger organizations could benefit from less
bureaucracy, care must be taken to protect critical processes needed to manage
business risk, including cybersecurity. If agility is treated as an excuse to remove
or short-circuit any and all processes, the result can be chaos.
As a relatively new function, cybersecurity is particularly vulnerable to these
issues. In many organizations, individuals who are already fully allocated are
given the additional task of cybersecurity manager, cybersecurity single point of
accountability, or some other function almost as an afterthought. In automation
environments, this person might be the lead instrumentation and control
technician or engineer in a facility. Although it might make sense for this person
to be given responsibility and oversight, he or she will almost certainly need
additional resources, such as time, budget, training, and administrative support,
to ensure the necessary tasks are performed.
Inventory Management
A key element of successful OT cybersecurity management during operational
support is inventory management. When a product vulnerability is announced,
the first question to answer is: “Does this affect my organization, and if so,
where, and how much?”
It is impossible to answer this question without an accurate and up-to-date
equipment inventory. An equipment inventory is sometimes called an asset
register or configuration management database. It can be as simple as an Excel
spreadsheet or as sophisticated as a purpose-built relational database and
application. Both IT and OT security vendors offer inventory management
systems.
IT solutions can work well for IT systems and devices. This equipment tends to
be based on a small number of standard operating systems, which are normally
connected to a network. Most of these cooperate well with asset management
systems, providing information about their configuration and patch status, for
instance.
The same is not true for OT systems and devices. There are several challenges to
using an automated tool to create a reliable OT device inventory:
• The range of device types is much larger and includes many firmware
and software solutions that are not designed to interact with asset
management solutions.
• Many devices that are networked may respond only to the most basic
industrial protocol commands. These commands rarely support the return
of configuration information, which an effective inventory requires.
• There is no guarantee that devices are accessible on a common
communications network. Many installations will contain serially
connected (RS-232, RS-485, RS-422) devices that only respond to the
aforementioned basic industrial protocol commands.
• In more modern OT networks, there may be industrial firewalls or data
diodes that isolate devices from the wider network. This design limits
communications to very few industrial protocol commands.
Some asset owners avoid these issues by focusing their inventories only on
network-connected devices, or other specific categories such as Windows
devices. This strategy is fundamentally flawed. The compromise or failure of
any interconnected device could cause operational issues. Every device that is
required for the successful operation of the system should be included in the
inventory. At a minimum, the inventory should record the following information
for each device:
• A unique identification number (to make tracking easier, a label with this
number should be affixed to the device)
• The manufacturer’s make and model number
• Device serial number
• A brief description of the device (e.g., Pump #2 Control PLC, Operator
Workstation #1)
• Location of the device (e.g., cabinet number, room number)
• Version number of the device hardware
• A list of all software installed on the device and all associated version
numbers
• Any address information (e.g., Internet Protocol address, protocol
identifier)
• All configuration and program files for embedded devices such as
programmable logic controllers (PLCs) and remote terminal units (RTUs)
• A photograph of the device and where it is installed
This data should be collected as early as possible in the project and maintained
throughout the life of the project. It should be treated just like any other
controlled document or data source.
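The following Python sketch shows one way an inventory record could capture
these fields and support the vulnerability question posed earlier. The class and
field names are illustrative assumptions.

    from dataclasses import dataclass, field

    @dataclass
    class InventoryItem:
        asset_id: str        # unique ID, also on a label affixed to the device
        make_model: str
        serial_number: str
        description: str     # e.g., "Pump #2 Control PLC"
        location: str        # e.g., cabinet number, room number
        hardware_version: str
        software: dict = field(default_factory=dict)   # name -> version
        addresses: list = field(default_factory=list)  # IP, protocol IDs
        config_files: list = field(default_factory=list)
        photo: str = ""      # path to photograph of the installed device

    def affected_by(inventory, product, vulnerable_versions):
        # Answer "does this advisory affect us, and where?" by matching
        # the advisory's product and versions against the inventory.
        return [item for item in inventory
                if item.software.get(product) in vulnerable_versions]

When a vendor announces a vulnerability in a given product and version range,
affected_by returns the specific devices, and their locations, that need attention.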
Figure 8-5 shows how the proportions typically change as the size of the facility
increases. Note that as the facility size increases, the number of embedded
devices grows, but the number of Windows devices stays relatively fixed. This is
because the control room (where most of the Windows devices are located) does
not grow in direct proportion to the facility size. The number of embedded
devices (PLCs and RTUs) must increase to manage additional process areas.
Figure 8-5. Change in device proportions for varying OT facility sizes.
Audits are usually undertaken by personnel who are independent of the system
being audited. Findings from the audit should be documented and a follow-up
scheduled to verify that issues have been addressed.
Incident Response
Incident response planning is not just about preparing for the inevitable incident.
Considering plausible scenarios facilitates a review of business risk and the
identification of additional mitigations to reduce this risk.
For example, an incident response review might identify that, in the event of a
failure or compromise of a particular device requiring replacement (e.g., a PLC
or network switch), it would take 24 hours to obtain a replacement. If the review
concludes that this downtime would cost the organization more than the expense
of holding a spare, the organization may choose to keep a spare on-site.
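A back-of-the-envelope calculation, with purely illustrative figures, shows the
reasoning:

    # Illustrative figures only; each facility must supply its own.
    downtime_hours = 24            # lead time to obtain a replacement
    cost_per_hour = 10_000         # lost production per hour (USD)
    annual_failure_probability = 0.05
    spare_cost_per_year = 3_000    # amortized purchase plus storage

    expected_annual_loss = (downtime_hours * cost_per_hour
                            * annual_failure_probability)   # 12,000 USD/year
    hold_spare = expected_annual_loss > spare_cost_per_year  # True

Under these assumptions, the expected annual loss from waiting for a
replacement exceeds the cost of holding the spare, so the spare is justified.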
• The operator was aware that his supervisor and other users routinely used
remote access to view the HMI screen and so did not report the incident.
• At approximately 1:30 p.m. on the same day, the operator noticed a
second remote access to the HMI. This time, the remote user navigated
through various screens and eventually modified the set point for sodium
hydroxide (lye) to a level that would be toxic to humans.
The remote user logged off, and the operator immediately reset the sodium
hydroxide level to normal. The operator then disabled remote access and
reported the incident to the city and to local and state law enforcement. It is
unclear if the operator was following an incident response plan or was just
experienced enough to make the right decisions. City representatives stated at
the press conference that additional controls were in place to prevent exposure of
toxic water to consumers, but they did not describe them in detail.
At the time of this writing, the investigation is still underway, and the culprit,
and his or her intentions, remain unclear. The most likely explanations include
an authorized user who made the change in error, a disgruntled former
employee or contractor, and a random hacker who discovered the system was
accessible from the Internet. Other options that should not be discounted are
organized crime syndicates or nation-states. The water treatment plant affected
was 15 miles from the Raymond James Stadium in Tampa, Florida, which hosted
the Super Bowl just two days after the incident occurred.
It was fortunate that the city of Oldsmar operator was sufficiently observant and
aware to take immediate action. This prevented catastrophic consequences. It
remains to be seen how well-prepared similar organizations would be.
There are more than 145,000 active public water systems in the United States
(including territories). Of these, 97% are considered small systems under the
Safe Drinking Water Act, meaning they serve 10,000 or fewer people. Public
water systems of the size of the one in the city of Oldsmar (15,000 population)
have limited resources to manage threats to their operations.
Although it was a near miss, the Oldsmar incident highlights gaps in
process and people elements. Closing these gaps could make future events less
likely and the potential consequences less severe:
• The operator observed a remote user several hours before the attempted
set-point adjustment. This did not arouse suspicion because the
supervisors used remote access to monitor the plant. Remote access of
this type must be strictly limited to specific users, from specific
locations, at specific times. The Oldsmar operator should have known
who was accessing the system. If this was not an authorized user, it
should have prompted the operator to activate the incident response plan.
This response would start with disconnecting remote access to the
system. It would then initiate various forensics and tests to determine if
anything had been altered (e.g., code), in either the systems or processes
(e.g., set points, alarms).
• Until the incident was reported, the engineering company that developed
the supervisory control and data acquisition (SCADA) system for the city
of Oldsmar maintained a page on the portfolio section of its website. This
page displayed a screen from the SCADA HMI, providing details of plant
processes (e.g., number of reverse osmosis skids, number of pumps on
each skid). It was easy to see the button that would enable navigation to
the sodium hydroxide page. Such a screenshot is extremely valuable in
terms of planning a potential attack. The page is now deleted, although it
can be found in Internet archives through search tools.
• The deleted page also had a summary of the project, which included the
following description of an automatic control feature: “(Noting that the
engineering company…) worked with the city to create an easy-to-use,
single-button interface. This button resides on the SCADA screen in the
control room and is also accessible through city iPads connected to the
SCADA system. Operators can easily press the button to initiate
automatic control regardless of their location, which is helpful in
emergency situations and during routine site tours.” This raises the
question about what functionality should be accessible remotely. A
common initial reaction to the Oldsmar incident was “There should be no
remote access at all, ever.” This is unrealistic and impractical. Even if
remote access were not available, users would inevitably find their own
less secure solutions. This is a cultural issue because the same users
would not try to circumvent a safety system or safety procedures. In
addition to ensuring remote access is securely designed and limited by
user, location, time, and duration, remote access should offer limited
functionality. The ability to monitor or view may be all that is required
for most users. Although it may be desirable to have an automatic control
switch, is it really necessary? In which circumstances would it be used?
Are these rare enough that the risk outweighs the benefit?
• The incident raises questions about the functionality of the SCADA
system itself. In the aforementioned press conference, the city reported
that the unauthorized remote user attempted to change the set point of
sodium hydroxide from 100 parts per million (ppm) to 11,100 ppm. This
level is far outside any normal expected setting, which prompts the question:
“Why would the SCADA system accept such a
setting?” In fact, it is unclear if it did accept the new level. At the press
conference, a city official said the operator reset sodium hydroxide to its
normal level, which implies it was changed. Recall the standard layers of
protection model that has been presented throughout this book. The basic
process control layer’s function is to maintain the process within its
normal, safe, operating envelope. If this safe operating envelope is not
properly defined, the basic process control layer will not perform
correctly, which means the risk transfers to other layers, in this case the
plant personnel/process alarm layer. Limiting the sodium hydroxide
range, restricting who could change it, and from where they could change
it would have been significant mitigating factors in this case (a sketch of
such a safeguard follows this list). This is why
OT cybersecurity risk quantification is so different from IT risk
quantification. As noted in Chapter 4, the assessment process must
consider the hazards in the process and treat cybersecurity as an initiating
cause.
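The following Python sketch illustrates the kind of set-point validation the basic
process control layer should enforce. The tag name, limits, and user and
workstation lists are illustrative assumptions, not the Oldsmar configuration.

    # Illustrative safe operating envelope and authorization lists.
    SAFE_ENVELOPE = {"sodium_hydroxide_ppm": (50.0, 200.0)}
    AUTHORIZED_USERS = {"operator1", "supervisor1"}
    LOCAL_WORKSTATIONS = {"hmi-console-1"}

    def apply_set_point(tag, value, user, source):
        low, high = SAFE_ENVELOPE[tag]
        # Reject any value outside the safe operating envelope.
        if not low <= value <= high:
            raise ValueError(f"{tag}={value} outside envelope {low}-{high}")
        # Restrict changes by user and by location.
        if user not in AUTHORIZED_USERS or source not in LOCAL_WORKSTATIONS:
            raise PermissionError("set-point change not permitted")
        # Only now write the value to the controller.
        return tag, value

Under such a scheme, a request to change the sodium hydroxide set point from
100 ppm to 11,100 ppm would be rejected at the control layer, regardless of how
the request arrived.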
Note that many of the incident detection tools and methods promoted by IT
vendors (and even some supposed OT vendors) would have done little to help
the city of Oldsmar. Intrusion detection and prevention systems only work if the
unauthorized access can be identified as abnormal. As already mentioned, even
the operator could not determine if the user was authorized. It is unlikely that
any tool would have been able to discern this fact. Likewise, regarding the
change in the sodium hydroxide set point, if the system enables the user to
change the value, then there is no way a detection system could identify this as
an anomaly.
The city of Oldsmar incident provides clear evidence of why cybersecurity
incident response planning is required, and why this planning must take account
of OT factors.
In many cases, the personnel from these organizations are in place for so long
that they become indistinguishable from asset-owner personnel. Few asset owners
properly manage the cybersecurity risks arising from these arrangements:
• Third-party computers may not have the necessary security controls (e.g.,
anti-malware protection, application control, user access), yet they may
be connected to business-critical systems or networks.
• Vendors may not have sufficient controls in place to manage user
credentials for their clients’ systems. Examples include having standard
administrator accounts for all client systems, sharing account details, and
not securely protecting these account details.
• Vendors may not have procedures in place to manage and protect system
backups, which are essential to continuity of operations for their clients.
In the case of cloud environment providers, this
oversight could be catastrophic for the asset owner, as illustrated by the
Blackbaud example earlier in this chapter.
• Suppliers, vendors, and subcontractors may not have adequate security
management systems in place in their organization. Their vulnerability to
cybersecurity incidents exposes the asset owner.
• Suppliers, vendors, and subcontractors may not provide adequate security
awareness training to their personnel. These personnel may be working
in the asset owner’s business-critical environment where this awareness
is essential.
Insurance
There is an established cyber insurance market focused on IT cybersecurity
risks, and insurers and brokers are now developing policies to cover threats to
OT infrastructure. As explained in Chapter 2, OT or industrial cybersecurity is
different, and insurers and brokers are still learning what risks an asset owner is
exposed to from an OT cybersecurity incident. Chapter 4 discussed methods to
measure and manage this risk.
The two high-profile ransomware incidents in early 2021 that are referenced in
the introduction to this book were resolved when insurers negotiated payments.
The asset owners had sufficient insurance coverage to enable payment of large
sums ($4.4 million and $11 million).
Tom Finan of Willis Towers Watson, a global insurance broking company, points
out that “having a cyber insurance policy does not make a company safer.
Instead, an enhanced cybersecurity posture results from going through the cyber
insurance application and underwriting process.”153
Summary
Although OT environments have a different operational support culture from IT
environments, there are many ways to give OT cybersecurity the management
attention it requires.
As noted throughout this book, technology is not the only element of the
cybersecurity challenge. People and process are critical weak points. Much of
what happens in operational environments revolves around people.
Cybersecurity relies on training and awareness, and on adherence to strict
processes and procedures. Gaps in training and awareness or in processes and
procedures create vulnerabilities that can be as severe as any technical issue.
An incident response plan is one of the most important plans to have in place. With the
growth in high-profile cybersecurity incidents and the knowledge of the costs of
dealing with them, it is harder for organizations to ignore the need for good
preparation. There is still work to be done to educate asset owners that good
incident response planning does not begin and end in their own organization.
The use of suppliers, vendors, and subcontractors means that cybersecurity risks,
and their remediation, rely on the cooperation of all parties.
One key control that asset owners can use is contract management. A set of
model clauses that represent good cybersecurity management should be included
in all third-party contracts. These should be nonnegotiable. Any third party that
is not already following these practices should not be in business today.
Although insurance can be a useful tool for an asset owner, it cannot replace
effective identification and proactive management of risk.
____________
136 Bruce Schneier, “Secrets and Lies: Introduction to the Second Edition,” Schneier on Security blog,
accessed June 21, 2021, https://www.schneier.com/books/secrets-and-lies-intro2.
137 Blackbaud, “Cloud Software Built for the World’s Most Inspiring Teams,” accessed June 21, 2021,
https://www.blackbaud.com/.
138 Marianne Kolbasuk McGee, “Blackbaud Ransomware Breach Victims, Lawsuits Pile Up,” BankInfo
Security, Information Security Media Group, September 24, 2020, accessed June 21, 2021,
https://www.bankinfosecurity.com/blackbaud-ransomware-breach-victims-lawsuits-pile-up-a-15053.
139 Maria Henriquez, “Blackbaud Sued After Ransomware Attack,” Security magazine, November 6,
2020, accessed June 21, 2021, https://www.securitymagazine.com/articles/93857-blackbaud-sued-
after-ransomware-attack.
140 Schneier, “Secrets and Lies: Introduction to the Second Edition.”
141 David E. Sanger, Nicole Perlroth, and Eric Schmitt, “Scope of Russian Hack Becomes Clear:
Multiple U.S. Agencies Were Hit,” New York Times, December 14, 2020, accessed June 21, 2021,
https://www.nytimes.com/2020/12/14/us/politics/russia-hack-nsa-homeland-security-pentagon.html.
142 Tom Kemp, “What Tesla’s Spygate Teaches Us About Insider Threats,” Forbes, July 19, 2018,
accessed June 21, 2021, https://www.forbes.com/sites/forbestechcouncil/2018/07/19/what-teslas-
spygate-teaches-us-about-insider-threats/?sh=4a09507c5afe.
143 Ben Popken, “Facebook Fires Engineer Who Allegedly Used Access to Stalk Women,” NBC News,
May 1, 2018, accessed June 21, 2021, https://www.nbcnews.com/tech/social-media/facebook-
investigating-claim-engineer-used-access-stalk-women-n870526.
144 Iain Thomson, “US Engineer in the Clink for Wrecking Ex-Bosses’ Smart Meter Radio Masts with
Pink Floyd lyrics,” The Register, June 26, 2017, accessed June 21, 2021,
https://www.theregister.com/2017/06/26/engineer_imprisoned_for_hacking_exemployer/.
145 Tony Smith, “Hacker Jailed for Revenge Sewage Attacks,” The Register, October 31, 2001, accessed
June 21, 2021, https://www.theregister.com/2001/10/31/hacker_jailed_for_revenge_sewage/.
146 H. Boyes, Draft Code of Practice for Cyber Security in the Built Environment, Institution of
Engineering and Technology, Version 2.0, January 31, 2021.
147 R. D. McDowall, “Quality Assurance Implications for Computerized Systems Following the Able
Laboratories FDA Inspection,” Quality Assurance Journal 10 (2006): 15–20.
148 Willem Z.’s full name was not given in any of the online records of this incident.
149 “Sewer Hack Committed Via Admin and Test Accounts” (“Rioolhack gepleegd via admin- en
testaccounts”), AG Connect, September 14, 2018, accessed June 21, 2021,
https://www.agconnect.nl/artikel/rioolhack-gepleegd-admin-en-testaccounts.
150 Alexander Martin, “Garmin Obtains Decryption Key After Ransomware Attack,” Sky News, July 28,
2020, accessed June 21, 2021, https://news.sky.com/story/garmin-obtains-decryption-key-after-
ransomware-attack-12036761.
151 “Treatment Plant Intrusion Press Conference,” YouTube, accessed June 21, 2021,
https://www.youtube.com/watch?v=MkXDSOgLQ6M.
152 National Cyber Security Centre, “Supply Chain Security Guidance,” accessed June 21, 2021,
https://www.ncsc.gov.uk/collection/supply-chain-security.
153 Tom Finan and Annie McIntyre, “Cyber Risk and Critical Infrastructure,” Willis Towers Watson,
March 8, 2021, accessed June 21, 2021, https://www.willistowerswatson.com/en-
US/Insights/2021/03/cyber-risk-and-critical-infrastructure.
154 Finan and McIntyre, “Cyber Risk and Critical Infrastructure.”
9
People, Poetry, and Next Steps
____________
155 Once again, my thanks to Don Merrell for providing this modified verse from his poem “It’s Up to
Me.” It is reprinted with his permission. Contact Don Merrell at [email protected] to inquire
about the use of his poems or to comment on their impact.
Bibliography
Ertan, A., G. Crossland, C. Heath, D. Denny, and R. Jensen. Everyday Cyber
Security in Organizations. Royal Holloway: University of London, 2018.
Evans, Jack. “Someone Tried to Poison Oldsmar’s Water Supply during Hack,
Sheriff Says.” Tampa Bay Times, February 9, 2021. Accessed June 21, 2021.
https://www.tampabay.com/news/pinellas/2021/02/08/someone-tried-to-
poison-oldsmars-water-supply-during-hack-sheriff-says/.
“2017 Equifax Data Breach.” Wikipedia. Accessed June 21, 2021.
https://en.wikipedia.org/wiki/2017_Equifax_data_breach.
Lockheed Martin Corporation. “The Cyber Kill Chain.” Accessed June 21, 2021.
https://www.lockheedmartin.com/en-us/capabilities/cyber/cyber-kill-
chain.html.
McCoy, Kevin. “Target to Pay $18.5M for 2013 Data Breach that Affected 41
Million Consumers.” USA Today. Updated May 23, 2017. Accessed June 21,
2021. https://www.usatoday.com/story/money/2017/05/23/target-pay-185m-
2013-data-breach-affected-consumers/102063932/.
McGee, Marianne Kolbasuk. “Blackbaud Ransomware Breach Victims,
Lawsuits Pile Up.” BankInfo Security, Information Security Media Group,
September 24, 2020. Accessed June 21, 2021.
https://www.bankinfosecurity.com/blackbaud-ransomware-breach-victims-
lawsuits-pile-up-a-15053.
McDowall, R. D. “Quality Assurance Implications for Computerized Systems
Following the Able Laboratories FDA Inspection.” Quality Assurance
Journal 10 (2006): 15–20.
McLeod, Saul. “Maslow’s Hierarchy of Needs.” Updated December 29,
2020. Accessed June 21, 2021.
https://www.simplypsychology.org/maslow.html.
Mustard, Steve. Mission Critical Operations Primer. Research Triangle Park,
NC: ISA (International Society of Automation), 2018.
The Mitre Corporation. “ATT&CK for Industrial Control Systems.” Accessed
June 21, 2021. https://collaborate.mitre.org/attackics/index.php/Main_Page.
Nakashima, E., Y. Torbati, and W. Englund. “Ransomware Attack Leads to
Shutdown of Major US Pipeline System.” Washington Post, May 8, 2021.
Accessed June 21, 2021.
https://www.washingtonpost.com/business/2021/05/08/cyber-attack-colonial-
pipeline/.
National Cyber Security Centre (NCSC). “Supply Chain Security Guidance.”
Accessed June 21, 2021. https://www.ncsc.gov.uk/collection/supply-chain-
security.
National Cyber Security Centre (NCSC). “NIS Compliance Guidelines for
Operators of Essential Service (OES).” Accessed June 21, 2021.
https://www.ncsc.gov.ie/pdfs/NIS_Compliance_Security_Guidelines_for_OES.pdf
NIST. “Components of the Cybersecurity Framework.” Presentation, July 2018.
https://www.nist.gov/cyberframework/online-learning/components-
framework.
North Carolina State University and Protiviti. “Illuminating the Top Global
Risks in 2020.” Accessed June 21, 2021. https://www.protiviti.com/US-
en/2020-top-risks.
“Order Granting Final Approval of Settlement, Certifying Settlement Class, and
Awarding Attorney’s Fees, Expenses, and Service Awards.” Equifax Data
Breach Settlement. Accessed June 21, 2021.
https://www.equifaxbreachsettlement.com/admin/services/connectedapps.cms.extensions/1
4491-4976-bc7b-
83cccaa34de0_1033_EFX_Final_Approval_Order_(1.13.2020).pdf.
Occupational Health and Safety Hub. “Quick Safety Observation Card – Free
Template.” https://ohshub.com/quick-safety-observation-card-free-template/.
Pauli, Darren. “Barbie-Brained Mattel Exec Phell for Phishing, Sent $3m to
China.” The Register, April 6, 2016. Accessed May 12, 2022.
https://www.theregister.com/2016/04/06/chinese_bank_holiday_foils_nearperfect_3_millio
Popken, Ben. “Facebook Fires Engineer Who Allegedly Used Access to Stalk
Women.” NBC News, May 1, 2018. Accessed June 21, 2021.
https://www.nbcnews.com/tech/social-media/facebook-investigating-claim-
engineer-used-access-stalk-women-n870526.
POSC Caesar Association. “An Introduction to ISO 15926.” November 2011.
Accessed June 21, 2021. https://www.posccaesar.org/wiki/ISO15926Primer.
Prince, Brian. “Researchers Detail Critical Vulnerabilities in SCADA Product.”
Security Week, March 13, 2014. Accessed June 21, 2021.
https://www.securityweek.com/researchers-detail-critical-vulnerabilities-
scada-product.
Rathwell, Gary. PERA Enterprise Integration (website). Accessed June 21, 2021.
http://www.pera.net/.
RSAC Contributor. “The Future of Companies and Cybersecurity Spending.”
Accessed June 21, 2021. https://www.rsaconference.com/library/Blog/the-
future-of-companies-and-cybersecurity-spending.
RiskBased Security. “2020 Year End Report: Data Breach QuickView.”
Accessed June 21, 2021. https://pages.riskbasedsecurity.com/en/en/2020-
yearend-data-breach-quickview-report.
Reason, James. “Achieving a Safe Culture: Theory and Practice.” Work &
Stress: An International Journal of Work, Health and Organisations 12, no. 3
(1998): 302. Accessed June 21, 2021.
https://www.tandfonline.com/doi/abs/10.1080/02678379808256868.
Sanger, David E., N. Perlroth, and E. Schmitt. “Scope of Russian Hack Becomes
Clear: Multiple U.S. Agencies Were Hit.” New York Times, December 14,
2020. Accessed June 21, 2021.
https://www.nytimes.com/2020/12/14/us/politics/russia-hack-nsa-homeland-
security-pentagon.html.
“Sewer Hack Committed Via Admin and Test Accounts” (“Rioolhack gepleegd
via admin- en testaccounts”). AG Connect, September 14, 2018. Accessed
June 21, 2021. https://www.agconnect.nl/artikel/rioolhack-gepleegd-admin-
en-testaccounts.
Smith, Rebecca. “Duke Energy Broke Rules Designed to Keep Electric Grid
Safe.” Wall Street Journal. Updated February 1, 2019. Accessed June 21,
2021. https://www.wsj.com/articles/duke-energy-broke-rules-designed-to-
keep-electric-grid-safe-11549056238.
Smith, Tony. “Hacker Jailed for Revenge Sewage Attacks.” The Register,
October 31, 2001. Accessed June 21, 2021.
https://www.theregister.com/2001/10/31/hacker_jailed_for_revenge_sewage/.
Spitzner, Lance. “Security Awareness Maturity Model.” Blog, January 1, 2019.
Accessed June 21, 2021. https://www.sans.org/security-awareness-
training/blog/security-awareness-maturity-model/.
“S.I. No. 360/2018 – European Union (Measures for a High Common Level of
Security of Network and Information Systems) Regulations 2018.” Electronic
Irish Statute Book. Accessed June 21, 2021.
http://www.irishstatutebook.ie/eli/2018/si/360/made/en.
SP 800-82 Rev. 2. Guide to Industrial Control Systems (ICS) Security.
Gaithersburg, MD: NIST (National Institute of Standards and Technology),
2015. Accessed June 21, 2021.
https://csrc.nist.gov/publications/detail/sp/800-82/rev-2/final.
Thomson, Iain. “US Engineer in the Clink for Wrecking Ex-Bosses’ Smart Meter
Radio Masts with Pink Floyd lyrics.” The Register, June 26, 2017. Accessed
June 21, 2021.
https://www.theregister.com/2017/06/26/engineer_imprisoned_for_hacking_exemployer/
Thompson, Mark. “Iranian Cyber Attack on New York Dam Shows Future of
War.” Time, March 24, 2016. Accessed June 21, 2021.
https://time.com/4270728/iran-cyber-attack-dam-fbi/.
Further Reading
Barker, Jessica. Confident Cyber Security: How to Get Started in Cyber Security
and Futureproof Your Career. London: Kogan Page Limited, 2020, ISBN
978-1789663426.
Marszal, Edward, and Jim McGlone. Security PHA Review for Consequence-
Based Cybersecurity. Research Triangle Park, NC: ISA (International Society
of Automation), 2019.
Langner, Ralph. “To Kill a Centrifuge: A Technical Analysis of What Stuxnet’s
Creators Tried to Achieve.” Arlington, VA: The Langner Group, November
2013. Accessed June 21, 2021. https://www.langner.com/wp-
content/uploads/2017/03/to-kill-a-centrifuge.pdf.
Useful Resources
Infracritical (http://infracritical.com/). Infracritical is an organization founded by
Bob Radvanovsky, Jake Brodsky, Tammy Olk, and Michael Smith, internationally
recognized experts in the field of industrial cybersecurity. Infracritical provides
two resources: