Confident DevOps - The Essential Skills and Insights For DevOps Success
Mark Peters
To my wonderful, loving wife and beautiful
daughter, without whom none of my success
would be possible
CONTENTS
Foreword
Acknowledgements
Introduction
My approach to DevOps
Every person who comes to DevOps finds a unique path to
success. It is like preparing to scale a summit such as Mount
Everest: every climber starts from the same basecamp with the
same tools, but the path to the top is always different. This book
will help prepare your basecamp for later success. In preparing,
I always find images and well-defined words extremely useful.
My own DevOps basecamp diagram appears here, and the
details will be explained throughout the book, especially in the
later sections.
FIGURE 0.1 DevOps basecamp
Introduction
The term DevOps is relatively new in technology, emerging
within the past 15 years, but the embedded concepts have
existed for quite some time. DevOps stands for Development
and Operations and embodies the concept that the people who
develop software should also be responsible for operating it.
This concept emerges from the idea that builders know better
than anyone else how to deliver on what they have promised.
Linking the individuals who know how to create the most
effective solutions emerges from a history of Just-in-Time and
Lean manufacturing. These practices focus on minimizing
waste while maximizing productivity, and DevOps accomplishes
the same functions for software. At its heart, DevOps focuses on
three concepts: flow, feedback and improvement.
FIGURE 1.1 DevOps Venn diagram
History
Programme management
It would not be entirely fair to look at DevOps without placing
its initial history within the context of programme
management. DevOps has proven to be more than just another
form of programme or project management; it is a cultural
influence.2 The specific implications will appear throughout
this work, but for now the assumption will be that DevOps has
some roots in programme management theories.
Ever since humans have tried to succeed in business,
someone has been trying to figure out a better way, an
opportunity to do it faster, cheaper, and still maintain value.
Scholars generally see programme management theory
development as four separate historical distinctions: prior to
1958, from 1958 to 1979, 1980 to 1994 and from 1995 to the
present.3 Each era has different goals and strategies based on
the historical perspective. The first era appears as a transition
during the industrial age, led by Henry Ford with
interchangeable parts and the assembly line. The next era sees
the growth of the military-industrial complex and the need to
migrate project strategies from the government to the
commercial world. The third era begins to take advantage of
software interactions and the move from manual to electronic
outcomes, allowing the implementation of lighter-weight
methodologies. Finally, the modern era, the one we are
interested in, covers the growth of software development and
the need to manage continuous development and deployment
rather than a single product end state.
Era 1 (prior to 1958) is characterized by tools such as the
Gantt Chart, Critical Path Method, Precedence Diagramming
Method and the Program Evaluation and Review Technique (PERT). All of
these are very manually intensive ways to depict programme
success, emphasizing observability of all possible events rather
than a DevOps flow. Many still use the Gantt chart within
DevOps, but my professional opinion is that it overcomplicates
planning and focuses too much on end-dates rather than on
continuously delivering products.
This stage focused heavily on gathering data about project
events and streamlining what was necessary through critical
paths without the later Lean focus on eliminating waste.
Defining projects in this era were the Manhattan Project (1942–
1945), the project plan for the Pacific Railroad (1857) and the
Hoover Dam (1931–1936).4
The second era begins the transition from large government
projects to the commercial and corporate model. Appearing in
this era were tools like the Work Breakdown Structure (WBS),
conflict management and some initial iterative project
planning. The WBS creates a more complicated version of the
Gantt chart, detailing who is responsible for every step,
outlining costs for those items and even further locking down
delivery dates. Conflict management focuses more specifically
on humans, a critical area for DevOps. Finally, the start of
iterative planning, like the Lean solutions in the next section,
shows some of the basis for DevOps in realizing that growth can
occur over time rather than being a planned event.
Representative projects are the Polaris project to build
submarine-launched nuclear missiles (1956–1961), the Apollo
project to reach the moon (1958–1969) and ARPANET, the
precursor to the internet (1962–1979).
Moving into the next era begins some transitions into the
modern age. Here, the lightweight methodologies are not yet
Agile but are beginning to understand the need. The earlier
reliance on charts and diagrams incorporates risk management
to understand the probability and impact of various actions.
This allowed people to observe where an item occurs along the
path and the internal and external events that could affect
success. In this era, the certificate transition begins: one could
achieve certificates, like the Project Management Professional
from the Project Management Institute, that validate
credentials short of a degree. This carries forward today in
certificate options for DevOps, Amazon Web Services and many
other areas.
Examples of projects in this era include the Channel Tunnel
between England and France (1988–1994), the Space Shuttle Challenger (1983–
1986) and the Calgary Olympic Winter Games (1988).
From 1995 to the present, the modern project management
era starts with linking the Agile and DevOps mindset to overall
management theories. Agile methodologies first appear through
practices such as Extreme Programming and Pair
Programming before being formalized as Agile. As the era reaches the
present, within the past ten years we see transitions to remote
work from in-person attendance. This era sees Critical Chain
Project Management as expanding the Critical Path method to
allow the inclusion of resources and risk into delivery
timelines. Finally, the era has seen the introduction of
university degrees in project management. Projects
exemplifying this era include the Y2K solutions to update
computer architectures, the Panama Canal Expansion and the
Large Hadron Collider.
Programme management processes are far from DevOps
solutions, only starting to contribute in the most recent
timeframes. Many who realize DevOps success have previously
worked in programme management for physical projects and
try to adapt those tools to software development. The next step
in learning DevOps history moves from basic programme
management to learning Lean contributions.
Lean
DevOps roots begin with Lean and Just-in-Time manufacturing
processes generally attributed to Toyota during the 1980s and
early 1990s. The earliest roots attribute initial thoughts to
Henry Ford with the Model T line in 1913 when he linked
interchangeable parts to a production line.5 This enabled a
continuous operations flow but did not accelerate the linkages
between development and operations. Toyota incorporated
these concepts to provide the best quality, lowest cost and
shortest production times by eliminating waste within the
system.6
Toyota faced numerous shortages in the post-WW2 era, and
any hiccup in their process caused an unacceptable slowdown,
resulting in value loss. Implementing an idea called Kaizen,
Toyota dedicated itself to seeking rapid improvement wherever
possible. This concept relies on the two pillars of Just-in-Time
and Jidoka. Just-in-Time uses continuous flow, takt time and a
pull system. Continuous flow means delivering value never
stops; the production line must always continue flowing. Takt
time emerges from the German word Takt, meaning a beat or
pulse in music, and measures production output against
customer demand.
FIGURE 1.2 Takt time equation
Source: https://www.toyota-global.com/company/history_of_toyota/75years/data/company_information/management_and_finances/finances/income/1937.html
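In standard form, the takt time equation shown in the figure is commonly written as:

\[ \text{takt time} = \frac{\text{net available production time}}{\text{customer demand}} \]

For example, a line with 480 available minutes per day and demand for 60 units per day must complete one unit every 8 minutes to keep production matched to demand.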
Defining DevOps
The term DevOps is generally attributed to an individual named
Patrick Debois in trying to combine all possible information
technology (IT) perspectives within the business.9 A business
may be defined as multiple streams with some specializing in
one aspect and some specializing in another. In 2007, when he
began formulating DevOps basics, Debois was in charge of
testing for a data centre migration. Here he recognized the
difficulty of merging software development on one side of the
corporation with the operations implementing that work on the
other side.
IT development is generally responsible for creating new
software, hardware, firmware or configurations that address
customer problems and business needs to create value.
However, the operations of those functions can become
segmented, with the IT operations elements responsible for
monitoring and maintaining business technology by handling
issues submitted by other departments and resolving their
business problems. When testing, Debois began to realize that
the new items submitted by development did not always fit
operations processes and that operations fixes did not always
travel back to development. This resulted in manual
fixes for items that could be automated and delayed the
delivery of value to the customer. The thought process and
culture to bridge this gap did not exist. This gap likely caused
frustration for many in the software development industry, but
to Debois’s credit he advanced the possibilities for solutions to a
like-minded group and today we have DevOps.
The idea was pushed forward in 2008 when Andrew Shafer
arranged a working session, called Agile Infrastructure, at an
Agile conference in Toronto, Canada.
Unfortunately, Shafer received negative feedback and did not
attend his own session. Reportedly, Debois found him later at
the conference for a hallway discussion, and they formed an
online group to discuss ideas further. In a proper DevOps
pattern, one can see flow, feedback and improvement in the
process. As before, the idea was to bridge the business gap
between development and operations.
The next major push occurred in 2009 when Paul Hammond
and John Allspaw presented a lecture entitled ‘10+ Deploys a
Day: Dev and Ops Cooperation at Flickr’.10 Even over 15 years
later, watching the video helps us see how many of today’s
DevOps concepts appear in this presentation. This helped
identify what started as a conflict between the two groups who
frequently blamed the other for the delay instead of working on
the issue. It highlighted the stereotypes between the two groups
that Dev is supposed to constantly deliver new things while Ops
keeps the site stable and fast. However, the bridge begins to
emerge in that the groups have the same goal of advancing the
overall business. The presentation explained some initial
concepts embedded in DevOps today, such as automated
infrastructure, shared version control, one-step build and
deploy, and common metrics. Most importantly, the
presentation highlighted the need for shared trust between the
groups rather than an adversarial approach.
Debois ran with this idea and gathered system administrators
and developers to sit together during a DevOpsDays conference
in October 2009.11 Emphasizing the nature of the community,
the group focused on the problems they encountered and ways
to bridge those gaps. The concept of DevOpsDays has exploded
since that initial conference and instances can be found
globally in most cities today through devopsdays.org. On
average, there are over 50 conferences yearly, which again
emphasizes the feedback and improvement aspects of DevOps:
practitioners regularly seek help from others across multiple
industries.
Over the next two years, the movement’s following grew
quickly and many smaller tech enterprises began incorporating
some of the ideas into their daily work and experiencing
success. DevOps incorporated several ideals, with the primary
being that Dev and Ops should talk regularly rather than being
isolated in corporate silos. Other items for improvement were
the inclusion of Agile methodologies, the need for continuous
integration and delivery through automated software pipelines,
version control, common software repositories, lightweight
service management with individual responsibilities, and
consolidated incident management.
The next step was publishing the first ‘State of DevOps’ report
by Alanna Brown in 2013 through Puppet software.12 It
highlighted the many benefits accessible through implementing
DevOps as captured by the DevOps Research and Assessment (DORA)
metrics: deployment frequency, lead time for changes, change
failure rate and mean time to recover. These elements focus on
how fast one can accurately deploy and the time to recover
from a published or newly discovered bug. These reports
continue to be one of the yearly tools useful in measuring
DevOps, although several organizations now publish yearly
DevOps reports from different perspectives. One example is the
DevOps Institute, now under PeopleCert, which publishes
yearly ‘Upskilling IT’ reports highlighting the skills humans
need to continue their DevOps transformation.13
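As a minimal sketch of how these four measurements can be computed from delivery records, consider the following. The record format and the numbers are hypothetical; real reports derive these figures from survey and telemetry data.

```python
# Hypothetical deployment log entries:
# (deployed_on_day, lead_time_days, failed, recovery_hours)
deployments = [
    (1, 2.0, False, 0.0),
    (3, 1.5, True,  4.0),
    (4, 0.5, False, 0.0),
    (7, 1.0, False, 0.0),
]

period_days = 7
frequency = len(deployments) / period_days                     # deployments per day
lead_time = sum(d[1] for d in deployments) / len(deployments)  # mean lead time
failures = [d for d in deployments if d[2]]
change_failure_rate = len(failures) / len(deployments)         # share of bad deploys
mttr = sum(d[3] for d in failures) / len(failures)             # mean time to recover

print(f"Deployment frequency: {frequency:.2f} per day")
print(f"Lead time for changes: {lead_time:.2f} days")
print(f"Change failure rate: {change_failure_rate:.0%}")
print(f"Mean time to recover: {mttr:.1f} hours")
```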
Following the initial collection of DevOps-related data, the
next advance for DevOps was the publication of The Phoenix
Project in 2013.14 The book relays a fictional story about an IT
manager called upon to rescue a long-suffering project. It
introduces some of the first alignments to the Three Ways and
is recommended reading for anyone interested in DevOps
success. The book was followed in 2016 by The DevOps
Handbook, which broke down many of those initial concepts
into more practical approaches, and which remain highly
useful.15 These two books form the foundation of many DevOps
practitioners’ work libraries.
Since those initial steps, DevOps has only continued to grow.
In 2015, a Gartner report predicted that DevOps would
transition from a niche strategy to a mainstream ideal
employed by more than a quarter of the Global 2000
organizations.16 In 2017, Forrester’s research called for ‘the
year of DevOps’ and reported that up to 50 per cent of surveyed
organizations were practising DevOps.17 As of 2021, up to 83 per
cent of IT decision makers use DevOps practices to unlock
higher value, 99 per cent report DevOps as having a positive
impact and 61 per cent state it has helped to reach higher-
quality deliverables and reduce time-to-market for software.18
These positive statistics then align with Gartner’s predictions
about most companies adopting cloud computing by 2025, with
almost 100 per cent of new digital workloads occurring on the
cloud. Each of these suggests DevOps is growing, and expanding
individual confidence can be a key to unlocking success for
many teams. However, the question remains: who uses DevOps
today and how do I apply it to my organization?
Notes
1 Elshaikh, E M (2023) Standing on the Shoulders of Invisible Giants, Khan Academy, https://www.khanacademy.org/humanities/big-history-project/big-bang/how-did-big-bang-change/a/standing-on-the-shoulders-of-invisible-giants (archived at https://perma.cc/B9UT-FQ8V)
11 Kim, G, Humble, J, Debois, P and Willis, J (2016) The DevOps Handbook (2nd ed), Portland, OR: IT Revolution
12 Brown, A (2013) State of DevOps, Puppet.com, https://www.puppet.com/resources/history-of-devops-reports#2013 (archived at https://perma.cc/ACL5-5DTU)
16 Gartner (2015) Gartner says by 2016, DevOps will evolve from a niche to a mainstream strategy employed by 25 percent of Global 2000 organizations, https://www.gartner.com/en/newsroom/press-releases/2015-03-05-gartner-says-by-2016-devops-will-evolve-from-a-niche-to-a-mainstream-strategy-employed-by-25-percent-of-global-2000-organizations (archived at https://perma.cc/H5HW-L6X6)
17 Stroud, R (2017) 2018: The year of enterprise DevOps, Forrester, https://www.forrester.com/blogs/2018-the-year-of-enterprise-devops/ (archived at https://perma.cc/9Q9Y-945P)
18 Todorov, G (2023) 40+ DevOps statistics you should know in 2024, StrongDM, https://www.strongdm.com/blog/devops-statistics (archived at https://perma.cc/QD8R-U9XT)
19 Goel, S (2022) How does Planet Fitness work and make money, The Strategy Story, https://thestrategystory.com/2022/09/29/how-does-planet-fitness-work-and-make-money-business-model/ (archived at https://perma.cc/EU96-CKXZ)
20 Periera, D (2023) Microsoft Business Model, Business Model Analyst, https://businessmodelanalyst.com/microsoft-business-model/ (archived at https://perma.cc/YE76-R3UL)
21 Kanaracus, C (2023) How they did it: 6 companies that scaled DevOps, TechBeacon, https://techbeacon.com/app-dev-testing/how-they-did-it-6-companies-scaled-devops (archived at https://perma.cc/ZV8S-VHX8)
STRENGTHS
Strengths identify items done well and range from technical
skills to the cultural environment. These items allow a company
to stay competitive. Strengths tie directly to the proposed
business model, bring clear advantages and should be
compared to competitor strengths. If one considers high-speed
production a strength but cannot crack the marketplace top 10,
it may not be a strength. For example, if one looks at Google’s
market leadership, strengths are brand knowledge, brand
valuation, rapid growth and an ability to rapidly respond to
marketplace changes.1 Each strength puts Google in a market-
leading position.
WEAKNESSES
Generating weaknesses relies on a comprehensive strength
category. A market leader’s strength might pair with an
inability to change quickly as a weakness. The best weakness
assessments consider how others in the marketplace see the
organization. Many personnel systems use a 360-degree
assessment where individuals are rated not only by managers
but also by peers and subordinates. This also applies to
weaknesses, as competitors will actively try to attack your
position or gain market share. Returning to Google, their
weaknesses are privacy policies, unfair business practices and
failures in the social media revolution.2 These outside
perceptions turn inward as Google has lost personal data,
fought to prevent other organizations from entering the
marketplace and failed to launch a viable social media
platform. Not all weaknesses need fixing to remain competitive
but acknowledging weaknesses helps one to prepare.
OPPORTUNITIES
Opportunities are the areas where you can gain an advantage
over competitors. While a company may not yet be moving
towards an opportunity, recognition allows change before
others seize those advantages. Changes in technology often
become a major opportunity source. These can often be
devastatingly overlooked – one historical example of this is
Thomas Watson as President of IBM in 1943 stating, ‘I think
there is a world market for maybe five computers’.3 While IBM
still maintains consistent revenues, their chance for
exponentially higher numbers may have disappeared by not
recognizing the opportunity earlier. Returning to Google, some
recognized opportunities might be the wearables market,
recent remote work increases and non-advertising revenue
sources. In non-advertising revenue, while most Google
revenue comes from platform advertisements, new options
include Google Cloud, selling apps under the Google Play logo,
expanding services to emerging markets and home hardware
such as Nest.4 Each shows movement Google is taking to
leverage opportunities as a market leader.
THREATS
Threats deal with what competitors and market forces are
currently doing to influence your marketplace position. These
items negatively affect the business through market shifts, an
inability to gain and retain needed talent, or supply chain
issues. In 2020, the chip
shortage caused by the pandemic was a major threat to many
businesses. The global supply chain meant chip manufacturing
was distributed across many suppliers, and finished products
required aggregating parts from all of them. Threats might be
as simple as cash-flow problems or
poorly managed debt that prevents moving on new
opportunities. Google faces market share threats from
competition with Facebook, Amazon and Instagram, antitrust
controversies and lingering post-pandemic economic
uncertainties.5 Each clearly appears as a threat but can be
difficult to address without comparing other SWOT elements.
Traditional SDLC
Each stage of the standard cycle appears in Figure 2.1 with
various definitions. After years of working with project
management, these steps will become embedded in your
understanding. Knowing stage requirements intimately allows
for adjusting and implementing those stages in best-fit, targeted
models. While each stage is distinct, sometimes one stage’s
answers may solve the next without requiring process
reversion.
This image depicts each stage sequentially, but steps may be
parallel depending on requirements and team involvement. For
example, one may build limited software functionality and
deploy while still building later interface or plug-in
improvements. These basic steps derive from physical
manufacturing where, for example, an incomplete automobile,
such as one missing a tyre and an engine, could not be
delivered to a customer. In software, one can often separate
these pieces and still deliver functional products. In
manufacturing, even if every car part follows its own distinct
cycle, all components must come together to successfully
manage delivery.
FIGURE 2.1 The software development lifecycle
(SDLC)
PLANNING
The first cycle stage is planning, starting with receiving
requirements. In this cycle, requirements emerge from
planning and are adjusted to the larger context. Planning
allows the opportunity to solidify requirements from good,
high-level ideas to concrete requirements. Planning
requirements can be selected through various value criteria
such as value chain analysis, company strategy alignment,
resource availability or technical difficulties. While this basic
list comprises some common comparison items, in any model,
planning’s importance is connecting initial requirements to
prioritization. Without prioritization there can be no guarantee
that completed products will deliver value to the company.
One planning challenge is that many organizations spend a
long time planning without effectively connecting plans to
delivered value. The tools embedded in the planning stage
allow for direct comparison between different assessments
based on resource commitments, value and required time
elements. No single assessment should be used alone, but
rather a variety of assessments, each weighted for the
organization's goals. In a software
example, fixing a recent security patch might be a medium
resource requirement, important to the company’s stated
security mission, and take a medium amount of time. The next
item, changing the user login screen, might be a low resource
requirement, not important to the overall mission and take very
little time. If one scored resources from low (1) to high (3),
mission from important (1) to not important (3) and time
required from low (1) to high (3), then the first element would
be a 5 (2,1,2) and the second may score a 5 (1,3,1).
Distinguishing between tied scores is where weighting matters.
The factors could be weighted with mission as most important
(1), resources second (2) and time third (3), as shown in
Table 2.1. Allowing multiple teams to assign weights and
averaging the results can be an effective tool as well.
TABLE 2.1 Weighting factors
The table shows that, if the lowest score is prioritized, then the
user login is a clear planning winner despite the professional
instinct to prioritize the security patch. One useful tool appears
in the bottom half of the table, where the rankings are
converted to a Fibonacci sequence rather than straight
numbers. This allows a broader gap to appear more quickly.
Fibonacci is a mathematical sequence where every number is
the sum of the two before it (1, 1, 2, 3, 5, 8, 13), allowing for
rapid growth. This accentuates the difference between any two
items.
My preference is for scores where the lowest wins, like in
golf. Another tip is ensuring any weighting or scores appear as
a 1 to n ranking where n represents the total number of set
items or the highest Fibonacci score. For an item in a three-
element set, the scoring would be 1,3,5 and in a set of five
elements it would be scored as 1,3,5,8,13. One can see how
quickly complexity grows, but the DevOps emphasis relies on
prioritizing a small number of items often, and finishing the
intended items. Experience and practice are the best ways to
succeed at planning elements. One must create effective
categories and efficiently score various options.
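As a minimal sketch of this scoring approach, the planning comparison above can be computed directly. The raw scores and weights come from the example; treating the weights as simple multipliers, with the lowest total winning, is an assumption about how the table combines them, and the Fibonacci mapping follows the 1, 3, 5 scale described above.

```python
# Hypothetical prioritization sketch: lowest total wins, as in golf.
# Raw scores run from 1 (low) to 3 (high); weights are importance
# ranks, with mission = 1 as most important.

FIB = {1: 1, 2: 3, 3: 5}  # map a 1-3 ranking onto the 1, 3, 5 Fibonacci scale

items = {
    "security patch": {"resources": 2, "mission": 1, "time": 2},
    "user login":     {"resources": 1, "mission": 3, "time": 1},
}
weights = {"mission": 1, "resources": 2, "time": 3}

def score(factors, fibonacci=False):
    """Sum each factor, optionally Fibonacci-scaled, times its weight."""
    return sum(
        (FIB[value] if fibonacci else value) * weights[factor]
        for factor, value in factors.items()
    )

for name, factors in items.items():
    print(f"{name}: raw={sum(factors.values())}, "
          f"weighted={score(factors)}, "
          f"fibonacci={score(factors, fibonacci=True)}")
# security patch: raw=5, weighted=11, fibonacci=16
# user login: raw=5, weighted=8, fibonacci=10
```

With the Fibonacci scaling, the gap between the two items doubles, which is exactly the accentuation the sequence is meant to provide.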
ANALYSIS
From planning, the SDLC moves into the analysis phase.
Analysis takes the approved requirement and includes users to
fully understand the customer ask. This builds specific use-
cases. In the previous security patch example, a use-case might
state, ‘As a user, I want to know that the system I use has been
patched and that my personal data is not easily available to
those outside the environment or for malicious purposes.’
Correctly defining the use-case allows for determining where
and how an organization addresses goals. In this case, meeting
the requirement might call for a team familiar with security
warnings, knowledge of the current system status, and a plan
for implementing and integrating the fix.
After breaking out the requirement, analysis continues by
eliminating redundancies between this requirement and
others. For example, the new user login might require the
security patch before the login can be integrated. If the patch
stated all login information should be encrypted on
transmission, then new logins should include patching
requirements. This deconfliction helps assure alignment.
Understanding dependencies and redundancies creates
feedback so that rapidly moving cycles can fold dependency
resolution back into planning.
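A minimal sketch of this kind of dependency resolution can order work with a topological sort; the requirement graph below uses the patch-and-login example and is purely illustrative.

```python
# Order requirements so that dependencies land first (Python 3.9+).
from graphlib import TopologicalSorter

# Hypothetical requirement graph: each item maps to what it depends on.
requirements = {
    "new user login": {"security patch"},  # login needs the patch first
    "security patch": set(),
}

order = list(TopologicalSorter(requirements).static_order())
print(order)  # ['security patch', 'new user login']
```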
DESIGN
The third SDLC phase is design. At this step, the teams begin to
take the task and work through proposed solutions. In design,
logical and physical requirements are essential for success. For
a software solution, logical designs are the first step. Logical
requirements establish linkages between computer processes,
reports, databases, stored information and internal or external
sites.
Next, the design phase requires analysing physical
requirements. While most software does not require physical
components, the possibility remains. In some cases, a security
patch implementation might require all users to have a key fob
that generates a random, software-synchronized number to
provide multi-factor authentication. In this case, not only would
the security patch be installed but key fobs must be produced
and distributed. In the traditional cycle for this change, adding
a key fob might constitute a different requirement and require
returning to Stage 1 or Stage 2.
IMPLEMENTATION
The fourth phase is implementation. When delivering software,
implementation requires coding, testing and delivery. Each
aspect helps refine the product to meet defined needs as well as
offer a stable, supportable delivery. Sometimes SDLC models go
offtrack if coding solutions vary significantly from
requirements. In remote organizations, sometimes even the
codebase language may vary between common standards such as
Python, JavaScript or even COBOL. In later DevOps models,
consolidated pipelines help resolve these issues as differences
in linters, scanning tools, testing models or integrated builds for
each user could significantly slow implementation. Another gap
between requirement and coding can exist if the customer asks
are so detailed as to remove developer creativity, for example if
the requirement for one aspect of a parking application
requires Apache integration and another explicitly requires an
architecture not supported by Apache.
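As a minimal sketch of the consolidated pipeline idea, a single shared script can gate every change through identical lint, test and build stages so that differences in local tooling stop mattering. The specific tools named here (ruff, pytest) are illustrative assumptions, not requirements of any particular pipeline.

```python
# Hypothetical one-step verification gate shared by every contributor,
# so linting, testing and building behave identically for all users.
import subprocess
import sys

STAGES = [
    ["ruff", "check", "."],                     # lint (assumes ruff installed)
    ["pytest", "-q"],                           # unit tests (assumes pytest)
    ["python", "-m", "compileall", "-q", "."],  # cheap build sanity check
]

for stage in STAGES:
    if subprocess.run(stage).returncode != 0:
        sys.exit(f"Pipeline failed at stage: {' '.join(stage)}")
print("All stages passed; change is ready for delivery")
```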
MAINTENANCE
Once implemented, the cycle is not complete. The final SDLC
consideration is maintenance – how to fix and update a
product already in customers' hands. DevOps solves the maintenance problem by
continuous delivery and continuous integration rather than
distinct elements identifiable as repairs. More traditional IT Ops
employ large, heavily manned divisions whose sole purpose is
keeping code and systems running. Sometimes this means
maintenance fixes are not communicated back to
implementations for quick solutions but instead re-enter
planning as new requirements. These detours affect
how quickly you can deliver new value and maintain a positive
return on investment.
Overall, the output from a carefully planned, traditional SDLC
approach is often a business plan with hundreds of pages,
multiple documents and communication gaps between decision
makers and implementation. Aligning all needed individuals
can be challenging, especially if the decision authority resides
separately from those enacting decisions. DevOps tries to
reduce this cycle, shorten the paperwork and push decisions as
low as possible, similar to some effective military operations.
DevOps SWOT
The DevOps strength lies in agility and versatility. Each step
is formally defined even when execution remains informal. A
DevOps pipeline can handle any project and a business
designed with the DevOps cultural underpinnings can break
down even the largest project into small, valuable steps. The
DevOps weakness can be in the lack of specialization. While
each team can handle code, highly specialized items may still
revert to a preferred team. Since every project uses flow and
feedback at the lower levels, highly specialized applications
may be difficult to achieve as the amount of time to gain
proficiency in the specialization delays customer delivery.
Similarly, businesses preferring to dictate technical solutions
may have difficulty incorporating DevOps processes.
One example appears with some legacy systems where the
goal is continuing the current process rather than adaptation.
During the pandemic, some older systems based on the COBOL
programming language required changes to support surging
demand from remote work, creating a sudden need for scarce
COBOL skills.7 The system design
prevented a Java refactor or a cloud migration, making it a poor
target for DevOps-style implementations where technical
possibilities generally remain wide open for best-fit solutions.
The DevOps opportunity lies in the free-flowing approach. A
DevOps team can take any problem, at any stage, and integrate
through the double-helix for improvement. This allows brown-
field and green-field approaches to be solved with a DevOps
SDLC. A brown-field approach is where an application exists
and requires modification while a green-field approach
describes where the initial requirement does not have any pre-
determined technical specifications. In addition, DevOps
typically has a shorter time flow to deliver working software
than other models and allows quicker return on investment
than a Waterfall approach.
The threat faced by a DevOps SDLC is the traditional or highly
regulated industry. These approaches can quickly disrupt and
delay a DevOps SDLC and leave customers complaining that
DevOps failed them. Examples of this threat appear in
government solutions where an acquisition programme adopts
DevOps but requires all the traditional documents produced
including system designs, maintenance plans or five-year
roadmaps. Another example appears when customers fail to
understand DevOps basics, challenging why multiple pipelines
are required, or wanting to build a customized toolbox based
on their favourite applications. Some organizations think
pipelines are distinct items rather than flexible components
and fixate on the idea that one tool should provide all solutions.
The common analogy to this is that if all I have is a hammer,
then every problem should become a nail. These threats can be
overcome through shared understanding.
Strengths: Agility, versatility, constant feedback.
Weaknesses: Lack of specialization, legacy requirements.
Opportunities: Free-flowing approach, value now.
Threats: Traditional industry, structured processes.
Waterfall SWOT
In analysing Waterfall with the SWOT matrix, several elements
stand out. The strength of the waterfall method is the
concentration on every element. Each individual instance is
categorized in the plan, and must be accomplished before
moving to the next segment. This allows for clear progression
from one area to the next, as long as progression is defined in
the plan. Managing a Waterfall system is also a strength as
every step is documented and recorded. A project manager can
clearly show where the programme stands and what steps occur
next.
The same stability providing strength creates weakness if
unexpected changes occur. Changes not addressed in the plan
are neglected, or the plan must return to earlier stages. This serial
Waterfall approach may then require cancelling a project to
return to the beginning. For example, if building a house,
material shortfalls such as lumber may require an alternate
material solution that might not have been considered in the
requirements and design phases. This could cancel the project.
In a software cycle, a new patch in dependent software, vendor
shortfalls or moving to an alternate cloud environment would
have the same impact. The changes most developers handle
through version control systems like GitHub are anathema to a
good Waterfall programme. There are no early offramps in
Waterfall allowing for good ideas to succeed; the programme
runs to completion or fails.
The opportunity offered by Waterfall is planning well into the
future. Waterfall can recognize and plan for strategic
opportunities in longer timelines than Agile methods. However,
not all those timelines are achieved. The Waterfall outline
appears to remove risk, but the uncertainty within every
element often adds significant project risk. Clear planning
offers resource investment at a known cost and early
procurement, but external shortages and changing markets can
derail those planned goals. Organizations advocate Waterfall to
accurately portray resource needs and commit company
resources consistently. The challenge then becomes that
stability poses a threat to responsiveness.
The threat faced by Waterfall SDLC approaches is lacking
change opportunities. Projects must be clearly defined and
resources may need to be scheduled early in the cycle. This
makes it difficult to deal with rapid market changes and
innovation. If another company can produce faster, the longer
Waterfall cycle inhibits response and decreases business
success. A good example could be any security patch required
by modern systems. The Log4J vulnerability, a zero-day exploit
against an open-source logging framework, required rapid
identification and response across a variety of industries.13 If
those solutions had needed to wait for a Waterfall approach, the
first fixes might still just be rolling out years later.
Strengths: Well documented, easy to manage.
Weaknesses: Highly prescriptive, no early offramps.
Opportunities: Clearly defined, stable process.
Threats: Inability to adapt to new technology or shifting
requirements.
XP values:
1. Simplicity
2. Communication
3. Feedback
4. Courage
5. Respect

Agile values:
1. Individuals and interactions over processes and tools
2. Working software over comprehensive documentation
3. Customer collaboration over contract negotiation
4. Responding to change over following a plan
XP practices and the matching Agile principles:

Whole Team: Our highest priority is to satisfy the customer through early and continuous delivery of valuable software. Businesspeople and developers must work together daily throughout the project. Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.

Planning Game: Welcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage.

Small Releases: Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.

Customer Tests: (Intentionally left blank)

Simple Design: Continuous attention to technical excellence and good design enhances agility.

Pair Programming: The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.

Test-Driven Development: At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.

Design Improvement: The best architectures, requirements, and designs emerge from self-organizing teams.

Continuous Integration: Working software is the primary measure of progress.

Collective Code Ownership: (Intentionally left blank)

Coding Standard: (Intentionally left blank)

Metaphor: Simplicity – the art of maximizing the amount of work not done – is essential.

Sustainable Pace: Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.
The first call-out should be for items at the top. XP defines the
whole team concept as a team working together daily, in a
collocated site, where business representatives and developers
can communicate freely. Agile defines this broadly through
team interaction and innovation to obtain functional delivery.
On the other side, customer tests, collective code ownership and
coding standards have no direct comparison to an explicit Agile
principle. The thought appears instead through multiple
principles focused on design, self-organization and reflection.
Again, the primary difference is the Agile desire to propose
thought guidelines appropriate for multiple situations while XP
offers a more dedicated approach.
One of the unusual XP terms is metaphor. Here, metaphor
means relating a program function towards a visual approach,
such as describing a search program function like a hunting
owl – it finds prey at night by rapidly discriminating between
multiple distractions and captures only the wanted item. While
metaphor approaches the Agile desire for simplicity, Agile
simplicity expands beyond initial capability statements,
removing any work not required so teams can focus on
functional software quickly.
The close correlation between XP and Agile appears
throughout the table. The XP model shows indecision, failing
to determine whether it wants to be a practical
implementation or overall guidance. Agile clearly lands on the
overall guidance preference. This makes Agile more universally
applicable to model SDLC through the cultural emphasis on
delivering functional software. In any approach, if you consider
implementing XP, it should probably be done within an Agile
framework. The variance means one can run XP within Agile
but has more difficulty in applying Agile to an XP culture.
Extreme and Agile SWOT
Agile’s strength is building a cultural framework for delivering
software. This strength addresses each SDLC stage in parallel
rather than sequentially. In addition, Agile phases emphasize
communicating effectively among teams rather than passing
completed items between them. As a weakness, Agile does require cultural
implementation. Just as with DevOps, cultural transitions are
more difficult to achieve than a technical or process approach.
Some processes are addressed in Agile but not explicitly enough
for universal success without cultural understanding. The Agile
principles drive implementation but lack sufficient outcome
orientations to be successful in the same manner as DevOps.
Again, one can run DevOps as part of an Agile framework, or
Agile approaches within a DevOps framework as long as the
differences are understood and communicated through cultural
understanding. The key Agile weaknesses are an explicit
requirement for cultural emphasis and lacking principles
addressing metrics or automation.
The Agile opportunity is the first SDLC change highlighting
cultural needs. In words attributed to Peter Drucker, ‘Culture
eats strategy for breakfast’. Unless the business culture aligns
with overall goals, success is difficult, if not impossible to
achieve and sustain. The opportunity continues in building a
development business initially as Agile rather than
transforming later. Agile businesses are likely to remain Agile
whereas a business advertising as Agile may fall short of the
full cultural integration. Agile opportunities appear in handling
a wide project range and delivering quickly.
The threat to Agile emerges from those same cultural
leanings. Fast-moving and development-focused businesses are
often quick to assess Agile from their process approach. They
attempt to implement underlying practices without adopting
the high-level, cultural conversion. The business then attributes
these failures to the Agile system rather than their cultural
implementation shortcomings. Outside viewpoints then cast
Agile as a limited success in one field or location rather than
recognizing the underlying strengths, especially if their own
transformation failed.
One common strike against Agile is the lack of explicit testing
protocols.
Strength: Creates a cultural frame for delivery.
Weakness: No explicit process guidance.
Opportunities: Cultural and strategic alignment, wide
opportunity.
Threats: Difficult to define success, lack of testing
protocols.
SAFe model
The Scaled Agile Framework (SAFe) was first advertised in 2011
and has undergone six major version changes since then.16
SAFe offered exactly that for big companies – what felt like a
safe transition from traditional Waterfall formats to a DevOps
model. Each version offers more diagrams, a further expansion
and more explicit charts defining individual roles and
responsibilities. SAFe’s publishers currently state that more
than 20,000 organizations and one million people use the
system worldwide.17 Most SAFe profit originates from
continuing training classes associated with every step, distinct
classes for every role in the process and the need to update to
the newest version every two years. I personally have been
certified at multiple levels with SAFe as I worked for different
organizations.
In our basecamp analogy, SAFe, much like the DevOps SDLC,
offers a basecamp for those seeking a DevOps summit. It
provides detailed plans built on themes such as organizational
agility, lean portfolio management and agile product delivery.
Each theme then divides into levels for portfolio, solution, Agile
Release Trains (ART) and team flows. The disconnect between SAFe
and DevOps appears in the way each step hierarchically flows
and connects into the next step. Critical assessments and
success stories all emerge from the SAFe stack based on the
organization’s ability to adapt.
Throughout the process, SAFe makes use of sound DevOps
approaches, Agile transitions and the Waterfall documents,
which makes large organizations comfortable. The system
advocates value throughout but only by fully adopting all
hierarchical methods. The portfolio level encompasses high-
level business strategy, solutions deliver products, the ART
offers features and the individual teams complete the stories
necessary for each feature. At every level, one sees a
responsible manager, an architect and an engineer. These
individuals are supplemented by teams at every level. These
levels suggest a DevOps flow exists and can be split across
levels, yet SAFe designates responsible individuals rather than
accepting collaborative responsibilities.
One unique SAFe implementation occurs in the Programme
Increment (PI) design. Programme Increments are designed as
approximately 12-week sessions to organize delivery. This
matches some DevOps thoughts but too many organizations
take the opportunity to build a Waterfall delivery model based
on 12-week increments. At the planning sessions, all members
of each team are intended to sit together, plan design and
resolve issues associated with dependencies and delivery. These
sessions allow each level to observe lower-level actions and
build their own plans from them. This best mimics the DevOps
feedback element.
The part lacking in the SAFe model is that the structured
delivery inhibits flow, feedback and improvement. Flow occurs
but must progress through every level rather than directly to
delivery. An item that satisfies a business strategy, no matter
how big or how small, must progress across the multiple levels.
For example, if a security change requiring new encryption
standards is needed for the portfolio, it must be integrated as
code, feature, solution and then portfolio even if a simple code
change would solve the problem. Feedback disappears between
levels and only occurs at the scheduled interactions.
Experimentation also disappears in the structured models and
the commitment to the planned PI deliveries.
The SAFe strength is that large corporations can buy a model
advertising a DevOps philosophy without fully committing to
cultural change. One weakness with SAFe is the cost required to
fully train and develop employees inside the explicit model. The
other weakness is that the rigid approach prevents the
flexibility DevOps incorporates. An opportunity exists with
SAFe as it allows larger organizations to begin a DevOps
approach without total commitment. Some soon learn that the
rigid nature prohibits the desired end goals. In fact, in 2019 the
US Air Force released a memo advocating Agile practices but
specifically discouraging the use of the SAFe model.18 Finally,
the threat from SAFe is being locked into a delivery model.
While looking at the surface like a DevOps transformation, it
falls short of that success.
Strengths: Move to DevOps, similar model.
Weaknesses: Cost, rigid approach.
Opportunity: First steps for large organization.
Threat: Delivery model lock-in.
GitOps model
GitOps offers the opposite approach to SAFe – a minimal
approach designed to reach DevOps SDLC results. Instead of
linking high-level strategy, GitOps links low-level commitment
to a software model delivering quickly. GitOps depends on the
common Git command framework in the command line
interface (CLI), typically hosted on GitHub or GitLab. Git itself
is the open-source version control system; GitHub and GitLab
are commercial platforms built around it, with GitLab following
an open-core model. GitLab offers a wide range of options
designed to help reach DevOps goals.
GitOps builds on three main DevOps best practices:
Infrastructure as Code (IAC), Merge Requests (MR) and
Continuous Integration/Continuous Delivery (CI/CD).19 IAC
basics appear in the next chapter as infrastructure, but in short
it states that any item required for software to function is
defined in code. Merge requests are how one pulls code from forks
and branches to the main code delivery elements. CI/CD states
the flow process never stops; every improved item goes straight
from development to delivery based on integrated automatic
testing and promotion.
And that is all there is to the GitOps model. The system does
attach four principles, each at least partially embedded in the
GitLab/GitHub software. The first is that the Git repository acts
as the sole source of truth for all software, and all changes must
be present somewhere in those repositories. Next, the CD
pipeline builds, tests and deploys all applications. This links
software in that no applications can be manually promoted; all
must use the GitLab technology to progress. Third, deployment
steps monitor resource usage and fourth, monitoring systems
track performance to provide feedback. More on the third and
fourth elements appears in Chapter 5 on observability but the
feedback essential to DevOps is core to the GitOps process.
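A minimal sketch of that reconciliation idea appears below. The state format and function names are illustrative assumptions; a real GitOps agent would read declarative manifests from the repository and apply changes through platform APIs rather than print statements.

```python
# Conceptual GitOps reconciliation sketch: the repository declares the
# desired state, and the pipeline converges the running system towards it.

def desired_state_from_repo() -> dict:
    """Stand-in for reading declarative manifests from the Git repository."""
    return {"web": {"image": "shop:1.4.2", "replicas": 3}}

def observed_state_from_platform() -> dict:
    """Stand-in for querying what is actually running."""
    return {"web": {"image": "shop:1.4.1", "replicas": 3}}

def reconcile() -> None:
    desired = desired_state_from_repo()
    observed = observed_state_from_platform()
    for name, spec in desired.items():
        if observed.get(name) != spec:
            # A real agent would apply the change via the platform API;
            # nothing is ever changed by hand outside the repository.
            print(f"Drift detected for {name}: applying {spec}")
        else:
            print(f"{name} already matches the repository")

reconcile()  # in practice, runs on every commit or on a schedule
```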
This process is not a DevOps cultural transformation or a full
alignment to the SDLC. Instead, it explains how to use a tool to
mimic the DevOps SDLC benefits without the cultural
transformation. The underlying gap is that to appreciate GitOps
benefits, those cultural transformations should already exist.
GitOps works best for DevOps organizations that are looking for
some custom tools rather than as an approach one could start
without the underlying DevOps culture.
The GitOps strength is in apparently containing all tools to
nominally complete DevOps SDLC requirements in a single
space. In a library analogy, GitOps provides the catalogue and
shelves but still requires the user to bring the books. The
weakness is that the underlying factors associated with DevOps
success appear in resource management, delivery and
planning. These can be done with GitOps but lack the
automation advertised until you build local pipelines, i.e.
bringing the books to the completed system. The opportunity is
that all potential tools seem to appear in one box. In Chapter 6
we discuss pipeline solutions and, later, platform solutions.
GitOps is an early step to incorporate pipeline and platform
solutions quickly. The threat to GitOps emerges from the
proprietary lock-in. For example, if feedback suggests
experimenting with other platforms or pipelines, it may not be
possible for those devoted to GitOps.
Strength: All the tools in one place.
Weakness: No planning, resource structure or guidance
outside CI/CD.
Opportunity: All in one platform.
Threat: Vendor lock-in.
Summary
Starting with the DevOps model, all of the above SDLCs offer
potential benefits to pursuing an organizational transformation
to quickly deliver software. One can see how different cultural
organizations prefer different approaches and why those might
be chosen. The SWOT comparison allows an understanding of
those business advantages and approaches. As software
development requires faster delivery and more comprehensive
processes, one can understand how the various SDLCs emerged,
from the traditional to the GitOps and SAFe approaches today.
The next chapter assumes that, since you are
reading this book, you are primarily interested in current
DevOps success or conducting a cultural transformation. The
various cultural components that guarantee success in the
DevOps SDLC are explained to show how integrating cultural
transformation into organizational practices poses a confident
route for one’s journey into DevOps.
Notes
1 Parker, B (2023) Google SWOT analysis 2023, Business Strategy Hub, https://bstrategyhub.com/swot-analysis-of-google-2019-google-swot-analysis/
2 Ibid.
3 Strohmeyer, R (2008) The 7 worst tech predictions of all time, PC World, https://www.pcworld.com/article/532605/worst_tech_predictions.html (archived at https://perma.cc/WZ3W-47S9)
4 Parker, B (2023) Google SWOT analysis 2023, Business Strategy Hub, https://bstrategyhub.com/swot-analysis-of-google-2019-google-swot-analysis/ (archived at https://perma.cc/CBP4-X7TJ)
5 Parker, B (2023) Google SWOT analysis 2023, Business Strategy Hub, https://bstrategyhub.com/swot-analysis-of-google-2019-google-swot-analysis/ (archived at https://perma.cc/YUA8-PYNH)
6 Vizard, M (2023) Rebellion against changes to open source Terraform license mounts, DevOps.com, https://devops.com/rebellion-against-changes-to-open-source-terraform-license-mounts/ (archived at https://perma.cc/UH8W-WSR3)
7 Lawton, G (2020) How COVID-19 created a demand for COBOL basics, TechTarget, https://www.theserverside.com/feature/How-COVID-19-created-a-demand-for-COBOL-basics (archived at https://perma.cc/EE6J-K3JB)
8 Banica, L, Radulescu, M, Rosca, D and Hagiu, A (2017) Is DevOps another project management methodology? Informatica Economica, 21 (3), 39–51, https://doi.org/10.12948/issn14531305/21.3.2017.04 (archived at https://perma.cc/S7ES-HHS8)
Cultural synthesis
The culture concept remains important to every organization. A
well-known quote attributed to Peter Drucker, a widely
influential contributor to management thinking, states ‘Culture
eats strategy for breakfast’.2 Deconstructing the quote suggests
it is impossible to succeed with a strategy running counter to
the company’s culture. One example occurred when Rollerblade,
a business with a free-wheeling past, was bought by Benetton in
the mid-1990s. The loose culture at Rollerblade, with employee
exercise breaks and independent approaches, was countered by
the stricter retail company.3 Accompanied by employee cuts,
Benetton went from $5 million annual US income to losing $31
million in the first year. The underlying culture must support
desired outcomes and this remains true to implement DevOps.
Cultural designation elements remain similar whether the
culture is national, ethnic or organizational. Those elements are
symbols, language, norms, values and artifacts.4 DevOps flow,
feedback and improvement are best expressed through those
organizational norms. The shared common structure is not just
part of DevOps success but the key reason behind success and
failure at different organizations. Without a common shared
framework, companies are unable to adapt to the ‘deliver value
now’ DevOps aspect, relying on traditional approaches and
often unsuccessful paths, similar to the Benetton example.
Culture is defined by the shared attitudes, values, goals and
practices characterizing an institution. These concepts are
reflected through chosen leadership, decision-making styles
and strategic architecture. The DevOps practice starts off
emphasizing the Three Ways. Many DevOps practices further
explain this with the CALMS concept: Culture, Automation,
Lean, Metrics and Sharing. The Three Ways and CALMS shape
how you implement a DevOps culture regardless of location,
company role or assigned work. Culture appears first as
integral to the other four factors succeeding. A shared learning
culture creates belief patterns to drive organizational
decisions.5
C – Culture
A – Automation
L – Lean
M – Metrics
S – Sharing
One frequent challenge in implementing a DevOps culture deals
with shifting from a local, in-person model to the partially or
fully remote operations necessary for global organizations.
Modern companies may use more remote work with
independent contractors, temporary employees and regular
employees. Regardless of corporate attachment, culture drives
the basic foundations for success. Measuring individuals occurs
across three dimensions of work: status, content and
conditions.6 The gap between full employees and contingent
staff can create divisions detrimental to establishing culture.
Many companies use subcontractors as whole teams or as
individuals within existing teams. Small variations among
teams can cause stressful cultures. One
example appears in the practice of taking government holidays.
If working for a government customer, one may find the
government takes all federal holidays such as Labour Day,
Martin Luther King Day or other designated holidays as
downtime with optional down days added to create longer
weekends. As a local team, one may prefer to work on holidays,
accentuate flow and perhaps skip regular meetings to prevent
contextual shifts. This discrepancy can create organizational
friction which can increase as operations scale.
The growing translocal culture problems appear in a
qualitative study of Agile practice circa 2011–2014. Many
of these issues have yet to be resolved. The study finds that
coordination, temporality and communication all suffer in a
diversely cultured environment with geographic boundaries.7
Studies and use-cases show that despite an original ethnic or
locale-based culture, you can implement successful DevOps
cultural transformations. In implementing the DevOps culture,
one depends on close human communication as a key factor in
building value through a bottom-up approach. These
integrations promote a sharing culture and further DevOps
strategies by avoiding organizational silos.8 Establishing an
effective culture can help prevent future challenges.
Addressing stress and creating cultural synthesis at the early
level occurs through developing team working agreements.
These are standard within Agile practices and often lead to
increased success. The basic foundation dictates that every team member agrees to the working agreement by consensus, facilitating flow.
These agreements occur at the team, management and higher
levels to communicate cultural role understanding. While team
agreements are discussed later in this chapter, the basic
concepts provide knowledge of roles, responsibilities,
communication expectations, feedback mechanisms and
change processes. Early identification of flow, feedback and
improvement mechanisms within a team agreement is vital to
creating effective DevOps cultures.
The DevOps model creates a cultural process based on flow,
feedback and continuous learning that interacts with other
business processes, even when considering compliance. One
way many teams improve communication is by adopting
elements like the Unified Modelling Language (UML) baseline to
detail business and development standards within a common
framework. An individual’s experience with standards leads to
first interactions in conforming to corporate culture and
improving standards for continuing success. These cultural
elements identify synthesis needs but cannot happen without
implementation direction from management.
Management implementation
No culture can be complete unless alignment between
management and workers occurs on the agreed-upon
standards. The goal in sound cultural implementation is
creating value alignment on delivery and ensuring steady value
delivery to the customer. There are many management texts
describing the difference between top-down and bottom-up
activities and DevOps requires adopting a hybrid approach. The
workers must want to go faster, develop effective products and
feel empowered through success. Management must agree that
adopting DevOps creates some uncertainty and risk in previous
long-term plans. The risk occurs at the delivery level rather
than the strategic level as small, valuable deliveries ensure
continued success but are not aligned to the Gantt charts and
standards management may have previously favoured. The key link lies in recognizing how groups communicate. As
the basis for this communication, I prefer thinking through the
social exchange theory.
Social exchange theory determines if an exchange will occur
when profit, defined as reward minus cost, reaches a sufficient
level for the involved groups to see beneficial exchange.9 The
core essence states, ‘Social behaviour is an exchange of goods,
material goods but also non-material goods’.10 The theory
originated from George Homans’ 1951 work, The Human Group,
exploring the behavioural decision theory with activities,
interactions and means defining the group.11 These areas are a
model for everything occurring within DevOps software
development. Activities are coding, merging and producing to
create customer value as well as including the observability
created through using task boards such as Scrum and core
repositories like Git. Interactions describe value transfers
within daily team meetings, product demonstrations and
accepted deliverables. Means implies the entire Three Ways and
CALMS implementations.
Social exchange addresses individual interactions to interpret the meaning others hold about the world. The expansion to social exchange provided a more precise definition of how groups determine which behaviours are acceptable, treating experience as lived-through events that guard against the clichéd, the conceptual and the predetermined. This allows moving from overall meaning to the moment where the individual reaches an intuitive grasp. Collecting the shared success within those intuitive mindsets focuses attention on the smaller elements driving DevOps implementation.
Converting to any process takes time. Management
implementation should not be considered a one-day change but
a cyclical process. Most transformations typically take three to
six months to move from a previous standard to a new change.
These changes should be preceded by training for new standard
elements. This is where generating the team working
agreement (TWA) typically occurs. In this space, those TWAs
can also be shared to help eliminate silos and emphasize flow
across multiple elements. In creating a new team, it typically
takes three months for the forming process and initial
capability. The next three months allow the team to find their
speed and lock in elements. Then, at that point, the DevOps
benefits begin to occur, and velocity will gradually increase. It
should be noted that any change to teams such as new
members, product shifts or organizational changes can reset
these teams to the initial forming phase. This restarts the three-
to six-month process. Many management implementations
constantly change teams without realizing the structural
impact.
Studies show how social exchange behaviours affect
corporate practices by aligning direct cost models to social
responsibilities.12 Most important are the experiences that link
cost models to social responsibility. One social exchange
interpretation brackets conceptual, empirical and technical
information technology aspects through three aligned case
studies.13 Again, one sees the connection to DevOps through
formulating the idea, converting the idea into code and then
testing the code through product implementation. If one has
frequently dealt with engineering professions, one finds that
teams often jump from the idea to the preferred solution and
work backwards into the code. Unfortunately, this can derail
the feedback element and while code might be developed, it
will not be the best solution but only a solution that fits at the
time, and future solutions may become more difficult.
Decomposing these ideas returns one to the human
experience aspect as the core of DevOps, that DevOps is for
humans and not a technical solution. Social exchange standards
integrated into industrial settings show how decision-making
costs and rewards, the economic individual decision principles
and relationship levels between teams affect the overall
outcomes. These core concepts demonstrate social exchange
utility standards to the DevOps cultures. From the thought
perspective, Heidegger offered an underlying proof through the
unique term, enframing, as the process by which the individual
orders and is, in turn, ordered by technological processes.14
This provides a philosophical foundation for the flow and
feedback aspects of the DevOps experience.
Understanding these interactions from the management
perspective allows for describing experiences and interpreting
the interaction across multiple teams within the organization’s
cultural flow. A step-by-step process would first create a sound ontological assessment of common issues, then capture first perceptions, identify parts and wholes inside the data and finally develop a deeper understanding that captures the textual essence of the global process. This approach contributes to how
the DevOps culture experiences valuation processes.
DevOps constitutes not just another project management
methodology but a conceptual method to break organizational
silos. Some functional answers to creating value appear
infrequently in research, such as value-based software
engineering, but still fall short of determining any value-based
approach towards the entire development cycle.15 A key
DevOps attribute must be to increase speed and stability while
delivering, but again the question remains one of implementation.16
Delivering value as the end goal should be the desired aim for
all organizational participants.
The practical management question becomes not ‘What is
DevOps?’ but ‘How do I lead a team into adopting DevOps
processes and methodologies?’ Before one gathers
requirements, begins an architecture and forms teams, the
system must be understood. Perceptions need management at
every level when a transformational topic appears. These
perceptions lead to understanding the process parts and wholes
to create conceptual understanding. Those understandings
allow our DevOps practice to grow, just like the martial artist at
the start of the chapter, into an organically supported whole.
In the rest of this chapter I will use the CALMS approach to
show how DevOps grows through the Three Ways into a fully
functioning whole. CALMS becomes a useful approach as ideas
and values associated with DevOps are expressed. These first
steps build the basis for your understanding to grow as you
proceed.
Flow
Flow always appears to be an easy process: all that has to happen is that work gets done. Good flow sets the path for the DevOps journey. The word ‘flow’ functions as both a noun
and a verb. The dictionary defines the noun as ‘a smooth,
uninterrupted movement or process’ and the verb as ‘to move
in a stream’.17 Both apply when using flow as the First Way. The
teams should offer an uninterrupted action creating value
moving in a stream. Every stream has a beginning and an end,
even if the end is simply in a larger body of water, i.e. product.
This increases the relevance for value streams to represent
DevOps flow, beginning with the customer requirement,
delivered into the customer’s hands, and allowing the next
iteration with continued flow. As in every element, the
opportunity to create flow begins in culture.
The culture around flow should support constant movement
from idea to done, usually within a sprint cycle. The difficult
part for teams when adopting flow is not to allow themselves to
derail delivery. The best practice is to treat identifying a blocker as a positive act whenever one impedes flow. Too many teams see a blocker and avoid discussing it with management for fear of a negative response. As an example, if one identifies a dependency on third-party software mid-sprint, it often becomes a solution quest without visibility. If the solution does
not readily appear, that interruption then slows other pieces,
much like trees falling across a stream. A team hitting blockers
needs to communicate those aspects, so everyone knows when
each stream will hit the reservoir as the end state.
Building a flow culture requires management and teams to
emphasize what work needs to be accomplished. Critical to
accomplishing work is deconstructing requirements to
complete the smallest work elements possible. While work
often starts with grand ideas, the company must ask what
should be done first, and what can be done most quickly. If the
idea is to construct a new mobile app to accelerate public
parking, the first element may not be a completed app but
recognizing users, a second element to accept secure payments,
and the third as scanning existing parking databases to find
spots. Rather than build the elements in silos, each should be
simplified and incorporated into the flow.
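As a minimal sketch of that decomposition, the Python fragment below captures the parking app elements as small work items; the field names and acceptance criteria are illustrative assumptions rather than a prescribed schema.

# Illustrative backlog for the hypothetical parking app; each item
# carries a use-case and acceptance criteria to help drive flow.
backlog = [
    {
        "title": "Recognize users",
        "use_case": "A driver signs in so preferences can be stored",
        "acceptance": "A returning user is recognized on second launch",
    },
    {
        "title": "Accept secure payments",
        "use_case": "A driver pays for a spot inside the app",
        "acceptance": "A payment completes over an encrypted connection",
    },
    {
        "title": "Scan parking databases",
        "use_case": "A driver sees open spots near a destination",
        "acceptance": "Spots from existing databases appear in results",
    },
]

# Pull the first small element rather than building the whole app.
print(backlog[0]["title"])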
Automation expedites flow. Any flow is good but fast flows
are markedly better than slow progression. The first step in
automating flow is recognizing what slows you down; the next is accepting that automation does not remove the need for jobs. A
common example is grocery shops moving to automated
checkout lines. Where before, every item required multiple
interactions, the overall grocery process to deliver value
through selling goods improves if every customer can scan
their own groceries. This then enables the employees to focus
on stocking shelves and improving the overall value of the
shopping experience.
In the same manner, automating flow is accomplished
through pipelines, planning boards, high observability, version
control and other DevOps-associated tools. Each tool takes a
manual process and passes it to a machine to allow humans to
focus on creativity. Pipelines are an excellent example as the
typical pipeline can incorporate multiple processes that would
take hours individually but aggregated are much faster. In
addition, process automation allows a developer to move on to
another task while waiting for pipeline results. Automation also
improves flow by allowing management to see progress directly
rather than stopping a developer to ask about progress or
require an update meeting.
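As a minimal sketch of that aggregation, the Python fragment below chains a few typical stages and stops at the first failure so the blocker becomes visible. The stage names and commands are assumptions (flake8, pytest and build are assumed to be installed), and a real pipeline would live in a CI system rather than a local script.

import subprocess

# Each stage replaces a manual step; the pipeline reports feedback
# for every stage rather than a single pass/fail at the end.
STAGES = [
    ("lint", ["python", "-m", "flake8", "."]),
    ("test", ["python", "-m", "pytest", "-q"]),
    ("build", ["python", "-m", "build"]),
]

def run_pipeline():
    for name, command in STAGES:
        result = subprocess.run(command, capture_output=True, text=True)
        print(f"{name}: exit code {result.returncode}")
        if result.returncode != 0:
            print(result.stdout + result.stderr)  # surface the blocker
            return False
    return True

if __name__ == "__main__":
    run_pipeline()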
Lean, of course, gained its beginnings from flow. Lean
eliminates waste to focus on product delivery. We saw Lean in
the Andon cord, in understanding the SDLC and in the struggle
to make delivery faster. Some Lean flow elements can include
daily standups of ten to fifteen minutes rather than an hour-
long meeting, asynchronous communications such as Slack or
Mattermost, and remote work versus daily commutes. One way
to accelerate Lean in organizations is to strictly timebox all
meetings and then apply common communications.
Timeboxing refers to setting a desired time and then stopping
the meeting there, regardless of outcome. Some communication
formats for DevOps involve meeting calls, of which several are
listed below:
‘LMO’ (pronounced ‘elmo’) – let’s move on. We think we
have reached the useful end of discussion on this topic
and are moving to the next one.
‘Weeds’ – the topic is too far from the initial subject and
may deal with solutions rather than characterizing the
current issue.
‘Parking lot’ – that point is necessary but does not relate to
the subject at hand so will be captured for later discussion.
‘Shiny’ – the topic is interesting and merits discussion but
is not relevant currently.
‘Thumbs’ – call for consensus, used to verify group
agreement. A thumbs call requires a positive up or down from
every meeting participant, no abstentions allowed.
Flow-based communication calls are based on creating
alignment rather than consensus. Agreeing to a parking lot or a
shiny call does not mean that you always agree, just that you
agree for now. These calls build alignment to move an idea
forward in the flow even without complete consensus.
Metrics are an important topic that I’ll raise throughout the
book. Any metric used must not impede flow and must address
automation needs in accelerating flow. The DORA (DevOps
Research and Assessment) metrics are a good spot for starting
to recognize flow with deployment frequency, lead time to
change, mean time to recover and change failure rate. These convey how fast code moves to production, the time an idea requires to become code, how long fixing bad code takes in operations and how much bad code passes the pipeline. The
most important part of flow metrics measures observable
events within normal progress and highlights items impeding
flow with unusual starts and stops.
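As a minimal sketch, the Python fragment below computes the four DORA metrics from a handful of deployment records. The record fields and the observation window are illustrative assumptions; a real team would pull this data from its pipeline or version control history.

from datetime import datetime, timedelta

# Illustrative records: when a change was committed, when it reached
# production, and whether it failed there.
deployments = [
    {"committed": datetime(2023, 5, 1, 9), "deployed": datetime(2023, 5, 1, 15), "failed": False},
    {"committed": datetime(2023, 5, 2, 10), "deployed": datetime(2023, 5, 3, 11), "failed": True,
     "recovered": datetime(2023, 5, 3, 14)},
    {"committed": datetime(2023, 5, 4, 8), "deployed": datetime(2023, 5, 4, 12), "failed": False},
]

window_days = 7
frequency = len(deployments) / window_days          # deployments per day
lead_time = sum((d["deployed"] - d["committed"] for d in deployments),
                timedelta()) / len(deployments)     # idea-to-production time
failures = [d for d in deployments if d["failed"]]
failure_rate = len(failures) / len(deployments)     # change failure rate
mttr = sum((d["recovered"] - d["deployed"] for d in failures),
           timedelta()) / max(len(failures), 1)     # mean time to recover

print(frequency, lead_time, failure_rate, mttr)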
The last element considers sharing. Sharing returns to
observability and reminds us that all DevOps events should be
universally visible. Enhancing sharing requires coding in
common repositories rather than individual silos to support
shared learning practices. Sharing also extends to processes,
working through retrospectives to acknowledge issues and
sharing knowledge forward through active demonstrations
about delivered features. Like many of the CALMS elements,
sharing ties to other levels through culture, metrics and
automation.
The best method to enhance flow resides in using Scrum or
Kanban boards. A Scrum board focuses on story points and only
commits a team to completing assigned items, while a Kanban
board uses the work-in-progress structure. Work-in-progress
limits the total items underway at any one time, ensuring every
item reaches completion before new work begins. Ideas in
Kanban can be pulled from the backlog at any point while
Scrum limits pulling work to items assigned to the sprint. While
both boards can use columns to show the state in which work
occurs, I always advise limiting those areas to ‘backlog’ (good ideas), ‘to do’ (work that has yet to be assigned), ‘in progress’ (work assigned to individuals) and ‘done’ (all items complete). A
secondary tip here – every item should at least have a use-case
and acceptance criteria. Being aware of the problem and
potential solution helps drive flow.
These boards put all items in a common area, allow everyone
to visualize problems and actively track the progress of work
being done. Having work displayed commonly allows shared
judgments about whether work items are large or small. In
addition, it commits the culture to the pull requirement of
constantly moving work across boards. This allows flow across
multiple teams as all participating can visualize any other
team’s work at any point. Flow is maximized not by directly
counting work or story points accomplished on a board but by
how well each team completed the items planned. High
completion rates demonstrate flow occurred.
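The Python sketch below illustrates that pull behaviour: work enters progress only when the work-in-progress limit allows, so every item finishes before new work piles up. The limit of three is an assumed value, not a recommendation.

# Work is pulled, never pushed, and the WIP limit gates new pulls.
WIP_LIMIT = 3

backlog = ["item-a", "item-b", "item-c", "item-d", "item-e"]
in_progress, done = [], []

def pull_next():
    if len(in_progress) >= WIP_LIMIT or not backlog:
        return None  # finish something before starting new work
    item = backlog.pop(0)
    in_progress.append(item)
    return item

def complete(item):
    in_progress.remove(item)
    done.append(item)

while pull_next() or in_progress:
    complete(in_progress[0])  # simulate finishing the oldest item

print(done)  # every item reached completion before new work began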
Returning to a SWOT approach, the strength of flow is in the
constant production of value. The weakness is that a high flow
rate does not lend itself well to long-term or ongoing projects.
Opportunity emerges constantly through flow, allowing quick
failures to drive learning without overcommitting to resources.
Threats to flow occur when you allow flow to be driven from a
push rather than a pull approach. In the push approach, you
generate and complete new ideas but these can become
disconnected from the end-state value.
Feedback
Feedback, the Second Way, provides the fuel behind successful
DevOps implementations. Feedback means that once an item
flows, information flows back to workers and forward to
customers. Without regular feedback, DevOps becomes just
another project planning tool.
The DevOps feedback mechanisms should emphasize
constancy and continuity. Every element drives discussion,
which eventually leads to the Third Way, improvement. When
you inspect automated DevOps pipelines, you see feedback’s
essence. Every step in the pipeline returns feedback, not based
on success or failure, but on step functions. An item can be
evaluated based on either the high-level success or the results
from an individual line of code within the process. In
connecting with observability, feedback allows anyone to see
results and interpret.
Using feedback does not require a set format but DevOps
accepts all feedback as valuable. One of the DevOps keys is
constantly having blame-free discussions about what occurred.
Normal organizational action tends toward ordered processes.
Feedback, from the initial definition, creates a corrective action
requiring energy. Too many organizations devalue negative
feedback, wanting to hear only positive results while ignoring
negative feedback despite similar opportunities for change.
From this feedback, you can derive that while all action creates
change, inaction can create disaster. The continuous feedback
mechanism attempts to avoid disaster.
Culture describes the organizational context and determines whether the culture prefers a highly ordered state favouring strict processes with low entropy. Order here means the static
process the organization has adopted. Some organizations
thrive with high-entropy cultures and some with low-entropy
cultures. If you place a space heater into a cold room, you’ll
notice the results sooner than if you placed one in a warm
room. DevOps feedback is the space heater, constantly injecting
more energy. Regardless of the initial state and whether you
notice it or not, the energy injection always takes place.
Organizations should seek a culture of accepting feedback. This
reflects Conway’s Law, ‘Any organization that designs a system
will produce a design whose structure is a copy of the
organization’s communication system’.18 Feedback culturally
follows an organization’s communication system.
Automation allows quick feedback on repetitive processes.
The goal here is to create mechanisms that rapidly provide the
necessary information to make adjustments. This typically
includes the dashboards common in modern software
engineering. An automated dashboard can provide details such
as the current security state, bandwidth consumed, memory in
use, lines of code committed and many other factors.
Organizations should seek to automate as much as possible
within their delivery cycle to provide improvement
opportunities.
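As a small sketch of such an automated feed, the Python fragment below samples a few machine-level factors and prints the kind of snapshot a dashboard might display. It assumes the third-party psutil package is installed, and the chosen factors are illustrative.

import psutil  # third-party package, assumed installed

# Sample a few observable factors and emit a feedback snapshot.
snapshot = {
    "cpu_percent": psutil.cpu_percent(interval=1),
    "memory_percent": psutil.virtual_memory().percent,
    "disk_percent": psutil.disk_usage("/").percent,
}
for factor, value in snapshot.items():
    print(f"{factor}: {value}%")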
Lean directly emphasizes feedback. In the previous chapter we saw takt time as flow, the difference in planned versus actual production; automating that comparison on a dashboard is one example of feedback. Lean also drives the impetus behind all feedback: eliminating waste. Feedback attempts to create an optimal First Way flow. Just as in my prior experience in the intelligence field, Lean suggests feedback should be actionable. Feedback without suggested actions becomes wasted effort.
In an example, feedback stating that this quarter’s work was
terrible and we made less money is not actionable feedback.
Feedback stating that production was unable to meet demand
and our customers turned to an alternative product becomes
more actionable. Then, if we dig into the feedback further, we
may find we were unable to meet demand because there was
growth from 200 units per quarter to 400 per quarter. This then
becomes a difference in the communication between the sales
funnel and the production element. Further discussion between
those two areas should create feedback to drive
experimentation and improvement, aligning better in the
future.
Metrics are equally essential to feedback. They provide the
initial source of truth to determine if actions can be undertaken
and then measure the impact of those actions. One key to
feedback is understanding the baseline. Baseline means
creating the standard for any action measured. One common
DevOps transformation problem is when many organizations
believe adopting DevOps creates success. Most organizations
require three months to adapt to a new change, and then an
additional three months to start producing with the benefit
from that feedback. In a small organization this applies to any
change, including changing team members, altering product
strategy or even changing company snack policies if they affect
team morale. Metrics show a historical record of what has
happened and allow comparison to the current output as
feedback.
Sharing, the last element of the CALMS structure, is equally
key in feedback. Feedback must be shared at all levels. DevOps
requires management involvement from the top and
suggestions from the bottom to be successful. The interaction
through these elements improves through constant sharing.
One can implement sharing through bulletin boards, open
meetings, suggestion boxes and open file structures. Traditional
organizations often adopt a process of locking files, managing
internal business in silos and not sharing processes. When
these silos occur, one cannot baseline events. In the previous
example, if the sales team never shares the new business
funnel, there is no method for developers to observe demand
change prediction. On the other side, if the production
experience is mired in secrecy, the sales team does not know
where to share their information to drive production.
Although metrics and measures occur across many
dashboards and automated processes as feedback, there are
two common tools to verify answers. The tools I prefer are the
sprint retrospective (retro) and the feature demonstration
(demo). In software development, retros occur after a sprint or
product cycle. At this time, everyone gathers around and
discusses what went well, what went poorly and what needs to
improve. Another version uses the terms keep doing, stop doing
and do better. Both tend to reach the same point. The most
important part is that the feedback here resembles a
brainstorming session; no idea should be dismissed out of hand. The other essential part requires revisiting the retro on
a regular basis to ensure previous issues are being addressed
rather than recreated during the following work cycles.
My other favourite feedback tool for software development is
the demo. The demo is a key software development function as
it confirms a feature has been accomplished. Regardless of the
feature size, DevOps’ Agile background emphasizes functional
software. If functional software is the goal, then those functions
should be demonstrated. Sprint features that cannot be
demonstrated should not be considered complete. This can
range from showing slides about research to demonstrating a
new interaction in logs or presenting a new capability. Demos
should be presented to a stakeholder or user and allow
feedback between what was promised and what was delivered.
This is one of the best ways to ensure everyone stays on track.
From the top level, the main strength of adopting feedback
systems is the very nature of the feedback. Strong feedback
means every action produces information that can be looped
back to improve the action itself. Organizations with good
feedback tend to improve and organizations without feedback
tend to fail (the progression of the early generations of Apple
computers is good evidence of this). As a weakness, one must
ensure that feedback creates a chance for further discussion.
Just because feedback exists does not always mean the feedback
is beneficial. A good example here would be prioritizing
feedback for availability when the company only has one
customer who logs in for 15 minutes a day. The feedback would
be all positive but if the company trusted only that availability
feedback, they would be unlikely to improve.
Feedback creates an opportunity to get better, and the best feedback leads to an opportunity for still more feedback. The best feedback may not
necessarily be positive – the important factor is that it answers
the intended question. The threat from feedback comes in a
similar path to the weakness. You must always seek to expand
feedback and gain more breadth or depth on issues. The
company must also not stop the feedback process by believing
that all useful questions have been answered. There is always a
need for more feedback.
Improvement
If Flow sets the path and Feedback fills the tank with fuel, then
Improvement is the engine for DevOps success. If a
commitment to continuous improvement does not occur, the
DevOps journey ends. Flow and feedback create the necessary
data to move forward. Improvement does not commit to big
changes but to small incremental changes over time. On our
journey, we want to look to the destination, then our feet, taking
one step at a time, overcoming current obstacles and being
careful about the next step. Then, over time, we can look back
on the path and realize that the start point is so distant because
all those small steps occurred at the proper place and time.
Starting with the definition, improvement is ‘the act or process of improving’.19 Adding that improving is ‘something that adds value or excellence’, we arrive at improvement as the act or process of adding value or excellence to a product or process. Improvement cannot be isolated but must link to
experimentation. Third Way experimentation allows one to
quickly try alternatives and then use the feedback to create
flow. The actions improved are not isolated and removed from
daily processes but organic growth from what needs to be
changed.
In that nature, every improvement starts with a question:
Why do we do this? Can the numbers be better? Where can the
code be refined? What other elements should be considered?
Key to those experiments is the ability to set up a trial and test
alternative methods. Traditional SDLC models examine those
alternatives only during early planning phases but DevOps
allows for experimentation to occur continuously to drive
improvement. All experimentation starts with a question and
builds to an answer. The scientific method appears below for a
quick reference on how to build any experiment:
1. State the question
2. Analyse existing research or data
3. Form a hypothesis
4. Perform the experiment
5. Analyse the results
6. Form a conclusion
Each step can be done continuously but understanding a
method allows one to make comparisons between different
experiments. A successful experiment is one that answers the
initial question. Not all answers are successful but all answers
have value in contributing to improvement.
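One way to keep experiments comparable is to record each against the same six steps. The Python sketch below shows one such record; the question, data and conclusion are illustrative assumptions.

# A reusable experiment record mirroring the six steps above.
experiment = {
    "question": "Does a shorter standup improve sprint completion?",
    "existing_data": "Completion rates from the last six sprints",
    "hypothesis": "A 10-minute standup raises completion by 10%",
    "method": "Run 10-minute standups for two sprints",
    "results": None,      # filled in after the trial
    "conclusion": None,   # successful if the question is answered
}

def conclude(exp, results, conclusion):
    exp["results"] = results
    exp["conclusion"] = conclusion
    return exp

conclude(experiment, "Completion rose from 78% to 86%", "Hypothesis supported")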
Three types of experiment can be conducted: qualitative,
quantitative and mixed method. Each is suited to a different
type of question. Qualitative answers questions about
interaction, such as ‘What was the customer’s best experience
with this software?’, ‘Do employees like working with the
company?’, ‘What common experiences do developers share
when working with a feature?’ These are geared towards
feelings and experiences and are difficult to quantitatively
analyse. Quantitative research looks for a numerical answer:
‘How many did we produce?’, ‘What is the uptime?’, ‘How many
users returned for a second visit to the website?’ These are
useful for statistical data. The mixed method, as the name
suggests, includes elements of both to provide better alignment.
An example might be, ‘Why did customers return for a second
website visit?’ or ‘How does a high uptime (over 80 per cent)
relate to the customer’s experience with the site?’ The key
element should be understanding what questions will be asked
by the experiment and how the answers will be conveyed.
A culture driven by experimentation is an inquisitive one,
one that believes the only sin is the unasked question and
devotes time and resources to finding answers. A common
practice in experimental cultures is instituting hackathons. A
hackathon is where an organization proposes a problem, allows
developers a set time to create solutions and then integrates
those solutions into the business strategy. Another experimental
approach appears with Google, where employees are allowed
research time to spend 20 per cent of their time learning new
skills and trying new solutions.20
Driving automated improvement through experimentation is
more difficult to assess. As experiments are driven by creative
questions, most automated solutions will fall short. The answer
lies in not automating the experimentation but automating
experimentation environments. DevOps teams using
Infrastructure as Code (IaC) can rapidly deploy multiple
environments mirroring the production environment. This
simplifies testing and allows experimentation with multiple
options. Another method appears in using chaos engineering
principles. This experimentation technique attempts to predict
where faults occur. Successful prediction allows the team to
work on solutions before problems emerge.
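A sketch of that environment-level automation appears below. The provision and destroy functions are hypothetical stand-ins for a real IaC toolchain such as Terraform or Pulumi, and the production specification is an illustrative assumption.

# Hypothetical helpers standing in for a real IaC toolchain.
PRODUCTION_SPEC = {"region": "eu-west-1", "instances": 3, "database": "postgres-14"}

def provision_environment(name, spec):
    print(f"provisioning {name} from {spec}")
    return {"name": name, **spec}

def destroy_environment(env):
    print(f"destroying {env['name']}")

def run_experiment(trial):
    env = provision_environment(f"experiment-{trial}", PRODUCTION_SPEC)
    try:
        print(f"running trial '{trial}' in {env['name']}")  # the test itself
    finally:
        destroy_environment(env)  # environments stay disposable

for trial in ("cache-tuning", "new-queue"):
    run_experiment(trial)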
Lean does not consistently reference experimentation. The
experimental idea appears as the Andon cord when you stop
the entire production line until a solution can fix the issue. That
solution is experimental. The part left out would be structuring
experiments into a regular process. Experiments lead to
improvement, and those continual improvement cycles do
appear in Lean. DevOps tries to structure experiments into a
more regular frequency.
Metrics are an essential part of improvement through
experimentation. Linking to the quantitative and qualitative
measures early, all experiments should tie answers to metrics.
At the same time, these metrics are likely to be explicitly tied to
the question answered. The goal in experimenting to reach
improvement is to achieve an answer. Those answers may not
always create direct improvement but should result in future
discussions about what to implement.
Finally, sharing improvement creates growth across the team
structure. Members should build cultural links to sharing
growth. We see these improvements shared through feedback
processes and through team interaction. Sharing experimental
results benefits the team in communicating results and
enhances the overall culture in demonstrating management
and team support for experimentation.
The best methods to advocate improvement were mentioned
earlier with the hackathon and dedicated research time.
Describing the best way to implement experimentation can be
difficult. An employee’s nature and cultural alignment play a
central role in whether they are receptive to experimental
leanings. Even if the hackathon is in place, many employees are
not motivated to learn and experiment with new technology. In
those cases, your business should always seek to advance and
promote those who show those qualities.
The strength in the Third Way is that continuous
improvement and growth allow rapid adjustment to a
changing marketplace. The weakness is that when employees
expect constant improvement and experimentation, they can
quickly grow dissatisfied with static environments. As an
opportunity, if you adopt an improve and adapt workplace, as
with Google, the workforce is constantly prepared and planning
for changes. These changes can create rapid growth, or just
better connections to marketplace trends. Finally, the threat
posed by improvement means management needs to be
prepared to find and offer opportunities to advance and
experiment, in line with the Third Way’s weakness.
Summary
Understanding the Three Ways, their interaction with CALMS
and the fundamental characteristics of each allows one to
journey deeper into DevOps solutions. The Three Ways are the
cultural pillars for all of DevOps. Here, you saw flow as the
central path to set the journey, emphasizing the need for quick
and continuous delivery above all else. Those deliveries only
become continuous when you can integrate regular feedback.
Returning information makes sure the flow proceeds in an
iterative fashion, constantly adapting to small changes. Finally,
improvement and experimentation allow one to concisely
determine those areas that are not small changes and then
implement an iterative approach.
In the next section, the concepts change from a broad,
cultural understanding of DevOps to a technical one. The
following chapters will build out the technical underpinnings
behind successful DevOps and allow the creation of
infrastructure, applications and environments that support the
Three Ways. Throughout all the following chapters you should
remain focused on these Three Ways and how technical
solutions enhance those opportunities.
Notes
1 Bergeson, D (nd) Strategic Planning: Moltke the Elder, Dwight Eisenhower, Winston Churchill, and Just a Little Mike Tyson, connect2amc.com, https://connect2amc.com/118-strategic-planning-moltke-the-elder-dwight-eisenhower-winston-churchill-and-just-a-little-miketyson (archived at https://perma.cc/F2NC-Q9RR)
2 Guley, G and Reznik, T (2019) Culture eats strategy for breakfast and transformation for lunch, The Jabian Journal, https://journal.jabian.com/culture-eats-strategy-for-breakfast-and-transformation-for-lunch/ (archived at https://perma.cc/2X9X-DN7T)
3 Bridges, W and Bridges, S (2016) Managing Transitions, Lifelong Books, pp 4–5
4 University of Minnesota (2016) Sociology, https://open.lib.umn.edu/sociology/chapter/3-2-the-elements-of-culture/ (archived at https://perma.cc/42LB-L7LN)
10 Ibid, p 606
11 Homans, G C (1951) The Human Group, New York: Routledge
20 Clark, D (2022) Google’s 20% rule shows exactly how much time you should spend learning new skills – and why it works, CNBC.com, https://www.cnbc.com/2021/12/16/google-20-percent-rule-shows-exactly-how-much-time-you-should-spend-learning-new-skills.html (archived at https://perma.cc/2QXA-3X8B)
PART TWO
Fundamentals of DevOps
CHAPTER FOUR
DevOps architecture
One common element between the world’s manmade wonders
– from the Giza pyramids to the Great Wall of China – is that
none of them were built without careful planning. Too often,
DevOps implementations focus on delivering quickly and leave
out architectural components. Architecture frames
construction, allowing expert craftspeople to deliver
confidently. Without architecture, you can deliver quickly but
lose continuous integration over the long term.
Architecture constitutes a modern system’s core. It includes
everything from conceptual design to the interacting software
and hardware. Software and hardware create the computer
system supporting user interaction. Whether one works from
Windows or Linux, or codes from Python to ReactJS, each obeys
architectural prescripts. This chapter focuses on
communicating architectural basics for the planner or the
programmer and ensures success in building a common
perspective. The chapter covers the following:
Architectural basics
Communicating about architecture
Best architectures for DevOps
Implementing architectural standards
Each topic will include key takeaways to guarantee creating a
confident architecture for your DevOps deliveries.
The basics of architecture
As always, starting with a definition creates a foundational
discussion point. The dictionary definition of architecture offers
multiple explanations: ‘The art or science of building; formation
or construction resulting from a conscious act by unifying a
coherent form; how components of computer systems are
organized and integrated.’1 The third definition is the most relevant to computing and the first the widest, but the middle one should be our focus. The architectures we design and employ must fill a
coherent form. These coherent forms create synergy for
developers, operators, managers or any organizational mix.
Coherent forms create clear communication, maintainability
and allow rapid value delivery.
The first architecture sets a foundation for all functions and
applications. Architecture for information technology (IT)
requires three central steps: fetching information, decoding
into a usable state and executing commands. A straightforward
architectural example is the contact list on your personal
phone. Contact lists have various formats, but a central
architecture exists to link data – the individuals – to the
operations – making a call. One selects the phone contact by
fetching data from a memory location and decoding that
information for the user to select ‘John Smith’. The John Smith
entry contains email addresses, phone numbers, perhaps a
picture and associated notes. Then, one executes commands by
selecting a contact method and connecting to John.
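A toy Python sketch of that cycle appears below. The stored contact and the dialling action are invented, but the shape of fetch, decode and execute mirrors the description above.

# Fetch raw data from a memory location, decode it for the user,
# then execute the selected command.
memory = {0x01: b"John Smith|+44 20 7946 0000|john@example.com"}

def fetch(address):
    return memory[address]                      # raw bytes from storage

def decode(raw):
    name, phone, email = raw.decode().split("|")
    return {"name": name, "phone": phone, "email": email}

def execute(contact):
    print(f"dialling {contact['name']} on {contact['phone']}")

contact = decode(fetch(0x01))
execute(contact)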
These architectural interactions often become complex, but
the core resides in fetching, decoding and executing commands.
These three areas resolve to three essential architectural
functions in the DevOps solution. The first is System Design,
how the hardware, firmware and software interact. Next is the
Instruction Set Architecture (ISA), the embedded programming
code allowing interaction. And finally, Microarchitecture, data
paths for processing and storage. One of the hardest
architectural design elements lies in distinguishing overlapping
items.
When evaluating DevOps architecture, one must learn the
four Cs: cloud, cluster, container and code. These four levels
describe organization for any cloud-native or cloud-migrant
architecture. Every architecture can mirror or change as
operations shift levels. Cloud covers the top virtual space as
AWS, Azure or private cloud instances. A cluster details where
users operate in cloud segments, virtual private clouds (VPC) or
virtual machines (VM). Containers are the cluster-resident,
individual applications and pods. Code is the lowest level as the
ISA directs communication. Moving between levels requires abstracting and interpreting commands. Abstraction remains important: any element, whether staying at one level or moving between them, should offer observability regarding inputs, outputs and processes, and abstracted elements require translation between levels.
System design starts with the basics. Despite various
complexity layers, all systems follow one of three approaches:
the von Neumann, Harvard or Hybrid architecture.
FIGURE 4.1 Von Neumann architecture
UML SWOT
My favourite part of UML is the lack of declared definitions in
building architecture, which feels more comfortable from a
DevOps perspective. While diving into the ISO standards gives
detailed answers, the initial requirements can be addressed
anytime. This makes simplicity the core strength.
However, as a weakness, the lack of explicit requirements means
a less mature team might skip necessary steps. UML presents an
opportunity in that you do not require extensive training to
communicate effective architectures. Most professionals have
designed flowcharts and sequencing diagrams but UML offers a
path to add missing steps. As a threat, one can see where
designing in UML and missing a step could delay or derail a
planned delivery. The next section explores how TOGAF offers
an alternative method to architectural descriptions.
Strength: Simplicity.
Weakness: Lack of explicit requirements.
Opportunity: Easy to learn and implement.
Threat: Potential for missing needed steps.
TOGAF SWOT
The strength associated with TOGAF is that it is commonly
understood and globally available. TOGAF provides an excellent
format, especially with more detailed SDLC models such as
Waterfall or Spiral development. The weakness is the depth
required to perform a full TOGAF analysis. The numerous steps
and training required can make it difficult to implement on
time. When using software, the opportunity presented in using
TOGAF is the ability to rapidly scale up and down across
multiple domains with full explanations. Any changes can be
incorporated into the overall pattern. The threat occurs in the
technical detail required and the tendency to allow teams to
specialize in domains or areas rather than understanding the
entire architecture.
Strength: Commonly understood, global format.
Weakness: Depth required to perform initial analysis.
Opportunity: Rapid scaling with software to incorporate
changes.
Threat: Technical detail, tendency to specialize in areas.
While the full table would include all five of our primary
characteristics, this sample shows requirements and priorities.
Then, one can change and evaluate to trade different modelled
technologies for each step. One may decide that files to update
sensors must occur sooner, or with a lower priority. If the
priority decreases, one could see how comparing data between
the sensor and the tabled input within the application might
increase in priority. One could also address user input that
parking spaces were no longer available when they reached the
location so a higher-priority comparison might be needed.
Trading the architectural elements allows feedback in how
the design flows through the process. Remember, architecture
provides the plan to develop functional software. Changes in
the architecture change the priority to develop different parts
of the system. The stable architectural models may require
different coded elements to reach overall success. So, with the right people and a process for modifying the architecture in place, the last step becomes choosing the technologies to support integration.
A technology list can be exhaustive, but two critical elements
can help expedite those challenges: first, one should have a
software-driven modelling tool, and second, one should
understand the system language for expressing architecture. In
this case, the specific examples are the ArchiMate tool for
TOGAF architectures and JSON as a system language. For other
cases, with UML some common options are Lucidchart,
Microsoft Visio Pro or Gliffy. Alternatives to JSON can include
YAML, XML, Scala and Rust.
ArchiMate offers an open-source alternative to some of the
more expensive options. The software uses a GUI to design the
architecture, incorporate the utility tree options for a tradeoff
and quickly match between technical and functional
interactions on a common platform. Using a tool such as
ArchiMate increases design observability and allows improved
feedback across multiple layers. One advantage of ArchiMate is
that it allows the opportunity to create viewpoints within the
system to examine an architectural flow from different
perspectives.
ArchiMate is a tool based on Open Group modelling, which
follows the TOGAF standards. The system offers a design based
on layers and aspects. The pre-built layers are strategy,
business, application, technology, physical and implementation,
while the associated aspects are passive structure, behaviour,
active structure and motivation. These are all the same
requirements one uses for TOGAF modelling. As well as these
items, the tool allows construction of relationships between
various items, including structural relationships describing the
static construction, dependency relationships with event-driven
and time-based requirements, and dynamic relationships
where one event can trigger another.7
Sometimes challenges exist in converting an approved
diagram into a technical coding language. The coding language
ensures that those architectural models for communication,
message handling, event-driven triggers and other factors
remain expressed in code. Despite the name, JSON (JavaScript
Object Notation) is a language-agnostic tool supporting
communication between an application and the architecture.
JSON approaches are relatively common within many DevOps
pipelines and subsequent implementations.
JSON is a language standard with a text-based, human-readable format for representing structure. It establishes a
lightweight format for storing and transporting data, such as
moving elements between a server and a web page. An example
based on the parking structure appears below:
{
  "Parking": [
    {"Location": "Vernon", "ParkID": 1},
    {"Location": "Vernon", "ParkID": 2},
    {"Location": "City Garage", "ParkID": 3}
  ]
}
This basic example shows the core syntax that data is contained
in name/value pairs and separated by commas. Then, curly
braces hold objects, and the square brackets hold the array. The
data and arrays can then be called within other coded options
to ensure the right connections exist. In this case, parking spots
identified at two locations will link to sensor data or other
factors. JSON allows the same notation as many other formats
with numbers, strings, Boolean, array, object and null data. One downfall with JSON is that it does not allow comments; any explanatory notes must live in the documentation or in the code that consumes the data.
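To show the data being called from other code, the short Python fragment below parses the same structure and selects the Vernon spots. Holding the JSON in a string is a simplification; in practice it would arrive from a server or a file.

import json

raw = """
{"Parking": [
  {"Location": "Vernon", "ParkID": 1},
  {"Location": "Vernon", "ParkID": 2},
  {"Location": "City Garage", "ParkID": 3}
]}
"""

spots = json.loads(raw)["Parking"]
vernon = [s["ParkID"] for s in spots if s["Location"] == "Vernon"]
print(vernon)  # [1, 2]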
Summary
This chapter introduced the basic architectural design concepts,
including using SOLID principles to support designs. From
there, the discussion compared the UML and TOGAF standards
as a basis for communicating architectures. Since any
architecture establishes a design that others must follow,
communicating those standards must happen. Without a
common language, those architectures become nothing more
than noise for developers.
Establishing that common language led to discussing UML
and TOGAF approaches to designing an architectural model.
This moved us from a thought- and narrative-based structure to
the expressed design where developers and project managers
can compare deliveries to planned architecture. With all these
steps established, we then discussed the importance of hiring
an architect and their team interactions and the Architectural
Tradeoff Analysis Method (ATAM) for comparing architecture to
requirements during those interactions. Finally, we touched on
a modelling tool (ArchiMate) and a programming language
(JSON) to convert all the architectural knowledge into digital
diagrams and expressions. In the next chapter we discuss how
to observe and ensure that a confident DevOps architecture
appears in delivered applications.
Notes
1 Architecture (2023) Merriam-Webster, https://www.merriam-webster.com/dictionary/architecture (archived at https://perma.cc/4VQL-PREW)
Basics of observability
Observability basics start with understanding the word’s derivation and its software development applications. The
Merriam-Webster dictionary defines observe as ‘to conform
one’s action or practice to something’.1 This carries a strong DevOps connotation: every observed action should lead to changing one action into another. In DevOps, the purpose of observing is to emphasize feedback to improve flow,
confirm experimentation and drive future actions. Three
standard implementations apply when discussing DevOps
practices with the terms observable, observing and
observability. One frequently hears these terms interchanged
but all reach a different end state. Understanding these
frameworks requires the three pillars of observability – logs,
traces and metrics – to which one should add a fourth category
for measurements.
Before deconstructing the terms, a brief step to align
understanding of the observability pillars is necessary. A log is a
computer-produced output that lists all system actions
chronologically. A trace again appears as a computer-produced
output in chronological order but one following specific system
actions. A metric performs calculated analysis combining
numerous inputs and outputs to allow rapid decisions such as a
table or graph showing all security vulnerabilities on a product.
A measurement assesses raw numbers such as the number of
features produced, work hours or other products possibly
contributing to metrics. Logs and traces are normally included
in standard system features. Some metrics and measurements
are organic to the system, while others must be created or
included from third-party software for unique tasks.
Observable actions are functions producing data in
manipulatable formats. When we watch a sports event, from
football to cricket, we’re watching actions that produce data as
a score. While those scores are not generally manipulated, they
can be reported in a manipulated sense through multiple
sources. The score could relate to the individual game,
collective scores for a group of games, or how that score
translates into a broader league standing. From the software
context, any action producing data is defined as observable. The
output might be as simple as a ‘Hello, world’ response, or a
more complicated answer considering a customer’s cloud
usage. The data typically includes computer-produced functions
like logs, traces, metrics or measurements.
Observing actions means interacting with data at specified
points. This incorporates the individuals and collection tools
used for gathering data. Those observing the actions in the
previous sports match could be scoreboard operators, referees
and assistant coaches. Each individual observes actions such as
offence or defence and records a specific data point, a hit, a
steal or a pass in the relevant format. Some individuals may
collect multiple aspects but the data all compiles to a single
point. Observing software actions occurs through an individual
or function designated to collect at a certain point, or through a
tool such as OpenTelemetry (OTel) or Prometheus. The tools can
be configured to collect data singularly and then modified for
multiple reports to contribute later. Observing software means
reporting an individual output in a trace, knowing where logs
are stored or establishing test coverage.
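As a minimal sketch of tool-based observing, the fragment below uses the OpenTelemetry Python packages (opentelemetry-api and opentelemetry-sdk, both assumed installed) to wrap an action in a trace span and print it to the console; the service and span names are illustrative.

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Route spans to the console so the observed action becomes visible.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("parking-app")
with tracer.start_as_current_span("lookup-spot"):
    pass  # the observed action would run here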
Observability emphasizes observing outputs from software
development tools and functions. We expect the coaches and
managers to exemplify observability in a sports contest. They
correlate multiple actions, assess the results and create actions
to attempt to modify future results in the desired way. This
returns to the initial ‘observe’ definition that one attempts to
conform to a standard by observing. In DevOps, an
observability state generates continuous feedback. Numerous
logs, traces, metrics and measurements provide useful data
contributing to observability. When observability is achieved,
you can manually adjust inputs to provide better outputs or
automate to manage routine corrections. An excellent example
is AWS cloud monitoring; the system monitors cloud usage and
adjusts to predicted usage by making relevant changes. This
predicts cloud space needed and can rapidly scale to customer
demands.
The key is the ability to find points to observe. When
watching sports, we already understand the needed
observation points: the batter, the forward, the goalie or the
interaction at a designated space. These may seem more
difficult to find in software development but the same initial
thought governs point selection. Observation points are
anywhere something happens, whether in the system, like
pipeline runs, or external to the system, delivering a completed
product. The more observation points, the more likely it is to
find actionable data. Observation points can help find where
the development process has stalled and then generate flow-
improving feedback. Some common examples of observability
through metrics are in security scans, measuring lead time to
change in submitting fixes to production, or time needed to
recognize and fix a bug in deployment.
The basics are summarized by one of my favourite hacking
stories. The year was 1834, the place was France and the
hackers were François and Joseph Blanc. The goal was to beat
the current-day stock market and the hacked system was the
optical telegraph system. Optical telegraph uses semaphores, a
flag system controlled by manual pulleys, to pass messages
within the line of sight. At the time, the stock market in Paris
controlled the pace for other trading cities such as Bordeaux.
This information was communicated through the telegraph but
normal investors only received messages by standard channels
such as horseback, which would take closer to five days than
five hours.
The target was the market rate on federal obligations, futures
for the government investing in livestock or other commodities,
which traded in Paris and Bordeaux. It seemed impossible to
subvert with humans at every messaging stop, but the Blancs
had a plan. Normally, messaging errors would be detected and
corrected in subsequent transmissions but the error was not
removed from the message until it reached the final station
in Bordeaux. The Blancs bribed an operator at the Paris station
to add notable errors to the messaging, then hired an
accomplice to observe those errors through binoculars at the
last stop before reaching the Bordeaux station. This allowed
them to purchase with advance knowledge of the stock prices.
The hack lasted for two years before the bribed operator
became sick and tried to convince his replacement to follow the
hacked plan. The replacement had a crisis of conscience and
turned all of them in. At the time, Paris had no law against
using the telegraph for private purposes but did have one
against government officials taking bribes. The operator went
to jail for years and the Blancs went on to own a casino.2
The key here was the lack of system observability. Corrections occurred only at the endpoint; until then, the correction travelled alongside the error in the message. The intermediate operators were
not trusted sufficiently to make changes. The official telegraph
could be observed at multiple spots due to the optical relay.
Those noticing the errors and adding corrections observed the
system but lacked the observability for understanding. Lacking
observability meant that errors could be found and noted but
not remediated until the end. When one thinks DevOps only
observes at designated points to create a metric, the cultural
foundation disappears. DevOps observes everywhere to create
an observability state that increases value delivery. There are
some common observability metrics recommended in the next
section that can help in building the preferred observability
state.
Observability metrics
The first step in observability metrics should be identifying
what requires measurement and how. In the previous section
we discussed establishing observability points. For this section
we will break out collected elements and some initial metrics.
Then visual models and interpretation are discussed to take
initial metrics and create useful actions. Metrics create
feedback to discuss required actions. In many cases, one uses
tools to visualize feedback beyond the raw data and create
graphics summarizing numerous levels. Important throughout
is focusing on metrics creating opportunities for action. Metrics
that are not actionable are merely noise.
System data generally appears from four points: logs, traces,
metrics and measurements. Most sources only count the first
three but that leaves out the DevOps flexibility required for
improvement. One cannot make effective metrics without
understanding measurement. Logs are time-sequenced reports
of system activity. A log sample appears below.
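The entries below sketch what a hypothetical service log might contain; the exact fields vary by system, but each line is stamped in time sequence:

    2023-01-23T14:02:11Z INFO  auth-service     user login succeeded ip=10.1.4.27
    2023-01-23T14:02:13Z INFO  deploy-service   build 4812 promoted to test
    2023-01-23T14:02:15Z ERROR payment-service  timeout calling upstream after 3000ms
    2023-01-23T14:02:16Z WARN  payment-service  retry 1 of 3 scheduled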
Advancing metrics
When working with consulting groups, one constantly proposes
new solutions to problems. One of my pet peeves when offering
those solutions is that teams frequently respond, ‘Well, how
does that affect my metrics?’ This question wholly opposes the
entire concept of metrics. Metrics exist to measure action at a
point in time, creating observability into interaction points that
are difficult to assess intuitively. If instead you plan activity to
only show the best metrics, you lose all sense of the benefit
effective metrics offer. This section looks deeper into the
purpose of metrics and suggests methods to enhance and
improve results.
Metrics appear in two forms: organic and created. Organic
metrics are measurements contained within purchased
software. Many software options, especially in security and
networking, come complete with ways to measure
performance. When you open a laptop’s system performance
logs, you typically see metrics such as system speed, internet
connectivity and memory used. With a security scanning tool,
you might see connections, software vulnerabilities or firewall
status. These tools are excellent starting places for
observability. The change happens when one makes the next
step to using created metrics.
Created metrics use system tools to focus on observable
factors impacting your performance. These use embedded
functions to expand beyond an initial purchase or
configuration. DevOps depends on feedback and improvement,
and organic metrics offer an initial approach to success within
a framework. Created metrics provide the opportunity to create
your own success criteria. As in the previous section, while the
organic metric may provide a deployment frequency, a created
metric can add comparison factors such as coffee consumed,
team leadership direction, time of day or other variables. This
knowledge allows for conducting structured experiments for
continual improvement.
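As an illustrative sketch, a created metric can be exposed with a few lines of instrumentation. The example below uses the Python prometheus_client library; the metric name, the labels and the coffee-consumption factor are assumptions for illustration rather than a prescribed standard.

    from prometheus_client import Counter, start_http_server

    # An organic tool already counts deployments; a created metric adds
    # your own comparison factors as labels.
    DEPLOYMENTS = Counter(
        'team_deployments_total',
        'Deployments labelled with local comparison factors',
        ['environment', 'time_of_day', 'coffee_band'],
    )

    def record_deployment(environment: str, hour: int, coffee_cups: int) -> None:
        # Band the raw values so the labels stay low-cardinality.
        time_of_day = 'morning' if hour < 12 else 'afternoon'
        coffee_band = 'low' if coffee_cups < 3 else 'high'
        DEPLOYMENTS.labels(environment, time_of_day, coffee_band).inc()

    if __name__ == '__main__':
        start_http_server(8000)   # endpoint for Prometheus to scrape
        record_deployment('prod', hour=10, coffee_cups=4)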
Metrics all start with what is being measured: a time flow,
action steps from initiation to completion or any other
quantitatively measured function. Remember the metric goal is
to drive action, as non-actionable metrics are merely noise.
Noise reduces the ability to effectively communicate by
obscuring the intended message, thus restricting flow and
impeding communication. In creating new metrics, I have used
the following rubric with success throughout my career.
1 How do you know?
2 If the first is true, what happens next?
3 Do any assumptions have to be true for the first to be true?
4 Do the conclusions follow when I include assumptions?
5 Are there any premises needed for the assumptions to be true?
6 Compare apples to apples and not oranges to elephants.
The first element is the most important. It requires asking
yourself what source collects the data. Some of the usual applications for routing data are Prometheus, Grafana, the Elastic Stack, Splunk or OpenTelemetry (OTel). Each requires
specifying where the data originates. Metrics observe a point in
time so knowing which sensor acquires which data is essential.
If we return to modifying the deployment frequency metric, we
ask how we obtain information. Questions include whether deployment means code reaching production, whether code counts when first submitted, or whether some other measure corresponds to deployment. You might measure other elements such as the
coffee consumed, which day is the most productive, which
hours of the day are more productive or how team meetings
affect deployment. This first step builds the foundational
knowledge for advancing metrics.
The remaining items are self-checks ensuring you understand
the internal thought process advancing the metric. This starts
with asking what happens next. When collecting a metric,
leading to the action requires asking yourself about the next
steps. If collecting on deployment frequencies, one should first ask whether the numbers should rise, fall or hold consistent.
Establishing expectations lets you begin to shape potential
actions.
Once the collection point and the expectations are set, the
next step to advancing a metric will be identifying assumptions.
Here, and elsewhere, in discussing assumptions, an assumption
is a thing that must be true in order for the main metric to be
successful. Returning to the deployment frequency example, an
assumption might be that all submitted code appears in GitLab.
One would then examine associated team processes to ensure
teams were not writing code in other IDEs, consolidating in
other environments and only committing to GitLab when
finished. This would show delivery but lack the information
about how long something was in progress before completion.
Moving code between different environments and then committing from an alternate location could create large differentials.
The fourth question addresses those differences. If the
assumption remains true that all code is in GitLab, then the
conclusions about the metric would hold true. If you discover
that teams are using other means to commit code and deploy
code, then the first conclusion that deployment frequency is a
useful metric might not be accurate. In another variation, if we
expand the metric to report coffee consumption related to code
production, the true assumption may be that all coffee is
consumed from the company kitchen. If developers obtained
caffeine from Starbucks before reaching work or preferred
energy drinks, we may need to include multiple measurements
to advance the metric.
Obtaining multiple sources of data, and alternate locations,
leads to the fifth question: are all the premises around the
conclusions true? Similar to identifying initial assumptions
around collected data, those same considerations then apply to
the second level. This verifies that we are advancing the metric
correctly. From a DevOps perspective, this question creates
feedback for our metric flow. Having reached this point, you
can be confident that the metric is stable, supportable and
creates an actionable result.
The final question should be a comparison: apples to apples,
not oranges to elephants. This means the compared metrics
must evaluate equivalent processes. Apples are similar to other
apples, even if the colour, shape or taste is different.
Substituting one apple for another apple yields roughly similar
outcomes. When starting with oranges and elephants, both
have rough skin, possess a navel and are wet on the inside but
after that, the differences expand exponentially. The equivalent
comparison is the most important part of comparing an
advanced metric. One can draw together multiple metrics that
complicate comparison, but at the base level, comparisons
should be equivalent.
In our example, if we are gathering deployment frequency
data, comparing deployments to revenue generated may be a
step too far. Revenue has multiple elements involved, and simply comparing rough code, or even weekly production code, to revenue generated weekly might produce a misleading evaluation. An
effective comparison might be deployment code frequency to
coffee consumed or errors submitted into production to show
effective correlation. One might see that more frequent
deployments of larger code elements result in more errors or
more frequently deploying smaller elements results in fewer
errors. In this case, we ensure apples to apples and not oranges
to elephants.
The last thinking element when considering metrics should
be two statements:
correlation does not equal causation
absence of evidence does not result in evidence of
absence.
In the first, when two or more metrics appear to be tracking
relatedly – increased caffeine consumption results in higher
deployment frequency – we should not assume one causes the
other. The two may be correlated, but the causal link breaks down at some point. If I simply keep increasing the caffeine consumed, at some point I will likely be sending developers to the hospital, which ultimately delays code deployments.
The other statement, absence of evidence is not evidence of
absence, suggests that if I do not measure something, I cannot
assume that it does not exist. In continuing with the previous
example, I could assume more caffeine makes my developers
more productive and thus happier. However, I should not
conclude they are happier unless it is measured. In a more software-oriented sense, it could mean skipping security scans on the assumption that standards already require secure code to reach that point. The evidence must be the scan itself, so if no scan exists, I cannot simply conclude all the code is secure.
This model for advancing metrics has worked for me for
years. In the best DevOps sense, I suggest trying it out with
some options. At its best, it provides a checklist to ensure
you have adequately thought about all requirements. It even
enables the decomposition of different elements. In a practical
sense, when a customer asks why they are not producing
effectively, they may be skipping required details or not
evaluating assumptions if they are only using DORA metrics.
The next section discusses finding the right people, establishing
processes and implementing the technology that supports
observable DevOps.
Observable people, process and technology
Measuring any DevOps process from an observable standpoint
requires finding the right people who think about observable
outcomes, understanding an observable process and then
working with technology supporting the vision. People drive technology through process, and process in turn shapes how people change technology. Each of these elements can be critical to a
successful implementation.
Observable people
Analysing observable people relies on who needs what, and when. Developers need feature and deployment information; leadership tends to focus on value, cost and project success; security folks care about compliance and vulnerability standards; and operational individuals watch detection, recovery and task time. The important part is not just understanding
individual connections to metrics but how you understand flow,
feedback and improvement across multiple metrics.
Two factors can help understand how people view metrics:
the first is Conway’s Law and the second is the Westrum
Patterns. Conway’s law states, ‘Any organization that designs a
system (defined broadly) will produce a design whose structure
is a copy of the organization’s communication structure.’3 In
application then, tightly coupled, highly hierarchical
organizations produce hierarchical designs, while loosely
coupled, collaborative organizations produce collaborative
designs. Metrics allow us to observe system interactions in
comparison to organizational imperatives. The first step in
people interactions is understanding the organization and
creating metrics emphasizing personal outcomes rather than
organizational imperatives.
As an example, if a metric reflects deployment, but the
hierarchical structure requires three levels of approval,
deployments will always be delayed. An organization
emphasizing a quicker deployment but fewer checks may
deploy more bugs into production. People need to understand
organizational limits and how metrics reflect those outcomes. It
can be a great idea to introduce new observable points but
those that create the most effect for another organization might
not work for you. Thus, metrics should create actionable points; having people report what is already known to be true, and likely to remain true, is not effective measurement.
The other element for people management with observability
is the Westrum patterns. In an academic article, Westrum
identified three typical organizational patterns: pathological,
bureaucratic and generative.4 Examples of the three appear
below:
Pathological – power oriented: low cooperation,
messenger shot, responsibilities shirked, bridging
discouraged, extensive scapegoating, novelty crushed.
Bureaucratic – rule oriented: modest cooperation,
messengers neglected, narrow responsibilities, bridging
tolerated, failure leads to justice, novelty creates problems.
Generative – performance oriented: high cooperation,
messengers trained, risks shared, bridging encouraged,
failure leads to inquiry, novelty implemented.
We can quickly see how the preferred DevOps culture fits in the
generative model. At the same time, some challenges with the
other models can be identified through metrics to solve issues.
For example, the bridging problem can be addressed by looking
at inter-organizational communication or failure items from
team retrospectives. Westrum further highlights how the models respond to organizational inputs. These responses appear below:
Suppression – stopping messages from reaching broader
levels.
Encapsulation – isolating the messenger.
Public relations – minimizing impacts through associating
context with the message.
Local fix – fix immediately but fail to share impacts or
solutions.
Global fix – respond to problem where it exists and
highlight related problems.
Inquiry – identify the root of the problem.
Identifying how an organization responds to messages along
the path allows for assessing how observability benefits
outcomes. Good DevOps people use selected metrics to track
how issues occur and are resolved. Selecting the right people to
manage observability requires finding ones who fix problems
and inquire to prevent future problems. Chapter 10 discusses
incorporating inquiry through a site reliability engineer (SRE) to provide a consistent, trained messenger, encourage
novelty and manage inquiry. Having individuals who
understand and implement metrics helps organizations support
effective, observable processes.
Observable process
In his book Measure What Matters, John Doerr suggests a macro
process used in conjunction with metrics known as OKR
(Objectives and Key Results).5 This process suggests every work
section should identify objectives that must happen and the key
result needed to accomplish the objective. This follows the first
two questions from earlier: how do we know and what happens
next? These OKRs flow from strategic to tactical and then down
to the operational level. Every element develops its own
objectives and results. A sample that includes improving
deployment frequency appears below with objectives (O) and
key results (KR).
Strategic (Company) – O: Deliver more and better products
than last quarter to the customer
KR: 10 deliveries are made to the customer
KR: Customer repurchases product in the following
quarter
Tactical – O: Deliver more operating systems to the
customer
KR: 4 operating systems updates delivered to the
customer
KR: All operating systems meet manufacturing
standards
Operational – O: The development team readies an
operating system for deployment
KR: 6 features complete in new operating system
KR: Operating system meets operational benchmarks
for speed and memory
One can see how OKRs can scale from one level to the next.
Each key result would then be assessed as red for failed to meet,
yellow for almost met and green for met. These provide a
qualitative look at where one might include some metrics as a
quantitative measurement. One can see how the deployment
frequency at the operational level scales into the overall
company. Implementing a similar process improves corporate-
level messaging, demonstrates key deliverables and provides a
launchpad to establish technical metrics. One other way to split
these responsibilities is through using a RACI matrix.
‘RACI’ stands for responsible, accountable, consulted and
informed. It suggests the four levels for observable individuals
who drive specific metrics. The matrix highlights the four
categories across the horizontal axis and the vertical axis holds
the various tasks. A responsible individual is the person tagged
for a specific deliverable while the accountable person holds
the combination of responsible items, such as a programme
manager. The consulted individuals provide guidance along the way, and informed merely covers those kept in the communication loop.
Returning to our deployment frequency example, if the first
item on our vertical list was feature development, the team
would be responsible, the team lead would be accountable,
security and platform engineers might be consulted, and
informed would be management. If the task was feature
security, then the security engineer might be responsible, a
security manager accountable, the development team consulted
and management remains informed. These matrices can be
built from either an individual perspective with a marking in
the column or from a team perspective where the name or
position is located in the column. In either sense, it creates a
process through which one can identify observable points
across an organization to measure success.
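A minimal sketch of such a matrix, using the two example tasks above (the role assignments are illustrative, not prescriptive):

    Task                 Responsible        Accountable       Consulted                     Informed
    Feature development  Development team   Team lead         Security/platform engineers   Management
    Feature security     Security engineer  Security manager  Development team              Management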
Observable technology
When you consider observable technology, the first element is
visualizing observability results. We mentioned graphics and
comparisons earlier in histograms, control charts and
dashboards but technology converts from a manual build to
automated visualization. Prometheus and Grafana are two of
the more popular ways to visualize network activities. These
can be enhanced by functions like Open Policy Agent (OPA), OpenTelemetry (OTel) or Open Worldwide Application Security Project (OWASP) items like the Zed Attack Proxy (ZAP) and PurpleTeam.
Prometheus offers a free software application to record
metrics as a time-series database with an HTTP pull model for
flexible queries and real-time alerting. The core application was written in Go with source code available through GitHub, and the project graduated from the Cloud Native Computing Foundation. An instance will use multiple exporters to transfer metrics, which Prometheus then centralizes and stores. An alert management tool can then notify individuals when metrics hit certain levels.
Once a tool like Prometheus has been included, Grafana, as a
multi-platform interactive visualization tool, can be included.
Either Grafana or Prometheus can be used in either a health-check (pulling data) or heartbeat (pushing data) configuration. The tool offers combining data at a common
location rather than requiring data to be sent to third-party
locations for analysis. This allows data protection across
multiple deployments. The tool offers common, organic metrics
with options for creating additional panels to visualize your
organization’s unique options.
OPA is my personal favourite for setting various monitoring
rules through a side-car connection to containers and
applications. It provides an open-source policy engine driven by Rego, a high-level declarative language. The side-car
implementation makes OPA effective in controlling
microservices, CI/CD pipelines and API gateways. The end goal
decouples decision making from enforcement. Any service or
event request communicates through JSON to OPA and then
activates the Rego element to evaluate and return JSON data.
This data then triggers alerting services.
OTel offers specialized protocols for collecting telemetry data
and exporting to target systems and is also a CNCF project. The
tool instruments code through various APIs to inform system
components about what and how to collect metrics. The system
pools the data, samples and filters to reduce noise and errors,
and then converts for easy export. Telemetry appears in time-
based batches directly, from the service, or indirectly through a
combined service method. Many prefer OTel implementations
as they offer a common data standard collection across
applications, whether in a legacy, micro-service or
containerized approach, as long as the APIs are accessible.
OWASP offers an alternative source to CNCF for obtaining observability tools. Many tools are focused on either
preventative or reactive security outcomes. ZAP automates
vulnerability scanning and supports options to conduct manual
penetration testing for deployed applications. This allows for
actively tracking vulnerabilities, especially Common Vulnerabilities and Exposures (CVE) events. CVE is a commonly used security catalogue that logs newly disclosed problems with existing software so fixes can be tracked. The PurpleTeam option allows testing of
security regression within pipelines, demonstrating results
when development reverts to an earlier version to ensure gaps
remain closed. Again, most OWASP tools can be used with OTel
to report results, OPA to manage policy and Prometheus to
gather results before displaying in Grafana.
The list of observable tools and options offered changes
constantly. New companies constantly deliver new alternatives
to explore market gaps. The key to observable technology
should be finding the best solution for your processes and
experimenting. This matches DevOps standards to ensure
constant observability, drive feedback and kickstart
improvements.
Summary
Observability remains key to the DevOps process. Without data,
you cannot guarantee flow exists, feedback appears and
improvement actually creates success. The first step for
observability was understanding the observable framework for
DevOps functions. Once basic observability has been solved and
agreed upon, you can move towards deciding the best metrics
to measure progress. Alternatives are available even when
working with a basic standard, such as the DORA metrics. These
alternatives depend on understanding your own progress and
having a good framework to propose new measures that make
the most sense.
All successful outcomes then depend on finding the right
people, implementing a process and using effective technology.
Observable people means knowing how individuals interact
and the key signs supporting generative rather than
pathological organizational cultures. Processes enhancing
observability often use elements such as OKRs and RACI
matrices to create visibility between layers. At the initial
implementation, when humans drive technology, one must
select tools to gather and display outcomes. Once selected, these
tools are also subject to feedback to improve your confidence in
DevOps observability.
Now that we understand observability, we can begin the
journey in DevOps deliverables. Those deliverables start with
understanding and building pipelines and so the next chapter
explains how pipelines function within DevOps and key factors
to pipeline success.
Notes
1 Observe (2023) www.merriam-webster.com/dictionary/observe (archived at https://perma.cc/R4PF-SXSY)
What is a pipeline?
Sometimes, the simplest concepts are the most difficult to
understand. In the mathematics field, using zero took hundreds
of years to develop. People understood additive processes, how
to add one and one and get two and multiply two by two and
get four, but multiplying four by zero to receive zero was
challenging. The first recorded use of zero by a culture was by
the Mesopotamians (around 300 BC), followed by India (5th century AD),
and reaching Islamic countries to drive algebra and Arabic
numerals later (7th century AD). The mathematical concept of
zero as a written expression did not reach Western Europe until
much later (12th century AD).1 In the same way, pipeline
processes have been around for years but the ability to
implement automated pipelines continues to lag in
developmental organizations.
The pipeline is a processing chain where the output of each
element generates the input to the next step. A software
pipeline can be used in various places in a system. Pipelines can help a multitasking OS monitor and return data consistent with data rewritten by an upstream process. This pipeline type ensures users only receive accurate data; failing
the pipeline would mean rejecting either the user read request
or the upstream request to change data. An easy example
would be a financial transaction; the user requests a purchase,
the purchase is checked against available funds and then a
successful pipeline promotes that purchase to the third party.
The pipeline then writes back to the user account to adjust
available funds. All these elements launch as a processing
element chain, a pipeline, once the initial request appears.
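A minimal sketch of that processing chain in Python; the function names, the flat balance check and the in-memory account are illustrative assumptions:

    # Each element's output becomes the next element's input; a failure
    # anywhere stops the chain and rejects the request.
    def request_purchase(account: dict, amount: float) -> dict:
        return {'account': account, 'amount': amount}

    def check_funds(txn: dict) -> dict:
        if txn['account']['balance'] < txn['amount']:
            raise ValueError('insufficient funds')   # pipeline fails here
        return txn

    def promote_to_third_party(txn: dict) -> dict:
        txn['confirmed'] = True   # a real system would call the payment processor
        return txn

    def write_back(txn: dict) -> dict:
        txn['account']['balance'] -= txn['amount']   # adjust available funds
        return txn

    account = {'balance': 100.0}
    txn = request_purchase(account, 25.0)
    for step in (check_funds, promote_to_third_party, write_back):
        txn = step(txn)
    print(account['balance'])   # 75.0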
DevOps pipelines focus on development and operational tasks. While pipelines can assist operations, they typically resolve development and deployment issues. This means the pipeline handles testing and promoting
code to an environment as a series of artifacts. The shorthand
name for these environments is dev, test and prod. Writing
code happens in dev, testing for sufficiency happens in test, and
prod contains the production code, the applications consumed
by the user. DevOps teams typically use this schema to prepare
deliveries. Some teams use a pre-prod to conduct further tests
while others employ prod as staging for user-accessible,
commercially ready deployments. A typical pipeline would
resolve code with initial error checks, verify dependencies,
conduct automated tests, package for deployment at the next
stage and provide results.
A pipeline in software development should be automated,
observable and allow cohesive work across multiple logical and
physical spaces. When designing, loops run in multiple
directions while a pipeline moves in one direction. This benefits
our pipeline process as artifacts are created at each step but
cannot advance until meeting pipeline requirements. Pipeline
steps are targeted through gates that enact checks for certain
functions. Each gate expresses requirements that allow
advancement to the next pipeline phase with more gates. The
final gate shows an overall pipeline pass and allows for moving
the artifact package to the next environment. This can be
challenging because, as discussed with metrics, requirements are sometimes set to allow faster advancement rather than to verify the best code.
In my early days as a consultant I worked with a team new to
DevSecOps processes in connection with pipelines. After the
initial onboarding, they called me on the following day and
announced they had solved the security issue regarding
pipelines, and did not understand why so many found it
difficult to accomplish. I headed over to their spaces, where they showed that all code processed had passed a pipeline, returned green for security and been promoted to the next
phase. I expressed my enthusiasm, then asked to see the
security logs verifying the improvements and risk mitigation at
every step. This returns to the concept that pipelines produce
artifacts at every step. At this, I received blank looks from the
group.
When we looked at the pipeline, and the requirements at
every stage, the security scanning tool had been set to ‘True’.
Testing processes can use integer, string, Boolean (like true) or
other expressions to verify success based on what the artifact
should return to pass the gate. This meant that as long as the
security scan ran, the pipeline would approve the step. In this
case, the team was not generating a comprehensive artifact
such as a log, just verifying scans run with the ‘true’
requirement. We fixed the issue, set the logs to generate and
then used a difference tool to measure current security versus
the product baseline through comparing the logs. It all points to the variation one can have in pipelines. Setting every pipeline step simply to verify occurrence lacks the depth needed to confirm that the step took meaningful action and prepared the code for the next element.
One final element to remember about DevOps pipelines is
that having a single pipeline that works for some code should
not be the end state. Multiple pipelines allow tailoring elements
for different languages, changing environments or to sample
different test parameters. Programme managers often ask: we already have a pipeline, so why would we need another? The only limit to how many pipelines one can run is the compute (processing power) and storage (memory) available in your environment. Pipelines can be generically constructed
to handle wide needs, promoting basic code, or specifically
tailored for certain constraints such as publishing containers
for use in edge environments or hardening containers for
security considerations. Understanding pipeline basics helps
one advance to the key points required to set effective
pipelines.
Source control
Source control means using tools and process to store and track
changes over time. Everyone has seen the file labelled
‘CompRevenue_Sales_Qtr3_v3_012323’ which tells everyone,
supposedly, what the file is, which version and when that
version was committed. Source control should designate the initial item selected, with subsequent changes highlighted through version control. These controls can be attached to the
file as part of the artifacts rather than simply included in the
initial name. DevOps pipeline techniques and tools identify
sources of truth for good code, good documentation and
security policies. The initial thought restricts software
development source control to functional code but you should
consider the advantages in multiple areas. Knowing where
good code lands is important but having access to previous
failed attempts can also generate learning. More control means channelling communication, not through silos but through multiple methods that enable better oversight and create observability.
In a corporate experience, the company used GitHub not only
for code but for all company documentation. This caused an
initial training hurdle for those not familiar but was adopted
quickly. GitHub provides an open-source option for gathering
data from documentation to code. It also provides the option for
public and private repositories. The company merged code and submitted architectural decision records (ADRs) through GitHub, even for items like changing vacation policy.
This allowed everyone to see both the current policy and past
policies, and who committed the changes. This established a
workflow with automated pipeline steps, even for
organizational documentation. Source control allows
accelerating bug fixes, simultaneous development and
increased reliability.
Source control can also benefit your non-functional pipelines.
While the primary pipeline use delivers functional code,
security, platform and other teams can create a non-functional
pipeline to control delivery. Examples could include a risk
mitigation pipeline, tool ownership or non-functional testing
requirements. A risk mitigation pipeline could be launched through questions involving personnel change, test coverage standards or technology change. Designating a pipeline highlights the steps in checklist fashion, resolving challenges before moving to the next step.
A non-functional risk pipeline could ask questions about
socialization through whether stakeholders were aware of
issues, risk levels compared to company resources, probability
standards and consequences. Each step would have a reference
link and approval statuses. This allows source control over who
verified the risk, what steps were taken and what impact the
risk might currently have. A simple example is tracking
dependencies through software asset control by knowing which
systems run which third-party software and where those
systems are located. The functional pipeline could then
reference whether code had passed the risk pipeline through
the existence of tailored documentation, and the establishment
of standards. Having all teams establish a pipeline-based
process improves the cultural connection through creating
shared awareness about the methods and means necessary for
DevOps success.
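Pipelines are usually scripted in stages, jobs and steps. The sketch below uses GitLab-style YAML; the job names, script lines and artifact path are illustrative assumptions rather than a prescribed layout:

    stages:
      - build
      - deploy

    build-job:
      stage: build
      script:
        - echo "Compiling the code"
        - make build                  # step producing the build artifact
      artifacts:
        paths:
          - dist/                     # hand the artifact to later stages

    deploy-job:
      stage: deploy
      script:
        - echo "Deploying the artifact"
        - ./scripts/deploy.sh dist/   # hypothetical deploy script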
You can see the first pipeline stage builds and the second
deploys here. Each stage calls the lower-level job to handle
acceptable steps. You can use multiple jobs in a stage if
necessary, such as if you had to build for two different
environments, or for a job that builds code, and then another to
check for vulnerabilities. Most tools supporting pipelines offer multiple examples in their documentation, starting with basic builds and advancing to more complex options. Most pipeline scripting tools
use some variant of the stage, job and step methodology while
allowing for some textual and sequential variations between
versions.
One key to pipelines involves automation; each step should
launch the next step, each completed job launching the next job
with each set of completed jobs within a stage launching the
next stage. These bring massive automation benefits through
using telemetry tools to build effective metrics. Metrics can show automation efficacy through how long pipelines take and how successful the completed deployments are, software quality through bugs or vulnerabilities that slip through, and automation
utilization through how often certain pipelines are used over
manual builds. Each improves the ability to deliver customer
value.
Another technical benefit to pipelines relies on adding
security jobs. While success in building and deploying does not
require security functionally, adding in those caveats can help
verify builds. There are four basic security areas: code
coverage, vulnerabilities, container security and dependency
scans. Each area helps to build a comprehensive coding picture
by contributing some security aspect. With any scanning tool,
multiple variants exist and selecting the right tool depends on
what best fits your performance and cost needs at the time.
Typical code coverage scanning employs a tool to remove
critical bugs, frequently including those not known to the coder,
from the early development stages through unit testing.
Common examples of code coverage tools include SonarQube,
Gradle, JUnit and Visual Studio Code. The different tools are
normally limited to certain coding languages. Unit testing will
be covered later but assesses the smallest possible element
before moving to the next. The different coverage types are listed below, with a brief example following the list.
Function – the number of functions called, the alignment between used and unused functions, and whether all named functions are used.
Statement – how many statements are executed in a defined program.
Branch – the number of branches executed from control structures, showing streamlining in the programming process.
Condition – how many Boolean expressions (True, False, Null) are executed in the code.
Line – how many lines of the source code are tested.
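A small Python function makes the distinctions concrete. The tests below are hypothetical; the comments note which coverage types they exercise:

    def shipping_cost(weight_kg: float, express: bool) -> float:
        if weight_kg <= 0:                  # a branch and condition to cover
            raise ValueError('weight must be positive')
        cost = weight_kg * 2.5              # statement and line coverage
        if express:                         # second branch to cover
            cost += 10.0
        return cost

    # Function coverage needs shipping_cost called at least once; full
    # branch coverage needs express True, express False and the error path.
    assert shipping_cost(2.0, express=False) == 5.0
    assert shipping_cost(2.0, express=True) == 15.0
    try:
        shipping_cost(0, express=False)
    except ValueError:
        pass                                # error branch exercised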
Vulnerability scanning moves from the initial test to compare
against known bugs, problems with code or dependencies in
the written expression. These errors can sometimes have
published fixes that merely need application. Common tools
used for vulnerability scanning include Nessus, Tenable and
Nmap. One interesting software note – as companies move from
open source to paid services, sometimes early versions get
forked by the open-source community, resulting in a slightly
different free version with similar properties. OpenVAS offers
an example of code forked from the Nessus code with slightly
different functions and properties. Many DevOps practices use multiple vulnerability scanners at different stages to establish complete coverage.
Nmap also offers an interesting point in vulnerability scanning. While nominally a network discovery tool developed for Linux, the tool supports additional packages for other operating systems. Using Nmap can quickly reveal the hosts, services and operating systems available from a particular device. My own early cyber security practice used Nmap extensively in an isolated environment to discover vulnerabilities. It should also be noted that using Nmap against a system you do not own can actually violate several laws. It makes a good point that sometimes even practising with tools can require substantial knowledge. Nmap’s use in a pipeline would be to validate that certain required ports or areas were open or closed as the jobs built through different environments.
Container security again looks for a specialized security
application. Many pipelines build multiple containers,
deploying as a micro-service or within a service mesh so
understanding interaction vulnerabilities can be critical.
Container scanning tools should look for code vulnerabilities and network connections, test source code before and after deployment, and verify access. If you remember that the entire
system runs cloud, cluster, container and code, then the
container is a step up from code but does not allow for testing
the entire environment for vulnerabilities.
The pipeline container check verifies that the container does
not hold any inherent vulnerabilities and has not changed
within the pipeline. As an example, one may want to deny
access to a particular IP; however, the pipeline may change
those IPs to focus on overall connectivity. In this case, the
container scanning should flag the difference between the
committed application and the deployment. Common container scanning tools include Anchore, Docker Bench, Twistlock and Tenable.io Container Security.
Dependency scans are a backwards-looking process to ensure
everything called by packaged code is accessible and functions
as required. I call it backwards as it looks from code execution
to external items required to make code execute. Again, many
of the pipeline tools available offer different scanning options.
For example, SonarQube and GitLab both offer dependency checks. Dependency checks may also be called software composition analysis (SCA). OWASP offers a dependency-check tool to find publicly known vulnerabilities located in dependencies. The goal of a dependency scanner is to prevent
any known vulnerabilities from migrating to deployed code.
Knowing the technical elements required to construct a
pipeline, there remain only a few steps. Just like the stage, job
and steps, the next stage requires finding the right people,
establishing cultural processes and determining the technology
to support advancement.
Pipeline people
In other elements, you can hire architects and software
engineers but as pipelines are pervasive throughout DevOps,
selecting a pipeline engineer might not be the best fit. Instead of
a pipeline engineer, the best fit may be a CI/CD engineer who
focuses on CI/CD precepts instead of simply building pipelines.
This individual would look not just at the current pipeline path
but also at the long-term business goals to achieve success.
These people should be strong communicators with keen
analytic skills who enjoy decomposing complex processes.
These traits lead to building teams who are proficient at optimizing multiple pipelines rather than merely making a single pipeline better.
The duties of a CI/CD engineer should be to develop effective CI/CD principles by automating observable pipelines for multiple functions. The title emphasizes the CI/CD
aspect but this continues through operational pipelines to
create feedback and maintain pipelines to automate upgrades
without consuming significant developer cycles. CI/CD
engineers manage tools to observe, maintain and iterate CI/CD
tools and platforms. This should extend from choosing the best
tool for a pipeline step or job, to those tools allowing smooth
automation from start to finish. They should be constantly
exploring by testing developmental and operational boundaries
through technical experimentation.
There are some technical skills associated with a CI/CD
engineer. They should be script-writing experts who can
interpret and write source code in multiple languages. These
skills then extend their ability to manage infrastructure,
working with architects to establish dev, test and prod
environments supporting multiple pipelines arching across the
various domains. They must know when a pipeline is the best
solution and when a temporary manual promotion might still
advance the experiment. This requires familiarity with software packaging and version control tools. Additionally, the
individual should be familiar with security tools for
vulnerability analysis, monitoring and code coverage. Above all,
the individual should be dedicated to process observability,
ensuring transparency across all CI/CD elements.
Pipeline process
Two main areas define initial pipeline processes: code quality
and continuous monitoring. These two elements should define
how any organization’s pipeline processes are established as
they lead to the third process element: continuous integration
and continuous delivery (CI/CD). If your process focuses on
developing these practices, you can accelerate the pipeline-
provided value.
Code quality should be the first and last consideration for all
pipelines. Every pipeline element depends on the previous
element; if the basic code is bad, then the whole pipeline can
fall apart. Extensive testing strategies appear in the next
chapter but some initial tips will help focus those discussions.
Code tests start with testing the smallest possible logical
elements. Multiple tools offer solutions for different coding languages; sometimes using multiple tools can be helpful. You should experiment with tools to find the best fit for your teams.
Effective unit-based code testing allows one to quickly
accelerate to static and dynamic testing tools.
Basic code evaluation allows for establishing process metrics.
Some initial metrics should be that fewer than 400 lines should
be examined in a set. The realistic expectation for manual code
reviews is to evaluate a maximum of 400 lines an hour.
Incorporating automated tools accelerates these timelines and
further advances static and dynamic testing practices.
Those initial tests lead to continuous monitoring processes.
All pipelines should be observable, but continuous monitoring ensures those elements do not disappear into a log black hole; instead, alertable exceptions are immediately pushed to users. Pipeline
monitoring should be set for three major factors:
when something happens
when something that is expected to happen does not
happen
when policy requires it
Pipeline outcomes can drive each of these. In the earlier
security example, the pipeline can be set to report when
security scans complete, and then for discrepancies. This allows
for monitoring to highlight fixes or simply note what is being
released.
These quality and monitoring tools drive CI/CD success. To
integrate constantly, you need to know when quality coding
elements are submitted. Pipelines provide the process
verification. Then, in turn, code with quality integration can be
delivered with minimal risk. The pipeline process allows one to
rapidly identify where the process stopped and what actions
must occur next. These processes can be integrated into the
overall pipeline technology.
Pipeline technology
Pipeline technology basics appeared previously when we were
discussing internal processes. The next step determines what
technology you should select to model pipelines. As with other
tools, the right pipeline process can be done manually or
represented by various options. I prefer DevOps platforms that
support multiple tool interpretations and allow integration of
multiple languages such as YAML, Terraform and JSON.
A key technological pipeline element should be visualization.
The earlier elements showed the inclusion in code and the
source integration. Those elements should be further expanded
to allow for visually interpreting pipeline goals. The simplest
pipeline visualization appears in Figure 6.1.
FIGURE 6.1 Pipeline visualization
Summary
Pipeline variations and manual promotions offer one reason
why testing appropriately is a critical DevOps methodology.
Pipeline processes begin with understanding the reason behind
using pipelines while maintaining effective source control and
methodologies. The next step lies in understanding how
pipelines are built. Finally, we discussed the emphasis on
selecting the right people to maintain processes through the
correct skill combination. These people can be empowered by
effective processes and efficient technology, especially in
visualization.
Pipelines can best be optimized through effective testing
procedures. While we separate tests into a different category,
each pipeline stage implements one or more tests to produce
artifacts. In the next chapter, the discussion focuses on what
tests can be performed, selecting tests for different results and
testing in operational environments to maximize security and
customer value.
Note
1 Kaplan, R (2007) What is the origin of zero? Scientific American, https://www.scientificamerican.com/article/what-is-the-origin-of-zer/ (archived at https://perma.cc/9GHG-RLQ9)
CHAPTER SEVEN
Testing the process
Critical to building an efficient DevOps mentality is continuous
testing. Testing produces artifacts and creates observable
interactions. Testing different aspects and areas varies from
initial unit testing to complete functional integrations. In
addition to development testing, testing can be anything from
security testing like penetration tests to optimizing operational
methods with chaos engineering. One vital contribution offered
by testing is integrating continual testing and monitoring
through various tools. These tools support a universal approach
to observability and creating shared artifacts supporting future
development. This chapter explains testing from the basics to
the most advanced levels and demonstrates some useful
frameworks:
Testing basics
Determining testing use-cases
Establishing continuous testing
Driving improvement through testing
As with several previous chapters, the final section discusses
people, processes and technological solutions for testing
success. Finding the right individual with a testing mindset can make or break a practical test process. All testing elements are vital to confident DevOps.
The basics of testing
The first step to understanding testing clarifies our definition. If
we revert to the dictionary, test means ‘the procedure of
submitting a statement to such conditions or operations as will
lead to its proof or disproof or its acceptance or rejection’.1
Regardless of the level, every test seeks to prove or disprove an
item and communicate those results. The most basic tests check
items like the correct coding format, while the more advanced
ones examine connections to external software or proper
messaging formats between the two items. Each testing aspect
uses a different syntax and approach to obtain results. The next
step defines testing types and their attached solutions.
Most tests are referred to by a generic term. Sometimes the term explains the function, but some cases require additional detail. A list of testing types appears below, and each receives
additional detail in the following paragraphs. One must
distinguish between generic development tests like a unit or
integration test, more specific applications like Static
Application Security Testing (SAST), and performance testing
like load, spike or stress testing.
Unit – smallest piece of code
Quality – subjective assessment
Code review – usually manual instance of paired
programming or pull request review
Static analysis – pre-run debugging
Data – check information with which the code interacts
End to end – runs workflow through application from start
to finish
Regression – runs functional and non-functional tests after
change to verify software
Functional – quality assurance testing based on black box
formats
Smoke – determine if ready for the next testing phase
Dynamic – finding errors in running programming
Red/blue – manual tests using teams to find errors
Operational/user – brings in users to check for human
errors and usability
Types of testing
Unit testing is the first step for all continuous testing practices. These tests assess the smallest possible units of code. A common analogy would be the spell checker in a word processing program. Like a spell checker, unit tests target a specific language such as Python, React, Go or others. Integrated
development environments (IDEs) often have unit testing
integrated so users can correct before code reaches pipelines.
Building unit tests into the pipeline ensures potential errors
resolve and eliminates easy missteps.
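A minimal unit test sketch in Python using the pytest convention; the add function stands in for the smallest unit under test:

    # test_math_utils.py (run with: pytest test_math_utils.py)
    def add(a: int, b: int) -> int:
        return a + b

    def test_add_positive_numbers():
        assert add(1, 1) == 2

    def test_add_handles_negatives():
        assert add(-1, 1) == 0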
Code quality and code review are manual practices. These
remain manual as they are incredibly subjective. Code quality
establishes value-based metrics, including ensuring sections are
commented, that code blocks run smoothly in a determined
order or that code functions only occur once. For example, if
code needs to add an item multiple times rather than adding
distinct components every time, good code calls the initial
section. These include using libraries for standard functions
rather than defining each item as it occurs. These distinctions happen as code advances, such as with machine learning libraries like NumPy, Pandas, Scikit-Learn, TensorFlow etc. These elements let others build the simple code that creates a graphic display while your code uses the data.
Static analysis moves on from those initial stages to verify
that the code runs as written without other inputs. These types
of tests use input from previous tests and verify successful code.
The most critical element would be that the code is not running
during testing. Static tests ensure all identified variables are
called, FOR loops end and identified functions are correct. Some
standard tools that can help with static analysis are SonarQube,
Veracode and Fortify.
Data testing verifies the essential information for program
interaction. There are three types of data testing: standard,
erroneous and boundary. Standard testing feeds an expected input to generate an expected output; when the user enters
their username and password, they are checked against the
database and admitted. Erroneous testing performs the
opposite case and looks to ensure that if wrong input occurs,
the system recognizes and rejects the request. If the same login
function requires creating a password with specific character
numbers, the system rejects attempts that fail to meet
standards. Boundary testing looks for edge values at opposite
allowable spectrum ends. A boundary testing example could be
gender on surveys; if fields only allow certain values, then the
system ensures only those values are possible.
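A brief Python sketch of the three data-test types against a hypothetical password rule (lengths of 8 to 64 characters are an assumed standard):

    def valid_password(pw: str) -> bool:
        return 8 <= len(pw) <= 64

    # Standard: expected input generates the expected output
    assert valid_password('correcthorsebattery') is True
    # Erroneous: wrong input is recognized and rejected
    assert valid_password('short') is False
    # Boundary: edge values at opposite allowable spectrum ends
    assert valid_password('a' * 8) is True     # lower edge
    assert valid_password('a' * 64) is True    # upper edge
    assert valid_password('a' * 7) is False    # just below
    assert valid_password('a' * 65) is False   # just above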
Effective data testing secures software and prevents future
vulnerabilities. Structured Query Language (SQL) injection is a hacking technique that can disrupt effective tools. This attack uses malicious SQL code to send database entries that can execute commands. Rather than the data reflecting the input, when systems read injections, they execute a command. For example,
the query could direct a system search to delete the table rather
than returning the desired field.2
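A hedged Python sketch of the difference; the table, column and payload are illustrative. The first query splices raw input into the command text, while the parameterized form keeps the input as data:

    import sqlite3

    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE users (name TEXT)')

    user_input = "x' OR '1'='1"   # a classic injection payload

    # Vulnerable: the input becomes part of the SQL command itself
    query = "SELECT * FROM users WHERE name = '" + user_input + "'"   # not executed here

    # Safer: a parameterized query treats the input as data only
    rows = conn.execute('SELECT * FROM users WHERE name = ?',
                        (user_input,)).fetchall()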
End-to-end (E2E) testing continues to progress testing. A good
CI/CD pipeline includes some E2E testing to ensure correct
operations. This testing measures unit functions, conditions and
test cases. E2E is essential in conducting test-driven
development (TDD). TDD suggests that a test exists to verify
desired functions before writing code. All TDD evaluates as a
three-step cycle: red, green and refactor. First, the red step writes the simplest test that fails. The green step solves the red problem by creating the simplest code that passes the test. The last step refactors the overall code, going back through all material and cleaning up any problems or cruft created while solving the red problem.
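A minimal red-green sketch in Python; the slugify function is an assumed example, not a canonical TDD exercise:

    # Red: write the simplest test that fails (slugify does not exist yet)
    def test_slugify_replaces_spaces():
        assert slugify('Confident DevOps') == 'confident-devops'

    # Green: the simplest code that makes the test pass
    def slugify(title: str) -> str:
        return title.lower().replace(' ', '-')

    # Refactor: clean up the code, then re-run every test to stay green
    test_slugify_replaces_spaces()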
Regression testing evaluates whether software can return to a
previous state without lingering changes. If you think about
common patching and upgrade cycles, sometimes software
works well in tests but creates different errors when installed
by a user. This test area ensures that regressing to a previous
version does not leave new changes within user spaces. For
example, if a new version calls out to new libraries, and those
libraries create errors, you would verify that returning to the
previous software does not repeat the same mistakes.
Regression is a common solution for operational teams when
errors mount after a new upgrade. Regression testing ensures
that those operational practices will be successful.
As the name implies, functional tests test for functions within
the programs. To run a practical test, one sets expectations for
what running the program achieves. Each test should have
clear inputs and expected system outputs. At this point, one uses mocks and stubs to create results. A mock creates a fake class used to survey interactions with the tested class. A stub functions as a database connection to simulate
scenarios without live databases. If a test function involves
medical data or personal information, stubs provide significant
compliance benefit over using actual data.
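A short sketch using Python's unittest.mock; the patient records and the store interface are hypothetical stand-ins for a live medical database:

    from unittest.mock import Mock

    def count_active_patients(store) -> int:
        # Function under test: depends on a database-like store
        return len([p for p in store.fetch_all() if p['active']])

    # Stub behaviour: simulate the database connection without live data
    store = Mock()
    store.fetch_all.return_value = [
        {'name': 'A', 'active': True},
        {'name': 'B', 'active': False},
    ]

    assert count_active_patients(store) == 1
    store.fetch_all.assert_called_once()   # mock side: verify the interaction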
Smoke testing checks whether a new code version works before advancing.3 These tests can be either manual or automated to find delivered code gaps. The process starts when a development team delivers a new application build. The term originates from physically testing equipment, which won approval if it did not smoke or ignite when powered on.
QA teams perform these tests to ensure no simple but severe
failures exist by applying test cases evaluating software
functionality.
Dynamic testing verifies software code’s changing behaviour.
Dynamic tests evaluate the software’s response to changing
inputs affecting overall behaviour. Code must run sequentially
to pass dynamic testing. This testing typically occurs as either
white box or black box testing. When the user knows the
existing code, the testing is white box. Tests verify how systems
perform based on the code. In black box testing, the internal
configuration is unknown. Black box testing is also known as
user acceptance tests. Testers attempt to enter different data
types without access to code to identify errors. White box
testing requires engineering knowledge but any user can
conduct black box testing.
The last two testing types require active participation.
Red/blue tests occur at the system level and select an attack
(red) and defence (blue) team. Red/blue reveals vulnerabilities
not apparent during other tests. Operational and user tests
present the software to users and identify preferences.
Sometimes the system buttonology might work effectively
through test cases but be ineffective for users. Examples
include drop-down menus, items arranged logically but
different from user preference, or commonly used items that
might require selection at every instance rather than saving
past requests.
The best way to assess which test to use is by writing effective
test cases. Test cases define testing areas and align expectations
against occurrences. These items are essential to test-driven
development and critical to any continuous testing model. The
next section covers developing and employing test cases.
Chaos engineering
Chaos engineering is a relatively new field that offers a unique
approach to the testing process. The goal is to identify problems
before they become problems, allowing iterative fixes and
continuous development. The purpose is to use the process of
testing a system across a distributed network to ensure that the
various paths and policies can withstand unexpected
disruptions.4 This type of testing enables errors to be found
early, potentially preventing malicious uses and solving
problems a typical user may introduce. The basic principles and
procedures run in a similar path to stress testing.
The common point at which to begin chaos engineering
studies arises from theories compiled by Nassim Taleb in The
Black Swan.5 The Black Swan concept is that something that
always happens today cannot be predicted to occur similarly in
the future. An easy example is the turkey raised for a holiday;
the turkey eats every day and has a wonderful life. From the
turkey's viewpoint, the axe is not expected to come on that day
before the holiday. In software, we want to prepare for those
unexpected events. In a practical example, Japan’s 2011
earthquake registered a 9.0 on the Richter scale, followed by a
massive tsunami. Japan’s nuclear facilities had prepared for
one or the other event but not the combination.6 Chaos
engineering could have helped by looking at the effect of
random events, a flood and an earthquake, on the distributed
network of nuclear disaster recovery.
We then return to using chaos engineering in the software
environment. Netflix receives credit for pioneering these
applications when in 2010 they introduced a tool called Chaos
Monkey to randomly terminate VMs and containers in the
production environment. The goal was to ensure that the
termination of Amazon Elastic Compute Cloud (EC2) instances would
not affect their users’ service experience.7 The tool expanded to
an entire group of tools known as Simian Army, available
through open source on GitHub. This work established two core
principles for chaos engineers: develop systems that do not
have single points of failure, and never be confident that a
system does not contain a single point of failure.
Developing effective chaos engineering practices reads much
like other experimentation techniques. You should first set a
baseline using established metrics and observability as the
normal baseline. From that baseline you generate a hypothesis
to consider potential system weaknesses. The hypothesis looks
for ways to stress points to identify those single failure areas.
Next, one conducts experiments, observes the effects and then
evaluates the results. A quick checklist (with a minimal sketch after it):
Set the baseline
Create the hypothesis
Experiment to validate the hypothesis
Evaluate results
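The sketch below walks that checklist end to end. It is illustrative only: the health-check URL and the container names are assumptions, and any real run should target a system you own with the smallest possible blast radius. The failure injection follows the Chaos Monkey spirit of randomly terminating a replica.

```python
import random
import statistics
import subprocess
import time
import urllib.request

SERVICE = "http://localhost:8080/health"   # assumed health endpoint
REPLICAS = ["web-1", "web-2", "web-3"]      # assumed container names

def latency_sample(n: int = 10) -> float:
    """Median response time in seconds across n requests."""
    times = []
    for _ in range(n):
        start = time.monotonic()
        urllib.request.urlopen(SERVICE, timeout=5).read()
        times.append(time.monotonic() - start)
    return statistics.median(times)

# 1. Set the baseline.
baseline = latency_sample()

# 2. Hypothesis: losing one replica keeps median latency under 2x baseline.
victim = random.choice(REPLICAS)

# 3. Experiment: inject the failure (Chaos Monkey style).
subprocess.run(["docker", "kill", victim], check=True)
time.sleep(5)  # allow load balancing to react

# 4. Evaluate results against the hypothesis.
degraded = latency_sample()
print(f"killed {victim}: baseline={baseline:.3f}s, after={degraded:.3f}s")
assert degraded < 2 * baseline, "single point of failure suspected"
```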
These methods allow for finding the core areas in a distributed
system. It would be best to look for things that are known and
understood, things that are known but not understood, things
that are known but lack observability, and things that lack
observability and are not understood. Examples might be user
logins from particular IPs, power services for edge devices or
unexpected surges in system demands.
Building from the initial practice, one can look to computer
scientist L. Peter Deutsch and his analysis of the eight fallacies
of distributed computing.8 These fallacies, the first of which is made concrete in the sketch after this list, are:
The network is reliable
Latency is zero
Bandwidth is infinite
The network is secure
Topology does not change
There is only one administrator
Transport cost is zero
Networks are homogeneous
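To make the first fallacy concrete, code that assumes the network is reliable breaks on the first dropped packet. A common defence is retrying with exponential backoff, sketched below; the endpoint is hypothetical.

```python
import time
import urllib.error
import urllib.request

def fetch_with_retries(url: str, attempts: int = 4) -> bytes:
    """Fetch a URL, retrying on network failure with exponential backoff."""
    delay = 0.5
    for attempt in range(1, attempts + 1):
        try:
            return urllib.request.urlopen(url, timeout=5).read()
        except (urllib.error.URLError, TimeoutError):
            if attempt == attempts:
                raise                # give up after the final attempt
            time.sleep(delay)        # back off before trying again
            delay *= 2               # exponential growth bounds retry storms
    raise RuntimeError("unreachable")

# Hypothetical parking API endpoint, for illustration only.
data = fetch_with_retries("https://parking.example.com/api/spaces")
```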
Each of these could anchor a complete list of explorable topics. This
section, like others, covers only the wavetops in identifying areas
for exploration. As a last step, some best practices can help
avoid issues when launching a chaos test. The first practice
returns to understanding usual system behaviour. Without
knowing the basics, one may not know where to test. Next, you
should simulate realistic scenarios. Expanding into future
possibilities may yield benefits, but engineers should remain
tied to reality to avoid unnecessary expenditures or fixing non-
existent problems. Following those realistic scenarios, you
should test using real-world conditions. Chaos engineering
should occur primarily in production environments as it can be
expensive to simulate large, distributed environments solely for
testing. If you know a system has a hard break at a specific
throughput, then testing above that throughput may not be
effective.
Finally, when running chaos experiments, you should
minimize the blast radius. Limit your experiments to a small
area, with redundancy available, and not during peak times.
While perhaps counter to the overall theory, you should
remember that in injecting failures you want to minimize
overlap with real errors as much as possible. Chaos engineering
is not a panacea, but can help keep our DevOps confidence high
as you gain additional feedback and improvements for the
overall SDLC.
People, processes and technology for testing
Similar to other DevOps elements, some solutions rely on
having the right people, processes and technology to support
flow, feedback and improvement. These sections are critical, as
every forward-looking flow depends on effective testing. People
must be at the heart of an experimental culture. Then the
company processes should support their testing process
through effective means such as hackathons and well-regulated
experimental techniques. These must then include tools
allowing proper data analysis and visualization of results so
decisions can be made from testing.
Testing people
It is possible to hire individuals solely as test engineers. These
personnel require a deep knowledge of code, an infrastructure
understanding and the ability to work through architectures. As
with all other positions, all individuals need to be rooted in a DevOps
philosophy. Also important is the ability to provide context
around testing solutions. Two central frameworks drive initial
DevOps delivery: constructing a minimal viable product (MVP)
and a walking skeleton. The MVP defines the absolute
minimum necessary to meet user requirements, and the
walking skeleton is one level below. The skeleton looks like the
desired MVP but lacks detail. Testing personnel should
incorporate solutions to ensure products demonstrate the right
functions when fleshing out skeletons or advancing MVPs.
A tester needs to be inquisitive; they must always think not only
about product function but also about improving solutions.
Most testers come from a self-taught background, which helps
identify knowledge areas but can be challenging to integrate. In
a test engineer, one looks for general programming knowledge,
technical skills, testing expertise and some soft skills to
integrate with the team. General programming should focus on
development languages such as Java, Ruby and Go, but
knowledge can rapidly transfer into different functions. Specific
technical skills should address infrastructure, as
communication protocols are a frequent testing problem. Some
of these technical skills appear again in testing skills. Most
testing skills address protocols and strategies for approaching
tests. Like all DevOps, finding the right test engineers involves
creating the flow, feedback about how the engineers work, and
then experimenting to improve.
Testing process
The testing process should be an analytic one. Testers must
have a framework so that tests exist to support other functions
rather than derailing existing production. Sound tests are the
framework for advancement. The basic process for a tester is:
Define the story you want to tell
Verify source credibility
Automate everything
Measure twice, cut once
Present data in a meaningful manner, apples to apples
These steps should look very similar in process to constructing
metrics. At heart, tests consider a different metric solution. One
should be careful that testing processes only evaluate for the
desired information, as too much data can be worse than
too little. It would be best if you built tests analytically, as
there must be a clear relation between test goals and the
provided data. If those connections are not apparent, the test
will likely fail to provide solutions. You should also ensure the
test objectives are clear and any assumptions and subordinate
factors are understood. Finally, the test should be structured
based on desired output rather than instinct. Simply building a
test to verify that something works is not sufficient.
Some process approach highlights appeared earlier in the
continuous testing section, building good test cases and
integrating chaos engineering. Effective processing removes
blockers that prevent pipeline advancement and ensures only
quality code reaches production. These blockers may be in the
code, a quality assurance (QA) group that assumes
responsibility for manually signing code, or simply a lack of test
process understanding. Every DevOps practitioner assumes
some responsibility that testing processes are in place, but
establishing a testing expert, testing group or community
enforcing testing practices helps supplement those initial
approaches.
Again, some of the best practices appeared earlier. The best
DevOps process seeks to automate as many tests as possible and
then conduct pair programming or manual review if
automation fails. These techniques support test-driven
development (TDD). The TDD approach states that no story or
task appears without first developing a verification test.
Looking at the parking application, if the task integrates a user
to an account, the test verifies an account exists, can be linked
to payment, and then attaches to parking spaces. Each may
have separate tests as tasks decompose into more minor
elements, but understanding the need for tests accelerates
development. Further support includes implementing the right
testing technology.
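A hedged sketch of that TDD sequence for the parking example appears below. The tests come first, and the Account class is only the minimal stub the tests imply; every name here is illustrative rather than drawn from a real codebase.

```python
class Account:
    """Minimal stub: just enough behaviour to satisfy the tests below."""
    def __init__(self, user: str):
        self.user = user
        self.payment_method = None
        self.spaces = []

    def link_payment(self, method: str) -> None:
        self.payment_method = method

    def attach_space(self, space_id: str) -> None:
        if self.payment_method is None:
            raise RuntimeError("payment must be linked before parking")
        self.spaces.append(space_id)

# Test written first: an account exists for the user.
def test_account_exists():
    assert Account("alice").user == "alice"

# Test written first: the account can be linked to payment.
def test_payment_link():
    account = Account("alice")
    account.link_payment("visa-1234")
    assert account.payment_method == "visa-1234"

# Test written first: a paid account attaches to a parking space.
def test_space_attachment():
    account = Account("alice")
    account.link_payment("visa-1234")
    account.attach_space("garage-2/space-14")
    assert "garage-2/space-14" in account.spaces
```

Each test maps to one decomposed task, so a failing test identifies exactly which promise the story has not yet kept.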
Testing technology
Preferred testing tools should be designed to improve delivered
code quality through managing testing processes. The best tools
help automate and streamline the testing process to identify
issues and, when identified, fix those issues early. As with every
other DevOps aspect, specific tools change frequently and you
should always pick the best option for your organization. One
of my favourites, covered earlier, is Robot Framework. Other
common options include Mocha, Parasoft and Selenium.
Mocha offers a feature-rich JavaScript test framework
designed to run through Node.js or in the browser. As GitHub
sponsors the system, it can be either open-source or a
purchased upgrade. Tests run serially, one at a time, which
allows flexible and accurate reporting and maps uncaught
exceptions to the correct test cases. The system offers
full online documentation to work through various testing
issues.
Parasoft is a more complete tool but one you need to
purchase. The tool offers different subsets for working with
Selenium, testing APIs, or languages like C++ and Java. Testing
programs such as C++ hold external certifications from security
organizations to verify authentications. Each program for
Parasoft can be purchased individually or as a larger set.
The last-referenced tool is Selenium, a well-known testing
framework that also underpins the purchasable Parasoft tooling
mentioned above. Selenium is an open-source, automated tool
that works with web applications across many browsers.
However, it only works through web applications, so items
like desktop and mobile applications are untestable.
Selenium generally provides a combination of smaller tools
such as Selenium IDE, WebDriver, RC and Grid. Each of these
focuses on a different aspect of testing.
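As a brief illustration, the sketch below uses Selenium’s Python bindings through WebDriver to walk a hypothetical login page; the URL and element names are assumptions, and a local browser plus driver is required for it to run.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumes a local Chrome installation
try:
    # Hypothetical login page for the parking application example.
    driver.get("https://parking.example.com/login")
    driver.find_element(By.NAME, "username").send_keys("alice")
    driver.find_element(By.NAME, "password").send_keys("s3cret")
    driver.find_element(By.ID, "submit").click()
    # The assertion is the test case: expectation against occurrence.
    assert "Dashboard" in driver.title
finally:
    driver.quit()  # always release the browser session
```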
With those three tools in mind, I would encourage you to
research testing options for your DevOps environment. If you
keep the general structures and guidelines throughout this
chapter in place, finding the right tool can be easy. Knowing
that each organization finds its own path makes me hesitant to
recommend any tool, but the three above and the Robot
Framework options can get you started.
Summary
This chapter covered the long road from testing basics to the
specific applications that can drive success. Testing basics
included how to design test cases for any problem that might be
faced by a DevOps culture. One should remember that testing is
not just for technical solutions but can assist with personal
decisions. Some of these factors become more apparent as one
implements continuous testing solutions for DevOps.
We then continued our exploration through testing processes
for improvement. One interesting example in this area was
using chaos engineering to build sound development and
effective operational environments. Finally, we discussed
finding the right people, processes and technologies to support
solutions. The next chapter wraps up all the technical solutions
we have discussed across various chapters and begins to
explore the tools for the right DevOps mindsets within your
organization.
Notes
1 Test, https://www.merriam-webster.com/dictionary/test (archived at https://perma.cc/2JDA-T648)
7 D’Antoni, J (2022) What is Chaos Engineering? A guide on its history, key principles, and benefits, Orangematter, https://orangematter.solarwinds.com/2022/08/18/what-is-chaos-engineering/ (archived at https://perma.cc/F8SG-H8J5)
8 Simple Oriented Architecture (2018) Understanding the 8 fallacies of distributed systems, https://www.simpleorientedarchitecture.com/8-fallacies-of-distributed-systems/ (archived at https://perma.cc/L7ET-NZC6)
PART THREE
The DevOps mindset
CHAPTER EIGHT
The DevOps mindset
Until now we have emphasized the importance of
understanding DevOps and integrating technical tools into
those solutions. But DevOps cannot be just a technical solution;
it must incorporate cultural change. Integrating innovative
technical solutions with cultural implementation takes a unique
mindset. Understanding that mindset can help you make
effective and confident decisions on where and when to
implement. This understanding starts with being a broad-
minded, philosophical manager. The decisions made by DevOps
are not just technical engineering solutions but also require the
consideration of human factors. Human decisions require
human interaction, so understanding those drives and
motivations helps us accelerate DevOps delivery. Some key
chapter topics are:
Understanding interaction with individuals and teams
Overcoming bias
Preparing the culture
After establishing the culture, building communication paths
and planning the future, one of the next important steps is
understanding bias. When I discuss bias, this means cognitive
bias, those decisions people make based on factors unknown to
their conscious mind. This chapter will examine those biases
and provide steps to prevent them from infecting your decision-
making processes. The last element will discuss evaluating and
building cultural relevance. Feedback does not stop with the
product delivery; one must prepare to constantly evaluate
whether your team continues to be successful and relevant
from numerous perspectives.
Social exchange
The social exchange theory states that all exchanges between
people realize some element of social interaction. The theory
determines if an exchange will occur when profit, defined as
reward minus cost, reaches a sufficient level for the involved
groups to perceive a beneficial exchange. In essence it states
that ‘social behaviour is an exchange of goods, material goods
but also non-material goods’.10 The theory originated from Homans’
1950 work The Human Group, exploring the behavioural
decision theory with activities, interactions and means defining
the group. Activities are those repeated actions taken, the
DevOps culture elements including sprints, retrospectives and
open exchange. Interactions are the defined elements between
teams in large structures or between individuals on the team.
The means are the communication media, often the git
repository, the coding environments or the customer
interaction.
Social exchange follows the constructivist worldview to
address individual interactions and interpret the meaning
others express about the world. Applying social exchange
theory to your teams helps enframe the lived experience of
DevOps practitioners by establishing exchange requirements
within each experiential cycle. We will cover some of these
required elements in the final section of this chapter.
Exchange mechanisms appeared in Homans’ work as the
initial conceptual understanding of social exchanges. Later
research explained how exchanges between clear-cut economic
functions and social responsibilities appear in corporations.
This laid the foundation that culture has more impact than
code, although all code has a social impact in changing those
who interact with it. For example, developing and
implementing sound security practices can be considered
technical solutions or societal responsibility. Connecting social
responsibility to coding builds a broader tie to the individuals
on the team, increasing their satisfaction with software
development. DevOps studies continue to stress that happy
developers are effective developers. This interaction allows you
to understand why value changes are made rather than simply
stating baseline requirements. From analysing other SDLC
methods, we understand that breaking requirements and
planning from delivery can delay or disrupt effective practice.
Social capital is the critical exchange element through
organizational structures.11 Homans demonstrated
mathematically that social capital could be measured to show
exchanges between different areas. The concept supports this
research when concluding that individuals can make quantified
decisions based on qualified experiences. With the growth of
software factories as a model for DevOps delivery, we can
recognize the equivalence of software development and
traditional labour. There is an association between DevOps
through just-in-time delivery methods and capital accumulation
within the company, delivering effective products quickly.12
While some may tie these conclusions to a broader corporate
revenue stream, the decentralization of DevOps, with many
teams working on items in parallel, means that a minor
decision in one team can impact overall revenue. The DevOps
change disseminates value decisions to a lower level and
emphasizes tight ties to development teams. This dissemination,
and the corresponding technology based on version and source
control, makes it possible to identify social exchange
mechanisms down to the individual developer or operator.
Connecting the GQI study through evaluated cyclical
experience to social exchange allows for discovering how
groups determine acceptable behaviour and the medium of
exchange. Van Manen offered a qualitative perspective, stating
that we must turn to experience as the lived-through events to
guard against the cliché, conceptual and predetermined to
study not the overall meaning but the moment the individual
reaches their intuitive grasp.13 Adopting this perspective means
the focus for discovering value exchange processes will move
from the terminology to the underlying experience as each
individual proposes value to their delivery. Current work
studying the human experience in DevOps is lacking, with no
articles logged in ProQuest over the past five years.
In three case studies, research demonstrated social exchange,
showing that values begin as economic comparisons and
expand to social consideration.14 These cases exhibited four
themes: computer ethics, social information, cooperative work
and participatory design. Estimating comparative value
between coding elements like security and functions associates
with the participatory design and cooperative work for those
same elements. Simultaneously, these elements require social
information to gain user agreement and demonstrate an ethical
approach towards secure software. If we return to the parking
structure, we must ethically sell items, gather social
information within the application bounds, work cooperatively
between multiple venues and encourage participatory design. A
good parking application not only changes the parking
environment, it can also contribute to the local corporate
economy and yield long-term impacts. When code designs are
poor, the negative outcomes then prove detrimental to the
overall economy. Understanding how we exchange different
priorities for which garages integrate when, and how you
protect user information, becomes critical to our overall
DevOps culture.
Social exchange rules in industrial organizations start with
the core contract being centred around high trust, beginning as
an economic exchange between parties.15 We want DevOps to
be more than simply buying an application, but a continuous
culture embedded within the larger organization. Social
exchange practices affect leader-member relationships,
organizational support and organizational politics. Each of
those elements exchanges with the others regularly but most
organizations fail to recognize that an exchange is occurring.
Especially relevant is the research finding that leader-member
relationship exchanges are less stratified in modern companies
than in traditional, hierarchical organizations. This lack of
stratification, especially in technology industries, suggests the
appropriate framework for open communication facilitated by
DevOps cultures.
Quantitative knowledge-sharing assessments show how this
process increased productivity.16 Our DevOps feedback process
of knowledge sharing was demonstrated from a social exchange
perspective to survey established norms and workplace usage.
Contrary to expectations, the study found that increased
sharing of specialized knowledge improved productivity more
than withholding
information. This supports social exchange in the DevOps
culture as the groups’ activities, interactions and means
accentuate sharing information. Value exchange between
workplace units, whether individuals or teams, can improve
product delivery. Understanding the culture and motivations is
the first step; the next must be identifying those internal
thought processes that can prevent success.
Overcoming bias
In examining bias, it helps to have a good explanation, as we
often use the term to cover a wide variety of options. Merriam-
Webster defines bias as ‘an inclination of temperament or
outlook, especially a personal and sometimes unreasoned
judgment’.17 This links to the previous phenomenological
exploration, where experience drives perception and then
affects results. When assembling teams, different individuals
have different inherent biases; there is an organizational bias
and there may even be local cultural bias. To produce the best
products, it is key to identify bias and work through paths to
resolve any issues. This section works through cognitive bias as
a systematic thinking error occurring when you process
information from the outside world. The following are the most
common biases in DevOps.18
Confirmation bias
In DevOps, confirmation bias is a tendency to prefer a set code-
base or tool. One often sees teams selecting a tool not based on
external credentials but because of previous experience. I
worked on a team that switched from Jenkins to Git based on a
criteria model. As the teams hired more developers, the new
individuals had experience with Jenkins and wanted to return
to those solutions. The challenge in DevOps is to constantly use
feedback to verify that beliefs are accurate rather than
subjectively evaluating concerns.
Hindsight bias
Hindsight bias is very familiar to me from my intelligence
background. After an event, it is always easy to align the causes
looking back but not as easy when looking forward. This is a
constant challenge for developers. One sees this in security
where observing where an attack occurred afterwards is easy
but the challenge is predicting future attacks. As mentioned
with the Fukushima earthquake example, preparing for
hindsight can restrict the ability to prepare for the future.
Identifying hindsight bias benefits in avoiding taking
unacceptable risks in planning and development.
Anchoring bias
Similar to hindsight, anchoring bias is when someone becomes
overly influenced by the first piece of information received.
When developing in a new cloud environment or cluster, or
merely adopting a new tool, you want to be careful not to be
overly influenced by the first experience of those situations. A
version of anchoring was mentioned above where developers
preferred Jenkins over Git based on their first experience with
the tool rather than a structured evaluation. Anchoring is a
critical mover in brand loyalty discussions from a marketing
perspective and can be manipulated in DevOps by creating
first-movement opportunities. The DevOps goal to deliver
quickly aims to reaffirm anchoring by suggesting the first
choice continues to be the best option.
Misinformation bias
DevOps teams use retrospectives to help combat the
misinformation effect. Misinformation suggests that memory is
malleable and later recollections may change the events
associated with the initial experience. An example may be
delivering a piece of code, proving initially difficult but with a
later solution expediting the developed processes. You want the
team to remember the difficulty and the time spent overall
rather than the easy solution that appeared later. Accurately
evaluating work on an iterative basis through retrospectives is
one way DevOps resolves this bias.
Actor-observer bias
The actor-observer bias again ties in the phenomenological
perspective and becomes extremely important to team
functionality. This bias suggests that you often attribute
personal actions to external influences and others’ actions to
internal influences. You might state that developing a new
security model to comply with external standards was the
standard for your team but that another developer’s code
resulted from faulty preparation. A common expression for this
between teams appears in the ‘Us vs. Them’ model. One must be
careful that adversarial relationships do not appear, and if they
do, they are resolved by objective comparison rather than being
biased towards their internal perspective. Another common
DevOps bottleneck occurs when dev teams perceive security as
trying to slow development rather than realizing security teams
are also meeting external requirements.
Optimism bias
This is the penchant for overestimating the likelihood that good
things will happen rather than bad ones. The optimism bias can be
especially prevalent in DevOps as you always believe the code
will work and run smoothly. While we want to be confident in
our code, just as with the availability heuristic, realistic risk
planning and management are required to mitigate bias.
Understanding the risks that can happen when deploying
software helps one understand where optimism is warranted
and where more extensive mitigation should happen.
In every element of cognitive bias, the answer is to think
objectively about the factors involved. Taking time for research,
building up decision criteria and evaluating risks allow for
sound decision making. Bias cannot be eliminated but effective
planning can help mitigate those risks. DevOps’ constant flow
and feedback should provide steady information to make
decisions. You should recognize where bias occurs and work
consistently to identify and then minimize the impact of that
bias on overall delivery. Not all bias is bad, and sometimes bias
leads to better choices, but the overall emphasis should be on
recognizing which biases appear and where.
Maturity levels
One area I have intentionally skipped until now is the maturity
level concept. Personally, I am not in favour of maturity levels
as too many organizations use them in the same way as poor
metrics; they can become targets rather than progress
measurements. Sorting maturity as targets separates good team
goals from what the organization wants to advertise in the next
conference or slick sheet. At the same time, there is a place for
maturity and including these aspects within any effective self-
assessment creates feedback for continuous improvement as a
DevOps target.
When required to perform a maturity assessment, I typically
use five levels to create an assessment: Chaos, Repeatable,
Defined, Managed and Optimized. Each level incorporates
multiple aspects and allows for customization and
experimentation. You should assign levels to the process
desired within the organization. Evaluation can be at a high
level of DevOps maturity or you can approach smaller areas
such as testing, observability or security. Understanding each
level allows you to define the specific aspects. Items might
include multiple sub-items to measure (a simple scoring sketch
follows this list):
Testing – automation, coverage, test design, inputs,
pipeline inclusion.
Observability – points of observability, access, sharing,
automation, integration to alerting and improvements,
data collection, dashboards.
Security – security scanning, remediation, proactive
alerting, pipeline integration, security policy.
Architecture – enterprise or API options, defined
governance, metrics established, stakeholder perception,
change management.
Infrastructure – compute limits and automation, storage
virtualization, orchestration, application delivery, load
balancing.
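As an illustration only, a few lines of Python can turn such sub-items into a repeatable self-assessment. The areas, sub-items and scores below are hypothetical, and the output is a progress measurement rather than a target.

```python
LEVELS = ["Chaos", "Repeatable", "Defined", "Managed", "Optimized"]

# Hypothetical scores: each sub-item rated 1 (Chaos) to 5 (Optimized).
assessment = {
    "Testing": {"automation": 3, "coverage": 2, "pipeline inclusion": 4},
    "Observability": {"dashboards": 4, "alerting integration": 3},
    "Security": {"scanning": 2, "remediation": 2, "policy": 3},
}

for area, items in assessment.items():
    avg = sum(items.values()) / len(items)
    # Map the average back to the nearest named maturity level.
    print(f"{area}: {LEVELS[round(avg) - 1]} ({avg:.1f})")
```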
Chaos describes an environment where tasks are accomplished, but no one can
describe a consistent process or application. At the repeatable
level, the teams begin being able to do things in parallel but the
emphasis becomes doing a thing again rather than
understanding why a thing occurs. However, repeatability
becomes the first step towards creating a mature culture. These
first two steps begin to recognize the required actions but they
appear as distinct sticky notes on the whiteboard rather than
establishing any flow between different items.
Defined begins what one can first think of as maturity. In a
defined stage, one establishes clear processes with metrics and
explanations. Even though Agile prefers functional software
over comprehensive planning, a framework for planning can
drive flow and feedback. Establishing some known principles
provides the platform to grow. In the visual model, defining
areas allows for stability, securability, scalability and
adaptability across phases for design, support, develop, deploy
and operate. Defining processes and building the common
understanding between teams and management allows you to
realistically assess progress as reaching the managed level.
Defined maturity organizations demonstrate automated builds,
cross-functional teams and product-focused objectives while
remaining open to cultural change.
Managed maturity means that processes exist and are
established, observed and integrated in the sense that
variations result in action. Examples of solid management
include automated dashboards, retrospective practices and
clear, observed flows. Organizations at the managed level tend
to demonstrate happy individuals who use integrated
toolchains to prevent failure. Test and deployment stages are
automated and continuous delivery happens.
Once a process or an organization reaches the managed level,
the only further step would be optimized. High-functioning
cultures typically alternate between defined and optimized
regularly. These changes are not bad but occur because, much
like all other items of observability, assessments are made at a
point in time. Optimized cultures appear as ‘DevOps done’
during work-tracking discussions. All necessary changes are in
place, automation becomes a first step and employees get work
done. Optimized delivery reflects continuous
integration/continuous delivery. Suggestions and improvements
become the basis for experimentation, and those results
support any future process.
Organizations fluctuate between the defined, managed and
optimized levels as newly incorporated processes and decisions
can revert to early stages. When optimized observability
demonstrates the need for changes, those changes must first be
defined within the process, managed as you become familiar
and then optimized. That optimization results in generating
further data for experimentation. The variability between these
levels is why I refrain from using maturity to measure success.
That said, you should know how others use maturity levels to
assess software development. Success should directly relate to
delivering value quickly; stopping to consider maturity rather
than operational success can lead you down the wrong path.
Team maturity
With the previous terms solidly in mind, it can be effective to
give a synopsis of how those maturity levels occur within a
team setting. First, you should consider how you obtain
information. In Chapter 5 we discussed establishing various
metrics; those metrics will be the core guidance for assessing
team maturity. The DORA metrics often appear in measuring
mature teams but deal primarily with production aspects.
Instead, the emphasis should be on ensuring that cultural
practices associated with teams support our strategic goals.
It would be best if no one ever worked on a chaotic team, but
I am sure we have all been part of those environments. Chaotic
teams lunge wildly from project to project, prioritization is
limited and the general feeling is dissatisfaction.
One of the simplest ways to judge this process level is to ask
what a developer is doing and why they are doing it. An
inability to answer those questions means there is no process
guiding cultural decisions. One sees dissatisfied workers in
these environments and may even see duplicated work. The
lack of a common standard creates an inability to focus on clear
goals. Teams at the chaotic level have not even begun to
consider the basic DevOps principles of flow, feedback and
improvement.
The first step to move out of the chaos level is to make work
repeatable, which often happens through observability.
Implementing work boards such as Scrum or Kanban, moving
to sequenced pipelines and using automated tools such as
GitLab, Jenkins, ArgoCD, Flux and others helps one visualize
how work progresses and at what point efforts are currently.
The biggest gap in a repeatable maturity environment appears
in the lack of accurate feedback. Teams will repeat efforts
because they know how to repeat them rather than choosing
the best solution. Many operations teams can be stuck in the
repeatable maturity level as they follow checklists and conduct
routine activities but never generate feedback to reduce or
automate repetitive work. One DevOps maxim is that anything
that must be done twice should be automated; leaving these
items in a manual approach removes one of the biggest
strengths of the DevOps culture.
Moving from repeatability to a defined maturity involves
knowing why processes occur. This movement reflects the
longer arc of integrating features built from individual coding
samples and linking those features back to overall profitability.
A defined element states what is happening, how it happens,
where it fits in the process and when completion may occur.
This emphasizes more than a repeatable step but the ability to
lift and shift different processes into different places. Teams
with defined characteristics can be exchanged with other teams
and explain their processes to others. From the Agile
perspective, defined teams deliver functional software and
provide the documentation to allow others to benefit from their
work. This repeatable and defined expression then allows for
management to succeed.
The managed maturity level appears in teams that are
integrated and successful. Managed teams have internalized the
flow, feedback and improvement associated with DevOps.
Those organizations understand their internal process and how
they fit with other organizational pieces. Higher levels can
assign functional tasks rather than having to establish specific
requirements. These teams manage based on consistent, quality
delivery rather than limiting to specialized items. The DevOps
essence establishes continuous delivery of new products,
changing versions and meeting adaptable demands. Managed
companies constantly look for ways to benefit from feedback to
improve flow and conduct experimentation cycles.
An optimized maturity level incorporates everything from
the managed level and becomes more so. Optimization means
not only are processes managed to create feedback but the
feedback process is managed to produce the best, most
actionable feedback regularly. These organizations are truly the
opposite of chaos in simply allowing work to happen and
managing all work through the continuous injection of order at
every part of the production cycle. Contrary to chaos, not all
optimized cycles are apparent to outsiders; often, optimization
makes the work appear easy even when the constant struggle to
maintain feedback can be challenging. The best-optimized
cultures are the ducks on the water; all appears calm on the
surface, but they are working hard underneath to remain
afloat. Optimized cultures accept that maintaining those
standards requires constant effort; the work of DevOps never
stops but remains a continuous journey. At the same time, these
cultures are confident in their ability to deliver work
successfully.
Summary
This chapter took us from the high-level discussion of team
interaction through phenomenology and social exchange to
how those thoughts appear within the workplace.
Phenomenology relates the social structure behind experience
in terms of where and when individuals make decisions. Social
exchange theory then built a framework for exchanging
individual decisions with others to create a more valuable
product. These interactive basics then allowed for a discussion
about how bias affects decisions, where bias occurs in DevOps
and how to mitigate those biases. Mitigation and risk
management lead to characterizing cultural environments,
aligning metrics and recognizing when a DevOps culture is
mature. Each of these elements is critical in understanding a
DevOps mindset, and the next chapter will examine some
practical examples in establishing a DevOps workplace.
Notes
1 Smith, D W (2018) Phenomenology, The Stanford Encyclopedia of Philosophy (Summer 2018 Edition), Edward N Zalta (ed.), https://plato.stanford.edu/entries/phenomenology/ (archived at https://perma.cc/Z427-DY58)
Building teams
As with metrics, teams take time to form, and the longer the
team survives, the more effective practices become. Change is
necessary to improve, but centring teams around core ideals
helps maintain focus. Amazon promoted the idea of a two-pizza
team, where team sizes are kept small enough that they are
never too large to be fed with two pizzas. In practice, most
teams should have 5–10 members. This practice allows
interaction but prevents individuals from disappearing into the
background. Teams then align to a common purpose rather
than a project. DevOps teams should not be project-aligned
solely to build a specific application but functionally aligned to
build applications within a common framework. Doing so
prevents constantly juggling teams as projects end.
Another procedural guide helps drive successful team
formations. This guide uses a scrum master or team leader to
keep teams on track. Some organizations assign these roles as
additional tasks, but good scrum masters accelerate team flow.
The scrum master possesses comb-shaped knowledge, a broad
base with multiple narrow skills. These individuals ensure use-
cases are captured, acceptance criteria defined and work
aligned with team working agreements. The product owner on
the team possesses E-shaped knowledge, a broad base and
specific depth in two or three subjects. Coders should be
T-shaped, with a breadth of coding knowledge but specific
knowledge about one area or application. You can supplement
teams with champions or representatives for security,
platforms and architecture. A champion is a team member with
specific responsibilities, while a representative is more
narrowly focused and available on demand.
The best DevOps book I have read on team structures is Team
Topologies by Matthew Skelton and Manuel Pais.1 The authors
break out the essence of working in DevOps teams and
structure much better than I could within this chapter. Teams
are broken into four essential types:
Complicated sub-system teams – handle unique and
specialized work.
Stream-aligned team – completes any and all of the
various DevOps business tasks.
Enabling team – manages temporary special functions
such as architecture or security.
Platform team – manages the environment on which
teams operate; development for developers.
Each team has strengths and weaknesses but they emphasize
flow and communication to achieve effective outcomes. The
underlying perspective is that even with all DevOps teams, not
all teams are equal. At the same time, managers must prepare
to handle DevOps flow rather than committing to static,
Waterfall deliveries. One important note is that Skelton and
Pais do not include distinct operational teams, as one expects
teams to either run their own products or be a distinct stream-
aligned team committed to the operational experience.
Culture has been a continuing theme, and a good DevOps
culture possesses four pillars: trust, respect, failure-accepting
and blame-free. Trust means we expect everyone to bring their
best effort including sharing all plans and data when
developing features. A good analogy is that we leave the keys to
the kingdom out, trusting that anyone who grabs them uses
them appropriately. Respect appears in trust. Respectful
interactions mean one does not stereotype others for race,
religion, coding background or other factors. When we
disagree, we do not just say ‘no’; we identify bias and present
respectful arguments. Respect also means not hiding; we admit
problems and errors and seek a group consensus. These factors
lead to the last two pillars.
The last two elements are failure-accepting and blame-free.
DevOps depends on iterating fast, failing quickly and learning
from feedback to improve. Failures encourage growth by
thinking about a response to failure. Cultures with zero-risk
attitudes that need perfection are unlikely to accept failure.
Failure now does not mean permanent failure; every failed test
or failed build creates an opportunity to refactor and improve
the code overall. One common standard to address failure
should be practising until it hurts, especially in dealing with
operational practices such as incident management. Sometimes
repetition can be stressful but practising common approaches
allows for better failure recognition early and ensures all team
members bring the same standards to solutions.
The last cultural pillar is blame-free. No failure or error
should be associated with an individual or team for
punishment. We do not want teams to point fingers but to think
about who wakes up if something fails inopportunely. Code
criticism is never personal; correction should be core to
feedback and improvement. One of the best pieces of leadership
advice I ever received was that one should not take work
criticism as a personal assault. Work, code and operations are
all things we do but not who we are. Representing blame-free as
a core cultural pillar helps embed these philosophies within
teams.
One final team element to consider must be hiring effective
managers. Teams generally work and manage themselves, but
generating organizational value requires perspective. Managers
address multiple teams and integrate several value streams,
including platforms, specialized practices and regular features.
The most challenging managerial task is often accepting the
lack of control within DevOps. Typical programme managers
use Gantt Charts, control diagrams and status reports to inform
on progress. In the DevOps culture, observability is built in and
progress is always available to all. The challenge for the
manager will be integrating progress from the team to the
business level through connected features, epics and portfolios.
Conversational practices
Building a team depends on progressing cultures by embedding
conversational practices to emphasize trust, respect and blame-
free outcomes. Conversations appear simple but depend on two
practices: teamwork agreements and communication training.
The first element provides progress guidelines; the second is the
ability to tie work to experience. Each contributes to overall
success, emphasizing flow, feedback and improvement.
Team working agreements are individual to a team. Most
important to agreements is physically writing them and posting
in an observable location. These agreements should include
high-level goals, intermediate objectives and standard practices,
and typically restate the cultural pillars and their daily
expressions. An example appears below:
Goals – deliver effective, valuable features promptly.
Demonstrate respect and trust in a blame-free culture.
Exemplify observability in all practices.
Intermediate objectives – characterize work to the
smallest possible increment. Help others when needed.
Implement automated tests whenever possible.
Daily – be on time and present for all meetings. Report
work hours daily. Link tickets between work tracking and
merge requests. All code is commented. Pair programming
and code reviews are the preferred standard of working.
You can see the derivation from the top level to the daily. Each
team may represent these differently. Teams often include
contact numbers, working times and scheduled meetings
within agreements. Another good inclusion links onboarding
tasks, knowledge hubs and shared environments for users.
Placing all the information into a single area creates team
observability, documenting required functions and establishing
common standards. The next step allows teams to communicate
effectively about these steps.
Like many DevOps practices, there are many excellent books
about team communications. The best book I have found is
Agile Conversations by Douglas Squirrel and Jeffrey Fredrick.2
These authors provide a detailed understanding of the flow and
blockers within DevOps communications and suggest four Rs as
the essential conversation elements: record, reflect, revise and
roleplay.
Two additional Rs appear as subsets of the last two: repeat
and role reversal. The first element, record, should be essential
for all communication growth. Conversations should be
recorded in writing or in a media format for replaying when
learning. Recording presents a solid basis to move discussions
forward. It prevents hearsay discussion such as ‘I think
someone said this’, or ‘I thought they said that’. The familiar
track allows movement to the next conversational tool: reflect.
Reflection means going back over the recording and ensuring
accurate communications. We all know items in an off-hand
conversation are not always clear or as intended. A good
conversation example would be telling a friend you put new
tyres on his car because the old ones broke. The friend might
assume you purchased brand-new tyres when in fact you put
used tyres on, new to the car but not new from the
manufacturer. Reflection helps clarify meaning, explain
acronyms and ensure all parties have the same meaning. Once
reflection occurs, the conversation can be revised.
Revising a conversation means returning to the original track
and clarifying confusing issues. In the previous tyre example,
one could see how stating that old tyres were replaced with
functional, used tyres would help clarify issues. In coding
examples, participants might lose or misunderstand detailed
technical explanations. Detailing meanings during revising
provides a better track of how the conversation might have
gone. Detailing leads to roleplay, understanding the various
terms and then retrying conversations. Roleplay leads to
repeating conversations and sometimes role reversal. Role
reversal means retrying the conversation from another
perspective to see if one reaches the same conclusions.
These methods are not for every conversation. Using them
with initial sample conversations can help improve the overall
communication flow. Later, methods can help with challenging
conversations. Once a team understands the communication
barriers, their communication will likely improve. A sound
communication basis allows for effective processes to keep
DevOps flow, feedback and improvement at value-producing
levels.
Architectural patterns
Generally, most architecture references describe seven to ten core designs.
These designs include some combination of physical style,
processing style and information interaction to derive goals. My
experience categorizes architecture into three rough groups.
The first group is physical, including layered, microkernel,
microservices and space-based architecture. The second is an
action group exhibiting service-based, event-driven or
orchestration architectures. The final group is relationships
with peer-to-peer or master-client architectures. Each
architecture appears briefly but I encourage you to conduct
additional research into areas critical to your organization.
Architectural patterns solve functional problems. In some
cases, more than one architecture can solve the same problem.
Elements driving one architectural decision over another relate
strongly to available resources, expertise and time.
Understanding different architectural approaches drives
informed decisions based on feedback. Just because an
architecture was selected does not mean it will always be the
best. You must remember the sunk cost fallacy: just because a
certain expenditure level has occurred does not mean you should
continue to throw good resources after bad for minimal results;
feedback is crucial to success.
The first physical architecture example is layered design. A
layered design states that every element within the architecture
fills a unique task, and those tasks are open and closed within
an independent layer. This design prevents a layer from
advancing without completing a task. This inherent separation
works well for teams that either lack experience or prefer a
traditional approach. Connections between layers can create
tight coupling and cause problems when requirements change.
However, one can implement this model quickly.
FIGURE 9.1 Layered design
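As a minimal sketch of the closed-layer idea behind Figure 9.1, the hypothetical classes below let each layer talk only to the layer directly beneath it; the names are illustrative rather than drawn from any specific framework.

```python
class PersistenceLayer:
    """Bottom layer: owns the data store."""
    def __init__(self):
        self._spaces = {"garage-2/space-14": "free"}

    def read(self, space_id: str) -> str:
        return self._spaces[space_id]

class BusinessLayer:
    """Middle layer: holds the rules, sees only persistence."""
    def __init__(self, persistence: PersistenceLayer):
        self._persistence = persistence

    def is_available(self, space_id: str) -> bool:
        return self._persistence.read(space_id) == "free"

class PresentationLayer:
    """Top layer: formats output, never reaches persistence directly."""
    def __init__(self, business: BusinessLayer):
        self._business = business

    def show(self, space_id: str) -> str:
        state = "available" if self._business.is_available(space_id) else "taken"
        return f"Space {space_id} is {state}"

ui = PresentationLayer(BusinessLayer(PersistenceLayer()))
print(ui.show("garage-2/space-14"))
```

The tight coupling mentioned above appears here too: changing the persistence interface forces a matching change in the business layer, which is the trade-off for the model’s simplicity.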
Dashboards
One last technological DevOps decision deals with dashboards.
Dashboards are the pretty cover to the complicated processes
within the DevOps engine. At the highest level, dashboards
should demonstrate any metric your team tracks. Grafana,
Prometheus, Jaeger, Open Census and OTel are all options to
gather data and present metrics usefully. The most important
aspect should be that dashboards are not the end point of
discussion. Dashboards gather all information in a shared
place, visible between all business workers, and allow outcome
modelling. It is one thing to see an error published when
committing code and another to see a red dot on the pipeline
execution graphic. Visualization allows abstraction, and just
like with code and platforms, every abstraction level produces
additional benefits.
Dashboards are highly customized applications. Each
dashboard supports specialized viewers. Individuals who
prefer one dashboard over another might see similar data, just
in a different format. When building a dashboard there are two
questions: how do you learn from the dashboard, and how does
it create action? These mimic the first steps we took in creating
metrics: how do you know, and what do you see next?
Integrating the thought process into visualization allows
sharing between teams, isolating problems and leaning into
experimental solutions.
Project managing dashboards appear best as workflow
boards like Kanban or Scrum, burndown charts showing work
completed and GitHub project boards. Application monitoring
dashboards use tools like Jaeger and OpenCensus. Jaeger
performs open-source, end-to-end request tracing to evaluate
response time, including across a service mesh. OpenCensus allows one
to view the host application’s data, which can be exported to other
aggregation dashboards. Prometheus scrapes metrics from
nodes, Grafana uses event-driven metrics with panels
correlating multiple inputs and OTel establishes a single, open-
source standard to capture and deliver metrics, traces and logs
from cloud-native applications. Each tool can be used in various
proportions to build an effective dashboard.
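As one hedged example of feeding such a dashboard, the sketch below uses the Python prometheus_client library to expose a request counter and a latency histogram for Prometheus to scrape; the metric names, port and simulated workload are assumptions for illustration.

```python
import random
import time
from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("parking_requests_total",
                   "Total requests served by the parking API")
LATENCY = Histogram("parking_request_seconds",
                    "Request latency in seconds")

@LATENCY.time()          # records each call's duration in the histogram
def handle_request() -> None:
    time.sleep(random.uniform(0.01, 0.1))  # stand-in for real work
    REQUESTS.inc()

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://host:8000/metrics
    while True:
        handle_request()
```

From there, a Grafana panel querying those two metrics turns the raw scrape into the shared, visible flow the text describes.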
Kubernetes implementations can challenge our dashboard
structure due to the numerous containers and service mesh
integration. The problem is that each cluster of containers can
be interpreted separately with no centralized application data
and rapidly changing clusters. Data must be aggregated to a
stable environment to observe through providing context,
history and the overall environment state. There are a couple of
keys to succeeding with Kubernetes: first, do not just aggregate
logs. Logs must be contextual; bringing them to a single point
does not by itself create the needed success. Second, you cannot
just rely on picking the right application or using a managed
Kubernetes service. Kubernetes requires active monitoring
because of the distributed and ephemeral nature of each cluster.
Overall, dashboards are a key to bringing together
information. It would be great if there were one customized
solution to provide the best outcome, but much like other
DevOps approaches, you will have to start from the beginning
and experiment towards success. Finding the right dashboard
will be largely trial and error for processes like observability,
pipelines and testing. I would start with the options available
through some services, then adjust to show flow.
Summary
Building from technology to find the overall process governing
DevOps decisions can be difficult. Expediting this process
requires hiring the right people, creating efficient team
practices and emphasizing conversation. None of these things
happen or will continue to happen if not supported by top-
down direction and bottom-up participation. This participation
requires solid connection in building architectural patterns
supporting delivery. Solid Scrum and Kanban techniques that
emphasize flow further support those software architectural
patterns. Picking the right IDE, using platforms and clearly
displaying flow through dashboards can help distinguish
between indifferent DevOps and confident delivery. Our next
step explores how to emphasize stable practices with security
and stability to ensure external factors do not crush your
success.
Notes
1 Skelton, M and Pais, M (2019) Team Topologies: Organizing business and technology for teams with fast flow, IT Revolution Press
Security experts
One of the challenges with security experts is growing from
past models to today’s standards. In the old days (pre-2010),
security experts gained their positions as experts on
development or operations, and then someone asked if they
wanted to change. The individual moved from their team to
building and running compliance teams. The older security
standards were about scan timing, whether compliance
occurred and satisfying audits. This practice means security
experts spent the last 10 years thinking about paperwork
results rather than active integration. DevOps means bringing
those experts back together and integrating into daily delivery
experiences.
When you think of security experts, they should know three
areas: standards, risk management and business solutions.
Each area shapes security thoughts and guides future
discussion. The first item is standards. Whatever industry area
in which you deliver products, there are required security
standards. In the United States, medical care is governed by the
Health Insurance Portability and Accountability Act (HIPAA) or,
if working in government, you have the National Institute of
Standards and Technology (NIST). From a generic perspective,
there are the ISO 27001 and 27002 guides. Further, legal
restrictions such as the General Data Protection Regulation
(GDPR) in Europe or the North American Electric Reliability
Corporation Critical Infrastructure Protection (NERC CIP) for US
utilities often apply. Any expert needs to be intimately familiar
with the challenges associated with those elements.
Commercial organizations supplement, enhance and explain
these standards to others to gain revenue from solving security
assistance issues. The proliferation of third-party groups
emphasizes why your experts must live and breathe relevant
standards. Many approaches contain hundreds of pages,
contradictory guidance and one-time exceptions. Only through
regular use and application can one truly understand security
standards. The difficulty is more than recognizing breaches or
incorporating firewalls, but delves into daily practices for users,
screen-saver times, backup power and other items customarily
ignored. Once your expert learns the basics, establishing sound
risk management fundamentals is the next step.
As with the initial security standard, many standards offer
different approaches to risk. For example, the NIST standard
uses the Risk Management Framework, with these steps:
Prepare
Categorize
Select
Implement
Assess
Authorize
Monitor
These will undoubtedly help define individual risks, but the
most crucial aspect revolves around two areas: first,
categorizing the risk through assigning a probability, timeline
and impact. Probability defines the event likelihood, the
timeline addresses when events happen and impact shows the
product effects. For example, on our parking application, a
DDoS attack might be unlikely, can occur any time from today
onwards and create a significant impact if customers cannot
reach our service. The second step is mitigation by accepting
the risk, avoiding it, controlling it, transferring it to others or
monitoring it. In this case we accept that a DDoS happens
(accept), only allow registered users access (avoid and control),
use a third-party service (transfer) or use solid operational
practices (monitor). Understanding standards and risk moves to
the final requirement for an expert, understanding business
solutions.
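To make these ideas concrete, here is a minimal risk-register sketch in Python. Everything in it, the Risk class, the 1-to-5 impact scale and the probability-times-impact score, is an illustrative assumption rather than part of the NIST framework or any other standard.

```python
from dataclasses import dataclass

# A minimal risk-register sketch. The field names and the 1-5 impact
# scale are illustrative assumptions, not taken from NIST or elsewhere.
@dataclass
class Risk:
    name: str
    probability: float   # event likelihood, 0.0 to 1.0
    timeline: str        # when the event could occur, e.g. 'any time'
    impact: int          # product effect, 1 (minor) to 5 (severe)
    mitigation: str      # accept, avoid, control, transfer or monitor

    def score(self) -> float:
        # One common convention: probability times impact.
        return self.probability * self.impact

# The DDoS example from the text: unlikely, possible any time, high impact.
ddos = Risk('DDoS against parking app', probability=0.1,
            timeline='any time', impact=5, mitigation='transfer')
print(f'{ddos.name}: score {ddos.score():.2f}, handled by {ddos.mitigation}')
```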
The last area for the experts to show their skills should be
generating business solutions. There are a multitude of
different tools that advertise security guarantees. An expert
should distinguish between the different variants and find
where we can rent or buy a solution rather than building our
own. The sheer number of tools available can be
overwhelming, and experts should be able to rapidly narrow
those by efficiency and price. One common area appears in
AI/ML security solutions. The expert should differentiate
between generic ML solutions and learning that mitigates the
overall risk. Another way experts multiply their value is by
forming teams that increase the shared benefit.
Security teams
A security team brings multiple experts together to share ideas
and improve flow, feedback and improvement. These experts
work together to identify crown jewels within the organization,
areas requiring protection. The teams have two goals: aligning
operations to compliance standards and filling security
framework holes. Each happens best by working with other
DevOps teams. When I support security, I assign team roles with
active participation in the DevOps teams, learning the language,
understanding the flow and communicating security benefits.
From the earlier team discussion, good security teams enable
functions that improve everyone.
In compliance alignment, we previously mentioned that
security experts should live and breathe compliance standards.
The next step is applying those standards to regular actions.
One can move forward only by knowing the security standard
and the operational guidance. As mentioned earlier, security
can either guard or signal. This practice requires knowing all
organizational gaps. These gaps appear in three areas:
compliance, vulnerability and organizational. Distinguishing
categories helps find the best ways to eliminate problems at
notification rather than after damage occurs. Compliance gaps
are when one knows a standard must be met and is not.
Vulnerability gaps assess current defects and use proactive
measures to stop defects from reaching production.
Organizational gaps are issues in the corporate process with
training, cultural acceptance and team interaction.
Closing the compliance gap means meeting known standards,
which creates market value through advertised compliance. If a
product falls within HIPAA, NIST or PCI DSS, meeting the
standard is the primary way to be competitive. Companies
advertise compliance through maturity levels, evaluations like
SOC2 or ISO 27001, or achieving a certification like the US
Government’s FedRAMP or Authority to Operate (ATO). These
gaps must meet internal evaluation standards combined with
external approval. An organization runs tests to demonstrate
standards and then produces artifacts for external review.
Sometimes, compliance can involve third-party assessments,
penetration testing or even red/blue team exercises.
Vulnerability gaps deal with technical challenges posed by
producing new code. The broad security community routinely
evaluates new products for fun and profit. One sees these
evaluations through known security tools, practices like Hack
the Box competitions and bug bounties. Hack the Box
competitions offer targets, with prizes for breaking more areas
faster than competitors. Conferences like Black Hat and DEFCON routinely
bring groups together to discuss vulnerabilities. Bug bounties
are when companies offer rewards for anyone finding software
vulnerabilities.
Typical vulnerability identification emerges from the
published Common Vulnerabilities and Exposures list on a
public website (cve.mitre.org). Some entries are associated with known fixes
and others still need solutions. Solutions apply mainly to
published software or when development includes open-source
and third-party inclusions. Most software depends on other
tools, libraries and code as dependencies to work effectively.
Scanning tools like ACAS, ZAP, Anchore, Fortify and others can
identify these gaps in production and prevent publication.
Vulnerabilities tie into the risk equation: risk (R) equals threat
(T) times vulnerability (V), or R = T × V. Without a vulnerability,
risk is minimal; similarly, without a threat, risk is also
minimized. Removing known and suspected vulnerabilities
zeroes out half of the equation.
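A tiny sketch shows why zeroing either factor matters. The findings and their 0-to-1 scores below are invented for illustration; real scanners report their own severity scales.

```python
# R = T x V in miniature: hypothetical findings scored on 0.0-1.0 scales.
findings = {
    'outdated TLS library': {'threat': 0.8, 'vulnerability': 0.9},
    'patched CVE, fix deployed': {'threat': 0.6, 'vulnerability': 0.0},
    'internal-only debug endpoint': {'threat': 0.1, 'vulnerability': 0.7},
}

for name, f in findings.items():
    risk = f['threat'] * f['vulnerability']  # either factor at zero removes the risk
    print(f"{name}: R = {f['threat']} x {f['vulnerability']} = {risk:.2f}")
```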
Threats emerge internally from the organizational gap.
External threats, those seeking to harm your organization, can
never be fully addressed, but one can remediate internal
threats. Having good people creates value. Similarly, people can
create errors. The most secure system is the one with no users;
a locked system in a locked room with no external connections
generates complete security but produces no value. Introducing
users creates organizational gaps, but one can mitigate these
with sound management practices. Training and culture are the
two biggest wins for organizations. These elements must adjust
for constant change in people, tools and processes to ensure
familiarity and secure practices. One of the best ways to deal
with all gaps is to link security teams with development
elements and connect security teams to operational and
enterprise tasks.
Emphasizing operations
The basic DevOps philosophy suggests a ‘you build it, you run it’
approach. This philosophy means that every dev team should
be capable of running their production environment. Self-
operations work excellently for small instances, but an
independent ops team becomes necessary when the goal is to
expand value to millions of customers. Ops teams categorize
issues and bugs whose resolution might distract developers and
reduce organizational value flow. These flows work best
with a help desk and hot-fix resolutions, although one key
element remains: interaction to update baselines continually.
Interacting with a designated ops team can be challenging.
One must first recognize that ops teams are no less critical than
other developers; after all, ops works with the customers, the
direct value source, so revenue flow ceases without ops. The
first step is realizing that interactions with a code base occur at
three levels: simple, complicated and complex:
Simple: Adding two apples to two apples equals four
apples. Changing any number changes only the total
number of apples. Making one change in code creates a
direct result.
Complicated: Buying two apples requires managing the
store inventory, personal inventory and available funds.
Code changes create defined results in multiple areas
requiring mapping.
Complex: Every issue ties to a random number of other
issues, the Gordian knot. Buying an apple relates to store
space, business income, personal outcomes, recipe
management etc. Complex issues occur in code with
tightly coupled, extensive interactions where it’s difficult
or impossible to identify all outcomes from a code change.
The software development goal is a baseline with
primarily simple issues; the operational goal is reducing
customer issues from complex and complicated to simple
resolutions without excessive development. Development
constantly resolves complex and complicated issues to build
simple, loosely coupled coding integrations. Architectural
patterns and solutions in the previous chapters helped model
building within these frameworks.
Once an issue is identified, one must also assess how a change
affects other elements. An ops team solving all customer issues
would be spectacular, but sometimes bug fixes, CVE elements or
improved user interfaces must be reflected within the standard
code base. Interactions between DevOps teams are defined as
dependent, independent and interdependent:
Dependent: Changing one item changes the other; heating
water raises the temperature in the pot. Adding columns
to a shared user database interaction affects all other
users.
Independent: Items do not interact; heating water does not
change the refrigerator temperature. Creating a user login
does not affect the overall codebase.
Interdependent: All items interact with all other items.
DevOps interactions are interdependent as teams rely on
each other. Any significant change by Ops should be
reflected in the overall baseline code.
Ops do not operate independently. Their success depends on
establishing dev team connections, understanding new code
flow and managing release cycles. Software development never
ceases so ops teams will constantly add new upgrades and
features to already-delivered software. In the cloud-native
environment, these changes occur as a continuous flow.
Upgrades that impede flow are detrimental to the overall
DevOps outcomes.
Ops teams emphasize keeping developers working on new
code and the next feature as long as possible. Sometimes users
have issues requiring resolution that are inherently
uncomplicated from an experienced operator’s perspective. If
each customer reaches out to the original developer, it can
impact flow. For example, if your car starts leaking, you do not
contact the engineering team but instead visit a local mechanic.
Sometimes the manufacturer contacts teams about a recall
notification or a problem with preemptive fixes. The same
distinction applies to ops. One way for teams to manage user
inputs is through a tiered value flow like this:
Tier 1 – Consult: the issue can be solved with current
documentation as written, i.e. I need a new user account.
Tier 2 – Configuration: the issue can be solved with an
application of current standards, accesses and
permissions, i.e. my system cannot reach a tool I need.
Tier 3 – Code: the issue cannot be solved without changing
existing code, i.e. I need new database fields or I want the
button on the other side of the screen.
Ops teams should be familiar with all published documentation
to resolve Tier 1 issues quickly, and remind users where they
can locate instructions. Tier 2 issues require the most expertise
from the help desk, merging current standards with their
system understanding. These problems happen when users
import new tools and then security standards prevent
integration without additional privileges. Ops teams function as
the administrators for deployed instances or enterprise-level
solutions. Tier 3 issues require new code to resolve them. If the
code change is short, one to three sprints, the ops team writes
the code and resolves issues. New issues can be added to the
overall dev backlog if more time is required.
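As a sketch of how this tiering might look in code, the helper below routes tickets by keyword. The keyword lists are placeholders; a real help desk would rely on its ticketing system's categories rather than string matching.

```python
# Toy triage for the three-tier model; keywords are illustrative only.
def triage(ticket: str) -> str:
    text = ticket.lower()
    if any(k in text for k in ('account', 'how do i', 'documentation')):
        return 'Tier 1 - Consult: point the user to existing documentation'
    if any(k in text for k in ('access', 'permission', 'cannot reach')):
        return 'Tier 2 - Configuration: apply current standards and accesses'
    return 'Tier 3 - Code: short fix in ops, otherwise add to the dev backlog'

print(triage('I need a new user account'))             # Tier 1
print(triage('My system cannot reach a tool I need'))  # Tier 2
print(triage('I want the button on the other side'))   # Tier 3
```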
Resolving issues relies on an effective ticketing system.
Ticketing characterizes the issue within a tier, passes it to the
right individual and ensures resolution. Strong service-level
agreements (SLAs), backed by service-level objectives and
indicators, underpin the ticketing system. Ticketing also provides
a historical backlog to identify common issues, update needed
documentation and experiment with group solutions such as a
manufacturer's recall. Operationally focused elements of the
DevOps teams are not merely an administrative function but a
vital integration into maintaining flow throughout the process.
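Ticket history also makes those service levels measurable. The sketch below checks a hypothetical SLO, 95 per cent of Tier 1 tickets resolved within four hours, against sample resolution times; both the targets and the data are invented.

```python
# Hypothetical SLA check: the 4-hour and 95 per cent targets are examples.
resolution_hours = [0.5, 1.2, 2.0, 3.5, 6.0, 0.8, 1.1]  # sample Tier 1 tickets

target_hours, target_fraction = 4.0, 0.95
met = sum(h <= target_hours for h in resolution_hours) / len(resolution_hours)
print(f'SLI: {met:.0%} within {target_hours}h '
      f'({"meets" if met >= target_fraction else "misses"} the SLO)')
```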
Processing for security and stability
Defining stable and secure processes can be challenging. We
are not working with static guidelines but with a continuous
model that changes daily. The goal is not merely keeping the car
between the lines but keeping a constantly changing vehicle
within dynamic lines while traffic moves around it. Good stability and
security processes are akin to teaching someone to juggle, ride a
unicycle and move through a crowded market. Each contains
difficulty but learning the three together requires an
interdependent approach.
Providing security compliance for delivered components,
often through established controls, can be critical to effective
software. Establishing delivery model controls depends on
several factors, including what standards exist, volatility,
agency theory and transaction costs.3 Volatility describes how
often standards change. Subscribing to an external policy
addresses compliance but sometimes one loses control over
how often elements change. Agency theory entails how much
control your company has over changing more extensive
standards and how you affect those changes. For example,
changes in user passwords are relatively easy but massive
protocol changes, such as when Google went from HTTP to
HTTPS, take more work. Transaction costs are company costs to
make those adjustments.
Security control frameworks sometimes address integrating
software but often focus on policy, while mitigation attempts
begin with artifact creation to assess controls.4 Those all
provide implementable solutions but no factor addresses
initially bracketing preconceived security experiences to
understand how those affect the social basis of exchange
between project teams. Stability and security processes require
observability: all teams having access to pipeline artifacts,
system processes via dashboards and the voluminous standards
documentation.
Security creates key DevOps functionality when needed
functions become part of the overall culture. Seventeen per
cent of articles from 2016 to 2019 discussed the importance of a
security culture.5 Some organizations can manage technical
solutions to cultural security with automated models providing
continuous security.6 The selected cultural strategy matters as
sound strategies are statistically significant in reducing
organizational security breaches. In addition to reducing
breaches, processes maintain value by creating customer
stability. Sound cultures include extra-role functions, processes
employees follow even when not required by policy.
Defining initial standards allows the integration of operational
monitoring practices, incorporating SRE personnel and building
security emphasis within the organizational culture.
Platform engineering
We briefly touched on platforms as the integrated technology
set to build applications. Many teams start with software
development by attempting to construct basic platforms.
Platform construction can often take one to two years before it
sufficiently matures to support development. When markets
change every 30 days, the desired software application can be
well behind by the time the platform is ready. Purchasing or
renting pre-designed platforms allows companies to focus on
the needed development aspects to increase ROI rather than
building a unique platform.
A common analogy for employing effective platforms
compares the process to building a woodshop. If you need to
build a shelf for the house (the ROI-producing software) and
you have not built one in the past, you likely lack the tools. You
head to the hardware store and purchase wood, a hammer,
nails, a saw and other items. When you return to the house, you
realize you are missing items and make another trip to the tool
store. This cycle repeats, and what was supposed to be a small
project turns into a multi-day expedition. Good
platforms provide you with all the necessary tools and some
basic blueprints for shelves, birdhouses or racecars. One can
then modify the blueprint for multiple shelves, birdhouses with
multiple holes, or different models of toy racecars. The best
platforms might include a community to provide help, support
systems and collaboration so friends can visit the woodshop
and help. With that definition in mind, we can return to
platform engineering in terms of a DevOps platform.
The most essential element of the platform for modern
business is the collaborative environment. This collaborative
area includes work management functions, an IDE to write code,
communication tools and source/version control options for the
deployed code. This area also includes knowledge repos, the
blueprints, suggesting known good options for pipeline
approaches, previous code and possible integrations to third-
party code. The next element requires the platform to provide the
woodshop itself: deployment and production targets such as clusters. In
software, this means a place exists to test and deploy new code.
As the developer, you own and manage target specifications.
However, the purchased platform stays current with modern
technology to keep clusters stable and secure. At the same time,
the platform provides observability to manage items from users
to operational code.
Many companies currently offer some portion of this
integration. Examples like GitLab, Jenkins, Harness and Azure
DevOps manage different elements. One challenge I see with
many platforms is that rather than end-to-end options, they
focus on a particular spot, such as operational deployment or
development to production. This practice significantly improves
an area but still causes difficulty in creating full observability
or leaves significant training hurdles. The platform goal should
allow developers to focus on development while automating
other functions.
Summary
Effective development cannot happen unless applications are
stable and secure. Some of the other chapters discuss how to
build adaptive software and scaling solutions but this chapter
laid the foundation for the other half of the DevOps table. The
first step was finding and shaping personnel to follow slightly
different guidelines than those focused on development. We
examined the difference between the DevSecOps Manifesto and
traditional Agile, and how to pick security professionals and
then integrate those folks into the teams necessary to your
organization.
Moving from people, we developed the processes to show
continuous monitoring with security and stability. The chapter
expanded into how to use site reliability engineers to buttress
your operational processes, which appear in the overall cultural
outlook. These processes then link to finding the right
technological tools to monitor and observe, whether by
implementing a platform or using unique scanning tools. These
chapters have prepared you for a confident DevOps
implementation now, but the next chapter will look at what you
can expect for DevOps as we move into the future.
Notes
1 History of DevSecOps, https://www.devopsinstitute.com/the-history-of-devsecops/
(archived at https://perma.cc/HM5D-BUNL)
Advancing process
Advancing process requires one to stay in line with innovation
within the industry. You want to avoid being either a first
adopter or a late adopter. First movers gain some advantage
but can suffer delays fixing issues that later adopters then
avoid. Good company processes utilize the
three horizons model shown in Figure 11.1 to spend 70 per cent
of their time on sustaining current features, 20 per cent on
adjacent technology and then 10 per cent on new markets.
FIGURE 11.1 Three horizons model
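In practice the split is simple arithmetic. The sketch below divides a hypothetical team's sprint capacity using the 70/20/10 ratios; the 400-hour figure is an invented example.

```python
# Splitting sprint capacity across the three horizons. The ratios come
# from the model; the hours are a made-up example.
sprint_hours = 400  # e.g. five people at 80 hours each
horizons = {'sustain current features': 0.70,
            'adjacent technology': 0.20,
            'new markets': 0.10}
for name, share in horizons.items():
    print(f'{name}: {sprint_hours * share:.0f} hours')
```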
Artificial intelligence
AI, or artificial intelligence, has monopolized many
conversations over the past several years. Two broad categories
of AI exist: task-oriented and general. A task-oriented AI focuses
on completing specific tasks such as winning chess games. The
most famous early chess engine was Deep Blue, which in 1997
became the first computer to defeat a reigning world champion,
Garry Kasparov, in a full match. General AI is
machines accomplishing multiple tasks such as winning a chess
game, completing a coding program and holding a conversation
simultaneously. Most AI implementations are narrowly focused,
accomplishing a specific task faster and more efficiently than
humans. The fear associated with AI solutions assumes that
general AI machines could eventually replace human workers.
Many AI tasks and tools like GitHub’s Copilot work today to
assist human workers in tasks while leaving the final decisions
to humans.
Another option for AI looks at generative adversarial AI
models. These tools set multiple AIs against each other to
accomplish tasks and then learn by comparing outputs. An
example occurred in the Defense Advanced Research Project
Agency (DARPA) AlphaDogfight trials. The AlphaDogfight taught
an AI model to conduct dogfights in an F-16 flight simulator,
first against AI models and then with the winner combatting a
human. The Heron Systems AI model beat the human in five
straight fights, none lasting more than 30 seconds.1 The
advantage was that the AI had run four billion simulations
before the trials, garnering experience unmatched by any
human pilot and not suffering missions potentially resulting in
a human’s death or capture. These future AI models benefit
significantly from integrated machine learning practices.
Machine learning
Machine learning (ML) practices are highly integrated with AI
solutions. These models create complicated computer
algorithms designed to learn decision trees and execute them
quickly across large data volumes. In practice, ML systems
form a subset of most AI solutions. Integrated data from ML systems
predicts what comes next, a faithful metrics implementation
through flow, feedback and improvement. One subset within
ML involves deep learning systems designed to mimic human
brain activities through multi-layered neural networks. Rather
than offer a single algorithm, each neural network element
processes data similar to what we know of human brain
functions.
ML systems appear in many DevOps processes. As discussed,
most software development produces significant data in many
areas as gathered by observability processes through metrics.
These systems use either unsupervised or supervised models to
reach predictions. Unsupervised models draw inferences and
find patterns by clustering data points into groups and by
dimensionality reduction. Reducing dimensions removes
variables through elimination or extraction, discarding those
things that do not relate to the desired solutions. An ML solving
our coffee problem from Chapter 5 might start with all location
data and then rapidly remove elements unrelated to coffee
within the kitchen or multiple kitchens.
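A minimal unsupervised sketch with scikit-learn shows both techniques together; the random data merely stands in for real observability metrics.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 10))  # 200 observations of 10 metrics

# Dimensionality reduction: keep the two most informative directions.
reduced = PCA(n_components=2).fit_transform(data)

# Clustering: group the reduced points without any labels.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(reduced)
print('cluster sizes:', np.bincount(labels))
```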
Supervised models use regression or classification to
address problems. Regression predicts continuous outputs,
whether through linear regression's best-fit line, decision trees
offering two options at every decision point, or the
aforementioned neural networks. Classification seeks discrete
outputs through logistic regression, support vector machines
with hyperplanes, and naïve Bayes. Logistic regression models
the log-odds of an event, suiting outcomes that scale
exponentially rather than linearly. Support vector machines
(SVMs) find the best-fit boundary in multiple dimensions, the
hyperplane or shape that separates the classes. Naïve Bayes
estimates the probability of outcome Y occurring given input X,
assuming the inputs act independently. The advantage is that
these machines operate much faster than humans and can be
programmed to ignore the cognitive bias elements discussed
earlier.
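The sketch below runs the three classifier families named above on a synthetic scikit-learn dataset. The scores compare nothing but this toy data and say little about real-world suitability.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic stand-in data; real inputs would come from your metrics.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

for model in (LogisticRegression(max_iter=1000), SVC(), GaussianNB()):
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f'{type(model).__name__}: {score:.2f} mean accuracy')
```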
Every ML system only operates as effectively as the data
entered. Data that already contains cognitive bias demonstrates
the same bias in its predictions. If you work with an ML,
preparing data takes four steps. First, one gathers metrics, logs
and traces before aggregating that data into appropriate
formats. Next, pre-process the data through cleaning, verifying
and formatting into usable sets. Pre-processing matches formats
between sets, ensures reliable data and de-duplicates records.
Once data is available, divide the elements into three sets: one
to train the model, one to validate accuracy and a third to test
final performance. At that point, one can train and refine the
model to develop the needed predictions. AI/ML models offer
significant advantages, but as they expand, only the available
technology limits the data they can turn into practical
conclusions.
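The four preparation steps fit into a short sketch, again with scikit-learn and synthetic data standing in for aggregated metrics, logs and traces. The 60/20/20 split is one common, but not mandatory, choice.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1. Gather: synthetic stand-in for aggregated metrics, logs and traces.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# 2. Pre-process: de-duplicate rows and put features on a common scale.
X, idx = np.unique(X, axis=0, return_index=True)
y = y[idx]
X = StandardScaler().fit_transform(X)

# 3. Split into training, validation and test sets (60/20/20).
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=1)

# 4. Train, check against validation data, then report test performance.
model = LogisticRegression().fit(X_train, y_train)
print(f'validation: {model.score(X_val, y_val):.2f}  test: {model.score(X_test, y_test):.2f}')
```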
Advancing technology
Throughout our DevOps discussion, one common factor
appears as humans select technology, shape it to their needs
and then the technology shapes the humans. Then the cycle
repeats as you select additional tools to maximize value. Only
human imagination limits tools but at some point the technical
capabilities shape the direction in which our imagination
grows. I offer two suggestions for where expanded imagination
may dramatically alter the ability to produce accelerated value:
quantum computation and bio-design. Each of these offers not
only a different thought path but new hardware-shaping input.
Today, we are all wedded to our basic laptops, hardware
systems, servers, GPUs, cloud computing and other technology.
Bio-design and quantum computing currently require moving
from that model to different approaches. One common element
is that neither is bound by the binary restrictions that have
formed the root of computing processes since Claude Shannon's
1948 paper, 'A Mathematical Theory of Communication'.2 This article
was the first to suggest that any analogue input could appear as
digital components, the binary standard where all objects must
be either a 1 or a 0. Quantum computation and bio-design turn
this theory on its head.
Quantum computation
Quantum theories deal with the smallest discrete units of any
physical theory; these particles behave differently than larger
particles. One of the greatest human minds, Albert Einstein,
famously called quantum entanglement 'spooky action at a
distance'.3 Quantum computation does not use simple switches,
since no quantum state finalizes until observed. Rather than the
two discrete values of binary, each quantum particle exists in a
superposition of basis states, such as spin-up and spin-down,
combined with positive or negative amplitudes. At its heart,
quantum computation harnesses the collective properties of
quantum states, superposition, interference and entanglement,
to perform calculations. This
section presents an overview, but one could spend a career
exploring the ramifications involved with quantum computing.
Superposition states that you can add any two quantum
states, resulting in another valid quantum state. This theory
allows for mathematical computations with quantum particles,
critical for quantum computation. Interference means that each
particle interferes with itself as a wave; on measurement, the
initial superposition collapses into a single state. Interference
alone would make that outcome random, but the third property,
entanglement, constrains the possibilities. Entanglement means
any entangled pair, such as a photon generated and split in two,
is related in a way that remains indefinite until measured. Once
measured, the act of measurement defines the state of the other
entangled particle in a predictive way.
Those principles are complicated and are evaluated with matrix
mathematics. One application uses Grover's algorithm to
conduct unstructured searches of a disordered list and find
solutions more quickly. A famous search problem of this kind is
the travelling salesperson problem: if a salesperson must visit X
cities and carry Y items, what is the best combination to
generate the most profit at the lowest cost?4 A typical algorithm
searching 1,000 (N) possible elements must try, on average, at
least 500 (N/2) or, at worst, (N-1) possibilities to verify accuracy.
With quantum computation, the same search takes only about
32 tries (√N ≈ 31.6). One can easily see the advantages in time and
computation power required.
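The arithmetic is easy to check. This snippet compares the average classical tries (N/2) with Grover's roughly √N queries for two list sizes.

```python
import math

# Average classical tries (~N/2) versus Grover's ~sqrt(N) queries.
for n in (1_000, 1_000_000):
    print(f'N={n:>9,}: classical ~{n // 2:,} tries, Grover ~{math.sqrt(n):,.1f}')
```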
The reduced computation needed is a significant advantage,
because most quantum particles require sophisticated
environments operating at temperatures near 0 kelvin. For
quick reference, converting a Celsius temperature to kelvin
means adding 273.15. Zero kelvin is absolute zero in scientific
terms, the temperature at which all molecular motion ceases.
Quantum computation occurs in
qubits, similar to bits on a classical computer, with each qubit
holding one quantum state. The largest current quantum
computer just surpassed 1,000 qubits.5 One other limitation is
that classical computation dedicates significant processing to
error detection and correction. Common classical error
detection without correction can be as simple as adding one
parity bit to every byte (8 bits) sent. Quantum error correction
can require nine physical qubits for every message qubit sent.
One can see where this rapidly becomes difficult with smaller machines.
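For comparison, the classical parity scheme mentioned above fits in a few lines; quantum codes require far more machinery than this.

```python
# One parity bit per byte detects, but cannot correct, a single flipped bit.
def with_parity(byte: int) -> tuple[int, int]:
    parity = bin(byte).count('1') % 2  # even parity over the 8 data bits
    return byte, parity

def check(byte: int, parity: int) -> bool:
    return bin(byte).count('1') % 2 == parity

data, p = with_parity(0b1011_0010)
print(check(data, p))        # True: intact
print(check(data ^ 0b1, p))  # False: one flipped bit detected
```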
Quantum offers significant opportunities to solve more
complicated problems in less time. Most of the current
advantages are in offering predictions for mathematically
complex models. This benefit means quantum excels at
problems like materials science, pharmaceutical research, sub-
atomic physics and logistics. If your DevOps model supports
these areas, experimentation in this area can help. Several
quantum coding languages and frameworks such as Qiskit, Silq and Q# exist.
These allow the use of classical computers to model quantum
problems. One can use a service like AWS Braket to rent time
and run code on actual quantum computers without physically
purchasing the computer. I again suggest evaluating quantum
computing beyond this short reference if these areas are
intriguing.
Bio-design
Moving furthest from classical materials is the field of
bio-design. Still largely experimental, this field replaces binary
switches powered by electricity with protein or DNA molecules
that perform the calculations. All bio-design computers would be
constructed from living things rather than silicon circuit
boards. That prospect raises the possibility for power changes,
rapidly expanding capability through growing new hardware
solutions and integrating neural networks based on brain tissue
rather than extensive algorithms. The hardest part of bio-
designed systems will likely be translating biological results
into mechanical answers.
Considering biology, the problem becomes whether one could
generate a protein to show a certain answer. These answers
might be accessible through DNA sequencing, environmental
changes such as pH (acidity or alkalinity), or even just the
production of a simple protein. Since bio-design computers do not exist,
even the basic algorithms and tools for evaluating those models
are lacking. DNA uses four bases in two pairs: adenine with
thymine and guanine with cytosine. These pairs are then
extended across DNA strands to create proteins. At the core
level, one could modify pairs to carry data, since they already
carry some data. The challenge becomes converting those pair
models into the data we need as DevOps professionals rather
than the data supporting a living being. I am eagerly following
this area to explore the new potential. Of course, all those
technological potentials should be shaped by how we
implement cultural processes for flow, feedback and
improvement.
Final thoughts
First, thank you for joining me on this DevOps exploration. I
hope it has garnered some thoughts and suggested some
practices to improve your initial DevOps transformation or
your continuing journey into these areas. Our journey started
with a look at DevOps from a historical basis, compared it to
other solutions in practice and then formed an initial
foundation. From there, we moved into technical processes
associated with architecture, observability, constructing
pipelines and testing. In that sense, we took our initial human
perspective and examined how it shaped the technology we
use.
In the following part, we examined how technology shaped
our practices. Understanding those tools associated with
technology allowed the return to future DevOps practices,
examining the practitioner mindset and how we maintain an
effective culture through practices. This exploration evaluated
cognitive bias and how individual experience works through
social exchange to shape organizational interactions or the
worker’s phenomenological experience. Equally important, we
took time to evaluate maintaining secure and stable practices in
developing software. Finally, the focus changed to future
technologies and how those may affect continuing to deliver
confident DevOps solutions.
In parting, I would offer some thoughts to remember across
your DevOps journey. First and foremost, DevOps is a cultural
solution, not a technological one, and should always centre on
people. People are the most important part; keeping people
happy, effectively trained and managing flow is the primary
step. Flow then generates the necessary feedback for
continuous improvement. Second, I would recommend that one
never sacrifices learning for delivery schedules. Value is
necessary, the customer is essential, but too often DevOps
organizations start weeding out learning, hackathons and time
to experiment in favour of delivery schedules. This generates
short-term gains but moves the culture away from DevOps to
more Waterfall practices in the long run. Solving this requires
constant communication from leadership to developers and
operators and regularly listening to those at the tactical level.
Third, and most important, DevOps is constantly changing.
How one organization implements DevOps solutions will not
transfer directly to another organization. Your confident
DevOps interpretation may radically differ from the DevOps
team around the corner or down the street. Do not let someone
else’s success detract from your current culture and model.
Every DevOps solution is individual, so I have offered basic
frameworks and models to guide your journey. Confidence
emerges from knowing your solutions derive from the flow you
want, feedback that creates action and improvement that
generates additional value.
Thanks again for joining me on this journey. Please feel free
to reach out directly and ask any further questions you have
about this book. Whether you like the book or have issues, feel
free to post reviews and comment to others. I remain adamant
that DevOps cultures offer the best value-driven software
development options and that the cultural model has wide-
ranging benefits beyond just software. Some of these models
appeared early as we saw how DevOps grew from physical
manufacturing solutions and I think future organizations will
incorporate more and more DevOps concepts. Lastly, this book
would not have been possible without the dedicated support of
my family and the folks at Kogan Page. I hope your DevOps
journey is confident and successful.
Notes
1 Tucker, P (2020) An AI just beat a human F-16 pilot in a dogfight — again, Defense One,
https://www.defenseone.com/technology/2020/08/ai-just-beat-human-f-16-pilot-dogfight-again/167872/
(archived at https://perma.cc/W2V2-PZ6B)
5 Wilkins, A (2023) Record-breaking quantum computer has more than 1,000 qubits,
New Scientist, https://www.newscientist.com/article/2399246-record-breaking-quantum-computer-has-more-than-1000-qubits/
(archived at https://perma.cc/ZY89-9VX4)
INDEX
failure-acceptance 214
false consensus bias 200–01
feature demonstrations (demos) 83
federal holidays 70
FedRAMP 239
feedback 13, 22, 53–54, 61, 80–84, 132, 173
see also continuous monitoring; continuous testing
Fibonacci sequences 35, 36
flickering tests 171–72
flow 13, 17, 60–61, 71, 75–79, 173, 213, 223, 225, 242–43
Flux 205, 228
Ford, Henry 14, 17
Fortify 167, 239–40
Fowler, Martin 20, 170
France-England tunnel 16
Fukushima (Japan) earthquake 179, 199
functional pipelines 148
functional tests 166, 168
functions 154
Kaiser Permanente 26
kaizen 17, 18
kanban 79, 146, 205, 223–26, 230
Karate Kid 67–68
key fobs 38
knowledge sharing 70, 78, 82, 87, 197–98, 255
knowledge, skills, abilities assessments 265
Kodak 253
Kubernetes 111, 152, 162, 231, 247, 252
Mac 125
machine learning (ML) 166, 238, 263, 266, 269–71
maintenance 34, 38–39, 45, 46, 48, 50
managed maturity 204, 206
managers 60, 72–75, 214–15
Manen, Max van 191, 193, 196
Manhattan Project 15
manual processes 21, 122, 125, 158, 159, 163, 257
manual tests 166, 175–76
master-client architecture 218, 221–23
Mattermost 77
maturity levels 178–81, 202–07, 239, 251
mean time to restore 126
meaning relationships 192
means 195
see also communication
measurements 121, 122, 124, 126–29
mediator-driven architecture 221
merge requests 62, 215
metaphor 4–5, 55–56, 57, 192
metrics 87, 120–22, 124–40, 154, 159, 173, 247–50
DevOps Research Association 23
lead time to change 23, 78
test coverage 178
micro-controllers 97
microkernels 219, 222
microservices 109, 110, 111–12, 219, 221, 222
Microsoft Excel 247
Microsoft 365 25
Microsoft Visio Pro 114
Microsoft Windows 25, 125
mindset 40, 189–209
see also culture
minimal viable products 182
misinformation bias 200
mixed method experiments 86
Mocha 184
mocks 168, 176
monitoring 159, 172–77, 245–47
monolith structures 109, 110, 111, 221
Mozilla Public Licence v2.0 40
multiple metrics 126, 132, 134
multiple pipelines 45, 145, 146, 157, 158
multiple testing environments 171
Pacific Railroad 15
packages 101, 102
pair programming 16, 56, 215, 256
Pandas 166
pandemic (Covid-19) 33, 44, 110
Parasoft 184, 185
Pareto measurements 128–29
‘parking lots’ 78
passive elements 106
patches 35, 36, 37, 38, 48, 51, 52, 61, 168, 201
pathological organizations 135
PayPal 26
peer-driven architecture 221–22
people observation 134–36
PeopleCert 23
perception 75, 193
phenomenology 190–94, 198, 200
Phoenix Project, The 23
physical architecture 218–19
see also microservices
physical design requirements 38
pipeline gates 144
pipeline visualization 160–62
pipelines 38, 45, 77, 142–63, 205, 246
automation 43
testing plans 172–73, 178
Planet Fitness 25
planning 34–37, 39–40, 46–47, 51–52, 54–55, 56
platform teams 42, 213
platforms 228–30, 254–55
see also cloud computing; clusters; code (coding); containers
Pluto 247
Polaris 15
polymorphism 98
pre-processing 271
pre-prod 144, 174
primary artifacts 252, 253
primary pipelines 148
probability 16, 148, 201, 237, 270
procedure testing 170
process management 77, 158–59, 182–84, 204, 217–26, 243–47
manual 21, 122, 125, 163, 257
process observation 136–38
production 143, 175, 177
profile 101
programme increment design 60–61
programme management 14–16, 20, 214–15
Programme Management Professional certification 16
project management 16, 230–31
Prometheus 121, 130, 138–39, 140, 230, 231
public relations 136
pull approach 17–18, 150
PurpleTeam 138, 140
push approach 79
Q 273
Qiskit 273
qualitative experiments 85
quality assurance (QA) teams 169, 183
quality control 128, 151, 165, 166, 178
quantitative experiments 85
quantum computing 271–74
qubits 273
sandboxes 42
Scala 114
scalability 43, 109, 111
scaled agile framework (SAFe model) 59–61
scanning technology 156, 162, 256–57
Schutz, William 192, 193
Schwaber, Ken 20
Scikit-Learn 166
scrum 26, 72, 79, 146, 205, 223–26, 230
scrum masters 212
SDLC see software development lifecycle (SDLC)
secondary artifacts 252, 253
security 42–43, 154, 200, 203, 233–59
security experts 236–38
security patching 37, 38, 52, 61
security standards 236–37
security teams 238–40
Selenium 170, 184, 185
senior DevOps engineers 211
sequencing 103
server status 246
serverless structures 109, 110
servers 109
service-based architectures 220, 221
service level agreements 243, 249
service level indicators 114, 249
service level objectives 249
service mesh 112, 152, 156, 231
Shafer, Andrew 21–22
sharing 70, 78, 82, 87, 197–98, 255
‘shiny topics’ 78
sigma deviations 128
silos 23, 24, 26, 31, 45, 82
SILQ 273
Simian Army 179
simple interactions 241
simulation 109–10, 181
single code bases 110–11
site reliability engineering 247–51
Skelton 213
Skype 221
Slack 77
SLAs 243, 249
SLIs 114, 249
SLOs 249
small releases 48, 56
smart home thermostats 97
smoke tests 166, 168–69
social behaviour 72–73, 101, 103, 106, 195
social capital 196
social exchange theory 72–75, 190, 194–98
social responsibility 73–74, 195
socialization 148
SOC2 239, 251
software architects 60, 112–13, 222
software asset control 148
software composition analysis 156
software development lifecycle (SDLC) 30–64
traditional 85, 175
SOLID architecture 98–100
solutions/requirements alignment 38
SonarQube 154, 156, 167
source control 147–48
space-based architecture 218–19
Space Shuttle Challenger 16
spikes 201
spiral approach 49–50
Splunk 130
sprints 76, 79, 82–83, 191, 195, 225, 243
square brackets 116
stability 233–59
structured query language (SQL) 167, 224
standard testing 167
standardized IDEs 227
standards 238–39, 244, 245, 257
codes 55, 57, 112
see also National Institute for Standards and Technology (NIST)
Starbucks 26
‘State of DevOps’ (Brown) 23, 257
state machines 103
statement coverage tools 155
static analysis 165, 167, 176, 177
static application security testing 165
stories 223
story points 225
stream-aligned teams 213
strengths 31–32, 61, 63, 79, 83, 87
agile 57, 58
DevOps 42, 43–44, 45
SDLC 39, 41
TOGAF 108
UML 103, 104
waterfall 51, 52
stubs 161, 162, 168
sub-tasks 223
subcontractors 70
subscription models 25
sunk cost fallacy 218
superposition 272
supervisory control and data acquisition systems 109–10
support vector machines 270
suppression 136
surveys 167, 193, 198
sustainable pacing 57
Sutherland, Jeff 20
SWOT model 30, 31–33, 39–41, 43–45, 51–52, 57–58, 79, 103–04, 108
syslogs 125
system design 95–98
system reliability engineers 136
XML 114
XP 52–58