Big Data Quarterly Fall 2021 Issue

Volume 7, Number 3 | FALL 2021

BIG DATA 50: COMPANIES DRIVING INNOVATION

WWW.DBTA.COM

Accelerating Value With Machine Learning
Cybersecurity Is a Data Problem
Is There an IoT Rainmaker in Your Company?


REGISTRATION OPENS SOON!

DATA SUMMIT
MAY 17–18, 2022 | HYATT REGENCY, BOSTON, MA
PRECONFERENCE WORKSHOPS: MONDAY, MAY 16
FEATURING THESE SPECIAL EVENTS

dbta.com/datasummit

WE'RE EXCITED TO WELCOME OUR COMMUNITY of data professionals back to Boston next May for three days of practical advice, inspiring thought leadership, and in-depth training. JOIN YOUR PEERS to learn, share, and celebrate the trends and technologies shaping the future of data. See where the world of big data and data science is going, and how to get there first.

CONNECT: #DataSummit
BIG DATA QUARTERLY
Fall 2021 CONTENTS

editor's note | Joyce Wells
2  Innovative Approaches for a Data-Driven World

departments
3  BIG DATA BRIEFING
   Key news on big data product launches, partnerships, and acquisitions

features
4  THE VOICE OF BIG DATA
   Accelerating Value With Machine Learning: Q&A with Lynda Partner, Senior Vice President, Pythian
6  FEATURE ARTICLE | Joe McKendrick
   Building a Competitive Data Architecture, One Technology at a Time

SPECIAL SECTION > BIG DATA 50
22  Introduction
23  Big Data 50: Companies Driving Innovation
32  BIG DATA BY THE NUMBERS
    Hadoop-to-Cloud Migration Priorities

columns
36  DATA SCIENCE PLAYBOOK | Jim Scott
    Cybersecurity Is a Data Problem
37  DATA DIRECTIONS | Michael Corey & Don Sullivan
    The Age of Pirates Is Being Revisited in Today's Digital World
39  THE IoT INSIDER | Bart Schouw
    Is There an IoT Rainmaker in Your Company?
40  GOVERNING GUIDELINES | Kimberly Nevala
    Next, Ask: Why Not?

PUBLISHED BY Unisphere Media—a Division of Information Today, Inc.
EDITORIAL & SALES OFFICE: 121 Chanlon Road, New Providence, NJ 07974
CORPORATE HEADQUARTERS: 143 Old Marlton Pike, Medford, NJ 08055

Thomas Hogan Jr., Group Publisher, 609-654-6266; thoganjr@infotoday
Joyce Wells, Editor-in-Chief, 908-795-3704; [email protected]
Joseph McKendrick, Contributing Editor; [email protected]
Stephanie Simone, Managing Editor, 908-795-3520; [email protected]
Adam Shepherd, Advertising and Sales Coordinator, 908-795-3705; [email protected]
Don Zayacz, Advertising Sales Assistant, 908-795-3703; [email protected]
Lauree Padgett, Editorial Services
Tiffany Chamenko, Production Manager
Erica Pannella, Senior Graphic Designer
Jackie Crawford, Ad Trafficking Coordinator
Sheila Willison, Marketing Manager, Events and Circulation, 859-278-2223; [email protected]
DawnEl Harris, Director of Web Events; [email protected]

ADVERTISING
Stephen Faig, Business Development Manager, 908-795-3702; [email protected]

INFORMATION TODAY, INC. EXECUTIVE MANAGEMENT
Thomas H. Hogan, President and CEO
Roger R. Bilboul, Chairman of the Board
Mike Flaherty, CFO
Thomas Hogan Jr., Vice President, Marketing and Business Development
Bill Spence, Vice President, Information Technology

BIG DATA QUARTERLY (ISSN: 2376-7383) is published quarterly (Spring, Summer, Fall, and Winter) by Unisphere Media, a division of Information Today, Inc.

POSTMASTER: Send all address changes to: Big Data Quarterly, 143 Old Marlton Pike, Medford, NJ 08055

Copyright 2021, Information Today, Inc. All rights reserved. PRINTED IN THE UNITED STATES OF AMERICA

Big Data Quarterly is a resource for IT managers and professionals, providing information on the enterprise and technology issues surrounding the "big data" phenomenon and the need to better manage and extract value from large quantities of structured, unstructured, and semi-structured data. Big Data Quarterly provides in-depth articles on the expanding range of NewSQL, NoSQL, Hadoop, and private/public/hybrid cloud technologies, as well as new capabilities for traditional data management systems. Articles cover business- and technology-related topics, including business intelligence and advanced analytics, data security and governance, data integration, data quality and master data management, social media analytics, and data warehousing.

No part of this magazine may be reproduced by any means—print, electronic, or any other—without written permission of the publisher.

COPYRIGHT INFORMATION
Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by Information Today, Inc., provided that the base fee of US $2.00 per page is paid directly to Copyright Clearance Center (CCC), 222 Rosewood Drive, Danvers, MA 01923, phone 978-750-8400, fax 978-750-4744, USA. For those organizations that have been granted a photocopy license by CCC, a separate system of payment has been arranged.

Photocopies for academic use: Persons desiring to make academic course packs with articles from this journal should contact the Copyright Clearance Center to request authorization through CCC's Academic Permissions Service (APS), subject to the conditions thereof. Same CCC address as above. Be sure to reference APS.

Creation of derivative works, such as informative abstracts, unless agreed to in writing by the copyright owner, is forbidden.

Acceptance of advertisement does not imply an endorsement by Big Data Quarterly. Big Data Quarterly disclaims responsibility for the statements, either of fact or opinion, advanced by the contributors and/or authors.

The views in this publication are those of the authors and do not necessarily reflect the views of Information Today, Inc. (ITI) or the editors.

SUBSCRIPTION INFORMATION
Subscriptions to Big Data Quarterly are available at the following rates (per year): Subscribers in the U.S.—$97.95; Single issue price: $25

© 2021 Information Today, Inc.

EDITOR'S NOTE
Innovative Approaches for a Data-Driven World
By Joyce Wells

OUR BIG DATA 50 ISSUE is all about innovation. Data technologies are changing quickly, with new opportunities emerging to leverage data for business advantage.

Taking a comprehensive view of the fresh approaches being applied to build competitive data architectures, BDQ contributing editor Joe McKendrick examines the key technologies for data-driven organizations in an increasingly competitive world. In this article, industry leaders weigh in on the use of data fabric, information catalogs, AIOps, automation, cloud deployments, and workflow orchestration, as well as methodologies for data governance, data quality, and real-time delivery.

In addition, in our latest Voice of Big Data interview, BDQ takes a deep dive into an increasingly important subset of AI—machine learning—with Pythian's Lynda Partner. Partner provides an overview of what it is, why it matters, where it is being used, and the best ways for companies to get started. "As more and more companies realize that algorithms can do things at a scale and speed that humans or regular analytics can't match, machine learning will no longer be optional," she states.

Asking readers if they see an "IoT rainmaker" in their company, Software AG's Bart Schouw explores the qualities that exemplify these individuals and the innovations they will take advantage of in order to dominate their market. Schouw notes digital rainmakers not only make companies more efficient but also help to turn them into digital disruptors that can reap the benefits of what the World Economic Forum has described as a $100 trillion opportunity created by the combined value to society and industry of digital transformation.

But modern approaches also require oversight, points out SAS's Kimberly Nevala in her data governance column. Looking at AI and other data-driven solutions, Nevala suggests that thinking, "Why not?" in the deployment of cutting-edge technology is a critical step but one that is often not given the attention it deserves. Governance and criticism get a bad rap, she notes, but thinking about what could go wrong and having beliefs and assumptions challenged is vital since AI and other advanced solutions can amplify errors and biases, serving to reinforce patterns of behavior.

Adding to the focus on inventiveness in this issue, NVIDIA's Jim Scott considers machine-learning-driven approaches for cybersecurity that target prevention to reduce the risk of an attack and minimize the damage if there is one. The next evolution of machine learning approaches is logistical, with data pipelines that leverage data ingestion, processing, and inference, says Scott, who explains that an example of such a framework is Morpheus.

New ways to improve data security are also on the minds of License Fortress' Michael Corey and VMware's Don Sullivan. In this month's article, they liken today's cybercriminals to the seafaring pirates that emerged centuries ago. Similar to pirates, these modern-day criminals pose a serious threat although they may be armed with little more than keyboards and internet connections, say Corey and Sullivan, who discuss possible remedies.

Of course, central to this issue of Big Data Quarterly is the annual list of the "Big Data 50—Companies Driving Innovation." Each year, this special report shines a spotlight on companies that are helping to expand and improve the data ecosystem with technologies, products, and services.

To continue to stay on top of technology trends, industry research, and news, visit www.dbta.com/bigdataquarterly. Be sure to also mark your calendar for the next Data Summit conference, coming to the Hyatt Regency Boston May 17–18, 2022, with preconference workshops on May 16, 2022.

BIG DATA QUARTERLY | FALL 2021


BIG DATA BRIEFING
Key news on big data product launches, partnerships, and acquisitions

Couchbase has announced the general availability of COUCHBASE SERVER 7, which the company is calling a landmark release for its ability to bridge the best aspects of relational databases, such as support for ACID transactions, with the flexibility of a modern database. With Couchbase Server 7, enterprise development teams get a unified platform and no longer need to use one database for transactions and a separate database for developer agility and scale. www.couchbase.com

Redis Labs is now registered as REDIS, having dropped "Labs" from its name. According to the company, the change signals the maturation of the company and the Redis open source project, which it has contributed to since 2011 and sponsored since 2015. In addition, the company says, the name change also reflects the company's mission to continue the growth of Redis as a real-time data platform. https://redis.com

TIGERGRAPH CLOUD is now available on all three major cloud marketplaces: AWS, Azure, and GCP. With the choice of public cloud vendors, businesses are able to significantly reduce the friction of getting started with graph analytics projects on the vast ecosystems built by the major cloud platforms, according to the vendor. www.tigergraph.com

MinIO, a provider of high-performance, Kubernetes-native object storage, has announced that MINIO HYBRID CLOUD OBJECT STORAGE has achieved Red Hat OpenShift Operator Certification and is now available through the Red Hat Marketplace and the Red Hat Ecosystem Catalog. https://min.io

Rivery, the provider of a DataOps platform, is introducing RIVERY CLI (command-line interface), enabling customers to take the power, speed, and scalability of DataOps to a new level. Rivery CLI enables data engineers and other data developers to remotely execute, edit, deploy, and manage data pipelines via CLI and convert data pipelines into infrastructure as code. https://rivery.io

Domino Data Lab, provider of an enterprise MLOps platform, has released DOMINO 4.4, which adds enhancements that reimagine the data science workbench. Domino 4.4 introduces several new features, including Durable Workspaces and CodeSync, which support a more productive way for data scientists to work. www.dominodatalab.com

CognitiveScale, an enterprise AI company providing AI-powered digital systems, is releasing CORTEX FABRIC VERSION 6—a new, low-code developer platform for automation, augmentation, and transformation. Cortex 6 helps enterprises create trustworthy AI applications faster, more affordably, and with business outcomes delivered through KPIs based on insights from data, models, and actions—all with minimal dependencies on underlying infrastructure, according to the vendor. www.cognitivescale.com

Scale Computing, a market leader in edge computing, virtualization, and hyperconverged solutions, is partnering with IBM to help organizations adopt an edge computing strategy. SCALE COMPUTING HC3 EDGE COMPUTING SOLUTIONS are designed to provide customers with an autonomous infrastructure that can run modern containerized applications alongside legacy applications as virtual machines. www.scalecomputing.com and www.ibm.com

Virtana, the AIOps observability company for hybrid cloud, is expanding its offering of the VIRTANA PLATFORM on AWS Marketplace. According to Virtana, offering end-user customers a unified SaaS platform on the AWS Marketplace delivers precision observability, helps to reduce cloud costs, and de-risks public cloud migration to AWS. www.virtana.com

VERITAS TECHNOLOGIES, a provider of enterprise data protection products, has updated its Enterprise Data Services Platform to extend ransomware protection to every part of the enterprise. Veritas' flagship NetBackup solution delivers ransomware protection for containerized environments, immutability for Amazon S3, and integrated anomaly detection. http://vrt.as/NetBackup

Software intelligence company Dynatrace is extending SMARTSCAPE, the Dynatrace platform's real-time and continuously updated topology, to bring Dynatrace's AIOps and analytics capabilities to more open source services, including OpenTelemetry, FluentD, and Prometheus. www.dynatrace.com

DBTA.COM/BIGDATAQUARTERLY


THE VOICE OF BIG DATA

ACCELERATING VALUE WITH MACHINE LEARNING

Lynda Partner, senior vice president for products and offerings at Pythian, spoke at Data Summit Connect earlier this year about how to accelerate value with machine learning (ML). In this interview with BDQ, she continues the conversation and highlights how to select the right use cases for ML, avoid mistakes, and manage an ML project once it becomes a reality.

Lynda Partner, Senior Vice President, Pythian

How do you describe machine learning?
Machine learning can be described as an application of AI that provides systems the ability to automatically access data and use that data to learn and improve from experience without being explicitly programmed. While it's fascinating to think that computers can learn without human intervention, the real value of machine learning comes when the business can use the outputs that are created as part of that learning—so when the result of machine learning is the ability to adjust an action according to what was learned, then you get valuable machine learning.

Can you provide some examples of where it is being used?
One of the most well-known is evident every time we visit Netflix or Amazon and we are shown suggestions about what we should watch or buy next. To do this, machine learning algorithms have accessed tons of data about you and others like you to learn what you are most likely to be interested in, and then they take that recommendation and automatically display it to you, and just you. This has been incredibly effective in boosting sales in the case of Amazon and engagement in the case of Netflix. According to McKinsey, 35% of what consumers purchase on Amazon and 75% of what they watch on Netflix come from product recommendations based on machine learning algorithms.

What is the benefit for businesses?
These algorithms have been able to generate high-quality recommendations that actually result in strong business outcomes by accessing massive amounts of data and then producing recommendations in real time. And because the costs to process this amount of data have come down with the advent of cloud computing and big data systems, the business case for using machine learning is much more compelling than it was, say, a decade ago.

Machine learning isn't just for consumer-facing services like Netflix and Amazon; it also has increasingly widespread use behind the scenes. In industrial applications, ML models are predicting when equipment needs to be serviced before it actually fails. Banks are using ML models to predict the likelihood that a transaction is fraudulent so they can cancel a credit card before it is used for more stolen purchases. Chatbots are making employees more efficient by answering questions intelligently, saving the time employees would otherwise spend sifting through FAQs or taking up a live person's time. And supply chains are being optimized as algorithms take into account more possible factors than any human could keep in mind.

Why is it so important for companies to get started now?
As more and more companies realize that algorithms can do things at a scale and speed that humans or regular analytics can't match, machine learning will no longer be optional. Competitors will be using it and providing customer experiences, efficiencies, or improved functionality that will give them an edge, forcing its adoption more broadly just to keep up. It is a change that will be as impactful as the advent of computers themselves.

Can you describe the level of maturity and experience with machine learning at organizations you work with?
The media does a great job of showcasing pioneers of machine learning, but the reality that we see every day is different. Yes, there are a few companies that are very mature, but most companies are still early in their ML journey. Even those who are actively developing models tend to be early in their lifecycle. We find very few companies who have deployed ML models at scale within their organizations.

As a consultancy and solution provider, how does Pythian seek to help companies on their journey to leveraging machine learning?
We focus on helping companies get to business value through the use of machine learning. Building models for the sake of models is not the goal; the end goal is seeing the model working at scale, all by itself, learning and delivering outputs that drive revenue growth, or create loyal customers, or reduce costs, or even



save lives. There are many steps in this journey; it's much more than hiring data scientists and setting them loose. We focus on the entire lifecycle of machine learning, from ideation to implementation at scale—and the care and feeding of models beyond that.

Has the pandemic accelerated or slowed down activities in this area?
During the pandemic, we saw a shift toward ML use cases that focused on either cost savings or enhancing the digital customer experience. It really depended on how each organization was affected. But I would say that the shift to cost savings was the most obvious. Those companies that were forced to become more digital were usually preoccupied with less sophisticated digitization projects like adding curbside pickup if they were a retailer or dealing with huge growth in online traffic. Others, whose business was really hurt by the pandemic, took the opportunity to look at ML to reduce costs and become more efficient.

What are the issues that are speeding or hindering efforts to adopt machine learning?
The worst thing that can happen as companies start to adopt machine learning is that they realize a string of failures upfront that sours them against future investments. So, to me, spending the time to pick the best use cases, the ones that are most likely to be winners—even small winners—is one of the best things a company can do.

A second area to focus on—and it's tied to my previous point—is the need for collaboration across multiple teams. A successful machine learning project involves IT, business owners, data scientists, architects, security folks, and more. Bringing these people together early in the process is a critical success factor.

What else is important?
Having access to lots of data—clean, integrated data—is critical. Without data, your model will starve to death. I joke that a machine learning model is like Sesame Street's Cookie Monster with his insatiable hunger for cookies. In the case of machine learning, your model has an insatiable appetite for data. Your model is only as good as the data you feed it.

Are there market segments or departments that will benefit most from its use?
There isn't a functional area that can't benefit from machine learning—but not all projects are created equal. Picking the right use case is so important. For every eight projects that are started, we see four making it to the deployment phase, and one getting deployed into production. Based on published survey data about the effort and duration for each phase, we think that can translate into $1 million to $2 million per successfully deployed model. That cost can be reduced with better upfront planning and easy access to data, but with that kind of spend profile, we see machine learning projects more often in larger enterprises.

How do you advise companies to get started? Are there predictable steps in the process?
There are predictable steps, but we often find that companies start in the wrong place. You want to avoid doing models for the sake of modeling and ensure you focus on selecting the best use case before you start modeling. The best use case is one that can be tied directly to a business problem or opportunity. The best use case when you are starting out is one that has a high technical-feasibility score and a low data-risk score. These two attributes will help you deliver faster, which will make the ROI on your project higher. To do all this requires a team of businesspeople, analysts, IT folk, and data scientists all working together.

What are some of the adjacent technologies and methodologies that organizations need to focus on?
Depending on which study you look at, data scientists spend between 45% and 80% of their time integrating and cleaning data, when they really want to spend all their time visualizing and modeling. An enterprise data platform that provides access to integrated, cleaned data in its non-aggregated form would speed up machine learning projects immensely. It is the single best thing you can do to accelerate your outcomes and make your project more cost-effective at the same time.

Beyond rigorous use case selection, what else should organizations do early on?
They should also invest in a cloud data platform for easy access to data. In addition, they should start a data governance program and educate more people about ML to reduce the communications gap. For companies that have already done those things, it is time to invest in an integrated development environment for faster model deployment with IT help. They should also invest in MLOps skills, tools, and processes and start thinking about model management.

Looking ahead, where do you see the greatest potential of machine learning?
Because there is no shortage of places that machine learning can be applied, its greatest potential will be unlocked when its complexity can be reduced. Like the advent of low code and no code, which bring programming capabilities to the non-programmer, machine learning's full potential will be fully realized when a) data scientists can be made more productive by eliminating the non-modeling parts of their job, and b) even the modeling parts of their job are accelerated with prepackaged models that can be adjusted—not developed from scratch.

This interview has been edited and condensed. For the full version, go to www.dbta.com/bdq.
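Partner's project funnel (eight projects started, four reaching the deployment phase, one making it into production) can be turned into a rough cost-per-shipped-model estimate. The sketch below is illustrative only: the function name and the per-phase cost figures are assumptions chosen to show the mechanics; only the funnel ratios come from the interview.

```python
# Hedged sketch of the ML project-funnel economics described above.
# Assumed per-phase costs are placeholders, not figures from the interview.

def cost_per_deployed_model(started=8, reach_deployment=4, deployed=1,
                            cost_ideation=50_000,
                            cost_deployment_phase=150_000,
                            cost_production=300_000):
    """Total spend across the whole funnel divided by models that ship."""
    total = (started * cost_ideation                 # all 8 projects incur ideation/modeling cost
             + reach_deployment * cost_deployment_phase  # 4 pay for the deployment phase
             + deployed * cost_production)               # 1 pays for productionization
    return total / deployed

print(f"${cost_per_deployed_model():,.0f} per deployed model")  # → $1,300,000 per deployed model
```

Even with these modest placeholder costs, dividing total funnel spend by the single model that ships lands in the seven-figure range Partner cites, which is why better upfront use-case selection (fewer abandoned projects) lowers the effective cost so sharply.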



Building a Competitive Data Architecture, One Technology at a Time

By Joe McKendrick
IT'S TIME to rethink data architecture. The architectures that have been built over the years were suitable for on-premise, legacy environments but tended to be static, inflexible, and accessible to only a few. Today's data-driven organizations demand capabilities that adapt to the enterprise and open new paths of innovation to business users. Achieving leadership in today's economy requires identifying and preparing for the emerging technologies and methodologies that deliver transformation.

At stake is the ability to compete on analytics, deliver superior customer experience, enable personalization, and do so in real time. Data industry leaders see an impressive constellation of technologies that are making data architecture development more aligned with pressing business requirements, as well as helping to make their jobs easier. Here are leading technologies that are reshaping approaches to data architecture.

DATA FABRIC
Interest is growing in data fabric, an architecture intended to provide standardized and consistent data services across enterprises. Gartner defines a data fabric as "a design concept that serves as an integrated layer (fabric) of data and connecting processes," employing "continuous analytics over existing, discoverable and inferenced metadata assets to support the design, deployment and utilization of integrated and reusable data across all environments, including hybrid and multi-cloud platforms."

Data fabric "integrates, manages, and governs all data across the hybrid cloud, bringing together pieces of business that would otherwise be approached in isolated instances," said Beate Porst, program director, product management, data and AI, at IBM. "Advanced machine learning and AI technology allow for data fabric to bring all of these parts together while also reaching a higher degree of automation, optimization, and augmentation."

A data fabric architecture has the potential to expose data "to AI and other projects without having to move it," Porst continued. "A well-implemented data fabric can analyze and extract information regardless of where it is stored. Crucially, it can build a connected network of data assets, giving businesses a holistic view of where their data lives."

Looking ahead, Porst sees the data fabric approach continuing to help businesses "unlock their data and open it up to the power of AI." In addition, she added, "there will most likely be a significant increase in the level of automation, self-healing, self-learning, and knowledge awareness that takes place within data fabric technology to achieve hyper-automation."

AUTOMATION
Without automation, a company has no chance to compete on data in any way, shape, or form, said Ayush Parashar, vice president of engineering with Boomi. "With 850 applications on average, an enterprise has too many data sources and sprawl for manual data integration, analysis, or preparation to make sense," he said. "At the core of any data strategy, you need to know your data really well—automation can help a company do this quickly and reliably."

Still, even with all this data, automation isn't yet pervasive enough to make a difference. "There's only about 5% to 10% penetration of automation at companies, maybe even less," Parashar said. "The emergence of smart discovery and integrators will enable such automation. For example, if we can auto-discover and identify all parts of a company's data sources, such as [through the use of] a smart discovery catalog that can feed into an integrator that can start syncing all of data automatically, data engineers, data integrators, and application engineers can immediately understand where data is coming from, where it's classified, and how to leverage it."

Currently, Parashar pointed out, "automation can connect 10 or so business applications together without interference and do data discovery. In 5 years, we should have automation that can do this for hundreds of applications at a time. We'll also see new advances in automation intelligence. Right now, we have a lot of rule-based automation, but we're starting to add more artificial intelligence, machine learning, natural language processing, and more on top of automation."

INFORMATION CATALOGS
Essential to any data-driven enterprise is an architecture built around information catalogs that helps data producers and data



consumers understand the data available to them. These catalogs can "inventory, search, discover, profile, interpret, and apply semantics to not just data but also reports, analytic models, decisions, and other analytic assets," said Tapan Patel, senior manager for data management at SAS. While information catalogs are in the early mainstream stage of maturity, the technology is understood and vendors are responding, he added.

There is a key role for AI and machine learning in the efficacy and performance of information catalogs. Looking to the future, Patel sees machine learning algorithms being used "to further simplify, augment, and automate the information catalog process for broader adoption." For example, he explained, "the features of information catalogs are likely to include automated flags and alerts of data outliers; detection of personally identifiable information and recommended next steps; automated data profiling and role-based user access; suggestions to source, prepare, and serve data; and identification of issues with data pipelines and lineage analysis."

"The rise of pervasive and abundant data means a new generation of applications and use cases that more aggressively employ AI and machine learning."

CLOUDS
No discussion of evolving architecture is complete without weighing the impact of ubiquitous cloud computing. Through public clouds, advanced features such as AI acceleration and high-performance computing are possible, said David Rhoades, manager of cloud and software-defined infrastructure at Intel. "The public cloud is a perfect environment for enterprises to run their database applications, and it's evolving at a hyperscale pace."

In the years ahead, "more enterprises will tap into the public cloud to run their infrastructure and software environments. We've already seen key workloads like AI and high-performance computing become easy to use and more widely accessible to end users across enterprises." The versatility and accessibility of these technologies are driving the trend to data democratization.

The challenge in making this a reality is the level of data literacy among end users, stated Jordan Morrow, senior director of data and design management skills with Pluralsight. "The key with any data and analytics technology is not the technology itself but understanding whether people can adopt it correctly. The technology can be sound and amazing, but if the workforce is not confident in its data literacy skills, the adoption of those products can be quite low. In order for these technologies to succeed and work the way they should, organizations need to combine them with data literacy learning and strategies."

REAL-TIME CONNECTIONS
Importantly, today's—and tomorrow's—data architectures need to support the increasing use of real-time capabilities. Enterprises must be responsive to customers as they specify and consume digital products or services and also be able to move data rapidly across internal processes. "Long gone are the days of receiving weekly, monthly, or quarterly reports; scrambling to react to them; then making decisions that […]," Raab continued. "Streaming analytics underpins this shift to a real-time decision-making process, and more businesses will employ this approach to turn their data into actionable insights." In addition, streaming analytics will move more to the edge, as IoT devices proliferate and explode data management needs. "Edge devices will gain greater processing power and capacity, and software will take increasing advantage of that. Edge-captured data will be aggregated in the cloud, and decisions made by machine learning and AI will be available in real time to mobile devices," Raab said.

EDGE COMPUTING
The convergence of edge computing and AI means greater responsiveness close to end users and the data they are generating or consuming. Edge, boosted by machine learning, "enables enterprises to process massive amounts of data locally, thereby reducing reliance on cloud networks," said Jeffrey Ricker, CEO of Hivecell. "This makes data processing significantly more efficient, especially for businesses with hundreds or thousands of locations. Machine learning at the edge also addresses security concerns by processing sensitive data locally, rather than in the cloud."

While AI-driven edge computing will reduce traffic from the cloud, it also will help manage the proliferation of 5G networks, Ricker pointed out. 5G—with greater capacity and intelligence built in—"could easily overwhelm the fiber networks, the data centers, and the cloud," he explained. "The answer is to move compute power from the data center to the base of the tower to handle caching, preprocessing, and local processing. We are just scratching the surface when it comes to edge computing."

AI FOR DATA, DATA FOR AI
The rise of pervasive and abundant
mainstream in the public cloud, and this may be outdated by the time they’re put into data means a new generation of appli-
will only continue.” For its part, Intel is place,” said Eric Raab, CTO of KX. “Now, cations and use cases that more aggres-
working with leading cloud service pro- organizations can bring together real-time sively employ AI and machine learning.
viders to support hyperscale data appli- and historical data from across the business While AI and machine learning have been
cation capabilities, Rhoades noted. for analysis in the moment to drive faster, around for decades, “it is the availability
smarter decision making.” of large datasets that have been created in
DATA DEMOCRATIZATION When real-time data is coupled with recent years that makes the practical use
A number of products—particularly historical data, “organizations can gain and application of AI and machine learn-
analytical tools and platforms supported greater context and also build predictive ing productive today,” said Marshall Choy,
by the cloud—are becoming increasingly models to be able to act in advance,” Raab vice president of product at SambaNova

DBTA.COM/BIGDATAQUARTERLY 7
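The local-processing pattern Ricker describes, in which raw readings are analyzed at the edge and only compact aggregates travel to the cloud, can be sketched in a few lines of Python (a generic illustration, not Hivecell's software; the readings and threshold are invented for the example):

```python
from statistics import mean

def summarize_at_edge(readings: list[float], alert_threshold: float) -> dict:
    """Reduce raw sensor readings to a compact summary locally, so only
    a few numbers -- not every reading -- cross the network to the cloud."""
    return {
        "count": len(readings),
        "mean": round(mean(readings), 2),
        "max": max(readings),
        # Local decision: flag anomalies without a round trip to the cloud.
        "alert": max(readings) > alert_threshold,
    }

# One minute of temperature readings captured at a single site.
readings = [21.0, 21.4, 20.9, 35.2, 21.1]
summary = summarize_at_edge(readings, alert_threshold=30.0)
print(summary)  # {'count': 5, 'mean': 23.92, 'max': 35.2, 'alert': True}
```

Only the four summary fields cross the network, and the alert decision is made on the device itself, which is what reduces both cloud traffic and the exposure of sensitive raw data.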


BUILDING A COMPETITIVE DATA ARCHITECTURE, ONE TECHNOLOGY AT A TIME

Systems. "As organizations achieve greater degrees of digital maturity and AI fluency, we are seeing an increase in project scopes and a shift in focus from cost reduction to gaining efficiency to drive profits."

Looking ahead, Choy sees AI and machine learning "being pervasive across the enterprise as the backbone for all applications that are centered around use cases such as natural language processing, computer vision, and recommendation. AI and ML will transform business and technology in a more dramatic way than the internet did decades ago—refactoring the Fortune 500 and enabling capabilities that were unthinkable only a short few years ago."

Big data is making AI possible, and AI is helping to manage big data. This "applied AI" is providing enterprises with "the ability to harness structured and unstructured data at scale from both within and outside and derive descriptive, predictive, and cognitive insights," according to Sreedhar Bhagavatheeswaran, senior vice president and global head of digital business for Mindtree. Applied AI is on its way to "becoming mainstream amongst most enterprises over the next 5 years," he added. Use cases include "effective use of natural language processing techniques to enhance text-to-speech and speech-to-text technologies." Bhagavatheeswaran also sees computer vision on the rise, in addition to classical AI/ML use cases powered by advances in deep neural networks, which will become an essential tool for making better business decisions.

As a result of these capabilities, machine learning is fast becoming a technology of choice for data professionals—and may help digitize sophisticated data science tasks. "Machine learning helps predictive models to become smarter over time, process and generate natural language, and uncover patterns in large datasets," said Helena Schwenk, vice president at Exasol. "A data science and ML platform can accelerate this process by helping data scientists and, most importantly, other data-savvy users—citizen data scientists—build predictive models, derive advanced insights, and infuse AI into applications at scale, ultimately helping to alleviate the data scientist bottleneck." In the end, the entire enterprise benefits, Schwenk continued. "Platform domain experts, such as quants, risk analysts, and business analysts, can become directly involved in the process of developing models without a deep knowledge of machine learning."

AIOPS
AIOps—AI operations—leverages AI and machine learning to acquire enterprise IT data, analyze it, and take required actions for autonomous IT operations. AIOps "helps transform enterprise IT operations from being slow and reactive to agile and proactive, thus addressing the key IT operational and business challenges," said Akhilesh Tripathi, CEO of Digitate. AIOps "combines big data, analytics, and automation to help gain full-stack visibility across hybrid environments, predict failures and their direct impact on business, and provide a resilient and efficient IT."

WORKFLOW ORCHESTRATION
Data stacks are getting more complex, making workflow orchestration and monitoring increasingly important. "Every handoff from one system to another carries some potential for error or data loss, even if the two systems in question have strong internal guarantees," said Jeremiah Lowin, CEO and founder of Prefect Technologies. "Though these errors are infrequent, they are disproportionately disruptive as they evade traditional monitoring systems and require all-hands-on-deck searches for the culprit," Lowin continued. Workflow orchestration systems "may not prevent errors in all cases but can usually reduce the time-to-error-discovery from hours to minutes." The benefits of workflow orchestration stem from a combination of reclaimed infrastructure spending and the reduction of maintenance, he added.

DATA GOVERNANCE AND QUALITY
"Data governance and data quality software is fundamentally changing the way organizations compete on data analytics," said Amy O'Connor, chief data and information officer of Precisely. "Previously, this type of work was done by hand, making it incredibly time-consuming, tedious, and oftentimes inaccurate. With data governance and data quality solutions, businesses can more easily and efficiently assess and explore their data."

While the benefits of good data governance and data quality are obvious, most organizations are still struggling to implement it, said O'Connor. "For the last decade or so, everyone has been so focused on creating and using data, while just assuming the quality was good."

The implementation of data governance and data quality tools requires organizations to pause and re-evaluate their data, O'Connor noted. However, once companies are able to do this, the concept of data engineering, or building sound data processes so it can be reused, "will allow companies to have total control and be able to repurpose their data without inaccuracies."

Demonstrating the importance of data integrity requires greater adoption of data governance and data quality tools, said O'Connor. "It's likely that these technologies will become pervasive across industries."

EVOLVING ARCHITECTURES
Big data will just keep getting bigger, and new approaches to big data architectures need to keep evolving as well. As the 2020s progress, we are likely to see additional approaches evolve, forming the foundation of data-driven enterprises positioned to compete in a hypercompetitive global economy.
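Lowin's point about handoffs can be illustrated with a minimal sketch (plain Python, not Prefect's actual API): each step in a pipeline is named and wrapped so that a failure is attributed to the exact handoff where it occurred and surfaced immediately, rather than being discovered hours later downstream.

```python
import logging
from typing import Any, Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("orchestrator")

def run_pipeline(tasks: list[tuple[str, Callable[[Any], Any]]], data: Any) -> Any:
    """Run named tasks in sequence, attributing any failure to the
    exact handoff where it occurred instead of letting it surface later."""
    for name, task in tasks:
        try:
            data = task(data)
            log.info("task %s succeeded", name)
        except Exception as exc:
            # Surface the failing task immediately -- this is what cuts
            # time-to-error-discovery from hours to minutes.
            log.error("task %s failed: %s", name, exc)
            raise RuntimeError(f"pipeline failed at task '{name}'") from exc
    return data

# Example: extract -> transform -> load, with a deliberately bad record.
pipeline = [
    ("extract", lambda _: [{"id": 1, "amount": "42"}, {"id": 2, "amount": "oops"}]),
    ("transform", lambda rows: [{**r, "amount": float(r["amount"])} for r in rows]),
    ("load", lambda rows: len(rows)),
]

try:
    run_pipeline(pipeline, None)
except RuntimeError as err:
    print(err)  # pipeline failed at task 'transform'
```

A production orchestrator adds retries, scheduling, and state persistence on top of this idea, but the core benefit is the same: the failure is named at the moment it happens.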

8 BIG DATA QUARTERLY | FALL 2021
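The kind of automated assessment O'Connor describes can be approximated with a few basic checks. The sketch below (generic Python, with an invented orders dataset) profiles one column for completeness, uniqueness, and range validity, the sort of work that was previously "done by hand":

```python
def profile(rows: list[dict], column: str, valid_range: tuple[float, float]) -> dict:
    """Basic quality checks a governance tool automates: completeness,
    uniqueness, and range validity for one column."""
    values = [r.get(column) for r in rows]
    present = [v for v in values if v is not None]
    lo, hi = valid_range
    return {
        "null_rate": round(1 - len(present) / len(values), 2),
        "duplicates": len(present) - len(set(present)),
        "out_of_range": sum(1 for v in present if not lo <= v <= hi),
    }

# Four order records: one missing amount, one duplicate, one negative.
orders = [
    {"amount": 25.0}, {"amount": 25.0}, {"amount": None}, {"amount": -3.0},
]
print(profile(orders, "amount", (0.0, 1_000.0)))
# {'null_rate': 0.25, 'duplicates': 1, 'out_of_range': 1}
```

Commercial data quality suites run hundreds of such rules continuously and feed the results into governance dashboards; the underlying checks are this simple.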


Best Practices Series
MODERNIZING DATA MANAGEMENT FOR THE HYBRID, MULTI-CLOUD WORLD

Cloudera, PAGE 12: MODERNIZE YOUR ANALYTICS AND AI WITH CLOUDERA DATA PLATFORM
Nutanix, PAGE 14: HOW TO MAKE DATABASES "INVISIBLE"?
Qlik, PAGE 16: ENABLING DATAOPS AND THE GROWING ROLE OF DATA INTEGRATION IN A MULTI-CLOUD WORLD
Delphix, PAGE 17: ENTERPRISES MUST ADOPT CLOUD FASTER TO ACCELERATE DIGITAL TRANSFORMATION
Pythian, PAGE 18: YES, YOU DO NEED AN ORACLE ESTATE PLAN
ChaosSearch, PAGE 19: DATA LAKE VS. DATA WAREHOUSE: WHAT'S THE FUTURE-PROOF SOLUTION FOR YOUR BUSINESS?
CData, PAGE 20: 5 DATA ESSENTIALS IN A HYBRID/MULTI-CLOUD WORLD
Denodo, PAGE 21: HOW DOES A LOGICAL DATA FABRIC SUPPORT ORGANIZATIONS IN THEIR JOURNEY TOWARDS A "HYBRID AND MULTI-CLOUD" WORLD?

ENHANCING DATA MANAGEMENT FOR NEW HYBRID, MULTI-CLOUD REALITIES

For advanced analytical capabilities, just about everyone is turning to the cloud in some form or another, as well as increasingly relying on cloud services for basic database management. The cloud is the foundation of the emerging class of data-driven enterprises that is actively and aggressively competing on data. Success with cloud, in all its many forms, however, requires greater vigilance and proactive initiatives to ensure data quality, governance, and effective integration.

Not too long ago, cloud mainly meant cost savings resulting from access to compute resources on a subscription basis. Now, it is seen as a gateway to support a wide range of sophisticated functions, from analytics to AI. In addition, cloud platforms offer greater flexibility since applications and data can be moved between platforms as necessary. A recent study by Unisphere Research, a division of Information Today, Inc., found that 66% of respondents now use or plan to use cloud as a strategy to reduce the time and money spent on ongoing database management activities, and 30% consider a cloud or cloud-like experience to be an important consideration in selecting their database infrastructure.

In a hybrid, multi-cloud world, data management must evolve from traditional, singular approaches to more adaptable approaches. This applies to the tools and platforms that are being employed to support data management initiatives—particularly the infrastructure-as-a-service, platform-as-a-service, and SaaS offerings that now dominate the data management landscape.

The bottom line is that data-driven applications such as analytics and AI require well-managed data to deliver value. Building these applications on hybrid, multi-cloud platforms is advantageous but also introduces new complexities. The problem that data managers and their organizations face is dealing with large volumes of a variety of data from sources, which are changing from week to week, if not day to day. For larger organizations, this can mean managing information streaming from hundreds, or even thousands, of data sources—especially with the increasing flow of data from devices and systems across the Internet of Things.

Data managers still need to grapple with many of the same challenges and tasks they faced before transitioning to cloud-based platforms. If anything, data volumes, varieties, and application workloads may increase significantly. As a result, data may still be dispersed widely across an assortment of cloud sites, on-premise systems, and devices.

The factors data managers must address as they move into cloud realms include the following:

Skills: Data managers and their organizations need to prepare for highly diverse scenarios. One of the most critical issues is having the skills to build and deploy these environments. Data scientists and AI developers are necessary to create models and algorithms that leverage or test datasets, but organizations also require data engineers and administrators to ensure the availability, viability, and quality of the data being fed into these sophisticated systems. At the same time, organizations may need to hang on to the skills associated with the on-premise data environments that will be part of the picture for some time to come. Larger companies may have access to cadres of these various specialties, but many organizations do not have these skills in-house.

Data governance: Governance also needs to be reconsidered as enterprises move data and applications into hybrid and multi-cloud settings. With global regulations affecting how and where data is stored, data managers have to understand the physical architecture of their cloud providers. Other issues may be tied to internal corporate policies. This includes who has access to data and how data is to be used for specific customer groups. Some departments may even be using their own cloud services without the guidance of IT departments. Enterprises are basing more and more key decisions on data, and that data needs to be trustworthy, timely, and secure. This can become difficult as data keeps flowing in.

Data integration: Until recently, data was often managed manually, bound by written scripts and patchworks of interfaces. Along with automated capabilities, many cloud platforms and services automatically provide for connectivity and integration, reducing the need for such manual work. At the same time, the focus of data managers' work may move to keeping cloud services aligned rather than concentrating exclusively on internal applications. Data managers and their business counterparts need to sit down and design systems and infrastructures that allow for the rapid and free movement across platforms, whether they are on-premise, in the cloud, or a combination of both.

Data security: This is another area of concern as data moves into hybrid and multi-cloud environments. Moving to the cloud while still maintaining on-premise applications increases the attack surface for hackers and malicious viruses. This requires additional attention to backup and recovery processes to maintain data availability, as well as to enable data encryption.

Data availability: Data availability also becomes an issue as the move to cloud intensifies. The variety of places where data can land and be stored—on-premise, in the cloud, or on the edge—results in the need to manage multiple venues. From the perspective of the business, it's important that data be rapidly available from all platforms, and data architects and systems planners must design for instantaneous failover and recovery in such a way that any mishaps are invisible to end users.

Performance: Performance is challenging in hybrid and multi-cloud environments—both in terms of visibility and observability. Performance needs to be monitored across the various platforms employed, especially in terms of configurations, usage, and costs.

Tools: Part of the difficulty with managing on-premise environments along with multi-cloud offerings comes from the need to use multiple tools and platforms to manage end-to-end processes. Cloud services provide robust tools associated with their platforms, but there remains a need for data managers to align these environments to provide end-to-end flows of data and associated applications—so there are still manual tasks often required at this level.

A HOLISTIC APPROACH
Developing a holistic approach to data management is critical. Data applications are no longer confined to back-end data warehouses and data lakes, accessible only to in-house analysts and executives. Data is required to improve customer experience and perform predictive analysis to drive automated systems. Data governance and proactive management need to be built around a forward-looking strategy that maintains consistency across the enterprise. New players who can serve as business-focused data stewards must be brought into the equation to help organizations realize the value of their data assets.

—Joe McKendrick

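The availability goal above, failover so seamless that "any mishaps are invisible to end users," can be sketched as a reader that tries each data venue in priority order (generic Python; the source names and the simulated outage are invented for the example):

```python
from typing import Any, Callable

def read_with_failover(sources: list[tuple[str, Callable[[], Any]]]) -> tuple[str, Any]:
    """Try each data platform in priority order; the caller never sees
    a failure unless every venue (cloud, on-premise, edge) is down."""
    errors = []
    for name, fetch in sources:
        try:
            return name, fetch()
        except Exception as exc:
            errors.append(f"{name}: {exc}")  # record the mishap and move on
    raise RuntimeError("all sources failed: " + "; ".join(errors))

def cloud_primary() -> dict:
    raise ConnectionError("region outage")  # simulated cloud mishap

def on_prem_replica() -> dict:
    return {"rows": 128}

venue, data = read_with_failover([
    ("cloud-primary", cloud_primary),
    ("on-prem-replica", on_prem_replica),
])
print(venue, data)  # on-prem-replica {'rows': 128}
```

Real deployments replace the simple loop with replication, health checks, and DNS or load-balancer routing, but the design intent is identical: the consumer gets an answer without knowing which venue served it.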


sponsored content

Modernize Your Analytics and AI with Cloudera Data Platform

YOUR ORGANIZATION'S ABILITY TO transform, innovate, and compete depends on how effectively your team utilizes every bit of data at your disposal. While the importance of data is being recognized, accessing and using that data in the myriad of ways you require is far from a simple matter. If your organization is like most, then you are:
• Receiving a greater amount of data from an ever larger number of sources
• Increasingly capturing data at the edge, where it must be ingested, processed, and analyzed in real-time
• Keeping your data in multiple systems that are difficult to integrate
• Concerned with the security of data as more people access it across the organization
• Moving to hybrid and multi-cloud for greater flexibility and efficiency

[Image: CDP Hybrid Cloud is a hybrid data cloud platform designed for unmatched freedom of choice—any cloud, any analytics, any data.]

FINDING INSIGHT IN THE DATA HAYSTACK
As long as data remains locked within diverse environments—which makes it difficult to access or consume—the value of that data is greatly diminished. That's why overcoming the fragmented data landscape to get the full value of data, across analytics and AI workloads, requires a new approach. One that gives you full control of the entire data lifecycle across every environment and system and enables your business teams to make faster, smarter data-driven decisions.

THE ENTERPRISE DATA CLOUD: A NEW APPROACH TO DATA
The key to eliminating data silos and realizing the full potential of your diverse data sets to drive business value is an enterprise data cloud. An enterprise data cloud is a "big data" platform that helps you control and manage the deluge of data across all of your environments (edge, private cloud, and multiple public clouds) and throughout the data lifecycle to gain deeper insights. With the power of an enterprise data cloud, you can adopt the most valuable and transformative business use cases by realizing the full analytical potential of all your data.

INTRODUCING CLOUDERA DATA PLATFORM (CDP)
CDP is a first-of-its-kind enterprise data cloud. With its open data architecture, it powers data-driven decisions and predictive actions by seamlessly connecting your data across your fragmented IT landscape. It helps you secure and govern the entire data lifecycle—while still providing your users quick, easy access to the critical data needed for predictive insights and guided decision-making, based on a foundation of open source.
• Power best-in-class analytics and machine learning. CDP eliminates the complexities of managing petabytes of data and multiple workloads, clearing the way for your organization to collect, enrich, report, serve, and model data for deeper insights across the business.
• Get the most from hybrid cloud and multi-cloud environments. CDP unifies any on-premises, hybrid, and multi-cloud environment into a single platform that can support an unlimited number of concurrent workloads with a consistent experience.
• Take control with security and governance. CDP ensures security and governance of your data and analytics lifecycle with built-in policies to manage access, activity, and usage of shared data. Your organization gains greater agility to act quicker on insights while keeping its data safe and ensuring regulatory compliance.
• Simplify management with faster, easier administration. CDP gives you a holistic view of the full data lifecycle through a single pane of glass that simplifies data management tasks.

ACHIEVE THE FULL POTENTIAL OF THE DATA LIFECYCLE
Too often, workloads run independently in their own silos because analytic engines and data science tools weren't designed to work together. Each silo has its own security, governance, and control needs. They also come with their own schema and data catalog. And the arduous task of keeping these silos in sync falls to the IT team. So, as a result, you get increased operational costs and overhead, and your data users are hindered by siloed systems that don't allow multiple analytics workloads to run against the same data set.
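The idea of built-in policies that manage access to shared data can be illustrated with a deliberately simplified sketch (this is not CDP's actual interface; the roles, dataset, and rules are invented): one policy table is consulted no matter which engine or user touches the data.

```python
# One shared policy table, enforced identically for every engine
# that touches the data -- instead of per-silo rules kept in sync by hand.
POLICIES = {
    ("analyst", "sales_db"): {"read"},
    ("data_engineer", "sales_db"): {"read", "write"},
}

def is_allowed(role: str, dataset: str, action: str) -> bool:
    """Check the single shared policy table; unknown pairs get no access."""
    return action in POLICIES.get((role, dataset), set())

print(is_allowed("analyst", "sales_db", "read"))   # True
print(is_allowed("analyst", "sales_db", "write"))  # False
```

The point of a shared governance layer is that this lookup happens once, centrally, rather than being re-implemented (and drifting) in each silo.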




The most compelling use cases for data require multiple analytic engines and processes working together, sharing data regardless of where that data lives. CDP empowers users to work on the same data sets, with potentially different tools, while also adhering to compliance regulations without compromising security. With self-service access granted by CDP, users and groups navigate the full data lifecycle and apply analytics directly where the data lives, interacting with that data at its source to uncover the value it holds. With CDP, your organization transforms into a data-driven enterprise by taking control over the entire data lifecycle, from edge to AI.

[Sidebar] CDP for Multi-function Analytics and AI
• DataFlow and Streaming: CDP has a highly scalable, real-time streaming engine that ingests, curates, and analyzes data for key insights and immediate actionable intelligence.
• Data Engineering: CDP enables you to enrich, transform, and cleanse data to easily create, execute, and manage end-to-end data pipelines.
• Data Warehouse: CDP is auto-scaling, highly concurrent, and cost-effective because it ingests and analyzes large-scale data in any form (structured, semi-structured, and unstructured).
• Operational Database: CDP has a real-time, scalable database, giving you the ability to build innovative applications on large volumes of structured and unstructured data.
• Machine Learning: CDP provides data science teams simple and intuitive access to the data, tools, and computing resources required for end-to-end machine learning workflows.

HYBRID AND MULTI-CLOUD FOR INTEROPERABILITY ACROSS ENVIRONMENTS
Many organizations are utilizing the benefits of multiple public clouds in conjunction with their private cloud. This combination, a hybrid cloud environment, offers flexibility, security, performance, and cost-efficiency. It gives you the freedom to decide where each of your workloads will run, based on each workload's needs. For example, you can run intermittent/batch workloads in public cloud while running more consistent workloads in your private cloud.

Often, mission-critical data can be found in multiple places throughout an organization. And while it's critical to analyze that data, there's a catch: data silos lead to incomplete and inaccurate analysis. CDP provides a centralized way to manage workloads across different environments, ensuring data governance and security across your hybrid cloud. This reduces the painstakingly manual and time-consuming labor of managing disparate data sets, leading to better compliance.

You get the greatest value from your data when you're able to run multiple analytics and machine learning models against the same source data for all kinds of business use cases. But workloads need a way to share data—and CDP does so without adding greater complexity to managing hybrid cloud.

ENABLE SELF-SERVICE ACCESS TO DATA—WHILE ENFORCING GOVERNANCE AND CONTROL
You want to give employees, partners, and customers access to your data for a wide array of business use cases. But without an enterprise data cloud, maintaining security, governance, and control across siloed systems is a losing battle. Your private and public cloud environments all have different tools, capabilities, and processes for securing data. However, typical data management doesn't let IT and information security teams apply blanket policies across these environments. Additionally, your IT team is likely forced to work within the confines of legacy systems and proprietary lock-in.

In an effort to grant access to data while working under these conditions, your IT team must create and manage multiple copies of the same data sets and stay ahead of the barrage of access requests. This is complex, manual, and costly. Meanwhile, your IT team still needs to maintain compliance. The only way to do so effectively is to restrict access to the valuable data that users need. The results: delayed business insight, increased operational overhead, and blind spots that only create greater security risks.

If you want to give more users the benefits of analytics and machine learning for better decision making—and to do so securely with the appropriate restrictions as to the data they can see and use—you need a consistent, shared governance and security model that enables a variety of analytics across your private and multiple public cloud environments.

ABOUT CLOUDERA
At Cloudera, we believe that data can make what is impossible today, possible tomorrow. We empower people to transform complex data into clear and actionable insights. Cloudera delivers an enterprise data cloud for any data, anywhere, from the Edge to AI. Powered by the relentless innovation of the open source community, Cloudera advances digital transformation for the world's largest enterprises.

LEARN MORE AT cloudera.com



sponsored content

How to make databases "invisible"?


By Krishna Kattumadam, VP Engineering, Solutions, Nutanix

WHAT IF YOUR ORGANIZATION'S DATABASES JUST worked, with all the complexity and tedium of database administration and maintenance hidden from view? Let's explore what invisible databases might mean in a real-world business environment.

Have you ever:
• Needed to quickly restore a multi-terabyte operational database to a point in time and recover the customer's business from crippling data corruption?
• Received an urgent request from a business user for a copy of a critical Oracle® database while your Oracle database administrator (DBA) was on vacation and there was no more free space on the storage subsystem to host this copy?
• Had to write and maintain complex operational scripts to manage the lifecycle of more than one database type?
• Been asked to support and manage the lifecycle of thousands of test and development database copies with no additional DBAs?

Now imagine that you can:
• Provision any commercial or open-source relational database management system (RDBMS), NoSQL, or in-memory database in minutes without complex scripts.
• Keep database and OS software up to date by automatically applying patches in minutes.
• Rapidly spin up copies of a multi-terabyte database while using almost no additional storage space.
• Provision and manage databases on the public cloud of your choice.

If you answered yes to any of the questions in the first list, or considered the possibilities in the second, then you are already thinking about invisible database operations and the Database-as-a-Service (DBaaS) phenomenon that is sweeping enterprise IT. DBaaS is a cloud computing operating model, typically provided by public cloud vendors in a managed service setting, that offers access to databases without requiring users to perform complex operations such as installing, configuring, and maintaining complex infrastructure and database software.

CONSIDER THE ISSUES
Enterprise organizations are grappling with an exponential increase in complexity and intricate interdependencies between applications, databases, and external services. Modern and legacy business applications are built on multiple database types, each with its own set of complications and operational needs, requiring DBAs with specific skills to manage them. Traditional IT approaches that use legacy infrastructure and dedicated teams to support these database infrastructure layers are too slow to meet the mission-critical initiatives of the business. These traditional approaches have led to siloed operations, inefficiencies, and duplicated resources and processes, thus increasing the cost of operations.

As a result, these organizations are experiencing tremendous pressure to reduce costs and are looking for innovative ways to make IT operations and infrastructure more efficient, agile, and responsive. Migrating all applications and data to the public cloud may look like an easy answer from a 10,000-foot perspective, but the on-the-ground reality of implementing these fundamental changes is very difficult.

It is a well-known fact that public cloud adoption has grown dramatically in the past few years, promising operational efficiencies. However, thousands of organizations in industries such as government, finance, and healthcare simply can't move their entire data estates to a public cloud service because of strict data sovereignty, security, compliance, and data gravity issues. These organizations have been left desperately searching for a better solution to achieve the same operational and business benefits promised by the public cloud providers.

COMPLEXITY BEGINS WITH INFRASTRUCTURE
For much of the 1990s and 2000s, legacy infrastructure vendors responded to on-premises needs by wiring together disparate components such as dedicated storage arrays, Fibre Channel SAN, bare-metal compute, and so on to offer compute and storage infrastructure stacks with cloud-like control plane software bolted on top. They sold and supported these stacks under a variety of brand names and business structures ranging from partner-owned and managed fulfillment to joint coalitions between infrastructure giants.

These arrangements, which were very popular in the mid-2000s, solved some of the IT pain in the areas of procurement, support, and operational complexity. However, managing each layer individually with siloed teams using poorly integrated management tools continued to hamper progress, especially when the layers were governed under separate IT policies that didn't always align with the enterprise's overall business goals.

WE USUALLY FEEL THE EFFECTS OF INFRASTRUCTURE COMPLEXITY AT THE "TOP" OF THE STACK
Enterprises also spend enormous amounts of time and resources to manage multiple database types (from commercial to open source to cloud native), database sizes, and availability and performance SLAs. These organizations have developed critical business applications and services on top of their databases, and database administrators must then employ separate processes and scripts for maintaining them. For example, an enterprise running Microsoft® SQL, Oracle®, and PostgreSQL may have three different database administrator teams who navigate the expensive and complex database licensing rules and own and manage the end-to-end lifecycle for these databases at enormous operational cost.




In response, many organizations are consolidating their database estates into a smaller set of database types or re-platforming to open-source databases such as PostgreSQL, and some are even rewriting and redesigning their business and consumer-facing applications to be cloud native and database agnostic.

DIGITAL TRANSFORMATION
Enterprises are on a journey to digitally transform their businesses and derive faster insights from the vast amount of structured and unstructured data that they collect; such data analysis is far more important to them than worrying about how data is stored, retrieved, and backed up. This mindset has given rise to point-and-click, pay-as-you-grow infrastructure and services such as IaaS and platform services such as DBaaS. Public cloud providers anticipated this trend and launched successful services such as AWS® RDS and Azure® SQL. These services by themselves do not eliminate database software acquisition and licensing costs, but they do reduce a significant portion of the operational costs, mainly through simplification, abstraction, and managed services. Meanwhile, on-premises and private cloud software and hardware providers continued to produce and sell complex pieces of software and hardware that were difficult to integrate and maintain until the mid-2000s, when those delivering a simplified customer and user experience began to push them out.

Converged Infrastructure (CI) triggered an industry inflection point that promised purchase, consumption, and operational simplification and agility in the datacenter. This approach brought these large infrastructure software and hardware vendors and their products and support organizations together under a joint business contract to benefit customers.

The Nutanix approach was to "hyperconverge" compute, storage, and virtualization using pure software on commodity x86 servers and an intuitive control plane to make IT infrastructure management so simple that it becomes invisible. This bold vision, laser-focused execution, and a world-class support organization are why 19,000+ customers ranging from small mom-and-pop stores to the largest Global 2000 companies trust the Nutanix Cloud Platform to modernize their datacenter infrastructure and database operations to achieve cloud-like operational and cost efficiencies.

NUTANIX ERA
Nutanix Era™ is database lifecycle management software developed using the same design principles that made the Nutanix Cloud Platform simple and elegant. Era runs on the Nutanix Cloud Platform and empowers you to achieve the invisible databases we imagined at the beginning. It unifies and simplifies database operations such as provisioning, copy/clone, backup/restore, and database patching for a wide range of commercial and open-source database engines such as Oracle, Oracle RAC, MS SQL AG, PostgreSQL, EDB®, MySQL™, MariaDB®, SAP® HANA, and MongoDB®.

Era can manage databases deployed on on-premises infrastructure and on a public cloud of your choice (AWS available now, Azure available in early 2022), making it a true multicloud solution ready to modernize your database estate. Era includes integration with ServiceNow® to assist with database ticket requests.

You can now build a DBaaS offering on your own terms, in your datacenter, and using a cloud-like consumption model with our partner HPE GreenLake. This solution provides a viable option for enterprises that have strict data governance requirements.

Nutanix recently surveyed Era customers and learned the following:
• "We have been able to simplify database deployment across multiple database engines. We tested with MySQL and Microsoft SQL. With MySQL, you can use the APIs to automate deployment."
• "Using this solution has made cloning databases much simpler and much faster. Previously, it would take us two to two and a half hours to do a clone or a snapshot. Now it's five, six minutes. That's a lot of time that we save."
• "When it comes to data management, I can see how Era will help to reduce the time spent on operational database workloads."

Nutanix Era's unified one-click approach streamlines complex database tasks, making them delightfully simple—and therefore "invisible." Application developers and users with no database knowledge can now handle these operations in a self-service manner. Watch all the administrative burden of your organization's databases simply disappear.

Experience Nutanix Era today by visiting https://www.nutanix.com/test-era and taking a Test Drive.
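One of the customers quoted above notes that Era's APIs can be used to automate deployment. As a rough illustration of scripting against a database-lifecycle API (the payload fields and endpoint mentioned below are hypothetical placeholders, not Era's documented interface; consult the vendor's API reference for the real schema), a provisioning request might be assembled like this:

```python
import json

# Hypothetical payload builder for a database-provisioning API call.
# The field names here are illustrative placeholders only.
def build_provision_request(engine, db_name, compute_profile, clones=0):
    """Assemble a JSON body for a one-click-style provisioning call."""
    return {
        "databaseType": engine,            # e.g., "postgres_database"
        "name": db_name,
        "computeProfile": compute_profile, # sizing template for the VM
        "clones": clones,                  # downstream copies to create
    }

body = json.dumps(build_provision_request("postgres_database", "orders-db", "small"))
print(body)
# A real automation script would then POST `body`, with authentication
# headers, to the management server's provisioning endpoint.
```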

Nutanix
www.nutanix.com



sponsored content

Enabling DataOps and the Growing Role of Data Integration in a Multi-Cloud World
By Dan Potter, VP Product Marketing, Data Integration, Qlik

THERE'S A GROWING NEED FOR organizations to unite and govern data across multiple clouds, gain actionable insights instantaneously, and embark on new ways to augment intelligence in a more collaborative manner. These business trends are forcing a new approach to data integration with an eye to improving business analytics.

It's therefore not surprising that the analytics world has also gone through some very interesting phases of late: the Internet of Things (IoT), streaming data from operational systems, artificial intelligence (AI), machine learning (ML), and real-time streaming analytics have all come to the fore. This in turn has changed the way in which data is stored and managed. Consequently, cloud data warehousing, data lakes, and streaming infrastructures have all risen in response to these new business demands and the requirement for alternative styles of analytics.

However, with the pace of innovation accelerating, the demand for digital transformation and the move to the cloud have dramatically changed the landscape. With the many different forms of data and the multitude of technologies in place to transform it, organizations are struggling to establish and maintain the skill sets that address this complexity. Our advice is therefore that organizations should concentrate on deriving insights from data, as opposed to focusing on the data integration itself. This can be achieved by moving to a DataOps model. DataOps helps you overcome these market challenges, accelerates cycle times, and has the potential to transform your organization in the following ways:

1. A boost in data literacy
Data literacy is quickly becoming a strategic initiative for CIOs, CDOs, and other C-level executives. Half the battle is closing the skills gap. The other half is making trusted data as easy as possible to access, use, and mine for insights—for the entire spectrum of users. Modern data integration and management can rapidly accelerate this process, centralizing control while democratizing access.

2. Faster, more agile analytic processes
To become truly data-driven, agility and real-time insights are key. DataOps allows you to move data as it's changed, in real time. Automating manual tasks reduces analytics cycle time, freeing resources for higher-level focus. And flexible integration solutions let IT change a source or target without disrupting the infrastructure—so you can stay agile as technology evolves.

3. Data democratization
DataOps lets you make vetted, governed data universally accessible. Instead of limiting analytical insights to data scientists, you can extend them to a broad set of line-of-business users with focused expertise. And this includes users at the front lines and edges of the business—via mobile devices, IoT, and at any point of customer interaction—helping optimize operations and customer experiences.

4. Continuous governance throughout the data delivery lifecycle
Smart data catalogs, data indexes, and other tools enable IT to design a modern governance process with the access controls needed to avoid data-decision variability and chaos. And IT can achieve scale and agility by leaving data in lakes, warehouses, and other repositories on-premises and in the cloud. This gives users timely access to enterprise-ready data while layering in quality assurance with role and responsibility designations that bring more of the right data to the right people at the right time.

5. Fuller collaboration
DataOps makes it faster and easier for data scientists and business analysts to join forces—and for discrete business units to collaborate around the analysis of data and sharing of results. In fact, DataOps is a great vehicle for creating the long-sought-after business/IT alignment that tends to be elusive as companies grow. And unlike traditional task forces that tackle niche issues, DataOps affects the entire organization by delivering valuable data to every business user when they need it, in a consumable and governed way.

Data-driven transformation is a mandate for today's CIOs and CDOs, and it's likely to remain so for the foreseeable future. Consequently, modern multi-cloud data architectures together with new services powered by DataOps are making it possible to make that future a reality.

ABOUT QLIK
Qlik's vision is a data-literate world, where everyone can use data and analytics to improve decision-making and solve their most challenging problems. Qlik provides an end-to-end, real-time data integration and analytics cloud platform to close the gaps between data, insights, and action. By transforming data into Active Intelligence, businesses can drive better decisions, improve revenue and profitability, and optimize customer relationships. Qlik does business in more than 100 countries and serves over 50,000 customers around the world. Learn more at www.qlik.com

Qlik
www.qlik.com
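The point about moving data as it's changed, in real time, is essentially change data capture (CDC). As a minimal, vendor-neutral sketch (the event shape here is invented for illustration, not any Qlik wire format), a stream of insert/update/delete events can be replayed to keep a target copy in sync:

```python
# Minimal change-data-capture sketch: replay change events against a
# target table held as a dict keyed by primary key. The event shape
# ("op", "key", "row") is invented for illustration.
def apply_changes(target, events):
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            target[key] = event["row"]   # upsert the latest row image
        elif op == "delete":
            target.pop(key, None)        # remove the row if present
    return target

table = {1: {"status": "new"}}
events = [
    {"op": "update", "key": 1, "row": {"status": "shipped"}},
    {"op": "insert", "key": 2, "row": {"status": "new"}},
    {"op": "delete", "key": 1},
]
print(apply_changes(table, events))  # {2: {'status': 'new'}}
```

Because only the change events move, not full reloads, the target stays current without the batch windows that traditional ETL requires.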



sponsored content

Enterprises Must Adopt Cloud Faster to Accelerate Digital Transformation

BUT DEEP DATA CHALLENGES DRAMATICALLY SLOW THEM DOWN
Today, data availability and compliance challenges lay waste to far too many leading enterprises' cloud ambitions. As markets transform rapidly, every company must deliver innovative software faster while complying with increasingly stringent privacy laws. Meanwhile, information technology leaders such as Microsoft, Amazon, Apple, and Google are setting a challenging pace with digital products and services in several markets and sophisticated cloud platforms so others can follow suit. The result is that while more traditional enterprises scramble to modernize legacy applications, migrate to the cloud, and transition from waterfall to DevOps and CI/CD to keep up, cloud-first companies deploy digital innovations globally at breakneck speeds.

DATA IS THE LAST AUTOMATION FRONTIER
Enterprises everywhere are redoubling digital transformation efforts. However, despite automated cloud compute, storage, networks, and code, snail-paced manual efforts to protect, provision, and refresh enormous amounts of sensitive data locked in legacy systems exacerbate privacy, productivity, and latency risks. Before Delphix, one customer with 30,000 applications across a multi-generational landscape took weeks to deliver a new data environment to development and AI teams. Typically, production environments spawned seven to nine additional non-production ones, while a petabyte of new data entered their systems each week. Manual efforts to protect sensitive data across all of that left petabytes of data at risk.

Moving all those applications and all that attendant data into the cloud is complex. Moreover, re-platforming, testing, rehearsals, etc., require data environments; without that data, budget and schedule become a problem. DevOps teams must effectively secure data on-prem and in the cloud to mitigate compliance and breach risks and eliminate the data delivery bottlenecks slowing adoption to capitalize on cloud speed and bring new products and services to market faster:

ESSENTIAL DEVOPS DATA SUPERPOWERS
1. DELIVER DATA-READY ENVIRONMENTS ACROSS THE MULTI-CLOUD. Automatically ingest and sync data from all sources and efficiently deliver virtualized data across private and public clouds.
2. AUTOMATE CLOUD DATA COMPLIANCE. Find, profile, and irreversibly protect all sensitive data from all sources to deliver realistic, consistently masked copies and comply with data privacy laws such as GDPR and CCPA—while preserving referential integrity.
3. BOOST DATA QUALITY. Deliver and refresh production-like environments faster during migration and in new cloud environments with real data. Take advantage of self-service, on-demand, and API-driven access to eliminate stale data and limit synthetic and subsetted data use.
4. HARNESS DEVOPS APIS. Refresh, rewind, and integrate data to enable CI/CD. Enable Dev, Test, QA, and analysts to leverage precise and granular data for migration and post-migration use cases in the cloud.
5. VERSION DATA LIKE CODE. Uniquely enable data sharing with independent versions of masked pre-prod data. Leverage for migration testing, rehearsals, and cut-over. Enable parallel app dev for updates, cloud-first releases, and data sync between clouds to accelerate innovation.
6. VIRTUALIZE DATA TO SPEED UP. Shrink migration project timelines and efforts, and the data footprint and cost, by up to 10x. Move prod environments only to the cloud and rebuild the 7 to 9 downstream ones virtually to stand up non-prod for dev/test, cloud-based AI, etc.
7. IMMUTABLE DATA: BOOST RANSOMWARE DEFENSES. Take advantage of an immutable data time machine built on a write-once, read-many architecture to recover data from any point in time easily.

HARNESS THE MULTI-CLOUD TODAY
Combining data compliance with on-demand delivery helps leaders like BNP Paribas stay ahead. Its CIO, Bernard Gavgani, believes data is part of a company's DNA and focuses on innovative use of it to increase productivity and performance. At the center of his operation, one team rationalizes and secures retail and corporate payments data to keep BNPP's instant payment innovation program and rollout on track. The bank radically accelerated the assured delivery of privacy regulation-compliant environments to accelerate cloud adoption. As a result, development and testing teams across the globe increased AI projects going into production three-fold, slashed the time to launch an open API marketplace, improved software quality, and reduced downtime. Find out more about what BNPP's CIO, Bernard Gavgani, thinks at https://www.delphix.com/blog/inside-bnp-paribas-digital-banking-innovation-cloud-data-ai

Delphix
www.delphix.com
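Masking data "while preserving referential integrity," as called for above, generally means that equal inputs must always mask to equal outputs, so that joins across tables still work after masking. One common way to get that property (sketched here in generic Python; this is not Delphix's actual algorithm) is keyed deterministic hashing:

```python
import hashlib
import hmac

SECRET = b"masking-key"  # in practice a managed secret, never a literal

def mask(value):
    """Deterministically pseudonymize a value: equal inputs always yield
    equal outputs, so foreign-key joins still match after masking."""
    digest = hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()
    return "cust_" + digest[:12]

customers = [{"id": "C1001", "name": "Ada"}]
orders = [{"order_no": 1, "customer_id": "C1001"}]

# Mask the key consistently wherever it appears.
for c in customers:
    c["id"] = mask(c["id"])
for o in orders:
    o["customer_id"] = mask(o["customer_id"])

# The join key still matches after masking.
assert customers[0]["id"] == orders[0]["customer_id"]
```

Because the mapping is keyed and one-way, the masked copies stay realistic enough for dev/test joins while the original identifiers are irreversibly protected.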



sponsored content

Yes, You Do Need an Oracle Estate Plan


By Simon Pane, Principal Consultant, Oracle ACE

IF YOUR ORGANIZATION IS LIKE MANY, your focus has been on keeping your Oracle databases running without outages so that daily operations run without interruption. But in the midst of keeping everything running, have you been able to think strategically about the future and the best ways to keep your entire estate running optimally in a hybrid, multi-cloud environment? And what does "optimal" mean for your organization?

Now is an ideal time for organizations to look at how to modernize, upgrade, re-home, or even re-platform their existing Oracle Databases both to save money and leverage new capabilities. The right decisions and technical know-how can turn a short-term investment into long-term savings.

MYRIAD CLOUD OPTIONS: WHAT IS RIGHT FOR YOU?
You have a number of strong cloud-based options to consider to realize cost savings and performance benefits by upgrading your current on-premises Oracle Databases.

ORACLE CLOUD INFRASTRUCTURE (OCI)
OCI, Oracle's second-generation cloud, is a feature-rich cloud with extensive options for Oracle Database solutions. It has gained parity with the other major cloud vendors in terms of scope and broadness of offerings.

OCI is a high-performance and cost-effective cloud for running Oracle Databases for many reasons, including compatibility and licensing options. A wide variety of configurations are possible, from self-managed solutions such as custom-installed Oracle Database running on IaaS VMs to the fully automated Autonomous Database service. Any number of Real Application Clusters nodes and nearly any disaster recovery configurations are possible. OCI adds more configuration flexibility for Oracle Databases than any other cloud.

THIRD-PARTY CLOUDS
You can run Oracle Databases on supported third-party clouds, each with their own benefits and challenges depending on the number of databases, server size, and the desired level of service management:
• Microsoft Azure: Running Oracle Databases on Azure VMs is supported but can be cost-prohibitive. There is a solution—the Azure-OCI interconnect, which maximizes usage of Azure services while running Oracle workloads in OCI.
• Amazon Web Services: With a supported option on IaaS using a Bring Your Own License (BYOL) model, Amazon Relational Database Service (RDS) for Oracle provides a managed DB service with both a BYOL and license-included pricing model.
• Google Cloud: Google's Bare Metal Solution (BMS) leverages Google Cloud's rich partner ecosystem. Google takes care of provisioning the infrastructure and maintaining the interconnect while minimizing Oracle licensing risks.
Ensure that Oracle licensing and support requirements are fully understood when deploying Oracle software to third-party clouds.

UPGRADE, CONSOLIDATE OR DOWNGRADE?
If you're considering an upgrade, Oracle's process has improved greatly with new upgrade tooling. Under their new release model, the long-term release Oracle Database 19c is supported through April 2024, or through April 2027 with extended support, with unlimited license agreement options also available.

If consolidation is a better fit for you, you can reduce license requirements or save on administrative aspects of database management, depending on the server source. With Oracle's comprehensive multi-tenant option, the number of tenant pluggable databases (PDBs) allowed without a license has increased, letting customers consolidate small amounts with low risk.

Alternatively, savings can be realized by finding a compromise between downgrading editions and the associated reduced capabilities. For example, some customers can tolerate a modest and quantified amount of potential disaster recovery loss (that a downgrade may bring) as their recovery point objective (RPO) in extreme circumstances and can easily get by with a business-acceptable data recovery solution at reduced cost.

WHICH OPTIONS ARE BEST?
All of these options have complexities and nuances. This means organizations must make smart decisions to ensure a positive impact on systems and overall business for the long term.

Pythian's Oracle Estate Planning Guide provides further detail for you on this important topic. Grab your copy today.

The database and cloud experts at Pythian have extensive experience and deep technical expertise to help your organization generate value—without the stress and time of hiring in-house. Contact us today at [email protected].

Pythian
www.pythian.com



sponsored content

Data Lake vs. Data Warehouse:


What’s the Future-proof Solution for Your Business?
DATA WAREHOUSES and data lakes represent two leading solutions for enterprise data management in 2021. Data warehouses were born in the 1990s as on-premises solutions, and in recent years have re-emerged in the cloud to support digital transformation. Data lakes were also initially built to run on-premises, on Apache Hadoop, in the early 2000s. But with the rise of secure, resilient public cloud storage offerings from the likes of Amazon, Microsoft, and Google, they too have found new footing in the cloud.

Data warehouses and data lakes share some overlapping features and use cases, and both have embraced modern approaches to data management by operating in the cloud. However, there are fundamental differences in their data management philosophies, design characteristics, and ideal use conditions that should be considered as you develop your data management strategy.

WHAT'S THE DIFFERENCE?
A data warehouse is a data management system that provides business intelligence for structured operational data with clear and defined use cases, usually from a relational database management system (RDBMS).

Data warehouses follow a schema-on-write data model; source data must fit into a predefined structure (schema) before it can enter the warehouse, where it is then connected to downstream analytical tools that support BI initiatives. This is usually accomplished through an ETL (extract-transform-load) process. This connection between data ingress and the ETL process means that storage and compute resources are tightly coupled. If you want to ingest more data into the warehouse, you need to do more ETL, which requires more computation.

The data warehouse is all about functionality and performance. These functions are all essential, but the data warehouse paradigm of schema-on-write, tightly coupled storage/compute, and reliance on predefined use cases makes data warehouses a sub-optimal choice for big, multi-structured data or multi-model capabilities.

A data lake, on the other hand, is a centralized repository where multi-structured data from a variety of sources can be stored in its raw format. This encourages a schema-on-read model where data is aggregated or transformed at query time. Bypassing the ETL process means you can ingest large volumes of data into your data lake with less time, cost, and complexity. Data lakes provide a less restrictive philosophy that's more suited to the demands of a big data world: schema-on-read, loosely coupled storage/compute, and flexible use cases that combine to drive innovation by reducing the time, cost, and complexity of data management.

But data lake solutions don't inherently include analytic features. They're often combined with other cloud-based services and downstream software tools to deliver data indexing, transformation, querying, and analytics functionality. Without warehouse functionality, governance, or integration with known ETL or analytics tools, the data lake can become a "data swamp"—a murky mire of data that's impossible to sift through. It accumulates and sits stagnant because users don't know how to effectively access or glean insights from the data. Smaller datasets are duplicated and pushed to end-user tools for analytics, creating silos.

Progress is being made, though. Today's data lakes are built on cloud object storage and can be activated directly to support multi-dimensional use cases including full-text search, relational queries, and machine learning.

In fact, Gartner's Hype Cycle for Data Management, 2021 reveals that data lake technologies are poised to exit the Trough of Disillusionment and enter the Slope of Enlightenment. According to Gartner, "A data lake, when designed properly, can provision data for the diverse exploration requirements of multiple user types and use cases… Today's data lake is on cloud, and it supports multiple analytics techniques (not just data science)."

CRAFTING A MODERN DATA MANAGEMENT STRATEGY
Neither a data lake nor a data warehouse on its own comprises a Data & Analytics Strategy—but both solutions can be part of one. Enterprises continue to rely on a variety of solutions to meet their needs, including RDBMS, operational data stores, data warehouses and marts, Hadoop clusters, and data lakes.

While most of these solutions have been around long enough that their shortcomings are well-known, newer alternatives like data lakes are still reaching maturity and showing their potential for the future of scalable, flexible, and resilient data management in the cloud.

Across the board, a modern data management solution must be cloud-native, simple to manage, and interconnected with known analytics tools to deliver value.

KNOW BETTER™ WITH CHAOSSEARCH
At ChaosSearch, our goal is to help customers prepare for the future state of enterprise data management by bridging the gap between data lakes and data warehouses. ChaosSearch activates the data lake for analytics: we publish analytic APIs that a data warehouse would also provide, indexing data within your cloud storage environment, rendering it fully searchable, and enabling analytics at scale. With its revolutionary approach delivered in a fully managed service, ChaosSearch overcomes the cost and complexity of competitive solutions, delivering unlimited scalability, industry-leading resiliency, and massive time and cost savings.

ChaosSearch
www.chaossearch.io
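The schema-on-write versus schema-on-read contrast described above can be made concrete in a few lines. In this illustrative sketch (not ChaosSearch code), the warehouse path validates records against a fixed schema at ingest, while the lake path stores records of any shape and imposes structure only at query time:

```python
SCHEMA = {"id": int, "amount": float}

def write_to_warehouse(table, record):
    """Schema-on-write: reject records that don't fit the schema at ingest."""
    for field, ftype in SCHEMA.items():
        if not isinstance(record.get(field), ftype):
            raise ValueError(f"record does not match schema: {field}")
    table.append(record)

def query_lake(raw_records, field):
    """Schema-on-read: store anything, impose structure only at query time."""
    return [r[field] for r in raw_records if field in r]

warehouse = []
write_to_warehouse(warehouse, {"id": 1, "amount": 9.5})  # conforming row

# Mixed shapes are fine in the lake; structure is applied when queried.
lake = [{"id": 1, "amount": 9.5}, {"log": "free-form event"}]
print(query_lake(lake, "amount"))  # [9.5]
```

The trade-off falls out directly: the warehouse pays the validation (ETL) cost up front in exchange for predictable queries, while the lake defers that cost to read time in exchange for cheap, flexible ingest.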



sponsored content

5 Data Essentials in a Hybrid/Multi-Cloud World


TODAY’S DIGITAL WORLD RUNS problems it needs to solve and resources the implications of data are fully
on data. Companies rely on their and data needed to achieve the desired understood, they should be turned
unique data to serve their customers outcomes. This data can range from into actionable insights.
and employees and plan critical sales, traditional, structured data such as Prashanth Southekal, a business
marketing, and business operations. corporate and financial records residing analytics author and professor and
But unfortunately, this data is not in a data warehouse, to unstructured head of DBP-Institute (Data for
always easily packaged or utilized big data sources such as social media, Business Performance) speaking at
because it is not easily accessible. IoT devices, and website data mining. MIT’s 14th annual CDOIQ Virtual
To get the most out of their data, Finally, it is crucial to develop policies Symposium, stated that, “[Success]
an enterprise must have the ability and rules for handling all of the data. isn’t about data collection, it’s about
to easily access it, regardless of its data management and insight.” For
source, type, or location, and connect 3. Recognize and utilize organizations to be successful, they
analytics to provide truly useful and existing data technology must integrate storytelling into their
actionable business insights. Since data After the enterprise has identified thinking about analytics and insights.
is generated from many different sources its goals and business data and defined Bringing data to leaders in a form it
in many different formats, organizations policies and rules, it must exploit can be interpreted and understood will
must have plans for access. How can available data-centric technologies promote meaningful insights because
organizations effectively manage this to ensure the asymmetrical flow of it humanizes the data. Without this
increasing amount of data? Five key data between on-premises and hybrid key step, the effort of bringing together
actions must be taken to sustainably business networks and the cloud. and managing the data goes to waste.
and effectively manage business data: This means the ability to gather,
manage, integrate, and analyze data EMBRACE A PROACTIVE APPROACH
1. Define clear goals needs to become a core competency Businesses need data not just to
First, each business must define its for organizations. Understanding survive but to thrive. However, many
mission-critical data goals. These goals data architectures like logical data organizations struggle with the pace
will determine all the decisions around warehousing, data fabric, and data of data growth, the collection and
data management. An organization’s mesh, and the products that support management of data, and the process
goals could include increasing sales data strategies are critical for supporting of extracting meaningful insights from
effectiveness, closing additional cloud and hybrid data initiatives. the data. Proactively addressing these
deals, gaining competitive insight, or data-related challenges is the only way
gathering details for a SWOT analysis. 4. Plan for changes in for businesses to gain a competitive
Consider the example of a healthcare data and technology advantage, drive expansion, and create
organization whose goal is to improve The optimization and consumption revenue growth. As organizations
patient experience and quality of care of mission-critical business data must be increasingly rely on modern technologies,
while lowering costs. This organization monitored regularly to ensure success. data is destined to become continuously
will need to combine clinical data Businesses need to understand how the more complex as the amount and types
and financial information into a single data is being used now and how that of data available continue to increase.
environment. In this scenario, as the will likely change over time. As new The needs seem obvious, but how
organization builds the environment data and digital technologies become organizations manage them is not.
from multiple data sources, it will available, organizations need to become
have to deal with specific and complex more agile to quickly react to change. Continue the conversation: Visit us at
rules around data privacy and how www.cdata.com to find out more about
that data is managed due to HIPAA 5. Analyze and action data how we can help you to identify and
and other regulatory requirements. Access to accurate data is only overcome your biggest data challenges.
one part of the equation. Combining
2. Identify relevant data qualitative and quantitative measures
Once an organization has defined to understand the data fully requires CData Software
its goals, it must also define the specific specialized expertise. As soon as www.cdata.com
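The healthcare example above (combining clinical and financial data while respecting privacy rules) can be sketched as a keyed merge that withholds fields designated sensitive. The field names and deny-list here are purely illustrative; real HIPAA compliance involves far more than field filtering:

```python
SENSITIVE = {"ssn"}  # illustrative deny-list of fields to withhold

def merge_patient_data(clinical, financial):
    """Join two record sets on patient_id, dropping sensitive fields."""
    by_id = {
        r["patient_id"]: {k: v for k, v in r.items() if k not in SENSITIVE}
        for r in clinical
    }
    merged = []
    for rec in financial:
        row = by_id.get(rec["patient_id"])
        if row is not None:
            combined = dict(row)
            combined.update({k: v for k, v in rec.items() if k not in SENSITIVE})
            merged.append(combined)
    return merged

clinical = [{"patient_id": "P1", "diagnosis": "flu", "ssn": "000-00-0000"}]
financial = [{"patient_id": "P1", "balance": 120.0}]
print(merge_patient_data(clinical, financial))
# [{'patient_id': 'P1', 'diagnosis': 'flu', 'balance': 120.0}]
```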



sponsored content

How Does a Logical Data Fabric Support Organizations in Their Journey Towards a "Hybrid and Multi-Cloud" World?

THE TERM DATA FABRIC IS NOW WIDELY UNDERSTOOD and used. It first appeared around six years ago. It's hard to tell who first coined the term, with many sources pointing towards a 2016 whitepaper from NetApp. That said, I personally prefer the succinct definition provided by leading industry analyst Gartner, who defines data fabric as a "design concept that serves as an integrated layer (fabric) of data and connecting processes. A data fabric utilizes continuous analytics over existing, discoverable and inferenced metadata assets to support the design, deployment and utilization of integrated and reusable data across all environments, including hybrid and multi-cloud platforms."

As we know from various surveys and studies, including the Denodo Cloud Survey 2021, cloud adoption is on the rise, with a 25% increase year-over-year in advanced cloud workloads, indicating that more complex workloads are moving to the cloud and that COVID-19 has perhaps driven that increase. The same survey also saw that the hybrid-cloud model remains in the lead, with more than one-third of users leveraging that architecture. Private cloud saw a good boost, with nearly 25% of workloads still being run on-premises. A logical data fabric is an architectural style which allows users to access data in a virtual manner. When migrating to a cloud, we see the implementation of the logical data fabric as an important step, as it helps minimize risk and adds security during the migration phase. Post-migration, the logical data fabric is equally important: especially in hybrid and multi-cloud scenarios, it provides the logical data access layer through data virtualization.

Data virtualization is at the core of this architecture. Data virtualization as a technology allows you to connect data together across your enterprise, whether it be structured or unstructured data, real-time or historic, on-premises or in a cloud environment. Data virtualization is protocol agnostic as a virtual consumption layer, allowing data consumers to use open standards-based connections (JDBC, ODBC, or even GraphQL) and see the data as if it is from a single relational data source.

Apart from virtual data access, what characteristics are critical for a logical data fabric? According to a recent report by TDWI, a logical data fabric must do the following things.
1) Integrate data across multiple cloud environments—typically we see the use of data virtualization as the only credible way to achieve this.
2) Automate manual tasks using augmented intelligence—this is probably the area of the most rapid technical development with the evolution of AI and ML features
3) Boost performance of analytics with rapid data delivery—intelligent features such as smart query acceleration support the need for highly performant data delivery.
4) Support data discovery and data science initiatives—key to the success of a logical data fabric is the ability for discovery. The implementation of the logical data layer facilitates easy data discovery across cloud environments, enhancing the ease of data discovery for data scientists and business users.
5) Analyze across data at rest and data in motion—traditionally BI was all about data at rest; however, over 20% of respondents to TDWI surveys suggest machine data and IoT are becoming more and more important. This means that any data fabric needs to be able to integrate both data at rest as well as real-time data feeds from environments such as social media or IoT data.
6) Catalog all data for data discovery, lineage, and associations—as data discovery gets more complex in a hybrid/multi-cloud world, so do lineage and associations, and it is essential in a logical data fabric to have a holistic view of the data. Smart data catalogs are now at the forefront and have the ability to show metadata and associations as well as to include user feedback, for example by way of votes or notes. Data catalogs should be a tool that any data user can access with their persona or role to provide access to the data they are interested in, including the ability for the solution to use AI to make recommendations for data users in much the same way Netflix and other streaming platforms do for their consumers.

As digital transformation projects are taking place at an accelerated rate in a majority of organizations, with cloud migration and multi/hybrid cloud deployment being an essential part of the digital transformation, now is the time to put in place a data architecture such as a logical data fabric that can support hybrid and multi-cloud environments with lowest risk,
in data integration tools such as Denodo. For example, extremely high ROI and is future proof.
the Logical data fabric should be able to auto-scale
in cloud environments based on predictive load and Denodo
compute volumes. www.denodo.com
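The consumption-layer idea above can be made concrete with a minimal Python sketch. This is illustrative only, not Denodo's API: the "orders" table and CRM service are invented sources, and the virtual view joins them at query time so a consumer sees one logical table while no data is replicated.

```python
# Illustrative sketch of a virtual consumption layer; not Denodo's API.
# Two invented "sources": a relational-style orders table and a
# document-style CRM service.

orders_db = [  # e.g., rows in an on-premise RDBMS
    {"order_id": 1, "customer_id": "c1", "total": 120.0},
    {"order_id": 2, "customer_id": "c2", "total": 80.0},
]
crm_api = {  # e.g., JSON documents behind a cloud API
    "c1": {"name": "Acme", "region": "EMEA"},
    "c2": {"name": "Globex", "region": "APAC"},
}

def virtual_orders_view():
    """Join both sources on demand; nothing is copied or stored."""
    for row in orders_db:
        customer = crm_api[row["customer_id"]]
        yield {
            "order_id": row["order_id"],
            "customer": customer["name"],
            "region": customer["region"],
            "total": row["total"],
        }

# A consumer queries the view as if it were a single relational table.
emea_orders = [r for r in virtual_orders_view() if r["region"] == "EMEA"]
```

A real virtual layer adds query pushdown, caching, and security on top of this pattern, but the essential design choice is the same: the join is computed against the live sources rather than against a replicated copy.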

DBTA.COM/BIGDATAQUARTERLY 21


BIG DATA COMPANIES DRIVING INNOVATION

Today, organizations need data-driven insights to advance decision making at all levels, and digital transformation is a key component of those efforts. Supporting data-driven insights and digital transformation takes an ever-growing range of services, products, and tools from forward-thinking companies that are working to help their customers deliver the right insights to the right people at the right time.

Whether helping to react faster to trends, create timely products and services, or enable rapid fraud detection, the availability of better access to fresh, high-quality data delivers competitive advantage. Some of the new approaches being embraced to help drive greater benefit from data are DevOps and DataOps, data quality and governance initiatives, hybrid and multi-cloud architectures, IoT and edge computing, and a range of next-gen databases.

The message has been clear for some time, but the last year and a half has provided even more evidence of the need to understand what is happening in real time and, if possible, even anticipate future events. Still, difficulties persist, which need to be overcome.

For example, a recent survey conducted by Flatfile found that more than three-quarters of respondents either "sometimes or often" run into problems onboarding data. Overall, 90% of respondents must transfer data from one system to another at some point. The survey revealed that data onboarding is increasingly routine, with 50% of respondents citing data onboarding as a daily activity. Another 28% said onboarding is done weekly and 22% reported that they have to onboard data multiple times per day.

Another survey conducted by Experian also revealed that 84% of respondents say there has been more demand for data insights in their organizations during the COVID-19 pandemic. The report found that most companies did not need to start brand-new data initiatives since data projects had been in the works for years, but had just lacked the prioritization to move forward strongly. With the greater focus on data, the Experian report pointed out, there is a need for greater investment in hiring talent with expertise in both data and data management tools, the use of more automated processes, the selection of the right technologies, and the development of data literacy programs.

Digital transformation is a goal steadily being embraced, according to a study of IT professionals produced by Unisphere Research and sponsored by Aerospike. The majority (78%) of respondents' organizations have digital transformation budgets. This number jumps to 94% among organizations with 5,000 or more employees. The top digital transformation projects being undertaken by organizations right now are cloud solutions (73%), BI and data analytics (55%), and cybersecurity (42%). In addition, a notable number of survey respondents reported IoT and machine learning/AI projects, 30% and 29%, respectively.

To support organizations in navigating through new challenges and a rapidly evolving big data ecosystem, Big Data Quarterly presents the 2021 "Big Data 50," a list of companies driving innovation and expanding what is possible in terms of collecting, storing, and extracting value from data. The list is wide-ranging, with some companies that are longtime industry leaders continuing to innovate at a rapid pace, and others that are newer arrivals impacting the data management and analytics scene.

We encourage you to go beyond the brief descriptions in this special report and explore these companies by visiting their websites. In addition, on the pages following the Big Data 50 list, under the Trailblazers header, executives share perspectives on their companies' unique approaches to driving innovation.

You can also gain more insight into important market trends about how data is being managed, consumed, and leveraged by accessing Unisphere Research's survey reports at www.unisphereresearch.com and an extensive library of white papers at www.dbta.com/DBTA-Downloads/WhitePapers.

22 BIG DATA QUARTERLY | FALL 2021



Accenture
www.accenture.com
Providing experience and specialized skills across more than 40 industries, Accenture offers strategy and consulting, interactive, technology, and operations services—all powered by its large network of advanced technology and intelligent operations centers.

Actian
www.actian.com
Through the deployment of innovative, enterprise-class, hybrid data products, fully managed cloud services, mobile and IoT edge data management, and industry solutions, Actian helps ensure that business-critical systems can analyze, transact, and connect at their very best—both on-premise and in the cloud.

Aerospike
https://aerospike.com
Enabling organizations to rapidly act on billions of transactions while reducing server footprint, Aerospike provides a real-time data platform that helps customers fight fraud, increase shopping cart size, deploy global digital payment networks, and deliver personalization for customers.

Alluxio
www.alluxio.io
Orchestrating data close to data analytics and AI/ML applications in any cloud across clusters, regions, and countries, Alluxio provides intelligent data tiering and data management to enable high performance for customers in financial services, high tech, retail, telecommunications, and pharmaceuticals industries.

Amazon Web Services
https://aws.amazon.com
In 2006, AWS began offering IT infrastructure services to businesses in the form of web services—now commonly known as cloud computing—and today, its infrastructure platform in the cloud powers hundreds of thousands of businesses in 190 countries around the world.

Cambridge Semantics
www.cambridgesemantics.com
A modern data management and enterprise analytics software company, Cambridge Semantics provides solutions that transform siloed data into enterprise-scale knowledge graphs to reveal previously hidden insights, fuel pervasive analytics, and make previously unanswerable questions answerable.

Cloudera
www.cloudera.com
Powered by the innovation of the open source community, Cloudera delivers an enterprise data cloud for any data, anywhere, from the edge to AI, helping to advance digital transformation for the world's largest enterprises.

Cockroach Labs
www.cockroachlabs.com
Founded by a team of engineers dedicated to building cutting-edge systems infrastructure, Cockroach Labs is the company behind CockroachDB, a cloud-native, distributed SQL database that provides next-level consistency, ultra-resilience, data locality, and massive scale to modern cloud applications.

Collibra
www.collibra.com
Accelerating trusted business outcomes by connecting the right data, insights, and algorithms to all data citizens, Collibra provides a cloud-based platform that connects IT and the business to build a data-driven culture for the digital enterprise.

Couchbase
www.couchbase.com
Combining the best of NoSQL with the power and familiarity of SQL to simplify the transition from mainframe and relational databases, Couchbase provides a modern cloud database that offers the robust capabilities required for business-critical applications.




Databricks
https://databricks.com
Built on a modern lakehouse architecture in the cloud, Databricks combines the best of data warehouses and data lakes to offer an open and unified platform for data and AI that is relied upon by more than 5,000 organizations worldwide.

DataKitchen
https://datakitchen.io
Offering an enterprise DataOps Platform that enables organizations to implement and manage an end-to-end DataOps program using tools they already own, DataKitchen helps to simplify complex toolchains, environments, and teams so that data analytics organizations can innovate, collaborate, and deliver on-demand insight.

DataStax
www.datastax.com
Helping organizations decrease total cost of ownership and accelerate their innovation speed, DataStax delivers DataStax Astra, an open, multi-cloud serverless database that provides Cassandra-as-a-service with pay-as-you-go data, simplified operations, and the freedom of multi-cloud and open source.

Denodo
www.denodo.com
A data virtualization leader providing agile, high-performance data integration, data abstraction, and real-time data services across a broad range of enterprise, cloud, big data, and unstructured data sources, Denodo helps customers achieve faster access to unified business information.

Dremio
www.dremio.com
Reimagining the data lake service, Dremio eliminates the need to copy and move data to proprietary data warehouses or create cubes, aggregation tables, and BI extracts, enabling flexibility and control for data architects and self-service for data consumers.

Franz
https://franz.com
An early innovator in AI and leading supplier of graph database technology, Franz provides AllegroGraph, a graph-based platform that unifies all data and siloed knowledge into an entity-event knowledge graph solution that can support big data analytics.

GigaSpaces
www.gigaspaces.com
An in-memory technology vendor that is driving enterprise digital transformation, GigaSpaces is relied upon by hundreds of tier-1 and Fortune-listed organizations and OEMs across financial services, retail, transportation, telecom, healthcare, and more.

Google Cloud
https://cloud.google.com
With distributed cloud solutions that provide consistency between public and private clouds, Google Cloud has a commitment to open source, multi-cloud, and hybrid cloud—allowing customers to use their data and run their apps in any environment.

GridGain
www.gridgain.com
GridGain Systems is a provider of enterprise-grade in-memory computing solutions powered by Apache Ignite, an open source in-memory computing platform that delivers speed, scalability, and real-time data access for both legacy and greenfield applications.

HPE (Hewlett Packard Enterprise)
www.hpe.com
A global edge-to-cloud company, HPE helps organizations accelerate outcomes by unlocking value from all of their data, so they can develop new business models, engage in new ways, and increase operational performance.




HVR
www.hvr-software.com
A provider of real-time data replication technology, HVR enables organizations to plan, predict, and respond with the freshest data available so companies can better serve their customers, reduce margins, resource plan, and ultimately improve their bottom line.

IBM (International Business Machines)
www.ibm.com
A global hybrid cloud, AI, and business services provider, IBM recently acquired Turbonomic, an application resource management and network performance management software provider, complementing its purchase of Instana and the launch of IBM Cloud Pak for Watson AIOps.

Immuta
www.immuta.com
Relied upon by data-driven organizations, Immuta automates access control for any data, on any cloud service, across all compute infrastructure to speed time-to-data, enable data-sharing with more users, and mitigate the risk of data leaks and breaches.

InfluxData
www.influxdata.com
InfluxData is the provider of InfluxDB, a time series platform that is built to handle the massive volumes and countless sources of time-stamped data produced by sensors, applications, and infrastructure and enables developers to build IoT, analytics, and monitoring software.

Informatica
www.informatica.com
A leader in enterprise cloud data management, Informatica accelerates data-driven digital transformation, enabling companies to fuel innovation, become more agile, and realize new growth opportunities, resulting in intelligent market disruptions.

IRI, The CoSort Company
www.iri.com
A data management and security ISV founded in 1978, IRI provides fast data manipulation, broad data source support, fit-for-purpose job design wizards and APIs—as well as extensive third-party technology tie-ins.

MariaDB Corp.
https://mariadb.com
Providing the MariaDB Platform, an enterprise open source database solution with the versatility to support transactional, analytical, and hybrid workloads as well as relational, JSON, and hybrid data models, MariaDB enables organizations to depend on a single database for all their needs.

Matillion
www.matillion.com
Optimized for modern enterprise data teams, Matillion is built on native integrations to cloud data platforms such as Snowflake, Delta Lake on Databricks, Amazon Redshift, Google BigQuery, and Microsoft Azure Synapse to enable new levels of efficiency and productivity across any organization.

Melissa
www.melissa.com
More than 10,000 clients worldwide, in arenas such as retail, education, healthcare, insurance, finance, and government, rely on Melissa for full-spectrum data quality and ID verification software, including data matching, validation, and enhancement services, to gain critical insight and drive meaningful customer relationships.

Microsoft
www.microsoft.com
Microsoft offers an array of technologies and solutions for businesses of all sizes, spanning desktop applications, relational database management technology, operating systems, search, and mobile devices, in the cloud and on-premise.




MongoDB
www.mongodb.com
MongoDB provides a modern, general-purpose database platform that is designed to unleash the power of software and data for developers and the applications they build and, with the release of MongoDB 5.0, has added native time series support.

NVIDIA
www.nvidia.com
NVIDIA's GPU deep learning has ignited modern AI—the next era of computing—with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world.

Ontotext
www.ontotext.com
Employing big knowledge graphs to enable unified data access and cognitive analytics via text mining and integration of data across multiple sources, Ontotext is a leader in enterprise knowledge graph technology and semantic database engines.

Oracle
www.oracle.com
Helping organizations to devote more time and resources to adding value for their users and customers, Oracle provides capabilities in SaaS, platform as a service, infrastructure as a service, and data as a service from data centers throughout the world.

Pure Storage
www.purestorage.com
Delivering a modern data experience, Pure Storage empowers organizations to run their operations as a true, automated, storage as-a-service model seamlessly across multiple clouds.

Quest Software
www.quest.com
Quest helps customers solve their next IT challenge, from maximizing the value of their data, to Active Directory and Office 365 management, and cybersecurity resilience.

Redis
https://redis.com
Enabling a competitive edge for any business by delivering open source and enterprise-grade data platforms to power the applications that drive real-time experiences at any scale, Redis helps organizations reimagine how fast they can process, analyze, make predictions, and take action on data.

Reltio
www.reltio.com
Supporting digital transformation, Reltio provides the Reltio Connected Data Platform, a proven multi-tenant, multi-domain MDM solution that masters all data types in real time and at scale, giving customers enhanced agility, scale, simplicity, security, and performance.

SAP
www.sap.com
Known for HANA, its platform for next-generation applications and analytics, SAP is a global provider of enterprise application software that empowers people and organizations to work together more efficiently and use business insight more effectively.

SAS Institute
www.sas.com
A leader in analytics through innovative software and services, SAS empowers and inspires customers around the world to transform data into intelligence, giving users "the power to know."

Semarchy
www.semarchy.com
Providing the xDM platform that allows organizations to quickly bring together critical information scattered across applications into a single data hub, Semarchy enables data to be discovered, mastered, governed, and centrally managed in a non-intrusive way.




SnapLogic
www.snaplogic.com
With a self-service, AI-powered integration platform, SnapLogic helps organizations connect applications and data sources, automate common workflows and business processes, and deliver exceptional experiences for customers, partners, and employees.

Software AG
www.softwareag.com
Software AG empowers truly connected enterprises using integration and APIs, IoT and analytics, and business and IT transformation, establishing a fluid flow of data that allows everything and everyone to work together.

Snowflake
www.snowflake.com
Snowflake enables every organization to mobilize its data with Snowflake's Data Cloud, uniting siloed data to discover and securely share data and execute diverse analytic workloads.

SQream
https://sqream.com
Used by global enterprises to analyze more data faster than ever before, while achieving improved performance, reduced footprint, and significant cost savings, SQream provides the ability to scale the amount of data analyzed to hundreds of terabytes and more.

Swim
www.swim.ai
Delivering Swim Continuum, a platform for building, managing, and operating continuous intelligence applications on-prem, in the cloud, or at the edge, Swim helps users monitor data streams, anticipate disruption, and respond to global changes in their industries.

Syniti
www.syniti.com
A global provider of enterprise data management, Syniti helps clients manage their data journey—across data conversion, data quality, data archiving and replication, master data management, analytics, information governance, and data strategy.

Tamr
www.tamr.com
With a cloud-native data mastering platform, Tamr provides an alternative to traditional MDM tools by using machine learning to do the heavy lifting to consolidate, cleanse, and categorize data.

Teradata
www.teradata.com
As a multi-cloud data warehouse platform provider, Teradata aims to solve the most complex data challenges at scale and to help businesses unlock value by turning data into their greatest asset.

TigerGraph
www.tigergraph.com
Based on a distributed native graph database, TigerGraph's proven technology supports advanced analytics and machine learning applications such as fraud detection, anti-money laundering, entity resolution, Customer 360, recommendations, knowledge graphs, cybersecurity, supply chain, IoT, and network analysis.

VMware
www.vmware.com
Powering complex digital infrastructures with its cloud, app modernization, networking, security, and digital workspace offerings, VMware helps customers deliver any application on any cloud across any device.

Yellowbrick Data
www.yellowbrick.com
Helping companies to make faster decisions with all of their data, Yellowbrick Data provides the Yellowbrick Data Warehouse, built for enterprises and the hybrid cloud, and offers the ability to provide powerful analytics anywhere.



BIG DATA TRAILBLAZERS
sponsored content

DataStax
Chet Kapoor, Chairman & CEO

DATASTAX HAS A MISSION to deliver products that developers love and change the trajectory of enterprises.

The world runs on Apache Cassandra, and DataStax was created to make the world's most scalable database easier to run and manage, deploy the future of modern cloud applications cost-effectively, and free enterprises from cloud vendor lock-in. We do that by shattering the traditional methods of managing real-time data and solving pain points for developers, while delivering always-on business continuity and bringing the power of Cassandra to every developer and enterprise, for mission-critical workloads. With DataStax, any developer or enterprise can now deploy data at massive scale, with 100% uptime, for lower cost.

Through a unique open data stack for the future, DataStax empowers any enterprise to tap the power of data without limits, providing a solution that is:
• Kubernetes-based for cloud-native agility
• Developer-ready with APIs to reduce time to market for new apps
• Cloud-delivered to simplify operations and reduce TCO

Uniquely positioned to deliver the modern database of the future, DataStax harnesses its power to solve real business problems by making the distribution of data easy to scale, accelerating the data-driven enterprise and streamlining developer operations.

Today, nearly 500 of the world's most demanding enterprises and half of the Fortune 100 rely on DataStax to power modern data apps, including Netflix, The Home Depot, T-Mobile, Intuit, and so many more.

DataStax
www.datastax.com

Denodo Technologies
Ravi Shankar, SVP & Chief Marketing Officer

THE TERM "BIG DATA" HAS BEEN around for many years; in many respects, it has come to mean almost everything and nothing at the same time. This is not the fault of the data, but more so how people use it and interpret it.

Why is this the case? There are many factors, but essentially, if you can't see the wood for the trees, it is hard to gain value from the data. Big data systems fail to become the single repository for all enterprise data. Organizations stumble on the challenges of moving and storing data of different types, especially in multi-cloud/hybrid cloud enterprises. It is quite common for no one person in an organization to have that single view of the data; organizations spend more time collecting data than they do analyzing it.

ETL processes are often the tool of choice when organizations look to integrate siloed data. ETL processes are scripted to move data in batches and fail to deliver real-time insights. They also fail to accommodate new sources without extensive testing and coding, and of course even more challenging are more modern data formats such as streaming IoT or unstructured data (which actually are often the real key to success in big data projects).

Data virtualization, on the other hand, is a data integration technology that integrates data in real time, without the need for replication. Data virtualization allows organizations to establish flexible, modern logical data architectures, such as a logical data fabric, allowing them to draw data seamlessly across the silos of a big data implementation.

The award-winning Denodo Platform offers the most advanced data virtualization capabilities available for establishing a logical data fabric to maximize big data investments. Its built-in data catalog provides seamless access to data via a searchable, contextualized interface, and in-memory parallel processing accelerates data access to unparalleled speeds.

Denodo Technologies
www.denodo.com
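The batch-versus-real-time contrast drawn above can be sketched in a few lines of Python. This is a toy illustration, not the Denodo Platform; the source table and totals are invented. A scheduled ETL copy answers from data as of the last load, while a virtual view reads through to the live source at query time.

```python
# Toy contrast between a scheduled batch copy and a live virtual view.
# Names and numbers are invented for illustration.

source = {"c1": 100, "c2": 200}   # live operational data

etl_copy = dict(source)           # batch ETL: snapshot taken at load time

def virtual_total():
    """Virtual view: no copy; reads through to the source at query time."""
    return sum(source.values())

source["c3"] = 50                 # a change arrives after the last load

stale_total = sum(etl_copy.values())  # the batch copy does not see c3
fresh_total = virtual_total()         # the virtual view sees the live source
```

The trade-off, of course, is that the virtual view pays its query cost against the source at read time, which is why production platforms combine read-through access with acceleration features such as caching and query optimization.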




Franz Inc.
Jans Aasman, CEO

ENTERPRISE KNOWLEDGE GRAPHS FOR A MODERN BIG DATA ARCHITECTURE

Industry analysts recognize the power of Knowledge Graphs in delivering a modern big data architecture that provides integrated, trusted, and real-time views of enterprise data. The accelerating adoption in the enterprise of this Knowledge Graph approach, which unifies business data with knowledge bases, industry terms, and domain knowledge, is clearly the future of AI and advanced analytics.

Franz's AllegroGraph platform further extends this modern Knowledge Graph approach with a novel Entity-Event Model, natively integrated with domain ontologies and metadata, and dynamic ways of setting the analytics focus on all entities in the system (patient, person, devices, transactions, events, operations, etc.) as prime objects that can be the focus of an analytic (AI, ML, DL) process.

The Entity-Event Data Model utilized by AllegroGraph with FedShard puts core "entities" such as customers, patients, students, or people of interest at the center and then collects several layers of knowledge related to the entity as "events." Events represent activities that transpire in a temporal context.

The rich functional and contextual integration of multimodal predictive modeling and artificial intelligence is what distinguishes AllegroGraph as a modern, scalable, enterprise knowledge platform. AllegroGraph is the first big temporal Knowledge Graph technology that encapsulates a novel entity-event model to deliver a modern data architecture to the enterprise.

Financial institutions, healthcare providers, contact centers, manufacturing firms, government agencies, and other data-driven enterprises that use AllegroGraph gain a holistic, future-proofed Knowledge Graph architecture for big data predictive analytics and machine learning across complex knowledge bases to discover deep connections, uncover new patterns, and attain explainable results.

CONTACT FRANZ INC. TODAY to build your Enterprise Scale Knowledge Graph solution.

Franz Inc.
https://franz.com

HVR
Mark Van de Wiel, Chief Technology Officer

HVR'S BEST-IN-CLASS real-time data replication technology removes the complexity of high-volume data movement. Its enhanced user interface and use of REST APIs make it simple to design and orchestrate real-time, continuous data pipelines from various on-premises database technologies, like Oracle, SQL Server, SAP HANA, DB2, PostgreSQL, and others, into cloud-based platforms such as Snowflake, AWS, Azure, Google Cloud, and more. Plus, with the ability to replicate one-to-one, one-to-many, and many-to-one, you're guaranteed fresh, accurate real-time data where you need it when you need it.

As an all-in-one solution, HVR comes with a robust feature set and diverse capabilities. Its non-intrusive log-based Change Data Capture (CDC) technology is asynchronous and moves changes as they happen, reducing the load on the source system and ensuring optimum efficiency. Its high-performing and modular distributed architecture enhances security and scales to accommodate growing business needs.

No matter where you are on your digital transformation journey, HVR makes real-time data more accessible than ever. HVR Agent as a Service for Azure and AWS Quick Start for HVR are quick and easy ways to revolutionize your business with real-time cloud data. Go beyond simple replication and realize the full potential of your data with HVR.

CONTACT US AT [email protected] or visit hvr-software.com to learn more.

HVR Software
www.hvr-software.com
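Log-based CDC, as described in the HVR column above, ships individual change events rather than re-copying whole tables. The sketch below is conceptual only; the event shapes are invented and this is not HVR's API. A target replica consumes an ordered stream of insert, update, and delete events and converges to the source state.

```python
# Conceptual sketch of applying a change-data-capture (CDC) stream to a
# target; event shapes are invented for illustration, not HVR's API.

change_log = [  # ordered change events, as read from a source's log
    {"op": "insert", "key": 1, "row": {"name": "Ada", "city": "London"}},
    {"op": "insert", "key": 2, "row": {"name": "Lin", "city": "Taipei"}},
    {"op": "update", "key": 1, "row": {"name": "Ada", "city": "Paris"}},
    {"op": "delete", "key": 2},
]

def apply_changes(target, events):
    """Replay events in log order so the target converges to the source."""
    for ev in events:
        if ev["op"] == "delete":
            target.pop(ev["key"], None)
        else:  # insert and update both behave as upserts on the target
            target[ev["key"]] = ev["row"]
    return target

replica = apply_changes({}, change_log)
# replica now holds only key 1, with the updated city "Paris"
```

Because only the deltas cross the wire, the source system is barely touched, which is the point of reading the database log asynchronously instead of querying the tables themselves.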

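Returning to the Franz column above, the entity-event model can be pictured with plain (subject, predicate, object) triples: a core entity at the center, and time-stamped events attached to it. The sketch below is conceptual Python with invented data; it is not AllegroGraph's API (AllegroGraph itself is typically queried with SPARQL).

```python
# Conceptual entity-event sketch using plain (subject, predicate, object)
# triples; invented data, not AllegroGraph's API.

triples = [
    ("patient:42", "hasEvent", "event:1"),
    ("patient:42", "hasEvent", "event:2"),
    ("event:1", "type", "Admission"),
    ("event:1", "date", "2021-03-01"),
    ("event:2", "type", "LabResult"),
    ("event:2", "date", "2021-03-04"),
]

def events_for(entity):
    """Collect every event attached to an entity, with its properties."""
    event_ids = [o for s, p, o in triples if s == entity and p == "hasEvent"]
    return [{p: o for s, p, o in triples if s == ev} for ev in event_ids]

# The entity's events, ordered in time, form its temporal context.
timeline = sorted(events_for("patient:42"), key=lambda e: e["date"])
```

The design choice this illustrates is that the entity stays a prime object while each layer of knowledge arrives as a dated event, so analytics can be focused on the entity's whole timeline rather than on any single record.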



InfluxData IRI, The CoSort Company


TIME SERIES DATA CONSISTS OF WHAT OUR 43 YEARS IN BIG DATA
measurements or events that are captured MEAN TO YOU.
and analyzed, often in real time, to operate
Since 1978, Innovative Routines Inter­
a service within an SLO, detect anomalies,
national (IRI) has specialized in the manip-
or visualize changes and trends.
ulation and management of “big data.”
Many analyses measure changes over
Long before the term fell under the rubric of
time—time series data is everywhere.
Hadoop, our customers used it to describe
Ryan Betts, Developer friendly time series databases
David Friedland, their very large file and database sources
VP Engineering must support high-performance ingest and SVP & COO transformed and reported on in CoSort ...
and still do!
Though today's CoSort-powered activities are far more wide-ranging, they still run famously fast in volume. They are front-ended in Eclipse and are now available in an affordable, all-in-one data management platform called Voracity, for multi-source:
• Data Discovery—classifying, diagramming, profiling, and searching of structured, semi-structured, and unstructured data sources, on premise or in the cloud
• Data Integration—individually optimized and consolidated E, T, and L tasks, plus change data capture, slowly changing dimensions, and the ability to speed or leave legacy ETL tools
• Data Migration—and conversion of data types, file formats, and database platforms, plus incremental or bulk data replication and federation
• Data Governance—PII data masking and re-ID risk scoring, DB subsetting, synthetic test data generation, data validation, cleansing, and enrichment, master and metadata management, etc.
• Analytics—embedded reports, feeds to DataDog, KNIME, and Splunk, and fast data wrangling for BOBJ, Cognos, Cubeware, iDashboards, Microstrategy, Oracle, Power BI, Qlik, R, Spotfire, and Tableau
Voracity's common, reusable metadata and Eclipse IDE support multiple job design, deployment, and sharing options. So whether you are a DBA, BI/DW architect, data scientist, data privacy officer, application developer, or IT manager, you can leverage and collaborate in a one-stop solution stack that features the best of old and new techniques.
FOR MORE INFORMATION, visit www.iri.com/voracity.

IRI, The CoSort Company
www.iri.com

real-time analytics for graphing, alerting, and alarming, and the ability to perform rich historical analytics against the collected data.
InfluxData's purpose-built platform handles the massive volumes of time-stamped data produced by IoT devices, applications, networks, containers, and computers to allow just that. Programmable and performant with a common API across OSS, cloud, and Enterprise offerings, InfluxDB gives you high granularity, high scale, and high availability. Capture, analyze, and store millions of points per second using the popular Telegraf plugins, and gain visibility across all your data sources.
Easily write data into InfluxDB as well as query the stored data using its language-specific client libraries, and quickly start monitoring your data using InfluxDB Templates, premade monitoring solutions that allow users to create and share a comprehensive monitoring solution.
Companies like Cisco, IBM, Hulu, and MuleSoft store and analyze real-time data, empowering them to build transformative monitoring, analytics, and IoT applications quicker and at scale.
Using InfluxDB, IBM created a solution that ensures online fraud protection SLA requirements are met and enables billing for customer overages by collecting logs and metrics from 85,000 network devices for IBM Cloud.

InfluxData
www.influxdata.com
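The write path in the InfluxData profile above centers on line protocol, the plain-text format that Telegraf and the client libraries emit. As a minimal sketch in plain Python with no client dependency (the measurement, tag, and field names are invented for the example; only numeric and string field values are handled):

```python
import time

def to_line_protocol(measurement, tags, fields, ts_ns=None):
    """Format one point in InfluxDB line protocol:
    measurement,tag=value field=value timestamp(ns).
    String field values are double-quoted; numbers are written bare."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(
        f'{k}="{v}"' if isinstance(v, str) else f"{k}={v}"
        for k, v in fields.items()
    )
    ts = ts_ns if ts_ns is not None else time.time_ns()
    return f"{measurement},{tag_str} {field_str} {ts}"

line = to_line_protocol("cpu", {"host": "server01"}, {"usage": 64.2},
                        ts_ns=1609459200000000000)
print(line)  # cpu,host=server01 usage=64.2 1609459200000000000
```

In practice the official client libraries build and batch these lines for you; the sketch just makes the wire format visible.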

30 BIG DATA QUARTERLY | FALL 2021


BIG DATA TRAILBLAZERS
sponsored content

Reltio

RELTIO DISRUPTED the master data management (MDM) software market when it launched the first cloud-native MDM software-as-a-service (SaaS) platform in 2011. The Reltio Connected Data Platform is a proven multi-tenant, multi-domain MDM platform that masters all data types in real-time and at-scale. Customers benefit from agility, scale, simplicity, security, and performance unmatched by Reltio's competitors.

[Pictured: Venki Subramanian, Vice President of Product Management]

Leading Global 2000 companies in Life Sciences, Healthcare, Financial Services and Insurance, Retail, Consumer Products, High Tech, and Travel and Hospitality manage mission-critical data on the Reltio Connected Data Platform.
Reltio Connected Data Platform is a multi-tenant, multi-domain MDM solution that enables companies to create a unified, trusted data repository for operational, analytical, and real-time requirements of enterprises. Reltio Enterprise 360, Reltio Enterprise 360 Site Intelligence (for the life sciences and pharmaceutical industries), Reltio Connected Customer 360, and Reltio Identity 360 provide the flexibility, scalability, security, business continuity, and choice that only a cloud-native MDM SaaS platform can deliver.
Reltio Connected Data Platform uniquely features big data architecture to manage massive data volumes in real-time for operational, analytical, and data science use cases, an API-first SaaS business model for rapid configuration and responsive data management, and Connected Graph technology to discover relationships.

Reltio
www.reltio.com

TigerGraph

WHO IS TIGERGRAPH?
TigerGraph is a platform for advanced analytics and machine learning on connected data. Based on the industry's first and only distributed native graph database, TigerGraph's proven technology supports advanced analytics and machine learning applications such as fraud detection, anti-money laundering (AML), entity resolution, customer 360, recommendations, knowledge graph, cybersecurity, supply chain, IoT, and network analysis.

[Pictured: Dr. Yu Xu, CEO]

HOW DOES TIGERGRAPH HELP ORGANIZATIONS OF ALL SIZES?
Organizations of all sizes Connect, Analyze, and Learn from Data with TigerGraph:
• Connect internal and external datasets and pipelines with a distributed Graph Database
  • UnitedHealth Group is connecting 200+ datasets and Kafka-based streaming data pipelines to deliver a real-time customer 360 to improve quality of care for 50 million members.
  • Xandr (part of AT&T) is connecting multiple data pipelines to build an identity graph for entity resolution to power the next-generation AdTech platform.
• Analyze connected data for never-before insights with Advanced Analytics
  • Jaguar Land Rover has accelerated supply chain planning from three weeks to 45 minutes, reduced supplier risk by 35%, and has added 100 million pounds annually in profits.
  • NewDay, a leading specialist financial services provider and one of the largest issuers of credit cards in the UK, uses advanced graph analytics to prevent and preempt financial fraud.
• Learn from the connected data with In-Database Machine Learning
  • Intuit has built an AI-based customer 360 with in-database machine learning for entity resolution, personalized recommendations, and fraud detection. Graph-based machine learning at Intuit has reduced at-risk (fraud) events by 50% while improving accuracy and reducing false positives.
  • 7 out of the top 10 banks are driving real-time fraud detection and credit risk assessment with in-database machine learning.
GET STARTED TODAY WITH THE FREE TIER at http://www.tigergraph.com/cloud.

TigerGraph
www.tigergraph.com
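The entity-resolution use case described in the TigerGraph profile above boils down to grouping records that share any identifier. As a toy illustration of that shape of computation in plain Python (this is not GSQL or the TigerGraph API; the record IDs and edges are invented), a connected-components pass over an identity graph:

```python
from collections import defaultdict

def connected_components(edges):
    """Group records that share any link (a toy entity-resolution pass)."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, components = set(), []
    for node in sorted(graph):          # deterministic order for the demo
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:                    # iterative depth-first traversal
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(graph[n] - comp)
        seen |= comp
        components.append(sorted(comp))
    return components

# Records linked by shared emails/devices resolve to one entity.
edges = [("rec1", "rec2"), ("rec2", "rec3"), ("rec4", "rec5")]
print(connected_components(edges))  # [['rec1', 'rec2', 'rec3'], ['rec4', 'rec5']]
```

A distributed graph database runs this kind of traversal at a very different scale, with computation pushed to where the data lives; the sketch only shows the shape of the problem.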

DBTA.COM/BIGDATAQUARTERLY 31


BIG DATA BY THE NUMBERS

HADOOP-TO-CLOUD MIGRATION PRIORITIES

Recent years have seen an acceleration of cloud adoption as the next-generation platform for running big data and analytics. Likewise, Hadoop-specific vendors have been consolidating, further fueling ambiguity as companies look to make meaningful decisions about the long-term development and management of their big data platforms.

To better understand current plans, priorities, and challenges associated with Hadoop data migration to the cloud, Unisphere Research, a division of Information Today, Inc., conducted a study in partnership with Radiant Advisors, sponsored by WANdisco. The goal of the study was to objectively distill insights benefiting companies that have not already migrated their on-prem Hadoop data to the cloud.

The full research summary, "2021 Hadoop-to-Cloud Migration Benchmark Report," authored by John O'Brien and Lindy Ryan, is available at www.dbta.com/DBTA-Downloads/ResearchReports.

This survey found that nearly three-quarters of respondents have either already migrated, are in the process of migrating, or intend to migrate their on-prem Hadoop data to the cloud.

What is the current status of your organization's Hadoop data migration to the cloud?
Have not started, but likely will in the future: 20.09%
Started planning, but migration not yet started: 18.22%
In progress, but not completed: 14.02%
Fully completed: 21.03%
Total: 73.36%
Do not plan to migrate our Hadoop data to the cloud: 26.64%

This migration process is a race against time—most companies expect their total volume of data to increase over the next year, and the vast majority expect their unstructured data to increase.

For your unstructured data in the cloud, how do you expect your volume to change over the next year?
Increase: 59.07%
Increase significantly: 25.12%
Total: 84.19%
Remain the same: 15.35%
Decrease: 0.47%



The leading drivers for companies migrating Hadoop data to the cloud are data modernization initiatives, followed by the desire to take advantage of cloud-scale analytics with modern AI/ML and analytics tooling.

Top 5 Drivers for Migrating Hadoop Data to the Cloud
Data modernization initiative: 78.43%
Cloud-scale analytics (modern AI/ML and analytics tooling available): 60.78%
Adopt scalable cloud storage: 49.02%
Cost management: 41.83%
Upcoming Hadoop license renewal: 33.99%

Demonstrating the mission-critical or essential nature of Hadoop applications, survey respondents indicated the acceptable downtime for the Hadoop migration to the cloud was measured in hours, followed by days. Zero downtime was required by about one-fifth of respondents.

For recent or planned Hadoop migrations for production data to the cloud, how much downtime is acceptable?
Hours: 40.52%
Days: 33.99%
Zero: 21.57%
Weeks: 1.96%
Months: 1.96%

The most important capability in selecting migration tools was to ensure no data loss with data migration validation, reinforcing the mission-critical nature of Hadoop.

Most Important Capabilities for Products and Tools Used for Hadoop Data Migration
Data migration validation (ensure no data loss): 88.82%
Support for cloud object storage (e.g., AWS S3, Azure Blob Storage, ADLS Gen2, Google Cloud Storage, etc.): 73.68%
Support data changes during migration (does not require downtime and ensures changes are also migrated from on-premise to cloud): 63.16%
Selective migration (ability to define what data should and should not be migrated): 57.24%
Support for bidirectional replication (allow changes on either on-premise or cloud environment and ensure consistency is maintained): 40.13%
Non-intrusive migration (does not require any changes to applications, HDFS cluster, or node configuration): 38.16%
Bandwidth management (ability for admin to configure amount of network bandwidth to use for migration process): 31.58%
Support for multi-petabyte scale migrations: 27.63%

What migration products or tools have you used (or plan to use) for the Hadoop data migration?
Bulk transfer devices (e.g., AWS Snowball, Azure Data Box): 55.56%
Apache DistCp (distributed copy): 48.37%
WANdisco LiveData products: 24.84%
Cloudera BDR or Replication Manager: 23.53%
Azure Data Factory (ADF): 20.26%
Custom developed: 16.99%
Other: 3.92%



sponsored content

Big Data Packaging & Provisioning


According to the Open Knowledge Foundation, data packaging "is a simple way of putting collections of data and their descriptions in one place so that they can be easily shared and used" and a data package is "in a format that is very simple, web friendly and extensible."
To IRI, and many people in the world of data processing and data science, data packaging is a manifestation of data integration, staging, or wrangling operations. But beyond data manipulation and movement, packaging can also involve data consolidation or segmentation, data cleansing, and PII anonymization/masking.
Considered essential as well are that these capabilities run fast, comply with privacy laws and business rules, and are affordable.
IRI software readily and reliably packages data, big and small, from multiple sources and silos on-premise or in the cloud. This ability stems from the company's traditional strength—fast sorting (via IRI CoSort)—and its tight coupling with concurrent transformations it supports, including: lookups, joins, aggregations, filtering, masking, and remapping.
Today, you can leverage the CoSort engine or interchangeable Hadoop engines (MR2, Spark, Spark Stream, Storm, and Tez) within the IRI Voracity data management platform to package data in many ways. Combine, munge, cleanse, mask, and mine structured and semi-structured internal and "open" sources for analytics, governance, and DevOps. There are also many things you can do with semi- and unstructured data discovered and extracted in Voracity. (See Figure)

[Figure: Big data packaging in IRI Workbench, the Eclipse IDE for IRI Voracity]

WHAT CAN I DO SPECIFICALLY, AND HOW DO I DO IT?
More specifically, you can use the IRI Voracity "total data management" platform powered by CoSort (or Hadoop) to package disparate sources of data. You can unify and distill related elements into multiple, purpose-built, custom-formatted targets ready for research and analytics. With Voracity, you can do all of these things:
• Data Integration, including:
  • Data acquisition (extraction), manipulation (transformation), and population (loading)
  • Data filtering, cleansing and validation (data quality improvement)
  • Data consolidation and standardization (MDM)
  • Data federation and virtualization
• Data Classification, Scanning & Masking (see Big Data Protection)
• Data Validation, Cleansing, and Standardization
• (Embedded) Reporting (embedded BI)
• Data Migration and Replication
• Test Data Generation (see Big Data Protection)
• Data Wrangling for Power BI, Qlik and Tableau, or analytic software like R or KNIME
Most of these activities can be specified and combined in wizard-driven, task-consolidating, single-IO job scripts, or well-diagrammed batch workflows that contain them. Using the intuitive diagrams or self-documenting text files managed in the free IRI Workbench IDE (built on Eclipse), you can easily understand, modify, run, schedule, and share your combined packaging, protection, and provisioning jobs.
For more information email [email protected].

IRI, The CoSort Company
www.iri.com/solutions/big-data
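The packaging pattern described above—filter, mask, and aggregate raw records into an analytics-ready target—can be sketched generically. The following plain-Python illustration stands in for, and is not, a Voracity job script; the field names and the hash-based masking rule are assumptions for the example:

```python
import hashlib

def mask_pii(value):
    # One-way hash stands in for a PII-masking rule.
    return hashlib.sha256(value.encode()).hexdigest()[:8]

def package(records, min_amount=0):
    """Filter, mask, and aggregate raw records into an analytics-ready target."""
    out = {}
    for rec in records:
        if rec["amount"] < min_amount:
            continue                                  # filtering
        key = mask_pii(rec["email"])                  # masking
        out[key] = out.get(key, 0) + rec["amount"]    # aggregation
    return out

rows = [
    {"email": "[email protected]", "amount": 10},
    {"email": "[email protected]", "amount": 5},
    {"email": "[email protected]", "amount": -3},
]
print(package(rows))  # one masked key for ann totaling 15; bob filtered out
```

A production pipeline would add validation, type conversion, and reversible or format-preserving masking, but the filter/mask/aggregate shape is the same.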



DATA SCIENCE PLAYBOOK
Cybersecurity Is a Data Problem
CYBERSECURITY CAN BE SUMMARIZED AS A DATA PROBLEM. The amount of data associated with networks is massive, especially when compared to the percentage of traffic that is actually malicious. There is just too much data to analyze in a single day, and this problem is compounded on a daily basis. Threat tactics are constantly changing, with events occurring at a higher frequency, which is forcing the network security industry to prepare for and react to any questionable situation. To make matters worse, cybersecurity specialists are in very high demand, and there is a limited pool of talent from which to draw.
Traditionally, a cyber-event would be identified and escalated via static rules determined through known signatures, previously identified by a security specialist's triage, review, and response. Technology is now helping to fill the gap and assist the limited resource pool in successfully executing these processes on a daily basis with increased efficiency.

Machine Learning for Cybersecurity
Machine learning-driven approaches for cybersecurity are dynamic and focus on prevention in order to reduce risk and minimize the impact of a successful attack event. All components and steps are automated in an effort to reduce the time and effort of the security teams to address attacks. Over the last decade, security analytics processes have evolved from basic offline batch analysis using statistical metrics to real-time machine learning (and deep learning) techniques.
The next evolution of machine learning approaches is logistical: data pipelines that leverage data ingestion, processing, and inference followed by an action. One such open source framework providing these facilities that was recently announced is called Morpheus.
Morpheus provides a streamlined and customizable pipeline, combining the data-preprocessing, inferencing, post-processing, and decision-making steps, all of which may be tailored to the environment. With such a framework, cybersecurity developers can easily create their own tools to support their very specific business environments. Starting with a pretrained model, developers can customize and optimize those models with their own datasets, and apply the AI pipeline to suit their needs, depending on their network and data sources.
The applications can then be deployed in an on-prem data center, cloud, or hybrid cloud scenario—inspecting traffic in real time, making decisions, and automatically updating security rules for newly found threats across all the data feeds to stop attacks at the front door. Every packet, every application log, and every network flow represents the heterogeneous data that can be ingested and processed.

Considering All Data Feeds
From a data source perspective, it is critically important to consider all the different data feeds. These include publish/subscribe systems (e.g., Kafka and Pulsar), data files (application logs), data direct from a security information and event management (SIEM) system or a security orchestration, automation, and response (SOAR) system, or even other sources of threat intelligence. After processing data from myriad sources, findings can be fed back into systems such as the SIEM or SOAR as insights, policies, or actions as they have been determined through the customized pipeline.
All of this is accomplished using machine learning rather than a rules-based approach. Pretrained models provided by Morpheus can be leveraged to handle a variety of use cases including distributed denial-of-service (DDoS) situations, leaked sensitive information, anomalous behavior profiling, phishing detection, predictive maintenance, network mapping, asset classification, domain generating algorithm (DGA) detection, and generic lightweight online detection of anomalies (LODA). Specifically for LODA, Morpheus can be deployed across many different telemetry streams in parallel to monitor for anomalies—and all this is just the tip of the iceberg.

Limiting the Threat of Attacks
Morpheus has models such as cyBERT (natural language processing) to handle automatic parsing of new and unknown log formats, but it can also leverage XGBoost tree-based models to perform anomaly detection. To add to the simplicity and to support the wide range of models, Morpheus integrates with MLFlow, an open source machine learning model repository, so that models can be managed, trained, and tested offline on historical data before they are deployed into the production network security workflow.
With Morpheus, the foundation is in place to ingest data and then process, analyze, classify, react, and repeat in an automated manner. Morpheus models also run on GPUs, providing engineers the absolute fastest solutions available for cybersecurity in the marketplace, which is critical given that security is based on time-to-action.
With the help of machine learning models and a customizable pipeline framework in Morpheus, security developers can more easily manage the massive influx of data to limit the threat that malware, ransomware, phishing, and other malicious attacks have on organizations, both large and small.

Jim Scott is head of developer relations, Data Science, at NVIDIA (www.nvidia.com). Over his career, he has held positions running operations, engineering, architecture, and QA teams in the big data, regulatory, digital advertising, retail analytics, IoT, financial services, manufacturing, healthcare, chemicals, and geographical information systems industries.
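The ingest-preprocess-infer-act pipeline shape described in this column can be sketched generically. The toy below is plain Python and is not the Morpheus API; the log format, the z-score rule, and the threshold are invented for illustration only:

```python
from statistics import mean, stdev

def preprocess(raw_logs):
    # Parse each raw log line into a numeric feature (here: trailing byte count).
    return [int(line.rsplit(" ", 1)[-1]) for line in raw_logs]

def infer(values, threshold=2.0):
    # Flag values more than `threshold` standard deviations from the mean.
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if sigma and abs(v - mu) / sigma > threshold]

def act(anomalies):
    # Decision step: a real pipeline would push rules back to a SIEM/SOAR here.
    return [f"ALERT bytes={v}" for v in anomalies]

logs = [f"GET /index {n}" for n in [500, 510, 498, 505, 502, 9999]]
print(act(infer(preprocess(logs))))  # ['ALERT bytes=9999']
```

Morpheus itself supplies pretrained models and GPU-accelerated stages for pipelines of this shape; the sketch only mirrors the stage boundaries described above.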



DATA DIRECTIONS
The Age of Pirates Is Being Revisited in Today's Digital World
THE FIRST PIRATES APPEARED IN THE 14TH CENTURY BC when they attacked the ships of the Aegean and Mediterranean civilizations. Centuries later, history's most renowned and romanticized pirate, Edward Teach—more commonly known as Blackbeard—sailed the Caribbean and became a legendary outlaw of the waves. Blackbeard was arguably the most terrifying pirate of all time. In the heat of battle, Blackbeard would tie lit fuses (slow matches) to his beard as he boarded an ill-fated vessel in blazing glory.

Today's privateers do not wield deadly swords, but they are experts with keyboards.

Seeking Cryptocurrency, Not Gold
However, as is often the case, the real history is even more interesting than the legends. Most pirates from time to time worked for European governments that could either not afford to build out capable navy fleets or simply wanted to maintain a farcical level of plausible deniability. Many of these pirates were more aptly described as "privateers," and they were sponsored by countries through a "letter of marque." Expressed more succinctly, these seaborne outlaws of lore who plundered the massive supply lines of commerce between the new and old worlds were in reality secret agents of national governments whose job it was to wreak havoc on opposing nations' supply chains. For this, their payment was a percentage of the plunder that they could capture and unload in a safe location such as Nassau in The Bahamas. Most European countries used this approach at some point.
Ironically, the highly unnecessary Battle of New Orleans (the War of 1812 had concluded weeks before, but communication was slow in those days), which vaulted future president of the United States Andrew Jackson to national hero status, would not have been won by U.S. forces without the services of privateer Jean Lafitte. Anyone who doubts the importance of Lafitte's actions should visit the National Historical Park and Preserve in Louisiana that proudly brandishes his name.
It is true that today's plunder is not gold, but cryptocurrency, and it won't be buried on a beach on a Caribbean island; instead it will be protected by multinational banks using blockchain technology.

Modern-Day Privateers
But as we so often note, the more things change, the more they remain the same. Today's privateers do not wield deadly swords, but they are experts with keyboards. The modern-day privateers sail the information superhighway and seek out vulnerability in the supply chains of their targeted antagonistic states. The exposures are simple. They search for unqualified and untrained IT admins who fail to upgrade common software, weak passwords and the lack of multi-factor authentication, and, most significantly, critical infrastructure connected to the internet.
In a previous article, we emphasized that the best cybersecurity was "concrete backed up by air," and we now reiterate that point. The tools of the cyber-pirate of 2021 include inexpensive, everyday software such as virtual private networks (VPNs) and secure browsers such as "Tor" that can obscure attackers' IP addresses. Phishing attacks, which lure the unsuspecting email user to inadvertently reveal various open and exposed paths to the entire environment, are another favorite weapon of the cyber-pirate.

Michael Corey, D.B.A., is co-founder of LicenseFortress (www.licensefortress.com). He was recognized in 2015 and 2017 as one of the Top 100 people who influence the cloud, and is an Oracle Ace, VMware vExpert, a former Microsoft Data Platform MVP, and a past president of the IOUG. Check out his blog at http://michaelcorey.com.

Don Sullivan has been with VMware (www.vmware.com) since 2010 and is the product line marketing manager for Business Critical Applications and Databases with the Cloud Platform Business Unit.



Unlike their historical ancestors who carried out their acts during a violent war, these pirates of the information superhighway hone their skills so that they can attack remotely when war is imminent or whenever else they wish. They can use these cyberskills to disrupt key infrastructure such as a gas pipeline, a hospital, or other critical facets of an ever-narrowing and fragile supply chain. While they do not carry letters of marque, we know many are state-sponsored. The evidence is overwhelming. In our article "Data Security—and the Real Dragon in the Room," we asked the tough questions about the Marriott data breach: "Who did it? What happened to the data? What do they plan to do with the data?" In the case we highlighted, the data had already been absent from visibility for 4 years. If the cyber-pirates had all this data but not a single credit card number was sold on the dark web, why was it stolen? Only a government-sponsored cyberterrorist could or would gather information for years and not monetize it. This suggests that the modern-day cyberterrorist has the backing of nation states similar to their historical brethren.

The ghostly apparition on the horizon may not be a ship flying the skull and crossbones but, in 2021, a plastic keyboard, an untraceable and comically fearsome moniker, and an internet connection may be equally menacing and deadly.

The recent cyberattack in May 2021 forced a shutdown of a top U.S. pipeline operated by Colonial Pipeline. The New York Times reported the pipeline was 5,500 miles long and carried 45% of the East Coast's fuel supplies. It is believed that this act of cyberterrorism was perpetrated by a criminal group known as DarkSide. It is likely that this is a bunch of personally unintimidating ne'er-do-wells as opposed to the ominous physical presence posed by Blackbeard and his cohorts, but they are equally destructive. Longtime Russian president Vladimir Putin proclaimed recently in Geneva that most of the cybervillainy comes from and festers within the U.S., and he may very well have a valid point. But it is obvious that the state-sponsored actions against the U.S. are on the rise, and, unless the U.S. and other Western governments decide to reclaim the high ground of this ongoing cyberwar, more damage will be done. It is likely that at some point we will experience the cyber-equivalent to 9/11.

The Remedies for Cyberterrorism
The remedies are quite simple. First, all critical infrastructure should be removed from the physical internet. We have taken this position before, and we believe in that position today. All personnel working on critical infrastructure should be required to undergo background checks equivalent to government security clearances, and those same people should have certified expertise in information technology hygiene and security as well as in their specific area of expertise. Of course, this approach will be more expensive—but the $5 million paid by Colonial to unlock its system to start the flow of fuel from Texas to New York was a miniscule amount when considering the potential damage brooding on the horizon.
Whatever the reasons, the countries that host these cyberterrorists choose to look the other way, as they collect their cybercurrency plunder. And, in the absence of strong governmental action, cyber-pirates will continue to dominate global headlines.
Regardless of where they live or what bunker they hide within, they will project an ominous presence similar to Blackbeard's blaze of glory. The ghostly apparition on the horizon may not be a ship flying the skull and crossbones but, in 2021, a plastic keyboard, an untraceable and comically fearsome moniker, and an internet connection may be equally menacing and deadly.




THE IoT INSIDER
Is There an IoT Rainmaker in Your Company?
A RAINMAKER IS SOMEONE who brings in lots of business to your company. Rainmakers appear across a number of industries, but what unites them is their ability to acquire affluent clients and generate significant revenue for your company.
A "digital rainmaker" not only makes your business more efficient with the use of technology, but also turns you into an industry disruptor by tapping into what the World Economic Forum calls "the $100 trillion opportunity."

Digital rainmakers take advantage of changes in customer behavior and changes in technology and—most importantly—bend the rules of the market to create tremendous confusion and disruption for competitors.

Digital rainmakers have a quick-strike mentality. They take advantage of changes in customer behavior and changes in technology and—most importantly—bend the rules of the market to create tremendous confusion and disruption for competitors. Take Elon Musk: He introduced an electric vehicle to bank on the desires of customers to be more "green" and at the same time explored the concept of self-driving cars by leveraging AI technology. To top it off, he disrupted the competition by establishing a new sales channel, selling new cars through the web rather than through an expensive dealer network. It left his competition completely bewildered and confused.

Leapfrog Your Competition
Competition, however, has ways of leapfrogging innovation. So, what innovations will the next generation of digital rainmakers take advantage of? Will the real winners have innovations where you think, "Really? It's that simple? Why didn't I come up with it?" Here are some of the things digital rainmakers will probably use in the near future to conquer the market:
Environmental, Social, and Governance (ESG): While not technology per se, ESG is an inevitable part of the equation. If your new innovation doesn't have a green, sustainable, environmentally friendly aspect, having a positive impact on, for example, decarbonization, it will not only be hard to get the masses interested, but also to attract finance. Currently, government funds, banks, and even venture capitalists are very sensitive to this aspect. Green banks are already a given in Europe and Australia and on the rise now in the U.S. too. So, if it isn't green, they are not keen. You had better make sure that any technical innovation you have is making a sustainable contribution to society at large if you want easy access to funds.
Internet of Things: Fortunately, the technologies that are on the rise, such as IoT, can help with ESG. IoT technology helps industries answer "when, where, why, and how" questions better by gathering the data that is needed to answer those questions. Getting products connected and making them smarter can supercharge the customer experience into a superior one—more interactive and intelligent—and also make the production and the product itself more efficient, helping to reduce waste.
For example, in agriculture, smart farms are using advanced IoT technologies to help farmers reduce waste and increase productivity. This helps them produce more food with fewer resources. Data derived from IoT sensors provides essential statistics on weather, soil humidity, and mineral levels to lower waste and enables better yields. IoT, in combination with other technologies, such as AI, paves the way for precision agriculture, smart greenhouses, and monitoring of livestock.
Payment and Billing: Answering the "Wh" questions (who, what, where, when) leads to an understanding of how and when products are used, opening the door to new payment and billing models. Imagine that you don't have to pay upfront for a self-driving car feature, which might be really expensive, and only pay when you feel the need to use it—for example, on a long journey. Where Tesla is still offering this as an expensive option, VW is going to offer the feature for a mere €7 per hour.

Going Digital
By leveraging digital, every industry will adopt subscription and pay-as-you-use models to compete with traditional purchasing methods. Gartner predicts that by 2025, at least 23% of all businesses will have adopted such a model.
All of these kinds of innovation can alter your customers' behavior and disrupt your competition for sure. The pandemic has shown clearly that the risk of doing nothing in digital is exceeding the risk of doing something. Now is the time to seize the opportunity to become the next digital rainmaker.

Bart Schouw is vice president of technology and digital alliances, Software AG (www.softwareag.com).
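The sensor-driven decisions described above for smart farms reduce, at their simplest, to rules evaluated over streams of readings. A minimal sketch in plain Python (the field names and moisture threshold are invented for illustration):

```python
def irrigation_advice(readings, dry_threshold=30.0):
    """Given per-field soil-moisture percentages from IoT sensors,
    return the fields that currently need watering."""
    return sorted(field for field, pct in readings.items() if pct < dry_threshold)

# Hypothetical sensor snapshot: moisture percentage by field.
sensors = {"north": 42.5, "south": 18.0, "east": 27.3}
print(irrigation_advice(sensors))  # ['east', 'south']
```

Real precision-agriculture platforms layer forecasting and ML models on top of such rules, but the data-to-decision loop is the same.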



GOVERNING GUIDELINES
Next, Ask: Why Not?
AUTHOR AND INSPIRATIONAL FIGURE SIMON SINEK has ably demon-
strated that great leaders (and leading companies) “start with AI and other data-driven solutions don’t
why.” No argument there. But validating whether your why is just learn at scale; they execute at scale,
authentically and appropriately reflected in the solutions you
thereby amplifying small errors or biases
deploy requires a follow-up question, namely: Why not?
Humans are notoriously protective creators. We naturally, and reinforcing patterns of behavior.
and with the full conviction of our good intent, eloquently
defend the “why” or—more often—the “how” of the solutions
we create zealously. But, as stories of unintended consequences
and harms accumulate, the need for mindful critique has become
self-evident.
AI and other data-driven solutions don’t just learn at scale;
they execute at scale, thereby amplifying small errors or biases
and reinforcing patterns of behavior. Ensuring data-driven
solutions are healthy, safe, productive, and just requires the
creators of AI solutions to become their own worst critics.
To be sure, governance and criticism get a bad rap. Governance
has become synonymous with rigid control and constraints; criti-
cism, with negativity alone. Both terms invoke a means by which
to stifle innovation or stall progress. Yet, this is not the objective
of governance or criticism.
As author Max Florschutz highlighted in a July 20, 2020, treatise on literary criticism, being our own worst critic is not about denigrating our creations. A "critic used to be someone who carefully judged the merits of a work" so as "to help the creator improve by knowing where to focus their efforts next." This, in a happy turn, is the ultimate objective of effective governance.

Ask Questions First
Do your users or customers want what you think they want? Do they believe your products and services are working in their best interest? Do they want to engage with you? Will they trust you? Should they?
In the rush to bring AI and data solutions to bear, don't guess and don't just ask, "Why?"; also ask, "Why not?" Consider why this application might not be a good idea, may not lead to our intended outcome, might not be well-received, and might not safeguard human dignity and liberties.
Also think of why a person might not use the application as intended, might not match our expected user profile, may not expect something to work that way or not want to use the solution, and may not trust our intent.
Try to consider why the data might not accurately represent the current state, not reflect a desired future state, not be appropriate to use in this context, not have been intended for this use, or not represent what we think it represents.
Also evaluate why the model might not accurately depict cause and effect, not be optimized for what we intended, not be sustainable, or not lead to a defensible conclusion/action.
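One lightweight way to operationalize these prompts is a pre-deployment review gate that refuses to sign off until every question has a recorded answer. This is only a sketch: the column prescribes no tooling, and the question wording, structure, and function names here are assumptions:

```python
# Hypothetical "Why not?" gate; the questions paraphrase the column,
# the mechanism is an assumption, not a prescribed tool.
WHY_NOT_QUESTIONS = [
    "Could this application fail to lead to our intended outcome?",
    "Could it fail to safeguard human dignity and liberties?",
    "Might a person use the application other than as intended?",
    "Might the data not represent the current or desired state?",
    "Might the model not accurately depict cause and effect?",
]

def ready_to_deploy(review: dict) -> bool:
    """True only if every question has a non-empty recorded answer."""
    return all(review.get(q, "").strip() for q in WHY_NOT_QUESTIONS)

partial = {WHY_NOT_QUESTIONS[0]: "Validated against pilot feedback."}
print(ready_to_deploy(partial))  # False: four questions are still unanswered
```

The point is not the code but the discipline: the critique is written down, and an unanswered "Why not?" blocks the release rather than surfacing after deployment.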

Asking ‘Why Not?’

Asking "Why not?" does not undermine the legitimacy of our efforts or call into question our intent. Rather, robust critique improves data-driven products and services. It shifts our point of view and invites others into the fray to challenge our beliefs and assumptions, thereby enabling us to identify and remediate a spectrum of ills, from simple logical fallacies to gross mistakes in judgment.
Are your AI and data-driven solutions part of the problem or the solution? Why not go ahead and ask?

Kimberly Nevala is a strategic advisor at SAS (www.sas.com). She provides counsel on the strategic value and real-world realities of emerging advanced analytics and information trends to companies worldwide. She is currently focused on demystifying the business potential and practical implications of AI and machine learning.

40 BIG DATA QUARTERLY | FALL 2021


