apidays LIVE Australia 2021 - Levelling up database security by thinking in APIs by Lindsay Holmwood, Cipherstash

Levelling up
database security
by thinking in APIs
Lindsay Holmwood
@auxesis
Chief Product Officer @ CipherStash

Techniques for building secure APIs have
improved tremendously over the last decade.
Database security is mostly unchanged.

Average breach costs
$4.24m USD
10% increase in
average total cost of breach
between 20202021


The landscape is changing
○ Compliance requirements (e.g.,
GDPR, CCPA) are becoming
more stringent
○ Ransomware cost $20B
globally in 2020
○ Attackers are becoming more
sophisticated (exploiting supply
chains, brokering access) and
are moving faster
Notable breaches
2015 Anthem Health
80 million health records
2020 Nintendo
160,000 user accounts exposed
2020 BigFooty.com
132GB sensitive data in Elastic
2020 Antheus Tecnologia
81.5 million personal records
2019 CapitalOne
100m personal records

In 2020, over 300,000 patient records (including detailed
consult notes) were leaked and used to extort users.
Vastaamo’s system violated one of the “first principles of
cybersecurity”: It didn’t anonymize the records. It didn’t even
encrypt them. The only thing protecting patients’ confessions and
confidences were a couple of firewalls and a server login screen.
- Mikael Koivukangas, OneSys Medical
Case study: Vastaamo

Techniques sorted by breach
Source: IBM Cost of a Data Breach Report 2021
Compromised credentials

Attackers use stolen credentials to gain access
to a target.
Credentials can come from:
● Public data breaches
● Version control
● BEC & phishing
● Password stores
Compromised credentials
Source: IBM Cost of a Data Breach report 2021
Source: MITRE ATT&CK
Average time to discovery:
250 days

Cloud misconfiguration
Types of misconfiguration:
● Default
● Unused features
● Untested
Can be used to:
● Expose information
● Gain access
Source: IBM Cost of a Data Breach report 2021
Source: OWASP Top Ten
186 days

SQL injection
Malicious user input used in SQL queries.
Can be used to:
● Exfil data
● Tamper with data
● Escalate privileges
154 days
Source: IBM Cost of a Data Breach report 2021
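
A minimal sketch of the SQL injection problem described on this slide, using Python's built-in sqlite3 module. The users table and both function names are assumptions made up for illustration. The first function splices user input into the SQL text; the second binds it as a parameter so it is only ever treated as data.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable: user input becomes part of the SQL text itself.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterised: the value is bound as data and never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("ada",), ("grace",)])

malicious = "nobody' OR '1'='1"
leaked = find_user_unsafe(conn, malicious)   # the OR clause matches every row
blocked = find_user_safe(conn, malicious)    # no user has that literal name
```

The unsafe variant exfiltrates the whole table from a single crafted input, while the parameterised variant returns nothing for the same input.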

Observer can:
○ view data in transit
○ manipulate data in
request/response
Person in the Middle

Denial of Service
Make the service unavailable for legitimate users
Resource exhaustion (network, CPU, memory, storage, IO)
Can be used as cover for remote code execution and data exfil

What are the big API security
advances in the last decade?

What can we
learn from APIs
and apply to databases?

1. Standardised
serialisation
formats

Strongly typed communication for:
● Network transport
● Storage
Reduces attack surface, to mitigate attacks like
● SQL injection
Serialisation formats

Example: Protocol Buffers
Binary representation of data
structures:
1. Describe data structure using
built in types
2. Compile bindings for languages
3. Encode/decode data structure in
efficient binary format
Supports basic backwards
compatibility via tags.
syntax = "proto2";
// `required` exists only in proto2, so every singular field carries a proto2 label.
service SearchService {
  rpc Search(SearchRequest) returns (SearchResponse);
}
message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2;
  optional int32 result_per_page = 3;
}
message SearchResponse {
  repeated Result results = 1;
}
message Result {
  optional string url = 1;
  optional string title = 2;
  repeated string snippets = 3;
}
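
To show what "compatibility via tags" means on the wire, here is a hand-rolled sketch that encodes a SearchRequest with query and page_number set. It is an illustration only: real code would use the bindings generated by the protobuf compiler, and the helper names below are assumptions. Each field is written as a tag (field number shifted left by 3, plus a wire type) followed by its value, so field names never appear in the encoded bytes.

```python
VARINT, LENGTH_DELIMITED = 0, 2  # protobuf wire types used in this sketch


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)


def encode_search_request(query: str, page_number: int) -> bytes:
    raw = query.encode("utf-8")
    return (
        encode_tag(1, LENGTH_DELIMITED) + encode_varint(len(raw)) + raw  # query = 1
        + encode_tag(2, VARINT) + encode_varint(page_number)             # page_number = 2
    )


encoded = encode_search_request("api", 2)
# encoded == b"\x0a\x03api\x10\x02"
```

Because only the numeric tags are encoded, a field can be renamed without breaking older clients, as long as its number and type stay the same.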

Example: BSON
Lightweight binary representation of
data structures.
Binary encoding of JSON-like data
(includes field names in encoded
data).
Handle marshal/unmarshal in each
language.
{"hello": "world"} →
\x16\x00\x00\x00 // total document size
\x02 // 0x02 = type String
hello\x00 // field name
\x06\x00\x00\x00world\x00 // field value
\x00 // 0x00 = type EOO
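
The byte layout above can be reproduced with a few lines of standard-library Python. This is a sketch that handles only flat documents with string values; the function name is an assumption, and a real application would use a BSON library instead.

```python
import struct


def encode_bson_string_document(doc: dict) -> bytes:
    """Encode a flat dict of str -> str as a BSON document."""
    body = bytearray()
    for key, value in doc.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("this sketch supports string keys and values only")
        raw = value.encode("utf-8") + b"\x00"
        body += b"\x02"                          # element type: string
        body += key.encode("utf-8") + b"\x00"    # field name as a C string
        body += struct.pack("<i", len(raw))      # int32 length, counts the trailing NUL
        body += raw
    body += b"\x00"                              # end of document
    total = 4 + len(body)                        # the size prefix counts itself
    return struct.pack("<i", total) + bytes(body)


encoded = encode_bson_string_document({"hello": "world"})
# encoded == b"\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"
```

Unlike Protocol Buffers, the field name ("hello") travels inside the encoded data, which is why no separate schema is needed to decode it.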

Build secure clients, faster:
● Automatically generate clients for different languages
● Automatically generate documentation
● Backwards compatibility baked in
Serialisation formats for databases

Deserialization attacks:
● Injection — data injection, only support primitive data types
● Privilege escalation — gaining RCE through object deserialisation
Denial of Service attacks:
● Resource exhaustion — drop and log bad deserialisations
Serialisation formats — defend against:
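
A minimal sketch of the defences listed on this slide: accept only primitive types, cap the payload size, and drop and log anything that fails to deserialise. JSON stands in for the wire format to keep the example self-contained. The schema, the size limit, and all names are hypothetical.

```python
import json
import logging

logger = logging.getLogger("strict_decode")

MAX_PAYLOAD_BYTES = 4096                             # hypothetical limit
ALLOWED_FIELDS = {"query": str, "page_number": int}  # hypothetical schema


class RejectedPayload(Exception):
    """Raised for any payload that is dropped."""


def decode_request(payload: bytes) -> dict:
    try:
        if len(payload) > MAX_PAYLOAD_BYTES:
            raise ValueError("payload too large")
        data = json.loads(payload.decode("utf-8"))
        if not isinstance(data, dict):
            raise ValueError("top-level value must be an object")
        for key, value in data.items():
            expected = ALLOWED_FIELDS.get(key)
            if expected is None:
                raise ValueError(f"unknown field: {key!r}")
            # Exact type check: rejects nested objects, and bool posing as int.
            if type(value) is not expected:
                raise ValueError(f"field {key!r} must be {expected.__name__}")
        return data
    except (ValueError, RecursionError) as exc:
        # Drop and log, so repeated failures are visible to operators.
        logger.warning("dropped bad payload: %s", exc)
        raise RejectedPayload(str(exc)) from exc


ok = decode_request(b'{"query": "api", "page_number": 2}')
```

A payload such as {"query": {"$ne": null}} is rejected because the value is an object rather than a string, which is the kind of operator injection that primitive-only decoding is meant to stop.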

Defence in depth:
● Use strongly typed languages to stop injection attacks
propagating from client to server
“New” attacks like request smuggling
Serialisation formats — but also consider:

RPC - before
Single Request/Response APIs:
● CORBA
● SOAP (HTTP, XML)
● XMLRPC
● REST (HTTP, URI, JSON, XML)
Databases:
● Unique wire protocols

Use code generation to handle:
● Routes
● Serialisation
● HTTP methods, request/response headers
● Errors
RPC - now

Example: gRPC
From Google
Uses protobufs
Requires HTTP/2
Bidirectional streaming

Example: Twirp
From Twitch
Supports binary and JSON payloads
HTTP 1.1 only
No bidirectional streaming

Example: GraphQL
“Query language for APIs”
Single API endpoint.
Clients request the data and the
structure.
New fields and types can be added
without affecting existing queries.
Query:
{
person {
name
height
}
}
Response:
{
“person”: {
“name”: “Ada Lovelace”,
“height”: 166
}
}
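
The idea that "clients request the data and the structure" can be sketched without a GraphQL library. The function below is not a GraphQL implementation: it takes a selection already parsed into nested dicts and returns only the requested fields. The stored record, including its email field, is made up for illustration.

```python
def select_fields(data: dict, selection: dict) -> dict:
    """Return only the fields named in `selection`, recursing into nested objects."""
    result = {}
    for field, sub_selection in selection.items():
        if field not in data:
            raise KeyError(f"unknown field: {field}")
        value = data[field]
        # An empty sub-selection marks a leaf field.
        result[field] = select_fields(value, sub_selection) if sub_selection else value
    return result


# Hypothetical stored record; the client never asks for "email".
PERSON_RECORD = {
    "person": {"name": "Ada Lovelace", "height": 166, "email": "ada@example.com"}
}

# Parsed form of the query: { person { name height } }
QUERY = {"person": {"name": {}, "height": {}}}

response = select_fields(PERSON_RECORD, QUERY)
# response == {"person": {"name": "Ada Lovelace", "height": 166}}
```

Fields the client did not ask for stay on the server, and adding a new field to the record does not change the response to existing queries.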

RPC for databases
Ensure protocol compatibility between client and server
● Force clients to upgrade to latest versions
Reduce attack surface
● To only what the endpoint explicitly exposes
● Stop enumeration
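
One way to read "only what the endpoint explicitly exposes" is an explicit allowlist of callable methods. The sketch below is an assumption about how that might look in a hand-written dispatcher (generated RPC servers do the equivalent from the service definition); every name in it is hypothetical.

```python
def get_record(record_id: int) -> dict:
    return {"id": record_id}


def _drop_all_records() -> None:
    raise RuntimeError("internal helper, must never be reachable remotely")


# Only methods listed here can be called over the wire.
EXPOSED_METHODS = {"GetRecord": get_record}


class UnknownMethod(Exception):
    pass


def dispatch(method: str, **params):
    handler = EXPOSED_METHODS.get(method)
    if handler is None:
        # Same generic error for every unknown name, so callers cannot
        # enumerate internals by probing.
        raise UnknownMethod("unknown method")
    return handler(**params)


result = dispatch("GetRecord", record_id=7)
```

Calling dispatch("_drop_all_records") fails with the same error as any misspelled name, so the response reveals nothing about which internal functions exist.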

Broken authentication
● Session timeouts to limit foothold, through short lived tokens
Broken access controls
● Privilege escalation, through scoped credentials
Denial of service
● Strict encoding and deserialization
● Logging of deserialization failures
RPC - defend against:

gRPC reflection
● Enumerates gRPC services
● Exposes protobufs in human readable format (arguments, fields)
You can use this now!
● ProfaneDB defines schema in protobufs and talks gRPC
RPC - but also consider:

Auth — before
Authentication:
● Challenge–Response authentication
● Secure Remote Password protocol
● Client certificate authentication

Auth — now
Authentication:
● OAuth2 / JWT
● SAML
● Self managed identity via G Suite, O365
Proliferation of third party IDP
● Auth0
● Ping
● Okta

Auth for databases
Don’t roll your own auth — use third party identity provider
Untrusted clients, trusted servers:
● Client authenticates to IDP
● IDP sets up session with database
● Database is ignorant of users — only knows if IDP gives an OK

Auth for databases
Benefits:
● Less code, lower ongoing costs
● Database is integrated with broader organisational IAM controls
You can use this now!
● MongoDB, OpenSearch, CouchDB all support JWT authentication

Auth — defend against:
Broken authentication
● Limit impact of compromised credentials and account takeovers
⬆ involved in 20% of all breaches
Broken access controls
● Privilege escalation, through strictly scoped credentials

Certs were costly!
Economise by not using TLS everywhere:
● TLS termination at your load balancers
● Unencrypted from load balancers onwards
Poor automation for managing cert lifecycle
Poor visibility into certificate supply chain
TLS - before

Certificates are basically free
Proliferation of end-to-end TLS
Better developer experience for the entire lifecycle:
○ Let’s Encrypt — automates nearly the entire cert lifecycle
○ mkcert — can use certs in local dev
Certificate Transparency logs create supply chain visibility
TLS - now

TLS for databases
Terminate TLS in the database server itself
Handle the cert lifecycle in the database server itself
Use well-automated PKI infrastructure
Strictly use Forward Secrecy ciphers (ECDHE, DHE)
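
As a sketch of the cipher restriction, a server written in Python could build its TLS context as below. It assumes Python's ssl module is linked against OpenSSL, whose cipher-string syntax is used here. The function name is an assumption, and certificate loading is left optional because the paths depend on your PKI.

```python
import ssl


def build_server_context(cert_file=None, key_file=None):
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # TLS 1.2 suites: ephemeral (EC)DH key exchange with AES-GCM only.
    # TLS 1.3 suites are configured separately by OpenSSL and always use
    # ephemeral key exchange.
    ctx.set_ciphers("ECDHE+AESGCM:DHE+AESGCM")
    if cert_file and key_file:
        # Hypothetical paths supplied by the caller's certificate automation.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


ctx = build_server_context()
cipher_names = [c["name"] for c in ctx.get_ciphers()]
```

Inspecting cipher_names is a quick way to confirm that no static RSA key-exchange suite remains enabled before the context is attached to a listening socket.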

Sensitive data exposure:
● Observer can view data in transit (PITM)
Injection attacks:
● Attacker can inject data into request/response (PITM)
Replay attacks (with TLS 1.2):
● Attacker can perform operations repeatedly
Impersonation:
● Monitor cert transparency logs for compromised CAs
TLS - defend against:

$ subfinder -silent -d cipherstash.com
discuss.cipherstash.com
landing.cipherstash.com
docs.cipherstash.com
dev.cipherstash.com
Easier passive asset discovery:
● Cert transparency logs fast-track some asset discovery
TLS - but also consider:

“never trust, always verify”
Build all your systems like they are connected to the public internet
All input is untrusted — sanitise everything
Expose database to the network?

Thank you!
🙋 What questions do you have?
💖 the talk? Let @auxesis know.

Appendix: Data Serialization Formats
● Protocol Buffers [developers.google.com]
● BSON [bsonspec.org]
● Apache Avro [avro.apache.org]

Appendix: JWT-based database authentication
● Custom JWT Authentication [docs.mongodb.com]
● Use JSON Web Tokens (JWTs) to Authenticate in Open Distro for
Elasticsearch and Kibana [aws.amazon.com]
● Authentication — Apache CouchDB [docs.couchdb.org]

Appendix: Attack Techniques
● HTTP Request Smuggling [portswigger.net]
● Credential Access techniques [attack.mitre.org]

Other security advances
● Web Application Firewalls
● Infracode static analysis
○ Semgrep
● Reproducible builds
○ Bazel
