Levelling up
database security
by thinking in APIs
Lindsay Holmwood
@auxesis
Chief Product Officer @ CipherStash
The problem
Techniques for building secure APIs have
improved tremendously over the last decade.
Database security is mostly unchanged.
Average breach costs
$4.24m USD
10% increase in
average total cost of breach
between 20202021
New York
JULY
Australia
SEPTEMBER
Singapore
APRIL
Helsinki & North
MARCH
Paris
DECEMBER
London
OCTOBER
Jakarta
FEBRUARY
Hong Kong
AUGUST
JUNE
India
MAY
Check out our API Conferences here
50+ events since 2012, 14 countries, 2,000+ speakers, 50,000+ attendees,
300k+ online community
Want to talk at one of our conferences?
Apply to speak here
The landscape is changing
○ Compliance requirements (e.g.,
GDPR, CCPA) are becoming
more stringent
○ Ransomware cost $20B
globally in 2020
○ Attackers are becoming more
sophisticated (exploiting supply
chains, brokering access) and
are moving faster
Notable breaches
2015 Anthem Health
80 million health records
2020 Nintendo
160,000 user accounts exposed
2020 BigFooty.com
132GB sensitive data in Elastic
2020 Antheus Tecnologia
81.5 million personal records
2019 CapitalOne
100m personal records
In 2020, over 300,000 patient records (including detailed
consult notes) were leaked and used to extort users.
Vastaamo’s system violated one of the “first principles of
cybersecurity”: It didn’t anonymize the records. It didn’t even
encrypt them. The only thing protecting patients’ confessions and
confidences were a couple of firewalls and a server login screen.
- Mikael Koivukangas, OneSys Medical
Case study: Vastaamo
The techniques
Techniques sorted by breach
Source: IBM Cost of a Data Breach Report 2021
Compromised credentials
Attackers use stolen credentials to gain access
to a target.
Credentials can come from:
● Public data breaches
● Version control
● BEC & phishing
● Password stores
Compromised credentials
Source: IBM Cost of a Data Breach report 2021
Source: MITRE ATT&CK
Average time to discovery:
250 days
Cloud misconfiguration
Types of misconfiguration:
● Default
● Unused features
● Untested
Can be used to:
● Expose information
● Gain access
Source: IBM Cost of a Data Breach report 2021
Source: OWASP Top Ten
Average time to discovery:
186 days
SQL injection
Malicious user input used in SQL queries.
Can be used to:
● Exfil data
● Tamper with data
● Escalate privileges
Average time to discovery:
154 days
Source: IBM Cost of a Data Breach report 2021
Source: OWASP Top Ten
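The SQL injection slide above can be illustrated with a small sketch. It uses Python's built-in sqlite3 module and an in-memory database. The table, column names and input string are made up for illustration and are not from the talk. The first query concatenates user input into the SQL text, the second binds it as a parameter.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

# Attacker-controlled input that closes the string literal and adds a condition.
malicious = "nobody' OR '1'='1"

# Vulnerable: the input becomes part of the SQL statement itself.
unsafe_sql = "SELECT name FROM users WHERE name = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the input is bound as a parameter and is only ever treated as data.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The concatenated query matches every row, while the parameterised query matches none, because no user is literally named `nobody' OR '1'='1`.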
Observer can:
○ view data in transit
○ manipulate data in
request/response
Person in the Middle
Source: OWASP Top Ten
Denial of Service
Make the service unavailable for legitimate users
Resource exhaustion (network, CPU, memory, storage, IO)
Can be used as cover for remote code execution and data exfil
Source: OWASP Top Ten
What are the big API security
advances in the last decade?
What can we
learn from APIs
and apply to databases?
1. Standardised
serialisation
formats
Strongly typed communication for:
● Network transport
● Storage
Reduces attack surface, to mitigate attacks like
● SQL injection
Serialisation formats
Example: Protocol Buffers
Binary representation of data
structures:
1. Describe data structure using
built in types
2. Compile bindings for languages
3. Encode/decode data structure in
efficient binary format
Supports basic backwards
compatibility via tags.
// proto2 syntax: every field carries a label (required, optional or repeated)
syntax = "proto2";

service SearchService {
  rpc Search(SearchRequest) returns (SearchResponse);
}

message SearchRequest {
  required string query = 1;
  optional int32 page_number = 2;
  optional int32 result_per_page = 3;
}

message SearchResponse {
  repeated Result results = 1;
}

message Result {
  optional string url = 1;
  optional string title = 2;
  repeated string snippets = 3;
}
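To make "backwards compatibility via tags" concrete, here is a minimal hand-rolled sketch of the Protocol Buffers wire format for the SearchRequest message. It is not the generated code you would normally use. The function names are hypothetical, and it only handles non-negative varints and length-delimited fields. The decoder skips any tag it does not know, which is how an older reader can tolerate fields added later.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Return (value, next position) for the varint starting at pos."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_search_request(query: str, page_number: int) -> bytes:
    """Encode field 1 (string query) and field 2 (int32 page_number)."""
    q = query.encode("utf-8")
    return (
        encode_varint((1 << 3) | 2) + encode_varint(len(q)) + q  # tag 1, length-delimited
        + encode_varint((2 << 3) | 0) + encode_varint(page_number)  # tag 2, varint
    )


def decode_known_fields(buf: bytes, known: set[int]) -> dict[int, object]:
    """Decode a message, keeping known tags and skipping unknown ones."""
    fields: dict[int, object] = {}
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        tag, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (strings, bytes, nested messages)
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if tag in known:
            fields[tag] = value
    return fields


encoded = encode_search_request("api", 2)
old_reader_view = decode_known_fields(encoded, known={1})
```

A reader that only knows tag 1 still recovers the query and ignores page_number, so adding fields does not break it.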
Example: BSON
Lightweight binary representation of
data structures.
Binary encoding of JSON-like data
(includes field names in encoded
data).
Handle marshal/unmarshal in each
language.
{"hello": "world"} →
\x16\x00\x00\x00 // total document size
\x02 // 0x02 = type String
hello\x00 // field name
\x06\x00\x00\x00world\x00 // field value
\x00 // 0x00 = type EOO
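The byte layout above can be reproduced with a few lines of Python. This is a minimal sketch that only handles string-valued fields, and the function name is hypothetical. A real application would use a maintained BSON library rather than this.

```python
import struct


def encode_bson_string_doc(doc: dict[str, str]) -> bytes:
    """Encode a flat dict of string values as a BSON document."""
    body = bytearray()
    for name, value in doc.items():
        raw = value.encode("utf-8") + b"\x00"
        body += b"\x02"                         # element type 0x02 = string
        body += name.encode("utf-8") + b"\x00"  # field name as a NUL-terminated cstring
        body += struct.pack("<i", len(raw))     # little-endian int32 length, including the NUL
        body += raw
    body += b"\x00"                             # 0x00 = end of object
    # The leading int32 is the total size, including its own four bytes.
    return struct.pack("<i", len(body) + 4) + bytes(body)


encoded = encode_bson_string_doc({"hello": "world"})
print(encoded)
```

Note that the field name travels with every document, which is what the slide means by "includes field names in encoded data".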
For databases?
Build secure clients, faster:
● Automatically generate clients for different languages
● Automatically generate documentation
● Backwards compatibility baked in
Serialisation formats for databases
Deserialization attacks:
● Injection — data injection, only support primitive data types
● Privilege escalation — gaining RCE through object deserialisation
Denial of Service attacks:
● Resource exhaustion — drop and log bad deserialisations
Serialisation formats — defend against:
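The defences listed above (primitive types only, drop and log bad deserialisations) might look something like the following sketch. JSON stands in for the wire format. The schema, size limit and logger name are assumptions chosen for illustration, not anything prescribed by the talk.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("strict_decoder")  # hypothetical logger name

MAX_BYTES = 1024                               # assumed size limit
SCHEMA = {"query": str, "page_number": int}    # assumed schema: primitives only


def decode_request(raw: bytes) -> Optional[dict]:
    """Return the decoded request, or None if it was dropped."""
    if len(raw) > MAX_BYTES:
        logger.warning("dropped oversized payload (%d bytes)", len(raw))
        return None
    try:
        data = json.loads(raw)
    except (ValueError, RecursionError) as exc:
        logger.warning("dropped undecodable payload: %s", type(exc).__name__)
        return None
    if not isinstance(data, dict) or set(data) != set(SCHEMA):
        logger.warning("dropped payload with unexpected shape")
        return None
    for field, expected in SCHEMA.items():
        # Exact type match: rejects nested objects, and bool posing as int.
        if type(data[field]) is not expected:
            logger.warning("dropped payload: field %r has the wrong type", field)
            return None
    return data
```

A payload such as `{"query": {"$ne": null}, "page_number": 2}` is dropped because the query field is an object rather than a string, which is the kind of operator injection that loosely typed decoding lets through.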
Defence in depth:
● Use strongly typed languages to stop injection attacks
propagating from client to server
“New” attacks like request smuggling
Serialisation formats — but also consider:
2. RPC
RPC - before
Single Request/Response APIs:
● CORBA
● SOAP (HTTP, XML)
● XML-RPC
● REST (HTTP, URI, JSON, XML)
Databases:
● Unique wire protocols
Use code generation to handle:
● Routes
● Serialisation
● HTTP methods, request/response headers
● Errors
RPC - now
Example: gRPC
From Google
Uses protobufs
Requires HTTP/2
Bidirectional streaming
Example: Twirp
From Twitch
Supports binary and JSON payloads
HTTP 1.1 only
No bidirectional streaming
Example: GraphQL
“Query language for APIs”
Single API endpoint.
Clients request the data and the
structure.
New fields and types can be added
without affecting existing queries.
Query:
{
  person {
    name
    height
  }
}
Response:
{
  "person": {
    "name": "Ada Lovelace",
    "height": 166
  }
}
For databases?
RPC for databases
Ensure protocol compatibility between client and server
● Force clients to upgrade to latest versions
Reduce attack surface
● To only what the endpoint explicitly exposes
● Stop enumeration
Broken authentication
● Session timeouts to limit foothold, through short lived tokens
Broken access controls
● Privilege escalation, through scoped credentials
Denial of service
● Strict encoding and deserialization
● Logging of deserialization failures
RPC - defend against:
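One way to picture "only what the endpoint explicitly exposes" is an allow-list dispatcher, sketched below. The method names and handlers are hypothetical. Real RPC frameworks generate this routing from the service definition, so treat this as an illustration of the idea rather than how gRPC or Twirp implement it.

```python
from typing import Callable


def search(query: str) -> list[str]:
    """Hypothetical handler that the service chooses to expose."""
    return [f"result for {query}"]


def _drop_all_tables() -> None:
    """Hypothetical internal helper that must never be remotely callable."""
    raise RuntimeError("should not be reachable over RPC")


# Only methods listed here are reachable; everything else is invisible to clients.
EXPOSED: dict[str, Callable[..., object]] = {"Search": search}


def dispatch(method: str, **params: object) -> object:
    handler = EXPOSED.get(method)
    if handler is None:
        # The same generic error for every unknown name gives
        # an attacker nothing to enumerate.
        raise LookupError("unknown method")
    return handler(**params)


print(dispatch("Search", query="api"))
```

Because routing goes through an explicit table rather than name lookup on a module or object, adding an internal function never widens the remote surface by accident.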
gRPC reflection
● Enumerates gRPC services
● Exposes protobufs in human readable format (arguments, fields)
You can use this now!
● ProfaneDB defines schema in protobufs and talks gRPC
RPC - but also consider:
3. Auth
Auth — before
Authentication:
● Challenge–Response authentication
● Secure Remote Password protocol
● Client certificate authentication
Auth — now
Authentication:
● OAuth2 and JWT
● SAML
● Self managed identity via G Suite, O365
Proliferation of third party IDP
● Auth0
● Ping
● Okta
For databases?
Auth for databases
Don’t roll your own auth — use third party identity provider
Untrusted clients, trusted servers:
● Client authenticates to IDP
● IDP sets up session with database
● Database is ignorant of users — only knows if IDP gives an OK
Auth for databases
Benefits:
● Less code, lower ongoing costs
● Database is integrated with broader organisational IAM controls
You can use this now!
● MongoDB, OpenSearch, CouchDB all support JWT authentication
Auth — defend against:
Broken authentication
● Limit impact of compromised credentials and account takeovers
⬆ involved in 20% of all breaches
Broken access controls
● Privilege escalation, through strictly scoped credentials
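The combination of short-lived tokens and strictly scoped credentials can be sketched with an HS256-signed JWT built from the Python standard library. This is a minimal sketch under stated assumptions: the shared secret, the space-separated "scope" claim and the function names are illustrative. A production setup would normally verify IDP-issued tokens with a maintained JWT library, usually with asymmetric keys.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(secret: bytes, subject: str, scope: str,
                ttl_seconds: int, now: Optional[float] = None) -> str:
    """Stand-in for the IDP: issue a short-lived, scoped token."""
    issued = time.time() if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": subject, "scope": scope, "exp": int(issued) + ttl_seconds}
    signing_input = (_b64url(json.dumps(header).encode()) + "."
                     + _b64url(json.dumps(claims).encode()))
    return signing_input + "." + _b64url(_sign(secret, signing_input))


def verify_token(secret: bytes, token: str, required_scope: str,
                 now: Optional[float] = None) -> dict:
    """Stand-in for the database: accept only valid, unexpired, in-scope tokens."""
    try:
        header_b64, claims_b64, sig_b64 = token.split(".")
    except ValueError:
        raise PermissionError("malformed token")
    expected = _sign(secret, f"{header_b64}.{claims_b64}")
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise PermissionError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise PermissionError("unexpected algorithm")
    claims = json.loads(_b64url_decode(claims_b64))
    current = time.time() if now is None else now
    if current >= claims["exp"]:
        raise PermissionError("token expired")  # limits how long a stolen token is useful
    if required_scope not in claims.get("scope", "").split():
        raise PermissionError("missing scope")  # limits what a stolen token can do
    return claims
```

The database side never sees a password. It only checks the signature, the expiry and the scope, which mirrors the slide's point that the database "only knows if IDP gives an OK".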
4. TLS everywhere
Certs were costly!
Economise by not using TLS everywhere:
● TLS termination at your load balancers
● Unencrypted from load balancers onwards
Poor automation for managing cert lifecycle
Poor visibility into certificate supply chain
TLS - before
Certificates are basically free
Proliferation of end-to-end TLS
Better developer experience for the entire lifecycle:
○ Let’s Encrypt — automates nearly the entire cert lifecycle
○ mkcert — can use certs in local dev
Certificate Transparency logs create supply chain visibility
TLS - now
For databases?
TLS for databases
Terminate TLS in the database server itself
Handle the cert lifecycle in the database server itself
Use well-automated PKI infrastructure
Strictly use Forward Secrecy ciphers (ECDHE, DHE)
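As a rough illustration of restricting a server to forward-secrecy key exchange, the sketch below configures a server-side context with Python's ssl module. The cipher string and the helper name are assumptions, and the exact suites available depend on the local OpenSSL build. TLS 1.3 suites are not affected by set_ciphers, and all of them use ephemeral key exchange.

```python
import ssl
from typing import Optional


def make_server_context(certfile: Optional[str] = None,
                        keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Build a server-side TLS context limited to ECDHE/DHE key exchange."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # OpenSSL cipher string: only AES-GCM suites with ephemeral (EC)DH key exchange.
    ctx.set_ciphers("ECDHE+AESGCM:DHE+AESGCM")
    if certfile:
        # Paths are supplied by the caller; none are assumed here.
        ctx.load_cert_chain(certfile, keyfile)
    return ctx


ctx = make_server_context()
# TLS 1.3 suite names start with "TLS_"; everything else is TLS 1.2 and below.
tls12_suites = [c["name"] for c in ctx.get_ciphers()
                if not c["name"].startswith("TLS_")]
print(tls12_suites)
```

With this configuration a recorded session cannot be decrypted later even if the server's long-term private key leaks, because session keys come from ephemeral exchanges.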
Sensitive data exposure:
● Observer can view data in transit (PITM)
Injection attacks:
● Attacker can inject data into request/response (PITM)
Replay attacks (with TLS 1.2)
● Attacker can perform operations repeatedly
Impersonation:
● Monitor cert transparency logs for compromised CAs
TLS - defend against:
$ subfinder -silent -d cipherstash.com
discuss.cipherstash.com
landing.cipherstash.com
docs.cipherstash.com
dev.cipherstash.com
Easier passive asset discovery:
● Cert transparency logs fast-track some asset discovery
TLS - but also consider:
Zero trust
“never trust, always verify”
Build all your systems like they are connected to the public internet
All input is untrusted — sanitise everything
Expose database to the network?
Thank you!
🙋 What questions do you have?
💖 the talk? Let @auxesis know.
Appendix: Data Serialization Formats
● Protocol Buffers [developers.google.com]
● BSON [bsonspec.org]
● Apache Avro [avro.apache.org]
Appendix: JWT-based database authentication
● Custom JWT Authentication [docs.mongodb.com]
● Use JSON Web Tokens (JWTs) to Authenticate in Open Distro for
Elasticsearch and Kibana [aws.amazon.com]
● Authentication — Apache CouchDB [docs.couchdb.org]
Appendix: Attack Techniques
● HTTP Request Smuggling [portswigger.net]
● Credential Access techniques [attack.mitre.org]
Other security advances
● Web Application Firewalls
● Infracode static analysis
○ Semgrep
● Reproducible builds
○ Bazel

apidays LIVE Australia 2021 - Levelling up database security by thinking in APIs by Lindsay Holmwood, Cipherstash
