KUBERNETES IN A GROWN ENVIRONMENT AND INTEGRATION INTO CONTINUOUS DELIVERY

Stephan Fudeus, Expert Continuous Delivery
Dr. Sascha Mühlbach, Expert Infrastructure Architect
1&1 Mail & Media Development & Technology GmbH
United Internet / 1&1 Mail & Media
20.06.18
United Internet
▪ Is a leading European internet specialist
▪ > 9000 employees
▪ 90k servers in 10 data centers
▪ Access
  • DSL and Mobile
▪ Applications
  • Business (Server, Hosting)
  • Consumer (WEB.DE etc.)
1&1 Mail & Media
▪ Main brands GMX, WEB.DE and MAIL.COM
▪ Various services around a free or paid mail account (calendar, news portal, cloud storage)
▪ 33 million active users / month
Speakers
▪ Stephan Fudeus
  ▪ Joined 1&1 in 2005
  ▪ Long-term experience in building highly scalable multi-tenant applications
  ▪ Product Owner and TechLead for our Kubernetes Clusters
  ▪ Twitter: @der_sfu
▪ Dr. Sascha Mühlbach
  ▪ Expert Infrastructure Architect
  ▪ 15 years of professional experience
  ▪ Responsible for the global operations strategy of the applications and systems infrastructure
Agenda
■ Motivation / Environment
■ Cluster-Design
■ Network-Setup / Ingress
■ Git-driven cluster operations
■ Multi-Tenancy
■ Build processes
■ Continuous delivery environment
■ Onboarding / Training
Motivation
▪ Why containers?
  ▪ Strong coupling between code and application runtime environment
  ▪ One build responsibility
  ▪ Hide core infrastructure from the application
  ▪ Reproducibility in development
  ▪ Follow new standards in software development
▪ Requirements
  ▪ Reliable platform that provides the same level of availability as our existing environment
  ▪ Efficient deployment of geo-redundant services in multiple data centers
  ▪ Self-service for the development and application teams
  ▪ Multi-tenancy with strong separation for security reasons
  ▪ Must fit into the existing network environment
  ▪ Mostly automated operation of the base Kubernetes platform
Environment
▪ Organizational Environment
  ▪ Approx. 25 Dev Teams with 10 Ops Teams
  ▪ Strong organizational separation between PM / Dev / Ops
  ▪ 24/7
  ▪ Central Ops Team to build and run the Kubernetes platform
▪ Technical Environment
  ▪ 3 data centers (2 in DE, 1 in US) that are owned by us
  ▪ Bare metal and virtual machines (KVM, ESX)
  ▪ All servers are Puppet managed
    • Our infrastructure has ~15,000 Puppet clients
  ▪ Majority of services are written in Java
  ▪ Ongoing transition to CD and microservices
Cluster Design
Network Setup for Frontend Zone
▪ Integration of the existing F5 BIG-IP load balancing platform and its features
▪ Service IPs are BGP-routed to the load balancer and then forwarded with SNAT to NodePorts
▪ BGP enables global redundancy
▪ No public IPs inside the Kubernetes cluster
[Figure: frontend traffic flow. The F5 virtual server 82.165.230.17:443 balances across the pool members 10.8.0.1:30001, 10.8.0.2:30001 and 10.8.0.3:30001, which are the NodePort (ServiceType: NodePort) on each worker node; kube-proxy on every worker node forwards to the pods. A box labelled "F5 - K8S" is connected to the Kubernetes API via REST calls.]
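The relationship between worker nodes, the NodePort and the load balancer pool shown on this slide can be sketched as follows. This is only an illustration of the mapping, not the F5 controller's implementation; the function name `pool_members` is hypothetical, the addresses are the ones from the slide, and the port check assumes the default Kubernetes NodePort range (30000-32767), which a cluster may override.

```python
from typing import List


def pool_members(node_ips: List[str], node_port: int) -> List[str]:
    """Return one load balancer pool member ("ip:port") per worker node.

    Every worker node exposes the same NodePort, so the pool is simply
    the list of node IPs combined with that port.
    """
    # Assumption: the cluster uses the default NodePort range.
    if not 30000 <= node_port <= 32767:
        raise ValueError(f"{node_port} is outside the default NodePort range")
    return [f"{ip}:{node_port}" for ip in node_ips]


members = pool_members(["10.8.0.1", "10.8.0.2", "10.8.0.3"], 30001)
print(members)
```

Adding or removing a worker node changes only the input list, which is why the pool can be kept in sync automatically from the Kubernetes API.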
Network configuration via ConfigMap for F5
Network Setup for Backend Zone
▪ In backend networks, we use MetalLB (no specific Layer 7 requirements)
▪ Service IPs are BGP-announced with ECMP distribution (easy scaling)
▪ Load balancing only via the Kubernetes base algorithms or ingress controller features
[Figure: backend traffic flow. MetalLB on the worker nodes announces the service IP 10.176.0.6:80; kube-proxy on every worker node forwards to the pods.]
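ECMP distribution, as mentioned on this slide, means the upstream router spreads flows across all nodes that announce the same service IP. The sketch below illustrates the idea with a hash over the flow 5-tuple; real routers use their own vendor-specific hash functions, and the function name and node names here are hypothetical.

```python
import hashlib
from typing import List, Tuple

# (source IP, source port, destination IP, destination port, protocol)
Flow = Tuple[str, int, str, int, str]


def ecmp_next_hop(flow: Flow, next_hops: List[str]) -> str:
    """Pick a next hop by hashing the flow 5-tuple.

    The same flow always maps to the same next hop as long as the set of
    next hops is unchanged; different flows spread across all of them.
    """
    if not next_hops:
        raise ValueError("no next hop announces this service IP")
    digest = hashlib.sha256(repr(flow).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical announcing nodes
flow = ("192.0.2.10", 40000, "10.176.0.6", 80, "tcp")
print(ecmp_next_hop(flow, nodes))
```

Scaling out is then a matter of another node announcing the same service IP, which is what makes this setup easy to scale.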
Network configuration via Service for MetalLB
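The configuration shown on this slide was an image and is not part of the extracted text. As a rough sketch of what a Service for a BGP-announced IP can look like, the following builds a standard `v1` Service of type `LoadBalancer` as a Python dictionary. The service name, namespace, selector and target port are hypothetical; only the IP and port come from the previous slide. Any MetalLB-specific settings the speakers may have used are not reproduced here.

```python
import json
from typing import Dict


def loadbalancer_service(name: str, namespace: str, ip: str, port: int,
                         target_port: int, selector: Dict[str, str]) -> dict:
    """Build a Service manifest of type LoadBalancer with a fixed IP."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "LoadBalancer",
            # Requested service IP; a load balancer implementation such
            # as MetalLB decides whether it can assign it.
            "loadBalancerIP": ip,
            "selector": selector,
            "ports": [
                {"protocol": "TCP", "port": port, "targetPort": target_port}
            ],
        },
    }


# Hypothetical names; IP and port taken from the backend zone slide.
service = loadbalancer_service(
    name="demo-backend",
    namespace="demo-team-demo-app-live",
    ip="10.176.0.6",
    port=80,
    target_port=8080,
    selector={"app": "demo-backend"},
)
print(json.dumps(service, indent=2))
```

The printed JSON is a valid manifest format for `kubectl apply -f -`, since kubectl accepts JSON as well as YAML.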
Git-driven cluster operations
▪ Maturity level via 3 branches (master, integration, production)
▪ All cluster operations are triggered by GitLab CI pipelines
  ▪ Automatically on git pushes to the relevant branch
  ▪ Manually triggered jobs for cluster changes
  ▪ Scheduled jobs for periodic changes (namespace updates / purges)
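The three trigger types above can be told apart inside a pipeline through GitLab's predefined variables `CI_PIPELINE_SOURCE` and `CI_COMMIT_REF_NAME`. The following is a minimal sketch of such a dispatch, not the speakers' actual pipeline: the operation names are hypothetical, and treating the `web` source as the manual trigger is an assumption.

```python
STAGE_BRANCHES = ("master", "integration", "production")


def select_operation(pipeline_source: str, branch: str) -> str:
    """Map a pipeline trigger and branch to a (hypothetical) operation name."""
    if branch not in STAGE_BRANCHES:
        raise ValueError(f"branch {branch!r} is not a maturity level")
    if pipeline_source == "push":
        # Automatic run after a git push to the branch.
        return f"apply-on-push:{branch}"
    if pipeline_source == "web":
        # Assumption: manual cluster changes are started from the GitLab UI.
        return f"manual-cluster-change:{branch}"
    if pipeline_source == "schedule":
        # Periodic work such as namespace updates and purges.
        return f"periodic-namespace-job:{branch}"
    raise ValueError(f"unsupported pipeline source {pipeline_source!r}")


print(select_operation("schedule", "production"))
```

In a real job the two arguments would be read from the environment, for example with `os.environ["CI_PIPELINE_SOURCE"]` and `os.environ["CI_COMMIT_REF_NAME"]`.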
Git-driven operations use cases
▪ Full redeployment of clusters
  ▪ Only if the cluster is broken; wipes everything
  ▪ Redeploys all nodes in parallel
▪ Rolling upgrade of clusters
  ▪ Usually done on a weekly basis
  ▪ Wipes and resets nodes one by one
▪ Namespaces update
  ▪ Nightly updates for production
  ▪ On push for integration
▪ Addon update
  ▪ Addons are Helm charts, rendered via helm template and injected via kubectl apply
  ▪ Done ad hoc for addon changes without redeployment
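The difference between a full redeployment and a rolling upgrade comes down to how nodes are grouped into batches. The sketch below shows only that scheduling aspect under the assumption that each batch is processed to completion before the next one starts; the function and node names are hypothetical and the actual wipe and reset steps are out of scope.

```python
from typing import List


def rollout_batches(nodes: List[str], rolling: bool) -> List[List[str]]:
    """Group nodes into batches that are processed one after another.

    rolling=True  -> one node per batch (nodes are reset one by one)
    rolling=False -> a single batch with all nodes (redeployed in parallel)
    """
    if not nodes:
        return []
    if rolling:
        return [[node] for node in nodes]
    return [list(nodes)]


cluster = ["worker-1", "worker-2", "worker-3"]  # hypothetical node names
print(rollout_batches(cluster, rolling=True))
print(rollout_batches(cluster, rolling=False))
```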
Multi Tenancy
▪ Common platform for several teams
  ▪ PodSecurityPolicies (no root, no host network, read-only layers)
  ▪ Dedicated resources for teams
    • Dedicated in-cluster Prometheus for scraping
    • Configurable log sink (Elasticsearch, Kafka)
  ▪ Authentication via OIDC <-> Dex <-> LDAP
▪ Maximum separation between teams targeted
  ▪ Namespaces are a "managed" resource
  ▪ Resource constraints defined centrally per namespace
  ▪ Users are restricted to their namespaces via RBAC
  ▪ Network policies
  ▪ Team-centric "helper" namespace
    • e.g. $team-helper
    • Used for managed resources, e.g. team-prometheus
  ▪ Individual namespaces per (group of) application and stage
    • $team-$app-live, $team-$app-prelive
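The naming scheme on this slide (`$team-helper`, `$team-$app-live`, `$team-$app-prelive`) can be sketched as a small helper. The function names and the example team and application are hypothetical; the validation reflects the Kubernetes rule that a namespace name must be a DNS label of at most 63 characters.

```python
import re
from typing import List, Sequence

# Lowercase alphanumerics and "-", starting and ending with an alphanumeric.
_DNS_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def _checked(name: str) -> str:
    """Return name if it is a valid namespace name, else raise ValueError."""
    if len(name) > 63 or not _DNS_LABEL.match(name):
        raise ValueError(f"{name!r} is not a valid namespace name")
    return name


def helper_namespace(team: str) -> str:
    """Name of the team-centric helper namespace."""
    return _checked(f"{team}-helper")


def app_namespaces(team: str, app: str,
                   stages: Sequence[str] = ("live", "prelive")) -> List[str]:
    """Names of the per-application, per-stage namespaces."""
    return [_checked(f"{team}-{app}-{stage}") for stage in stages]


print(helper_namespace("mail"))
print(app_namespaces("mail", "inbox"))
```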
Multi Tenancy
▪ Dedicated namespaces for individuals
  ▪ Purpose: training, PoC, experiments
  ▪ Daily process reads users from LDAP, then generates and flushes namespaces
▪ Service exposure via central ingress controller (Traefik)
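The daily LDAP-driven process for personal namespaces can be thought of as a reconciliation between two sets. The sketch below is one possible reading of "generate and flush": namespaces are created for users who have none and purged for owners who no longer appear in LDAP. That interpretation, the function name and the user names are assumptions; the LDAP query and the Kubernetes calls themselves are left out.

```python
from typing import Iterable, List, Tuple


def plan_personal_namespaces(ldap_users: Iterable[str],
                             namespace_owners: Iterable[str]
                             ) -> Tuple[List[str], List[str]]:
    """Return (users needing a namespace, owners whose namespace is purged)."""
    wanted = set(ldap_users)
    existing = set(namespace_owners)
    to_create = sorted(wanted - existing)
    to_purge = sorted(existing - wanted)
    return to_create, to_purge


create, purge = plan_personal_namespaces(
    ldap_users=["alice", "bob", "carol"],      # hypothetical LDAP result
    namespace_owners=["bob", "dave"],          # hypothetical current state
)
print("create:", create)
print("purge:", purge)
```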
Namespace config via YAML
Rendered via Helm
36 resulting manifests:
1 kind: Deployment
1 kind: Ingress
1 kind: Service
2 kind: ConfigMap
2 kind: ServiceAccount
4 kind: LimitRange
4 kind: Namespace
4 kind: ResourceQuota
8 kind: NetworkPolicy
9 kind: RoleBinding
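A tally like the one above can be produced from rendered manifests by counting top-level `kind:` lines. The sketch below is a minimal version that avoids a YAML parser: it only matches `kind:` at the start of a line, so nested fields such as `roleRef.kind` are ignored. The sample input is made up; `SLIDE_COUNTS` repeats the numbers from the slide to confirm that they add up to 36.

```python
import re
from collections import Counter

# Top-level "kind:" only; indented occurrences (e.g. roleRef.kind) do not match.
_KIND = re.compile(r"^kind:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def count_kinds(rendered: str) -> Counter:
    """Count manifests per kind in a multi-document YAML string."""
    return Counter(_KIND.findall(rendered))


SLIDE_COUNTS = {
    "Deployment": 1, "Ingress": 1, "Service": 1, "ConfigMap": 2,
    "ServiceAccount": 2, "LimitRange": 4, "Namespace": 4,
    "ResourceQuota": 4, "NetworkPolicy": 8, "RoleBinding": 9,
}

sample = """\
apiVersion: v1
kind: Namespace
metadata:
  name: demo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: demo-edit
roleRef:
  kind: ClusterRole
  name: edit
"""

print(count_kinds(sample))
print(sum(SLIDE_COUNTS.values()))
```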
Build Processes
▪ Fully automated builds
▪ High degree of standardization
  ▪ e.g. central Maven POM
▪ Parallel builds for classical and container deployments
▪ Containers use a centrally provided base image
  ▪ Build processes are triggered upon base image changes
  ▪ Policy: updates / rebuilds are enforced every 4 weeks
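The two rebuild triggers above (a changed base image, or an image older than four weeks) can be expressed as a single predicate. This is a sketch of the policy as stated on the slide, not the actual build tooling; the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

MAX_IMAGE_AGE = timedelta(weeks=4)


def needs_rebuild(last_build: datetime, now: datetime,
                  base_image_changed: bool) -> bool:
    """True if the image must be rebuilt under the 4-week policy."""
    if base_image_changed:
        # A new base image always triggers a rebuild.
        return True
    # Otherwise rebuild once the image is at least four weeks old.
    return now - last_build >= MAX_IMAGE_AGE


built = datetime(2018, 5, 1)
print(needs_rebuild(built, datetime(2018, 5, 20), base_image_changed=False))
print(needs_rebuild(built, datetime(2018, 6, 20), base_image_changed=False))
```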
Continuous Delivery Environment
▪ GoCD maps business processes
  ▪ Dedicated instance per team
  ▪ Standardized pipeline templates
▪ Technical processes are mapped separately
  ▪ Ansible for host-based deployments
  ▪ Helm/kubectl for Kubernetes deployments
▪ Supports hybrid deployments
  ▪ Containers and hosts in parallel
  ▪ Hybrid usage via load balancer
  ▪ Assists during the transition phase
Fully automated deployment chain
Onboarding & Training
▪ 4 training blocks for system administrators (1-2 days each)
  ▪ Docker & Kubernetes
  ▪ GoCD & Helm
    • Pipeline Design
    • Helm Templating
  ▪ Development Techniques for Ops
    • Repositories and versioning
    • Secure Software Development Lifecycle
  ▪ Operating Container Applications
    • Monitoring, Logging and Failure Handling
    • Operations Lifecycle
Links
▪ F5-Ctrl (https://github.com/F5Networks/k8s-bigip-ctlr)
▪ MetalLB (https://metallb.universe.tf/)
▪ Dex (https://github.com/coreos/dex)
▪ GoCD (https://www.gocd.org)
▪ https://jobs.1und1.de/
▪ https://web.de
▪ https://www.gmx.net
▪ https://www.mail.com
▪ https://www.united-internet.de/
