Manager, Site Reliability Engineering

Posted 9 Days Ago
Hiring Remotely in Czechia
Remote
Senior level
Cloud • Security • Software • Cybersecurity
The Role
The SRE Manager will lead a team to enhance system reliability, automate processes, and implement SRE principles, ensuring effective incident management and operational coverage.
Summary Generated by Built In

Veeam, the #1 global market leader in data resilience, believes businesses should control all their data whenever and wherever they need it. Veeam provides data resilience through data backup, data recovery, data portability, data security, and data intelligence. Based in Seattle, Veeam protects over 550,000 customers worldwide who trust Veeam to keep their businesses running. Join us as we move forward together, growing, learning, and making a real impact for some of the world’s biggest brands. The future of data resilience is here - go fearlessly forward with us.

About the Role

Veeam is expanding its global Site Reliability Engineering (SRE) organization to support the Veeam Data Cloud. As an SRE Manager, you will report to our Global Director of SRE and will build and lead a high-performing team that partners with product, platform, and security engineering to make our systems reliable, scalable, and observable from the ground up. You’ll collaborate with peer engineering leaders to embed reliability into service roadmaps, and you’ll represent your team in global SRE planning and delivery of cross-cutting reliability initiatives across all VDC services.

You’ll drive adoption of SRE principles (SLIs/SLOs/error budgets, toil reduction, blameless learning) and operate a healthy, daytime follow-the-sun on-call model in partnership with our other regions. You will lead your team in contributing code that improves the operability, reliability, resilience, and security of the codebase(s) we support.

What You’ll Do

People & Team Leadership

  • Hire, onboard, and grow your SRE team; coach career development and performance
  • Foster a psychologically safe, blameless culture that favors learning over blame and emphasizes engineering over firefighting
  • Ensure sustainable operational coverage; monitor on-call health and workload
  • Track and cap toil so engineers spend the majority of time on project work that reduces future toil

Reliability Strategy & Governance

  • Establish and operationalize SLIs/SLOs and error budgets with service owners; run reliability reviews and hold teams accountable to outcomes
  • Define reliability standards, runbooks, readiness checklists, and alerting patterns (including SLO-based alerting)
  • Partner with product/EMs to align reliability work with service goals and customer experience, not as a gate but as an enabler

Operations & Incident Excellence

  • Ensure incident response readiness; lead/coordinate major incidents; drive fast, high-quality postmortems and systemic fixes
  • Measure MTTR, change failure rate, SLO posture, and repeat-incident reduction; publish learnings broadly

Engineering & Automation

  • Lead software-first reliability investments: observability, deployment safety (canary/blue-green), resilience testing/chaos, and self-service guardrails
  • Drive platform improvements (IaC, CI/CD, Kubernetes) and internal tools that scale operations and improve developer experience

What You’ll Bring

  • 7+ years in Software, Platform, and/or Reliability Engineering with 2+ years managing engineers
  • Demonstrable experience leading engineering teams to predictably deliver outcomes
  • Experience leading cross-functional initiatives collaboratively with peers through influence
  • Experience with public cloud (Azure preferred), Kubernetes, IaC (Terraform, Pulumi), CI/CD (GitHub Actions, ArgoCD, Azure DevOps), and observability (OpenTelemetry, Elastic, Datadog, Prometheus, Grafana)
  • Coding background with experience improving service reliability
  • Hands-on incident management and postmortem practice; excellent cross-geo communication
  • Willingness to participate in an on-call rotation (typically during daytime hours, including weekends/holidays)

Bonus Skills

  • Demonstrated success leading SLO/error-budget adoption and reliability programs for cloud services
  • Experience operating a multi-region, follow-the-sun on-call model
  • Background in chaos/resilience/performance testing and release validation
  • Track record building or scaling SRE teams and influencing org-wide standards
  • Familiarity with compliance frameworks common to SaaS

What You’ll Get

  • 25 vacation days, 4 sick days, 21 paid medical leave days, plus 4 extra global VeeaMe Days for self-care and 24 paid volunteer hours annually through Veeam Cares
  • Premium private medical insurance for employees and dependents
  • Daily meal vouchers for restaurants and groceries (180 CZK per working day)
  • Flexible cafeteria platform with thousands of lifestyle benefit options
  • Multisport Card for gym and wellness, with family add-on options
  • Annual public transport reimbursement up to a set limit
  • Corporate mobile plan with optional family tariff
  • Opportunities to learn and grow through on-demand libraries (LinkedIn Learning, O’Reilly), mentoring, workshops and learning events like our annual Global Day of Learning

Please note: If the applicant is permanently based outside the Czech Republic, Veeam reserves the right to decline to consider the application. Remote work is possible only if the employee is located in the Czech Republic.

Veeam Software is an equal opportunity employer and does not tolerate discrimination in any form on the basis of race, color, religion, gender, age, national origin, citizenship, disability, veteran status or any other classification protected by federal, state or local law. All your information will be kept confidential.

Please note that any personal data collected from you during the recruitment process will be processed in accordance with our Recruiting Privacy Notice.  

The Privacy Notice sets out the basis on which the personal data collected from you, or that you provide to us, will be processed by us in connection with our recruitment processes. 

By applying for this position, you consent to the processing of your personal data in accordance with our Recruiting Privacy Notice.

By submitting your application, you acknowledge that the information provided in your job application and any supporting documents is complete and accurate to the best of your knowledge. Any misrepresentation, omission, or falsification of information may result in disqualification from consideration for employment or, if discovered after employment begins, termination of employment.

Top Skills

ArgoCD
Azure
Azure DevOps
Datadog
Elastic
GitHub Actions
Grafana
Kubernetes
OpenTelemetry
Prometheus
Pulumi
Terraform

The Company
Alpharetta, GA
4,172 Employees
Year Founded: 2006

What We Do

Veeam provides a single platform for modernizing backup, accelerating hybrid cloud and securing data. Veeam has 400,000+ customers worldwide, including 82% of the Fortune 500 and 69% of the Global 2,000. Veeam’s 100% channel ecosystem includes global partners, as well as HPE, NetApp, Cisco and Lenovo as exclusive resellers, and boasts more than 35K transacting partners worldwide.
