I am a Platform-focused SRE dedicated to building resilient distributed systems. I specialize in bridging the gap between development and operations by creating Internal Developer Platforms (IDPs) that reduce cognitive load and accelerate delivery.
- Reliability: Implementing SLIs/SLOs, Error Budgets, and automated incident response.
- Scalability: Managing multi-region Kubernetes clusters (AKS/EKS/GKE) and high-traffic cloud networking.
- Developer Experience: Building "Golden Paths" using Backstage, Terraform, and GitOps.
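
As a rough illustration of the error-budget idea behind the Reliability bullet, here is a minimal Python sketch. The class name `ErrorBudget` and its fields are hypothetical, and it assumes a simple request-based availability SLO rather than any particular monitoring stack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Request-based error budget for one SLO window (illustrative only)."""

    slo_target: float      # e.g. 0.999 for a 99.9% availability SLO
    total_requests: int    # requests observed in the window
    failed_requests: int   # requests that violated the SLI

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 means the budget is untouched; 0.0 or below means it is spent.
        allowed = self.allowed_failures
        if allowed == 0:
            return 0.0 if self.failed_requests else 1.0
        return 1.0 - self.failed_requests / allowed


if __name__ == "__main__":
    budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
    print(f"Allowed failures: {budget.allowed_failures:.0f}")
    print(f"Budget remaining: {budget.remaining_fraction:.0%}")
```

A negative `remaining_fraction` signals an overspent budget, which is typically the point at which feature releases pause in favor of reliability work.
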
| Category | Tools & Technologies |
|---|---|
| Cloud | AWS, Azure, Google Cloud Platform |
| Orchestration | Kubernetes (managed and self-managed), Docker, Nomad |
| Infrastructure | Terraform, Pulumi, Crossplane, Ansible, Helm, Kustomize |
| CI/CD / GitOps | ArgoCD, FluxCD, GitHub Actions, GitLab CI, Buildkite |
| Observability | Prometheus, Thanos, Grafana, Loki, OpenSearch, ELK Stack, Datadog, OpenTelemetry |
| Languages | Go, Python |
- Multi-Cluster Metrics Aggregation (Thanos/Prometheus): Engineered a centralized observability platform across 10+ global clusters using Thanos to provide long-term storage and a single pane of glass for Grafana dashboards.
- Kubernetes Operator for Time-Based Scaling (Go): Developed a Kubernetes operator that scales workloads at specified times of day, reducing dev-cluster costs by 35% by scaling workloads down on a schedule.
- Kubernetes Postgres Backup Operator (Go): Developed a custom Go operator with controller-runtime that manages automated database snapshots and off-site syncing to S3/Azure Blob Storage via CRDs, increasing reliability by 50%.
- Enterprise Hub-Spoke AKS Architecture: Designed a private-link-first network topology for Azure, securing traffic with AGIC (Application Gateway Ingress Controller) and ensuring zero-trust communication via Calico network policies.
- Automated FinOps Dashboard: Built a Python tool integrated with AWS/Azure Billing APIs to identify orphaned resources and idle clusters, reducing cloud spend by 22% annually.
- Internal Developer Portal (Backstage): Architecting a self-service portal that allows developers to spin up ephemeral environments and RDS instances with a single click using Crossplane.
- Chaos Engineering Framework: Integrating LitmusChaos experiments into CI/CD pipelines to validate service resilience against pod evictions and network latency.
- Multi-Cloud GitOps Controller: Building a custom controller in Go to synchronize secrets and configurations across disparate EKS and AKS environments seamlessly.
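
The core decision inside a time-based scaling operator like the one listed above can be sketched in a few lines. The operator itself is written in Go; this Python version only illustrates the window check, and the function name `desired_replicas` and its parameters are hypothetical rather than the operator's actual API.

```python
from datetime import time


def desired_replicas(now: time, start: time, end: time, active: int, idle: int = 0) -> int:
    """Return `active` replicas inside the window [start, end), else `idle`.

    Windows that cross midnight (start later than end) are supported.
    """
    if start <= end:
        in_window = start <= now < end
    else:
        # Overnight window, e.g. 22:00 to 06:00.
        in_window = now >= start or now < end
    return active if in_window else idle


if __name__ == "__main__":
    # Hypothetical dev-cluster schedule: 3 replicas from 08:00 to 18:00, otherwise 0.
    for hour in (7, 12, 19):
        replicas = desired_replicas(time(hour, 0), time(8, 0), time(18, 0), active=3)
        print(f"{hour:02d}:00 -> {replicas} replicas")
```

A real controller would run this check in its reconcile loop and patch the workload's replica count only when the desired value differs from the current one.
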
- Learning Rust for high-performance systems tooling.
- Deep diving into eBPF for advanced network observability.
- Scaling Platform Engineering as a product within organizations.
- LinkedIn: linkedin.com/in/muntashir-islam
- Email: islam.muntashir@gmail.com
- Availability: Open to discussing Senior SRE, Platform Engineering, DevOps, or Cloud Engineering opportunities.
"Automate everything, document the rest."

