InfraCloud Technologies
Site Reliability Engineer
Full-time | Partially remote
Experience - 4 to 8 Years
Location - Bangalore (Hybrid)
We are seeking a highly skilled Site Reliability Engineer (SRE) to design, build, and operate scalable, reliable, and secure cloud-native platforms. The ideal candidate will have strong experience with Kubernetes ecosystems, cloud infrastructure, automation, observability, and GitOps practices.
Key Responsibilities
- Manage and optimize Kubernetes-based platforms, including Cilium, Istio, Ingress Controllers, and related ecosystem components.
- Design, deploy, and maintain infrastructure on Google Cloud Platform (GCP).
- Automate infrastructure provisioning and lifecycle management using Terraform.
- Implement and manage GitOps workflows using ArgoCD and GitLab.
- Deploy and maintain Helm charts for Kubernetes applications.
- Manage secrets, service discovery, and distributed systems using Vault and Consul.
- Build and maintain monitoring, logging, and observability platforms using Prometheus Operator and the Grafana Stack (Grafana, Mimir, Loki, Alloy, Tempo, and Pyroscope).
- Collaborate with development teams to improve platform reliability, performance, scalability, and operational excellence.
- Develop CI/CD pipelines and automation to support modern cloud-native deployments.
Required Skills
- Strong hands-on experience with Kubernetes (K8s) and cloud-native technologies.
- Experience with GCP, Terraform, Helm, and ArgoCD.
- Knowledge of Service Mesh technologies, particularly Istio and Cilium.
- Experience with Vault, Consul, and infrastructure security best practices.
- Strong expertise in observability tools including Prometheus and the Grafana ecosystem.
- Proficiency with GitOps, GitLab, CI/CD pipelines, and automation.
- Good understanding of Linux systems, networking, and troubleshooting in distributed environments.
Preferred Qualifications
- Experience operating large-scale production environments.
- Knowledge of SRE principles, incident management, capacity planning, and reliability engineering.
- Relevant cloud-native certifications (CKA, GCP, Terraform, etc.) are a plus.