InfraCloud Technologies | Engineering | Full-time | Partially remote
Staff/Principal Engineer
Location: Pune (Hybrid)
Employment Type: Full-Time
Experience Level: 10-15 Years
About the Role:
We are seeking a highly experienced Principal Engineer to lead the design, development, and optimization of cloud infrastructure, CI/CD pipelines, and site reliability engineering. As a key technical leader, you will ensure the scalability, availability, and performance of systems, collaborating closely with engineering, operations, and leadership teams.
The ideal candidate will bring a proven track record in architecting robust DevOps solutions, mentoring teams, and driving best practices for automation, cloud infrastructure, and system reliability.
Key Responsibilities:
Technical Leadership:
- Lead the architecture, design, and implementation of scalable, highly available infrastructure, primarily on AWS.
- Define and establish best practices for DevOps, SRE, and Infrastructure as Code (IaC).
- Serve as a technical advisor for complex infrastructure and cloud architecture decisions, including, but not limited to, security, design, and building for multi-tenant applications.
Site Reliability Engineering:
- Implement monitoring, alerting, and observability solutions to ensure system reliability.
- Define and measure Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets.
- Proactively identify and address performance bottlenecks and scalability issues.
Cloud Infrastructure:
- Architect and optimize infrastructure across cloud platforms (e.g., AWS, Azure, GCP).
- Manage cost optimization, performance tuning, and security of cloud environments.
Collaboration and Mentorship:
- Work closely with software development teams to streamline workflows and improve system performance.
- Mentor and guide junior engineers, fostering a culture of learning and innovation.
- Collaborate with leadership to align DevOps/SRE strategies with business goals.
Qualifications:
- 10-15 years of experience in DevOps, SRE, or infrastructure engineering roles.
- Strong expertise in AWS Landing Zones.
- Hands-on experience with containerization tools like Docker and orchestration systems such as Kubernetes.
- Proficiency in CI/CD tools (e.g., Jenkins, GitLab CI, GitHub Actions, or similar).
- Deep understanding of Infrastructure as Code (IaC) using tools like Terraform, CloudFormation, or Ansible.
- Strong experience with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, ELK, or New Relic).
- Expert-level scripting and automation skills in languages like Python, Go, Bash, or similar.
- Experience managing highly available distributed systems and cloud infrastructure at scale.
- Knowledge of security best practices, networking concepts, and compliance standards (e.g., GDPR, SOC 2, ISO 27001).
- Excellent problem-solving skills, leadership capabilities, and strong communication skills.
Nice-to-Have Skills:
- Experience in hybrid cloud or multi-cloud environments.
- Exposure to machine learning or AI-driven automation within DevOps/SRE practices.
- Certifications such as AWS Solutions Architect Professional, Certified Kubernetes Administrator (CKA), or equivalent.
- Experience working in Agile or DevSecOps environments.
- Experience with managed offerings such as Managed Kafka and Managed Flink.