Data Platform Engineer

Pune, Maharashtra, India | Full-time | Partially remote

About the Role: We are seeking a skilled and dynamic Data Platform Engineer to design, build, and optimize scalable data processing and orchestration systems. The ideal candidate will have strong expertise in batch processing systems and data streaming technologies. You will play a pivotal role in building resilient data platforms and enabling efficient batch and stream processing pipelines.

What will you be learning and doing?

  • Design, implement, and manage data platforms using Kubernetes (K8s), Volcano, YuniKorn, or AWS Batch.

  • Develop and optimize large-scale, distributed data processing pipelines with Apache Flink.

  • Collaborate with cross-functional teams to integrate data solutions with existing infrastructure and workflows.

  • Enhance system performance by tuning Kubernetes clusters and batch processing workloads.

  • Troubleshoot, monitor, and maintain high availability for data orchestration systems.

  • Create and maintain detailed documentation of system architecture, configurations, and processes.

  • Stay current with emerging data platform technologies and recommend improvements.

What are we looking for?

  • Proven experience with Kubernetes (K8s) and scheduling tools such as Volcano or YuniKorn.

  • Hands-on expertise with AWS Batch for large-scale batch processing.

  • Proficiency in Apache Flink for real-time data processing and stream computation.

  • Strong understanding of distributed computing, containerization, and orchestration principles.

  • Experience with cloud platforms such as AWS, GCP, or Azure.

  • Knowledge of DevOps best practices, including CI/CD pipelines and monitoring tools.

  • Proficiency in programming languages such as Python.

  • Strong problem-solving skills with a focus on scalability and performance optimization.

  • Excellent communication and teamwork abilities.

Good to have:

  • Experience with other batch and stream processing frameworks (e.g., Apache Spark, Kafka Streams).

  • Familiarity with data serialization formats such as Avro, Parquet, or Protobuf.

  • Certification in Kubernetes, AWS, or related technologies.