AI/ML Operations

Pune, Maharashtra, India | Full-time | Fully remote

Job Overview:

We are seeking an MLOps Engineer to support the development, deployment, and operationalization of AI-based models. The ideal candidate will have strong experience in both machine learning and DevOps, with a deep understanding of model deployment, monitoring, and scaling. In this role, you will be responsible for migrating our AI-based application from a cloud-hosted solution to a self-hosted model, ensuring robust model integration, and managing the lifecycle of machine learning models, from development and training to deployment and monitoring.

Key Responsibilities:

  1. Model Development & Training:

    • Develop and implement efficient pipelines for training, validating, and fine-tuning machine learning models.

    • Ensure that models are trained on relevant datasets and are capable of handling a wide variety of use cases.

  2. Deployment and Infrastructure:

    • Design and implement the architecture required for self-hosted AI models, both on-premise and in the cloud, depending on project requirements.

    • Work closely with cloud infrastructure teams (AWS, GCP, Azure) to deploy and scale models in a production environment.

    • Set up automated deployment pipelines for the continuous integration and delivery (CI/CD) of machine learning models.

  3. Model Monitoring and Performance Tuning:

    • Monitor model performance and identify areas for improvement, including latency, resource consumption, and prediction accuracy.

    • Develop strategies for model retraining, continuous learning, and tuning to maintain the best performance over time.

    • Implement logging and monitoring systems to track model health and trigger alerts for potential issues.

  4. Integration & Automation:

    • Integrate AI models with the existing platform, ensuring smooth interaction between models and front-end systems.

    • Automate the end-to-end ML pipeline, from data collection and preprocessing to model deployment and feedback loops.

    • Collaborate with DevOps and infrastructure teams to ensure smooth operation of models in production environments.

  5. Collaboration & Documentation:

    • Document model architectures, training processes, deployment pipelines, and system configurations for internal use.

    • Provide technical support and troubleshooting for model-related issues during integration or in production.

  6. Security and Compliance:

    • Ensure compliance with data privacy and security regulations in the AI model deployment process.

    • Implement robust access controls, encryption, and audit logging for model and data access.

Qualifications:

  • Education: Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or a related field.

  • Experience:

    • 3+ years of experience in MLOps, with a focus on machine learning model deployment and management.

    • Hands-on experience with both on-premise and cloud-based AI model deployment (AWS, GCP, Azure).

    • Experience with containerization technologies (Docker, Kubernetes) for model deployment and scaling.

    • Familiarity with continuous integration/continuous delivery (CI/CD) tools (e.g., Jenkins, GitLab CI, CircleCI).

    • Experience with monitoring tools (e.g., Prometheus, Grafana) and with managing model performance at scale.