Full-time | Fully remote
Job Overview:
We are seeking an MLOps Engineer to support the development, deployment, and operationalization of AI-based models. The ideal candidate will have strong experience in both machine learning and DevOps, with a deep understanding of model deployment, monitoring, and scaling. In this role, you will be responsible for migrating our AI-based application from a cloud-hosted solution to a self-hosted model, ensuring robust model integration, and managing the lifecycle of machine learning models, from development and training to deployment and monitoring.
Key Responsibilities:
- Model Development & Training:
  - Develop and implement efficient pipelines for training, validating, and fine-tuning machine learning models (see the illustrative sketch below).
  - Ensure that models are trained on relevant datasets and can handle a wide variety of use cases.
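To give a sense of the kind of training and validation pipeline this responsibility covers, here is a minimal sketch using scikit-learn. The dataset, model choice, metric, and artifact name are placeholder assumptions for illustration, not part of this role's actual stack.

```python
# Minimal training/validation pipeline sketch (illustrative only).
# Dataset, model, metric, and artifact name below are placeholder assumptions.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import joblib

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Preprocessing and model are combined so the same steps run at training and serving time.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)

val_accuracy = accuracy_score(y_val, pipeline.predict(X_val))
print(f"Validation accuracy: {val_accuracy:.3f}")

# Persist the trained artifact so a later deployment step can pick it up.
joblib.dump(pipeline, "model.joblib")
```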
- Deployment and Infrastructure:
  - Design and implement the architecture required for self-hosted AI models, both on-premises and in the cloud, depending on project requirements (see the serving sketch below).
  - Work closely with cloud infrastructure teams (AWS, GCP, Azure) to deploy and scale models in production environments.
  - Set up automated deployment pipelines for continuous integration and delivery (CI/CD) of machine learning models.
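One common pattern for self-hosted serving is wrapping the trained model in a small HTTP service. Below is a minimal sketch using FastAPI; the artifact path, request schema, and endpoint name are assumptions for illustration.

```python
# Minimal self-hosted model-serving sketch (illustrative only).
# Assumes a scikit-learn pipeline saved as "model.joblib"; names are placeholders.
from fastapi import FastAPI
from pydantic import BaseModel
import joblib

app = FastAPI()
model = joblib.load("model.joblib")  # loaded once at startup

class PredictRequest(BaseModel):
    features: list[float]

@app.post("/predict")
def predict(req: PredictRequest):
    # Wrap the single sample in a batch of one, since scikit-learn expects 2D input.
    prediction = model.predict([req.features])[0]
    return {"prediction": int(prediction)}

# Run locally with: uvicorn serve:app --host 0.0.0.0 --port 8000
# In practice this service would be containerized and rolled out through a CI/CD pipeline.
```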
- Model Monitoring and Performance Tuning:
  - Monitor model performance and identify areas for improvement, including latency, resource consumption, and prediction accuracy (see the instrumentation sketch below).
  - Develop strategies for model retraining, continuous learning, and tuning to sustain performance over time.
  - Implement logging and monitoring systems to track model health and trigger alerts for potential issues.
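As a rough illustration of model-health instrumentation, the sketch below exposes prediction count and latency metrics with the prometheus_client library. The metric names and the stand-in predict() function are assumptions, not a prescribed setup.

```python
# Minimal model-monitoring sketch using prometheus_client (illustrative only).
# Metric names and the stand-in predict() are placeholder assumptions.
import random
import time
from prometheus_client import Counter, Histogram, start_http_server

PREDICTIONS = Counter("model_predictions_total", "Total prediction requests served")
LATENCY = Histogram("model_prediction_latency_seconds", "Prediction latency in seconds")

def predict(features):
    # Stand-in for a real model call.
    time.sleep(random.uniform(0.01, 0.05))
    return 0

@LATENCY.time()
def handle_request(features):
    result = predict(features)
    PREDICTIONS.inc()
    return result

if __name__ == "__main__":
    start_http_server(9100)  # exposes /metrics for Prometheus to scrape
    while True:
        handle_request([0.1, 0.2, 0.3])
```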
- Integration & Automation:
  - Integrate AI models with the existing platform, ensuring smooth interaction between models and front-end systems.
  - Automate the end-to-end ML pipeline, from data collection and preprocessing to model deployment and feedback loops (see the simplified sketch below).
  - Collaborate with DevOps and infrastructure teams to ensure models run smoothly in production environments.
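One way to read "automate the end-to-end ML pipeline" is as chained, repeatable steps from data collection through deployment. The sketch below shows that shape in plain Python with placeholder steps; real pipelines would typically run under an orchestrator such as Airflow or Kubeflow.

```python
# Sketch of an end-to-end pipeline as chained steps (illustrative only).
# Every step body is a placeholder; real pipelines would use an orchestrator.
def collect_data():
    # Placeholder: pull raw records from a data source.
    return [{"x": [0.1, 0.2], "y": 0}, {"x": [0.9, 0.8], "y": 1}]

def preprocess(records):
    X = [r["x"] for r in records]
    y = [r["y"] for r in records]
    return X, y

def train(X, y):
    # Placeholder for real training; returns a trivial "model".
    return {"threshold": sum(sum(x) for x in X) / len(X)}

def deploy(model):
    # Placeholder: publish the artifact to a registry or serving layer.
    print(f"Deploying model: {model}")

def run_pipeline():
    X, y = preprocess(collect_data())
    model = train(X, y)
    deploy(model)

if __name__ == "__main__":
    run_pipeline()
```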
- Collaboration & Documentation:
  - Document model architectures, training processes, deployment pipelines, and system configurations for internal use.
  - Provide technical support and troubleshooting for model-related issues during integration or in production.
- Security and Compliance:
  - Ensure compliance with data privacy and security regulations throughout the AI model deployment process.
  - Implement robust access controls, encryption, and audit logging for model and data access (see the audit-logging sketch below).
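For the audit-logging part of this responsibility, one minimal approach is to emit a structured log entry for every model access. The sketch below assumes hypothetical field names and a stand-in predict() function purely for illustration.

```python
# Minimal audit-logging sketch for model access (illustrative only).
# Field names and the stand-in predict() are placeholder assumptions.
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit_logger = logging.getLogger("model_audit")

def predict(features):
    return 0  # stand-in for a real model call

def audited_predict(user_id, features):
    # Record who accessed the model, when, and with what outcome.
    result = predict(features)
    audit_logger.info(json.dumps({
        "event": "model_prediction",
        "user": user_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "num_features": len(features),
        "result": result,
    }))
    return result

if __name__ == "__main__":
    audited_predict("alice", [0.1, 0.2, 0.3])
```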
Qualifications:
- Education: Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or a related field.
- Experience:
  - 3+ years of experience in MLOps, with a focus on machine learning model deployment and management.
  - Hands-on experience with both on-premises and cloud-based AI model deployment (AWS, GCP, Azure).
  - Experience with containerization technologies (Docker, Kubernetes) for model deployment and scaling.
  - Familiarity with continuous integration/continuous delivery (CI/CD) tools (e.g., Jenkins, GitLab CI, CircleCI).
  - Experience with monitoring tools (e.g., Prometheus, Grafana) and managing model performance at scale.