Lead AI/ML Infrastructure Engineer
Come join a team of industry and science leaders to achieve a vision of empowering innovation through state-of-the-art artificial intelligence and machine learning. We are addressing exciting challenges for our customers at the intersection of AI/ML and cutting-edge cloud infrastructure, with ML being both a core enabler for, and a major feature of, our platform.
We are looking for candidates adept at both researching and implementing AI/ML engineering and infrastructure engineering capabilities.
- AI/ML infrastructure management: Architect, deploy, and maintain scalable AI/ML infrastructure leveraging Kubernetes and KServe (formerly KFServing) for model hosting and management.
- Model deployment and optimization: Implement efficient deployment pipelines for AI/ML models, focusing on optimization, scalability, and reliability.
- Performance monitoring and tuning: Monitor model performance metrics, identify bottlenecks, and implement improvements to enhance efficiency and accuracy.
- Team leadership and collaboration: Lead a small team of engineers, fostering a collaborative environment and ensuring effective communication and knowledge sharing.
- Cross-functional collaboration: Work closely with data scientists, software engineers, and other stakeholders to understand requirements, translate them into scalable solutions, and ensure successful deployment.
- Continuous integration / continuous deployment (CI/CD): Implement and maintain CI/CD pipelines for AI/ML models to ensure rapid and reliable model updates and releases.
- Documentation and best practices: Develop and maintain documentation, best practices, and standard operating procedures related to ML infrastructure and deployment processes.
Required skills and qualifications
- Proficiency in KServe (formerly KFServing), Large Language Models (LLMs), Kubernetes, and Flyte for AI/ML model deployment and management.
- Strong background in managing AI/ML infrastructure at scale.
- Experience with CI/CD pipelines for AI/ML models.
- Proven ability to lead and manage small teams effectively.
- Excellent problem-solving skills with a focus on scalability and reliability.
- Strong communication and collaboration skills, with the ability to work effectively in cross-functional teams.
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
Preferred qualifications
- Experience with additional ML frameworks and tools beyond KServe and Flyte.
- Certifications in Kubernetes or related technologies.
- Previous experience deploying and managing Large Language Models (LLMs) in production.
- Familiarity with cloud platforms (AWS, GCP, Azure) for ML model hosting.