We're looking for an experienced ML Infrastructure Engineer who has successfully delivered large-scale ML infrastructure optimization projects. The primary focus is migrating and optimizing computer vision models from Nvidia GPU-based infrastructure to AWS Inferentia/Trainium while improving performance and reducing cost.
Current Infrastructure:
ML Models: RetinaFace, OpenPose, CLIP, and other CV models
Hardware: A10/T4 GPUs on EKS
Serving: Triton Inference Server
Orchestration: Mix of Kubernetes and Ray
Stage: Presale and Delivery
Duration: 2 months (preliminary)
Capacity: part-time (20h/week)
Areas of Responsibility
Technical Leadership:
Lead the architecture design for ML infrastructure modernization
Define compilation and optimization strategies for model migration
Establish performance benchmarking framework
Set up monitoring and alerting for the new infrastructure
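The benchmarking framework can start small: a harness that measures per-request latency percentiles for any inference callable works the same against the A10/T4 baseline and the Inferentia2 candidate, which keeps before/after comparisons apples-to-apples. A minimal sketch (all names are illustrative, not part of the project spec):

```python
import time
import statistics
from typing import Callable, Dict

def benchmark(infer: Callable[[], None], warmup: int = 10, iters: int = 100) -> Dict[str, float]:
    """Measure latency percentiles for a single-request inference callable.

    `infer` is any zero-argument function that runs one inference round trip
    (e.g. a Triton client call); the harness itself is backend-agnostic.
    """
    for _ in range(warmup):                 # discard cold-start / compilation effects
        infer()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)  # milliseconds
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p99_ms": samples[min(iters - 1, int(iters * 0.99))],
        "mean_ms": statistics.fmean(samples),
    }
```

Reporting p99 alongside p50 matters here because batch-oriented accelerators can look fine on the median while violating tail-latency SLAs.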
Performance Optimization:
Implement efficient model compilation pipelines for Inferentia2
Optimize batch processing and memory layouts
Fine-tune model serving configurations
Ensure latency requirements are met across all services
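Batch-size tuning for serving often reduces to a simple search: pick the largest batch whose measured per-batch latency still fits the latency budget, since larger batches amortize host overhead and raise throughput. A sketch of that selection logic (the function names and the toy latency model are assumptions for illustration, not measured numbers):

```python
from typing import Callable, Iterable, Optional

def pick_batch_size(latency_ms_for: Callable[[int], float],
                    sla_ms: float,
                    candidates: Iterable[int] = (1, 2, 4, 8, 16, 32)) -> Optional[int]:
    """Return the largest candidate batch size whose per-batch latency
    stays under the SLA; None if even batch size 1 misses it.

    `latency_ms_for(b)` should return measured per-batch latency at batch
    size b, e.g. from a perf run against the serving endpoint.
    """
    best = None
    for b in sorted(candidates):
        if latency_ms_for(b) <= sla_ms:
            best = b          # keep the largest passing batch size
    return best

# Toy latency model: fixed overhead plus per-item cost (illustrative only).
def toy_latency(b: int) -> float:
    return 5.0 + 0.9 * b
```

In practice `latency_ms_for` would be backed by real benchmark runs per model and batch size, and the chosen value fed into the serving configuration (e.g. Triton's preferred batch sizes).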