About The Project

We're looking for an experienced ML Infrastructure Engineer who has successfully delivered large-scale ML infrastructure optimization projects. The primary focus is migrating and optimizing computer vision models from Nvidia GPU-based infrastructure to AWS Inferentia/Trainium, achieving both a performance boost and a cost reduction.

Current Infrastructure:

  • ML Models: RetinaFace, OpenPose, CLIP, and other CV models
  • Hardware: A10/T4 GPUs on EKS
  • Serving: Triton Inference Server
  • Orchestration: Mix of Kubernetes and Ray

Stage: Presale and Delivery

Duration: 2 months (preliminary)

Capacity: part-time (20h/week)

Areas of Responsibility

  • Technical Leadership:
    • Lead the architecture design for ML infrastructure modernization
    • Define compilation and optimization strategies for model migration
    • Establish performance benchmarking framework
    • Set up monitoring and alerting for the new infrastructure
  • Performance Optimization:
    • Implement efficient model compilation pipelines for Inferentia2
    • Optimize batch processing and memory layouts
    • Fine-tune model serving configurations
    • Ensure latency requirements are met across all services
  • Cost Optimization:
    • Analyze and optimize infrastructure costs
    • Implement efficient resource allocation strategies
    • Set up cost monitoring and reporting
    • Achieve target cost reduction while maintaining performance

Skills

  • Proven track record of ML infrastructure optimization projects
  • Hands-on experience with AWS Neuron SDK and Inferentia/Trainium deployment
  • Deep expertise in PyTorch model optimization and compilation
  • Experience with high-throughput computer vision model serving
  • Production experience with both Kubernetes and Ray for ML workloads

Knowledge

  • Model Optimization Expertise:
    • Deep understanding of ML model architecture optimization
    • Experience with model compilation techniques for specialized hardware (Inferentia/Trainium)
    • Proficiency in optimizing computer vision models (CNN architectures)
    • Knowledge of model serving optimization patterns
  • Performance Optimization:
    • Advanced understanding of ML model inference optimization
    • Expertise in batch processing strategies
    • Memory layout optimization for vision models
    • Experience with pipeline parallelism implementation
    • Proficiency in latency/throughput optimization techniques
  • Hardware Acceleration:
    • Deep knowledge of ML accelerator architectures
    • Understanding of hardware-specific optimizations
    • Experience with model compilation for specialized chips
    • Proficiency in memory access pattern optimization
  • Performance Analysis:
    • Proficiency in ML model profiling tools
    • Experience with performance bottleneck identification
    • Knowledge of performance monitoring techniques
    • Ability to analyze and optimize inference patterns

Nice to Have

  • Experience with Ray architecture for ML serving
  • Knowledge of distributed ML systems
  • Understanding of ML pipeline optimization
  • Experience with model quantization techniques

Experience

  • Model Optimization (4+ years):
    • Proven track record of optimizing large-scale ML inference systems
    • Successfully implemented hardware-specific model optimizations
    • Demonstrated experience with computer vision model optimization
    • Led projects achieving significant performance improvements
  • Proven Results (Examples):
    • Successfully optimized computer vision models similar to RetinaFace/CLIP
    • Achieved significant cost reduction while maintaining performance
    • Implemented efficient batch processing strategies
    • Developed performance monitoring and optimization frameworks

Salary

Competitive

Project-based

Remote Job

Worldwide

Job Overview

  • Job Posted: 1 year ago
  • Job Type: Contractual
  • Job Role: Any
  • Education: Any
  • Experience: Any
  • Total Vacancies: -

Location

Finland