We're looking for an experienced ML Infrastructure Engineer who has successfully delivered large-scale ML infrastructure optimization projects. The primary focus is migrating and optimizing computer vision models from Nvidia GPU-based infrastructure to AWS Inferentia/Trainium while improving performance and reducing cost.
Current Infrastructure:
ML Models: RetinaFace, OpenPose, CLIP, and other CV models
Hardware: A10/T4 GPUs on EKS
Serving: Triton Inference Server
Orchestration: Mix of Kubernetes and Ray
Stage: Presale and Delivery
Duration: 2 months (preliminary)
Capacity: part-time (20h/week)
Areas of Responsibility
Technical Leadership:
Lead the architecture design for ML infrastructure modernization
Define compilation and optimization strategies for model migration
Establish performance benchmarking framework
Set up monitoring and alerting for the new infrastructure
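The benchmarking framework can start small: a harness that measures per-request latency percentiles for any inference callable works the same against the A10/T4 baseline and the Inferentia2 candidate, which keeps before/after comparisons apples-to-apples. A minimal sketch (all names are illustrative, not part of the project spec):

```python
import time
import statistics
from typing import Callable, Dict

def benchmark(infer: Callable[[], None], warmup: int = 10, iters: int = 100) -> Dict[str, float]:
    """Measure latency percentiles for a single-request inference callable.

    `infer` is any zero-argument function that runs one inference round trip
    (e.g. a Triton client call); the harness itself is backend-agnostic.
    """
    for _ in range(warmup):                 # discard cold-start / compilation effects
        infer()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        infer()
        samples.append((time.perf_counter() - start) * 1000.0)  # milliseconds
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p99_ms": samples[min(iters - 1, int(iters * 0.99))],
        "mean_ms": statistics.fmean(samples),
    }
```

Reporting p99 alongside p50 matters here because batch-oriented accelerators can look fine on the median while violating tail-latency SLAs.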
Performance Optimization:
Implement efficient model compilation pipelines for Inferentia2
Optimize batch processing and memory layouts
Fine-tune model serving configurations
Ensure latency requirements are met across all services
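Batch-size tuning for serving often reduces to a simple search: pick the largest batch whose measured per-batch latency still fits the latency budget, since larger batches amortize host overhead and raise throughput. A sketch of that selection logic (the function names and the toy latency model are assumptions for illustration, not measured numbers):

```python
from typing import Callable, Iterable, Optional

def pick_batch_size(latency_ms_for: Callable[[int], float],
                    sla_ms: float,
                    candidates: Iterable[int] = (1, 2, 4, 8, 16, 32)) -> Optional[int]:
    """Return the largest candidate batch size whose per-batch latency
    stays under the SLA; None if even batch size 1 misses it.

    `latency_ms_for(b)` should return measured per-batch latency at batch
    size b, e.g. from a perf run against the serving endpoint.
    """
    best = None
    for b in sorted(candidates):
        if latency_ms_for(b) <= sla_ms:
            best = b          # keep the largest passing batch size
    return best

# Toy latency model: fixed overhead plus per-item cost (illustrative only).
def toy_latency(b: int) -> float:
    return 5.0 + 0.9 * b
```

In practice `latency_ms_for` would be backed by real benchmark runs per model and batch size, and the chosen value fed into the serving configuration (e.g. Triton's preferred batch sizes).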