Software Engineer, Machine Learning Platform
About the role
Chime’s Machine Learning Platform (MLP) team builds and operates the infrastructure, tooling, and developer experience that power machine learning across the company. We enable data scientists and ML engineers to develop, train, deploy, and monitor models reliably and efficiently. As a Machine Learning Platform Engineer, you will design and build scalable systems that support model training, feature computation, real-time inference, and experimentation. You’ll work at the intersection of distributed systems, cloud infrastructure, and applied machine learning. This role focuses
Key Responsibilities
- Design, build, and operate scalable ML infrastructure on AWS
- Develop distributed training and batch processing systems using Ray
- Build and maintain infrastructure-as-code using Terraform
- Support and evolve the feature store and feature pipelines
- Develop data ingestion and streaming systems (e.g., Kinesis, Kafka, Flink, Spark, or similar technologies)
- Improve CI/CD workflows for ML models and platform components
- Enhance observability, reliability, and cost visibility across ML workloads
- Partner closely with Data Science and ML Engineering teams to improve developer experience
- Contribute to platform architecture decisions and technical roadmaps
- Participate in on-call rotations to support production systems
Requirements
- 5+ years of experience in ML infrastructure, platform engineering, or production ML systems
- Knowledge of the machine learning model development lifecycle, including data preprocessing, model training, evaluation, and deployment
- Experience with distributed systems, cloud computing, or large-scale data processing
- Strong foundation in computer science and software engineering principles
- Deep interest in the impact and evolution of advanced AI technologies
- Hands-on experience with CI/CD pipelines, DevOps practices, and infrastructure as code
- Experience with containerization and orchestration technologies such as Docker and Kubernetes
- Knowledge of cloud platforms such as AWS and distributed computing frameworks such as Spark and Ray
- Experience with GPU programming (CUDA) and GPU cost optimization
- Strong programming skills in Python, Go, Scala, Java, or similar languages
- Familiarity with infrastructure-as-code (e.g., Terraform, CloudFormation)
- Solid understanding of software engineering fundamentals (testing, version control, code review, observability)
- Experience with distributed compute frameworks such as Ray
- Experience building or operating a feature store
- Experience with real-time ML systems or model serving
- Familiarity with streaming technologies (Kafka, Kinesis, Flink, Spark Streaming, etc.)
- Experience supporting ML lifecycle workflows (training, evaluation, deployment, monitoring)
- Knowledge of ML experimentation platforms and model governance practices