AI Engineer: ML Infrastructure
- Remote
- Full-Time
About Zyphe
Zyphe is building agentic compliance for regulated fintechs, crypto exchanges, and enterprises. Our five-agent system (Document, Liveness, Name Resolution, Risk Profile, Historical) automates KYB, KYC, and AML, while a privacy-first architecture means personal data never has to live on our customers' servers.
We are early, focused, and shipping. Real customers, real revenue, an active fundraise with a committed lead. The next twelve months are about turning that into a defended category position.
About the role
Zyphe is hiring an AI Engineer specializing in ML infrastructure to build and scale the platform that powers all of our machine learning systems.
This is not a modeling seat. You will own the entire ML platform, from training orchestration to serving infrastructure, ensuring our AI capabilities are reliable, fast, and cost-efficient in production. You will work at the intersection of distributed systems, MLOps, and cloud-native infrastructure, building the foundation every AI engineer at Zyphe depends on.
Responsibilities
- Design and maintain scalable ML training pipelines with experiment tracking and reproducibility.
- Build and optimize model serving infrastructure for low-latency, high-availability inference.
- Develop feature stores and data pipelines that feed training and real-time prediction.
- Implement CI/CD for ML: automated testing, validation, and deployment of model artifacts.
- Build monitoring and alerting for model performance, data drift, and system health.
- Optimize compute costs across training and inference (GPU scheduling, spot instances).
- Manage Kubernetes-based ML workloads and container orchestration.
- Translate research prototypes from ML engineers into production-grade systems.
You may be a good fit if you
- Have strong experience building ML infrastructure and platform tooling in production.
- Have deep knowledge of Kubernetes, Docker, and cloud-native orchestration on AWS or GCP.
- Have hands-on experience with ML workflow tools such as Ray, Kubeflow, or MLflow.
- Have designed model serving systems (Triton, TorchServe, or custom gRPC services).
- Have a solid understanding of distributed training and GPU resource management.
- Have strong software engineering fundamentals (Python, Go, or Rust; CI/CD; infrastructure-as-code).
- Are familiar with feature stores, data versioning, and experiment tracking.
Strong candidates may also have experience with
- Cost optimization for GPU workloads.
- Privacy-preserving ML and confidential computing (Nitro Enclaves, SGX, or similar).
- Scaling ML platforms in regulated environments.
- Open-source contributions to ML platform tooling.
Annual salary
Competitive, commensurate with experience. Equity included.
Logistics
- Location: Remote.
- Hybrid policy: Fully remote, with periodic in-person offsites and key meetings.
- Visa sponsorship: Not available at this time.
Education
We require at least a Bachelor's degree in a related field, or equivalent professional experience.