
AI Engineer: ML Infrastructure

  • Remote
  • Full-Time

Company Description

Zyphe provides a privacy-first identity verification solution that prioritizes user control over personal data while protecting businesses from fraud and data breaches. Powered by a decentralized platform, Zyphe enables seamless identity verification and retention without storing Personally Identifiable Information (PII) on company servers. With advanced KYC, AML, and KYB modules built on Web3 principles, Zyphe helps organizations meet modern privacy and security requirements. The platform also offers users secure identity vaults and effortless one-click verification for smooth onboarding experiences.

Role Overview

We're looking for an AI Engineer specializing in ML Infrastructure to build and scale the platform that powers all of our machine learning systems. This is not a modeling role. You will own the entire ML platform, from training orchestration to serving infrastructure, ensuring our AI capabilities are reliable, fast, and cost-efficient in production. You'll work at the intersection of distributed systems, MLOps, and cloud-native infrastructure, building the foundation that every AI team at Zyphe depends on.

What You'll Do

  • Design and maintain scalable ML training pipelines with experiment tracking and reproducibility
  • Build and optimize model serving infrastructure for low-latency, high-availability inference
  • Develop feature stores and data pipelines that feed training and real-time prediction
  • Implement CI/CD for ML, automated testing, validation, and deployment of model artifacts
  • Build monitoring and alerting systems for model performance, data drift, and system health
  • Optimize compute costs across training and inference workloads (GPU scheduling, spot instances)
  • Manage Kubernetes-based ML workloads and container orchestration
  • Collaborate with ML engineers to translate research prototypes into production-grade systems

What We're Looking For

  • Strong experience building ML infrastructure and platform tooling in production
  • Deep knowledge of Kubernetes, Docker, and cloud-native orchestration (AWS/GCP)
  • Hands-on expertise with ML workflow tools (Ray, Kubeflow, MLflow, or similar)
  • Experience designing model serving systems (Triton, TorchServe, custom gRPC services)
  • Solid understanding of distributed training and GPU resource management
  • Strong software engineering fundamentals (Python, Go, or Rust; CI/CD; infrastructure-as-code)
  • Familiarity with feature stores, data versioning, and experiment tracking
  • Experience with cost optimization for GPU workloads is a plus

What Makes You a Great Fit

  • You think in reliability and throughput, not just accuracy
  • You're obsessed with developer experience, making ML engineers productive and autonomous
  • You combine systems engineering depth with ML domain understanding
  • You don't just keep the lights on; you build platforms that accelerate the entire team

Apply for this role
