Staff Software Development Engineer

ID: 2026-3547
Position Type: Full time

Lattice Overview

There is energy here…energy you can feel crackling at any of our international locations. It’s an energy generated by enthusiasm for our work, for our teams, for our results, and for our customers. Lattice is a worldwide community of engineers, designers, and manufacturing operations specialists in partnership with world-class sales, marketing, and support teams, who are developing programmable logic solutions that are changing the industry. Our focus is on R&D, product innovation, and customer service, and to that focus, we bring total commitment and a keenly sharp competitive personality.

Energy feeds on energy. If you flourish in a fast-paced, results-oriented environment, if you want to achieve individual success within a “team first” organization, and if you believe you can contribute and succeed in a demanding yet collegial atmosphere, then Lattice may well be just what you’re looking for.

Responsibilities & Skills

Role Overview
The AI DevOps Engineer plays a critical role at the intersection of machine learning, software engineering, and platform operations. This role ensures the reliable, scalable, and secure deployment of AI/ML models into production by building automated pipelines, optimizing model serving infrastructure, and integrating observability into the entire ML lifecycle. The AI DevOps Engineer partners closely with data scientists, ML engineers, and platform teams to accelerate the delivery of AI solutions.
________________________________________
Key Responsibilities
1. ML Infrastructure & Platform
•   Design, build, and maintain AI platform components (model training, model registry, feature store, inference services).
•   Implement container-based and serverless architectures for scalable AI workloads.
•   Manage GPU/TPU compute clusters and optimize resource utilization for training and inference.
2. CI/CD for ML (MLOps)
•   Build and maintain CI/CD pipelines for ML workflows including data validation, model testing, packaging, and automated deployment.
•   Integrate model governance, approval workflows, and rollback mechanisms into pipelines.
•   Enable reproducible pipelines using tools like MLflow, Kubeflow, Vertex AI, Databricks, Azure ML, or Amazon SageMaker.
3. Production Model Deployment & Inference
•   Deploy real-time, batch, and streaming inference pipelines.
•   Optimize performance of model serving systems (e.g., Triton Inference Server, TorchServe, BentoML, Ray Serve).
•   Implement A/B testing, shadow deployments, and model versioning strategies.
4. Monitoring & Observability
•   Build end-to-end observability, including:
o   Model performance monitoring (drift, bias, accuracy decay).
o   System health monitoring (latency, throughput, resource usage).
o   Data quality checks using automated detectors.
•   Integrate monitoring dashboards and alerts via Prometheus, Grafana, ELK, Datadog, etc.
5. Security, Compliance & Governance
•   Ensure secure handling of model artifacts, datasets, and inference endpoints.
•   Implement identity, access, and compliance controls (PII, GDPR, SOC 2, ISO, Responsible AI frameworks).
•   Conduct threat modeling for AI systems (model stealing, prompt injection, data poisoning).
6. Collaboration & Engineering Practices
•   Work closely with data scientists to productionize research prototypes.
•   Partner with cloud, SRE, and platform teams to align on best practices.
•   Write high-quality documentation, runbooks, and architectural diagrams.
________________________________________
Required Skills & Qualifications
Technical Skills
•   Strong programming experience in Python (preferred), plus experience with Bash, Go, or Java.
•   Strong knowledge of cloud services (Azure / AWS / GCP), including managed ML services.
•   Hands-on with containerization and orchestration: Docker, Kubernetes, Helm.
•   Experience with CI/CD tools: GitHub Actions, Azure DevOps, GitLab CI, Jenkins.
•   Familiarity with ML frameworks: PyTorch, TensorFlow, scikit-learn.
•   Experience deploying and scaling AI inference systems.
DevOps & Infra Skills
•   Strong Linux fundamentals and system troubleshooting skills.
•   Knowledge of networking, load balancing, and distributed systems.
AI/ML Skills
•   Understanding of ML lifecycle, model artifacts, hyperparameter tuning, and model evaluation.
•   Experience with ML metadata management, experiment tracking, and data validation tools.
________________________________________
Preferred Qualifications
•   Experience with LLMOps: deploying and optimizing large language models, vector DBs, and retrieval pipelines.
•   Knowledge of frameworks such as LangChain, LlamaIndex, Milvus, Weaviate, Pinecone.
•   Prior work in high-scale, low-latency inference environments.
•   Certifications in cloud (Azure/AWS/GCP) or ML engineering.
________________________________________
Behavioral Competencies
•   Strong problem-solving and debugging skills.
•   Ability to collaborate across cross-functional teams.
•   Ownership mindset with a focus on reliability, performance, and automation.
•   Effective communication with both technical and non-technical stakeholders.
