DevOps & Backend Engineer — Mid/Senior
Open Insurance
Job Description
InsurTech Platform | Engineering / Product Development
Location: Remote or Hybrid (if located in the US)
Employment Type: Contract — Full-Time
Department: Engineering / Product Development
Experience Level: Mid/Senior (4–7+ years)
Reports To: Director of Engineering
Role Overview
We are looking for a DevOps & Backend Engineer who can bridge the gap between platform
infrastructure and application development. In this role, you will design and operate the cloud-native
infrastructure that powers our InsurTech product suite while contributing directly to backend services built
with TypeScript and Nest.js.
A critical dimension of this position is enabling and supporting our internal LLM and AI platform. You will
build the infrastructure foundations that allow our AI team to train, serve, and scale custom large
language models and AI-powered services—including GPU-accelerated workloads, model inference
endpoints, high-throughput data pipelines, and the CI/CD automation that brings AI capabilities reliably
into production across all products.
Key Responsibilities
Cloud Infrastructure & DevOps
Design, build, and maintain production-grade CI/CD pipelines (GitHub Actions, GitLab CI) with
automated testing, security scanning, and progressive deployment strategies (blue-green, canary,
feature flags).
Manage and optimize AWS infrastructure including EKS, EC2, RDS, ECR, S3, Lambda,
CloudFront, Route 53, and IAM—with a focus on cost optimization, high availability, and disaster
recovery.
Build and maintain Kubernetes clusters (EKS) with Helm charts, custom operators, autoscaling
policies, and multi-environment management (dev, staging, production).
Automate infrastructure provisioning and configuration using Terraform (primary), Ansible, and
CloudFormation with GitOps workflows and drift detection.
Implement comprehensive observability using Prometheus, Grafana, Datadog, ELK/OpenSearch,
and distributed tracing (Jaeger/OpenTelemetry) for full-stack visibility.
Design and maintain networking architecture including VPCs, security groups, load balancers,
service meshes (Istio/Linkerd), and DNS management.
AI/LLM Infrastructure Support
Provision and manage GPU-accelerated compute environments (AWS P4/P5 instances,
Inferentia, SageMaker) for LLM training, fine-tuning, and inference workloads.
Build containerized model-serving infrastructure supporting vLLM, TGI (Text Generation
Inference), NVIDIA Triton, and custom inference endpoints with autoscaling based on request
load and latency targets.
Design and operate data pipelines and storage architectures (S3, EFS, FSx for Lustre) optimized
for large-scale model training datasets and artifact management.
Implement CI/CD automation specifically for ML/AI workflows—model versioning, automated
evaluation gates, staged rollouts of model updates, and A/B inference routing.
Collaborate with the AI team to optimize GPU utilization, manage spot instance strategies, and
implement cost-aware scheduling for training jobs.
Set up monitoring dashboards for model inference latency, throughput, token usage, GPU
utilization, and cost tracking.
Backend Development
Contribute to and extend backend services built with Nest.js and TypeScript, focusing on
scalability, reliability, and clean architecture.
Develop the internal TypeScript framework.
Build and maintain scalable microservices and RESTful/GraphQL APIs that integrate with AI
inference endpoints and the LLM Composer platform.
Design event-driven architectures using Kafka, SQS/SNS, and WebSockets for real-time data
processing and AI-powered features.
Ensure all deployments are production-ready, horizontally scalable, and follow 12-factor app
principles with proper health checks, graceful shutdowns, and circuit breakers.
Collaborate with backend and AI teams on system architecture, API contracts, database schema
design, and reliability improvements.
Implement database management best practices including migration strategies, read replicas,
connection pooling, and query optimization for PostgreSQL and Redis.
Required Skills & Qualifications
4–7+ years of professional experience in DevOps, Cloud Engineering, or Platform Engineering,
with meaningful backend development experience.
Hands-on Kubernetes experience (EKS strongly preferred), including cluster administration, Helm
chart development, autoscaling, and troubleshooting.
Strong proficiency with TypeScript and Nest.js (or comparable Node.js backend frameworks like
Express, Fastify).
Deep AWS expertise across compute, storage, networking, IAM, and managed services—with
experience optimizing for cost and performance.
Strong Infrastructure-as-Code skills with Terraform; experience with modular, reusable
configurations and state management.
Solid understanding of microservices architecture, distributed systems patterns, and container
orchestration.
Experience with Docker, container registries, and container security best practices.
Proficiency with CI/CD pipeline design including automated testing, security scanning, and
deployment strategies.
Familiarity with GitOps workflows and version-controlled infrastructure management.
Strong Linux systems administration and shell scripting skills.
Preferred Qualifications (Nice to Have)
Experience provisioning and managing GPU workloads for ML/AI model training and inference in cloud environments.
Familiarity with ML model serving frameworks (vLLM, TGI, Triton, BentoML, SageMaker
Endpoints).
Experience with Kafka, event-driven architectures, and real-time streaming systems.
Familiarity with service mesh technologies (Istio, Linkerd) and API gateway management.
Experience with HIPAA, SOC 2, or other healthcare/financial compliance frameworks in cloud
environments.
Knowledge of database technologies beyond PostgreSQL, such as vector databases (Pinecone,
pgvector), graph databases, or time-series databases.
Experience with chaos engineering, load testing, and reliability engineering practices (SRE).
AWS certifications (Solutions Architect, DevOps Engineer, or equivalent).
Technology Stack & Tools
Languages: TypeScript, JavaScript, Python, Bash, SQL, HCL (Terraform)
Backend: Nest.js, Node.js, Express, Fastify, GraphQL
Cloud (AWS): EKS, EC2, RDS, S3, Lambda, ECR, CloudFront, SageMaker, IAM, KMS
Containers & Orchestration: Docker, Kubernetes, Helm, Kustomize, ArgoCD
IaC & Config: Terraform, Ansible, CloudFormation, Pulumi
CI/CD: GitHub Actions, GitLab CI, CodePipeline, semantic-release
AI/ML Infra: vLLM, TGI, Triton, SageMaker, GPU instances (P4/P5/Inferentia)
Monitoring: Prometheus, Grafana, Datadog, ELK/OpenSearch, OpenTelemetry, Jaeger
Data & Messaging: PostgreSQL, Redis, Kafka, SQS/SNS, S3, DynamoDB
Security: Vault, SOPS, OPA, Trivy, Snyk, AWS Security Hub
What We Offer
A high-impact role at the intersection of infrastructure, backend development, and cutting-edge AI
platform engineering.
Opportunity to build the infrastructure backbone powering enterprise AI and LLM capabilities.
Direct collaboration with AI, backend, and product teams across multiple verticals—telemedicine,
InsurTech, analytics.
Competitive contract compensation commensurate with experience.
Access to modern cloud infrastructure, GPU resources, and industry-leading tooling.
Job Type: Contract
Pay: From $4,000.00 per month
Work Location: Remote