
SRE (Site Reliability Engineer) +AI
Tekgence Inc
Job Description
Role: SRE (Site Reliability Engineer) +AI
Hybrid: 3 days in office; face-to-face interview
Location: Montreal
Experience: 8+ years of experience as a Site Reliability Engineer or in a similar role, with hands-on experience supporting IaaS platforms and knowledge of networking and systems engineering.
Roles and Responsibilities:
- Operate, monitor, and maintain the infrastructure supporting GenAI applications (training, inference, feature store, data ingestion, model serving)
- Design and build automation for core platform capabilities, reducing manual toil
- Establish, monitor, and enforce SLOs/SLIs/SLAs, error budgets, alerting, and dashboards
Skills:
- Production experience in SRE, infrastructure, or operations for large-scale systems
- Strong programming/scripting skills (Python, Go, Java, or equivalent)
- Deep experience with containerization (Docker), orchestration (Kubernetes, etc.)
- Infrastructure-as-code (Terraform, Helm, CloudFormation, Ansible, etc.)
- Experience with monitoring / observability / logging / alerting tools (Prometheus, Grafana, ELK / EFK, Datadog, etc.)
- Networking & systems engineering knowledge (TCP/IP, DNS, routing, load balancing, distributed storage)
- Solid experience in capacity planning, performance tuning, scaling, and incident response
About Tekgence Inc

tekgence.com