AI Site Reliability Engineer
Astra-North Infoteck Inc. ~ Conquering today’s challenges, achieving tomorrow’s vision!
Role Overview
Astra-North Infoteck Inc. ~ Conquering today’s challenges, achieving tomorrow’s vision! is hiring a senior-level AI Site Reliability Engineer. This is a full-time hybrid role based in Québec City, and it is part of the company's DevOps hiring. Full responsibilities, required qualifications, and the apply link are listed in the description below.
Job Description
Role: SRE + AI
Hybrid: 3 days in office
Location: Montreal
Experience: 8+ years of experience as a Site Reliability Engineer or in a similar role, with hands‑on experience in supporting IaaS platforms with networking and system engineering knowledge.
Roles and Responsibilities
- Operate, monitor, and maintain the infrastructure supporting GenAI applications (training, inference, feature store, data ingestion, model serving)
- Design and build automation for core platform capabilities, reducing manual toil
- Develop and maintain infrastructure-as-code (IaC) for provisioning and managing compute, storage, network, GPU clusters, Kubernetes / container orchestration, etc.
- Establish, monitor, and enforce SLOs/SLIs/SLAs, error budgets, alerting, and dashboards
- Lead incident response, root cause analysis (RCA), postmortems, and systemic remediation
- Perform capacity planning, scaling strategies, workload scheduling, and resource forecasting
- Optimize cost vs. performance tradeoffs in large‑scale compute environments
- Harden systems for security, compliance, auditability, and data governance
- Collaborate across teams (cloud engineers, data engineers, infrastructure, security) to ensure safe deployment, rollout, rollback, and integration of new systems
- Define disaster recovery (DR) strategies, backup/restore practices, and fault‑tolerance mechanisms
- Maintain runbooks, operational playbooks, documentation, and training materials
- Participate in on‑call rotations and respond to production incidents 24/7 as needed
- Continuously evaluate and integrate new tools, frameworks, or technologies to enhance platform reliability
Skills
- Production experience in SRE / Infrastructure / ops for large‑scale systems
- Strong programming/scripting skills (Python, Go, Java, or equivalent)
- Deep experience with containerization (Docker) and orchestration (Kubernetes, etc.)
- Infrastructure-as-code (Terraform, Helm, CloudFormation, Ansible, etc.)
- Familiarity with GPU / AI compute clusters, high‑performance data storage, and distributed architectures
- Experience with monitoring / observability / logging / alerting tools (Prometheus, Grafana, ELK / EFK, Datadog, etc.)
- Networking & systems engineering knowledge (TCP/IP, DNS, routing, load balancing, distributed storage)
- Solid experience in capacity planning, performance tuning, scaling, and incident response
- Demonstrated ability to lead RCAs, deploy fixes, and drive reliability improvements
- Experience in regulated environments (financial services, compliance, audit, security) is a strong plus
- Excellent communication, documentation, and cross‑team collaboration skills
- Proven track record of reducing operational toil via automation
Frequently Asked Questions
How do I apply for the AI Site Reliability Engineer position at Astra-North Infoteck Inc. ~ Conquering today’s challenges, achieving tomorrow’s vision!?
Use the Apply button above to submit your application directly to Astra-North Infoteck Inc. ~ Conquering today’s challenges, achieving tomorrow’s vision!. Most applications take less than 5 minutes if your resume and contact details are ready, and you'll be routed to the employer's official application system to finish.
Is the AI Site Reliability Engineer role at Astra-North Infoteck Inc. ~ Conquering today’s challenges, achieving tomorrow’s vision! remote or in-office?
This is a hybrid role based in Québec City. Expect a mix of in-office and remote days, with the specific cadence set by the hiring manager.
What does an AI Site Reliability Engineer at Astra-North Infoteck Inc. ~ Conquering today’s challenges, achieving tomorrow’s vision! earn?
Astra-North Infoteck Inc. ~ Conquering today’s challenges, achieving tomorrow’s vision! has not disclosed a salary range in this posting. Many employers share specifics later in the interview process; you can also ask during a recruiter screen if compensation transparency is important to you.
When was the AI Site Reliability Engineer role at Astra-North Infoteck Inc. ~ Conquering today’s challenges, achieving tomorrow’s vision! posted?
This role was posted on March 22, 2026 (78 days ago). It's still listed as actively hiring; we re-confirm openings against the source system multiple times per day and remove closed roles.
How much experience does the AI Site Reliability Engineer role at Astra-North Infoteck Inc. ~ Conquering today’s challenges, achieving tomorrow’s vision! require?
This is a senior-level position. Most senior roles call for 5+ years of directly relevant experience. Astra-North Infoteck Inc. ~ Conquering today’s challenges, achieving tomorrow’s vision! lists its specific requirements in the job description, so review the must-have qualifications closely before applying.