Lead Site Reliability Engineer
Movable Ink
Role Overview
Movable Ink is hiring a Lead Site Reliability Engineer. This is a full-time remote role, with the team based in Movable Ink - Ontario (Remote). It is part of Movable Ink's DevOps hiring. The posted range is $154k to $200k CAD per year. Full responsibilities, required qualifications, and the apply link are listed in the description below.
Job Description
As one of our Lead Site Reliability Engineers, you will combine hands-on technical expertise with strategic technical leadership across infrastructure and software development. You will own the design and evolution of major systems within our multi-cloud, multi-region, active-active content serving platform, which serves upwards of 25 billion requests daily. Through a combination of architectural vision, cross-team collaboration and mentorship, you will help drive the reliability initiatives and define the technical strategy that scales our platform to 50 billion requests per day and beyond.
Responsibilities:
- Define and drive the automation strategy for infrastructure tooling, establishing standards that minimize manual work, increase performance and reduce the frequency and severity of incidents
- Own the design, reliability and evolution of core platform applications, mentoring team members on best practices and ensuring systems meet long-term business objectives
- Architect and lead the logging platform strategy, driving its design and balancing availability, retention and cost optimization
- Establish capacity planning and performance management frameworks, proactively identifying scaling opportunities and guiding teams through complex troubleshooting scenarios
- Lead cross-functional reliability initiatives with SRE and service engineering teams, influencing architectural decisions and championing practices that ensure resilient service delivery
- Demonstrate a high level of autonomy in anticipating, identifying, and addressing systemic weaknesses and opportunities for platform improvement without direct supervision
Qualifications:
- Proven track record in Site Reliability or Software Engineering, designing, building, and owning scalable, resilient services with a focus on long-term reliability strategy
- Deep expertise in architecting and operating complex distributed systems such as Apache Pulsar, Apache Kafka, Grafana Loki, ScyllaDB/Cassandra, with the ability to guide teams through distributed system challenges
- Experience designing and owning automation strategies to manage services at scale, with expertise in establishing performance analysis frameworks and mentoring others on diagnostics and resolution
- Deep, hands-on experience (6+ years) in Site Reliability or Software Engineering, specifically leading and shaping multi-cloud architecture and strategy (AWS and GCP)
- Experience architecting and leading large-scale observability platforms, including defining observability standards and SLO frameworks. We use Prometheus and Thanos with Grafana Alloy, Loki and Tempo
- Experience leading on-call excellence, including driving improvements to monitoring and alerting strategies, automating runbooks and mentoring team members on incident response best practices. Every member of the SRE team does a week-long on-call rotation
- Expert-level proficiency with infrastructure as code, including defining IaC standards and patterns across teams. We use Terraform and Chef
- Advanced Kubernetes expertise, including cluster architecture design, multi-tenancy strategies, and guiding teams on container orchestration best practices. We use EKS and GKE
- Proficiency in multiple programming languages with the ability to design and review code that meets reliability standards. We use NodeJS, Golang, Ruby, Python and shell scripting
- Advanced Linux systems expertise, with the ability to diagnose complex system-level issues and mentor others on performance tuning and troubleshooting
The base pay range for this position is $154,000-$200,000 CAD/year, which can include additional bonus depending on the position ultimately offered, in addition to a full range of medical, financial, and/or other benefits. The base pay offered may vary depending on job-related knowledge, skills, and experience.
Studies have shown that women, communities of color, and historically underrepresented people are less likely to apply to jobs unless they meet every single qualification. We are committed to building a diverse and inclusive culture where all Inkers can thrive. If you’re excited about the role but don’t meet all of the abovementioned qualifications, we encourage you to apply. Our differences bring a breadth of knowledge and perspectives that makes us collectively stronger.
We welcome and employ people regardless of race, color, gender identity or expression, religion, genetic information, parental or pregnancy status, national origin, sexual orientation, age, citizenship, ethnicity, family or marital status, physical and mental ability, political affiliation, disability, Veteran status, or other protected characteristics. We are proud to be an equal opportunity employer.
Frequently Asked Questions
How do I apply for the Lead Site Reliability Engineer position at Movable Ink?
Use the Apply button above to submit your application directly to Movable Ink. Most applications take less than 5 minutes if your resume and contact details are ready, and you'll be routed to the employer's official application system to finish.
Is the Lead Site Reliability Engineer role at Movable Ink remote?
Yes. This is a remote role. The team is based in Movable Ink - Ontario (Remote), but the position itself does not require relocating to that office.
How much does the Lead Site Reliability Engineer role at Movable Ink pay?
Movable Ink has posted a compensation range of $154k to $200k for this position. Final offers typically vary based on candidate experience, location, and internal salary bands.
When was the Lead Site Reliability Engineer role at Movable Ink posted?
This role was posted on April 15, 2026 (70 days ago). It's still listed as actively hiring; we re-confirm openings against the source system multiple times per day and remove closed roles.