Job Description
Mattermost is seeking an experienced and visionary Lead Site Reliability Engineer (SRE) to guide the architecture, reliability, and operational excellence of the infrastructure powering our secure, mission-critical collaboration platform.
In this role, you will provide technical leadership across our SRE function, driving strategic initiatives for scalability, observability, performance, and automation across cloud and hybrid environments. You will mentor engineers, establish best practices, and collaborate closely with development, security, and operations teams to ensure our customers in defense, government, and critical infrastructure sectors experience exceptional reliability and performance.
Responsibilities Include:
- Define the strategy, architecture, and roadmap for Mattermost’s site reliability engineering function, aligning infrastructure initiatives with product and business goals.
- Lead the design, deployment, and optimization of production-grade containerized workloads, infrastructure-as-code, and compliant cloud environments for regulated domains (e.g., FedRAMP, DoD).
- Establish and evolve observability, monitoring, and alerting frameworks to ensure performance, reliability, and capacity planning at scale.
- Drive incident management processes, including on-call rotations, root cause analysis, and systemic reliability improvements.
- Partner with security and compliance teams to meet data sovereignty, security, and regulatory requirements.
- Champion automation and operational excellence to improve efficiency, reduce risk, and scale operations.
- Oversee cloud cost management and capacity planning to optimize infrastructure spending while meeting performance targets.
- Build and maintain a developer platform that enables fast, secure software delivery and improves application stability in production.
- Mentor and coach SRE team members, fostering a culture of learning, collaboration, and technical excellence.
Requirements:
- BS in Computer Science, Cybersecurity, Software Engineering, or a related technical field, or equivalent experience, with 5+ years of relevant experience in site reliability engineering, DevOps, or cloud infrastructure roles.
- Proven expertise in container orchestration platforms, ideally Kubernetes.
- Extensive experience with infrastructure-as-code, ideally Terraform.
- Strong background in cloud platforms, ideally AWS.
- Demonstrated experience designing and implementing monitoring, alerting, and performance optimization strategies.
- Exceptional troubleshooting and incident management skills for distributed systems.
- Proficiency in at least one scripting or programming language for automation.
- Excellent communication skills with a track record of influencing cross-functional teams.
- Experience leading globally distributed teams in a rem