Production Support/Site Reliability Engineer

Aarorn Technologies Inc.
Full Time | Mid-level
Toronto, Ontario, CA | Posted 7 weeks ago

Job Description

Role: Site Reliability Engineer - Production Support

Maximum rate: $50/hr.

Position Overview

We are seeking a skilled and experienced Production Support Engineer, engaged through vendor staffing, to support our digital applications. This role combines hands-on production support with Site Reliability Engineering (SRE) principles, focusing on toil elimination, infrastructure automation, and high availability of critical digital applications and backend systems.

Primary Responsibilities

1. Toil Removal & Infrastructure Maintenance (15%)

  • Execute SSL/TLS certificate updates and renewals across production environments
  • Perform Windows and Linux server patching and security updates
  • Manage NPID password updates and credential rotation protocols
  • Implement security vulnerability remediation in production systems
  • Identify, document, and eliminate repetitive manual operational tasks

2. Infrastructure & Database Cluster Management (20%)

  • Manage and support Elasticsearch cluster operations (deployment, scaling, monitoring, troubleshooting, performance tuning)
  • Administer MongoDB clusters including replication, sharding, backup, recovery, and maintenance
  • Operate and maintain Redis instances for caching and session management
  • Monitor cluster health and perform capacity planning and optimization
  • Execute failover and disaster recovery procedures
  • Ensure data integrity and backup compliance

3. Automation & SRE Activities (15%)

  • Develop, maintain, and enhance Ansible playbooks for infrastructure automation
  • Build infrastructure-as-code solutions to reduce manual intervention
  • Create and maintain comprehensive runbooks and operational playbooks
  • Design monitoring, alerting, and observability solutions
  • Implement automated remediation for common operational issues
  • Quantify and prioritize toil reduction opportunities

4. Production Application Support (50%)

  • Troubleshoot and resolve production incidents affecting digital applications
  • Collaborate with application development and support teams on issue diagnosis
  • Participate in incident response, root cause analysis, and post-mortems
  • Monitor and respond to application performance degradation

---

Technical Requirements

Required Expertise (Must-Have)

  • Ansible: 2+ years hands-on experience writing playbooks, roles, and automation workflows
  • Elasticsearch: 2+ years managing and troubleshooting Elasticsearch clusters in production
  • MongoDB: 2+ years with replica sets, sharding, backup/recovery, and performance tuning
  • Redis: Proficiency in deployment, configuration, and operational support
  • OpenShift: Experience deploying and managing containerized applications on OpenShift
  • Azure: Knowledge of Azure cloud services, resource management, and deployments
  • Linux Administration: 3+ years with RHEL, CentOS, or Ubuntu in production environments
  • Windows Server Administration: Experience with patching, certificate management, and maintenance
  • Shell Scripting: Bash scripting for automation and operational tasks
  • Incident Management: Experience responding to and resolving critical production incidents

Preferred Skills

  • Kubernetes or container orchestration platforms
  • Python or Go scripting for automation
  • CI/CD pipeline experience (Jenkins, GitLab CI, Azure DevOps)
  • Monitoring and observability tools (Prometheus, Grafana, ELK Stack, Datadog)
  • Infrastructure-as-Code tools (Terraform, CloudFormation)
  • Security best practices and vulnerability management
  • Relevant certifications (AZ-900, CKA, Elasticsearch, etc.)

---

Required Qualifications

  • Minimum 5 years of production infrastructure support or SRE experience
  • Minimum 3 years with at least 2 of the core technologies (Elasticsearch, MongoDB, Ansible, OpenShift)
  • Experience working in a regulated financial services environment (preferred)
  • Ability to work independently and in teams
  • Strong troubleshooting and analytical capabilities
  • Excellent documentation and communication skills
  • Must be available for on-call support rotation (with reasonable notice)

---

Operational Expectations

  • On-Call Rotation: Participates in the production support on-call schedule
  • Incident Response: Available for critical incident resolution outside standard business hours as required
  • Availability: Core business hours + flexibility for critical production issues
  • Response Time: First response to critical incidents within 30 minutes
  • Documentation: Maintains detailed runbooks, playbooks, and knowledge base articles
  • Collaboration: Regular communication with infrastructure, development, and operations teams

About Aarorn Technologies Inc.

Aarorn Technologies Inc.

aarorn.com

Backend | On-site
