Job Description
We are seeking a Production Support Specialist / Site Reliability Engineer (SRE) with deep expertise in Amazon Connect, the AWS contact-center-as-a-service (CCaaS) platform. This is a production-focused role: you will ensure the reliability, stability, and operational excellence of mission-critical contact center environments through proactive monitoring, live troubleshooting, and automation.
Responsibilities
- Provide incident response within defined SLAs, troubleshoot production issues, and perform root cause analysis.
- Monitor production systems and maintain observability using Splunk, CloudWatch, Zabbix, and similar tools.
- Investigate issues across AWS services, networking, APIs, and integrations.
- Manage Amazon Connect configurations, contact flows, Amazon Lex bots, and integrations with Lambda, S3, QuickSight, and DynamoDB.
- Develop visual process flows, standardized troubleshooting playbooks, and how-to guides for support teams.
- Document alert resolution steps and maintain runbooks, knowledge repositories, and playbooks.
- Analyze ServiceNow incidents, RCAs, and historical events to extract actionable insights for documentation.
- Collaborate with platform and operations teams for incident triage, mock troubleshooting sessions, and continuous improvement.