Site Reliability Engineer - SRE (L1)

Sarvin
Full Time | Entry | Remote
$35k – $48k | Posted February 4, 2026

Job Description

Accepting candidates in Brazil ONLY.

Professional Role Overview

We are seeking a Site Reliability Engineer (L1) to ensure the continuous availability and performance of our mission-critical production services. This role is designed for a professional who possesses the technical rigor required to manage complex distributed systems under a 100% on-call mandate within South American time zones. You will be responsible for the stewardship of high-stakes data environments—specifically those involving message queuing, relational and non-relational databases, and enterprise data warehouses—with a primary objective of maintaining strict service-level objectives (SLOs) through proactive monitoring, rapid incident response, and automated intervention.

Key Responsibilities

  • Production Stewardship: Serve as the first responder for production anomalies, managing the end-to-end incident lifecycle from initial detection to post-incident resolution.
  • Data Infrastructure Management: Ensure the reliability and scalability of high-throughput data platforms, including message brokers, relational (PostgreSQL or similar) and non-relational databases (MongoDB or similar), and data warehouse environments.
  • Operational Excellence: Execute 100% on-call rotations, providing consistent coverage and rapid response to critical system alerts.
  • Automation & Toil Reduction: Develop and maintain scripts (Python, Go, or Bash) to automate routine operational tasks, enhancing system resilience and reducing manual overhead.
  • Observability & Telemetry: Configure and optimize monitoring suites (e.g., Prometheus, Grafana, Datadog) to ensure comprehensive visibility into application and system health.

Must Have:

  • Prior SRE/On-call Experience: A mandatory background in SRE or production support roles, with a demonstrated ability to manage high-pressure on-call rotations and run production services.
  • Data Systems Proficiency:
      • Message Queuing: Experience managing brokers (e.g., Kafka) and topics, and troubleshooting throughput issues.
      • Relational & Non-Relational Databases: Proficiency in managing database health, query optimization, and high-availability configurations.
      • Data Warehouse: Experience in managing large-scale data warehouse performance and resource allocation.
  • Systems Engineering: Strong competency in Linux internals and networking protocols.
  • Regional Alignment: Must be based in and able to operate effectively within South American time zones to facilitate synchronized operations.

Preferred Skills:

  • Analytical Rigor: The ability to diagnose root causes in complex, interconnected systems rather than applying superficial fixes.
  • Communication: Exceptional technical documentation skills and the ability to provide concise, professional updates during active incidents.
  • Dedication: A steadfast commitment to system uptime and a proactive approach to identifying potential points of failure before they impact the user experience.

Education

  • Bachelor’s degree in Technology, Computing, or a related field

Job Types: Full-time, Contract

Pay: $35,000.00 - $48,000.00 per year

Benefits

  • Dental insurance
  • Flexible schedule
  • Health insurance
  • Paid time off
  • Vision insurance

Application Question(s):

  • Do you have previous on-call experience?
  • Are you located in South America?

Work Location: Remote
