Datacenter Observability and Site Reliability Engineer

Macpower Digital Assets Edge Private Limited

Full Time, Mid

Chennai, Tamil Nadu, IN

Posted March 19, 2026

Resume Keywords to Include

Make sure these keywords appear in your resume to improve ATS scoring

Python, Go, Bash, AWS, GCP, Azure, Docker, Kubernetes, Terraform

Job Description

Job Summary: We are seeking a skilled Observability & Site Reliability Engineer to join our team in supporting large-scale, enterprise-grade infrastructure. The ideal candidate will have extensive experience with observability tools, especially Grafana, Loki, Mimir, and Kubernetes metrics and logs, along with a strong passion for performance, scalability, and system uptime. Candidates must be flexible enough to collaborate with Korean stakeholders and work within the Korean time zone.

Experience: 8 to 12 years.
Notice Period: Immediate to 30 days preferred.

Key Must-Have Skills:

5+ years in Observability Engineering.
Expertise in Grafana, Loki, Mimir, and Alloy agent.
Strong understanding of infrastructure metrics (e.g., GPU, CPU, Kubernetes).
Proficiency in scripting languages (Python, Go, Bash).
Prior exposure to tools such as Prometheus, ELK, Docker, and Terraform.
Flexibility to work with Korean stakeholders and time zones.

Role Highlights:

Design and manage the observability stack across large-scale data center infrastructure.
Build scalable telemetry systems, dashboards, alerts, and reports.
Apply SRE best practices to ensure system reliability and performance.
Troubleshoot real-time issues and contribute to ongoing system optimization.

Good to Have: