Principal Site Reliability Engineer

Remote, Oregon, US · Posted 9 weeks ago

Role Overview

Upstart is hiring a Principal Site Reliability Engineer. This is a full-time remote role and part of Upstart's Risk hiring. The full responsibilities and required qualifications are listed in the job description below.

Resume Keywords to Include

Make sure these keywords appear in your resume to improve ATS scoring

Python, JavaScript, TypeScript, Go, Terraform, Agile, DevOps, SaaS

Job Description

The Team:

Upstart's Site Reliability Engineering (SRE) team owns the reliability, resiliency, and observability of Upstart's production systems. We build automation, tooling, and frameworks to ensure our infrastructure is healthy, scalable, and able to support a seamless experience for both engineers and customers. Our scope includes defining Upstart's technology operations risk strategy, implementing disaster recovery planning, and setting company-wide reliability standards.

As a Principal Engineer on the SRE team at Upstart, you will serve as a thought leader and SRE evangelist. You will drive adoption of best practices, mentor engineers across the organization, and influence both technical and business decisions. Your impact will extend beyond SRE into cross-functional collaboration with Product Engineering, DevEx, Development Productivity (Quality), DevOps, Data Engineering, and Machine Learning teams to elevate operational excellence across the company.

How you'll make an impact

Lead the definition, advocacy, and adoption of SRE principles across engineering teams
Partner with leadership to shape long-term reliability, resiliency, and observability strategies
Champion distributed tracing, real user monitoring (RUM), and key performance metrics such as Largest Contentful Paint (LCP) to improve system visibility and user experience
Build and scale self-healing systems to minimize manual intervention and reduce downtime
Drive enterprise-wide improvements to incident response processes, including those related to Machine Learning systems
Collaborate closely with Development Productivity and Quality teams to improve engineering velocity without sacrificing reliability
Influence technical and operational roadmaps through data-driven insights and hands-on technical contributions
Own and deliver cross-functional initiatives from concept through execution, applying program management skills to align stakeholders and achieve results

Minimum Qualifications

Bachelor's degree in Computer Science, Engineering, Mathematics, or a related field (or equivalent), plus 8 years of experience
A balanced background in both Software Engineering and Site Reliability Engineering
Proven track record as an SRE thought leader and evangelist, driving adoption of reliability best practices across organizations
Strong communication and mentoring skills to influence engineers across disciplines
Proficiency in Python, Go, and JavaScript/TypeScript
Proficiency with Infrastructure as Code (Terraform, CDK, CloudFormation, etc.)
Experience building internal tooling from scratch in agile development environments
Expertise in observability, distributed tracing, RUM, web performance metrics such as LCP, and performance monitoring tools (e.g., Datadog, Prometheus)
Experience with on-call and incident management, including large-scale or ML-related incidents
Strong background in automation and building self-healing systems
Hands-on experience applying LLMs/GenAI to improve SRE efficiency and processes
Program management skills, including the ability to propose innovative solutions, influence leadership, improve processes, and drive cross-functional projects to completion

Preferred Qualifications

Experience with service mesh
Full stack development skills
Experience building or extending observability platforms
Background in Development Productivity or Quality Platforms
Experience in high-scale SaaS, microservice-oriented cloud environments

Position location: This role is available remotely or in San Mateo, Columbus, or Austin.

Time zone requirements: The team operates across East Coast and West Coast time zones.

Travel requirements: As a digital-first company, the majority of your work can be accomplished remotely. Most of our employees can live and work anywhere in the U.S. but are encouraged to spend high-quality time collaborating in person at regular onsites. The cadence of these in-person sessions varies by team and role; most teams meet once or twice per quarter for 2-4 consecutive days at a time.

Frequently Asked Questions

How do I apply for the Principal Site Reliability Engineer position at Upstart?

Use the Apply button above to submit your application directly to Upstart. Most applications take less than 5 minutes if your resume and contact details are ready, and you'll be routed to the employer's official application system to finish.

Where is the Principal Site Reliability Engineer position at Upstart located?

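This position is remote within the U.S., and the posting also lists San Mateo, Columbus, and Austin as available locations. Employees are encouraged to attend regular in-person onsites. Most teams meet once or twice per quarter for 2-4 consecutive days.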

What does a Principal Site Reliability Engineer at Upstart earn?

Upstart has not disclosed a salary range in this posting. Many employers share specifics later in the interview process; you can also ask during a recruiter screen if compensation transparency is important to you.

When was the Principal Site Reliability Engineer role at Upstart posted?

This role was posted on April 13, 2026 (67 days ago). It's still listed as actively hiring; we re-confirm openings against the source system multiple times per day and remove closed roles.