Senior Systems Operations Engineer - Production Support, SRE, ITIL
The Wells Fargo Foundation
Role Overview
The Wells Fargo Foundation is hiring a Senior Systems Operations Engineer - Production Support, SRE, ITIL. This is a full-time role in Secunderabad and part of The Wells Fargo Foundation's DevOps hiring. Full responsibilities, required qualifications, and the apply link are listed in the description below.
Job Description
About this role:
Wells Fargo is seeking a Senior Systems Operations Engineer.
In this role, you will:
- Lead or participate in managing all installed systems and infrastructure within the Systems Operations functional area
- Contribute to increasing system efficiencies and reducing human intervention time on related tasks
- Review and analyze moderately complex operational support systems, application software, and system management tools to ensure the highest levels of systems and infrastructure availability
- Work with vendors and other technical personnel for problem resolution
- Lead the team in meeting technical deliverables while leveraging a solid understanding of technical process controls or standards
- Collaborate with vendors and other technical personnel to resolve technical issues and achieve highest levels of systems and infrastructure availability
Required Qualifications:
- 4+ years of Systems Engineering, Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
Desired Qualifications:
- 4+ years in Production Support / SRE / DevOps / Platform Operations for business-critical applications.
- Proven track record supporting 24x7 platforms with strict SLAs and high availability requirements.
- Experience working in ITIL-aligned environments (Incident, Problem, Change).
- Strong troubleshooting skills across Linux/Unix, system processes, CPU/memory, threads, disk, network basics.
- Working knowledge of application architectures: microservices, distributed systems, batch + online workloads.
- Proficiency in log analysis and observability tools (e.g., Splunk/ELK, Grafana, Prometheus, AppDynamics, Dynatrace, or any equivalent).
- Solid understanding of HTTP, TLS, DNS, load balancing, reverse proxy, and typical failure patterns (timeouts, 503/504, connection pool saturation).
- Hands-on with databases (Oracle / Postgres / SQL Server etc.): query basics, locks, slow queries, connection pooling, indexing concepts.
- Familiarity with messaging/streaming systems (Kafka/RabbitMQ) and troubleshooting lag/offset/consumer issues (good-to-have).
- Ability to write scripts for automation in Python / Shell / PowerShell.
- Comfortable with runbooks, automation tools, CI/CD basics, and reducing manual toil. Understanding of SLO/SLI, monitoring, alert tuning, and reliability best practices.
- Strong incident handling skills: triage, mitigation, communication, and structured follow-through.
- Knowledge of RCA techniques (5 Whys, fishbone, timeline-based analysis) and converting findings into preventive actions.
- Experience with change management and release support; able to assess risk and enforce operational readiness.
- Excellent written and verbal communication for stakeholder updates (technical + business-friendly). Ability to collaborate across Dev, QA, DBAs, Network, Cloud/Infra teams.
- Calm under pressure, structured thinker, strong ownership. Bias for root-cause and prevention over repeated firefighting. High attention to detail and commitment to operational excellence.
Job Expectations:
- Production Support & Incident Management - Provide L2 support for critical applications/services including triage, diagnosis, mitigation, and recovery. Lead or co-lead troubleshooting of major incidents (P1/P2) and coordinate with relevant teams until service is restored. Maintain clear, timely incident communications (status updates, ETAs, impact, workaround). Ensure incidents are properly documented with timeline, actions taken, and next steps.
- Monitoring, Alerting & Observability - Own service monitoring hygiene: reduce noise, tune alerts, and improve signal quality. Use metrics/logs/traces to quickly isolate failure domains (application, infra, DB, network, dependencies). Build/maintain dashboards for service health, SLIs (latency, error rate, throughput), and batch completion tracking.
- Problem Management & RCA - Drive root cause analysis for recurring issues and high-severity incidents. Convert RCAs into measurable outcomes: bug tickets, automation, monitoring improvements, capacity fixes, and operational controls. Track corrective actions to closure and measure reduction in repeat incidents.
- Release, Change & Operational Readiness - Support production releases: validation, smoke checks, rollback readiness, and post-release monitoring. Review changes for operational risk and ensure runbooks, alarms, dashboards, and rollback plans are in place. Participate in CAB/change reviews where applicable.
- Automation & Reliability Engineering - Identify repetitive manual tasks and deliver automation to reduce toil (e.g., health checks, remediation scripts, self-healing steps). Improve MTTR via better runbooks, automation, and faster diagnostics. Contribute to reliability engineering initiatives: capacity planning inputs, performance tuning, resilience testing (where applicable).
- Knowledge Management & Documentation - Create and maintain operational documentation: runbooks / SOPs / troubleshooting guides, service dependency maps, and known error database entries. Ensure documentation is usable during incidents (step-based, verified, and current).
- On-call / Shift Support - Participate in on-call rotation and/or shift-based coverage as per business requirements. Handle escalations from L1 and provide coaching to improve L1 resolution rates.
Posting End Date:
14 Apr 2026
We Value Equal Opportunity
Wells Fargo is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other legally protected characteristic.
Employees support our focus on building strong customer relationships balanced with a strong risk mitigating and compliance-driven culture which firmly establishes those disciplines as critical to the success of our customers and company. They are accountable for execution of all applicable risk programs (Credit, Market, Financial Crimes, Operational, Regulatory Compliance), which includes effectively following and adhering to applicable Wells Fargo policies and procedures, appropriately fulfilling risk and compliance obligations, timely and effective escalation and remediation of issues, and making sound risk decisions. There is emphasis on proactive monitoring, governance, risk identification and escalation, as well as making sound risk decisions commensurate with the business unit's risk appetite and all risk and compliance program requirements.
Candidates applying to job openings posted in Canada: Applications for employment are encouraged from all qualified candidates, including women, persons with disabilities, aboriginal peoples and visible minorities. Accommodation for applicants with disabilities is available upon request in connection with the recruitment process.
Applicants with Disabilities
To request a medical accommodation during the application or interview process, visit.
Drug and Alcohol Policy
Wells Fargo maintains a drug free workplace. Please see our Drug and Alcohol Policy to learn more.
Wells Fargo Recruitment and Hiring Requirements:
a. Third-Party recordings are prohibited unless authorized by Wells Fargo.
b. Wells Fargo requires you to directly represent your own experiences during the recruiting and hiring process.
Frequently Asked Questions
How do I apply for the Senior Systems Operations Engineer - Production Support, SRE, ITIL position at The Wells Fargo Foundation?
Use the Apply button above to submit your application directly to The Wells Fargo Foundation. Most applications take less than 5 minutes if your resume and contact details are ready, and you'll be routed to the employer's official application system to finish.
Where is the Senior Systems Operations Engineer - Production Support, SRE, ITIL position at The Wells Fargo Foundation located?
This position is based in Secunderabad. The Wells Fargo Foundation has not indicated remote or hybrid options for this role, so candidates should plan for on-site work.
What does a Senior Systems Operations Engineer - Production Support, SRE, ITIL at The Wells Fargo Foundation earn?
The Wells Fargo Foundation has not disclosed a salary range in this posting. Many employers share specifics later in the interview process; you can also ask during a recruiter screen if compensation transparency is important to you.
When was the Senior Systems Operations Engineer - Production Support, SRE, ITIL role at The Wells Fargo Foundation posted?
This role was posted on April 13, 2026. It's still listed as actively hiring; we re-confirm openings against the source system multiple times per day and remove closed roles.
How much experience does the Senior Systems Operations Engineer - Production Support, SRE, ITIL role at The Wells Fargo Foundation require?
This is a senior-level position. Most senior roles call for 5+ years of directly relevant experience. The Wells Fargo Foundation lists its specific requirements in the job description above, so review the must-have qualifications closely before applying.