Discover. A brighter future.
With us, you’ll do meaningful work from Day 1. Our collaborative culture is built on three core behaviors: We Play to Win, We Get Better Every Day & We Succeed Together. And we mean it — we want you to grow and make a difference at one of the world's leading digital banking and payments companies. We value what makes you unique so that you have an opportunity to shine.
Come build your future, while being the reason millions of people find a brighter financial future with Discover.
Job Description:
At Discover, be part of a culture where diversity, teamwork and collaboration reign. Join a company that is just as focused on its employees as it is on its customers and is consistently recognized for both. We’re all about people, and our employees are why Discover is a great place to work. Be the reason we help millions of consumers build a brighter financial future and achieve yours along the way with a rewarding career.
Have you ever wondered what’s behind Discover Card’s award-winning customer experience? Our Card Portfolio group owns dozens of cardmember experiences, from setting up your account on Discover.com for the first time to adding your Discover Card to your phone (and much more). This group is committed to building new and more efficient ways for our Cardmembers to use our products.
This is where you come in. We need a Principal Application Reliability Engineer who’s seeking an opportunity to make a positive impact. You will partner with teams to identify and fix inefficiencies that affect system reliability and performance. Examples include reviewing availability expectations, addressing performance issues, uncovering observability gaps, leading problem management, and driving capacity planning. As part of your day-to-day role, you will actively manage risk and customer-impacting issues and keep product leaders informed.
Responsibilities
Consult with teams and provide hands-on training in observability, incident management, and reliability best practices.
This includes practices such as defining SLOs/SLAs/SLIs, establishing on-call support behaviors, troubleshooting, building support playbooks, implementing monitoring and alerting, setting logging standards, and conducting fragility and performance testing.
Review product journeys and reliability practices at regular intervals to enforce best practices.
Periodically pair/mob program with the teams to help build reliability thinking.
Lead failure-point discussions, chaos testing, and family-level capacity management.
Responsible for family-level application reliability and resiliency.
Leverage metrics and scorecards to drive adoption of site reliability practices in the product areas.
Ensure delivery teams in the product family track and meet annual operational goals (MTTR reduction, incident reduction, platform availability, SLO/SLA targets).
Ensure automated delivery for all family-level products.
Ensure an appropriate level of documentation exists.
Drive SRE community discussions and share wins and failures with the Discover SRE community of practice.
Minimum Qualifications
At a minimum, here’s what we need from you:
Bachelor's degree – Computer Science or related field
6+ years – Information Technology, (Software) Engineering, or related field
Internal applicants only: technical proficiency rating of proficient on the Dreyfus engineering scale
Preferred Qualifications
3+ years in an SRE or DevOps role
Experience with DevOps tools, processes, and culture
Extensive experience leading customer-facing systems in a mission-critical environment
Advanced experience with programming and/or scripting languages (Python, Java, bash)
In-depth knowledge of the application development landscape: Java, REST APIs, design patterns, and CI/CD
Extensive experience with monitoring and observability tools/technologies (e.g., Grafana, Kibana, Datadog, AppDynamics)
Experience creating standardized monitoring dashboards in cloud platforms for proactive monitoring of application and infrastructure health
In-depth knowledge of non-functional requirements (NFRs), including pressure/chaos testing, performance testing, and penetration testing
Reliability best practices in cloud-native environments
Operational Readiness strategies and best practices
Application Deadline:
The application window for this position is anticipated to close on Dec-05-2023. We encourage you to apply as soon as possible. The posting may be available past this date, but it is not guaranteed.
Compensation:
The base pay for this position generally ranges from $104,000.00 to $175,600.00. Additional incentives may be provided as part of a market-competitive total compensation package. Factors such as, but not limited to, geographical location, relevant experience, education, and skill level may affect the pay for this position.
Benefits:
We also offer a range of benefits and programs based on eligibility. These benefits include:
Paid Parental Leave
Paid Time Off
401(k) Plan
Medical, Dental, Vision, & Health Savings Account
Short-Term Disability (STD), Life, Long-Term Disability (LTD), and AD&D
Recognition Program
Education Assistance
Commuter Benefits
Family Support Programs
Employee Stock Purchase Plan
Learn more at MyDiscoverBenefits.com.
What are you waiting for? Apply today!
All Discover employees place our customers at the very center of our work. To deliver on our promises to our customers, each of us contributes every day to a culture that values compliance and risk management.
Discover is committed to a diverse and inclusive workplace. Discover is an equal opportunity employer and does not discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status, or other legally protected status. (Know Your Rights) (https://www.eeoc.gov/poster)