Staff Reliability Engineer - IE07KE
We’re determined to make a difference and are proud to be an insurance company that goes well beyond coverages and policies. Working here means having every opportunity to achieve your goals – and to help others accomplish theirs, too. Join our team as we help shape the future.
The Central Reliability and Automation team is looking for a driven and highly motivated Staff Reliability Engineer Coach to join the team. In this role, you will be responsible for designing and maintaining an IT solution spanning the CI/CD pipeline, the observability suite (monitoring, alerting, and logging tools and processes), and the automation suite used by Reliability Engineers (REs) and Software Engineers. The Staff Reliability Engineer will work with the solution's consumers and stakeholders to define its functional and non-functional requirements. Leveraging open-source or Commercial Off-the-Shelf (COTS) products, they will design, build, and maintain the solution to meet current and future demand, applying key SRE tenets across its life cycle.
A prerequisite for the role is a “build-to-manage”, problem-solving, and innovative mindset, grounded in deep engineering expertise and applied to the design, build, test, deployment, change, and maintenance of services.
Key measures of success will include service stability, effective delivery and environment instrumentation, deployment quality, technical debt reduction, asset resiliency, risk/security compliance, cost efficiency, proactive and preventative maintenance mechanisms, and top-quartile operating norms. The Staff Reliability Engineer will actively contribute to the sustained advancement of the SRE practice within and beyond their area of responsibility.
Responsibilities:
Influence and design architecture, infrastructure, standards and methods for large-scale cloud systems
Engage in and improve the software development life cycle through CI/CD; improve the build-to-deployment process to achieve greater reliability and a sustainable release process; oversee release gating; establish deployment metrics (DORA)
Develop and monitor SLOs and SLIs based on customer user journeys; advise on SLAs; establish error budgets
Build observability and custom monitoring tool integrations; introduce telemetry to support SLOs
Automate system scalability and continually work to improve system resiliency, performance, and efficiency; make recommendations on design changes to improve reliability
Deploy software using highly available practices such as rolling, blue-green, or canary deployments
Mentor reliability engineering squads using a consistent framework for development, testing, and alerting processes
Practice sustainable incident response through blameless RCA and postmortems
Advise on performance testing and capacity planning
Communicate proactively with colleagues and formally present work product outcomes and risk analysis to the product team and management
Follow Agile/Scrum working methodologies
Establish dashboards for monitoring capabilities and metrics
Qualifications:
7+ years of experience in a related field
3+ years of experience in languages such as Python, Ruby, Bash, or Perl
BS degree in Engineering, Computer Science, or equivalent practical experience
Experience monitoring infrastructure and application service level objectives to ensure functional and performance objectives are met
Experience implementing service dashboards for monitoring objectives and metrics
Experience developing and/or administering software in AWS cloud infrastructure
System administration skills, including automation and orchestration of environments using Terraform or CloudFormation and configuration management
Demonstrable cross-functional knowledge with systems, storage, networking, security and databases
Experience with container orchestration tools and container management (Docker, Kubernetes, etc.)
Proficiency with continuous integration and continuous delivery tooling and practices
Strong analytical and troubleshooting skills; experience with runbooks
Preferred Qualifications:
Expertise designing, analyzing and troubleshooting large-scale distributed systems.
Systematic problem-solving approach coupled with strong communication skills and a sense of ownership and drive
Experience implementing Infrastructure as Code
Experience building software and maintaining systems in a highly secure, regulated or compliant industry
Experience and passion for working within a DevSecOps team culture
Additional Details:
Must be authorized to work in the US without company sponsorship.
This role can have a Hybrid or Remote work arrangement. Candidates who live near one of our office locations will have the expectation of working in an office 3 days a week (Tuesday through Thursday). Candidates who do not live near an office will have a remote work arrangement, with the expectation of coming into an office as business needs arise.
Compensation
The listed annualized base pay range is primarily based on analysis of similar positions in the external market. Actual base pay could vary and may be above or below the listed range based on factors including but not limited to performance, proficiency and demonstration of competencies required for the role. The base pay is just one component of The Hartford’s total compensation package for employees. Other rewards may include short-term or annual bonuses, long-term incentives, and on-the-spot recognition. The annualized base pay range for this role is:
$113,680 - $170,520
Equal Opportunity Employer/Females/Minorities/Veterans/Disability/Sexual Orientation/Gender Identity or Expression/Religion/Age
About Us (https://www.thehartford.com/about-us) | Culture & Employee Insights (https://www.thehartford.com/careers/employee-stories) | Diversity, Equity and Inclusion (https://www.thehartford.com/about-us/corporate-diversity) | Benefits (https://www.thehartford.com/careers/benefits)
Human achievement is at the heart of what we do.
We believe that with the right encouragement and support, people are capable of achieving amazing things.
We put our belief into action by ensuring individuals and businesses are well protected, and by going even further – making an impact in ways that go beyond an insurance policy.
Nearly 19,000 employees use their unique talents in careers that span a variety of disciplines – from developing the latest technology to creating and promoting our products to evaluating future financial risks.
We’re also committed to programs that drive education and support volunteerism, which put human beings first. We do it because it’s the right thing to do, and because when our customers, communities and employees succeed, we all do.