Microsoft has an exciting opportunity for a Senior Service Engineer on the Cloud + AI Silver Infrastructure and Operations (I&O) Team. This team is responsible for deploying and operating services within an air-gapped environment, including the infrastructure for collaboration. The I&O team manages the infrastructure and day-to-day operations enabling Azure engineers to work, collaborate, and deliver mission success in highly regulated environments.
In this role as a Senior Service Engineer, you will have the opportunity to work with engineers who enable a broad set of Azure services to be consumed by internal and external customers in highly secured and regulated industries. The services you provide and influence, and the decisions you make, will be required to meet the security policy and assurance requirements of both public and private sector customers.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities
The scale of our operations is enormous. Microsoft's products and services are overwhelmingly consumed online, and billions of people use them every day. We need people who enjoy analyzing complicated problems, coming up with creative solutions, and working in focused teams to build things no one has thought of before, all in the service of production reliability.
Technical Knowledge and Expertise:
Develops end-to-end expertise in service and/or system design, interactions between technology layers and components, functions of infrastructure, and dependencies at scale. Takes ownership of service design by driving efforts within an organization to identify, define, recommend, and build optimal configurations of technology solutions with considerations for cost management. Independently adjusts configurations and defines infrastructures to improve the availability, reliability, efficiency, observability, and/or performance of supported products and services. Drives reviews with the engineering teams that develop and/or manage services, identifying opportunities for efficiencies in operations and sharing learnings and recommendations across engineering teams working on related services within their organization.
Stays current in knowledge and expertise as the technology landscape evolves, maintaining awareness of industry norms. Uses knowledge to drive the adoption of new solutions across engineering teams working with related products within an organization. Provides guidance to others through sharing, coaching, conferences, and other means to drive improvements across teams.
Reviews and provides technical guidance, change advisory board authority, and direction on electrical, mechanical, and other critical facility maintenance methods of procedure, drawings, and technical documents.
Maintains subject matter expertise in Azure critical facilities dependencies and resiliency as a foundational aid to decision making in planned and unplanned scenarios.
Operational Excellence:
Maximizes uptime and operational excellence and minimizes disruption and downtime of Azure critical facilities through proper management, planning, coordination, and assessment of risk and impact of preventative and corrective maintenance. Leads space improvement projects. Creates standards and reviews and approves technical physical engineering procedures. Plans, organizes, and executes work with stakeholders and partners.
Maintains operations of live services as issues arise on an on-call basis. Implements solutions and mitigations to more complex issues impacting performance or functionality of Live Site service and escalates as necessary. Reviews and writes issue postmortems and shares insights with the team.
Independently implements reliable, scalable, and high-performance solutions across teams. Contributes to design documents. Owns implementation and rollback plans. Maintains quality checklist and related documentation.
Creates, monitors, and takes action on telemetry data and influences telemetry analytics to better identify patterns that reveal errors and unexpected problems that are affecting the system’s availability, reliability, performance, and/or efficiency. Develops solutions and/or automation and leverages an understanding of solutions to define, develop, measure, track, change, and improve the quality of telemetry pipelines that support automated monitoring and incident response.
Responds to incidents while on call, including complex issues with major customer or business impact, by identifying the level of impact, troubleshooting, contributing to difficult decisions based on business impact, deploying appropriate fixes to resolve root cause(s), and implementing automations for prevention of recurring issues through coordinating resources required for incident resolution, which may include product teams, owners, leadership, other engineering teams, and/or subject matter experts. Escalates resolution of highly complex, ambiguous, and impactful issues as needed. Contributes to postmortems and shares details related to incidents and their resolution through postmortem reports and regular review meetings. Provides incident response assistance to other personnel as needed, and develops incident response and resolution guidance.
Adheres to prescriptive guidance for security, privacy, and compliance standards in alignment with direction from the business and technical experts. Works with security, privacy, and compliance teams to identify and address issues relevant to their services. Identifies patterns of violations and implements automations for prevention. Provides assistance to other personnel as needed.
Collaboration and Knowledge Sharing:
Collaborates within and across teams by proactively and systematically sharing information with an appropriate level of detail for their audience. Overcomes obstacles by resolving conflicts and issues across interdependent teams and engages with partners and stakeholders so issues can be resolved and mutual objectives are met.
Shares insights and best practices that can be applied to improve development and operations across related sets of the systems, platforms, and/or products. Continues to develop their understanding of insights and best practices through interactions with members of product engineering teams and other resources (e.g., conferences, brown bags, wikis, documentation). Mentors and coaches other engineers to help them identify and propose relevant solutions.
Specialty Responsibilities:
Leverages advanced technical expertise, judgment, and decision making to coordinate multiple work streams and resources in crisis situations to drive mitigation plans and resolve crises by engaging necessary teams and escalating to appropriate stakeholders. Applies diagnostic expertise. Provides guidance to other engineers working to mitigate and resolve issues. Communicates customer impact and other relevant information with key stakeholders, leadership, and customers. Develops and drives projects and programs to improve crisis response by creating standard practices for consistent response across engineering teams. Fosters increased stability. Reduces noise by adjusting telemetry and alarming. Influences key stakeholders to adopt new standards and practices to broadly improve crisis and problem management.
Monitors and maintains security by addressing security vulnerabilities through patches, reconfigurations, and/or settings updates. Identifies, prioritizes, and targets solutions to complex security issues that may impact customers and partners, and drives action to promote the adoption of relevant mitigations. Drives program and process of mitigation (e.g., automation), troubleshoots system issues, and partners closely with internal customers and engineering teams to conduct root cause analyses, share end-to-end expertise in services, and to mitigate and resolve issues. Communicates and drives adherence to security policies and procedures.
Defines and develops standardized, repeatable, scalable procedures and solutions to guarantee quality.
Qualifications
Required/Minimum Qualifications:
Bachelor's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, service engineering, or systems engineering
OR equivalent experience.
Other Requirements:
Security Clearance Requirements: Candidates must be able to meet Microsoft, customer, and/or government security screening requirements for this role. These requirements include, but are not limited to, the following specialized security screenings:
The successful candidate must have an active U.S. Government Top Secret Clearance with access to Sensitive Compartmented Information (SCI) based on a Single Scope Background Investigation (SSBI) with Polygraph. The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. Failure to maintain or obtain the appropriate U.S. Government clearance and/or customer screening requirements may result in employment action up to and including termination.
Clearance Verification: This position requires successful verification of the stated security clearance to meet federal government customer requirements. You will be asked to provide clearance verification information prior to an offer of employment.
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Citizenship & Citizenship Verification: This position requires verification of U.S. citizenship due to citizenship-based legal restrictions. Specifically, this position supports United States federal, state, and/or local government agency customers and is subject to certain citizenship-based restrictions where required or permitted by applicable law. To meet this legal requirement, citizenship will be verified via a valid passport, other approved documents, or a verified U.S. government clearance.
Preferred/Additional Qualifications:
Bachelor's Degree in Electrical Engineering, Mechanical Engineering, Computer Science, Information Technology or related field AND 8+ years technical experience in software engineering, network engineering, service engineering, or systems engineering
OR equivalent experience.
3+ years technical experience working with large-scale cloud or distributed systems.
3+ years technical experience managing critical environments, server rooms, datacenters, or a mix of critical facilities and IT/engineering work spaces.
Management Information Systems (MIS), or other industry or product specific Engineering Certifications.
Service Engineering IC4 - The typical base pay range for this role across the U.S. is USD $112,000 - $218,400 per year. A different range applies to specific work locations within the San Francisco Bay area and New York City metropolitan area; the base pay range for this role in those locations is USD $145,800 - $238,600 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay
Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations (https://careers.microsoft.com/v2/global/en/accessibility.html).