HPC-AI Solution Architect - April 2024
Las Vegas
Apr 1, 2025
About HPC-AI Solution Architect

HPC/AI Architect (Federal)

Job Summary:

The HPC/AI Architect will be responsible for architecting, designing, and optimizing high-performance computing (HPC) and artificial intelligence (AI) infrastructure, ensuring the systems meet the requirements for scalability, efficiency, and performance. The role demands a blend of expertise in HPC, AI, and system architecture, along with the ability to manage projects from conception through implementation. This role will also involve proactive engagement with stakeholders and staying at the forefront of technological advancements.

Recruiting for this role ends on May 31, 2025

Key Responsibilities:

System Architecture and Design: Develop and refine the architecture for HPC and AI systems. This includes defining system requirements, designing computational workflows, and selecting appropriate hardware and software configurations. Assist teams with the implementation, tuning, and optimization of Generative AI models tailored for federal government applications.

Performance Optimization: Analyze and optimize system performance, ensuring the efficient execution of AI models and HPC applications. Implement techniques for parallel processing, distributed computing, and resource management. Manage and optimize the GPU-enabled computing fabric.

Integration and Optimization: Develop, debug, and maintain software tools, libraries, and frameworks that support HPC and AI workloads. Work closely with hardware and software vendors to ensure AI models are properly optimized for maximum performance and scalability.

NVIDIA Tools and Frameworks: Use NVIDIA's suite of tools and frameworks, such as CUDA, cuDNN, and TensorRT, to optimize AI and HPC workloads on NVIDIA GPUs.

HPC Systems Support: Implement and manage HPC and AI systems on-premises and in colocation (colo) facilities, ensuring seamless integration with existing IT infrastructure. This includes installing, configuring, and maintaining the HPC infrastructure.

Collaboration and Teamwork: Work with cross-functional teams, including alliance partners, data scientists, researchers, and software developers, to solve complex AI-related challenges. Provide training and mentorship to junior engineers and team members, fostering a culture of continuous learning and innovation within the team.

Learning and Development: Stay updated with the latest advancements in HPC and AI technologies. Conduct research to explore new methodologies and integrate them into existing systems as requested.

Technical Support and Troubleshooting: Provide support for resolving complex technical issues related to HPC and AI infrastructure. Perform root cause analysis and implement solutions to prevent recurrence.

Documentation and Reporting: Create comprehensive documentation for system designs, performance metrics, and project status. Prepare detailed technical reports and presentations for stakeholders.

Security and Compliance: Ensure that all HPC and AI systems, software tools, and frameworks comply with federal security and regulatory requirements. Work with the Deloitte Federal BISO to implement controls that protect sensitive data and intellectual property in accordance with NIST guidelines.

Required Skills and Qualifications:

7+ years professional experience designing and managing HPC and AI architectures with a proven track record of successful project implementations.

3+ years of experience in the design, support, and management of Kubernetes

Linux: In-depth experience with at least one Linux distribution, including kernel, bootloader, and network configuration and use of the command-line interface (CLI).

3+ years working with Machine Learning Frameworks: Deep understanding of TensorFlow, PyTorch, and/or other AI/ML frameworks.

1+ year of NVIDIA expertise: Extensive experience with NVIDIA's AI tools and frameworks, such as CUDA, NeMo, and Triton

Bachelor's degree in Computer Science, Artificial Intelligence, Engineering, or a closely related field

Limited immigration sponsorship may be available

Ability to travel 0-10%, on average, based on the work you do and the clients and industries/sectors you serve

Preferred Qualifications:

Master's or Ph.D. in Computer Science, Artificial Intelligence, Engineering, or a closely related field

Programming Languages: Proficiency in Python, C++, Java, and scripting languages commonly used in HPC/AI environments. Proven Python coding experience, with expertise in at least one additional scripting or programming language. Python package management and dependency debugging skills are a plus.

AI Development Support: Proven ability to troubleshoot distributed AI model training frameworks such as TensorFlow, PyTorch, Horovod, Ray, DeepSpeed, and others. Experience supporting large language models (LLMs) is a plus.

1+ year of Project Management: Experience managing large-scale projects, including timeline development, resource allocation, and risk management.

Parallel and Distributed Computing: Expertise in parallel programming models (MPI, OpenMP), distributed computing, and cloud-based HPC solutions.

System Performance: Strong knowledge of performance profiling, benchmarking, and optimization techniques.

Data Management: Understanding of data storage solutions, file systems, and data transfer protocols.

Communication Skills: Excellent verbal and written communication skills for effective collaboration and reporting.

Analytical Skills: Strong analytical skills for data-driven decision making and ability to solve complex problems and develop innovative solutions.

May require a security clearance

Knowledge of the federal government is helpful.

Industry Experience: A background supporting HPC/AI in the federal government or defense industry sector is a plus.

The wage range for this role takes into account the wide range of factors that are considered in making compensation decisions including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs. The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the position may be filled. At Deloitte, it is not typical for an individual to be hired at or near the top of the range for their role and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range is $107,000 to $198,000.

You may also be eligible to participate in a discretionary annual incentive program, subject to the rules governing the program, whereby an award, if any, depends on various factors, including, without limitation, individual and organizational performance.

Information for applicants with a need for accommodation: https://www2.deloitte.com/us/en/pages/careers/articles/join-deloitte-assistance-for-disabled-applicants.html

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability or protected veteran status, or any other legally protected basis, in accordance with applicable law.

SIMILAR JOBS
Lead Engineer - Nuclear Simulation Assisted Engineering (Remote Eligible, US)
Job Description Summary Lead Engineer - Simulation Assisted Engineering (SAE) works within the Plant Integration Engineering team by enabling LEAN integrated plant design (IPD) using modern digital e
Certified Montessori Guide
Senior Billing Operations Analyst
About Lumen Lumen connects the world. We are igniting business growth by connecting people, data and applications – quickly, securely, and effortlessly. Together, we are building a culture and compan
Restaurant Team Member
Req ID: 428446 Address: 15402 Hornsby Street NE Columbus, MN, 55025 Benefits: * Paid Time Off * Flexible Scheduling * 401(k) – 100% match up to 5% * Medical/Dental/Vision Insurance after 30 days * Co
Retail Advisor
Retail Advisor Job Req ID: 28811 Posting Date: 22 Jan 2024 Function: EE Retail Location: Blackburn (4218), Blackburn, United Kingdom Salary: £11.57 p/h plus 20% on target commission Working hours: 25
Lead Warehouse Associate (GA, Brunswick)
Lead Warehouse Associate (GA, Brunswick) GA, Brunswick Warehouse Lead Associate Five Star Breaktime solutions is looking for a Lead Warehouse Associate that can contribute to serving our customers by
Management & Sales Training Program
The Sherwin-Williams Management & Sales Training Program is an accelerated, entry-level position designed to prepare you for a Store Management role in 18-24 months. With Sherwin-Williams’ promot
Materials Estimator Coordinator
70449BR Job Title: Materials Estimator Coordinator Job Discipline: Technology PSA/Division: MPS (VALVES) Job Summary: We are currently looking for an enthusiastic, energetic, positive, hard-working i
Licensed Practical Nurse - LPN
Benefits: Nursing Student Loan Debt Repayment and Tuition Assistance; Variable compensation plans; Tuition, Travel, and Wireless Service Discounts; Employee Assistance Program to support mental health; Employe
Customer Service Representative
Bringing smiles is what we do at TTEC… for you and the customer. As a Customer Service Representative working onsite in Rochester, NY , you’ll be a part of creating and delivering amazing customer ex
Copyright 2023-2025 - www.zdrecruit.com All Rights Reserved