We are looking for a first-class deep learning (DL) performance architect to drive the performance analysis and optimization of state-of-the-art inference networks on our GPUs: identify the hardware and software performance limiters of DL networks, prototype key primitives, and guide the design of next-generation architectures and DL software optimizations.
What you’ll be doing:
Establishing deep learning applications and use cases for performance analysis, modeling, and projection
Analyzing and proposing both software and hardware optimizations for deep learning applications
Specifying hardware/software configurations and metrics to analyze performance, power, accuracy, and resiliency in existing and future uniprocessor and multiprocessor configurations
Collaborating across the company to guide the direction of next-generation deep learning hardware and software by working with architecture, library, and compiler teams
What we need to see:
MS or PhD in a relevant discipline (CS, EE, Math) or equivalent experience, plus 2+ years of relevant experience
Track record of designing architectures to accelerate computationally demanding algorithms and applications
Strong background in computer architecture
Expert-level mathematical foundations in machine learning and deep learning
Strong programming skills in C, C++, Perl, or Python
Ways to stand out from the crowd:
Prior experience with assembly-level performance optimization
Experience working with deep learning frameworks such as Caffe, TensorFlow, and Torch
Familiarity with GPU computing (CUDA, OpenCL) and HPC (MPI, OpenMP)
Background in systems-level performance modeling, profiling, and analysis
Experience in characterizing and modeling system-level performance, executing comparison studies, and documenting and publishing results
NVIDIA is a Learning Machine
NVIDIA pioneered accelerated computing to tackle challenges no one else can solve. Our work in AI and the metaverse is transforming the world's largest industries and profoundly impacting society.
Learn more about NVIDIA.