Software Engineer, Deep Learning Infrastructure - Autopilot

Company:

Tesla Inc


Details of the offer

**Software Engineer, Deep Learning Infrastructure - Autopilot**
Engineering & Information Technology | Palo Alto, California | ID 104044 | Full-time
**THE ROLE:**
As a Software Engineer within Autopilot, you will work on reinforcing, optimizing, and scaling our neural network training infrastructure.
At the core of our self-driving capabilities are several neural networks that the Deep Learning team designs and trains on large amounts of data. Running training jobs robustly at scale, whether for production models or quick experiments, and completing them in the shortest possible time is critical to our mission.
**Responsibilities:**
Write robust Python code in our machine learning training repository, applying software best practices, to support machine learning scientists in tasks such as fetching training data, preprocessing it, and orchestrating training runs.
Integrate the training software into our continuous integration cluster to support metrics persistence across experiments, weekly/nightly neural network builds, and other unit / throughput tests.
Profile performance of training software in our training cluster, identify bottlenecks in and between CPU/GPU code execution, and work on optimizing its throughput and scalability within and across nodes to ultimately reduce convergence time.
Coordinate with the team managing the hardware cluster to maintain high availability and job throughput for machine learning.
**Requirements:**
Experience programming in Python and/or C/C++.
Proficient in system-level software, in particular hardware-software interactions and resource utilization.
Understanding of modern machine learning concepts and state-of-the-art deep learning.
Experience working with training frameworks, ideally PyTorch.
Demonstrated experience scaling neural network training jobs across clusters of GPUs.
Optional: Experience programming in CUDA.
Optional: Profiling and optimizing CPU-GPU interactions (pipelining compute/transfers, etc.).
Optional: DevOps experience, in particular dealing with clusters of training nodes and filesystems for very large amounts of training data.



Source: Grabsjobs_Co



Built at: 2024-10-05T03:06:36.684Z