Staff Software Engineer, ML Training Platform

Details of the offer

About the job: Millions of individuals worldwide turn to our platform every day to discover new ideas and envision new possibilities. Our mission is to help these individuals find inspiration and build a life they cherish. As a Staff Software Engineer on our ML Training Platform team, you'll play a pivotal role in advancing our mission and driving Pinterest forward. You'll have the opportunity to grow both personally and professionally while contributing to a positive online environment.
Our ML Training Infrastructure team develops foundational tools and infrastructure used across Pinterest, supporting various ML applications such as recommendations, ads, visual search, and more. We focus on ensuring the robustness and efficiency of ML systems, essential for accelerating model development and deployment.
What you'll do:
Design and implement scalable solutions to enhance ML training and inference capabilities using platforms like Kubernetes.
Lead critical initiatives such as GPU sharing, intelligent resource management, and fault-tolerant training methods.
Define and execute the technical strategy and roadmap for ML Training Infrastructure, encompassing key frameworks like PyTorch, Ray, and Jupyter.
Collaborate closely with internal stakeholders, including ML engineers and data scientists, to address development challenges and facilitate customer use cases.
Build strong partnerships with leaders across Data and Infrastructure teams to drive comprehensive technical initiatives.
Mentor team members and provide technical leadership within the ML Platform group.

What we're looking for:
7+ years of experience in software engineering with a focus on ML infrastructure or similar batch compute environments.
Proven track record of technical leadership, including devising long-term strategies and successfully executing them.
Deep understanding of High Performance Computing and parallel computing principles.
Ability to manage cross-functional projects and understand the needs of internal customers, particularly ML practitioners and Data Scientists.
Proficiency in Python; experience with languages such as C++ and Java is advantageous.
Knowledge of GPU programming, containerization, and orchestration technologies is desirable.
Experience with cloud data processing technologies (e.g., Apache Spark, Ray, Dask, Flink) and ML frameworks like PyTorch is a plus.

Note: This position does not offer relocation assistance.


Nominal Salary: To be agreed

Source: Grabsjobs_Co
