Supercomputing Engineer

Company: San Francisco Compute Co.


Details of the offer

About
We're the San Francisco Compute Company. We're building the first real-time compute trading platform. We think that over the next decade, thousands of startups and labs are going to be training and serving large models. They need compute to do this, and we're building a platform on which that compute can be traded. If we're successful, it will be possible to scale to tens of thousands of accelerators for hours at a time without having to build your own infrastructure. This will greatly increase the number of organizations that can afford to train large models, which will make the most important technology of our lifetime accessible to more people.
About the Role
ML training clusters are some of the highest-performance computers on the planet. Even relatively small clusters would have been in the TOP500 five years ago. Our supercomputing team is responsible for keeping our compute clusters running smoothly, monitoring hardware health, and fixing things when they go wrong. We believe strongly in automation: code is the only reliable way to manage hardware at scale. As we scale, this will become a more data-driven role focused on predicting failures before they happen. We're a small team, so you'll also spend time talking to customers.
About You
You've managed at least one GPU training cluster in the past (ideally one with more than 1,000 GPUs, though this is not required)
You appreciate and value good documentation
You deeply understand Linux, CUDA, NCCL, and InfiniBand
You enjoy creating large self-correcting systems that keep hardware humming

Some Nice to Haves
Experience with Rust (our VM orchestrator is written in Rust)
Experience with distributed storage systems (Weka, VAST, Ceph, etc.)
Experience with HPC network architectures (eBGP, fat-tree, VXLAN, MC-LAG, etc.)
Experience with Linux virtualization (KVM, QEMU, libvirt, etc.)
Experience with performance optimization of machine learning kernels

Benefits
Unlimited office book budget
You can buy as many books for the office as you want. You're encouraged to spend time during the workday reading!
Generous equity grant
Team members are offered a competitive salary along with equity in the company.
Retirement matching
We match 401(k) plans up to 4%.
Medical, dental & vision
We offer competitive medical, dental, and vision insurance for employees and dependents, and we cover 100% of premiums.
Time off
We offer unlimited paid time off as well as 10+ observed holidays.
Parental leave
We offer biological, adoptive, and foster parents paid time off to spend quality time with family.
Daily lunch
We cover lunch daily for employees.
Visa Sponsorship
Yes, we sponsor visas and work permits.
The San Francisco Compute Company is committed to maintaining a workplace free from discrimination and harassment. We make employment decisions based on business needs, job requirements, and individual qualifications, without regard to race, color, religion, belief, national origin, social or ethnic origin, age, physical, mental, or sensory disability, sexual orientation, gender identity or expression, marital status, civil union or domestic partnership status, past or present military service, HIV status, family medical history or genetic information, family or parental status including pregnancy, or any other status protected by law.
We welcome the opportunity to consider qualified applicants with prior arrest or conviction records. Our commitment to diversity includes hiring talented individuals regardless of their criminal history, in accordance with local, state, and federal laws, including San Francisco's Fair Chance Ordinance and California's ban-the-box laws.



Source: Jobleads

