Sr. System Development Engineer, Alexa Language and Data Ops

The Artificial General Intelligence (AGI) team is looking for passionate, talented, and inventive engineers to play a pivotal role in developing and maintaining industry-leading multi-modal and multi-lingual large language models (LLMs). The AGI team's mission is to use our hyper-scalable, general-purpose large-model training and inference systems to develop and deploy cutting-edge sensory AI foundation models that revolutionize how machines perceive and interpret the physical world and interact with humans and their surroundings.
We live the value "Work Hard. Have Fun. Make History" by making a point of sharing lessons learned on the front line with the development teams. The team offers a wide range of roles. If you like mastering a domain and going deep, we need you. If you can juggle multiple tasks and coordinate with several people in the heat of an incident, we need you. If you appreciate the benefits of process and methodical improvement, you will thrive here. If you want to keep your head down, headphones on, and focus on coding to support the team, we have a spot for you too.
You will be required to deeply understand technology landscapes and evaluate the use of new technologies. You will be influential within your team and work with peers and senior leaders to define and revise the standards for operational excellence across systems. You will consistently tackle abstract issues that span multiple functional areas and drive your team to push for improvements that can scale across other teams, services, and platforms.
Key Job Responsibilities
- Provide support for cluster and node management, ensuring smooth operation of LLM infrastructure.
- Continuously improve and automate cluster, capacity, and maintenance upgrades.
- Develop automation tools that improve operational excellence.
- Work on operations- and maintenance-driven coding projects, primarily in Ruby, Rails, Java, Python, or shell scripts, as well as projects involving AWS and web technologies.
- Apply hands-on experience with Kubernetes and expertise in a range of AWS services.
- Lead company-wide campaigns with support and engineering teams and drive them to closure.
- Participate in design and code reviews and identify bottlenecks.
- Thoroughly research and troubleshoot root causes, and resolve defects.
- Incubate new ideas for tools and techniques that improve the availability, performance, and cost of services.
- Identify and resolve bottlenecks in software and infrastructure, driving design reviews and operational reviews for enterprise systems.

BASIC QUALIFICATIONS
- 5+ years of systems design, software development, operations, automation, and process improvement experience.
- 5+ years of experience with development, programming, or scripting languages (Python/Java/Bash/Perl).
- Experience with Linux/Unix.
- Experience with CI/CD pipeline build processes.

PREFERRED QUALIFICATIONS
- Experience with distributed systems at scale.

Amazon is an Equal Opportunity Employer – Minority / Women / Disability / Veteran / Gender Identity / Sexual Orientation / Age.