Title: AI Data Trainer
Location: remote
Duration: contract-to-hire
Pay Rate: $25/hr
Work Authorization: USC/GC; no subcontracting during the contract duration
Overview: Large language models are core to our client's business, and the data we collect is core to the language models we train.
While the current iteration of LLMs is trained primarily on web text, the next generation of LLMs will rely on human annotation to create custom datasets to further develop the capabilities of these models.
We are looking for an AI Data Trainer to work closely with engineering and product teams and lead the creation of custom datasets for training specialized models that enable enterprise solutions built on the cutting-edge capabilities of LLMs.
This role requires a diverse set of skills and draws on a range of disciplines.
We are therefore considering a broad range of backgrounds for this role, including ML, NLP, HCI, software engineering, and relevant linguistic and social sciences.
Key responsibilities:
- Collaborate with Data Science and Product teams to define annotation tasks, coordinate resourcing, and review annotated data for quality
- Develop and disseminate data labeling best practices learned from building enterprise solutions using LLMs
- Develop labeled data assets according to annotation guides to train and evaluate LLMs in collaboration with Machine Learning Engineers for real-world use cases
- Collaborate with centralized data and evaluation teams on specialized collection protocols, UIs, and instructions for diverse and creative human annotation tasks

Must Haves:
- Bachelor's degree in Linguistics, Library Science, or a related field (open to non-traditional backgrounds as well!)
- Experience with ontology development and information domain modeling
- Experience labeling conversational text for analysis as AI trainers
- Experience with AI interaction, such as prompt generation and open AIs
- Experience running and managing human annotation jobs for large-scale data collection with quality control and best practices for human annotation
- Proficiency with SQL, terminal, and command line
- Proficiency with Jupyter notebooks
- Ability to follow complex instructions, navigate ambiguity, and work independently
- Detail-oriented disposition and clear, concise communication skills
- Curiosity about technology and knack for tackling problems in creative ways

Plusses:
- Proficiency in Japanese
- Experience developing labeled data assets according to annotation guides to train and evaluate LLMs in collaboration with ML Engineers for real-world use cases
- Experience collaborating with centralized data and evaluation teams on specialized collection protocols, UIs, and instructions for diverse and creative human annotation tasks

Compensation: $25.00/HR