JOB DESCRIPTION: The Principal Machine Learning Engineer reports to the Head of Technology, Digital Solutions. The Principal Machine Learning Engineer will work closely with cross-functional product teams and is responsible for designing, developing, and deploying state-of-the-art data engineering techniques and streamlined data ingestion processes to extract valuable insights and intelligence from large and complex medical datasets (structured and unstructured data). This role will develop standards, guidelines, and direction for data modeling and data standardization that will directly contribute to enhancing the quality of patient care and the development of innovative medical devices and therapy solutions.
JOB RESPONSIBILITIES: Lead and contribute hands-on to major data engineering initiatives from inception to delivery. Analyze data to identify trends and insights. Collaborate with product and engineering teams to define data requirements and drive data-driven decision-making. Design and implement data models to effectively support various product use cases. Design, implement, and maintain scalable and optimized data architectures that meet evolving business needs. Evaluate and recommend appropriate data storage solutions, ensuring data accessibility and integrity. Develop and continuously optimize data ingestion processes for improved reliability and performance. Design, build, and maintain robust data pipelines and platforms. Establish monitoring and alerting systems to proactively identify and address potential data pipeline issues. Support data infrastructure needs such as cluster management and permissions. Develop and maintain internal tools to streamline data access and analysis for all teams. Create and deliver documentation to educate product teams on data best practices and tools. Communicate technical concepts effectively to both technical and non-technical audiences.
EDUCATION AND EXPERIENCE: Master's Degree in Data Science, Computer Science, Statistics, or a related field plus 10 years of experience in data engineering with a strong focus on data architecture and data ingestion. Experience in the Life Science Industry. Strong understanding of data modeling (conceptual, logical, and physical) using different data modeling methodologies and analytics concepts. Proven experience designing, building, and maintaining data pipelines and platforms. Expertise working with data integration and ETL tools, and data engineering programming/scripting languages (Python, Scala, SQL) for data preparation and analysis. Experience with Data Ops (VPCs, cluster management, permissions, Databricks configurations, Terraform) in Cloud Computing environments (e.g., AWS, Azure, GCP) and associated cloud data platforms, cloud data warehouse technologies (Snowflake/Redshift), and Advanced Analytical platforms (e.g., Dataiku and Databricks). Familiarity with data streaming technologies like Kafka and Debezium. Proven expertise with data visualization tools (e.g., Tableau, Power BI). Strong understanding of data security principles and best practices. Experience with CI/CD pipelines and automation tools. Strong problem-solving and critical thinking skills. Excellent written and verbal communication skills to convey complex technical concepts and findings to non-technical stakeholders and collaborate effectively across teams.
PREFERRED: Prior experience with healthcare domain data, including Electronic Health Records (EHR). Experience with triple stores or graph databases (e.g., GraphDB, Stardog, Jena Fuseki). Proficiency in building domain ontologies with the relevant W3C standards (RDF, RDFS, OWL, SKOS, SPARQL) and associated ontology editors (e.g., TopBraid Composer, Protégé). Experience with semantic validation languages (e.g., SHACL, SPIN) and associated semantic software packages and frameworks (e.g., Jena, Sesame, RDF4J, RDFLib). Knowledge of data governance and compliance policies.