SENIOR STAFF DATA ENGINEER

About Karius

Karius is a venture-backed life science startup that is transforming the way pathogens and other microbes are observed throughout the body. By unlocking the information present in microbial cell-free DNA, Karius helps doctors quickly solve their most challenging cases and provides industry partners with access to thousands of biomarkers to accelerate clinical trials, discover new microbes, and reduce patient suffering worldwide.

Karius aims to conquer infectious diseases through innovations around genomic sequencing and machine learning. The company’s platform is already delivering unprecedented insights into the microbial landscape, providing clinicians with a comprehensive test capable of identifying more than a thousand pathogens directly from blood, and helping the industry accelerate the development of therapeutic solutions. The Karius test we provide today is one of the most advanced solutions available to physicians who aim to deliver better care to many otherwise ineffectively treated patients.

Position Summary

Karius is building AI-driven data analytics pipelines to deliver life-saving results in the highly complex infectious disease landscape. We are seeking a seasoned Senior Staff Data Engineer in Redwood City, CA to lead the design and development of a scalable data platform to meet our rapid business growth. The Senior Staff Data Engineer will be responsible for defining the technology roadmap and for developing and optimizing the data platform so that we can extract value from large amounts of genomic, clinical, and operational data, provide actionable insights that serve patients, and develop innovative products. In this regard, the Senior Staff Data Engineer will work with key stakeholders within the company to understand our data landscape and the core needs for data governance and usage.

Primary Responsibilities
• Design, develop, and operate a scalable data platform that ingests, stores, and aggregates various datasets to meet the defined requirements;
• As the primary subject matter expert in the data engineering domain, evaluate technology trends in the data industry, identify those technologies relevant to the company’s business objectives, and develop a roadmap to update the company’s data platform;
• Provide Machine Learning (“ML”) data platform capabilities for the R&D and Analytics teams to perform data preparation, model training and management, and run experiments against clinical and genomic datasets;
• Train the R&D and Analytics teams on using Karius data toolsets, and mentor and support them throughout their research and development efforts;
• Build and maintain data ETL/ELT pipelines to source and aggregate the required internal data to calculate operational and commercial Key Performance Indicators (“KPIs”) and to support various data analysis and reporting needs;
• Develop integrations with Karius and third-party systems to source, qualify, and ingest various datasets; work closely with cross-functional groups and stakeholders, such as the product, engineering, medical, and scientific teams, on data modeling and general life cycle management;
• Provide data analytics and visualization tools to extract valuable insights from the data and enable data-driven decisions; and
• Work closely with the Security and Compliance teams and deploy the necessary data governance to meet regulatory and legal requirements.

Position Minimum Requirements
• At least a Bachelor’s degree in Computer Science, Data Science, Software Engineering, Electrical Engineering, or Bio-Engineering (or its foreign equivalent); plus
• At least 10 years of experience as a Software or Data Engineer or in a similar position, including at least 5 years in a senior or higher-level position;

AND whose experience must include:
• 4+ years of hands-on design, development, and operation of data solutions using the following data technologies: Spark and Spark Streaming, Presto, Parquet, MLflow, Kafka, and ETL tools such as Stitch or FiveTran;
• 4+ years of hands-on experience with the design, development, and maintenance of structured, semi-structured, and unstructured (NoSQL) data stores, such as MySQL, PostgreSQL, AWS Redshift, Teradata, graph databases such as Neo4j, and Databricks Lakehouse;
• 4+ years of hands-on development and operation of workflows and jobs using task orchestration engines such as Airflow, Argo, NextFlow, Dollar U, and Tidal;
• 4+ years of hands-on experience building and operating data solutions on operating systems such as Linux and Unix hosted in the Amazon Web Services (AWS) cloud;
• 5+ years of hands-on experience building and operating scalable infrastructure to support batch, micro-batch, and stream data processing for large volumes of data;
• 5+ years of hands-on experience designing and implementing enterprise data warehouse/lakehouse solutions to house business and technical datasets and derive KPI dimensions for consumption;
• Demonstrated experience with enterprise data modeling in the healthcare and/or life science sectors;
• Demonstrated experience with the development and operation of visualizations and dashboards for business KPI reporting using tools such as Tableau or Looker;
• Proficiency in Python and PySpark;
• Automation of data testing using scripting;
• Experience developing and managing technical and administrative controls for data governance and regulatory compliance in the healthcare and/or life sciences sectors;
• Experience mentoring and coaching junior data engineers; and
• Cross-functional project management experience.

Travel: No travel is required.

Reports to: VP, Engineering

$180,000 – $240,000 a year
