Data Engineer with ref. 76915-PHARM-JNB_1596049766

We are seeking two Data Engineers with differing levels of experience to join a fast-growing Biotech start-up.

Role:

You will design and build future-proof databases, large-scale processing systems and APIs in collaboration with Bioinformatics, Machine Learning and modelling experts, by developing, constructing, testing and maintaining data acquisition and dissemination methods.

Deciding on the best methods to acquire, curate, store and retrieve many primary and secondary data types, along with metadata pertaining to various data domains.

Analysing characteristics of data sets (-omics, imaging, structural) required by Bioinformatics, Machine Learning and Science team members, and using that understanding to discover and develop methods to make them available.

Developing and implementing optimal methods for regular extraction, curation, transformation, storage, retrieval and delivery of large and complex scientific datasets for Research and Product Development.

Recommending and implementing ways to improve data reliability, efficiency and quality through systems integration and the automation of acquisition and quality control/assurance processes.

Actively identifying patterns and anomalies in datasets using data surveillance tools as part of data performance reviews, and identifying methods to improve existing processing pipelines.

Requirements:

A Bachelor's degree (Computer Science, Mathematics or Statistics) followed by a minimum of five years' experience developing and working with a variety of databases and data sets

At least three years of in-depth experience, with demonstrated evidence of:

Analysing at least two types of data sets to understand their properties, and advising end-user teams on their value

Developing and optimising high-volume data pipelines, large datasets and big-data architectures

Successfully building processes for transforming data, creating unique data structures to suit end uses, ensuring sufficiency of metadata, and developing methods for automated delivery of data sets (software tools, APIs)

Building and using data stores in AWS

Big data tools and stream-processing systems: Hadoop, Spark, Kafka, Storm, Spark Streaming

Relational SQL and NoSQL databases, including Postgres and Cassandra.

Data pipeline and workflow management tools: Luigi, Airflow, etc.

AWS cloud services: EC2, S3, Glue, Athena, API Gateway, Redshift

Experience with object-oriented and scripting languages: Python, JavaScript, Scala, etc.

Designing and building APIs (RESTful, etc.)

Ontologies such as Gene Ontology, and ontological modelling tools and editors such as W3C Wiki, Basic Formal Ontology, etc.

Remote working possible
