We are looking for a Data Engineer to join our growing team of Data Scientists.
Caris Life Sciences is looking for a sharp, driven, and goal-oriented Data Engineer to expand and optimize our data storage and data pipeline architecture, and to optimize data flow and collection for cross-functional teams. The successful candidate will design, implement, and maintain data storage and data flow solutions for structured and unstructured multi-model data in support of data science and machine learning pipelines. The ideal candidate is an experienced data pipeline builder and data wrangler who enjoys optimizing data systems and building them from the ground up. They must be self-directed and comfortable supporting the data needs of multiple teams, systems, and products. They must also know and adhere to best practices and be familiar with current state-of-the-art data platforms and solutions.
Responsibilities:
- Create and maintain optimal data pipeline architecture
- Assemble large, complex data sets that meet functional and non-functional business requirements
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Create data tools that help analysts and data scientists build and optimize data science products
- Build processes supporting data transformation, data structures, metadata, dependency and workload management
Qualifications:
- Master's degree in Computer Science, Engineering, or a related field
- A successful history of manipulating, processing, and extracting value from large, disconnected datasets
- Experience building and optimizing big data pipelines, architectures, and data sets
- Experience designing, implementing, and maintaining data storage and data flow solutions for structured and unstructured multi-model data
- Strong analytical skills related to working with unstructured datasets
- Experience with relational (SQL) databases and NoSQL databases such as MongoDB and Cassandra
- Experience with data pipeline and workflow management tools such as Luigi and Airflow, including the ability to build end-to-end pipelines (data transformation, data cleaning, ingestion into the database, and verification of data cleaning and data integrity)
- Experience with object-oriented and functional programming languages such as Python, Java, and Scala
- Proficiency in Linux, Python, Pandas, PySpark, Dask, etc.
- Experience writing RESTful APIs
- Experience with big data tools: Hadoop, Spark, Kafka, etc.
- Experience with dashboard tools, e.g., Tableau, Looker, Dash
- Experience with Flask
- Exposure to cancer biology
- Experience with microservices architecture
- Data science or machine learning experience
Core Skills & Competencies:
- Proficient verbal and written communication skills to explain complex technical details in clear language
- Commitment to the successful achievement of team and organizational goals through a desire to participate with and help other members of the team
- Demonstrated focus on listening to and understanding user needs, and on delighting customers by exceeding service and quality expectations
- Highly self-motivated, self-directed, and attentive to detail
- Must be able to sit, stand, and/or work at a computer for long periods of time
This job description reflects management’s assignment of essential functions. Nothing in this job description restricts management’s right to assign or reassign duties and responsibilities to this job at any time.
Caris Life Sciences is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, religion, color, national origin, gender, gender identity, sexual orientation, age, status as a protected veteran, or status as a qualified individual with disability, among other things.
*Interested parties, please email your resume and cover letter to Valarie Perez at firstname.lastname@example.org.