Machine Learning and Data Engineer (m/f/d)

  • Datalynx AG
  • Hybrid (Basel, Zurich)
  • 11/04/2024
Full time
Data Engineering, Machine Learning, Artificial Intelligence, Software Engineering, Data Warehouse

Job Description

For our client, a well-known pharmaceutical company in Basel, we are looking for a Machine Learning and Data Engineer (m/f/d).


This position is located in Data Products & Platforms, a chapter within the Data & Analytics function, which pushes the boundaries of drug discovery and development and enables pRED to achieve its goals.
The Machine Learning and Data Engineer will be responsible for the end-to-end development and deployment of a semantic search vector database that supports research and the scientific needs of pRED. The role requires a combination of skills in machine learning, data engineering, and software development.

General Information

  • Start date: 1.5.2024
  • Latest start date: 1.7.2024
  • Extension (if fixed-term): possible
  • Workplace: Basel, Zurich
  • Workload: 100%
  • Remote/Home Office: partially remote, partially in Basel
  • Travel: no

Tasks and Responsibilities

  • Integrate off-the-shelf open-source embedding models into the system to generate text embeddings from research publications and other text-based sources
  • Design and implement the data processing pipeline that converts PDF, XML, and other files into a format suitable for text embedding
  • Set up and maintain the vector database infrastructure, ensuring efficient storage and retrieval of embeddings
  • Develop and maintain the semantic search API, providing robust querying capabilities
  • Collaborate with stakeholders to gather requirements and ensure the system meets the needs of the organization
  • Conduct testing and quality assurance to ensure the reliability and accuracy of the search results
  • Document the system architecture, API usage, and operational procedures for future reference and maintenance

Must Haves

  • Strong programming skills, particularly in Python, and experience with machine learning libraries (e.g., TensorFlow, PyTorch)
  • At least 7 years of experience with data engineering tasks, including data extraction, transformation, and loading (ETL)
  • Familiarity with vector database and vector search technologies (e.g., FAISS, Milvus, Elasticsearch) and database indexing
  • Knowledge of API development and best practices for scalability and security
  • Ability to work independently, manage multiple priorities, and communicate effectively with both technical and non-technical stakeholders
  • Fluent English

Nice to Haves

  • Previous experience in the pharmaceutical industry


Jan Schmitz-Elsen
Talent Acquisition Consultant
079 425 10 45