Job Description

Let us shape the future together!

Your Role & Responsibilities

  • Design, build, and maintain scalable data pipelines that support the full data lifecycle, from collection and annotation to quality assurance, governance, and cloud storage.
  • Develop and optimize infrastructure that enables high-quality datasets to flow efficiently into machine learning training pipelines.
  • Contribute to the design and evolution of a cloud-based data platform, including ingestion pipelines, labeling workflows, dataset cataloging, versioning, and compliance.
  • Implement processes and tooling that improve data quality, reliability, and observability across the data lifecycle.
  • Work closely with research and engineering teams to support the creation and management of large-scale datasets for advanced machine learning models.
  • Support data collection and integration workflows from robotics and sensor-based systems, including environments built on ROS2 or similar frameworks.
  • Contribute to best practices for data engineering, documentation, testing, and reproducible data pipelines.

Required Technical & Professional Expertise

  • 5+ years of professional experience in data engineering, machine learning infrastructure, or related fields.
  • Strong experience designing and building large-scale data pipelines and distributed data processing systems.
  • Solid experience with cloud-based data platforms and storage systems (Azure preferred, or similar).
  • Experience working with datasets for machine learning systems, ideally including multimodal data.
  • Strong programming skills in Python and familiarity with modern data engineering tools and workflows.
  • Experience with high-volume data ingestion, data processing pipelines, and dataset versioning.
  • Familiarity with robotics data pipelines or ROS2-based ecosystems is beneficial but not required.
  • Strong problem-solving skills and the ability to collaborate effectively with cross-functional technical teams.