Let us shape the future together!
Your Role & Responsibilities
- Design, build, and maintain scalable data pipelines that support the full data lifecycle—from collection and annotation to quality assurance, governance, and cloud storage.
- Develop and optimize infrastructure that enables high-quality datasets to flow efficiently into machine learning training pipelines.
- Contribute to the design and evolution of a cloud-based data platform, including ingestion pipelines, labeling workflows, dataset cataloging, versioning, and compliance.
- Implement processes and tooling that improve data quality, reliability, and observability across the data lifecycle.
- Work closely with research and engineering teams to support the creation and management of large-scale datasets for advanced machine learning models.
- Support data collection and integration workflows from robotics and sensor-based systems, including environments built on ROS2 or similar frameworks.
- Contribute to best practices for data engineering, documentation, testing, and reproducible data pipelines.
Required Technical & Professional Expertise
- 5+ years of professional experience in data engineering, machine learning infrastructure, or related fields.
- Strong experience designing and building large-scale data pipelines and distributed data processing systems.
- Solid experience with cloud-based data platforms and storage systems (Azure or a comparable cloud platform preferred).
- Experience working with datasets for machine learning systems, ideally including multimodal data.
- Strong programming skills in Python and familiarity with modern data engineering tools and workflows.
- Experience working with high-volume data ingestion, data processing pipelines, and dataset versioning.
- Familiarity with robotics data pipelines or ROS2-based ecosystems is beneficial but not required.
- Strong problem-solving skills and the ability to collaborate effectively with cross-functional technical teams.