Cloud data platforms such as Google BigQuery and Snowflake are very successful today. They promise to combine the advantages of data lakes (large volumes of unstructured or semi-structured data), the power of SQL databases, and the simplicity of managed services.
In this internship, you will experiment with these platforms and competing technologies to understand how they differ, and you will provide selection criteria and architecture recommendations.
In this role
The objectives of this internship include:
- Create a big data SQL benchmark based on public datasets (e.g., GDELT, CitiBike)
- Experiment with and measure the performance of SQL queries on Snowflake, BigQuery, and Presto/Trino
- Compare them with competing offerings such as data lake engines (e.g., Dremio, Databricks’ SQL Analytics) or SQL-on-Hadoop engines (e.g., Impala, Spark SQL)
- Provide an assessment of the strengths and weaknesses of each solution
- Issue recommendations for target use cases
What we offer
Join our team as an intern and you will find a young, dynamic, and culturally diverse working environment.
About your profile
- Strong interest in SQL and query engines
- Knowledge of the fundamentals of data management and system architecture
- Knowledge of cloud service deployments (AWS or GCP)