Future-proofing analytics event ingestion into Teads’ data warehouse

Watch the video: Future-proofing analytics event ingestion into Teads’ data warehouse

At Teads we process several billion analytics events per hour. These events are first ingested through a streaming pipeline and then prepared and loaded into our data warehouse, where they are stored and aggregated to create meaningful customer reports and support business intelligence.

A very important step during ingestion is the extraction of data from the streams (Kafka topics) into the log tables of our data warehouse (BigQuery). Here we have to find the right balance between ingestion speed and transaction costs while maintaining our high requirements on data security. In this talk I would like to show how our batch-based approach has evolved over the last three years: from using a managed service like Dataflow inside a VPN, through introducing kConnect, to building our own framework based on Apache Spark.
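
To make the batch extraction step more concrete, here is a minimal, illustrative sketch in Spark (Scala): it reads a bounded offset range from a Kafka topic and appends it to a BigQuery log table via the open-source spark-bigquery-connector. The broker address, topic, bucket, and table names are placeholders rather than the actual Teads setup, and the framework presented in the talk covers much more (schema handling, security, cost control).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object KafkaToBigQueryBatch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-bigquery-batch")
      .getOrCreate()

    // Read one bounded batch from the Kafka topic: fixing the offset range
    // turns the stream into a finite, replayable unit of work.
    val events = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "kafka:9092") // placeholder brokers
      .option("subscribe", "analytics-events")         // placeholder topic
      .option("startingOffsets", "earliest")
      .option("endingOffsets", "latest")
      .load()
      .select(
        col("value").cast("string").as("raw_event"),
        col("timestamp").as("event_time")
      )

    // Append the batch to a BigQuery log table; the staging bucket and
    // target table are placeholders.
    events.write
      .format("bigquery")
      .option("temporaryGcsBucket", "my-staging-bucket")
      .mode("append")
      .save("my_dataset.analytics_log")

    spark.stop()
  }
}
```

Running such a job on a schedule trades some latency for predictable, batched load costs, which is exactly the balance discussed in the talk.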

Our speaker

Matthias Kunter
Senior Software Engineer @ Analytics