Machine Learning Dataset #1:

predicting engagement in video ads

We are releasing a first dataset containing 3 million labeled lines (advertising auctions). This dataset can be freely used as a resource for Machine Learning courses.

It was originally created by Cyrille Dubarry and previously used as competition material for a Machine Learning class he gives at the École Polytechnique. We thought it would be nice to share it with a wider audience!

This dataset can be used to predict the time a user will spend watching a video ad. Each line is identified by its auction_id and describes one impression in a given context, with user, publisher, and advertiser information.


Columns description:

  • auction_id – unique id for identifying each line
  • timestamp – the timestamp (in seconds) of the ad impression
  • creative_duration – the total duration of the video that has been played
  • campaign_id – the advertising campaign id
  • advertiser_id – the advertiser id
  • placement_id – the id of a zone in the web page where the video was played
  • placement_language – the language of this zone
  • website_id – the corresponding website id
  • referer_deep_three – the URL of the page where the video was played, truncated at its 3rd level
  • ua_country – the country of the user who saw the video
  • ua_os – the user Operating System
  • ua_browser – the user internet browser
  • ua_browser_version – the user browser version
  • ua_device – the user device
  • user_average_seconds_played – the average duration the user watched video ads in the past. It can be null if the user never watched any ad.
  • seconds_played – the observed time the video has been watched. This is the quantity we are trying to predict.
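
As a starting point, the columns above suggest a simple baseline: predict seconds_played from user_average_seconds_played, and fall back to a global average when that field is null. The sketch below is only an illustration, not the method used in the original competition. The function names, the way nulls are encoded (empty string or a "null"-like token), and the choice of mean absolute error as the metric are all assumptions.

```python
from statistics import mean

# Assumption: the file may encode a null average in one of these ways.
NULL_TOKENS = {"", "null", "NULL", "nan", "NaN"}


def parse_optional_float(raw):
    """Return a float, or None when the field is missing or null-like."""
    if raw is None or raw.strip() in NULL_TOKENS:
        return None
    return float(raw)


def fit_fallback(train_rows):
    """Global mean of seconds_played, used for users with no viewing history."""
    return mean(float(row["seconds_played"]) for row in train_rows)


def predict(rows, fallback):
    """Predict the user's past average when known, else the fallback value."""
    predictions = []
    for row in rows:
        past = parse_optional_float(row.get("user_average_seconds_played"))
        predictions.append(fallback if past is None else past)
    return predictions


def mean_absolute_error(rows, predictions):
    """Average absolute gap between observed and predicted seconds_played."""
    return mean(
        abs(float(row["seconds_played"]) - p) for row, p in zip(rows, predictions)
    )
```

In practice the fallback would be fitted on a training split and the error measured on held-out lines, so that any richer model can be compared against this baseline on the same lines.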


CC0 1.0 Universal (CC0 1.0) – Public Domain Dedication

Get the dataset

File description:

  • dataset.csv.gz – a gzip-compressed CSV file containing 3 million labeled lines (147.16 MB)
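
The file does not need to be decompressed on disk before use. A minimal sketch of streaming it line by line with the Python standard library follows; it assumes a comma-delimited file with a header row and UTF-8 encoding, none of which is stated above, and iter_rows is a hypothetical helper name.

```python
import csv
import gzip
from itertools import islice


def iter_rows(path, limit=None):
    """Yield each line of a gzip-compressed CSV as a dict keyed by column name.

    Assumes a header row, comma delimiter, and UTF-8 text.
    Pass limit to stop after that many lines (None reads everything).
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle)
        yield from islice(reader, limit)
```

For a quick look at the data, something like `for row in iter_rows("dataset.csv.gz", limit=5): print(row["auction_id"], row["seconds_played"])` would print the first few lines without loading all 3 million into memory.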

Machine Learning at Teads

Digital Advertising is an astonishing Machine Learning playground: it combines data-rich activities, scaling challenges, and a lot of automation, especially since the rise of Programmatic buying and selling of ads in real time.

If you want to know more about our Machine Learning stack and use cases, you can read our blog articles on the subject and watch the talk Cyrille Dubarry and Han Ju gave at Spark Summit Europe 2018: Machine Learning for AdTech in action.

Our speaker(s)

Robert Dupuy
VP Engineering
Natural-born geek with an entrepreneurial spirit who loves to discuss anything tech-related or management-related, as well as neat cultural differences. My main focus has always been building dev and infra teams and leading them to scale the business in a highly automated way.
Cyrille Dubarry
Engineering Manager
After my PhD thesis in statistics, I joined the AdTech industry in 2012. My main focus today is to manage the Teads DSP datascience team in order to solve various business related Machine Learning challenges. This position leverages my experience with Hadoop and Spark acquired at Criteo and Teads, my mathematical background and my understanding of the AdTech world.
Alban Perillat-Merceroz
Engineering Manager in Tech Montpellier
I help Teads make sense of billions of events a day by building an Analytics platform capable of transforming and storing this data. Beyond the scale at which we operate, my main challenge is to serve meaningful insights to a multitude of different users in a timely manner. Before moving to Montpellier, I worked on search and payment engines at Viadeo in San Francisco and Paris.
Han Ju
Senior Software Engineer
I’ve worked on different components at Teads, including analytics and big data platforms. My main focus now is building production machine learning systems. Working alongside data scientists, I combine my knowledge of software engineering, distributed systems, and machine learning to help scale our ML pipelines and experimentation platforms.
Tristan Sallé
Senior Software Engineer
Software engineer with a preference for the back-end side. I’ve been working on the SSP at Teads since its beginning, and we have faced a fairly large number of technical challenges along the way. We grew a lot as a team, and every day I’m learning how to make this impactful service scale properly with our growing business. I’m also responsible for our Scala trainings, and I like to share what I have discovered during my journey.
Xavier Bucchiotty
Director of Engineering
Loïc Jaures
SVP Technology
As Teads’ co-founder, I’ve spent countless hours nursing and leading our Technology from 0 to $400M in revenues. Nowadays I’m using my entrepreneurial experience and industry knowledge to lead the data initiative at Teads.
Jean-Baptiste Pringuey
VP Engineering
Kévin Margueritte
Software Engineer
Full Stack Engineer and Scala addict working in the Performance feature team.
Benjamin Davy
Sustainability Director
Leading the Engineering Sustainability initiative at Teads, working on estimating the carbon footprint of our infrastructure.
Antoine Brechon
Engineering Manager - Infrastructure Team
Damien Pacaud
Former Infrastructure Director