We are releasing a first dataset containing 3 million labeled lines (advertising auctions). This dataset can be freely used as a resource for Machine Learning courses.
It was originally created by Cyrille Dubarry and previously used as competition material for a Machine Learning class he teaches at École Polytechnique. We thought it would be nice to share it with a wider audience!
This dataset can be used to predict how long a user will spend watching a video ad. Each line is identified by its auction_id and describes one ad impression in a given context, with user, publisher, and advertiser information.
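Because `seconds_played` is a continuous quantity, this is a regression problem, and it helps to have a trivial baseline to compare models against. The sketch below assumes each line has already been parsed into a dict keyed by the column names listed below, with a missing `user_average_seconds_played` stored as `None`. Falling back to the global mean and scoring with mean absolute error (MAE) are our illustrative choices, not the metric used in the original class competition.

```python
from statistics import mean
from typing import Optional


def fit_global_mean(rows: list[dict]) -> float:
    """Average observed seconds_played over the training rows."""
    return mean(row["seconds_played"] for row in rows)


def predict(row: dict, global_mean: float) -> float:
    """Predict the user's past average, or the global mean when it is null."""
    history: Optional[float] = row.get("user_average_seconds_played")
    return history if history is not None else global_mean


def mean_absolute_error(rows: list[dict], global_mean: float) -> float:
    """MAE between the baseline predictions and the observed seconds_played."""
    return mean(abs(predict(row, global_mean) - row["seconds_played"]) for row in rows)


# Toy rows for illustration only; they are not taken from the dataset.
train = [
    {"user_average_seconds_played": 12.0, "seconds_played": 10.0},
    {"user_average_seconds_played": None, "seconds_played": 20.0},
]
global_mean = fit_global_mean(train)
print(global_mean, mean_absolute_error(train, global_mean))  # 15.0 3.5
```

Any learned model should beat this baseline on held-out rows before its extra complexity is worth keeping.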
Content
Column descriptions:
- auction_id – unique id for identifying each line
- timestamp – the timestamp (in seconds) of the ad impression
- creative_duration – the total duration of the video ad (the creative) that was played
- campaign_id – the advertising campaign id
- advertiser_id – the advertiser id
- placement_id – the id of the zone on the web page where the video was played
- placement_language – the language of this zone
- website_id – the corresponding website id
- referer_deep_three – the URL of the page where the video was played, truncated at its 3rd level
- ua_country – the country of the user who saw the video
- ua_os – the user's operating system
- ua_browser – the user's web browser
- ua_browser_version – the user's browser version
- ua_device – the user's device
- user_average_seconds_played – the average time the user spent watching video ads in the past; null if the user has never watched any ad.
- seconds_played – the observed time the video was watched. This is the quantity to predict.
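The column list above maps naturally onto typed values. The sketch below converts one CSV row (a dict of strings, such as `csv.DictReader` yields) into typed fields. We have not checked how nulls are spelled in the file, so the `NULL_TOKENS` set is an assumption to adjust after inspecting the data. IDs and categorical columns are deliberately left as strings.

```python
from typing import Optional

# Assumption: the exact null encoding in dataset.csv has not been verified.
NULL_TOKENS = {"", "null", "NULL", "NA", "NaN"}

# Numeric columns from the description above; all other columns stay strings.
FLOAT_COLUMNS = ("timestamp", "creative_duration", "seconds_played")


def parse_optional_float(value: str) -> Optional[float]:
    """Return None for a null token, otherwise the value as a float."""
    return None if value.strip() in NULL_TOKENS else float(value)


def parse_row(raw: dict[str, str]) -> dict:
    """Convert one CSV row of strings into typed values.

    IDs and categorical columns (ua_country, ua_os, ...) are kept as strings;
    user_average_seconds_played becomes None when the user has no history.
    """
    row: dict = dict(raw)
    for column in FLOAT_COLUMNS:
        row[column] = float(raw[column])
    row["user_average_seconds_played"] = parse_optional_float(
        raw["user_average_seconds_played"]
    )
    return row


# Hypothetical row for illustration; values are not from the dataset.
example = {
    "auction_id": "a1",
    "timestamp": "1500000000",
    "creative_duration": "30",
    "ua_country": "FR",
    "user_average_seconds_played": "",
    "seconds_played": "12.5",
}
print(parse_row(example)["user_average_seconds_played"])  # None
```

Keeping IDs as strings avoids silently treating them as ordered numbers; encode them explicitly (one-hot, target encoding, hashing) when building features.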
License
CC0 1.0 Universal (CC0 1.0) – Public Domain Dedication
Get the dataset
File description:
- dataset.csv.gz – a gzip-compressed .csv file containing 3 million labeled lines (147.16 MB)
Machine Learning at Teads
Digital advertising is an astonishing Machine Learning playground: it combines data-rich activities, scaling challenges and a lot of automation, especially since the rise of programmatic, real-time buying and selling of ads.
If you want to know more about our Machine Learning stack and use cases, you can have a look at our blog articles on the subject and watch the talk Cyrille Dubarry and Han Ju gave at Spark Summit Europe 2018, "Machine Learning for AdTech in action".