The AdTech world was completely reinvented a few years ago with the birth of real-time auction technologies, known as Real-Time Bidding (RTB). These auctions make it possible to buy ad inventory impression by impression: for each visit of a user to a publisher's website, each advertiser can choose whether or not to display an ad and determine the maximum price it is willing to pay for this opportunity. Consequently, there is a growing need for automation and optimisation among the players connected to RTB, and many of their solutions rely on Machine Learning. The datasets involved are large (billions of lines per day) and evolve very quickly.
It is therefore challenging to retrain models every few hours so that production relies only on up-to-date data. Furthermore, these models need to be easy to improve through feature selection and hyperparameter tuning, which requires the ability to run both offline and online tests. In this talk, Cyrille Dubarry (Engineering Manager and Senior Data Scientist) and Han Ju (Senior Software Engineer) explain in more detail why Machine Learning plays a key role in the AdTech industry and how Spark is used at Teads to train production models, evaluate them through A/B tests, and design new models according to specific offline metrics. They also cover how Teads uses these Machine Learning models in real-time production servers. The main takeaways are the architecture and implementation choices (custom model training framework, model serving, job scheduling and deployment) that work at scale for these use cases.