The online display advertising market is growing rapidly and is becoming the most important distribution channel in terms of value. Among the advertising mechanisms on the Internet, Real-Time Bidding (RTB) is the most widely used. This method automates the buying and selling of adverti- sements between websites and advertisers through an auction mechanism. This allows for individual display of advertisement to visitors and thus a fine targeting, explaining the great popularity of this approach. The RTB mechanism is based on the auctioning of available ad slots on the web pages. These auctions are organized in a standardized way during the loading of the web page. Bidding algorithms are responsible for placing the bids in order to guarantee the advertisers the best possible revenue. In this document, we present our work on the study and improvement of real-time bidding ap- proaches carried out during the three years of this Ph.D. The RTB problem (on the advertiser side) consists in a constrained optimization problem : we want to develop an algorithm that maximizes the number of clicks obtained during the display advertising campaign under the constraint of a limited budget. This work led us to consider the problem through two main issues : the prediction of the clicks probability to obtain an estimate of the value of a given ad slot, and the optimization of the bidding campagn which, based on this estimate, should regulate the bids and the budget in order to maximize the number of clicks. Click probability prediction hence plays a crucial role in enabling the utility estimation of an impression. With only about one clicked ad per thousand impressions, this binary classification problem falls into the rare event prediction domain. The prediction of such events requires the use of specific models and evaluation functions. We thus present a study on the performances and biases of classical classification models and we explore ways of reducing these biases. To this end, we compare theperformance of three models : classical logistic regression, weighted logistic regression for rare events, and a reference deep learning model. We study these performances under several evaluation functions allowing us to show some biases induced by classical performance measures. We present a performance measure specific to RTB to correct these biases but also to give indications on the profitability of ad display campaigns in order to help decision making. Using this work on click probability prediction, we study the optimization of real-time bidding campaigns. We formulate this problem as a Markov decision process and develop several bidding strategies : the naive constant bidding strategy, the linear bidding strategy consisting in bidding proportionally to the click probability and its variant, the linear bidding with budget pacing. We also study a deep reinforcement learning strategy theoretically enabling to learn a dynamic bidding strategy, adapting to the conditions of the campaign continuously. We study the performance of these strategies on a benchmark dataset and show that despite its great popularity in the RTB research community, reinforcement learning does not bring significant improvement compared to other approaches, amongst others because of the convergence and stability issues of this type of approach, notably due to the formulation of the states of the Markovian decision process. We finally present a study on the convergence of reinforcement learning and state formulation learning using a game as a simplified simulation of RTB. We explore the use of autoencoders to learn a state formulation that would allow better convergence of reinforcement learning.
Soutenue par Slimane Makhlouf, au CNAM le 19-04-2023. Encadrée par Avner Bar-Hen et François-Xavier Jollois et financé par Velvet Consulting. Content d’avoir pu suivre sa thèse lors des comités de suivi et sa soutenance. Lien