The e-commerce (E-Com) industry is growing fast, with projected global sales of 4.8 trillion USD in 2021. Virtually every E-Com platform leverages machine learning (ML) techniques to increase its users’ satisfaction and optimize business value for the company. Optimal ranking of search results plays a vital role in achieving these goals. This problem is referred to as learning to rank (LTR) in the ML literature.
Why Opt for Deep Learning?
We are interested in knowing the “good” products for a search query entered by the user. Traditional ML algorithms require hand-crafted features of the product and of the product-query pair to learn the optimal ranking. The algorithm is actually learning the “importance” of these features. The rank (position on the search result page) of the product is obtained by weighting these features according to their learned “importance”. However, the need for such hand-crafted features greatly reduces the applicability of these ML algorithms in commercial settings, because of the large number of diverse products.
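The weighting idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the feature names, the weight values and the product ids are all hypothetical, and a real LTR model would learn the weights from data rather than have them written down by hand.

```python
from typing import Dict, List

# Hypothetical learned "importance" weights for three hand-crafted features.
WEIGHTS: Dict[str, float] = {
    "title_match": 2.0,
    "price_competitiveness": 0.5,
    "past_order_rate": 1.5,
}


def score(features: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of feature values; features without a weight count as 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rank(candidates: Dict[str, Dict[str, float]]) -> List[str]:
    """Return product ids ordered from highest to lowest score."""
    return sorted(candidates, key=lambda pid: score(candidates[pid]), reverse=True)


# Hypothetical candidates for one query, with made-up feature values.
candidates = {
    "product_a": {"title_match": 1.0, "price_competitiveness": 0.2, "past_order_rate": 0.1},
    "product_b": {"title_match": 0.5, "price_competitiveness": 0.9, "past_order_rate": 0.4},
    "product_c": {"title_match": 0.0, "price_competitiveness": 1.0, "past_order_rate": 0.2},
}
print(rank(candidates))
```

Every feature in `WEIGHTS` has to be designed and computed by someone for every product, which is exactly the effort that becomes impractical for a large and diverse catalogue.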
Deep learning (DL) is a branch of machine learning which does not require such hand-crafted features for learning. It offers an opportunity to automatically learn the relevant features from raw data and has also inspired research in learning to rank. But DL needs large-scale data for training a model. Fortunately, such data is readily available on almost every E-Com platform in the form of search, click, add-to-basket (AtB) click and order logs. As log data is abundant and cheap, it is promising to devise learning algorithms that can learn effective ranking models from it. Learning from log data also avoids intrusive interactions on the live E-Com platform. This is highly desirable in practice because it does not risk degrading the users’ experience.
The Challenge of Getting Relevance Judgments
One significant challenge in deploying LTR algorithms is that they need information about which products are “good” or “relevant” to the search query. We refer to this information as relevance judgments. Relevance judgments are typically provided by experts or obtained through crowdsourcing. Several studies have shown that crowdsourcing is not a reliable technique for getting relevance judgments on the products of an E-Com platform. The reason is that the users of E-Com platforms have a very complex utility function, and their criteria of relevance may depend on the product’s value for money, brand, warranty etc. Crowdsourced workers fail to capture all these aspects of relevance. Moreover, it is prohibitively expensive to ask experts to provide relevance judgments for millions of products.
This has created a gap in the application of DL research for improving E-Com search quality.
The Proposed Solution
In this work, we aim to bridge this gap and improve product search under the practical constraints of a commercial setting. Some recent works have argued for utilizing the log data to obtain relevance judgments. They aggregate query-product pairs from the logs and calculate a user feedback rate (e.g. order rate) for these pairs. Relevance judgments are then derived from this rate. Such a method of getting relevance judgments ignores the fact that log data comes in the form of so-called contextual-bandit feedback. This means that we have access to only those feedback signals which were generated in response to a limited set of actions taken by the ranking algorithm deployed on the E-Com platform. For instance, we do not know how the user would have responded to the search results if another set of products had been shown. That is why the traditional supervised learning approach, where information about all possible actions is assumed, is not well-suited for learning from log data. We refer to the traditional approach as the full-information (Full-Info) approach and to its loss (such as cross-entropy and hinge) as Full-Info loss. This partial-information (contextual-bandit feedback) challenge requires us to devise more efficient ways of utilizing the information contained in the log data.
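To make the aggregation approach and its blind spot concrete, here is a minimal sketch. The log layout (query, product shown, whether it was ordered) and all values are assumptions made for illustration; real platform logs will look different.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed log layout: (query, product shown, was the product ordered?)
LogEntry = Tuple[str, str, bool]


def order_rates(log: Iterable[LogEntry]) -> Dict[Tuple[str, str], float]:
    """Aggregate the log into an order rate per (query, product) pair."""
    shown = defaultdict(int)
    ordered = defaultdict(int)
    for query, product, was_ordered in log:
        shown[(query, product)] += 1
        ordered[(query, product)] += int(was_ordered)
    return {pair: ordered[pair] / shown[pair] for pair in shown}


# Made-up log: the deployed ranker only ever showed p1 and p2 for "drill".
log = [
    ("drill", "p1", True),
    ("drill", "p1", False),
    ("drill", "p2", False),
]
rates = order_rates(log)
print(rates)
print(("drill", "p3") in rates)  # p3 was never shown, so there is no signal for it
```

The aggregated rates only cover pairs that the deployed ranking algorithm chose to display. A product that was never shown has no entry at all, which is the contextual-bandit limitation described above.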
To address this challenge effectively, the learning problem should be reformulated. We therefore advocate employing a counterfactual risk minimization (CRM) approach and adapting the learning to rank algorithm to learn directly from the log data. The CRM loss requires knowledge of the current ranking algorithm, the actions taken by the current ranking algorithm and the users’ feedback on these actions (e.g. AtB clicks, orders etc.). All this information is contained in the log data.
We propose that the CRM approach is better suited for learning from such logged contextual-bandit feedback, as it does not require full information about all actions and their rewards. Moreover, it also circumvents the need to aggregate the log data and generate relevance judgments.
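The three ingredients named above (the current ranking algorithm, its logged actions and the users’ feedback) can be combined as in the following minimal sketch. It is not the loss used in this work. It assumes a clipped inverse-propensity-scoring estimate with an optional variance penalty, a form that is common in the CRM literature. The function name, the parameters `clip` and `var_weight`, the convention of encoding feedback as a loss (for example -1 for an order, 0 otherwise) and all numbers are hypothetical.

```python
import math
from typing import Sequence


def crm_objective(
    losses: Sequence[float],         # user feedback per logged action, as a loss
    logging_probs: Sequence[float],  # probability the current ranker gave to that action
    new_probs: Sequence[float],      # probability the new model gives to the same action
    clip: float = 10.0,              # assumed cap on the importance weights
    var_weight: float = 0.0,         # assumed strength of the variance penalty
) -> float:
    """Clipped importance-weighted risk estimate plus an optional variance penalty."""
    n = len(losses)
    if n == 0 or not (n == len(logging_probs) == len(new_probs)):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(p <= 0.0 for p in logging_probs):
        raise ValueError("logging probabilities must be positive")

    # Re-weight each logged loss by how much more (or less) likely the new
    # model is to take the logged action than the current ranker was.
    weighted = [
        loss * min(clip, p_new / p_log)
        for loss, p_log, p_new in zip(losses, logging_probs, new_probs)
    ]
    mean = sum(weighted) / n
    variance = sum((w - mean) ** 2 for w in weighted) / (n - 1) if n > 1 else 0.0
    return mean + var_weight * math.sqrt(variance / n)


# Made-up example: three logged impressions, two of which led to an order.
losses = [-1.0, 0.0, -1.0]
logging_probs = [0.5, 0.5, 0.25]
new_probs = [0.5, 0.25, 0.5]
print(crm_objective(losses, logging_probs, new_probs))
```

Note that the estimate only uses actions that were actually logged, together with the probability the deployed ranker assigned to them. No aggregated relevance labels and no feedback for unseen actions are needed.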
Who writes here?
“Mercateo has tons of data records and I can leverage this data to come up with novel solutions for solving real-world business problems.” Muhammad Umer Anwaar works as a researcher at Mercateo. He is passionate about machine learning. Umer is doing his PhD at the Technical University of Munich.
Muhammad Umer Anwaar