A Simple and Efficient Method to Handle Sparse Preference Data Using Domination Graphs: An Application to YouTube


The phenomenal growth of the number of videos on YouTube provides enormous potential for users to find content of interest to them. Unfortunately, as the size of the repository grows, the task of discovering high-quality content becomes more daunting. To address this, YouTube occasionally asks users for feedback on videos. In one such event (the YouTube Comedy Slam), users were asked to rate which of two videos was funnier. This yielded sparse pairwise data indicating a participant’s relative assessment of two videos. Given this data, several questions immediately arise: how do we make inferences for uncompared pairs, overcome noisy, and usually contradictory, data, and how do we handle severely skewed, real-world, sampling? To address these questions, we introduce the concept of a domination-graph, and demonstrate a simple and scalable method, based on the Adsorption algorithm, to efficiently propagate preferences through the graph. Before tackling the publicly available YouTube data, we extensively test our approach on synthetic data by attempting to recover an underlying, known, rank-order of videos using similarly created sparse preference data.