SHRec: Scalable Holistic Recommendation


The problem of recommending items to users is of high practi- cal importance. For instance, many web services try to find rel- evant recommendations for the users, e.g., finding relevant movies, social-media friends, restaurants, shopping items, etc. The expan- sion of the Web and the ever-growing number of people who use web services render the problem of recommendation challenging. The Locality Sensitive Hashing (LSH, for short) is the most known scalable technique for nearest-neighbor search in high dimensional data, and hence the LSH is widely used in most industrial recom- mendation systems. This paper presents an implementation of the LSH using Googles MapReduce engine. We apply the LSH to a real case study at Google, where we recommend for each web-host a set of outlinks based on the outlink similarity amongst the web- hosts. We identify some performance limitations of the LSH that occur due to specific properties in the data, and that become signif- icant when the scale of the data is large. Furthermore, we present SHRec, a novel technique for scalable recommendation that ad- dresses these performance limitations. Based on real deployment of both SHRec and LSH on Google’s infrastructure, and using real data of the crawled Web at Google, where a sample host-level graph of 1.5 Billion web-hosts is extracted, we demonstrate that SHRec is more scalable than LSH. In particular, we show that SHRec is one order of magnitude faster than LSH while achieving better rec- ommendation quality.