Ed H. Chi

Ed H. Chi is a Distinguished Scientist at Google, leading several machine learning research teams in the Google Brain team that focus on neural modeling, reinforcement learning, dialog modeling, reliable/robust machine learning, and recommendation systems. His team has delivered significant improvements for YouTube, News, Ads, and the Google Play Store, with >420 product improvements since 2013. With 39 patents and >150 research articles, he is also known for research on user behavior in web and social media.
Prior to Google, he was the Area Manager and a Principal Scientist at Palo Alto Research Center's Augmented Social Cognition Group, where he led the team in understanding how social systems help groups of people to remember, think and reason. Ed completed his three degrees (B.S., M.S., and Ph.D.) in 6.5 years at the University of Minnesota. Recognized as an ACM Distinguished Scientist and elected into the CHI Academy, he recently received a 20-year Test of Time award for research in information visualization. He has been featured and quoted in the press, including the Economist, Time Magazine, LA Times, and the Associated Press. An avid swimmer, photographer and snowboarder in his spare time, he also has a black belt in Taekwondo. See Ed's personal website.
Authored Publications
    Preview abstract Learning high-quality feature embeddings efficiently and effectively is critical for the performance of web-scale machine learning systems. A typical model ingests hundreds of features with vocabularies on the order of millions to billions of tokens. The standard approach is to represent each feature value as a d-dimensional embedding, which introduces hundreds of billions of parameters for extremely high-cardinality features. This bottleneck has led to substantial progress in alternative embedding algorithms. Many of these methods, however, make the assumption that each feature uses an independent embedding table. This work introduces a simple yet highly effective framework, Feature Multiplexing, where one single representation space is used for many different categorical features. Our theoretical and empirical analysis reveals that multiplexed embeddings can be decomposed into components from each constituent feature, allowing models to distinguish between features. We show that multiplexed representations give Pareto-optimal space-accuracy tradeoffs for three public benchmark datasets. Further, we propose a highly practical approach called Unified Embedding with three major benefits: simplified feature configuration, strong adaptation to dynamic data distributions, and compatibility with modern hardware. Unified embedding gives significant improvements in offline and online metrics compared to highly competitive baselines across five web-scale search, ads, and recommender systems, where it serves billions of users across the world in industry-leading products. View details
    Preview abstract Recommender systems play an important role in YouTube, one of the largest online video platforms across the world. In this paper, we focus on a real-world multitask ranking model for YouTube recommendations. While most of the recommendation research is dedicated to designing better models to improve user engagement and satisfaction, we found that research on stabilizing the training for such models is severely under-explored. As the recommendation models become larger and more sophisticated, they are more vulnerable to training instability issues, i.e., the loss diverges (instead of converging), which can make the model unusable, wasting significant resources and blocking model iterations. In this paper, we share our understanding and best practices we learned for improving the training stability of a multitask ranking model used in production. We show some properties of the model that lead to unstable training and speculate on the cause. Furthermore, we propose an effective solution to improve training stability based on our observations of training dynamics when model training starts to become unstable. Our experiments on a proprietary dataset show the effectiveness of the proposed method over several commonly used baseline methods. View details
    Preview abstract Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the greedy one, and then selects the most consistent answer by marginalizing out the sampled reasoning paths. Self-consistency leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking leading to its unique correct answer. Our extensive empirical evaluation shows that self-consistency boosts the performance of chain-of-thought prompting with a striking margin on a range of popular arithmetic and commonsense reasoning benchmarks, including GSM8K (+17.9%), SVAMP (+11.0%), AQuA (+12.2%), StrategyQA (+6.4%) and ARC-challenge (+3.9%). View details
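The selection step described in the abstract above can be illustrated with a minimal sketch. It assumes the reasoning paths have already been sampled from a language model and that each path's final answer has been parsed into a string; the function name `self_consistent_answer` and the sample values are hypothetical, and a plain majority vote stands in for marginalizing over the sampled paths.

```python
from collections import Counter


def self_consistent_answer(sampled_answers):
    """Return the most frequent final answer among sampled reasoning paths."""
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(sampled_answers)
    # most_common(1) returns [(answer, count)]; equal counts keep first-seen order.
    answer, _ = counts.most_common(1)[0]
    return answer


# Hypothetical final answers parsed from five sampled chains of thought.
samples = ["18", "26", "18", "18", "26"]
print(self_consistent_answer(samples))
```

Sampling and answer parsing are deliberately left out here, since they depend on the model and prompt format in use.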
    Emergent abilities of large language models
    Barret Zoph
    Colin Raffel
    Dani Yogatama
    Jason Wei
    Liam B. Fedus
    Maarten Paul Bosma
    Percy Liang
    Sebastian Borgeaud
    Tatsunori B. Hashimoto
    Yi Tay
    TMLR (2022)
    Preview abstract Scaling up language models has been shown to predictably confer a range of benefits such as improved performance and sample efficiency. This paper discusses an unpredictable phenomenon that we call emergent abilities of large language models. Such emergent abilities have close to random performance until evaluated on a model of sufficiently large scale, and hence their emergence cannot be predicted by extrapolating a scaling law based on small-scale models. The emergence of such abilities suggests that additional scaling could further expand the range of tasks that language models can perform. We discuss the implications of these phenomena and suggest directions for future research. View details
    Can Small Heads Help? Understanding and Improving Multi-Task Generalization
    Christopher Fifty
    Dong Lin
    Li Wei
    Lichan Hong
    Yuyan Wang
    the WebConf 2022 (2022)
    Preview abstract A goal for multi-task learning from a multi-objective optimization perspective is to find the Pareto solutions that are not dominated by others. In this paper, we provide some insights on understanding the trade-off between Pareto efficiency and generalization, as a result of parameterization in deep learning: as a multi-objective optimization problem, enough parameterization is needed for handling task conflicts in a constrained solution space; however, from a multi-task generalization perspective, over-parameterization undermines the benefit of learning a shared representation which helps harder tasks or tasks with limited training examples. A delicate balance between multi-task generalization and multi-objective optimization is therefore needed for finding a better trade-off between efficiency and generalization. To this end, we propose a method of under-parameterized self-auxiliaries for multi-task models to achieve the best of both worlds. It is model-agnostic, task-agnostic and works with other multi-task learning algorithms. Empirical results show our method improves Pareto efficiency over existing popular algorithms on several multi-task applications. View details
    Surrogate for Long-Term User Experience in Recommender Systems
    Can Xu
    Lisa Mijung Chung
    Mohit Sharma
    Qian Sun
    Sriraj Badam
    Yuyan Wang
    KDD 2022 (2022)
    Preview abstract Over the years we have seen recommender systems shifting focus from optimizing short-term engagement toward improving long-term user experience on the platforms. While defining good long-term user experience is still an active research area, we focus on one specific aspect of improved long-term user experience here, which is users revisiting the platform. These long-term outcomes, however, are much harder to optimize due to the sparsity in observing these events and the low signal-to-noise ratio (weak connection) between these long-term outcomes and a single recommendation. To address these challenges, we propose to establish the association between these long-term outcomes and a set of more immediate-term user behavior signals that can serve as surrogates for optimization. To this end, we conduct a large-scale study of user behavior logs on one of the largest industrial recommendation platforms serving billions of users. We study a broad set of sequential user behavior patterns and standardize a procedure to pinpoint the subset that has strong predictive power of the change in users' long-term visiting frequency. Specifically, they are predictive of users' increased visiting to the platform in 5 months among the group of users with the same visiting frequency to begin with. We validate the identified subset of user behaviors by incorporating them as reward surrogates for long-term user experience in a reinforcement learning (RL) based recommender. Results from multiple live experiments on the industrial recommendation platform demonstrate the effectiveness of the proposed set of surrogates in improving long-term user experience. View details
    LaMDA: Language Models for Dialog Applications
    Aaron Daniel Cohen
    Alena Butryna
    Alicia Jin
    Apoorv Kulshreshtha
    Ben Zevenbergen
    Chung-ching Chang
    Cosmo Du
    Daniel De Freitas Adiwardana
    Dehao Chen
    Dmitry (Dima) Lepikhin
    Erin Hoffman-John
    Igor Krivokon
    James Qin
    Jamie Hall
    Joe Fenton
    Johnny Soraker
    Maarten Paul Bosma
    Marc Joseph Pickett
    Marcelo Amorim Menegali
    Marian Croak
    Maxim Krikun
    Noam Shazeer
    Rachel Bernstein
    Ravi Rajakumar
    Ray Kurzweil
    Romal Thoppilan
    Steven Zheng
    Taylor Bos
    Toju Duke
    Tulsee Doshi
    Vincent Y. Zhao
    Will Rusch
    Yuanzhong Xu
    arXiv (2022)
    Preview abstract We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of Transformer-based neural language models specialized for dialog, which have up to 137B parameters and are pre-trained on 1.56T words of public dialog data and web text. While model scaling alone can improve quality, it shows less improvements on safety and factual grounding. We demonstrate that fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements towards the two key challenges of safety and factual grounding. The first challenge, safety, involves ensuring that the model’s responses are consistent with a set of human values, such as preventing harmful suggestions and unfair bias. We quantify safety using a metric based on an illustrative set of values, and we find that filtering candidate responses using a LaMDA classifier fine-tuned with a small amount of crowdworker-annotated data offers a promising approach to improving model safety. The second challenge, factual grounding, involves enabling the model to consult external knowledge sources, such as an information retrieval system, a language translator, and a calculator. We quantify factuality using a groundedness metric, and we find that our approach enables the model to generate responses grounded in known sources, rather than responses that merely sound plausible. Finally, we explore the use of LaMDA in the domains of education and content recommendations, and analyze their helpfulness and role consistency. View details
    Learning to Augment for Casual User Recommendation
    Elaine Le
    Jianling Wang
    Yuyan Wang
    The ACM Web Conference 2022 (2022)
    Preview abstract Users who come to recommendation platforms are heterogeneous in activity levels. There usually exists a group of core users who visit the platform regularly and consume a large body of contents upon each visit, while others are casual users who tend to visit the platform occasionally and consume less each time. As a result, consumption activities from core users often dominate the training data used for learning. As core users can exhibit different activity patterns from casual users, recommender systems trained on historical user activity data usually achieve much worse performance on casual users than core users. To bridge the gap, we propose a model-agnostic framework L2Aug to improve recommendations for casual users through data augmentation, without sacrificing core user experience. L2Aug is powered by a data augmentor that learns to generate augmented interaction sequences, in order to fine-tune and optimize the performance of the recommendation system for casual users. On four real-world public datasets, the proposed L2Aug outperforms other treatment methods and achieves the best sequential recommendation performance for both casual and core users. We also test L2Aug in an online simulation environment with real-time feedback to further validate its efficacy, and showcase its flexibility in supporting different augmentation actions. View details
    Preview abstract Prompt-tuning is becoming a new paradigm for finetuning pre-trained language models in a parameter-efficient way. Here, we explore the use of HyperNetworks to generate prompts. We propose a novel architecture of HyperPrompt: prompt-based task-conditioned parameterization of self-attention in Transformers. We show that HyperPrompt is very competitive against strong multi-task learning baselines with only 1% of additional task-conditioning parameters. The prompts are end-to-end learnable via generation by a HyperNetwork. The additional parameters scale sub-linearly with the number of downstream tasks, which makes it very parameter efficient for multi-task learning. HyperPrompt allows the network to learn task-specific feature maps where the prompts serve as task global memories. Information sharing is enabled among tasks through the HyperNetwork to alleviate task conflicts during co-training. Through extensive empirical experiments, we demonstrate that HyperPrompt can achieve superior performances over strong T5 multi-task learning baselines and parameter-efficient adapter variants including Prompt-Tuning on Natural Language Understanding benchmarks of GLUE and SuperGLUE across all the model sizes explored. View details
    Preview abstract Most literature in fairness has focused on improving fairness with respect to one single model or one single objective. However, real-world machine learning systems are usually composed of many different components. Unfortunately, recent research has shown that even if each component is "fair", the overall system can still be "unfair". In this paper, we focus on how well fairness composes over multiple components in real systems. We consider two recently proposed fairness metrics for rankings: exposure and pairwise ranking accuracy gap. We provide theory that demonstrates a set of conditions under which fairness of individual models does compose. We then present an analytical framework for both understanding whether a system's signals can achieve compositional fairness, and diagnosing which of these signals lowers the overall system's end-to-end fairness the most. Despite previously bleak theoretical results, on multiple data-sets -- including a large-scale real-world recommender system -- we find that the overall system's end-to-end fairness is largely achievable by improving fairness in individual components. View details
    Values of Exploration in Recommender Systems
    Can Xu
    Elaine Le
    Mohit Sharma
    Su-Lin Wu
    Yuyan Wang
    RecSys (2021)
    Preview abstract Reinforcement Learning (RL) has been sought after to bring next-generation recommender systems to improve user experience on recommendation platforms. While the exploration-exploitation tradeoff is the foundation of RL research, the value of exploration in RL-based recommender systems is less well understood. Exploration, commonly seen as a tool to reduce model uncertainty in regions with sparse user interaction/feedback, is believed to cost user experience in the short term while the indirect benefit of better model quality arrives at a later time. We on the other hand argue that recommender systems have inherent needs for exploration and exploration can improve user experience even in the more imminent term. We focus on understanding the role of exploration in changing different facets of recommendation quality that more directly impact user experience. To do that, we introduce a series of methods inspired by exploration research to increase exploration in an RL-based recommender system, and study their effect on the end recommendation quality, more specifically, accuracy, diversity, novelty and serendipity. We propose a set of metrics to measure RL-based recommender systems in these four aspects and evaluate the impact of exploration-inducing methods against these metrics. In addition to the offline measurements, we conduct live experiments on an industrial recommendation platform serving billions of users to showcase the benefit of exploration. Moreover, we use user conversion as an indicator of the holistic long-term user experience and study the values of exploration in helping platforms convert users. Connecting the offline analyses and live experiments, we start building the connections between these four facets of recommendation quality toward long-term user experience and identify serendipity as a desirable recommendation quality that changes user states and improves long-term user experience. View details
    Preview abstract Neural networks lack adversarial robustness, i.e., they are vulnerable to adversarial examples that through small perturbations to inputs cause incorrect predictions. Further, trust is undermined when models give miscalibrated predictions, i.e., the predicted probability is not a good indicator of how much we should trust our model. In this paper, we study the connection between adversarial robustness and calibration and find that the inputs for which the model is sensitive to small perturbations (are easily attacked) are more likely to have poorly calibrated predictions. Based on this insight, we examine if calibration can be improved by addressing those adversarially unrobust inputs. To this end, we propose Adversarial Robustness based Adaptive Label Smoothing (AR-AdaLS) that integrates the correlations of adversarial robustness and calibration into training by adaptively softening labels for an example based on how easily it can be attacked by an adversary. We find that our method, taking the adversarial robustness of the in-distribution data into consideration, leads to better calibration over the model even under distributional shifts. In addition, AR-AdaLS can also be applied to an ensemble model to further improve model calibration. View details
    Reward Shaping for User Satisfaction in a REINFORCE Recommender
    Can Xu
    Sriraj Badam
    Trevor Potter
    Daniel Li
    Hao Wan
    Elaine Le
    Chris Berg
    Eric Bencomo Dixon
    (2021)
    Preview abstract How might we design Reinforcement Learning (RL)-based recommenders that encourage aligning user trajectories with the underlying user satisfaction? Three research questions are key: (1) measuring user satisfaction, (2) combatting sparsity of satisfaction signals, and (3) adapting the training of the recommender agent to maximize satisfaction. For measurement, it has been found that surveys explicitly asking users to rate their experience with consumed items can provide valuable orthogonal information to the engagement/interaction data, acting as a proxy to the underlying user satisfaction. For sparsity, i.e., only being able to observe how satisfied users are with a tiny fraction of user-item interactions, imputation models can be useful in predicting satisfaction level for all items users have consumed. For learning satisfying recommender policies, we postulate that reward shaping in RL recommender agents is powerful for driving satisfying user experiences. Putting everything together, we propose to jointly learn a policy network and a satisfaction imputation network: the role of the imputation network is to learn which actions are satisfying to the user, while the policy network, built on top of REINFORCE, decides which items to recommend, with the reward utilizing the imputed satisfaction. We use both offline analysis and live experiments in an industrial large-scale recommendation platform to demonstrate the promise of our approach for satisfying user experiences. View details
    DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning
    Maheswaran Sathiamoorthy
    Yihua Chen
    Rahul Mazumder
    Lichan Hong
    35th Conference on Neural Information Processing Systems (NeurIPS 2021) (2021)
    Preview abstract Developing robust NLP models that perform well on many, even small, slices of data is a significant but important challenge, with implications from fairness to general reliability. To this end, recent research has explored how models rely on spurious correlations, and how counterfactual data augmentation (CDA) can mitigate such issues. In this paper we study how and why modeling counterfactuals over multiple attributes can go significantly further in improving model performance. We propose RDI, a context-aware methodology which takes into account the impact of secondary attributes on the model’s predictions and increases sensitivity for secondary attributes over reweighted counterfactually augmented data. By implementing RDI in the context of toxicity detection, we find that accounting for secondary attributes can significantly improve robustness, with improvements in sliced accuracy on the original dataset up to 7% compared to existing robustness methods. We also demonstrate that RDI generalizes to the coreference resolution task and provide guidelines to extend this to other tasks. View details
    Measuring Model Fairness under Noisy Covariates: A Theoretical Perspective
    Aditee Ajit Kumthekar
    Alex Beutel
    Li Wei
    Nick Blumm
    Pranjal Awasthi
    Trevor Potter
    AIES (2021)
    Preview abstract In this work we study the problem of measuring the fairness of a machine learning model under noisy information. In many applications, evaluating a model according to a well-specified metric such as the FPR requires access to variables that cannot be jointly observed in a given practical setting. A standard workaround is to then use proxies for one or more of these variables. These proxies are either obtained using domain expertise or by training another machine learning model. Prior works have demonstrated the dangers of using such an approach, and strong independence assumptions are needed to provide guarantees on the accuracy of the noisy estimates via proxies. In contrast, in this work we present a general theoretical framework that aims to characterize weaker conditions under which accurate model auditing is possible via the above approach. Furthermore, our theory identifies potential sources of errors and decouples them into two interpretable parts Epsilon_c and Epsilon_g. The first part depends on natural properties of the proxy such as precision and recall, whereas the second part captures correlations between different variables of interest. We show that in many scenarios the error in the estimates is dominated by the Epsilon_c via a linear dependence, whereas the dependence on the correlations only constitutes a lower order term. As a result we expand the understanding of scenarios where model auditing via proxies can be an effective approach. Finally, we compare via simulations the theoretical upper-bounds to the distribution of simulated estimation errors and show that both theoretical guarantees and empirical results significantly improve as we progressively enforce structure along the conditions highlighted by the theory. View details
    Preview abstract In recent years, various deep neural network (DNN) models led to stellar performance in various domains. However, ML practitioners and researchers have observed severe reproducibility issues on DNN models. That is, a set of DNN models trained on the same data with exactly the same architecture may lead to quite different predictions. A common remedy is to use the ensemble method to quantify the prediction variations and improve model reproducibility. However, the ensemble method makes multiple predictions given an input, and is computationally expensive, especially when serving web-scale traffic at inference time. In this paper, we seek to advance our understanding of prediction variation. We demonstrate that we are able to use neuron activation strength to infer prediction variation. Through empirical experiments on two widely used benchmark datasets, Movielens and Criteo, we observed that prediction variations do come from various different sources with randomness, including training data shuffling, and model and embedding parameter random initialization. By adding more randomness sources into model training, we noticed that the ensemble method tends to produce more accurate predictions with higher prediction variations. Last but not least, we demonstrate that neuron activation strength has strong prediction power to infer the ensemble prediction variation. Our approach provides a cheap and simple way for prediction variation estimation, which sets up the foundation and opens up new opportunities for future work on many interesting areas (e.g., model-based reinforcement learning, and active learning) without having to rely on expensive ensemble models. View details
    Preview abstract As multi-task models gain popularity in a wider range of machine learning applications, it is becoming increasingly important for practitioners to understand the fairness implications associated with those models. Most existing fairness literature focuses on learning a single task more fairly, while how ML fairness interacts with multiple tasks in the joint learning setting is largely under-explored. In this paper, we are concerned with how group fairness (e.g., equal opportunity, equalized odds) as an ML fairness concept plays out in the multi-task scenario. In multi-task learning, several tasks are learned jointly to exploit task correlations for a more efficient inductive transfer. This presents a multi-dimensional Pareto frontier on (1) the trade-off between group fairness and accuracy with respect to each task, as well as (2) the trade-offs across multiple tasks. We aim to provide a deeper understanding on how group fairness interacts with accuracy in multi-task learning, and we show that traditional approaches that mainly focus on optimizing the Pareto frontier of multi-task accuracy might not perform well on fairness goals. We propose a new set of metrics to better capture the multi-dimensional Pareto frontier of fairness-accuracy trade-offs uniquely presented in a multi-task learning setting. We further propose a Multi-Task-Aware Fairness (MTA-F) approach to improve fairness in multi-task learning. Experiments on several real-world datasets demonstrate the effectiveness of our proposed approach. View details
    Preview abstract Most existing recommender systems primarily focus on the users (content consumers), matching users with the most relevant contents, with the goal of maximizing user satisfaction on the platform. However, given that content providers are playing an increasingly critical role through content creation, largely determining the content pool available for recommendation, a natural question that arises is: Can we design recommenders taking into account utilities of both users and content providers? By doing so, we hope to sustain the flourish of more content providers and a diverse content pool for long-term user satisfaction. Understanding the full impact of recommendations on both user and content provider groups is challenging. This paper aims to serve as a research investigation on one approach toward building a content provider-aware recommender, and evaluating its impact under a simulated setup. To characterize the users-recommender-providers interdependence, we complement user modeling by formalizing provider dynamics as a parallel Markov Decision Process of partially observable states transited by recommender actions and user feedback. We then build a REINFORCE recommender agent, coined EcoAgent, to optimize a joint objective of user utility and the counterfactual utility lift of the content provider associated with the chosen content, which we show to be equivalent to maximizing overall user utility and utilities of all content providers on the platform. To evaluate our approach, we also introduce a simulation environment capturing the key interactions among users, providers, and the recommender. We offer a number of simulated experiments that shed light to both the benefits and the limitations of our approach. These results serve to understand how and when a content-provider aware recommender agent is of benefit in building multi-stakeholder recommender systems. View details
    Deconfounding User Satisfaction Estimation from Response Rate Bias
    Madeleine Traverse
    Trevor Potter
    Emma Marriott
    Daniel Li
    Chris Haulk
    Proceedings of the 14th ACM Conference on Recommender Systems (2020)
    Preview abstract Improving user satisfaction is at the forefront of industrial recommender systems. While significant progress in recommender systems has relied on utilizing logged implicit data of user-item interactions (i.e., clicks, dwell/watch time, and other user engagement signals), there has been a recent surge of interest in measuring and modeling user satisfaction, as provided by orthogonal data sources. Such data sources typically originate from responses to user satisfaction surveys, which are explicitly asking users to rate their experience with the system and/or specific items they have consumed in the recent past. This data can be valuable for measuring and modeling the degree to which a user has had a satisfactory experience with the recommender, since what users do (engagement) does not always align with what users say they want (satisfaction as measured by surveys). We focus on a large-scale industrial system trained on user survey responses to predict user satisfaction. The predictions of the satisfaction model for each user-item pair, combined with the predictions of the other models (e.g., engagement-focused ones), are fed into the ranking component of a real-world recommender system in deciding items to present to the user. It is therefore imperative that the satisfaction model does an equally good job on imputing user satisfaction across slices of users and items, as it would directly impact which items a user is exposed to. However, the data used for training satisfaction models is specifically biased in that users are more likely to respond to a survey when they will respond that they are more satisfied. When the satisfaction survey responses in slices of data with high response rate follow a different distribution than those with low response rate, response rate becomes a confounding factor for user satisfaction estimation. 
We find a positive correlation between response rate and ratings in a large-scale survey dataset collected in our case study. To address this inherent response rate bias in the satisfaction data, we propose an inverse propensity weighting approach within a multi-task learning framework. We extend a simple feed-forward neural network architecture predicting user satisfaction to a shared-bottom multi-task learning architecture with two tasks: the user satisfaction estimation task, and the response rate estimation task. We concurrently train these two tasks, and use the inverse of the predictions of the response rate task as loss weights for the satisfaction task to address the response rate bias. We showcase that by doing this, (i) we can accurately model whether a user will respond to a survey, (ii) we improve the user satisfaction estimation error for the data slices with lower propensity to respond while not hurting that of the slices with higher propensity to respond, and (iii) we demonstrate in live A/B experiments that applying the resulting satisfaction predictions from this approach to rank recommendations translates to higher user satisfaction. View details
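The weighting idea in the abstract above, using the inverse of the predicted response rate as a loss weight for the satisfaction task, can be sketched as follows. This is only an illustration under stated assumptions: the squared-error loss, the `min_prob` clipping, and the normalization by total weight are choices made here for a stable toy example, not details taken from the paper, and the function name `ipw_weighted_loss` is hypothetical.

```python
def ipw_weighted_loss(sat_pred, sat_label, response_prob, min_prob=0.05):
    """Squared error on satisfaction, weighted by 1 / predicted response probability."""
    # Clip small probabilities so one example cannot dominate (an assumption made here).
    weights = [1.0 / max(p, min_prob) for p in response_prob]
    errors = [(yhat - y) ** 2 for yhat, y in zip(sat_pred, sat_label)]
    # Normalize by total weight so the loss scale stays comparable across batches.
    return sum(w * e for w, e in zip(weights, errors)) / sum(weights)


# Hypothetical batch: the second example comes from a slice that rarely responds,
# so its error counts twice as much as the first example's.
loss = ipw_weighted_loss(
    sat_pred=[1.0, 0.0],
    sat_label=[1.0, 1.0],
    response_prob=[0.5, 0.25],
)
print(round(loss, 4))
```

In a shared-bottom multi-task model, `response_prob` would come from the response rate head; here it is passed in as plain numbers.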
    Learned Indexes for a Google-scale Disk-based Database
    Hussam Abu-Libdeh
    Alex Beutel
    Lyric Pankaj Doshi
    Tim Klas Kraska
    Chris Olston
    (2020)
    Preview abstract There is great excitement about learned index structures, but understandable skepticism about the practicality of a new method uprooting decades of research on B-Trees. In this paper, we work to remove some of that uncertainty by demonstrating how a learned index can be integrated in a distributed, disk-based database system: Google’s Bigtable. We detail several design decisions we made to integrate learned indexes in Bigtable. Our results show that integrating learned index significantly improves the end-to-end read latency and throughput for Bigtable. View details
    Mixed Negative Sampling for Learning Two-tower Neural Networks in Recommendations
    Ji Yang
    Lichan Hong
    Yang Li
    Simon Wang
    Taibai Xu
    WWW '20: Companion Proceedings of the Web Conference 2020, April 2020 (2020)
    Preview abstract Learning query and item representations is important for building large-scale recommendation systems. In many real applications where there is a huge catalog of items to recommend, the problem of efficiently retrieving the top k items for a user's query from a deep corpus leads to a family of factorized modeling approaches where query and item are jointly embedded into a low-dimensional space. In this paper, we first showcase how to apply a two-tower neural network framework, which is also known as a dual encoder in the natural language community, to improve a large-scale, production app recommendation system. Furthermore, we offer a novel negative sampling approach called Mixed Negative Sampling (MNS). In particular, different from commonly used batch or unigram sampling methods, MNS uses a mixture of batch and uniformly sampled negatives to tackle the selection bias of implicit user feedback. We conduct extensive offline experiments on the production dataset and show that MNS outperforms other baseline sampling methods. We also conduct online A/B testing and demonstrate that the two-tower retrieval model based on MNS significantly improves retrieval quality by encouraging more high-quality app installs. View details
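As a rough illustration of mixing the two negative sources, the sketch below builds the negative set for one training example from the other items in its batch plus items drawn uniformly from the corpus. The function name `mixed_negatives` and its parameters are hypothetical; how the paper shares uniform samples across a batch or handles collisions with the positive item is not covered here.

```python
import random


def mixed_negatives(batch_items, corpus_items, positive_index, num_uniform,
                    rng=None):
    """Return in-batch negatives followed by uniformly sampled negatives.

    batch_items:    items appearing in the current training batch
    corpus_items:   the full item catalog to sample uniformly from
    positive_index: position in batch_items of this example's positive item
    num_uniform:    how many extra negatives to draw uniformly (assumed knob)
    """
    if rng is None:
        rng = random.Random()
    # In-batch negatives: every other item in the batch.
    in_batch = [item for i, item in enumerate(batch_items)
                if i != positive_index]
    # Uniform negatives: drawn with replacement from the whole corpus, so
    # rarely seen items also get a chance to appear as negatives.
    uniform = [rng.choice(corpus_items) for _ in range(num_uniform)]
    return in_batch + uniform
```

Because in-batch items follow the popularity distribution of the training data, the uniform draws are what expose long-tail items as negatives in this sketch.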
    BRPO: Batch Residual Policy Optimization
    Sungryull Sohn
    Ofir Nachum
    Honglak Lee
    Proceedings of the Twenty-ninth International Joint Conference on Artificial Intelligence (IJCAI-20), Yokohama, Japan (2020), pp. 2824-2830
    Preview abstract In batch reinforcement learning (RL), one often constrains a learned policy to be close to the behavior (data-generating) policy, e.g., by constraining the learned action distribution to differ from the behavior policy by some maximum degree that is the same at each state. This can cause batch RL to be overly conservative, unable to exploit large policy changes at frequently-visited, high-confidence states without risking poor performance at sparsely-visited states. To remedy this, we propose residual policies, where the allowable deviation of the learned policy is state-action-dependent. We derive a new RL method, BRPO, which learns both the policy and allowable deviation that jointly maximize a lower bound on policy performance. We show that BRPO achieves state-of-the-art performance in a number of tasks. View details
    Zero-Shot Transfer Learning for Query-Item Cold Start in Search Retrieval and Recommendations
    Ankit Kumar
    Cosmo Du
    Dima Kuzmin
    Ellie Chio
    John Roberts Anderson
    Li Zhang
    Nitin Jindal
    Pei Cao
    Ritesh Agarwal
    Steffen Rendle
    Tao Wu
    Wen Li
    CIKM (2020)
    Preview abstract Most search retrieval and recommender systems predict top-K items given a query by learning directly from a large training set of (query, item) pairs, where a query can include natural language (NL), user, and context features. These approaches fall into the traditional supervised learning framework where the algorithm trains on labeled data from the target task. In this paper, we propose a new zero-shot transfer learning framework, which first learns representations of items and their NL features by predicting (item, item) correlation graphs as an auxiliary task, followed by transferring learned representations to solve the target task (query-to-item prediction), without having seen any (query, item) pairs in training. The advantages of applying this new framework include: (1) Cold-starting search and recommenders without abundant query-item data; (2) Generalizing to previously unseen or rare (query, item) pairs and alleviating the "rich get richer" problem; (3) Transferring knowledge of (item, item) correlation from domains outside of search. We show that the framework is effective on a large-scale search and recommender system. View details
    Preview abstract Much of the previous machine learning (ML) fairness literature assumes that protected features such as race and sex are present in the dataset, and relies upon them to mitigate fairness concerns. However, in practice factors like privacy and regulation often preclude the collection of protected features, or their use for training or inference, severely limiting the applicability of traditional fairness research. Therefore we ask: How can we train an ML model to improve fairness when we do not even know the protected group memberships? In this work we address this problem by proposing Adversarially Reweighted Learning (ARL). In particular, we hypothesize that non-protected features and task labels are valuable for identifying fairness issues, and can be used to co-train an adversarial reweighting approach for improving fairness. Our results show that ARL improves Rawlsian Max-Min fairness, with notable AUC improvements for worst-case protected groups in multiple datasets, outperforming state-of-the-art alternatives. View details
    Preview abstract NLP models are shown to suffer from robustness issues; for example, a model's prediction can be easily changed under small perturbations to the input. In this work, we aim to present a Controlled Adversarial Text Generation (CAT-Gen) model that, given an input text, can generate adversarial texts through controllable attributes that are known to be invariant to task labels. For example, for a main task like sentiment classification, an example attribute can be different categories/domains, and a model should have similar performance across them; for a coreference resolution task, a model's performance should not differ across different demographic attributes. Different from many existing adversarial text generation approaches, we show that our model can generate adversarial texts that are more fluent, diverse, and with better task-label invariance guarantees. We aim to use this model to generate counterfactual texts that could better improve robustness in NLP models (e.g., through adversarial training), and we argue that our generation can create more natural attacks. View details
    Preview abstract Large pre-trained models have revolutionized natural language understanding. However, researchers have found they can encode correlations undesired in many applications, like "surgeon" being associated more with "he" than "she". We explore such gendered correlations as a case study, to learn how we can configure and train models to mitigate the risk of encoding unintended associations. We find that it is important to define correlation metrics, since they can reveal differences among models with similar accuracy. Large models have more capacity to encode gendered correlations, but this can be mitigated with general dropout regularization. Counterfactual data augmentation is also effective, and can even reduce correlations not explicitly targeted for mitigation, potentially making it useful beyond gender too. Both techniques yield models with comparable accuracy to unmitigated analogues, and still resist re-learning correlations in fine-tuning. View details
    Preview abstract As recent literature has demonstrated how classifiers often carry unintended biases toward some subgroups, deploying machine learned models to users demands careful consideration of the social consequences. How should we address this problem in a real-world system? How should we balance core performance and fairness metrics? In this paper, we introduce a MinDiff framework for regularizing classifiers toward different fairness metrics and analyze a technique with kernel-based statistical dependency tests. We run a thorough study on an academic dataset to compare the Pareto frontier achieved by different regularization approaches, and apply our kernel-based method to two large-scale industrial systems demonstrating real-world improvements. View details
    Fairness in Recommendation Ranking through Pairwise Comparisons
    Alex Beutel
    Tulsee Doshi
    Hai Qian
    Li Wei
    Yi Wu
    Lukasz Heldt
    Lichan Hong
    Cristos Goodrow
    KDD (2019)
    Preview abstract Recommender systems are one of the most pervasive applications of machine learning in industry, with many services using them to match users to products or information. As such it is important to ask: what are the possible fairness risks, how can we quantify them, and how should we address them? In this paper we offer a set of novel metrics for evaluating algorithmic fairness concerns in recommender systems. In particular we show how measuring fairness based on pairwise comparisons from randomized experiments provides a tractable means to reason about fairness in rankings from recommender systems. Building on this metric, we offer a new regularizer to encourage improving this metric during model training and thus improve fairness in the resulting rankings. We apply this pairwise regularization to a large-scale, production recommender system and show that we are able to significantly improve the system's pairwise fairness. View details
    Recommending What Video to Watch Next: A Multitask Ranking System
    Aditee Ajit Kumthekar
    Aniruddh Nath
    Li Wei
    Lichan Hong
    Mahesh Sathiamoorthy
    Shawn Andrews
    Recsys 2019 (2019)
    Preview abstract In this paper, we introduce a large-scale multi-objective ranking system for recommending what video to watch next on an industrial video sharing platform. The system faces many real-world challenges, including the presence of multiple competing ranking objectives, as well as implicit selection biases in user feedback. To tackle these challenges, we explored a variety of soft-parameter sharing techniques such as Multi-gate Mixture-of-Experts so as to efficiently optimize for multiple ranking objectives. Additionally, we mitigated the selection biases by adopting a Wide & Deep framework. We demonstrated that our proposed techniques can lead to substantial improvements on recommendation quality on one of the world’s largest video sharing platforms. View details
    Preview abstract Many recommendation systems need to retrieve and score items from a large corpus. A common approach to handle data sparsity and power-law item distribution is to learn item representations from their content features. Apart from many content-aware systems based on matrix factorization, in this paper, we consider a modeling framework with two-tower neural networks where one network, called the item tower, is used to encode a wide variety of item features. Optimizing loss functions calculated from in-batch negatives, which are items sampled in a random batch, is a general recipe for training such two-tower models. However, batch loss is subject to sampling bias which could severely restrict model performance, particularly in the case of power-law distribution. In this work, we present a novel algorithm for estimating item frequency from streaming data. Our main idea is to sketch and estimate item occurrences via gradient descent. Through theoretical analysis and simulations, we show that the proposed algorithm can work without a fixed item vocabulary, and is capable of producing unbiased estimation and being adaptive to item distribution change. We then apply the sampling-bias-corrected modeling approach to build a large-scale retrieval system called Neural Deep Retrieval (NDR) for YouTube recommendations. The system is deployed to retrieve personalized suggestions from a corpus of tens of millions of videos. We demonstrate the effectiveness of sampling bias correction through offline experiments on two real-world datasets. We also conduct live A/B tests to show that the NDR system leads to improved recommendation quality for YouTube. View details
    Preview abstract If our models are used in new or unexpected cases, do we know if they will make fair predictions? Previously, researchers developed ways to debias a model for a single problem domain. However, this is often not how models are trained and used in practice. For example, labels and demographics (sensitive attributes) are often hard to observe, resulting in auxiliary or synthetic data to be used for training, and proxies of the sensitive attribute to be used for evaluation of fairness. A model trained for one setting may be picked up and used in many others, particularly as is common with pre-training and cloud APIs. Despite the pervasiveness of these complexities, remarkably little work in the fairness literature has theoretically examined these issues. We frame all of these settings as domain adaptation problems: how can we use what we have learned in a source domain to debias in a new target domain, without directly debiasing on the target domain as if it is a completely new problem? We offer new theoretical guarantees of improving fairness across domains, and offer a modeling approach to transfer to data-sparse target domains. We give empirical results validating the theory and showing that these modeling approaches can improve fairness metrics with less data. View details
    SageDB: A Learned Database System
    Tim Kraska
    Mohammad Alizadeh
    Alex Beutel
    Jialin Ding
    Ani Kristo
    Guillaume Leclerc
    Samuel Madden
    Hongzi Mao
    Vikram Nathan
    CIDR (2019)
    Preview abstract Modern data processing systems are designed to be general purpose, in that they can handle a wide variety of different schemas, data types, and data distributions, and aim to provide efficient access to that data via the use of optimizers and cost models. This general purpose nature results in systems that do not take advantage of the characteristics of the particular application and data of the user. With SageDB we present a vision towards a new type of a data processing system, one which highly specializes to an application through code synthesis and machine learning. By modeling the data distribution, workload, and hardware, SageDB learns the structure of the data and optimal access methods and query plans. These learned models are deeply embedded, through code synthesis, in essentially every component of the database. As such, SageDB presents radical departure from the way database systems are currently developed, raising a host of new problems in databases, machine learning and programming systems. View details
    Preview abstract Understanding temporal dynamics has proved to be highly valuable for accurate recommendation. Sequential recommenders have been successful in modeling the dynamics of users and items over time. However, while different model architectures excel at capturing various temporal ranges or dynamics, distinct application contexts require adapting to diverse behaviors. In this paper we examine how to build a model that can make use of different temporal ranges and dynamics depending on the request context. We begin with the analysis of an anonymized YouTube dataset comprising millions of user sequences. We quantify the degree of long-range dependence in these sequences and demonstrate that both short-term and long-term dependent behavioral patterns co-exist. We then propose a neural Multi-temporal-range Mixture Model (M3) as a tailored solution to deal with both short-term and long-term dependencies. Our approach employs a mixture of models, each with a different temporal range. These models are combined by a learned gating mechanism capable of exerting different model combinations given different contextual information. In empirical evaluations on a public dataset and our own anonymized YouTube dataset, M3 consistently outperforms state-of-the-art sequential recommendation methods. View details
    Preview abstract Recurrent neural networks have gained widespread use in modeling sequential data. Learning long-term dependencies using these models remains difficult though, due to exploding or vanishing gradients. In this paper, we draw connections between recurrent networks and ordinary differential equations. A special form of recurrent networks called the AntisymmetricRNN is proposed under this theoretical framework, which is able to capture long-term dependencies thanks to the stability property of its underlying differential equation. Existing approaches to improving RNN trainability often incur significant computation overhead. In comparison, AntisymmetricRNN achieves the same goal by design. We showcase the advantage of this new architecture through extensive simulations and experiments. AntisymmetricRNN exhibits much more predictable dynamics. It outperforms regular LSTM models on tasks requiring long-term memory and matches the performance on tasks where short-term dependencies dominate despite being much simpler. View details
    Top-K Off-Policy Correction for a REINFORCE Recommender System
    Alex Beutel
    Paul Covington
    Sagar Jain
    Francois Belletti
    ACM International Conference on Web Search and Data Mining (WSDM) (2019)
    Preview abstract Industrial recommender systems deal with extremely large action spaces – many millions of items to recommend. Moreover, they need to serve billions of users, who are unique at any point in time, making a complex user state space. Luckily, huge quantities of logged implicit feedback (e.g., user clicks, dwell time) are available for learning. Learning from the logged feedback is however subject to biases caused by only observing feedback on recommendations selected by the previous versions of the recommender. In this work, we present a general recipe for addressing such biases in a production top-K recommender system at YouTube, built with a policy-gradient-based algorithm, i.e. REINFORCE [48]. The contributions of the paper are: (1) scaling REINFORCE to a production recommender system with an action space on the order of millions; (2) applying off-policy correction to address data biases in learning from logged feedback collected from multiple behavior policies; (3) proposing a novel top-K off-policy correction to account for our policy recommending multiple items at a time; (4) showcasing the value of exploration. We demonstrate the efficacy of our approaches through a series of simulations and multiple live experiments on YouTube. View details
    Preview abstract Ranking is a central task in machine learning and information retrieval. In this task, it is especially important to present the user with a slate of items that is appealing as a whole. This in turn requires taking into account interactions between items, since intuitively, placing an item on the slate affects the decision of which other items should be placed alongside it. In this work, we propose a sequence-to-sequence model for ranking called seq2slate. At each step, the model predicts the next "best" item to place on the slate given the items already selected. The sequential nature of the model allows complex dependencies between the items to be captured directly in a flexible and scalable way. We show how to learn the model end-to-end from weak supervision in the form of easily obtained click-through data. We further demonstrate the usefulness of our approach in experiments on standard ranking benchmarks as well as in a real-world recommendation system. View details
    Efficient Training on Very Large Corpora via Gramian Estimation
    Nicolas Mayoraz
    Steffen Rendle
    Li Zhang
    Lichan Hong
    John Anderson
    ICLR 2019 (to appear)
    Preview abstract We study the problem of learning similarity functions over very large corpora using neural network embedding models. These models are typically trained using SGD with random sampling of unobserved pairs, with a sample size that grows quadratically with the corpus size, making it expensive to scale. We propose new efficient methods to train these models without having to sample unobserved pairs. Inspired by matrix factorization, our approach relies on adding a global quadratic penalty and expressing this term as the inner-product of two generalized Gramians. We show that the gradient of this term can be efficiently computed by maintaining estimates of the Gramians, and develop variance reduction schemes to improve the quality of the estimates. We conduct large-scale experiments that show a significant improvement both in training time and generalization performance compared to sampling methods. View details
    Preview abstract Characterizing temporal dependence patterns is a critical step in understanding the statistical properties of sequential data. Long Range Dependence (LRD), referring to long-range correlations decaying as a power law rather than exponentially w.r.t. distance, demands a different set of tools for modeling the underlying dynamics of the sequential data. While it has been widely conjectured that LRD is present in language modeling and sequential recommendation, the amount of LRD in the corresponding sequential datasets has not yet been quantified in a scalable and model-independent manner. We propose a principled estimation procedure of LRD in sequential datasets based on established LRD theory for real-valued time series and apply it to sequences of symbols with million-item-scale dictionaries. In our measurements, the procedure reliably estimates the LRD in the behavior of users as they write Wikipedia articles and as they interact with YouTube. We further show that measuring LRD better informs modeling decisions, in particular for RNNs, whose ability to capture LRD is still an active area of research. The quantitative measure of LRD informs new Evolutive Recurrent Neural Networks (EvolutiveRNNs) designs, leading to state-of-the-art results on language understanding and sequential recommendation tasks at a fraction of the computational cost. View details
    Counterfactual Fairness in Text Classification through Robustness
    Sahaj Garg
    Nicole Limtiaco
    Ankur Taly
    Alex Beutel
    AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES) (2019)
    Preview abstract In this paper, we study counterfactual fairness in text classification, which asks the question: How would the prediction change if the sensitive attribute referenced in the example were different? Toxicity classifiers demonstrate a counterfactual fairness issue by predicting that "Some people are gay" is toxic while "Some people are straight" is nontoxic. We offer a metric, counterfactual token fairness (CTF), for measuring this particular form of fairness in text classifiers, and describe its relationship with group fairness. Further, we offer three approaches, blindness, counterfactual augmentation, and counterfactual logit pairing (CLP), for optimizing counterfactual token fairness during training, bridging the robustness and fairness literature. Empirically, we find that blindness and CLP address counterfactual token fairness. The methods do not harm classifier performance, and have varying tradeoffs with group fairness. These approaches, both for measurement and optimization, provide a new path forward for addressing fairness concerns in text classification. View details
    Putting Fairness Principles into Practice: Challenges, Metrics, and Improvements
    Alex Beutel
    Tulsee Doshi
    Hai Qian
    Allison Woodruff
    Christine Luu
    Pierre Kreitmann
    Jonathan Bischof
    AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES) (2019)
    Preview abstract As more researchers have become aware of and passionate about algorithmic fairness, there has been an explosion in papers laying out new metrics, suggesting algorithms to address issues, and calling attention to issues in existing applications of machine learning. This research has greatly expanded our understanding of the concerns and challenges in deploying machine learning, but there has been much less work in seeing how the rubber meets the road. In this paper we provide a case-study on the application of fairness in machine learning research to a production classification system, and offer new insights in how to measure and address algorithmic fairness issues. We discuss open questions in implementing equality of opportunity and describe our fairness metric, conditional equality, that takes into account distributional differences. Further, we provide a new approach to improve on the fairness metric during model training and demonstrate its efficacy in improving performance for a real-world product. View details
    Preview abstract Neural-based multi-task learning has been successfully used in many real-world large-scale applications such as recommendation systems. For example, in movie recommendations, beyond providing users movies which they tend to purchase and watch, the system might also optimize for users liking the movies afterwards. With multi-task learning, we aim to build a single model that learns these multiple goals and tasks simultaneously. However, the prediction quality of commonly used multi-task models is often sensitive to the relationships between tasks. It is therefore important to study the modeling tradeoffs between task-specific objectives and inter-task relationships. In this work, we propose a novel multi-task learning approach, Multi-gate Mixture-of-Experts (MMoE), which explicitly learns to model task relationships from data. We adapt the Mixture-of-Experts (MoE) structure to multi-task learning by sharing the expert submodels across all tasks, while also having a gating network trained to optimize each task. To validate our approach on data with different levels of task relatedness, we first apply it to a synthetic dataset where we control the task relatedness. We show that the proposed approach performs better than baseline methods when the tasks are less related. We also show that the MMoE structure results in an additional trainability benefit, depending on different levels of randomness in the training data and model initialization. Furthermore, we demonstrate the performance improvements by MMoE on real tasks including a binary classification benchmark, and a large-scale content recommendation system at Google. View details
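The sharing pattern described in this abstract (experts shared across tasks, one gate per task) can be sketched in a few lines. This is only a toy illustration under stated assumptions: expert outputs and gate logits are passed in as plain lists rather than computed by networks, and the names `softmax` and `mmoe_mix` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mmoe_mix(expert_outputs, gate_logits_per_task):
    """Combine shared expert outputs with one softmax gate per task.

    expert_outputs:       list of expert output vectors (same length each)
    gate_logits_per_task: for each task, one logit per expert
    Returns one mixed vector per task, to be fed to that task's tower.
    """
    dim = len(expert_outputs[0])
    mixed = []
    for gate_logits in gate_logits_per_task:
        weights = softmax(gate_logits)
        # Each task takes its own weighted sum of the same shared experts.
        mixed.append([
            sum(w * out[d] for w, out in zip(weights, expert_outputs))
            for d in range(dim)
        ])
    return mixed
```

With two experts, a task whose gate logits are equal averages them, while a task with a strongly peaked gate effectively selects a single expert.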
    Categorical-Attributes-Based Multi-Level Classification for Recommender Systems
    Qian Zhao
    Sagar Jain
    Alex Beutel
    Francois Belletti
    ACM Conference Series on Recommender Systems, RecSys (2018)
    Preview abstract Many techniques to utilize side information of users and/or items as inputs to recommenders to improve recommendation, especially on cold-start items/users, have been developed over the years. In this work, we test the approach of utilizing item side information, specifically categorical attributes, in the output of recommendation models either through multi-task learning or hierarchical classification. We first demonstrate the efficacy of these approaches for both matrix factorization and neural networks with a medium-size real-world data set. We then show that they improve a neural-network based production model in an industrial-scale recommender system. We demonstrate the robustness of the hierarchical classification approach by introducing noise in building the hierarchy. Lastly, we investigate the generalizability of hierarchical classification on a simulated dataset by building two user models in which we can fully control the generative process of user-item interactions. View details
    Preview abstract The ability to capture Long Range Dependence (LRD) in a stochastic process is of prime importance in the context of predictive models. A sequential model with a longer-term memory is able to better contextualize recent observations. In this article, we apply the theory of LRD stochastic processes to modern recurrent architectures such as LSTM and GRU and prove they do not provide LRD behavior under homoscedasticity assumptions. After having proven that leaky gating mechanisms lead to memory loss in gated recurrent networks such as LSTMs and GRUs, we provide an architecture that attempts to address the issue of faulty memory. The key insight of our theoretical study is to encourage memory redundancy. We show how the resulting architectures are more lightweight, parallelizable and able to leverage old observations. Experimental results on a synthetic copy task, the YouTube-8M video classification task and a latency-sensitive recommender system show that our approach leads to better memorization. View details
    Preview abstract Recommendation systems, prevalent in many applications, aim to surface to users the right content at the right time. Recently, researchers have aspired to develop conversational systems that offer seamless interactions with users, more effectively eliciting user preferences and offering better recommendations. Taking a step towards this goal, this paper explores the two stages of a single round of conversation with a user: which question to ask the user, and how to use their feedback to respond with a more accurate recommendation. Following these two stages, first, we detail an RNN-based model for generating topics a user might be interested in, and then extend a state-of-the-art RNN-based video recommender to incorporate the user’s selected topic. We describe our proposed system Q&R, i.e., Question & Recommendation, and the surrogate tasks we utilize to bootstrap data for training our models. We evaluate different components of Q&R on live traffic in various applications within YouTube: User Onboarding, Homepage Recommendation, and Notifications. Our results demonstrate that our approach improves upon state-of-the-art recommendation models, including RNNs, and makes these applications more useful, such as a > 1% increase in video notifications opened. Q&R has been deployed and is used in YouTube production. Further, our design choices can be useful to practitioners wanting to transition to more conversational recommendation systems. View details
    The Case for Learned Index Structures
    Tim Kraska
    Alex Beutel
    Neoklis Polyzotis
    SIGMOD (2018)
    Preview abstract Indexes are models: a BTree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. The key idea is that a model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records. We theoretically analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures. Our initial results show that by using neural nets we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order-of-magnitude in memory over several real-world data sets. More importantly though, we believe that the idea of replacing core components of a data management system through learned models has far-reaching implications for future systems designs and that this work just provides a glimpse of what might be possible. View details
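To make the "indexes are models" idea concrete, here is a minimal sketch that swaps in a single least-squares line for the neural nets mentioned in the abstract. It predicts a key's position in a sorted array, records the worst prediction error seen on the stored keys, and then searches only within that error window. The class name `LinearLearnedIndex` and the single-model design are illustrative assumptions, not the paper's architecture.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index: a fitted line maps numeric keys to positions."""

    def __init__(self, sorted_keys):
        self.keys = list(sorted_keys)
        n = len(self.keys)
        if n == 0:
            raise ValueError("need at least one key")
        # Least-squares fit of position as a linear function of key.
        mean_key = sum(self.keys) / n
        mean_pos = (n - 1) / 2
        var = sum((k - mean_key) ** 2 for k in self.keys)
        cov = sum((k - mean_key) * (i - mean_pos)
                  for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_pos - self.slope * mean_key
        # Worst-case error on stored keys bounds the search window.
        self.max_error = max(abs(i - self._predict(k))
                             for i, k in enumerate(self.keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of key, or None if it is not stored."""
        n = len(self.keys)
        guess = self._predict(key)
        lo = min(max(0, guess - self.max_error), n)
        hi = max(lo, min(n, guess + self.max_error + 1))
        # Binary search restricted to the model's error window.
        pos = bisect.bisect_left(self.keys, key, lo, hi)
        if pos < n and self.keys[pos] == key:
            return pos
        return None
```

The narrower the error window, the less searching remains after the prediction, which is where any speedup over a plain binary search would come from in this sketch.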
    Latent Cross: Making Use of Context in Recurrent Recommender Systems
    Alex Beutel
    Paul Covington
    Sagar Jain
    Can Xu
    Jia Li
    Vince Gatto
    WSDM 2018: The Eleventh ACM International Conference on Web Search and Data Mining
    Preview abstract The success of recommender systems often depends on their ability to understand and make use of the context of the recommendation request. Significant research has focused on how time, location, interfaces, and a plethora of other contextual features affect recommendations. However, in using deep neural networks for recommender systems, researchers often ignore these contexts or incorporate them as ordinary features in the model. In this paper, we study how to effectively treat contextual data in neural recommender systems. We begin with an empirical analysis of the conventional approach to context as features in feed-forward recommenders and demonstrate that this approach is inefficient in capturing common feature crosses. We apply this insight to design a state-of-the-art RNN recommender system. We first describe our RNN-based recommender system in use at YouTube. Next, we offer "Latent Cross," an easy-to-use technique to incorporate contextual data in the RNN by embedding the context feature first and then performing an element-wise product of the context embedding with the model's hidden states. We demonstrate the improvement in performance by using this Latent Cross technique in multiple experimental settings. View details
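    The operation described in the abstract above, embedding a context feature and multiplying it element-wise with a hidden state, can be sketched in a few lines. This is a rough illustration under stated assumptions, not the production model: the function name `latent_cross`, the plain-list embedding table, and the example values are all hypothetical, and the hidden state is simply given rather than produced by an RNN.

```python
def latent_cross(hidden, context_id, embedding_table):
    """Element-wise product of a context embedding with a hidden state.

    hidden: list of floats (hypothetical RNN hidden state).
    context_id: integer id of the context feature value (e.g. a device type).
    embedding_table: list of embedding vectors, one per context id.
    """
    context_embedding = embedding_table[context_id]
    if len(context_embedding) != len(hidden):
        raise ValueError("context embedding and hidden state must have the same size")
    # Each hidden unit is scaled by the matching component of the context embedding.
    return [h * c for h, c in zip(hidden, context_embedding)]


# Hypothetical two-entry embedding table for a two-dimensional hidden state.
table = [[1.0, 2.0], [0.5, 0.0]]
crossed = latent_cross([3.0, 4.0], 1, table)
```

    Because the context acts multiplicatively on every hidden unit, it can gate or rescale the whole representation, which is the kind of feature cross the abstract says concatenated context features capture inefficiently.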
    Beyond Globally Optimal: Focused Learning for Improved Recommendations
    Alex Beutel
    Hubert Pham
    John Anderson
    Proceedings of the 26th International Conference on World Wide Web, WWW 2017, Perth, Australia, April 3-7, 2017
    Preview abstract When building a recommender system, how can we ensure that all items are modeled well? Classically, recommender systems are built, optimized, and tuned to improve a global prediction objective, such as root mean squared error. However, as we demonstrate, these recommender systems often leave many items badly-modeled and thus under-served. Further, we give both empirical and theoretical evidence that no single matrix factorization, under current state-of-the-art methods, gives optimal results for each item. As a result, we ask: how can we learn additional models to improve the recommendation quality for a specified subset of items? We offer a new technique called focused learning, based on hyperparameter optimization and a customized matrix factorization objective. Applying focused learning on top of weighted matrix factorization, factorization machines, and LLORMA, we demonstrate prediction accuracy improvements on multiple datasets. For instance, on MovieLens we achieve as much as a 17% improvement in prediction accuracy for niche movies, cold-start items, and even the most badly-modeled items in the original model. View details
    Preview abstract How can we learn a classifier that is "fair" for a protected or sensitive group, when we do not know if the input to the classifier affects the protected group? How can we train such a classifier when data on the protected group is difficult to attain? In many settings, finding out the sensitive input attribute can be prohibitively expensive even during model training, and possibly impossible during model serving. For example, in recommender systems, if we want to predict if a user will click on a given recommendation, we often do not know many attributes of the user, e.g., race or age, and many attributes of the content are hard to determine, e.g., the language or topic. Thus, it is not feasible to use a different classifier calibrated based on knowledge of the sensitive attribute. Here, we use an adversarial training procedure to remove information about the sensitive attribute from the latent representation learned by a neural network. In particular, we study how the choice of data for the adversarial training affects the resulting fairness properties. We find two interesting results: a remarkably small amount of data is needed to train these models, and there is still a gap between the theoretical implications and the empirical results. View details
    Video WatchTime and Comment Sentiment: Experience from YouTube
    Bo Fu
    Pei Cao
    Rong Yang
    Proceedings of the Fourth IEEE Workshop on Hot Topics in Web Systems and Technologies (2016)
    Preview abstract Video watching is now an indispensable part of the general public's media consumption, yet very little is known about the relationship between how users interact with each other and how that affects video consumption patterns. In this paper, we explore the relationship between user commenting behavior and how that might or might not be predictive of video consumption patterns such as watch time. Contrary to recent findings, we found that video watch time is correlated with the positive sentiment expressed in the comments of YouTube videos. More precisely, videos with more positive sentiment on average in the comments are more likely to be watched longer, while videos with negative comment sentiment on average are more likely to have shorter watch durations. These results suggest that users prefer videos that evoke positive emotional responses. If the findings here generalize to other social media, this result suggests a motivational design finding that is useful for other system designers. View details
    Google+ Communities as Plazas and Topic Boards
    Michael J. Brzozowski
    Phil Adams
    Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI '15), ACM, New York, NY (2015), pp. 3779-3788
    Preview abstract Researchers have recently been focusing on understanding online communities in social networks that offer easy access to new audiences. In this work, we conducted a mixed-method study of public Google+ Communities and found two major types evident in both how users talk about them and how they appear to use them: plazas to meet new people, and topic boards to discuss common interests. This reflects two common motivations users cite in describing Communities: "meeting like minded people" and "finding great content". We characterize these two types of Communities within Google+ using mixed methods including surveys, interviews, and quantitative analytics, and expose differences in user behaviors between them. View details
    Improving User Topic Interest Profiles by Behavior Factorization
    Lichan Hong
    Proceedings of the 24th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland (2015), pp. 1406-1416
    Preview abstract Many recommenders aim to provide relevant recommendations to users by building personal topic interest profiles and then using these profiles to find interesting content for the user. In social media, recommender systems build user profiles by directly combining users' topic interest signals from a wide variety of consumption and publishing behaviors, such as social media posts they authored, commented on, +1'd or liked. Here we propose to separately model users' topical interests that come from these various behavioral signals in order to construct better user profiles. Intuitively, since publishing a post requires more effort, the topic interests coming from publishing signals should be a more accurate reflection of a user's central interests than, say, a simple gesture such as a +1. By separating a single user's interest profile into several behavioral profiles, we obtain better and cleaner topic interest signals, and enable topic prediction for different types of behavior, such as topics that the user might +1 or comment on but might never write a post about. To do this at large scales in Google+, we employed matrix factorization techniques to model each user's behaviors as a separate example entry in the input user-by-topic matrix. Using this technique, which we call "behavioral factorization", we implemented and built a topic recommender predicting users' topical interests using their actions within Google+. We experimentally showed that we obtained better and cleaner signals than baseline methods, and are able to more accurately predict topic interests as well as achieve better coverage. View details
    Inserting Micro-Breaks into Crowdsourcing Workflows
    Jeffrey M. Rzeszotarski
    Praveen Paritosh
    Peng Dai
    HCOMP 2013
    Preview abstract As news reading becomes more social, how do different types of annotations affect people's selection of news articles? This paper reports on results from two experiments looking at social annotations in two different news reading contexts. The first experiment simulates a logged-out experience with annotations from strangers, a computer agent, and a branded company. Results indicate that, perhaps unsurprisingly, annotations by strangers have no persuasive effects. However, surprisingly, unknown branded companies still had a persuasive effect. The second experiment simulates a logged-in experience with annotations from friends, finding that friend annotations are both persuasive and improve user satisfaction over their article selections. In post-experiment interviews, we found that this increased satisfaction is due partly to the context that annotations add. That is, friend annotations both help people decide what to read, and provide social context that improves engagement. Interviews also suggest subtle expertise effects. We discuss implications for design of social annotation systems and suggestions for future research. View details
    Instant Foodie: Predicting Expert Ratings From Grassroots
    Chenhao Tan
    Gueorgi Kossinets
    Alex J. Smola
    CIKM’13, Oct. 27–Nov. 1, 2013, San Francisco, CA, USA, ACM
    Preview abstract Consumer review sites and recommender systems typically rely on a large volume of user-contributed ratings, which makes rating acquisition an essential component in the design of such systems. User ratings are then summarized to provide an aggregate score representing a popular evaluation of an item. An inherent problem in such summarization is potential bias due to raters’ self-selection and heterogeneity in terms of experiences, tastes and rating scale interpretations. There are two major approaches to collecting ratings, which have different advantages and disadvantages. One is to allow a large number of volunteers to choose and rate items directly (a method employed by e.g. Yelp and Google Places). Alternatively, a panel of raters may be maintained and invited to rate a predefined set of items at regular intervals (such as in Zagat Survey). The latter approach arguably results in more consistent reviews and reduced selection bias, though at the expense of much smaller coverage (fewer rated items). In this paper, we examine the two different approaches to collecting user ratings of restaurants and explore the question of whether it is possible to reconcile them. Specifically, we study the problem of inferring the more calibrated Zagat Survey ratings (which we dub “expert ratings”) from the user-contributed ratings (“grassroots”) in Google Places. To achieve this, we employ latent factor models and provide a probabilistic treatment of the ordinal ratings. We can predict Zagat Survey ratings accurately from ad hoc user-generated ratings by employing joint optimization. Furthermore, the resulting model shows that users become more discerning as they submit more ratings. We also describe an approach towards cross-city recommendations, answering questions such as “What is the equivalent of the Per Se restaurant in Chicago?” View details
    Swipe vs. scroll: web page switching on mobile browsers
    Andrew Warr
    In Proc. of CHI2013, ACM, pp. 2171-2174
    Preview abstract Tabbed web browsing interfaces enable users to multi-task and easily switch between open web pages. However, tabbed browsing is difficult for mobile web browsers due to the limited screen space and the reduced precision of touch. We present an experiment comparing Safari's pages-based switching interface using horizontal swiping gestures with the stacked cards-based switching interface using vertical scrolling gestures, introduced by Chrome. The results of our experiment show that the cards-based switching interface allows for faster switching and is less frustrating, with no significant effect on error rates. We generalize these findings, and provide design implications for mobile information spaces. View details
    Perception and Understanding of Social Annotations in Web Search
    In Proc. of WWW2013, International World Wide Web Conferences Steering Committee, pp. 403-412
    Preview abstract As web search increasingly becomes reliant on social signals, it is imperative for us to understand the effect of these signals on users' behavior. There are multiple ways in which social signals can be used in search: (a) to surface and rank important social content; (b) to signal to users which results are more trustworthy and important by placing annotations on search results. We focus on the latter problem of understanding how social annotations affect user behavior. In previous work, through eyetracking research we learned that users do not generally seem to fixate on social annotations when they are placed at the bottom of the search result block, with 11% probability of fixation [22]. A second eyetracking study showed that placing the annotation on top of the snippet block might mitigate this issue [22], but this study was conducted using mock-ups and with expert searchers. In this paper, we describe a study conducted with a new eyetracking mixed-method using a live traffic search engine with the suggested design changes on real users using the same experimental procedures. The study comprised 11 subjects with an average of 18 tasks per subject using an eyetrace-assisted retrospective think-aloud protocol. Using a funnel analysis, we found that users are indeed more likely to notice the annotations with a 60% probability of fixation (if the annotation was in view). Moreover, we found no learning effects across search sessions but found significant differences in query types, with subjects having a lower chance of fixating on annotations for queries in the news category. In the interview portion of the study, users reported interesting "wow" moments as well as usefulness in recalling or re-finding content previously shared by oneself or friends. The results not only shed light on how social annotations should be designed in search engines, but also how users make use of social annotations to make decisions about which pages are useful and potentially trustworthy. View details
    Talking in Circles: Selective Sharing in Google+
    Sanjay Kairam
    Michael J. Brzozowski
    Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI ’12), ACM, New York, NY (2012), pp. 1065-1074
    Preview abstract Online social networks have become indispensable tools for information sharing, but existing ‘all-or-nothing’ models for sharing have made it difficult for users to target information to specific parts of their networks. In this paper, we study Google+, which enables users to selectively share content with specific ‘Circles’ of people. Through a combination of log analysis with surveys and interviews, we investigate how active users organize and select audiences for shared content. We find that these users frequently engaged in selective sharing, creating circles to manage content across particular life facets, ties of varying strength, and interest-based groups. Motivations to share spanned personal and informational reasons, and users frequently weighed ‘limiting’ factors (e.g. privacy, relevance, and social norms) against the desire to reach a large audience. Our work identifies implications for the design of selective sharing mechanisms in social networks. View details
    Social Annotations in Web Search
    Aditi Muralidharan
    Zoltan Gyongyi
    Proceedings of the 2012 ACM annual conference on Human Factors in Computing Systems (CHI '12), ACM, New York, NY, pp. 1085-1094
    Preview abstract We ask how to best present social annotations on search results, and attempt to find an answer through mixed-method eye-tracking and interview experiments. Current practice is anchored on the assumption that faces and names draw attention; the same presentation format is used independently of the social connection strength and the search query topic. The key findings of our experiments indicate room for improvement. First, only certain social contacts are useful sources of information, depending on the search topic. Second, faces lose their well-documented power to draw attention when rendered small as part of a social search result annotation. Third, and perhaps most surprisingly, social annotations go largely unnoticed by users in general due to selective, structured visual parsing behaviors specific to search result pages. We conclude by recommending improvements to the design and content of social annotations to make them more noticeable and useful. View details
    2nd Workshop on context-awareness in retrieval and recommendation: (CaRR 2012)
    E.W. De Luca
    M. Böhmer
    A. Said
    Proceedings of the 2012 ACM international conference on Intelligent User Interfaces, pp. 409-412
    RepliCHI SIG: from a panel to a new submission venue for replication
    Max Wilson
    Wendy Mackay
    Michael Bernstein
    Jeffrey Nichols
    Proceedings of the 2012 ACM annual conference extended abstracts on Human Factors in Computing Systems Extended Abstracts, ACM, New York, NY, USA, pp. 1185-1188
    Apples to oranges?: comparing across studies of open collaboration/peer production
    Judd Antin
    James Howison
    Sharoda Paul
    Aaron Shaw
    Jude Yew
    WikiSym '11: Proceedings of the 7th International Symposium on Wikis and Open Collaboration, ACM, New York, NY, USA (2011), pp. 227-228
    VisualWikiCurator: human and machine intelligence for organizing wiki content
    Nicholas Kong
    Ben Hanrahan
    Thiébaud Weksteen
    Gregorio Convertino
    IUI '11: Proceedings of the 16th international conference on Intelligent user interfaces, ACM, New York, NY, USA (2011), pp. 367-370
    From slacktivism to activism: participatory culture in the age of social media
    D. Rotman
    S. Vieweg
    S. Yardi
    J. Preece
    B. Shneiderman
    P. Pirolli
    T. Glaisyer
    Proceedings of the 2011 annual conference extended abstracts on Human factors in computing systems, Part 2, pp. 819-822
    Tweets from Justin Bieber's heart: the dynamics of the location field in user profiles
    Brent Hecht
    Lichan Hong
    Bongwon Suh
    CHI '11: Proceedings of the 2011 annual conference on Human factors in computing systems, ACM, New York, NY, USA, pp. 237-246
    Mail2Wiki: posting and curating Wiki content from email
    Benjamin V. Hanrahan
    Thiebaud Weksteen
    Nicholas Kong
    Gregorio Convertino
    Guillaume Bouchard
    Cedric Archambeau
    IUI '11: Proceedings of the 16th international conference on Intelligent user interfaces, ACM, New York, NY, USA (2011), pp. 441-442
    RepliCHI-CHI should be replicating and validating results more: discuss
    M.L. Wilson
    W. Mackay
    M. Bernstein
    D. Russell
    H. Thimbleby
    Proceedings of the 2011 annual conference extended abstracts on Human factors in computing systems, Part 2, pp. 463-466
    Transferability of research findings: context-dependent or model-driven
    Mary Czerwinski
    David Millen
    Dave Randall
    Gunnar Stevens
    Volker Wulf
    John Zimmermann
    CHI EA '11: Proceedings of the 2011 annual conference extended abstracts on Human factors in computing systems, ACM, New York, NY, USA, pp. 651-654
    Mail2Wiki: low-cost sharing and early curation from email to wikis
    Ben Hanrahan
    Guillaume Bouchard
    Gregorio Convertino
    Thiebaud Weksteen
    Nicholas Kong
    Cedric Archambeau
    C&T '11: Proceedings of the 5th International Conference on Communities and Technologies, ACM, New York, NY, USA (2011), pp. 98-107
    VisualWikiCurator: a corporate Wiki plugin
    N. Kong
    G. Convertino
    B. Hanrahan
    Proceedings of the 2011 annual conference extended abstracts on Human factors in computing systems, pp. 1549-1554
    RepliCHI - CHI should be replicating and validating results more: discuss
    Max L. Wilson
    Wendy Mackay
    Michael Bernstein
    Dan Russell
    Harold Thimbleby
    CHI EA '11: Proceedings of the 2011 annual conference extended abstracts on Human factors in computing systems, ACM, New York, NY, USA, pp. 463-466
    Crowdsourcing and human computation: systems, studies and platforms
    Michael Bernstein
    Lydia Chilton
    Björn Hartmann
    Aniket Kittur
    Robert C. Miller
    CHI EA '11: Proceedings of the 2011 annual conference extended abstracts on Human factors in computing systems, ACM, New York, NY, USA, pp. 53-56
    Festschrift panel in honor of Stuart K. Card
    P. Pirolli
    B. John
    J. Olson
    D. Russell
    T. Moran
    Proceedings of the 2011 annual conference on Human factors in computing systems
    Speak little and well: recommending conversations in online social streams
    J. Chen
    R. Nairn
    Proceedings of the 2011 annual conference on Human factors in computing systems, pp. 217-226
    Reviewing peer review
    Jeannette M. Wing
    Commun. ACM, vol. 54 (2011), pp. 10-11
    The trouble with social computing systems research
    Michael S. Bernstein
    Mark S. Ackerman
    Robert C. Miller
    CHI EA '11: Proceedings of the 2011 annual conference extended abstracts on Human factors in computing systems, ACM, New York, NY, USA, pp. 389-398
    Mail2Wiki: low-cost sharing and organization on wikis
    B. Hanrahan
    Guillaume Bouchard
    G. Convertino
    T. Weksteen
    N. Kong
    C. Archambeau
    Proceedings of the 5th International Conference on Communities and Technologies, ACM, Brisbane, Australia (2011), pp. 98-107
    Mail2tag: Efficient targeting of news in an organization
    L. Nelson
    R. Nairn
    CSCW 2010 Workshop Paper: Collective Intelligence In Organizations (2010)
    An elaborated model of social search
    Brynn M. Evans
    Inf. Process. Manage., vol. 46 (2010), pp. 656-678
    Expanding CS education; improving software development
    Ruben Ortega
    Commun. ACM, vol. 53 (2010), pp. 8-9
    Eddi: interactive topic-based browsing of social status streams
    Michael S. Bernstein
    Bongwon Suh
    Lichan Hong
    Jilin Chen
    Sanjay Kairam
    UIST '10: Proceedings of the 23rd annual ACM symposium on User interface software and technology, ACM, New York, NY, USA (2010), pp. 303-312
    FeedWinnower: layering structures over collections of information streams
    Lichan Hong
    Gregorio Convertino
    Bongwon Suh
    Sanjay Kairam
    CHI '10: Proceedings of the 28th international conference on Human factors in computing systems, ACM, New York, NY, USA (2010), pp. 947-950
    Clorg: collective intelligence in organizations
    Gregorio Convertino
    Antonietta Grasso
    Giorgio De Michelis
    David R. Millen
    GROUP '10: Proceedings of the 16th ACM international conference on Supporting group work, ACM, New York, NY, USA (2010), pp. 355-358
    Short and tweet: experiments on recommending content from information streams
    J. Chen
    R. Nairn
    L. Nelson
    M. Bernstein
    Proceedings of the 28th international conference on Human factors in computing systems (2010), pp. 1185-1194
    Want to be Retweeted? Large Scale Analytics on Factors Impacting Retweet in Twitter Network
    Bongwon Suh
    Lichan Hong
    Peter Pirolli
    SOCIALCOM '10: Proceedings of the 2010 IEEE Second International Conference on Social Computing, IEEE Computer Society, Washington, DC, USA, pp. 177-184
    Designing a cross-channel information management tool for workers in enterprise task forces
    Gregorio Convertino
    Sanjay Kairam
    Lichan Hong
    Bongwon Suh
    AVI '10: Proceedings of the International Conference on Advanced Visual Interfaces, ACM, New York, NY, USA (2010), pp. 103-110
    A comparison of generated Wikipedia profiles using social labeling and automatic keyword extraction
    T. Russell
    B. Suh
    Fourth International Conference on Weblogs and Social Media (ICWSM'10), Washington, DC (2010)
    Advancing the Design of Technology-Mediated Social Participation Systems
    Sean Munson
    Gerhard Fischer
    Sarah Vieweg
    Cynthia Parr
    Computer, vol. 43 (2010), pp. 29-35
    The chaos of the internet as an external brain; and more
    Greg Linden
    Mark Guzdial
    Commun. ACM, vol. 53 (2010), pp. 10-11
    What's in Wikipedia?: mapping topics and conflict using socially annotated category structure
    Aniket Kittur
    Bongwon Suh
    CHI '09: Proceedings of the 27th international conference on Human factors in computing systems, ACM, New York, NY, USA (2009), pp. 1509-1512
    Collaborative filtering is not enough? Experiments with a mixed-model recommender for leisure activities
    N. Ducheneaut
    K. Partridge
    Q. Huang
    B. Price
    M. Roberts
    V. Bellotti
    B. Begole
    User Modeling, Adaptation, and Personalization (2009), pp. 295-306
    Annotate once, appear anywhere: collective foraging for snippets of interest using paragraph fingerprinting
    Lichan Hong
    CHI '09: Proceedings of the 27th international conference on Human factors in computing systems, ACM, New York, NY, USA (2009), pp. 1791-1794
    Information Seeking Can Be Social
    Computer, vol. 42 (2009), pp. 42-46
    The singularity is not near: slowing growth of Wikipedia
    Bongwon Suh
    Gregorio Convertino
    Peter Pirolli
    WikiSym '09: Proceedings of the 5th International Symposium on Wikis and Open Collaboration, ACM, New York, NY, USA (2009), pp. 1-10
    Augmented social cognition: using social web technology to enhance the ability of groups to remember, think, and reason
    SIGMOD '09: Proceedings of the 35th SIGMOD international conference on Management of data, ACM, New York, NY, USA (2009), pp. 973-984
    A Position Paper on 'Living Laboratories': Rethinking Ecological Designs and Experimentation in Human-Computer Interaction
    Proceedings of the 13th International Conference on Human-Computer Interaction. Part I, Springer-Verlag, Berlin, Heidelberg (2009), pp. 597-605
    Activity Awareness and Social Sensemaking 2.0: Design of a Task Force Workspace
    G. Convertino
    L. Hong
    L. Nelson
    P. Pirolli
    Foundations of Augmented Cognition. Neuroergonomics and Operational Neuroscience (2009), pp. 128-137
    Signpost from the masses: learning effects in an exploratory social tag search browser
    Yvonne Kammerer
    Rowan Nairn
    Peter Pirolli
    CHI '09: Proceedings of the 27th international conference on Human factors in computing systems, ACM, New York, NY, USA (2009), pp. 625-634
    A position paper on 'living laboratories': Rethinking ecological designs and experimentation in human-computer interaction
    Human-Computer Interaction. New Trends (2009), pp. 597-605
    With a little help from my friends: examining the impact of social annotations in sensemaking tasks
    Les Nelson
    Christoph Held
    Peter Pirolli
    Lichan Hong
    Diane Schiano
    CHI '09: Proceedings of the 27th international conference on Human factors in computing systems, ACM, New York, NY, USA (2009), pp. 1795-1798
    Impact on Performance and Process by a Social Annotation System: A Social Reading Experiment
    L. Nelson
    G. Convertino
    P. Pirolli
    L. Hong
    Foundations of Augmented Cognition. Neuroergonomics and Operational Neuroscience (2009), pp. 270-278
    Crowdsourcing for usability: Using micro-task markets for rapid, remote, and low-cost user measurements
    A. Kittur
    B. Suh
    Proc. CHI 2008 (2008)
    Lifting the veil: improving accountability and social transparency in Wikipedia with wikidashboard
    Bongwon Suh
    Aniket Kittur
    Bryan A. Pendleton
    CHI '08: Proceedings of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, ACM, New York, NY, USA (2008), pp. 1037-1040
    Can you ever trust a wiki?: impacting perceived trustworthiness in wikipedia
    Aniket Kittur
    Bongwon Suh
    CSCW '08: Proceedings of the 2008 ACM conference on Computer supported cooperative work, ACM, New York, NY, USA, pp. 477-480
    Towards a model of understanding social search
    Brynn M. Evans
    CSCW '08: Proceedings of the 2008 ACM conference on Computer supported cooperative work, ACM, New York, NY, USA, pp. 485-494
    Understanding the efficiency of social tagging systems using information theory
    Todd Mytkowicz
    HT '08: Proceedings of the nineteenth ACM conference on Hypertext and hypermedia, ACM, New York, NY, USA (2008), pp. 81-88
    The Social Web: Research and Opportunities
    Computer, vol. 41 (2008), pp. 88-91
    LATEST: A System for Active Learning About Emerging Science and Technology
    P. Pirolli
    L. Hong
    2008 HCIC Workshop, Fraser, Colorado
    SparTag.us: a low cost tagging system for foraging of web content
    Lichan Hong
    Raluca Budiu
    Peter Pirolli
    Les Nelson
    AVI '08: Proceedings of the working conference on Advanced visual interfaces, ACM, New York, NY, USA (2008), pp. 65-72
    The social (open) workspace
    David A. Evans
    Susan Feldman
    Nataša Milic-Frayling
    Igor Perisic
    CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management, ACM, New York, NY, USA (2008), pp. 1529-1529
    Crowdsourcing user studies with Mechanical Turk
    Aniket Kittur
    Bongwon Suh
    CHI '08: Proceedings of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, ACM, New York, NY, USA (2008), pp. 453-456
    Activity-based serendipitous recommendations with the Magitti mobile leisure guide
    Victoria Bellotti
    Bo Begole
    Nicolas Ducheneaut
    Ji Fang
    Ellen Isaacs
    Tracy King
    Mark W. Newman
    Bob Price
    Paul Rasmussen
    Michael Roberts
    Diane J. Schiano
    Alan Walendowski
    CHI '08: Proceedings of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, ACM, New York, NY, USA (2008), pp. 1157-1166
    Web usability
    Conference on Human Factors in Computing Systems: Proceedings of the SIGCHI conference on Human factors in computing systems (2007)
    He says, she says: conflict and coordination in Wikipedia
    Aniket Kittur
    Bongwon Suh
    Bryan A. Pendleton
    CHI '07: Proceedings of the SIGCHI conference on Human factors in computing systems, ACM, New York, NY, USA (2007), pp. 453-462
    Power of the few vs. wisdom of the crowd: Wikipedia and the rise of the bourgeoisie
    A. Kittur
    B.A. Pendleton
    B. Suh
    T. Mytkowicz
    World Wide Web, vol. 1 (2007), pp. 19
    Visual foraging of highlighted text: an eye-tracking study
    Michelle Gumbrecht
    Lichan Hong
    HCI'07: Proceedings of the 12th international conference on Human-computer interaction, Springer-Verlag, Berlin, Heidelberg (2007), pp. 589-598
    ScentIndex and ScentHighlights: productive reading techniques for conceptually reorganizing subject indexes and highlighting passages
    Lichan Hong
    Julie Heiser
    Stuart K. Card
    Michelle Gumbrecht
    Information Visualization, vol. 6 (2007), pp. 32-47
    Aspects of augmented social cognition: social information foraging and social search
    Peter Pirolli
    Shyong K. Lam
    OCSC'07: Proceedings of the 2nd international conference on Online communities and social computing, Springer-Verlag, Berlin, Heidelberg (2007), pp. 60-69
    Us vs. Them: Understanding Social Dynamics in Wikipedia with Revert Graph Visualizations
    Bongwon Suh
    Bryan A. Pendleton
    Aniket Kittur
    VAST '07: Proceedings of the 2007 IEEE Symposium on Visual Analytics Science and Technology, IEEE Computer Society, Washington, DC, USA, pp. 163-170
    Entity workspace: an evidence file that aids memory, inference, and reading
    Eric A. Bier
    Edward W. Ishak
    ISI'06: Proceedings of the 4th IEEE international conference on Intelligence and Security Informatics, Springer-Verlag, Berlin, Heidelberg (2006), pp. 466-472
    Entity quick click: rapid text copying based on automatic entity extraction
    Eric A. Bier
    Edward W. Ishak
    CHI EA '06: CHI '06 extended abstracts on Human factors in computing systems, ACM, New York, NY, USA (2006), pp. 562-567
    Entity quick click: rapid text copying based on automatic entity extraction
    E.A. Bier
    E.W. Ishak
    CHI'06 extended abstracts on Human factors in computing systems (2006), pp. 562-567
    Guest Editors' Introduction: Pervasive Computing in Sports Technologies
    Gaetano Borriello
    Guerney Hunt
    Nigel Davies
    IEEE Pervasive Computing, vol. 4 (2005), pp. 22-25
    Annotating 3D electronic books
    Lichan Hong
    Stuart K. Card
    CHI EA '05: CHI '05 extended abstracts on Human factors in computing systems, ACM, New York, NY, USA (2005), pp. 1463-1466
    ScentHighlights: highlighting conceptually-related sentences during reading
    Lichan Hong
    Michelle Gumbrecht
    Stuart K. Card
    IUI '05: Proceedings of the 10th international conference on Intelligent user interfaces, ACM, New York, NY, USA (2005), pp. 272-274
    Web interactions
    Conference on Human Factors in Computing Systems: Proceedings of the SIGCHI conference on Human factors in computing systems (2005)
    Information scent and web navigation: Theory, models and automated usability evaluation
    P. Pirolli
    W. Fu
    A. Farahat
    Human-Computer Interaction International Conference (2005)
    Introducing Wearable Force Sensors in Martial Arts
    IEEE Pervasive Computing, vol. 4 (2005), pp. 47-53
    "Killer App" of wearable computing: wireless force sensing body protectors for martial arts
    Jin Song
    Greg Corbin
    UIST '04: Proceedings of the 17th annual ACM symposium on User interface software and technology, ACM, New York, NY, USA (2004), pp. 277-285
    eBooks with indexes that reorganize conceptually
    Lichan Hong
    Julie Heiser
    Stuart K. Card
    CHI EA '04: CHI '04 extended abstracts on Human factors in computing systems, ACM, New York, NY, USA (2004), pp. 1223-1226
    3Book: a scalable 3D virtual book
    Stuart K. Card
    Lichan Hong
    Jock D. Mackinlay
    CHI EA '04: CHI '04 extended abstracts on Human factors in computing systems, ACM, New York, NY, USA (2004), pp. 1095-1098
    3Book: a 3D electronic smart book
    Stuart K. Card
    Lichan Hong
    Jock D. Mackinlay
    AVI '04: Proceedings of the working conference on Advanced visual interfaces, ACM, New York, NY, USA (2004), pp. 303-307
    The bloodhound project: automating discovery of web usability issues using the InfoScentπ simulator
    Adam Rosien
    Gesara Supattanasiri
    Amanda Williams
    Christiaan Royer
    Celia Chow
    Erica Robles
    Brinda Dalal
    Julie Chen
    Steve Cousins
    CHI '03: Proceedings of the SIGCHI conference on Human factors in computing systems, ACM, New York, NY, USA (2003), pp. 505-512
    LumberJack: Intelligent discovery and analysis of web user traffic composition
    A. Rosien
    J. Heer
    WEBKDD 2002 - Mining Web Data for Discovering Usage Patterns and Profiles (2003), pp. 1-16
    ScentTrails: Integrating browsing and searching on the Web
    Christopher Olston
    ACM Trans. Comput.-Hum. Interact., vol. 10 (2003), pp. 177-197
    Mining the structure of user activity using cluster stability
    J. Heer
    Proceedings of the workshop on Web analytics, SIAM Conference on Data Mining (2002)
    A Framework for Visualizing Information (Human-Computer Interaction Series)
    Springer-Verlag New York, Inc., Secaucus, NJ, USA (2002)
    Separating the swarm: categorization methods for user sessions on the web
    Jeffrey Heer
    CHI '02: Proceedings of the SIGCHI conference on Human factors in computing systems, ACM, New York, NY, USA (2002), pp. 243-250
    Expressiveness of the data flow and data state models in visualization systems
    AVI '02: Proceedings of the Working Conference on Advanced Visual Interfaces, ACM, New York, NY, USA (2002), pp. 375-378
    Improving Web Usability Through Visualization
    IEEE Internet Computing, vol. 6 (2002), pp. 64-71
    Using information scent to model user information needs and actions on the Web
    Peter Pirolli
    Kim Chen
    James Pitkow
    CHI '01: Proceedings of the SIGCHI conference on Human factors in computing systems, ACM, New York, NY, USA (2001), pp. 490-497
    Enhancing a digital book with a reading recommender
    Allison Woodruff
    Rich Gossweiler
    James Pitkow
    Stuart K. Card
    CHI '00: Proceedings of the SIGCHI conference on Human factors in computing systems, ACM, New York, NY, USA (2000), pp. 153-160
    The scent of a site: a system for analyzing and predicting information scent, usage, and usability of a Web site
    Peter Pirolli
    James Pitkow
    CHI '00: Proceedings of the SIGCHI conference on Human factors in computing systems, ACM, New York, NY, USA (2000), pp. 161-168
    A Taxonomy of Visualization Techniques Using the Data State Reference Model
    INFOVIS '00: Proceedings of the IEEE Symposium on Information Visualization 2000, IEEE Computer Society, Washington, DC, USA, pp. 69
    A framework for information visualization spreadsheets
    Ph.D. Thesis (1999)
    Sensemaking of Evolving Web Sites Using Visualization Spreadsheets
    Stuart K. Card
    INFOVIS '99: Proceedings of the 1999 IEEE Symposium on Information Visualization, IEEE Computer Society, Washington, DC, USA, pp. 18
    Visualizing the evolution of Web ecologies
    James Pitkow
    Jock Mackinlay
    Peter Pirolli
    Rich Gossweiler
    Stuart K. Card
    CHI '98: Proceedings of the SIGCHI conference on Human factors in computing systems, ACM Press/Addison-Wesley Publishing Co., New York, NY, USA (1998), pp. 400-407
    An Operator Interaction Framework for Visualization Systems
    John Riedl
    INFOVIS '98: Proceedings of the 1998 IEEE Symposium on Information Visualization, IEEE Computer Society, Washington, DC, USA, pp. 63-70
    Principles for Information Visualization Spreadsheets
    John Riedl
    Phillip Barry
    Joseph Konstan
    IEEE Comput. Graph. Appl., vol. 18 (1998), pp. 30-38
    A spreadsheet approach to information visualization
    P. Barry
    J. Riedl
    J. Konstan
    INFOVIS '97: Proceedings of the 1997 IEEE Symposium on Information Visualization (InfoVis '97), IEEE Computer Society, Washington, DC, USA, pp. 17
    Flexible information visualization of multivariate data from biological sequence similarity searches
    John Riedl
    Elizabeth Shoop
    John V. Carlis
    Ernest Retzel
    Phillip Barry
    VIS '96: Proceedings of the 7th conference on Visualization '96, IEEE Computer Society Press, Los Alamitos, CA, USA (1996), pp. 133 ff.
    Visualization of Biological Sequence Similarity Search Results
    Phillip Barry
    Elizabeth Shoop
    John V. Carlis
    Ernest Retzel
    John Riedl
    VIS '95: Proceedings of the 6th conference on Visualization '95, IEEE Computer Society, Washington, DC, USA (1995), pp. 44
    Arabidopsis thaliana expressed sequence tags: Generation, analysis and dissemination
    T. Newman
    E.F. Retzel
    E. Shoop
    C. Somerville
    Plant Genome III: International Conference on the Status of Plant Genome Research (1995)
    Implementation and testing of an automated EST processing and similarity analysis system
    E. Shoop
    J. Carlis
    P. Bieganski
    J. Riedl
    N. Dalton
    T. Newman
    E. Retzel
    Proceedings of the Twenty-Eighth Hawaii International Conference on System Sciences (1995), pp. 52-61