Jump to Content
Sean Augenstein

Sean Augenstein

Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Preview abstract How well do existing federated learning algorithms learn from client devices that return model updates with a significant time delay? Is it even possible to learn effectively from clients that report back minutes, hours, or days after being scheduled? We answer these questions by developing Monte Carlo simulations of client latency that are guided by real-world applications. We compare well-known synchronous optimization algorithms like FedAvg and FedAdam with the state-of-the-art asynchronous FedBuff algorithm, and discover that these existing approaches often struggle to learn from severely delayed clients. To improve upon these, we experiment with modifications including distillation regularization and exponential moving averages of model weights. Finally, we invent two new algorithms, FARe-DUST and FeAST-on-MSG, based on distillation and averaging, respectively. Experiments with the EMNIST, CIFAR-100, and StackOverflow benchmark federated learning tasks demonstrate that our new algorithms outperform existing ones in terms of accuracy for straggler clients, while also providing better trade-offs between training time and total accuracy. View details
    Learning to Generate Image Embeddings with User-level Differential Privacy
    Maxwell D. Collins
    Yuxiao Wang
    Sewoong Oh
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2023) (to appear)
    Preview abstract We consider training feature extractors with user-level differential privacy to map images to embeddings from large-scale supervised data. To achieve user-level differential privacy, federated learning algorithms are extended and applied to aggregate user partitioned data, together with sensitivity control and noise addition. We demonstrate a variant of federated learning algorithm with partial aggregation and private reconstruction can achieve strong privacy utility trade-offs. When a large scale dataset is provided, it is possible to train feature extractors with both strong utility and privacy guarantees by combining techniques such as public pretraining, virtual clients, and partial aggregation. View details
    Preview abstract Federated learning (FL) enables learning from decentralized privacy-sensitive data, with computations on raw data confined to take place at edge clients. This paper introduces mixed FL, which incorporates an additional loss term calculated at the coordinating server (while maintaining FL's private data restrictions). There are numerous benefits. For example, additional datacenter data can be leveraged to jointly learn from centralized (datacenter) and decentralized (federated) training data and better match an expected inference data distribution. Mixed FL also enables offloading some intensive computations (e.g., embedding regularization) to the server, greatly reducing communication and client computation load. For these and other mixed FL use cases, we present three algorithms: PARALLEL TRAINING, 1-WAY GRADIENT TRANSFER, and 2-WAY GRADIENT TRANSFER. We state convergence bounds for each, and give intuition on which are suited to particular mixed FL problems. Finally we perform extensive experiments on three tasks, demonstrating that mixed FL can blend training data to achieve an oracle's accuracy on an inference distribution, and can reduce communication and computation overhead by over 90%. Our experiments confirm theoretical predictions of how algorithms perform under different mixed FL problem settings. View details
    Preview abstract With privacy as a motivation, Federated Learning (FL) is an increasingly used paradigm where learning takes place collectively on edge devices, with user-generated training examples that never leave the device. These on-device training examples are gathered in situ during the course of users’ interactions with their devices, and thus are highly reflective of at least part of the inference data distribution. Yet gaps may still exist, where on-device training examples are lacking for some data inputs expected to be encountered at inference time. This paper proposes a way to mitigate these gaps: selective usage of datacenter data, mixed in with FL. By mixing decentralized (federated) and centralized (datacenter) data, we can form an effective training data distribution that better matches the inference data distribution, resulting in more useful models. View details
    Preview abstract To improve real-world applications of machine learning, experienced modelers develop intuition about their datasets, their models, and how the two interact. Manual inspection of raw data—of representative samples, of outliers, of misclassifications—is an essential tool in a) identifying and fixing problems in the data, b) generating new modeling hypotheses, and c) assigning or refining human-provided labels. However, manual data inspection is risky for privacy-sensitive datasets, such as those representing the behavior of real-world individuals. Furthermore, manual data inspection is impossible in the increasingly important setting of federated learning, where raw examples are stored at the edge and the modeler may only access aggregated outputs such as metrics or model parameters. This paper demonstrates that generative models—trained using federated methods and with formal differential privacy guarantees—can be used effectively to debug data issues even when the data cannot be directly inspected. We explore these methods in applications to text with differentially private federated RNNs and to images using a novel algorithm for differentially private federated GANs. View details
    Preview abstract We train a recurrent neural network language model using a distributed, on-device learning framework called federated learning for the purpose of next-word prediction in a virtual keyboard for smartphones. Server-based training using stochastic gradient descent is compared with training on client devices using the Federated Averaging algorithm. The federated algorithm, which enables training on a higher-quality dataset for this use case, is shown to achieve better prediction recall. This work demonstrates the feasibility and benefit of training language models on client devices without exporting sensitive user data to servers. The federated learning environment gives users greater control over their data and simplifies the task of incorporating privacy by default with distributed training and aggregation across a population of client devices. View details
    Preview abstract We train a recurrent neural network language model using a distributed, on-device learning framework called federated learning for the purpose of next-word prediction in a virtual keyboard for smartphones. Server-based training using stochastic gradient descent is compared with training on client devices using the FederatedAveraging algorithm. The federated algorithm, which enables training on a higher-quality dataset for this use case, is shown to achieve better prediction recall. This work demonstrates the feasibility and benefit of training language models on client devices without exporting sensitive user data to servers. The federated learning environment gives users greater control over their data and simplifies the task of incorporating privacy by default with distributed training and aggregation across a population of client devices. View details
    Preview abstract A mixed-integer linear program (MILP) approach to scheduling a constellation of Earth-imaging satellites is presented. The algorithm optimizes the assignment of imagery collects, image data downlinks, and "health \& safety" contacts, generating schedules for all satellites and ground stations in a network. Hardware-driven constraints (e.g., the limited agility of the satellites) and operations-driven constraints (e.g., guaranteeing a minimum contact frequency for each satellite) are both addressed. Of critical importance to the use of this algorithm in real-world operations, it runs fast enough to allow for human operator interaction. This is achieved by a novel partitioning of the problem into distinct MILPs for downlink scheduling and image scheduling, with a dynamic programming (DP) heuristic providing a stand-in for imaging activity when scheduling the downlinks. View details
    No Results Found