Shanqing Cai

Shanqing Cai is a Staff Software Engineer at Google. His current research areas include machine learning, large language models (LLMs), human-computer interfaces for text-entry, and accessibility.
Authored Publications
    SpeakFaster Observer: Long-Term Instrumentation of Eye-Gaze Typing for Measuring AAC Communication
    Richard Jonathan Noel Cave
    Bob MacDonald
    Jon Campbell
    Blair Casey
    Emily Kornman
    Daniel Vance
    Jay Beavers
    CHI23 Case Studies of HCI in Practice (2023) (to appear)
    Accelerating communication for users with severe motor and speech impairments, in particular for eye-gaze Augmentative and Alternative Communication (AAC) device users, is a long-standing area of research. However, observation of such users' communication over extended durations has been limited. This case study presents the real-world experience of developing and field-testing a tool for observing and curating the gaze-typing-based communication of a consenting eye-gaze AAC user with amyotrophic lateral sclerosis (ALS) from the perspective of researchers at the intersection of HCI and artificial intelligence (AI). With the intent to observe and accelerate eye-gaze typed communication, we designed a tool and a protocol called the SpeakFaster Observer to measure everyday conversational text entry by the consenting gaze-typing user, as well as several consenting conversation partners of the AAC user. We detail the design of the Observer software and data curation protocol, along with considerations for privacy protection. The deployment of the data protocol from November 2021 to April 2022 yielded a rich dataset of gaze-based AAC text entry in everyday contexts, consisting of 130+ hours of gaze keypresses and 5.5k+ curated speech utterances from the AAC user and the conversation partners. We present the key statistics of the data, including the speed (8.1±3.9 words per minute) and keypress saving rate (-0.18±0.87) of gaze typing, patterns of utterance repetition and reuse, and the temporal dynamics of conversation turn-taking in gaze-based communication. We share our findings and also open-source our data collection tools to further research in this domain.
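The two headline statistics above, typing speed and keypress saving rate, can be made concrete with a small sketch. It assumes the common text-entry conventions of five characters per word for words per minute and KSR = 1 - keypresses / characters in the final text; the paper's exact definitions may differ, and the function names and sample utterance are hypothetical. Under this definition a negative KSR, like the mean reported above, means more keys were pressed than the text contains, for example because of corrections.

```python
"""Hypothetical helpers for two text-entry metrics.

Assumptions (not taken from the paper): one "word" is 5 characters, and
KSR = 1 - keypresses / number of characters in the final text.
"""


def words_per_minute(text: str, duration_seconds: float) -> float:
    """Typing speed using the 5-characters-per-word convention."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return (len(text) / 5.0) / (duration_seconds / 60.0)


def keystroke_saving_rate(num_keypresses: int, text: str) -> float:
    """Fraction of keypresses saved relative to typing every character.

    Negative values mean more keypresses than characters were needed,
    e.g. because of deletions and corrections.
    """
    if not text:
        raise ValueError("text must be non-empty")
    return 1.0 - num_keypresses / len(text)


if __name__ == "__main__":
    utterance = "i need water please"  # hypothetical, 19 characters
    # 19 characters in 30 s -> (19 / 5) / 0.5 = 7.6 WPM
    print(round(words_per_minute(utterance, 30.0), 2))  # 7.6
    # 25 keypresses (including backspaces) for 19 characters -> negative KSR
    print(round(keystroke_saving_rate(25, utterance), 3))  # -0.316
```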
    Context-Aware Abbreviation Expansion Using Large Language Models
    Ajit Narayanan
    Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2022 (2022) (to appear)
    Motivated by the need for accelerating text entry in augmentative and alternative communication (AAC) for people with severe motor impairments, we propose a paradigm in which phrases are abbreviated aggressively as primarily word-initial letters. Our approach is to expand the abbreviations into full-phrase options by leveraging conversation context with the power of pretrained large language models (LLMs). Through zero-shot, few-shot, and fine-tuning experiments on four public conversation datasets, we show that for replies to the initial turn of a dialog, an LLM with 64B parameters is able to exactly expand over 70% of phrases with abbreviation length up to 10, leading to an effective keystroke saving rate of up to about 77% on these exact expansions. Including a small amount of context in the form of a single conversation turn more than doubles abbreviation expansion accuracies compared to having no context, an effect that is more pronounced for longer phrases. Additionally, the robustness of models against typo noise can be enhanced through fine-tuning on noisy data.
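A minimal sketch of the abbreviation paradigm and of how an exact expansion translates into keystroke savings. It assumes the simplest possible scheme (the lowercased first letter of each word, no spaces) and a hypothetical cost of one extra keypress to select the correct expansion; the paper's scheme may handle punctuation, casing, and option selection differently. The LLM that performs the expansion is not modeled here.

```python
"""Sketch: word-initial abbreviation and the keystroke saving it yields.

Assumptions (hypothetical, not the paper's exact scheme): the abbreviation
keeps the lowercased first letter of each word with no spaces, and picking
the correct expansion from the offered options costs `selection_cost` keys.
"""


def abbreviate(phrase: str) -> str:
    """Keep the lowercased first letter of each whitespace-separated word."""
    return "".join(word[0].lower() for word in phrase.split())


def ksr_for_exact_expansion(phrase: str, selection_cost: int = 1) -> float:
    """KSR when the abbreviation is expanded exactly back to `phrase`."""
    if not phrase:
        raise ValueError("phrase must be non-empty")
    keypresses = len(abbreviate(phrase)) + selection_cost
    return 1.0 - keypresses / len(phrase)


if __name__ == "__main__":
    reply = "I am doing well thank you"  # hypothetical reply, 25 characters
    print(abbreviate(reply))  # iadwty
    # 6 letters + 1 selection keypress instead of 25 characters
    print(round(ksr_for_exact_expansion(reply), 3))  # 0.72
```

Longer phrases spread the fixed selection cost over more characters, which is one reason savings grow with phrase length under this kind of scheme.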
    Severe speech impairments limit the precision and range of producible speech sounds. As a result, generic automatic speech recognition (ASR) and keyword spotting (KWS) systems are unable to accurately recognize the utterances produced by individuals with severe speech impairments. This paper describes an approach in which simple speech sounds, namely isolated open vowels (e.g., /a/), are used in lieu of more motorically demanding keywords. A neural network (NN) is trained to detect these isolated open vowels uttered by individuals with speech impairments against background noise. The NN is trained with a two-phase approach: the pre-training phase uses samples from unimpaired speakers along with samples of background noise and unrelated speech; the fine-tuning phase then uses vowel samples collected from individuals with speech impairments. This model can be built into an experimental mobile app that allows users to activate preconfigured actions such as alerting caregivers. Preliminary user testing indicates the model has the potential to be a useful and flexible emergency communication channel for motor- and speech-impaired individuals.
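The two-phase recipe (pre-train on plentiful generic data, then fine-tune on a small amount of target-speaker data starting from the pre-trained weights) can be illustrated with a toy model. This sketch swaps the paper's neural network for a hypothetical one-feature logistic-regression detector trained by plain gradient descent, and both datasets are synthetic: in the fine-tuning set the boundary between "vowel" and "background" has shifted, standing in for the acoustic differences of impaired speech.

```python
"""Toy two-phase training: pre-train a detector, then fine-tune it.

Hypothetical stand-in for the paper's NN: a 1-feature logistic regression.
All data below are synthetic (feature, label) pairs; label 1 = open vowel.
"""
import math
from typing import List, Tuple

Dataset = List[Tuple[float, int]]

# Synthetic "unimpaired speaker" data: classes separated around x = 0.
PRETRAIN: Dataset = [(1.0, 1), (2.0, 1), (3.0, 1), (-1.0, 0), (-2.0, 0), (-3.0, 0)]
# Synthetic "impaired speaker" data: the class boundary has shifted right.
FINETUNE: Dataset = [(1.5, 1), (2.0, 1), (2.5, 1), (0.25, 0), (0.5, 0), (1.0, 0)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(data: Dataset, w: float, b: float) -> float:
    """Mean binary cross-entropy of the model p = sigmoid(w * x + b)."""
    total = 0.0
    for x, y in data:
        p = min(max(sigmoid(w * x + b), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def train(data: Dataset, w: float = 0.0, b: float = 0.0,
          lr: float = 0.5, epochs: int = 300) -> Tuple[float, float]:
    """Full-batch gradient descent on log loss, starting from (w, b)."""
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y  # d(loss)/d(logit) for this sample
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def accuracy(data: Dataset, w: float, b: float) -> float:
    correct = sum((sigmoid(w * x + b) > 0.5) == (y == 1) for x, y in data)
    return correct / len(data)


if __name__ == "__main__":
    # Phase 1: pre-train from scratch on the generic data.
    w0, b0 = train(PRETRAIN)
    # Phase 2: fine-tune from the pre-trained weights with a smaller step size.
    w1, b1 = train(FINETUNE, w0, b0, lr=0.1, epochs=2000)
    print("before fine-tuning:", accuracy(FINETUNE, w0, b0))
    print("after fine-tuning: ", accuracy(FINETUNE, w1, b1))
```

The essential design choice is that phase 2 starts from `(w0, b0)` rather than from zero and uses a smaller learning rate, so it adapts the pre-trained detector instead of discarding what it learned.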
    TensorFlow.js: Machine Learning for the Web and Beyond
    Daniel Smilkov
    Nikhil Thorat
    Yannick Assogba
    Ann Yuan
    Nick Kreeger
    Ping Yu
    Kangyi Zhang
    Eric Nielsen
    Stan Bileschi
    Charles Nicholson
    Sandeep N. Gupta
    Sarah Sirajuddin
    Rajat Monga
    SysML, Palo Alto, CA, USA (2019)
    TensorFlow.js is a library for building and executing machine learning algorithms in JavaScript. TensorFlow.js models run in a web browser and in the Node.js environment. The library is part of the TensorFlow ecosystem, providing a set of APIs that are compatible with those in Python, allowing models to be ported between the Python and JavaScript ecosystems. TensorFlow.js has empowered a new set of developers from the extensive JavaScript community to build and deploy machine learning models and enabled new classes of on-device computation. This paper describes the design, API, and implementation of TensorFlow.js, and highlights some of the impactful use cases.
    Creating reliable, production-level machine learning systems brings on a host of concerns not found in small toy examples or even large offline research experiments. Testing and monitoring are key considerations for ensuring the production-readiness of an ML system, and for reducing technical debt of ML systems. But it can be difficult to formulate specific tests, given that the actual prediction behavior of any given model is difficult to specify a priori. In this paper, we present 28 specific tests and monitoring needs, drawn from experience with a wide range of production ML systems to help quantify these issues and present an easy to follow road-map to improve production readiness and pay down ML technical debt.
    What’s your ML test score? A rubric for ML production systems
    Eric Nielsen
    Michael Salib
    Reliable Machine Learning in the Wild - NIPS 2016 Workshop (2016)
    Using machine learning in real-world production systems is complicated by a host of issues not found in small toy examples or even large offline research experiments. Testing and monitoring are key considerations for assessing the production-readiness of an ML system. But how much testing and monitoring is enough? We present an ML Test Score rubric based on a set of actionable tests to help quantify these issues.
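To show how a rubric of this kind can turn test coverage into a single number, here is a sketch of one possible tally. The scoring scheme is an assumption for illustration: each test earns 0.5 points if run manually with documented results and 1 point if automated, points are summed per section, and the overall score is the minimum section score, so strength in one area cannot mask neglect of another. The section names and statuses are hypothetical.

```python
"""Sketch of a rubric tally (assumed scheme, hypothetical sections)."""
from enum import Enum
from typing import Dict, List


class CheckStatus(Enum):
    """Hypothetical per-test status and the points it earns."""
    NOT_DONE = 0.0
    MANUAL = 0.5      # run by hand, with results documented and shared
    AUTOMATED = 1.0   # run automatically on a repeated basis


def section_score(statuses: List[CheckStatus]) -> float:
    """Sum of the points earned by every test in one section."""
    return sum(status.value for status in statuses)


def overall_score(sections: Dict[str, List[CheckStatus]]) -> float:
    """Weakest-link aggregate: the lowest section score."""
    return min(section_score(tests) for tests in sections.values())


# Hypothetical sections and statuses, for illustration only.
EXAMPLE = {
    "data": [CheckStatus.AUTOMATED, CheckStatus.MANUAL, CheckStatus.NOT_DONE],
    "model": [CheckStatus.AUTOMATED, CheckStatus.AUTOMATED],
    "infrastructure": [CheckStatus.MANUAL, CheckStatus.MANUAL],
    "monitoring": [CheckStatus.NOT_DONE, CheckStatus.MANUAL],
}

if __name__ == "__main__":
    for name, tests in EXAMPLE.items():
        print(name, section_score(tests))
    print("overall:", overall_score(EXAMPLE))  # overall: 0.5
```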
    TensorFlow Debugger: Debugging Dataflow Graphs for Machine Learning
    Eric Nielsen
    Michael Salib
    Proceedings of the Reliable Machine Learning in the Wild - NIPS 2016 Workshop (2016)
    Debuggability is important in the development of machine-learning (ML) systems. Several widely used ML libraries, such as TensorFlow and Theano, are based on dataflow graphs. While offering important benefits such as facilitating distributed training, the dataflow graph paradigm makes debugging model issues more challenging than debugging in the more conventional procedural paradigm. In this paper, we present the design of the TensorFlow Debugger (tfdbg), a specialized debugger for ML models written in TensorFlow. tfdbg provides features to inspect runtime dataflow graphs and the state of intermediate graph elements ("tensors"), as well as to simulate stepping through the graph. We discuss the application of this debugger in development and testing use cases.
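The core idea of inspecting intermediate values in a dataflow graph and pausing partway through can be sketched without TensorFlow. The toy below is hypothetical and does not use the tfdbg API: a graph is a topologically ordered list of (name, op, inputs) nodes, a watch callback sees each intermediate value as it is produced, and returning True from the callback stops execution, loosely analogous to stepping to a node. The `stop_on_nonfinite` callback mirrors the common debugging task of finding the first node that produces inf or NaN.

```python
"""Toy dataflow-graph runner with a debugger-style watch hook (hypothetical)."""
import math
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical graph format (not TensorFlow's): (node name, op, input names),
# listed in topological order so every input exists before it is used.
Node = Tuple[str, Callable[..., float], List[str]]
Watch = Callable[[str, float], bool]

GRAPH: List[Node] = [
    ("mul", lambda x, w: x * w, ["x", "w"]),
    ("add", lambda m, b: m + b, ["mul", "b"]),
    ("relu", lambda a: max(a, 0.0), ["add"]),
]


def run_graph(nodes: List[Node], feeds: Dict[str, float],
              watch: Optional[Watch] = None) -> Dict[str, float]:
    """Execute nodes in order, calling watch(name, value) after each one.

    If watch returns True, execution stops right after that node.
    """
    values = dict(feeds)
    for name, op, inputs in nodes:
        values[name] = op(*(values[i] for i in inputs))
        if watch is not None and watch(name, values[name]):
            break
    return values


def make_recorder(log: List[Tuple[str, float]]) -> Watch:
    """Watch that records every intermediate value and never pauses."""
    def watch(name: str, value: float) -> bool:
        log.append((name, value))
        return False
    return watch


def stop_on_nonfinite(name: str, value: float) -> bool:
    """Pause at the first intermediate value that is inf or NaN."""
    return not math.isfinite(value)


if __name__ == "__main__":
    dumps: List[Tuple[str, float]] = []
    run_graph(GRAPH, {"x": 3.0, "w": 2.0, "b": 1.0}, make_recorder(dumps))
    print(dumps)  # [('mul', 6.0), ('add', 7.0), ('relu', 7.0)]
    # 1e308 * 10.0 overflows to inf, so execution pauses after "mul".
    bad = run_graph(GRAPH, {"x": 1e308, "w": 10.0, "b": 1.0}, stop_on_nonfinite)
    print(sorted(bad))  # ['b', 'mul', 'w', 'x']
```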