Manpreet Singh

Manpreet Singh is a Principal Engineer leading Data Processing in the Data Infrastructure and Analysis (DIA) team at Google. Over the last decade at Google, he has conceptualized, designed, implemented and launched multiple large-scale distributed systems, such as Photon and Ubiq. He is also responsible for building and maintaining the infrastructure for some of Google’s most revenue-critical data pipelines in Ads and Commerce. He defines the product vision and system architecture, and leads the team through development and launch. Before joining Google, he earned a PhD in Computer Science at Cornell University.
Authored Publications
    Ubiq: A Scalable and Fault-tolerant Log Processing Infrastructure
    Alexander Smolyanov
    Divy Agrawal
    Haifeng Jiang
    Manish Bhatia
    Monica Chawathe Lenart
    Namit Sikka
    Navin Melville
    Scott Holzer
    Shan He
    Shivakumar Venkataraman
    Tianhao Qiu
    Venkatesh Basker
    Vinny Ganeshan
    Yuri Vasilevski
    Workshop on Business Intelligence for the Real Time Enterprise (BIRTE), Springer (2016)
    Abstract: Most of today’s Internet applications are data-centric and generate vast amounts of data (typically in the form of event logs) that need to be processed and analyzed for detailed reporting, enhancing user experience and increasing monetization. In this paper, we describe the architecture of Ubiq, a geographically distributed framework for processing continuously growing log files in real time with high scalability, high availability and low latency. The Ubiq framework fully tolerates infrastructure degradation and datacenter-level outages without any manual intervention. It also guarantees exactly-once semantics for application pipelines that process logs in the form of event bundles. Ubiq has been in production for Google’s advertising system for many years and has served as a critical log processing framework for hundreds of pipelines. Our production deployment demonstrates linear scalability with machine resources, extremely high availability even with underlying infrastructure failures, and an end-to-end latency of under a minute.
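The abstract's exactly-once guarantee for event bundles can be illustrated with a minimal, single-process sketch. It assumes that each bundle carries a unique ID and that the pipeline's output and the record of committed bundle IDs are updated together; the names `EventBundle` and `ExactlyOnceProcessor` are hypothetical, and the in-memory set stands in for the durable, geographically replicated state a real deployment would need. This is not Ubiq's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EventBundle:
    # Hypothetical bundle: a unique ID plus the log events it contains.
    bundle_id: str
    events: tuple


@dataclass
class ExactlyOnceProcessor:
    # IDs of bundles whose output has been committed (an in-memory stand-in
    # for a durable, replicated store).
    committed: set = field(default_factory=set)
    # Application output: per-event-type counts.
    counts: dict = field(default_factory=dict)

    def process(self, bundle: EventBundle) -> bool:
        """Apply a bundle once; return False if it was already committed."""
        if bundle.bundle_id in self.committed:
            return False  # duplicate delivery, e.g. a retry after a failure
        # Compute the bundle's effect before touching the output...
        delta = {}
        for key in bundle.events:
            delta[key] = delta.get(key, 0) + 1
        # ...then apply the output and record the bundle ID together
        # (a single atomic transaction in a real system).
        for key, n in delta.items():
            self.counts[key] = self.counts.get(key, 0) + n
        self.committed.add(bundle.bundle_id)
        return True
```

Redelivering the same bundle, as a retry after a worker failure would, leaves the output unchanged:

```python
p = ExactlyOnceProcessor()
b = EventBundle("b1", ("click", "view", "click"))
p.process(b)     # True
p.process(b)     # False: the retry is ignored
print(p.counts)  # {'click': 2, 'view': 1}
```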
    Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams
    Rajagopal Ananthanarayanan
    Venkatesh Basker
    Sumit Das
    Haifeng Jiang
    Tianhao Qiu
    Alexey Reznichenko
    Deomid Ryabkov
    Shivakumar Venkataraman
    SIGMOD '13: Proceedings of the 2013 International Conference on Management of Data, ACM, New York, NY, USA, pp. 577-588
    Abstract: Web-based enterprises process events generated by millions of users interacting with their websites. Rich statistical data distilled from combining such interactions in near real time generates enormous business value. In this paper, we describe the architecture of Photon, a geographically distributed system for joining multiple continuously flowing streams of data in real time with high scalability and low latency, where the streams may be unordered or delayed. The system fully tolerates infrastructure degradation and datacenter-level outages without any manual intervention. Photon guarantees that there will be no duplicates in the joined output (at-most-once semantics) at any point in time, that most joinable events will be present in the output in real time (near-exact semantics), and exactly-once semantics eventually. Photon is deployed within Google Advertising System to join data streams such as web search queries and user clicks on advertisements. It produces joined logs that are used to derive key business metrics, including billing for advertisers. Our production deployment processes millions of events per minute at peak with an average end-to-end latency of less than 10 seconds. We also present challenges and solutions in maintaining large persistent state across geographically distant locations, and highlight the design principles that emerged from our experience.
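The combination of guarantees in the Photon abstract (no duplicates at any time, exactly-once eventually, unordered streams) can be illustrated with a minimal, single-process sketch that joins clicks to queries by a shared query ID. It assumes that every click has a unique ID and names the query it belongs to; `StreamJoiner` and its fields are hypothetical, and the in-memory `joined_ids` set stands in for the large persistent state that, per the abstract, Photon maintains across geographically distant locations. This is not Photon's actual design.

```python
from dataclasses import dataclass, field


@dataclass
class StreamJoiner:
    # query_id -> query payload: the stream that clicks are joined against.
    queries: dict = field(default_factory=dict)
    # Clicks whose query has not arrived yet (streams may be unordered).
    pending: list = field(default_factory=list)
    # Registry of click IDs already emitted; guards against duplicates.
    joined_ids: set = field(default_factory=set)
    # Joined output rows: (click_id, query, click).
    output: list = field(default_factory=list)

    def add_query(self, query_id: str, query: str) -> None:
        self.queries[query_id] = query
        self.retry_pending()  # clicks that arrived early may now join

    def add_click(self, click_id: str, query_id: str, click: str) -> None:
        self._try_join(click_id, query_id, click)

    def _try_join(self, click_id: str, query_id: str, click: str) -> None:
        if click_id in self.joined_ids:
            return  # already emitted: drop to keep at-most-once output
        query = self.queries.get(query_id)
        if query is None:
            # Not joinable yet: keep it and retry later (eventual join).
            self.pending.append((click_id, query_id, click))
            return
        self.joined_ids.add(click_id)
        self.output.append((click_id, query, click))

    def retry_pending(self) -> None:
        # Re-attempt every waiting click; still-unjoinable ones are re-queued.
        waiting, self.pending = self.pending, []
        for item in waiting:
            self._try_join(*item)
```

A click that arrives before its query is buffered, joined once the query shows up, and a later duplicate of the same click is dropped:

```python
j = StreamJoiner()
j.add_click("c1", "q1", "ad-7")  # click arrives before its query
j.add_query("q1", "shoes")       # query arrives; the pending click joins
j.add_click("c1", "q1", "ad-7")  # duplicate click is dropped
print(j.output)                  # [('c1', 'shoes', 'ad-7')]
```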