Ramprasad Venkataraman
Ram works on large-scale distributed software systems for resilient, near-real-time event processing at Google. His interests lie at the confluence of parallel algorithms for high-performance computing applications and runtime systems for managing concurrency, scalability, and performance. He is excited by trends at both ends of the computing spectrum, from multicore devices to extreme-scale TOP500 supercomputers.
Prior to joining Google, Ram worked on scientific and numerical HPC. He has contributed to the Charm++ parallel programming framework and to petascale computational software for several scientific domains.
Authored Publications
TRAM: Optimizing Fine-grained Communication with Topological Routing and Aggregation of Messages
Lukasz Wesolowski
A Gupta
Jae-Seung Yeom
Keith Bisset
Yanhua Sun
Pritish Jetley
Thomas Quinn
Laxmikant Kale
International Conference on Parallel Processing (2014)
Fine-grained communication in supercomputing applications often limits performance through high communication overhead and poor utilization of network bandwidth. This paper presents Topological Routing and Aggregation Module (TRAM), a library that optimizes fine-grained communication performance by routing and dynamically combining short messages. TRAM collects units of fine-grained communication from the application and combines them into aggregated messages with a common intermediate destination. It routes these messages along a virtual mesh topology mapped onto the physical topology of the network. TRAM improves network bandwidth utilization and reduces communication overhead. It is particularly effective in optimizing patterns with global communication and large message counts, such as all-to-all and many-to-many, as well as sparse, irregular, dynamic, or data-dependent patterns. We demonstrate how TRAM improves performance through theoretical analysis and experimental verification using benchmarks and scientific applications. We present speedups on petascale systems of 6x for communication benchmarks and up to 4x for applications.
Parallel Branch-and-Bound for Two-Stage Stochastic Integer Optimization
Akhil Langer
Udatta Palekar
Laxmikant Kale
IEEE International Conference on High Performance Computing (HiPC) (2013), pp. 266-275
OpenAtom: Ab-initio Molecular Dynamics for Petascale Platforms
Glenn Martyna
Eric Bohm
Laxmikant Kale
Abhinav Bhatele
Parallel Science and Engineering Applications: The Charm++ Approach, CRC Press (2013), pp. 79-104
Mapping Dense LU Factorization on Multicore Supercomputer Nodes
Jonathan Lifflander
Phil Miller
Anshu Arya
T Jones
Laxmikant Kale
IEEE International Parallel and Distributed Processing Symposium (IPDPS) (2012), pp. 596-606
Charm++ for Productivity and Performance: A Submission to the 2011 HPC Class II Challenge