
Oleksandr Zinenko

Oleksandr "Alex" Zinenko is a software engineer at Google Brain (Systems and Programming Research) in Paris. Before joining Google, he worked as a research engineer in Inria (French National Institute for Computer Science and Applied Mathematics) and École Normale Supérieure and taught at the University Paris-Saclay. He obtained his PhD in Computer Science from the University Paris-Saclay for the work on Interactive Program Restructuring to provide convenient, discoverable and intuitive way to manipulate imperative programs through interactive visualization; and his MS in Computer Engineering from National Technical University of Ukraine "Kyiv Polytechnic Institute" for his implementations of numerical algorithms for heterogeneous distributed systems.

Oleksandr's research interests span from compilation to high-performance systems, and from interactive software visualization to machine learning, under the common goal of making domain-specific software development easier and the resulting software faster, with a particular focus on machine learning software. His previous research centered on polyhedral compilation, an approach that reformulates the search for optimizing transformations of certain practical classes of programs as mathematical optimization problems. Oleksandr proposed a series of explanatory and exploratory approaches to polyhedral compilation, making it more accessible to modern search and learning techniques. He currently focuses on applying compiler technology to machine learning infrastructure and, conversely, on improving compilation techniques, program efficiency and programming workflows using novel machine learning approaches.

Authored Publications
    Abstract: MLIR has recently introduced support for declaratively specifying and controlling compiler transformations via the transform dialect. It allows one to request compiler transformations using compiler IR itself, which can be embedded into the original IR that is being transformed (similarly to pragmas) or supplied separately (similarly to scheduling languages). This tutorial presents the concepts of the MLIR transform dialect and related infrastructure. It will be accompanied by a practical demonstration of three use scenarios:
    - Composing transform dialect operations available in (upstream) MLIR to perform a sequence of optimizing transformations that results in efficient code for an MLIR linear algebra operation.
    - Defining new transform dialect operations and adapting existing transformation code to work with the transform dialect infrastructure.
    - Setting up and using the transform dialect infrastructure in a downstream out-of-tree project with custom dialects, transformations and passes.
    After following the tutorial, the attendees will be able to apply the transform dialect in their work and extend it when necessary. Basic familiarity with MLIR is a prerequisite.
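    For illustration, a minimal transform dialect script could look like the sketch below: it matches every linalg.matmul in the payload IR and tiles it with a loop. The op spellings (transform.named_sequence, transform.structured.match, transform.structured.tile_using_for) follow recent upstream MLIR and have changed across versions, so this is a sketch rather than the tutorial's exact material.

      // Sketch: a transform script embedded in its own module. The payload IR
      // (the program being transformed) is passed in as %root.
      module attributes {transform.with_named_sequence} {
        transform.named_sequence @__transform_main(
            %root: !transform.any_op {transform.readonly}) {
          // Match payload operations by name.
          %matmuls = transform.structured.match ops{["linalg.matmul"]} in %root
              : (!transform.any_op) -> !transform.any_op
          // Tile the matched operations; yields handles to the tiled op and
          // the generated loop, usable by further transform ops.
          %tiled, %loop = transform.structured.tile_using_for %matmuls tile_sizes [8]
              : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
          transform.yield
        }
      }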
    Structured Operations: Modular Design of Code Generators for Tensor Compilers
    Nicolas Vasilache
    Mahesh Ravishankar
    Thomas Raoux
    Alexander Belyaev
    Tobias Gysi
    Stephan Herhut
    Stella Laurenzo
    LCPC 2022, Springer (2023)
    Abstract: The performance of machine learning systems heavily relies on code generators tailored to tensor computations. We propose an approach to the design and implementation of such code generators leveraging the natural structure of tensor algebra and illustrating the progressive lowering of domain-specific abstractions in the MLIR infrastructure.
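    As a concrete (illustrative, not from the paper) example of the structure being exploited, a linalg operation on tensors carries the whole computation as one structured op, which tiling, fusion, vectorization and bufferization then lower progressively toward loops and target code:

      // A structured operation at the tensor level of abstraction.
      func.func @matmul(%A: tensor<128x64xf32>, %B: tensor<64x32xf32>,
                        %C: tensor<128x32xf32>) -> tensor<128x32xf32> {
        // linalg.matmul encodes the iteration space and access pattern
        // structurally, so code generation can reason about it directly.
        %0 = linalg.matmul ins(%A, %B : tensor<128x64xf32>, tensor<64x32xf32>)
                           outs(%C : tensor<128x32xf32>) -> tensor<128x32xf32>
        return %0 : tensor<128x32xf32>
      }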
    Code Generation for Data-Dependent Stencils
    Mohammed Essadki
    Bertrand Michel
    Bruno Maugars
    Nicolas Vasilache
    CGO, IEEE (2023)
    Abstract: Numerical simulation often resorts to iterative in-place stencils such as the Gauss-Seidel or Successive Overrelaxation (SOR) methods. Writing high-performance implementations of such stencils requires significant effort and time; it also involves non-local transformations beyond the stencil kernel itself. While automated code generation is a mature technology for image processing stencils, convolutions and out-of-place iterative stencils (such as the Jacobi method), the optimization of in-place stencils requires manual craftsmanship. Building on recent advances in tensor compiler construction, we propose the first domain-specific code generator for iterative in-place stencils. Starting from a generic tensor compiler implemented in the MLIR framework, tensor abstractions are incrementally refined and lowered down to parallel, tiled, fused and vectorized code. We used our generator to implement a realistic, implicit solver for structured meshes, and demonstrate results competitive with an industrial computational fluid dynamics framework. We also compare with stand-alone stencil kernels for dense tensors.
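    For context, the in-place dependence pattern that separates these methods from out-of-place ones like Jacobi shows up in the standard successive overrelaxation update (textbook form, not taken from the paper):

      x_i^{(k+1)} = (1 - \omega)\, x_i^{(k)}
        + \frac{\omega}{a_{ii}} \Big( b_i - \sum_{j < i} a_{ij}\, x_j^{(k+1)} - \sum_{j > i} a_{ij}\, x_j^{(k)} \Big)

    Setting \omega = 1 recovers plain Gauss-Seidel. The reuse of already-updated values x_j^{(k+1)} within the same sweep is precisely what makes the stencil in-place and its loop nest hard to tile and parallelize automatically.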
    High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs
    William S. Moses
    Ivan R. Ivanov
    Jens Domke
    Toshio Endo
    Johannes Doerfert
    Proceedings of the International Conference on Principles and Practice of Parallel Programming (PPoPP), ACM (2023)
    Abstract: While parallelism remains the main source of performance, architectural implementations and programming models change with each new hardware generation, often leading to costly application re-engineering. Most tools for performance portability require manual and costly application porting to yet another programming model. We propose an alternative approach that automatically translates programs written in one programming model (CUDA) into another (CPU threads), based on Polygeist/MLIR. Our approach includes a representation of parallel constructs that allows conventional compiler transformations to apply transparently and without modification, and that enables parallelism-specific optimizations. We evaluate our framework by transpiling and optimizing the CUDA Rodinia benchmark suite for a multi-core CPU, achieving a 76% geomean speedup over handwritten OpenMP code. Further, we show how CUDA kernels from PyTorch can efficiently run and scale on the CPU-only Supercomputer Fugaku without user intervention. Our PyTorch compatibility layer, which makes use of transpiled CUDA PyTorch kernels, outperforms the PyTorch native CPU backend by 2.7x.
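    The representational idea can be sketched in MLIR as nested parallel loops standing in for the CUDA grid/block hierarchy, so that ordinary loop transformations (fusion, tiling, serializing the inner loop) apply directly. The kernel, names and shapes below are hypothetical:

      // Schematic: a 1-D saxpy-style CUDA kernel re-expressed as parallel
      // loops over blocks and threads.
      func.func @saxpy(%X: memref<?xf32>, %Y: memref<?xf32>, %a: f32,
                       %nblocks: index, %nthreads: index) {
        %c0 = arith.constant 0 : index
        %c1 = arith.constant 1 : index
        scf.parallel (%b) = (%c0) to (%nblocks) step (%c1) {
          scf.parallel (%t) = (%c0) to (%nthreads) step (%c1) {
            // Recover the global index from block and thread ids.
            %base = arith.muli %b, %nthreads : index
            %i = arith.addi %base, %t : index
            %x = memref.load %X[%i] : memref<?xf32>
            %y = memref.load %Y[%i] : memref<?xf32>
            %ax = arith.mulf %a, %x : f32
            %r = arith.addf %ax, %y : f32
            memref.store %r, %Y[%i] : memref<?xf32>
          }
        }
        return
      }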
    Polygeist: Affine C in MLIR
    William S. Moses
    Lorenzo Chelini
    Ruizhe Zhao
    (2021)
    Abstract: We present Polygeist, a new tool that reroutes polyhedral compilation flows to use the representation available in the recent MLIR compilation infrastructure. It consists of two parts: a C and C++ frontend capable of converting a wide variety of existing codes into MLIR suitable for polyhedral transformation, and a bi-directional conversion between MLIR's polyhedral representation and existing polyhedral exchange formats. We demonstrate the Polygeist flow by converting the entire Polybench/C benchmark suite into MLIR, and by performing an IR-to-IR optimization leveraging an existing polyhedral compiler (Pluto). Our flow produces results comparable to the state-of-the-art compiler, enabling direct comparison of source-to-source and IR-to-binary compilers. We believe Polygeist can improve the interoperation between MLIR and the existing polyhedral tooling, ultimately benefiting both the research and the production compiler communities.
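    For flavor, here is the kind of affine MLIR such a frontend could produce for a trivial C loop; this sketch uses today's upstream operation names, not verbatim Polygeist output:

      // C source:  for (int i = 0; i < N; ++i) A[i] = A[i] + 1.0f;
      func.func @inc(%A: memref<?xf32>, %N: index) {
        %one = arith.constant 1.0 : f32
        // The loop becomes an affine.for with a symbolic upper bound, making
        // it directly amenable to polyhedral analysis and transformation.
        affine.for %i = 0 to %N {
          %v = affine.load %A[%i] : memref<?xf32>
          %r = arith.addf %v, %one : f32
          affine.store %r, %A[%i] : memref<?xf32>
        }
        return
      }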
    Progressive Raising in Multi-level IR
    Lorenzo Chelini
    Andi Drebes
    Nicolas Vasilache
    Tobias Grosser
    Henk Corporaal
    International Conference on Code Generation and Optimization (CGO), ACM, February 27 - March 3, 2021 (virtual conference)
    Abstract: Multi-level intermediate representation (IR) rewriting promises to lower the cost of designing domain-specific compilers by providing a non-opinionated IR, thus making it possible to model the right abstraction level for the problem at hand. High-level abstractions are then lowered to low-level IR using progressive lowering (i.e., from higher-level representations down to the lowest in small steps across the abstraction levels). But progressive lowering works in a single direction: high-level operations can be transformed into operations at a lower level of abstraction, but low-level operations are never raised to high-level ones. Thus, the entry point into the lowering pipeline defines the highest level of abstraction for all subsequent transformations, potentially limiting the set of applicable optimizations. This is especially true for general-purpose languages that are not semantically rich enough to enter the higher parts of the lowering pipeline, precluding aggressive domain-specific optimizations. To enable effective domain-specific compilation via progressive lowering in a multi-level IR compiler, we propose Multi-Level Tactics. Multi-Level Tactics allows us to describe computational patterns and raise them to high-level abstractions declaratively. It enables a complementary path to progressive lowering, which we call progressive raising, hence extending the set of optimizations that can be performed on general-purpose languages in a multi-level IR compiler.
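    A hypothetical before/after pair illustrates the intent (written in current MLIR spelling; the tactics in the paper are specified declaratively rather than hard-coded):

      // Before raising: a reduction loop nest as lowered from a
      // general-purpose language.
      affine.for %i = 0 to 128 {
        affine.for %j = 0 to 128 {
          affine.for %k = 0 to 128 {
            %a = affine.load %A[%i, %k] : memref<128x128xf32>
            %b = affine.load %B[%k, %j] : memref<128x128xf32>
            %c = affine.load %C[%i, %j] : memref<128x128xf32>
            %p = arith.mulf %a, %b : f32
            %s = arith.addf %c, %p : f32
            affine.store %s, %C[%i, %j] : memref<128x128xf32>
          }
        }
      }
      // After raising: the recognized pattern becomes a single high-level
      // operation, re-enabling domain-specific optimizations.
      linalg.matmul ins(%A, %B : memref<128x128xf32>, memref<128x128xf32>)
                    outs(%C : memref<128x128xf32>)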
    Domain-Specific Multi-Level Rewriting for HPC: A Case Study with MLIR
    Tobias Gysi
    Christoph Mueller
    Stephan Andreas Herhut
    Eddie Davis
    Tobias Wicky
    Oliver Fuhrer
    Torsten Hoefler
    Tobias Grosser
    ACM Transactions on Architecture and Code Optimization, vol. 18, no. 4 (2021), 51:1-51:23
    Abstract: Peephole optimizations have proven to be effective on traditional compiler tasks such as instruction selection. MLIR raises the level of abstraction from machine instructions to high-level operations such as matrix multiplication or an entire stencil application. In this project, we want to show the effectiveness of peephole-style optimizations on domain-specific abstractions. We therefore optimize stencil programs from the weather and climate domain to demonstrate the effectiveness of the approach in the HPC space. As a vehicle to evaluate our idea, we introduce a high-level stencil dialect that models the data-flow of stencil programs. We deduce a set of high-level peephole optimizations to optimize stencil programs and implement a lowering to the GPU dialect of MLIR. The GPU dialect is a novel abstraction layer that allows compiler engineers to generate code that is performance-portable across different GPU architectures.
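    As a reminder of what such a dialect models (my example, not the paper's), a four-point Laplacian-style stencil combines constant-offset neighbor accesses on a grid:

      u'_{i,j} = \frac{1}{4} \left( u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1} \right)

    A stencil dialect represents exactly these offset accesses and their data-flow as first-class operations, which is what makes peephole-style rewriting and systematic GPU lowering tractable.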
    Polygeist: Raising C to Polyhedral MLIR
    William S. Moses
    Lorenzo Chelini
    Ruizhe Zhao
    International Conference on Parallel Architectures and Compilation Techniques (PACT 2021), IEEE Computer Society (2021), pp. 45-59
    Abstract: We present Polygeist, a new compilation flow that connects the MLIR infrastructure to cutting-edge polyhedral optimization tools. It consists of a C and C++ frontend capable of converting a broad range of existing codes into MLIR suitable for polyhedral transformation, and a bi-directional conversion between MLIR and the OpenScop exchange format. The Polygeist/MLIR intermediate representation, featuring high-level (affine) loop constructs and n-D arrays embedded into a static single assignment (SSA) substrate, enables an unprecedented combination of SSA-based and polyhedral optimizations. We illustrate this by proposing and implementing two extra transformations: statement splitting and reduction parallelization. Our evaluation demonstrates that Polygeist outperforms on average both an LLVM IR-level optimizer (Polly) and a source-to-source state-of-the-art polyhedral compiler (Pluto) on the Polybench/C benchmark suite in sequential (2.53× vs 1.41× and 2.34×) and parallel mode (9.47× vs 3.26× and 7.54×), thanks to the new representation and transformations.
    MLIR: Scaling Compiler Infrastructure for Domain Specific Computation
    Chris Lattner
    Mehdi Amini
    Uday Bondhugula
    River Riddle
    Tatiana Shpeisman
    Nicolas Vasilache
    CGO 2021
    Abstract: This work presents the MLIR compiler infrastructure, a novel approach to building reusable compiler infrastructure. MLIR aims to address software fragmentation, improve compilation for heterogeneous hardware, significantly reduce the cost of building domain-specific compilers, and aid in connecting existing compilers together. MLIR facilitates the design and implementation of code generators, translators and optimizers at different levels of abstraction and across application domains, hardware targets and execution environments. The scientific perspective on these challenges is twofold: 1) evaluating MLIR as an infrastructure that enables new research and educational approaches on programming languages, compilers, code generators, execution environments, hardware acceleration and codesign; and 2) discussing MLIR as a research artifact built for extension and evolution, raising its own design, semantics, algorithmic, system, engineering, and multi-disciplinary challenges. The paper presents the rationale for MLIR, its original design principles, structures and semantics, and validates these by surveying some of its applications.
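    A tiny example (mine, using today's upstream dialect names) shows the structure that enables this: every operation is namespaced by the dialect defining it, and dialects at different abstraction levels coexist in one module:

      // Operations from the func and arith dialects over a built-in tensor type.
      func.func @scale(%arg: tensor<4xf32>) -> tensor<4xf32> {
        %two = arith.constant dense<2.0> : tensor<4xf32>
        %res = arith.mulf %arg, %two : tensor<4xf32>
        return %res : tensor<4xf32>
      }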
    TC-CIM: Empowering Tensor Comprehensions for Computing-In-Memory
    Andi Drebes
    Lorenzo Chelini
    Henk Corporaal
    Tobias Grosser
    Kanishkan Vadivel
    Nicolas Vasilache
    IMPACT 2020 workshop (associated with HiPEAC 2020)
    Abstract: Memristor-based, non-von-Neumann architectures performing tensor operations directly in memory are a promising approach to address the ever-increasing demand for energy-efficient, high-throughput hardware accelerators for Machine Learning (ML) inference. A major challenge for the programmability and exploitation of such Computing-In-Memory (CIM) architectures is the efficient mapping of tensor operations from high-level ML frameworks to fixed-function hardware blocks implementing in-memory computations. We demonstrate the programmability of memristor-based accelerators with TC-CIM, a fully-automatic, end-to-end compilation flow from Tensor Comprehensions, a mathematical notation for tensor operations, to fixed-function memristor-based hardware blocks. Operations suitable for acceleration are identified using Tactics, a declarative framework to describe computational patterns in a polyhedral representation. We evaluate our compilation flow on a system-level simulator based on Gem5, incorporating crossbar arrays of memristive devices. Our results show that TC-CIM reliably recognizes tensor operations commonly used in ML workloads across multiple benchmarks and offloads these operations to the accelerator.
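    For reference, Tensor Comprehensions specifies such operations in index notation; the canonical matrix-multiplication comprehension C(m,n) +=! A(m,k) * B(k,n) corresponds to

      C_{mn} = \sum_{k} A_{mk} \, B_{kn}

    with the reduction over k inferred because k appears only on the right-hand side, and the accumulator initialized to the reduction identity by the "!" marker.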
    The Next 700 Accelerated Layers: From Mathematical Expressions of Network Computation Graphs to Accelerated GPU Kernels, Automatically
    Nicolas Vasilache
    Theodoros Theodoridis
    Priya Goyal
    Zachary Devito
    William S. Moses
    Sven Verdoolaege
    Andrew Adams
    ACM Transactions on Architecture and Code Optimization (TACO) (2019)
    Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
    Nicolas Vasilache
    Theodoros Theodoridis
    Priya Goyal
    Zachary DeVito
    William S. Moses
    Sven Verdoolaege
    Andrew Adams
    Facebook Artificial Intelligence Research (2018)