Data Management Challenges in Production Machine Learning

Alkis Polyzotis

Martin A. Zinkevich

Steven Whang

Sudip Roy

Proceedings of the 2017 ACM International Conference on Management of Data, ACM, New York, NY, USA, pp. 1723-1726

Abstract

This tutorial discusses data-management issues that arise in the context of production ML pipelines. Informed by our own experience with such large-scale pipelines, we focus on issues related to validating, debugging, cleaning, understanding, and enriching training data. The goal of the tutorial is to bring forth these issues, draw connections to prior work in the database literature, and outline the open research questions that are not addressed by prior art. We believe that the data management community is well positioned to address these issues, and we hope to motivate the audience to look more closely at this area.

Research Areas

Data Management
Machine Intelligence
