Mariano Schain
Mariano, a Senior Software Engineer at Google, was for many years a Senior Member of the Technical Staff at Texas Instruments, serving in TI’s Broadband group as Software Manager, Chief Software Architect, and Manager of Methodologies. A Dan David Prize scholar (2014), he focuses his machine learning research on robustness, mainly in the theory and practice of strategies for autonomous software agents.
Authored Publications
Google Publications
Other Publications
We consider stochastic optimization with delayed gradients where, at each time step~$t$, the algorithm makes an update using a stale stochastic gradient from step $t - d_t$ for some arbitrary delay $d_t$.
This setting abstracts asynchronous distributed optimization where a central server receives gradient updates computed by worker machines. These machines can experience computation and communication loads that might vary significantly over time.
In the general non-convex smooth optimization setting, we give a simple and efficient algorithm that requires $O( \sigma^2/\epsilon^4 + \tau/\epsilon^2 )$ steps for finding an $\epsilon$-stationary point $x$, where $\tau$ is the \emph{average} delay $\smash{\frac{1}{T}\sum_{t=1}^T d_t}$ and $\sigma^2$ is the variance of the stochastic gradients.
This improves over previous work, which showed that stochastic gradient descent achieves the same rate but with respect to the \emph{maximal} delay $\max_{t} d_t$, which can be significantly larger than the average delay, especially in heterogeneous distributed systems.
Our experiments demonstrate the efficacy and robustness of our algorithm in cases where the delay distribution is skewed or heavy-tailed.
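The delayed-gradient setting can be illustrated with a toy simulation (a hypothetical sketch, not the paper's algorithm; the function names and constants are assumptions): at each step the server applies a stochastic gradient computed on the iterate from $d_t$ steps earlier.

```python
import random

def delayed_sgd(grad, x0, steps, delays, lr=0.05):
    """Run SGD where the update at step t uses a stochastic gradient
    evaluated at the stale iterate from step t - d_t (clipped to 0)."""
    xs = [x0]          # history of iterates, so stale ones can be looked up
    x = x0
    for t in range(steps):
        stale_t = max(0, t - delays[t])   # index of the stale iterate
        g = grad(xs[stale_t])             # gradient computed on a stale point
        x = x - lr * g
        xs.append(x)
    return x

# Minimize f(x) = x^2 with noisy gradients under skewed delays:
# mostly d_t = 1, but occasionally d_t = 50 (a slow worker).
random.seed(0)
grad = lambda x: 2 * x + random.gauss(0, 0.1)
steps = 2000
delays = [1 if random.random() < 0.95 else 50 for _ in range(steps)]
x_final = delayed_sgd(grad, x0=5.0, steps=steps, delays=delays)
```

Here the average delay stays small even though the maximal delay is 50, which is exactly the regime where an average-delay guarantee is much stronger than a maximal-delay one.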
Adversarial Robustness of Streaming Algorithms through Importance Sampling
Vladimir Braverman
Sandeep Silwal
Samson Zhou
Advances in Neural Information Processing Systems 34 (NeurIPS 2021)
Robustness against adversarial attacks has recently been at the forefront of algorithmic design for machine learning tasks. In the adversarial streaming model, an adversary gives an algorithm a sequence of adaptively chosen updates $u_1,\ldots,u_n$ as a data stream.
The goal of the algorithm is to compute or approximate some predetermined function for every prefix of the adversarial stream, but the adversary may generate future updates based on previous outputs of the algorithm.
In particular, the adversary may gradually learn the random bits internally used by an algorithm to manipulate dependencies in the input.
This is especially problematic because many important problems in the streaming model require randomized algorithms: they are known not to admit any deterministic algorithms that use sublinear space.
In this paper, we introduce adversarially robust streaming algorithms for central machine learning and algorithmic tasks, such as regression and clustering, as well as their more general counterparts: subspace embedding, low-rank approximation, and coreset construction.
For regression and other numerical linear algebra related tasks, we consider the row arrival streaming model.
Our results are based on a simple but powerful observation: sampling-based algorithms give rise to adversarial robustness, in contrast to sketching-based algorithms, which are very prevalent in the streaming literature but suffer from adversarial attacks.
In addition, we show that the well-known merge and reduce paradigm in streaming is adversarially robust.
Since the merge and reduce paradigm defines coreset constructions, we thus obtain robust algorithms for $k$-means, $k$-median, $k$-center, Bregman clustering, projective clustering, principal component analysis (PCA) and non-negative matrix factorization.
To the best of our knowledge, these are the first adversarially robust methods for these problems.
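A classic example of the sampling-based streaming primitives the abstract contrasts with sketching is reservoir sampling (this is an illustrative textbook algorithm, not the paper's construction): it maintains a uniform sample of $k$ items from a stream of unknown length in $O(k)$ space.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Maintain a uniform random sample of k items from a stream of
    unknown length (classic reservoir sampling, Algorithm R)."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)           # fill the reservoir first
        else:
            j = rng.randrange(i + 1)      # uniform index in [0, i]
            if j < k:
                sample[j] = item          # keeps every prefix's sample uniform
    return sample

random.seed(1)
sample = reservoir_sample(range(10_000), k=32)
```

Because each output item is a genuine element of the stream chosen by fresh randomness, sampling of this kind is a natural starting point for robustness against an adversary that adapts to past outputs.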
Scalable Learning of Non-Decomposable Objectives
Alan Mackey
Ariel Gordon
arXiv preprint arXiv:1608.04802 (2016)
Modern retrieval systems are often driven by an underlying machine learning model. The goal of such systems is to identify and possibly rank the few most relevant items for a given query or context. Thus, such systems are typically evaluated using a ranking-based performance metric such as the area under the precision-recall curve, the $F_\beta$ score, precision at fixed recall, etc. Obviously, it is desirable to train such systems to optimize the metric of interest.
In practice, due to the scalability limitations of existing approaches for optimizing such objectives, large-scale retrieval systems are instead trained to maximize classification accuracy, in the hope that performance as measured via the true objective will also be favorable. In this work we present a unified framework that, using straightforward building block bounds, allows for highly scalable optimization of a wide range of ranking-based objectives. We demonstrate the advantage of our approach on several real-life retrieval problems that are significantly larger than those considered in the literature, while achieving substantial improvement in performance over the accuracy-objective baseline.
History-Independent Distributed Multi-agent Learning
Amos Fiat
Algorithmic Game Theory: 9th International Symposium, SAGT 2016, Proceedings, Springer, pp. 77-89
How should we evaluate a rumor? We address this question in a setting where multiple agents seek an estimate of the probability, $b$, of some future binary event. A common uniform prior on $b$ is assumed. A rumor about $b$ meanders through the network, evolving over time. The rumor evolves not because of ill will or noise, but because agents incorporate private signals about $b$ before passing on the (modified) rumor. The loss to an agent is the (realized) square error of her opinion. Our setting adds strategic behavior, based on evidence regarding an exogenous event, to current models of rumor/influence propagation in social networks. We study a simple Exponential Moving Average (EMA) rule for combining an agent's own evidence with trusted advice (the rumor), quantifying its resulting performance and comparing it to the optimum achievable by the Bayes posterior with access to the agents' private signals. We study the quality of $p_T$, the prediction of the last agent along a chain of $T$ rumor-mongering agents. The prediction $p_T$ can be viewed as an aggregate estimator of $b$ that depends on the private signals of $T$ agents.
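A minimal sketch of the EMA rule described above (illustrative; the blending weight `alpha`, the signal model, and all names are assumptions, not the paper's): each agent along the chain blends the incoming opinion with a private binary signal whose bias is the true probability $b$.

```python
import random

def ema_chain(b, T, alpha=0.3, prior=0.5, rng=random):
    """Pass a rumor down a chain of T agents; agent t blends the
    incoming opinion with a private Bernoulli(b) signal via an EMA."""
    p = prior                                       # mean of the uniform prior
    for _ in range(T):
        signal = 1.0 if rng.random() < b else 0.0   # private evidence
        p = (1 - alpha) * p + alpha * signal        # EMA update
    return p

# Average p_T over many independent chains to see it tracks b.
random.seed(2)
b = 0.7
estimates = [ema_chain(b, T=200) for _ in range(500)]
mean_est = sum(estimates) / len(estimates)
```

On average $p_T$ tracks $b$, while any single chain retains variance of order $\frac{\alpha}{2-\alpha} b(1-b)$, which is the kind of gap one would compare against the Bayes-optimal aggregate.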
Robust Domain Adaptation
Annals of Mathematics and Artificial Intelligence, vol. 71 Issue 4 (2014), 365–380
We derive a generalization bound for domain adaptation by using the properties of robust algorithms. Our new bound depends on λ-shift, a measure of prior knowledge regarding the similarity of source and target domain distributions. Based on the generalization bound, we design SVM variants for binary classification and regression domain adaptation algorithms.
Ad Exchange – Proposal for a New Trading Agent Competition Game
Agent-Mediated Electronic Commerce. Designing Trading Strategies and Mechanisms for Electronic Markets: AMEC and TADA (2013), pp. 133-145
Learning with Maximum-Entropy Distributions