Corinna Cortes
Corinna Cortes is a VP in Google Research, where she is working on a broad range of theoretical and applied large-scale machine learning problems. Prior to Google, Corinna spent more than ten years at AT&T Labs - Research, formerly AT&T Bell Labs, where she held a distinguished research position. Corinna's research work is well-known in particular for her contributions to the theoretical foundations of support vector machines (SVMs), for which she jointly with Vladimir Vapnik received the 2008 Paris Kanellakis Theory and Practice Award, and her work on data-mining in very large data sets for which she was awarded the AT&T Science and Technology Medal in the year 2000. Corinna received her MS degree in Physics from University of Copenhagen and joined AT&T Bell Labs as a researcher in 1989. She received her Ph.D. in computer science from the University of Rochester in 1993.
Corinna is also a competitive and a mother of two.
Authored Publications
Google Publications
Other Publications
Sort By
Preview abstract
We present a new active learning algorithm that adaptively partitions the input space into a finite number of regions, and subsequently seeks a distinct predictor for each region, both phases actively requesting labels. We prove theoretical guarantees for both the generalization error and the label complexity of our algorithm, and analyze the number of regions defined by the algorithm under some mild assumptions. We also report the results of an extensive suite of experiments on several real-world datasets demonstrating substantial empirical benefits over existing single-region and nonadaptive region-based active learning baselines.
View details
Understanding the Effects of Batching in Online Active Learning
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (2020)
Preview abstract
Online active learning (AL) algorithms often assume immediate access to a label once a query has been made. However, due to practical constraints, the labels of these queried examples are generally only available in ``batches''. In this work, we present a novel analysis for a generic class of batch online AL algorithms and reveal that the effects of batching are in fact mild and only result in an additional term in the label complexity that is linear in the batch size. To our knowledge, this provides the first theoretical justification for such algorithms and we show how they can be applied to batch variants of three canonical online AL algorithms: IWAL, ORIWAL, and DHM. We also conduct an empirical study that corroborates the novel theoretical insights.
View details
Preview abstract
A general framework for online learning with partial information is one where feedback graphs specify which losses can be observed by the learner. We study a challenging scenario where feedback graphs vary with time stochastically and,more importantly, where graphs and losses are stochastically dependent. This scenario appears in several applications that we describe. We give a new algorithm for this scenario that exploits the stochastic properties of the graphs and that benefits from favorable regret guarantees in this challenging setting. We present a detailed theoretical analysis of this algorithm, and also report the results of a series of experiments on real-world datasets, which show that our algorithm outperforms standard baselines for online learning with feedback graphs.
View details
Adaptation Based on Generalized Discrepancy
Journal of Machine Learning Research, vol. 20 (2019), pp. 1-30
Preview abstract
We present a new algorithm for domain adaptation improving upon a discrepancy minimization algorithm, (DM), previously shown to outperform a number of algorithms for this problem. Unlike many previously proposed solutions for domain adaptation, our algorithm does not consist of a fixed reweighting of the losses over the training sample. Instead, the reweighting depends on the hypothesis sought. The algorithm is derived from a less conservative notion of discrepancy than the DM algorithm called generalized discrepancy. We present a detailed description of our algorithm and show that it can be formulated as a convex optimization problem. We also give a detailed theoretical analysis of its learning guarantees which helps us select its parameters. Finally, we report the results of experiments demonstrating that it improves upon discrepancy minimization.
View details
AdaNet: A Scalable and Flexible Framework for Automatically Learning Ensembles
Charles Weill
Vitaly Kuznetsov
Scott Yang
Scott Yak
Hanna Mazzawi
Eugen Hotaj
Ghassen Jerfel
Vladimir Macko
(2019)
Preview abstract
AdaNet is a lightweight TensorFlow-based (Abadi et al., 2015) framework for automatically learning high-quality ensembles with minimal expert intervention. Our framework is inspired by the AdaNet algorithm (Cortes et al., 2017) which learns the structure of a neural network as an ensemble of subnetworks. We designed it to: (1) integrate with the existing TensorFlow ecosystem, (2) offer sensible default search spaces to perform well on novel datasets, (3) present a flexible API to utilize expert information when available, and (4) efficiently accelerate training with distributed CPU, GPU, and TPU hardware. The code is open-source and available at https://github.com/tensorflow/adanet.
View details
Preview abstract
We consider the scenario of online learning with the sleeping experts where not all experts are available at each round and analyze the general framework of learning with stochastic feedback graphs, where loss observations associated to each expert are characterized by a graph, thereby including both the bandit and full information settings as special cases. A critical assumption in this framework is that the loss observations and the set of sleeping experts at each round is independent. We first extend the classical algorithm of Kleinberg et al 2008 to use the loss information encoded by any sequence of feedback graphs and prove matching upper and lower bounds for the sleeping regret of this algorithm. Our main contribution is then to relax this independence assumption, present a finer notion of sleeping regret, and derive a general algorithm with strong theoretical guarantees. We instantiate our framework to the important scenario of online learning with abstention, where a learner can elect to abstain from making a prediction at the price of a certain cost. We empirically validate the improvement of our algorithm against multiple abstention algorithms on several real-world datasets
View details
Preview abstract
We present two novel extensions of an on-line importance weighted active learning algorithm IWAL, using the properties of disagreement values among hypotheses. The first extension, DIWAL, prunes the hypothesis set with a more aggressive strategy based on concentration bounds with disagreement values. We show that DIWAL improves the generalization performance and the label complexity of the original IWAL, and further quantify the improvement in terms of the disagreement graph coefficient. The second extension, ZOOM, adaptively zooms into the function space near the best-in-class hypothesis, which effectively reduces the best-in-class error and thus simultaneously improves the generalization performance and the label complexity. We report experimental results on multiple datasets and demonstrate that the proposed algorithms achieve better test performances than IWAL given the same amount of labeling budget.
View details
Preview abstract
We study a scenario of active learning where the input space is partitioned into different regions and where a distinct hypothesis is learned for each region. We first introduce a new active learning algorithm (EIWAL), which is an enhanced version of the IWAL algorithm, based on a finer analysis that results in more favorable learning guarantees. Then, we present a new learning algorithm for region-based active learning, ORIWAL, in which either IWAL or EIWAL serve as a subroutine. ORIWAL optimally allocates points to the subroutine algorithm for each region. We give a detailed theoretical analysis of ORIWAL, including generalization error guarantees and bounds on the number of points labeled, in terms of both the hypothesis set used in each region and the probability mass of that region. We also report the results of several experiments for our algorithm which demonstrate substantial benefits over existing non-region-based active learning algorithms, such as IWAL, and over passive learning.
View details
Learning GANs and Ensembles Using Discrepancy
Ningshan Zhang
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019
Preview abstract
Generative adversarial networks (GANs) generate data based on minimizing a divergence between two distributions. The choice of that divergence is therefore critical. We argue that the divergence must take into account the hypothesis set and the loss function used in a subsequent learning task, where the data generated by a GAN serves for training. Taking that structural information into account is also important to derive generalization guarantees. Thus, we propose to use the discrepancy measure, which was originally introduced for the closely related problem of domain adaptation and which precisely takes into account the hypothesis set and the loss function. We show that discrepancy admits favorable properties for training GANs and prove explicit generalization guarantees. We present efficient algorithms using discrepancy for two tasks: training a GAN directly, namely DGAN, and mixing previously trained generative models, namely EDGAN. Our experiments on toy examples and several benchmark datasets show that DGAN is competitive with other GANs and that EDGAN outperforms existing GAN ensembles, such as AdaGAN.
View details
Relative deviation learning bounds and generalization with unbounded loss functions
Spencer Greenberg
Annals of Mathematics and Artificial Intelligence (2019)
Preview abstract
We present an extensive analysis of relative deviation bounds, including detailed proofs of two-sided inequalities and their implications. We also give detailed proofs of two-sided generalization bounds that hold in the general case of unbounded loss functions, under the assumption that a moment of the loss is bounded. We then illustrate how to apply these results in a sample application: the analysis of importance weighting.
View details
Online learning with abstention
Scott Yang
Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden (2018)
Preview abstract
We present an extensive study of a key problem in online learning where the learner can opt to abstain from making a prediction, at a certain cost. In the adversarial setting, we show how existing online algorithms and guarantees can be adapted to this problem. In the stochastic setting, we first point out a bias problem that limits the straightforward extension of algorithms such as UCB-N to this context. Next, we give a new algorithm, UCB-GT, that exploits historical data and time-varying feedback graphs. We show that this algorithm benefits from more favorable regret guarantees than a natural extension of UCB-N. We further report the results of a series of experiments demonstrating that UCB-GT largely outperforms that extension of UCB-N, as well as other standard baselines.
View details
AdaNet: Adaptive structural learning of artificial neural networks
Vitaly Kuznetsov
Scott Yang
Proceedings of the 34th International Conference on Machine Learning (ICML 2017). Sydney, Australia, August 2017. (2017)
Preview abstract
We present new algorithms for adaptively learning
artificial neural networks. Our algorithms
(ADANET) adaptively learn both the structure
of the network and its weights. They are
based on a solid theoretical analysis, including
data-dependent generalization guarantees that we
prove and discuss in detail. We report the results
of large-scale experiments with one of our
algorithms on several binary classification tasks
extracted from the CIFAR-10 dataset and on the
Criteo dataset. The results demonstrate that our
algorithm can automatically learn network structures
with very competitive performance accuracies
when compared with those achieved by neural
networks found by standard approaches.
View details
Learning with rejection
Proceedings of The 27th International Conference on Algorithmic Learning Theory (ALT 2016). pages 67-82, Bari, Italy, 2016. Springer, Heidelberg, Germany. (2016)
Preview abstract
We introduce a novel framework for classification with a rejection
option that consists of simultaneously learning two functions:
a classifier along with a rejection function. We present a full theoretical
analysis of this framework including new data-dependent learning
bounds in terms of the Rademacher complexities of the classifier and
rejection families as well as consistency and calibration results. These
theoretical guarantees guide us in designing new algorithms that can
exploit different kernel-based hypothesis sets for the classifier and rejection
functions. We compare and contrast our general framework with the
special case of confidence-based rejection for which we devise alternative
loss functions and algorithms as well. We report the results of several experiments
showing that our kernel-based algorithms can yield a notable
improvement over the best existing confidence-based rejection algorithm.
View details
Boosting with abstention
Advances in Neural Information Processing Systems (NIPS 2016). Barcelona, Spain, 2016. MIT Press. (2016)
Preview abstract
We present a new boosting algorithm for the key scenario of binary classification
with abstention where the algorithm can abstain from predicting the label of a point,
at the price of a fixed cost. At each round, our algorithm selects a pair of functions,
a base predictor and a base abstention function. We define convex upper bounds
for the natural loss function associated to this problem, which we prove to be
calibrated with respect to the Bayes solution. Our algorithm benefits from general
margin-based learning guarantees which we derive for ensembles of pairs of base
predictor and abstention functions, in terms of the Rademacher complexities of the
corresponding function classes. We give convergence guarantees for our algorithm
along with a linear-time weak-learning algorithm for abstention stumps. We also
report the results of several experiments suggesting that our algorithm provides a
significant improvement in practice over two confidence-based algorithms.
View details
Structured prediction theory based on factor graph complexity
Vitaly Kuznetsov
Scott Yang
Advances in Neural Information Processing Systems (NIPS 2016). Barcelona, Spain, 2016. MIT Press. (2016)
Preview abstract
We present a general theoretical analysis of structured prediction with a series
of new results. We give new data-dependent margin guarantees for structured
prediction for a very wide family of loss functions and a general family of hypotheses,
with an arbitrary factor graph decomposition. These are the tightest margin
bounds known for both standard multi-class and general structured prediction
problems. Our guarantees are expressed in terms of a data-dependent complexity
measure, factor graph complexity, which we show can be estimated from data and
bounded in terms of familiar quantities for several commonly used hypothesis sets
along with a sparsity measure for features and graphs. Our proof techniques include
generalizations of Talagrand’s contraction lemma that can be of independent
interest.
We further extend our theory by leveraging the principle of Voted Risk Minimization
(VRM) and show that learning is possible even with complex factor graphs. We
present new learning bounds for this advanced setting, which we use to design two
new algorithms, Voted Conditional Random Field (VCRF) and Voted Structured
Boosting (StructBoost). These algorithms can make use of complex features and
factor graphs and yet benefit from favorable learning guarantees. We also report
the results of experiments with VCRF on several datasets to validate our theory.
View details
Adaptation algorithm and theory based on generalized discrepancy
Preview
Proceedings of the 21st ACM Conference on Knowledge Discovery and Data Mining (KDD 2015)
Preview abstract
We present a new algorithm for domain adaptation improving upon a discrepancy mini- mization algorithm, (DM), previously shown to outperform a number of algorithms for this problem. Unlike many previous proposed solutions for domain adaptation, our algorithm does not consist of a fixed reweighting of the losses over the training sample. Instead, the reweighting depends on the hypothesis sought. The algorithm is derived from a less con- servative notion of discrepancy than the DM algorithm. We call this quantity generalized discrepancy. We present a detailed description of our algorithm and show that it can be formulated as a convex optimization problem. We also give a detailed theoretical analysis of its learning guarantees which helps us select its parameters. Finally, we report the results of experiments demonstrating that it improves upon discrepancy minimization in several tasks.
View details
On-line learning algorithms for path experts with non-additive losses
Preview
Vitaly Kuznetsov
Manfred K. Warmuth
Proceedings of The 28th Annual Conference on Learning Theory (COLT 2015)
Structural maxent models
Vitaly Kuznetsov
Proceedings of the Thirty-Second International Conference on Machine Learning (ICML 2015)
Preview abstract
We present a new class of density estimation
models, Structural Maxent models, with feature
functions selected from a union of possibly
very complex sub-families and yet benefiting
from strong learning guarantees. The design
of our models is based on a new principle
supported by uniform convergence bounds and
taking into consideration the complexity of the
different sub-families composing the full set of
features. We prove new data-dependent learning
bounds for our models, expressed in terms of the
Rademacher complexities of these sub-families.
We also prove a duality theorem, which we use
to derive our Structural Maxent algorithm. We
give a full description of our algorithm, including
the details of its derivation, and report the results
of several experiments demonstrating that its performance
improves on that of existing L1-norm
regularized Maxent algorithms. We further similarly
define conditional Structural Maxent models
for multi-class classification problems. These
are conditional probability models also making
use of a union of possibly complex feature subfamilies.
We prove a duality theorem for these
models as well, which reveals their connection
with existing binary and multi-class deep boosting
algorithms.
View details
Learning ensembles of structured prediction rules
Preview
Vitaly Kuznetsov
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014)
Deep boosting
Proceedings of the Thirty-First International Conference on Machine Learning (ICML 2014)
Preview abstract
We present a new ensemble learning algorithm, DeepBoost, which can use as base classifiers a hypothesis set containing deep decision trees, or members of other rich or complex families, and succeed in achieving high accuracy without overfitting the data. The key to the success of the algorithm is a capacity-conscious criterion for the selection of the hypotheses. We give new data- dependent learning bounds for convex ensembles expressed in terms of the Rademacher complexities of the sub-families composing the base classifier set, and the mixture weight assigned to each sub-family. Our algorithm directly benefits from these guarantees since it seeks to minimize the corresponding learning bound. We give a full description of our algorithm, including the details of its derivation, and report the results of several experiments showing that its performance compares favorably to that of AdaBoost and Logistic Regression and their L1-regularized variants.
View details
Ensemble methods for structured prediction
Preview
Vitaly Kuznetsov
Proceedings of the 31st International Conference on Machine Learning (ICML 2014)
Multi-class classification with maximum margin multiple kernel
Preview
Proceedings of the Thirtieth International Conference on Machine Learning (ICML 2013)
Learning kernels using local rademacher complexity
Marius Kloft
Advances in Neural Information Processing Systems (NIPS 2013), MIT Press.
Preview abstract
We use the notion of local Rademacher complexity to design new algorithms for learning kernels. Our algorithms thereby benefit from the sharper learning bounds based on that notion which, under certain general conditions, guarantee a faster convergence rate. We devise two new learning kernel algorithms: one based on a convex optimization problem for which we give an efficient solution using existing learning kernel techniques, and another one that can be formulated as a DC-programming problem for which we describe a solution in detail. We also re- port the results of experiments with both algorithms in both binary and multi-class classification tasks.
View details
Accuracy at the Top
Stephen Boyd
NIPS: Neural Information Processing Systems Foundation (2012)
Preview abstract
We introduce a new notion of classification accuracy based on the top τ -quantile
values of a scoring function, a relevant criterion in a number of problems arising
for search engines. We define an algorithm optimizing a convex surrogate of the
corresponding loss, and show how its solution can be obtained by solving a set
of convex optimization problems. We also present margin-based guarantees for
this algorithm based on the top τ -quantile of the scores of the functions in the
hypothesis set. Finally, we report the results of several experiments in the bipartite setting evaluating the performance of our algorithm and comparing the results to several other algorithms seeking high precision at the top. In most examples, our algorithm achieves a better performance in precision at the top.
View details
Algorithms for Learning Kernels Based on Centered Alignment
Preview
Journal of Machine Learning Research, vol. 13 (2012), pp. 795-828
Domain adaptation in regression
Preview
Proceedings of The 22nd International Conference on Algorithmic Learning Theory, ALT 2011, Springer, Heidelberg, Germany
Ensembles of Kernel Predictors
Preview
Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI 2011)
A Dual Coordinate Descent Algorithm for SVMs Combined with Rational Kernels
Preview
International Journal of Foundations of Computer Science, vol. 22 (2011), pp. 1761-1779
On the Impact of Kernel Approximation on Learning Accuracy
Ameet Talwalkar
Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2010)
Preview abstract
Kernel approximation is commonly used to scale
kernel-based algorithms to applications containing
as many as several million instances. This
paper analyzes the effect of such approximations
in the kernel matrix on the hypothesis generated
by several widely used learning algorithms. We
give stability bounds based on the norm of the
kernel approximation for these algorithms, including
SVMs, KRR, and graph Laplacian-based
regularization algorithms. These bounds help determine
the degree of approximation that can be
tolerated in the estimation of the kernel matrix.
Our analysis is general and applies to arbitrary
approximations of the kernel matrix. However,
we also give a specific analysis of the Nystr¨om
low-rank approximation in this context and report
the results of experiments evaluating the
quality of the Nystr¨om low-rank kernel approximation
when used with ridge regression.
View details
Two-Stage Learning Kernel Algorithms
Proceedings of the 27th Annual International Conference on Machine Learning (ICML 2010)
Preview abstract
This paper examines two-stage techniques for
learning kernels based on a notion of alignment.
It presents a number of novel theoretical, algorithmic,
and empirical results for alignmentbased
techniques. Our results build on previous
work by Cristianini et al. (2001), but we adopt
a different definition of kernel alignment and
significantly extend that work in several directions:
we give a novel and simple concentration
bound for alignment between kernel matrices;
show the existence of good predictors for kernels
with high alignment, both for classification
and for regression; give algorithms for learning a
maximum alignment kernel by showing that the
problem can be reduced to a simple QP; and report
the results of extensive experimentswith this
alignment-based method in classification and regression
tasks, which show an improvement both
over the uniformcombination of kernels and over
other state-of-the-art learning kernel methods.
View details
Learning Bounds for Importance Weighting
Preview
Yishay Mansour
Advances in Neural Information Processing Systems (NIPS 2010), MIT Press, Vancouver, Canada
Half Transductive Ranking
Bing Bai
Jason Weston
David Grangier
Ronan Collobert
Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2010)
Preview abstract
We study the standard retrieval task of ranking
a fixed set of items given a previously unseen
query and pose it as the half transductive
ranking problem. The task is transductive
as the set of items is fixed. Transductive
representations (where the vector representation
of each example is learned) allow
the generation of highly nonlinear embeddings
that capture object relationships without
relying on a specific choice of features,
and require only relatively simple optimization.
Unfortunately, they have no direct outof-
sample extension. Inductive approaches
on the other hand allow for the representation
of unknown queries. We describe algorithms
for this setting which have the advantages
of both transductive and inductive approaches,
and can be applied in unsupervised
(either reconstruction-based or graph-based)
and supervised ranking setups. We show empirically
that our methods give strong performance
on all three tasks.
View details
Generalization Bounds for Learning Kernels
Proceedings of the 27th Annual International Conference on Machine Learning (ICML 2010)
Preview abstract
This paper presents several novel generalization
bounds for the problem of learning kernels based
on a combinatorial analysis of the Rademacher
complexity of the corresponding hypothesis sets.
Our bound for learning kernels with a convex
combination of p base kernels using L1 regularization
admits only a √log p dependency on the
number of kernels, which is tight and considerably
more favorable than the previous best bound
given for the same problem. We also give a novel
bound for learning with a non-negative combination
of p base kernels with an L2 regularization
whose dependency on p is also tight and only in
p^(1/4). We present similar results for Lq regularization
with other values of q, and outline the relevance
of our proof techniques to the analysis of
the complexity of the class of linear functions.
Experiments with a large number of kernels further
validate the behavior of the generalization
error as a function of p predicted by our bounds.
View details
Algorithms for Learning Kernels Based on Centered Alignment
Preview
Journal of Machine Learning Research, vol. 13 (2010), pp. 795-828
Invited talk: Can learning kernels help performance?
Preview
ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, ACM, New York, NY, USA (2009), pp. 1-1
L2 Regularization for Learning Kernels
Preview
Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009), Montr\'eal, Canada
Learning non-linear combinations of kernels
Preview
NIPS 2009, Advances in Neural Information Processing Systems, MIT Press
Polynomial semantic indexing
Preview
Bing Bai
Jason Weston
David Grangier
Ronan Collobert
Kunihiko Sadamasa
Yanjun Qi
Advances in Neural Information Processing Systems (NIPS 2009), MIT Press
Stability of Transductive Regression Algorithms
Preview
Dmitry Pechyony
Ashish Rastogi
Proceedings of the Twenty-fifth International Conference on Machine Learning (ICML 2008), Helsinki, Finland
Kernel Methods for Learning Languages
Preview
Leonid Kontorovich
Theoretical Computer Science, vol. 405 (2008), pp. 223-236
Learning with weighted transducers
Preview
Proceedings of the Seventh International Workshop Finite-State Methods and Natural Language Processing (2008)
Learning sequence kernels
Preview
Proceedings of IEEE International Workshop on Machine Learning for Signal Processing (2008)
On the Computation of the Relative Entropy of Probabilistic Automata
Preview
Ashish Rastogi
International Journal of Foundations of Computer Science, vol. 19 (2008), pp. 219-242
A Machine Learning Framework for Spoken-Dialog Classification
Preview
Patrick Haffner
Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2008)
Sample Selection Bias Correction Theory
Preview
Proceedings of The 19th International Conference on Algorithmic Learning Theory (ALT 2008), Springer, Heidelberg, Germany, Budapest, Hungary
An Alternative Ranking Problem for Search Engines
Preview
Ashish Rastogi
Proceedings of the 6th Workshop on Experimental Algorithms (WEA 2007), Springer-Verlag, Heidelberg, Germany, Rome, Italy, pp. 1-21
On the Computation of the Relative Entropy of Probabilistic Automata
Preview
Ashish Rastogi
International Journal of Foundations of Computer Science, vol. to appear (2007)
A Machine Learning Framework for Spoken-Dialog Classification
Preview
Patrick Haffner
Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2007)
Learning Languages with Rational Kernels
Preview
Leonid Kontorovich
Proceedings of The 20th Annual Conference on Computational Learning Theory (COLT 2007), Springer, Heidelberg, Germany, San Diego, California
L_p Distance and Equivalence of Probabilistic Automata
Preview
Ashish Rastogi
International Journal of Foundations of Computer Science, vol. to appear (2007)
Lp Distance and Equivalence of Probabilistic Automata
Preview
Ashish Rastogi
International Journal of Foundations of Computer Science, vol. 18 (2007)
On Transductive Regression
Preview
Advances in Neural Information Processing Systems (NIPS 2006), MIT Press, Vancouver, Canada (2007)
Magnitude-Preserving Ranking Algorithms
Preview
Ashish Rastogi
Proceedings of the Twenty-fourth International Conference on Machine Learning (ICML 2007), Oregon State University, Corvallis, OR
Lp Distance and Equivalence of Probabilistic Automata
Preview
Ashish Rastogi
International Journal of Foundations of Computer Science, vol. 18 (2007), pp. 761-780
On the Computation of Some Standard Distances between Probabilistic Automata
Preview
Ashish Rastogi
Proceedings of the 11th International Conference on Implementation and Application of Automata (CIAA 2006), Springer-Verlag, Heidelberg, Germany, Taipei, Taiwan
Efficient Computation of the Relative Entropy of Probabilistic Automata
Preview
Ashish Rastogi
Proceedings of the 7th Latin American Symposium (LATIN 2006), Springer-Verlag, Heidelberg, Germany, Valdivia, Chile
Learning Linearly Separable Languages
Preview
Leonid Kontorovich
Proceedings of The 17th International Conference on Algorithmic Learning Theory (ALT 2006), Springer, Heidelberg, Germany
Finite-State Transducers in Computational Biology
Preview
Tutorial presented at the 13th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2005), Detroit, MI
Margin-Based Ranking Meets Boosting in the Middle
Preview
Cynthia Rudin
Robert E. Schapire
Proc. of the 18th Annual Conference on Computational Learning Theory (COLT 2005), Springer, Heidelberg, Germany, pp. 63-78
Confidence Intervals for the Area under the ROC Curve
Preview
Advances in Neural Information Processing Systems (NIPS 2004), MIT Press, Vancouver, Canada (2005)
Margin-Based Ranking Meets Boosting in the Middle
Preview
Cynthia Rudin
Robert E. Schapire
Proceedings of The 18th Annual Conference on Computational Learning Theory (COLT 2005), Springer, Heidelberg, Germany, Bertinoro, Italy, pp. 63-78
A General Regression Technique for Learning Transductions
Preview
Jason Weston
Proceedings of the Twenty-Second International Conference on Machine Learning (ICML 2005), Bonn, Germany
Distribution Kernels Based on Moments of Counts
Preview
Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004), Banff, Alberta, Canada
Rational Kernels: Theory and Algorithms
Preview
Patrick Haffner
Journal of Machine Learning Research (JMLR), vol. 5 (2004), pp. 1035-1062
Hancock: A language for analyzing transactional data streams
Preview
Kathleen Fisher
Daryl Pregibon
Anne Rogers
Frederick Smith
ACM Trans. Program. Lang. Syst., vol. 26 (2004), pp. 301-338
Voted Kernel Regularization
Relative Deviation Learning Bounds and Generalization with Unbounded Loss Functions
Stability Analysis and Learning Bounds for Transductive Regression Algorithms
New Generalization Bounds for Learning Kernels
Learning Languages with Rational Kernels
Magnitude-preserving ranking algorithms
An Alternative Ranking Problem for Search Engines
Efficient Computation of the Relative Entropy of Probabilistic Automata
On the Computation of Some Standard Distances Between Probabilistic Automata
On Transductive Regression
Learning Linearly Separable Languages
Moment Kernels for Regular Distributions
Rational Kernels: Theory and Algorithms
Patrick Haffner
Journal of Machine Learning Research, vol. 5 (2004), pp. 1035-1062
AUC Optimization vs. Error Rate Minimization
Advances in Neural Information Processing Systems (NIPS 2003), MIT Press, Vancouver, Canada (2004)
Distribution kernels based on moments of counts
Confidence Intervals for the Area Under the ROC Curve
Lattice Kernels for Spoken-Dialog Classification
Patrick Haffner
Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003), Hong Kong
Positive Definite Rational Kernels
Patrick Haffner
Proceedings of The 16th Annual Conference on Computational Learning Theory (COLT 2003), Springer, Heidelberg, Germany, Washington D.C.
AUC Optimization vs. Error Rate Minimization
Lattice kernels for spoken-dialog classification
Rational Kernels
Patrick Haffner
Advances in Neural Information Processing Systems (NIPS 2002), MIT Press, Vancouver, Canada (2003)
Weighted automata kernels - general framework and algorithms
Weighted Automata Kernels -- General Framework and Algorithms
Patrick Haffner
Proceedings of the 9th European Conference on Speech Communication and Technology (Eurospeech '03), Special Session Advanced Machine Learning Algorithms for Speech and Language Processing, Geneva, Switzerland (2003)
Positive Definite Rational Kernels
Rational Kernels
Communities of interest
Signature-Based Methods for Data Streams
Communities of Interest
Hancock: a language for extracting signatures from data streams
Context-Free Recognition with Weighted Automata
Information Mining Platforms: An Infrastructure for KDD Rapid Deployment
Context-Free Recognition with Weighted Automata
Proceedings of the Sixth Meeting on Mathematics of Language (MOL6), Orlando, Florida (1999)
Squashing Flat Files Flatter
William DuMouchel
Chris Volinsky
Theodore Johnson
Daryl Pregibon
KDD (1999), pp. 6-15
Giga-Mining
Capacity and Complexity Control in Predicting the Spread Between Borrowing and Lending Interest Rates
Support-Vector Networks
Limits on Learning Machine Accuracy Imposed by Data Quality
Boosting Decision Trees
Boosting and Other Machine Learning Algorithms
Limits in Learning Machine Accuracy Imposed by Data Quality
Learning Curves: Asymptotic Values and Rate of Convergence
Lawrence D. Jackel
Sara A. Solla
Vladimir Vapnik
John S. Denker
NIPS (1993), pp. 327-334