Andrea Gesmundo

Authored Publications
    Abstract: Multi-task neural networks, when trained successfully, can learn to leverage related concepts from different tasks by using weight sharing. Sharing parameters between highly unrelated tasks can hurt both of them, so a strong multi-task model should be able to control the amount of weight sharing between pairs of tasks, and flexibly adapt it to their relatedness. In recent works, routing networks have shown strong performance in a variety of settings, including multi-task learning. However, optimization difficulties often prevent routing models from unlocking their full potential. In this work, we propose a novel routing method, specifically designed for multi-task learning, where routing is optimized jointly with the model parameters by standard backpropagation. We show that it can discover related pairs of tasks, and improve accuracy over strong baselines. In particular, on multi-task learning for the Omniglot dataset our method reduces the state-of-the-art error rate by 17%.
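    The joint-optimization idea can be sketched concretely. The toy PyTorch layer below is an illustration only, not the paper's implementation; all names and shapes are assumptions. Each task owns a learnable routing vector over a shared bank of expert modules, so the routing weights receive gradients through the same backpropagation pass that trains the experts.

    import torch
    import torch.nn as nn

    class SoftRoutedLayer(nn.Module):
        """Toy layer: each task mixes a shared bank of expert modules with
        task-specific routing weights, trained jointly with the experts by
        standard backpropagation (names and shapes are illustrative)."""

        def __init__(self, num_tasks, num_experts, dim):
            super().__init__()
            self.experts = nn.ModuleList(
                [nn.Linear(dim, dim) for _ in range(num_experts)]
            )
            # One routing logit per (task, expert) pair.
            self.routing_logits = nn.Parameter(torch.zeros(num_tasks, num_experts))

        def forward(self, x, task_id):
            # Soft allocation over experts for this task; gradients flow into
            # both the routing logits and the expert weights.
            weights = torch.softmax(self.routing_logits[task_id], dim=-1)
            expert_outputs = torch.stack([e(x) for e in self.experts], dim=0)
            return torch.einsum("e,ebd->bd", weights, expert_outputs)

    layer = SoftRoutedLayer(num_tasks=3, num_experts=4, dim=16)
    out = layer(torch.randn(8, 16), task_id=1)   # shape: (batch, dim)

    Related tasks can learn similar mixtures over the experts, while unrelated tasks can learn disjoint ones, which is the behaviour the abstract describes.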
    Routing Networks with Co-training for Continual Learning
    Mark Patrick Collier
    Jesse Berent
    ICML 2020 Workshop on Continual Learning (to appear)
    Abstract: Many continual learning methods can be characterized as either altering the learning algorithm in a fixed-capacity neural network or dynamically growing the capacity of the network to handle new tasks. We propose to use fixed-capacity sparse routing networks for continual learning. We retain the advantages of architectural solutions to the continual learning problem, in that different paths through the network can be learned for different tasks. However, we stay within the regime of fixed-capacity networks, which are more realistic for real-world use cases. We find it necessary to develop a new training method for routing networks, which we call co-training, to avoid poorly initialized experts when new tasks are presented. In initial experiments, when combined with a small episodic memory replay buffer, sparse routing networks with co-training outperform densely connected networks on the MNIST-Permutations and MNIST-Rotations benchmarks.
    Ranking architectures using meta-learning
    Alina Dubatovka
    Jesse Berent
    NeurIPS Workshop on Meta-Learning (MetaLearn 2019) (to appear)
    Abstract: Neural architecture search has recently attracted significant research effort, as it promises to automate the manual design of neural networks. However, it requires a large amount of computing resources; to alleviate this, a performance prediction network has recently been proposed that enables efficient architecture search by forecasting the performance of candidate architectures instead of relying on actual model training. The performance predictor is task-aware, taking as input not only the candidate architecture but also task meta-features, and it has been designed to collectively learn from several tasks. In this work, we introduce a pairwise ranking loss for training a network able to rank candidate architectures for a new unseen task, conditioning on its task meta-features. We present experimental results showing that the ranking network is more effective in architecture search than the previously proposed performance predictor.
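    A pairwise ranking objective of the kind described can be sketched as follows. This is a hypothetical PyTorch illustration under the assumption that a scorer takes an architecture encoding together with task meta-features; the network shape and all names are not from the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ArchitectureRanker(nn.Module):
        """Scores an (architecture encoding, task meta-features) pair; a higher
        score should mean better expected performance on that task."""

        def __init__(self, arch_dim, meta_dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(arch_dim + meta_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, arch, meta):
            return self.net(torch.cat([arch, meta], dim=-1)).squeeze(-1)

    def pairwise_ranking_loss(score_better, score_worse):
        # Logistic pairwise loss: penalize pairs where the worse
        # architecture is scored above the better one.
        return F.softplus(score_worse - score_better).mean()

    ranker = ArchitectureRanker(arch_dim=32, meta_dim=8)
    arch_a, arch_b = torch.randn(4, 32), torch.randn(4, 32)  # a assumed better than b
    meta = torch.randn(4, 8)
    loss = pairwise_ranking_loss(ranker(arch_a, meta), ranker(arch_b, meta))
    loss.backward()

    Only the relative ordering of candidate architectures matters for search, which is why a ranking loss rather than a regression loss is a natural fit here.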
    Abstract: The timing of individual neuronal spikes is essential for biological brains to make fast responses to sensory stimuli. However, conventional artificial neural networks lack the intrinsic dimension of temporal coding present in biological networks. We propose a spiking neural network model that encodes information in the relative timing of individual neuron spikes. An image can be encoded in this manner by an input layer where each neuron spikes at a time proportional to the brightness of an individual pixel. In classification tasks, the output of the network is indicated by the first neuron to spike in the output layer. By encoding information in time in this manner, we are able to train the network to perform supervised learning with backpropagation, using exact derivatives of the postsynaptic spike times with respect to presynaptic spike times. The network operates using a biologically-plausible alpha synaptic transfer function. Additionally, we use trainable synchronisation pulses that provide bias, add more flexibility during the training process and allow the exploitation of the decay part of the alpha function. We show that such spiking networks can be trained successfully on noisy temporal Boolean logic problems. Moreover, they perform better than comparable spiking models on the MNIST benchmark when encoded in time. During training, we find that the network spontaneously discovers two operating regimes: a slow regime, where a decision is taken after all hidden neurons have spiked and the accuracy is very high, and a fast regime, where a decision is taken very fast but the accuracy is lower. These results demonstrate the computational power of spiking networks with biological characteristics that encode information in the timing of individual neurons. By studying temporal coding in spiking networks, we aim to create building blocks towards energy-efficient, state-based and more complex biologically-inspired neural architectures.
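    The two ingredients that are easiest to illustrate are the time-to-first-spike input encoding and the alpha-shaped synaptic kernel. The snippet below is a rough sketch of those two ideas only, not the paper's training procedure; the exact mapping from pixel brightness to spike time and the kernel normalization are assumptions.

    import torch

    def encode_image_to_spike_times(pixels, t_max=1.0):
        """Time-to-first-spike encoding: brighter pixels spike earlier.
        Assumes `pixels` lies in [0, 1]; the linear mapping is illustrative."""
        return t_max * (1.0 - pixels)

    def alpha_kernel(t, t_spike, tau=0.2):
        """Alpha-shaped synaptic transfer function: rises and then decays
        after the presynaptic spike at t_spike (peak of 1 at t_spike + tau)."""
        dt = torch.clamp(t - t_spike, min=0.0)
        return (dt / tau) * torch.exp(1.0 - dt / tau)

    def first_spike_decision(output_spike_times):
        """Classification readout: the earliest-spiking output neuron wins."""
        return torch.argmin(output_spike_times, dim=-1)

    pixels = torch.rand(2, 784)                          # two toy "images"
    spike_times = encode_image_to_spike_times(pixels)
    print(alpha_kernel(torch.tensor(0.3), torch.tensor(0.1)))   # tensor(1.)
    print(first_spike_decision(torch.rand(2, 10)))               # e.g. tensor([3, 7])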
    Parameter Efficient Transfer Learning for NLP
    Andrei Giurgiu
    Stanisław Kamil Jastrzębski
    Bruna Halila Morrone
    Mona Attariyan
    Sylvain Gelly
    ICML (2019)
    Abstract: Fine-tuning large pretrained models is an effective transfer mechanism in NLP. However, in the presence of many downstream tasks, fine-tuning is parameter-inefficient: an entire new model is required for every task. As an alternative, we propose transfer with adapter modules. Adapter modules yield a compact and extensible model; they add only a few trainable parameters per task, and new tasks can be added without revisiting previous ones. The parameters of the original network remain fixed, yielding a high degree of parameter sharing. To demonstrate the adapters' effectiveness, we transfer the recently proposed BERT Transformer model to 26 diverse text classification tasks, including the GLUE benchmark. Adapters attain near state-of-the-art performance whilst adding only a few parameters per task. On GLUE, we attain within 0.8% of the performance of full fine-tuning, adding only 3.6% parameters per task. By contrast, fine-tuning trains 100% of the parameters per task.
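    The adapter idea itself is compact enough to sketch: a small bottleneck with a residual connection is inserted inside each pretrained layer, and it is the only part trained per task. The PyTorch module below is an illustrative rendering under assumed dimensions, not the released implementation.

    import torch
    import torch.nn as nn

    class Adapter(nn.Module):
        """Bottleneck adapter: down-project, non-linearity, up-project, plus a
        residual connection. Only these few parameters are trained per task;
        the surrounding pretrained layer stays frozen."""

        def __init__(self, hidden_dim, bottleneck_dim=64):
            super().__init__()
            self.down = nn.Linear(hidden_dim, bottleneck_dim)
            self.up = nn.Linear(bottleneck_dim, hidden_dim)
            # Near-zero initialization of the up-projection keeps the module
            # close to the identity, so training starts from the behaviour of
            # the original pretrained network.
            nn.init.zeros_(self.up.weight)
            nn.init.zeros_(self.up.bias)

        def forward(self, x):
            return x + self.up(torch.relu(self.down(x)))

    adapter = Adapter(hidden_dim=768)
    out = adapter(torch.randn(4, 128, 768))   # (batch, sequence, hidden)

    With a bottleneck of 64 on a hidden size of 768, each adapter adds roughly 2 * 768 * 64 parameters per layer, which is the source of the small per-task parameter budget the abstract reports.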
    Abstract: Neural architecture search has been shown to hold great promise towards the automation of deep learning. However, in spite of its potential, neural architecture search remains quite costly. To this end, we propose a novel gradient-based framework for efficient architecture search by sharing information across several tasks. We start by training many model architectures on several related (training) tasks. When a new unseen task is presented, the framework performs architecture inference in order to quickly identify a good candidate architecture, before any model is trained on the new task. At the core of our framework lies a deep value network that can predict the performance of input architectures on a task by utilizing task meta-features and the previous model training experiments performed on related tasks. We adopt a continuous parametrization of the model architecture, which allows for efficient gradient-based optimization. Given a new task, an effective architecture is quickly identified by maximizing the estimated performance with respect to the model architecture parameters with simple gradient ascent. It is key to point out that our goal is to achieve reasonable performance at the lowest cost. We provide experimental results showing the effectiveness of the framework despite its high computational efficiency.
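    The architecture-inference step can be illustrated with a toy sketch: given a trained value network and a continuous architecture parametrization, a candidate is found by gradient ascent on the predicted performance. The stand-in predictor, all shapes and the optimization settings below are assumptions for illustration, not the paper's configuration.

    import torch
    import torch.nn as nn

    # Stand-in for the trained deep value network: maps a continuous
    # architecture parametrization plus task meta-features to a predicted
    # performance score.
    value_net = nn.Sequential(nn.Linear(32 + 8, 64), nn.ReLU(), nn.Linear(64, 1))
    for p in value_net.parameters():
        p.requires_grad_(False)   # the predictor is fixed during inference

    def infer_architecture(task_meta, steps=100, lr=0.1):
        """Architecture inference by simple gradient ascent on the predicted
        performance, as described in the abstract; details are assumptions."""
        arch = torch.zeros(32, requires_grad=True)   # continuous architecture params
        opt = torch.optim.SGD([arch], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            score = value_net(torch.cat([arch, task_meta])).squeeze()
            (-score).backward()   # ascend the predicted performance
            opt.step()
        return arch.detach()

    candidate = infer_architecture(torch.randn(8))
    print(candidate.shape)   # torch.Size([32])

    Because no model is trained on the new task during this loop, the cost of producing a candidate is only a few forward and backward passes through the predictor.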
    Abstract: We reduce the computational cost of Neural AutoML with transfer learning. AutoML relieves human effort by automating the design of ML algorithms. Neural AutoML has become popular for the design of deep learning architectures; however, this method has a high computation cost. To address this, we propose Transfer Neural AutoML, which uses knowledge from prior tasks to speed up network design. We extend RL-based architecture search methods to support parallel training on multiple tasks and then transfer the search strategy to new tasks. On language and image classification data, Transfer Neural AutoML reduces convergence time over single-task training by over an order of magnitude on many tasks.
    Abstract: We frame Question Answering (QA) as a Reinforcement Learning task, an approach that we call Active Question Answering. We propose an agent that sits between the user and a black-box QA system and learns to reformulate questions to elicit the best possible answers. The agent probes the system with potentially many natural language reformulations of an initial question and aggregates the returned evidence to yield the best answer. The reformulation system is trained end-to-end to maximize answer quality using policy gradient. We evaluate on SearchQA, a dataset of complex questions extracted from Jeopardy!. The agent outperforms a state-of-the-art base model, which plays the role of the environment, as well as other benchmarks. We also analyze the language that the agent has learned while interacting with the question answering system. We find that successful question reformulations look quite different from natural language paraphrases. The agent is able to discover non-trivial reformulation strategies that resemble classic information retrieval techniques such as term re-weighting (tf-idf) and stemming.
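    The training signal can be sketched as a standard REINFORCE update: sampled reformulations are scored by the answer quality the black-box QA system returns, and that reward weights the log-probabilities assigned by the reformulation policy. The snippet below is a toy sketch under those assumptions, with a dummy parameter standing in for the sequence-to-sequence reformulator.

    import torch

    def reinforce_step(log_probs, rewards, optimizer):
        """One REINFORCE update for a question-reformulation policy.
        `log_probs[i]` is the summed log-probability of the i-th sampled
        reformulation; `rewards[i]` is the answer-quality score (e.g. F1)
        returned by the black-box QA system. Everything here is a toy
        sketch under those assumptions."""
        baseline = rewards.mean()                       # simple variance reduction
        loss = -((rewards - baseline) * log_probs).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Toy usage: a fake "policy" parameter stands in for a seq2seq model.
    theta = torch.randn(4, requires_grad=True)
    log_probs = torch.distributions.Normal(theta, 1.0).log_prob(torch.zeros(4))
    rewards = torch.tensor([0.2, 0.9, 0.5, 0.1])
    opt = torch.optim.SGD([theta], lr=0.01)
    reinforce_step(log_probs, rewards, opt)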
    Abstract: We analyze the language learned by an agent trained with reinforcement learning as a component of the ActiveQA system [Buck et al., 2017]. In ActiveQA, question answering is framed as a reinforcement learning task in which an agent sits between the user and a black-box question-answering system. The agent learns to reformulate the user's questions to elicit the optimal answers. It probes the system with many versions of a question, generated by a sequence-to-sequence question reformulation model, and then aggregates the returned evidence to find the best answer. This process is an instance of machine-machine communication. The question reformulation model must adapt its language to increase the quality of the answers returned, matching the language of the question answering system. We find that the agent does not learn transformations that align with semantic intuitions, but instead discovers, through learning, classical information retrieval techniques such as tf-idf re-weighting and stemming.
    Projecting the Knowledge Graph to Syntactic Parsing
    Keith Hall
    EACL 2014: 14th Conference of the European Chapter of the Association for Computational Linguistics