Multilingual Word Embeddings using Multigraphs

Abstract

We present a family of neural-network–inspired models for computing continuous word representations, specifically designed to exploit both monolingual and multilingual text, with or without annotations (syntactic dependencies, word alignments, etc.). We find that this framework allows us to train embeddings with significantly higher accuracy on syntactic and semantic compositionality, as well as on multilingual semantic similarity, than previous models. We also show that some of these embeddings can be used to improve the performance of a state-of-the-art machine translation system on words outside the vocabulary of the parallel training data.