CogALex-V Shared Task: GHHH - Detecting Semantic Relations via Word Embeddings

Abstract

This paper describes our system submitted to the CogALex-2016 Shared Task on the Corpus-Based Identification of Semantic Relations. On the test set, our system achieves an F-measure of 88.1\% (79.0\% for TRUE only) on Task-1, detecting semantic similarity, and 76.0\% (42.3\% when excluding RANDOM) on Task-2, identifying finer-grained semantic relations. In our experiments, we tried word analogy, linear regression, and a multi-task Convolutional Neural Network (CNN), all using word embeddings from publicly available word vectors. We found that linear regression performs better in binary classification (Task-1), while the CNN performs better in multi-class semantic classification (Task-2). We assume that word analogy is better suited to relations with a single deterministic answer than to the ambiguity of one-to-many and many-to-many relationships. We also show that classifier performance can benefit from balancing the frequency of labels in the training data.
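
To illustrate the label-balancing idea, the following is a minimal sketch and not the procedure used by the submitted system. It assumes balancing by downsampling every label to the size of the rarest one, which the abstract does not specify. The function name `balance_by_downsampling` and the toy word pairs are hypothetical.

```python
import random
from collections import Counter, defaultdict


def balance_by_downsampling(examples, seed=0):
    """Downsample every label to the size of the rarest label.

    `examples` is a list of (word1, word2, label) tuples.
    Assumption: downsampling is only one possible balancing strategy.
    """
    if not examples:
        return []
    rng = random.Random(seed)

    # Group the word pairs by their relation label.
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[2]].append(example)

    # Every label is cut down to the count of the rarest label.
    target = min(len(group) for group in by_label.values())

    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    rng.shuffle(balanced)
    return balanced


# Toy data invented for illustration: RANDOM dominates, as unrelated
# pairs often do in relation-identification training sets.
toy = [
    ("cat", "spoon", "RANDOM"),
    ("tree", "debt", "RANDOM"),
    ("lamp", "river", "RANDOM"),
    ("shoe", "cloud", "RANDOM"),
    ("milk", "engine", "RANDOM"),
    ("door", "melody", "RANDOM"),
    ("big", "large", "SYN"),
    ("quick", "fast", "SYN"),
    ("hot", "cold", "ANT"),
    ("up", "down", "ANT"),
]

balanced = balance_by_downsampling(toy, seed=1)
print(Counter(label for _, _, label in balanced))
```

Downsampling discards training pairs from the frequent label; upsampling rare labels or weighting the loss by label are alternatives that keep all of the data.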