Image analysis tasks such as classification, clustering, detection, and retrieval are only as good as the feature representation of the images they use. Much research in computer vision is focused on finding better or semantically richer image representations. Bag of visual Words (BoW) is a representation that has emerged as an effective one for a variety of computer vision tasks. BoW methods traditionally use low-level features. We have devised a strategy to use these low-level features to create "higher-level" features by making use of the spatial context in images. In this paper, we propose a novel hierarchical feature learning framework that uses a Naive Bayes Clustering algorithm to convert a 2-D symbolic image at one level to a 2-D symbolic image at the next level with richer features. On two popular datasets, Pascal VOC 2007 and Caltech 101, we empirically show that the classification accuracy obtained from the hierarchical features computed using our approach is significantly higher than that of the traditional SIFT-based BoW representation of images, even though our image representations are more compact.