In practice, machine learning systems deal with multiple datasets over time. When the feature spaces of these datasets overlap, it is possible to transfer information from one task to another. Typically in transfer learning, all labeled data from a source task is stored so that it can be applied to a new target task, which raises concerns about privacy, memory, and scaling. To ameliorate such concerns, we present a semi-supervised algorithm for text categorization that transfers information across tasks without storing the data of the source task. In particular, our technique learns sparse, low-dimensional features based on word clusters from the source task data and a massive amount of additional unlabeled data. Our algorithm is efficient, highly parallelizable, and outperforms competitive baselines by up to 9% on several difficult benchmark text categorization tasks.