Correcting for Batch Effects Using Wasserstein Distance


Profiling cellular phenotypes from microscopy images can provide meaningful biological information about how various factors affect the cells. The general approach is to find a function that maps the images to an embedding space of manageable dimensionality whose geometry captures relevant features of the input images. An important known issue for such methods is separating relevant biological signal from nuisance variation. For example, the embedding vectors tend to be more correlated for cells from the same domain (batch) than for cells from different domains. We develop a method for adjusting the cellular image embeddings in order to ‘forget’ domain-specific information. To do this, we minimize a loss function based on the Wasserstein distance between the embedding distributions of different domains. We observe (1) improved detection of biological signal in the transformed embeddings and (2) a decreased ability to discern the domain via a classifier.
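To make the idea of a Wasserstein-based measure of domain mismatch concrete, the following is a minimal sketch and not the method developed in this work. It assumes a sliced approximation of the 1-Wasserstein distance (random one-dimensional projections, then sorting), synthetic Gaussian embeddings with an artificial shift standing in for a batch effect, and plain per-domain mean-centering standing in for a learned correction. The function name `sliced_wasserstein` and all parameters are hypothetical.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=64, seed=0):
    """Approximate the 1-Wasserstein distance between two sets of embeddings.

    Assumption of this sketch: both sets hold the same number of vectors,
    so the 1-D distance along each projection is the mean absolute
    difference of the sorted projected values.
    """
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal-sized samples")
    rng = np.random.default_rng(seed)
    # Random unit directions in embedding space, one per row.
    directions = rng.normal(size=(n_projections, x.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Project and sort along the sample axis: shape (n_samples, n_projections).
    px = np.sort(x @ directions.T, axis=0)
    py = np.sort(y @ directions.T, axis=0)
    # Average the 1-D distances over all projections.
    return float(np.mean(np.abs(px - py)))


# Synthetic stand-ins for embeddings from two domains (not real data).
rng = np.random.default_rng(1)
domain_a = rng.normal(0.0, 1.0, size=(500, 8))
domain_b = rng.normal(0.0, 1.0, size=(500, 8)) + 0.5  # simulated batch shift

before = sliced_wasserstein(domain_a, domain_b)

# Naive stand-in for a correction: remove each domain's mean embedding.
corrected_a = domain_a - domain_a.mean(axis=0)
corrected_b = domain_b - domain_b.mean(axis=0)
after = sliced_wasserstein(corrected_a, corrected_b)

print(f"distance before correction: {before:.3f}")
print(f"distance after correction:  {after:.3f}")
```

In this toy setting the distance between the two domains shrinks once the simulated shift is removed. A learned transformation would instead be fitted by minimizing such a distance directly, while checking that biological signal is preserved.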