When no answer is better than a wrong answer: A causal perspective on batch effects
When you pool data from many studies into one big analysis, the standard tools for cleaning up site-to-site differences can quietly invent effects that aren't really there or hide ones that are. This paper shows why, reframes the problem in terms of cause and effect, and offers methods that, when the data are too tangled to give a real answer, simply say so.
Abstract
Batch effects, undesirable sources of variability across multiple experiments, present significant challenges for scientific and clinical discoveries. Batch effects can (i) produce spurious signals and/or (ii) obscure genuine signals, contributing to the ongoing reproducibility crisis. Because batch effects are typically modeled as classical statistical effects, existing methods often cannot differentiate between sources of variability due to confounding biases, which may lead them to erroneously conclude batch effects are present (or not). We formalize batch effects as causal effects and introduce algorithms leveraging causal machinery to address these concerns. Simulations illustrate that when non-causal methods provide the wrong answer, our methods either produce more accurate answers or “no answer”, meaning they assert the data are inadequate to confidently conclude on the presence of a batch effect. Applying our causal methods to 27 neuroimaging datasets yields qualitatively similar results: in situations where it is unclear whether batch effects are present, non-causal methods confidently identify (or fail to identify) batch effects, whereas our causal methods assert that it is unclear whether there are batch effects or not. In instances where batch effects should be discernible, our techniques produce different results from prior art, each of which produces results more qualitatively similar to not applying any batch effect correction to the data at all. This work therefore provides a causal framework for understanding the potential capabilities and limitations of analysis of multi-site data.
@article{bridgeford2025when,
title = {When No Answer Is Better than a Wrong Answer: {{A}} Causal Perspective on Batch Effects},
shorttitle = {When No Answer Is Better than a Wrong Answer},
author = {Bridgeford, Eric W. and Powell, Michael and Kiar, Gregory and Noble, Stephanie and Chung, Jaewon and Panda, Sambit and Lawrence, Ross and Xu, Ting and Milham, Michael and Caffo, Brian and Vogelstein, Joshua T.},
year = 2025,
month = jan,
journal = {Imaging Neuroscience},
volume = {3},
pages = {imag\_a\_00458},
issn = {2837-6056},
doi = {10.1162/imag_a_00458},
}