Multivariate Independence and k-sample Testing
Two years of work on multivariate hypothesis testing (distance correlation and MGC, and nonparametric MANOVA), pulled into one framework, with the open-source hyppo package as its software contribution.
Abstract
As the amount of data grows in many fields, methods that consistently and efficiently decipher relationships within high-dimensional datasets become increasingly important. Because many modern datasets are multivariate, univariate tests are not applicable. While many multivariate independence tests have R packages available, their interfaces are inconsistent and most are not available in Python. We introduce hyppo, a package that includes many state-of-the-art multivariate testing procedures. This thesis details the implementation of each test within hyppo and provides extensive power and run-time benchmarks on a suite of high-dimensional simulations used in previous publications. The documentation and all releases for hyppo are available at https://hyppo.neurodata.io.
Advisor: Joshua T. Vogelstein
@phdthesis{panda2020multivariate,
title = {Multivariate {{Independence}} and K-Sample {{Testing}}},
author = {Panda, Sambit},
year = 2020,
month = may,
langid = {american},
school = {Johns Hopkins University},
}