GMM-based procedure for multiple hypotheses testing
Abstract
Multiple hypothesis testing has been widely studied in the literature because of its broad applicability, particularly in biogenetics and astrogeology. The false discovery rate (FDR), loosely defined as the expected proportion of false positives among all rejected hypotheses, is a useful error-control criterion for large-scale multiple testing. In this paper, we propose fitting a Gaussian mixture model (GMM) to the distribution of the z-value statistics, with the null distribution included as a fixed component. The fitted GMM yields simultaneous estimates of the proportion of null hypotheses and the real null distribution. We then propose a GMM-based procedure that minimizes the false nondiscovery rate (FNR) subject to a constraint on the FDR. Both simulations and a real-data analysis show that the GMM-based procedure performs well relative to several competing procedures.
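To make the general idea concrete, the following is a minimal sketch, not the procedure of this paper. It assumes the theoretical null N(0, 1) is the fixed component, that the non-null z-values are modeled by `k` free Gaussian components fitted by EM, and that hypotheses are rejected by a standard local-FDR thresholding rule: sort by the estimated local false discovery rate lfdr(z) = pi0 * phi(z) / f(z) and reject as many as possible while the running average lfdr stays at or below `alpha`. This thresholding rule is swapped in as a stand-in for the paper's FNR-minimizing procedure. All function names (`fit_gmm_fixed_null`, `local_fdr`, `reject_by_lfdr`) are hypothetical.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated elementwise at x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def fit_gmm_fixed_null(z, k=1, n_iter=500):
    """EM for f(z) = w[0] * N(0, 1) + sum_j w[j+1] * N(mu[j], sd[j]^2).

    Assumption (illustrative only): component 0 is the theoretical null N(0, 1),
    held fixed; only its weight w[0] (the null proportion pi0) is estimated.
    The k non-null components have free means and standard deviations.
    """
    z = np.asarray(z, dtype=float)
    w = np.full(k + 1, 1.0 / (k + 1))
    # Crude deterministic start: spread the free means between min(z) and max(z).
    mu = np.linspace(z.min(), z.max(), k + 2)[1:-1]
    sd = np.full(k, z.std())
    for _ in range(n_iter):
        # E-step: posterior probability that each z_i came from each component.
        dens = np.column_stack(
            [normal_pdf(z, 0.0, 1.0)] + [normal_pdf(z, mu[j], sd[j]) for j in range(k)]
        )
        resp = w * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update every weight, but only the non-null means and sds.
        nk = resp.sum(axis=0) + 1e-12
        w = nk / nk.sum()
        for j in range(k):
            r = resp[:, j + 1]
            mu[j] = r @ z / nk[j + 1]
            # Floor the sd to avoid a degenerate component collapsing onto one point.
            sd[j] = max(np.sqrt(r @ (z - mu[j]) ** 2 / nk[j + 1]), 1e-3)
    return w, mu, sd


def local_fdr(z, w, mu, sd):
    """lfdr(z) = pi0 * phi(z) / f(z): estimated posterior probability of being null."""
    z = np.asarray(z, dtype=float)
    null_part = w[0] * normal_pdf(z, 0.0, 1.0)
    mix = null_part + sum(w[j + 1] * normal_pdf(z, mu[j], sd[j]) for j in range(len(mu)))
    return null_part / mix


def reject_by_lfdr(lfdr, alpha=0.1):
    """Reject the hypotheses with the smallest lfdr values, keeping as many as
    possible while their running average lfdr (an FDR estimate) stays <= alpha."""
    order = np.argsort(lfdr)
    running_avg = np.cumsum(lfdr[order]) / np.arange(1, lfdr.size + 1)
    # Sorted ascending, so running_avg is nondecreasing and the passing set is a prefix.
    n_reject = int(np.sum(running_avg <= alpha))
    reject = np.zeros(lfdr.size, dtype=bool)
    reject[order[:n_reject]] = True
    return reject


if __name__ == "__main__":
    # Synthetic example: 900 nulls from N(0, 1) and 100 non-nulls from N(4, 1).
    rng = np.random.default_rng(1)
    z = np.concatenate([rng.normal(0.0, 1.0, 900), rng.normal(4.0, 1.0, 100)])
    w, mu, sd = fit_gmm_fixed_null(z, k=1)
    lfdr = local_fdr(z, w, mu, sd)
    reject = reject_by_lfdr(lfdr, alpha=0.1)
    print(f"estimated pi0 = {w[0]:.3f}, rejections = {reject.sum()}")
```

Fixing the null component at N(0, 1) keeps the sketch short. Estimating the real null distribution, as the abstract describes, would additionally free the null's mean and variance within the same EM loop.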
