Title: | A Combined Association Test for Genes using Summary Statistics |
---|---|
Description: | Genome-wide association studies (GWAS) have been widely used for identifying common variants associated with complex diseases. Due to the small effect sizes of common variants, the power to detect individual risk variants is generally low. Complementary to SNP-level analysis, a variety of gene-based association tests have been proposed. However, the power of existing gene-based tests often depends on the underlying genetic model, and it is not known a priori which test is optimal. Here we propose the COMBined Association Test (COMBAT) to incorporate strengths from multiple existing gene-based tests, including VEGAS, GATES and simpleM. Compared to individual tests, COMBAT shows higher overall performance and robustness across a wide range of genetic models. The algorithm behind this method is described in Wang et al (2017) <doi:10.1534/genetics.117.300257>. |
Authors: | Minghui Wang, Yiyuan Liu, Shizhong Han |
Maintainer: | Minghui Wang <[email protected]> |
License: | GPL-2 |
Version: | 0.0.4 |
Built: | 2024-11-04 02:47:08 UTC |
Source: | https://github.com/mw201608/combat |
This function implements a combined gene-based association test using SNP-level P values and reference genotype data.
COMBAT(x, snp.ref, vegas.pct = c(0.1,0.2,0.3,0.4,1), pca_cut_perc = 0.995, nperm = 100, seed=12345, ncores=1)
x |
a vector of SNP-level P values. |
snp.ref |
a matrix of SNP genotypes (coded as allele counts) from reference samples, with samples in rows and SNPs in columns. |
vegas.pct |
a numeric vector, fraction of the top SNPs to be used in the VEGAS method. |
pca_cut_perc |
numeric, cutoff for the proportion of the sum of eigenvalues used in the simpleM approach. |
nperm |
number of permutations for computing the correlation between P values of different tests. |
seed |
random seed, set so that results are reproducible. |
ncores |
number of CPU cores for parallel computing. |
COMBAT uses simulation and the extended Simes procedure (ext_simes) to combine multiple gene-based association test statistics (currently including gates, vegas, and simpleM) to perform a more powerful association analysis. This method does not require raw genotype or phenotype data; it needs only SNP-level P values and correlations between SNPs from ancestry-matched samples. The technical details of the method are described in Wang et al (2017) <doi:10.1534/genetics.117.300257>.
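To make the role of simulation more concrete, the sketch below draws SNP-level P values under the null hypothesis so that their dependence follows a given SNP-SNP correlation matrix. It is only an illustration in base R: the helper name `simulate_null_pvals`, the toy correlation values, and the choice of a Cholesky-based multivariate normal draw are assumptions made here, not the scheme COMBAT uses internally (see Wang et al (2017) for that).

```r
# Hypothetical helper (not part of the COMBAT package): simulate null
# SNP-level P values whose correlation structure follows cor_G.
simulate_null_pvals <- function(cor_G, nsim = 100, seed = 12345) {
  set.seed(seed)
  m <- ncol(cor_G)
  # Upper-triangular Cholesky factor U, so that t(U) %*% U equals cor_G
  U <- chol(cor_G)
  # Each row of z is one draw from a multivariate normal N(0, cor_G)
  z <- matrix(rnorm(nsim * m), nrow = nsim, ncol = m) %*% U
  # Convert the simulated z-scores to two-sided P values
  2 * pnorm(-abs(z))
}

# Toy correlation matrix for three SNPs (illustrative values only)
cor_G <- matrix(c(1.0, 0.6, 0.2,
                  0.6, 1.0, 0.4,
                  0.2, 0.4, 1.0), nrow = 3, byrow = TRUE)

null_p <- simulate_null_pvals(cor_G, nsim = 100)
dim(null_p)  # one row per simulation, one column per SNP
```

Applying several gene-based tests to each simulated row would give paired null P values, from which a correlation between tests could be estimated; whether COMBAT does exactly this is not stated in this page.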
A vector of P values from COMBAT and each individual gene-based test.
Minghui Wang, Jianfei Huang, Yiyuan Liu, Li Ma, James B. Potash, Shizhong Han. COMBAT: A Combined Association Test for Genes using Summary Statistics. Genetics 2017, 207(3): 883-891. https://doi.org/10.1534/genetics.117.300257.
ext_simes, gates, vegas, simpleM.
# read SNP P values
file1 <- paste(path.package("COMBAT"),"extdata","SNP_info.txt.gz",sep="/")
snp.info <- read.table(file1, header = TRUE, as.is=TRUE)
snp.pvals <- as.matrix(snp.info[,2])
# read reference genotype
file2 <- paste(path.package("COMBAT"),"extdata","SNP_ref.txt.gz",sep="/")
snp.ref <- read.table(file2, header = TRUE)
snp.ref <- as.matrix(snp.ref)
# call COMBAT
COMBAT(snp.pvals, snp.ref, nperm=100, ncores=2)
Combine a vector of test P values by correction for number of independent tests.
ext_simes(x, cor_r)
x |
a vector of SNP-level P values. |
cor_r |
correlation among P values. |
P value.
# see ?COMBAT
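For orientation, the classical Simes procedure that the extended version builds on can be sketched in a few lines of base R. The function name `simes_combine` is hypothetical and this is not the package's `ext_simes`: the classical form divides by the rank and multiplies by the raw number of tests, whereas an extended Simes procedure replaces those counts with effective numbers of independent tests derived from the correlation structure (here, `cor_r`).

```r
# Hypothetical helper: classical Simes combination of P values.
# This ignores correlation among tests; it is NOT the package's ext_simes.
simes_combine <- function(p) {
  p <- sort(p)             # order P values from smallest to largest
  n <- length(p)
  # Simes statistic: minimum over ranks i of n * p_(i) / i
  min(n * p / seq_len(n))
}

simes_combine(c(0.20, 0.01, 0.04))  # 0.03
```

In this toy call the smallest P value (0.01) multiplied by the number of tests (3) gives the minimum, so the combined P value is 0.03.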
Several gene-based association test methods are implemented.
gates(x, cor_G)
vegas(x, cor_G, vegas.pct=c(0.1,0.2,0.3,0.4,1), max.simulation=1e6)
simpleM(x, cor_G, pca_cut_perc=0.995)
x |
a vector of SNP-level P values. |
cor_G |
SNP-SNP correlation matrix. |
vegas.pct |
a numeric vector, specifying the fraction of the top SNPs to be used in the VEGAS method. |
max.simulation |
maximum number of simulations to be performed. Must be at least 1e6. |
pca_cut_perc |
cutoff for the proportion of the sum of eigenvalues. |
Function gates implements the GATES method (Li et al 2011, American Journal of Human Genetics 88:283-293), vegas implements VEGAS with different proportion tests (Liu et al 2010, American Journal of Human Genetics 87:139-145), and simpleM is the simpleM method (Gao et al 2008, Genetic Epidemiology 32:361-369).
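The idea behind the eigenvalue cutoff `pca_cut_perc` can be illustrated with a short base R sketch: count how many leading eigenvalues of the SNP-SNP correlation matrix are needed to reach the cutoff proportion of their sum, and treat that count as the effective number of independent tests. The helper name `effective_tests`, the toy inputs, and the Bonferroni-style adjustment in the last line are assumptions for illustration; the package's `simpleM` may differ in its details.

```r
# Hypothetical helper (not the package's simpleM): effective number of
# independent tests from the eigenvalues of a correlation matrix.
effective_tests <- function(cor_G, pca_cut_perc = 0.995) {
  ev <- eigen(cor_G, symmetric = TRUE, only.values = TRUE)$values
  ev <- pmax(ev, 0)  # guard against tiny negative eigenvalues from rounding
  # Smallest number of leading eigenvalues whose cumulative share of the
  # total reaches the cutoff
  which(cumsum(ev) / sum(ev) >= pca_cut_perc)[1]
}

# Toy inputs (illustrative values only)
cor_G <- matrix(c(1.0, 0.6, 0.2,
                  0.6, 1.0, 0.4,
                  0.2, 0.4, 1.0), nrow = 3, byrow = TRUE)
p <- c(0.01, 0.04, 0.20)

m_eff <- effective_tests(cor_G)
# Assumed Bonferroni-style gene-level P value using the effective count
min(1, min(p) * m_eff)
```

With uncorrelated SNPs the effective count equals the number of SNPs, while perfectly correlated SNPs collapse to a single effective test, which is why the adjustment is less conservative than a plain Bonferroni correction when SNPs are in strong linkage disequilibrium.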
P value(s).
# read SNP P values
file1 <- paste(path.package("COMBAT"),"extdata","SNP_info.txt.gz",sep="/")
snp.info <- read.table(file1, header = TRUE, as.is=TRUE)
snp.pvals <- as.matrix(snp.info[,2])
# read reference genotype
file2 <- paste(path.package("COMBAT"),"extdata","SNP_ref.txt.gz",sep="/")
snp.ref <- read.table(file2, header = TRUE)
snp.ref <- as.matrix(snp.ref)
# compute correlation among SNPs
cor_G <- ld.Rsquare(snp.ref)
# call gates
(pval_gates <- gates(x=snp.pvals, cor_G=cor_G))
# call vegas
(pval_vegas <- vegas(x=snp.pvals, cor_G=cor_G))
# call simpleM
(pval_simpleM <- simpleM(x=snp.pvals, cor_G=cor_G))
Compute linkage disequilibrium among SNPs using correlation coefficients.
ld.Rsquare(x)
x |
a matrix of SNP genotypes with samples in the rows. |
A positive definite correlation matrix.
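As a rough picture of the expected input and output, the base R sketch below builds a toy genotype matrix (samples in rows, SNPs in columns, allele counts 0/1/2), computes pairwise Pearson correlations with `cor()`, and checks that the result is symmetric with strictly positive eigenvalues. The simulated data and the variable name `geno` are made up for illustration. How `ld.Rsquare` itself handles missing genotypes, or matrices that are not positive definite, is not described on this page, so this is a sketch under those assumptions rather than a reimplementation.

```r
set.seed(1)
# Toy genotype matrix: 50 samples (rows) x 4 SNPs (columns),
# coded as allele counts 0, 1, or 2 (simulated, illustrative only)
geno <- matrix(rbinom(50 * 4, size = 2, prob = 0.3), nrow = 50, ncol = 4)
colnames(geno) <- paste0("snp", 1:4)

# Pairwise Pearson correlation between SNP columns
cor_G <- cor(geno)

# A correlation matrix usable downstream should be symmetric and
# have strictly positive eigenvalues (positive definite)
isSymmetric(cor_G)
min(eigen(cor_G, symmetric = TRUE, only.values = TRUE)$values) > 0
```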