Epistasis testing
This page contains extra details on the test for epistasis implemented
in the --fast-epistasis command, designed for the detection
of SNPxSNP pairwise interactions in large-scale case-control
association studies.
This test is based on a Z-score for the difference in SNP-SNP
association (odds ratio) between cases and controls (or in cases only,
in a case-only analysis).
We follow the procedure for constructing an allelic test of a
single locus, twice collapsing three genotype categories into two
allele categories. Specifically, we count the 4N independent alleles
observed at two loci in a sample of N individuals into a 2x2 table,
following the logic below, so the allele (not the individual or
haplotype) is the unit of analysis.
Let a to i denote the number of individuals in each of the nine
two-locus genotype classes:
        BB    Bb    bb
  AA     a     b     c
  Aa     d     e     f
  aa     g     h     i
We first count alleles at one locus, e.g. B, conditional on the genotype at A, which can be represented as a 3x2 table:
        B       b
  AA    2a+b    2c+b
  Aa    2d+e    2f+e
  aa    2g+h    2i+h
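which represents 2N alleles, not N individuals. We again collapse this
3x2 table into a 2x2 table, counting the AA row twice towards A, the
aa row twice towards a, and the Aa row once towards each, as follows:
        B               b
  A     4a+2b+2d+e      4c+2b+2f+e
  a     4g+2h+2d+e      4i+2h+2f+e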
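The two collapsing steps above can be sketched in a few lines of Python. This is an illustration only, not PLINK's own code: the function name collapse_to_allele_table and the example counts are made up for the purpose, and the input is assumed to be a 3x3 list with rows AA, Aa, aa and columns BB, Bb, bb.

```python
def collapse_to_allele_table(genotype_counts):
    """Collapse 3x3 two-locus genotype counts into the 2x2 allele table.

    genotype_counts: rows are AA, Aa, aa; columns are BB, Bb, bb
    (hypothetical input layout, chosen to match the table in the text).
    Returns [[AB, Ab], [aB, ab]], which sums to 4N.
    """
    (a, b, c), (d, e, f), (g, h, i) = genotype_counts
    return [
        [4 * a + 2 * b + 2 * d + e, 4 * c + 2 * b + 2 * f + e],
        [4 * g + 2 * h + 2 * d + e, 4 * i + 2 * h + 2 * f + e],
    ]


# Made-up counts for N = 100 individuals, for illustration only.
example_counts = [[10, 20, 5], [15, 30, 10], [4, 4, 2]]
allele_table = collapse_to_allele_table(example_counts)
n_individuals = sum(sum(row) for row in example_counts)
```

Each individual contributes exactly four counts to the result, so the cells of allele_table always sum to 4N, which is a convenient sanity check.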
Based on this 2x2 table, the odds ratio between loci A and B and its
standard error are calculated in the standard manner. When cases and
controls are present, the above procedure is performed separately in
cases and controls, and the test for epistasis is the difference of
the two odds ratios:
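Z = ( log(R) - log(S) ) / sqrt( SE(R)^2 + SE(S)^2 )
where R and S are the odds ratios in cases and
controls respectively, and SE(R) and SE(S) are the standard errors of
log(R) and log(S). Each odds ratio is estimated as ad/bc, with
variance of its logarithm 1/a+1/b+1/c+1/d, where a,b,c,d here denote
the four cells of the 2x2 table above, read row by row (not the
genotype counts of the 3x3 table). Under the multiplicative model of
no interaction, Z follows a standard normal distribution.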
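As a worked illustration of the statistic, the sketch below computes Z from one 2x2 allele table for cases and one for controls. It is not PLINK's implementation. It assumes each table is given row by row as [[a, b], [c, d]], so that the cross-product odds ratio is ad/bc, and that no cell is zero. The function names and the example tables are hypothetical.

```python
import math


def log_odds_ratio_and_variance(table):
    """Return (log OR, variance of log OR) for a 2x2 table [[a, b], [c, d]].

    Assumes all four cells are non-zero; a zero cell would raise an error.
    """
    (a, b), (c, d) = table
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def epistasis_z(case_table, control_table):
    """Z-score for the difference in log odds ratio, cases minus controls."""
    log_r, var_r = log_odds_ratio_and_variance(case_table)
    log_s, var_s = log_odds_ratio_and_variance(control_table)
    return (log_r - log_s) / math.sqrt(var_r + var_s)


def two_sided_p(z):
    """Two-sided P-value under a standard normal null distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Made-up allele tables, for illustration only.
cases = [[200, 100], [100, 200]]
controls = [[150, 150], [150, 150]]
z = epistasis_z(cases, controls)
p = two_sided_p(z)
```

Swapping the case and control tables flips the sign of Z but leaves the two-sided P-value unchanged.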
Note that, despite superficial similarity to a table of 2N haplotypes
(AB, Ab, aB and ab), this table
ignores phase, i.e. we are not attempting to resolve phase
for Aa/Bb individuals. Rather, given 4N independent alleles
(assuming Hardy-Weinberg and linkage equilibrium for the two test
loci), these 4N observations are simply counted following the scheme
given above, that partitions the 4N counts into a 2x2 table. Whilst an
inexact heuristic, we observe appropriate type I error rates in
simulation (see below) and equivalent power to the logistic regression
test. The correlation with a logistic regression analysis is very high
(r = 0.995, based on -log10 P-value).
This table shows the type-I error of the case-control epistasis
test. We considered three models that included no interaction between
two unlinked SNPs and no marginal SNP effect (model 1) or a strong
effect for one (model 2) or both SNPs (model 3). Type-I error is based
on the analysis of 100,000 simulated datasets (disease prevalence =
0.01, minor allele frequency = 0.1 for both loci).
  Model   Marginal SNP effects      Nominal alpha
          (Odds Ratio)
          SNP A      SNP B          0.05        0.0005
  1       1.0        1.0            0.04750     0.00030
  2       1.0        1.4            0.04941     0.00030
  3       1.4        1.4            0.04817     0.00048
The power to detect a large interaction effect (GRR = 2) with no
marginal single SNP effects was 0.74 (alpha = 1.2e-12; disease prevalence
= 0.01, MAF = 0.1 for both loci). Power for other two-locus models
can be estimated using the Genetic Power Calculator (GPC).
Our procedure assumes Hardy-Weinberg and linkage equilibrium for the
two SNPs hold in the population. However, simulation studies have
shown the case/control test to be very robust to deviations from the
linkage equilibrium assumption, whereas a case-only test is not (data
not shown). Analogous to adopting an allelic single locus test, we
also assume an allelic mode of gene action where any interaction term
represents an allele-by-allele effect, not a genotype-by-genotype
effect.
HINT If you use this to screen a large number of
SNPs, you should probably report the more standard logistic regression
test value also. In practice, both approaches usually give similar
results, which justifies the use of --fast-epistasis as a
screening tool for a computationally-demanding problem. Of course,
given a specific (and often extreme) threshold, --epi1, the
exact above-threshold list of SNPs will not always be the same; if you
choose to use this approach, it is probably wise to apply it to select
a subset of pairs of SNPs below a reasonably liberal --epi1
threshold to be tested with the more standard --epistasis
command.
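The two-stage screening idea in this hint can be sketched as a small post-processing step: read the screening report, keep the pairs at or below a liberal P-value threshold, and collect the SNPs involved for follow-up. The sketch below is illustrative only. It assumes a whitespace-delimited report with a header row containing columns named SNP1, SNP2 and P. These names, the function select_pairs and the example rows are assumptions, so check them against the actual output file before relying on this.

```python
def select_pairs(report_text, p_max, snp_cols=("SNP1", "SNP2"), p_col="P"):
    """Return (pairs, snps) for report rows with P <= p_max.

    report_text: whitespace-delimited text with a header row
    (assumed layout; column names are passed in, not hard-wired).
    Rows whose P-value is not numeric (for example NA) are skipped.
    """
    rows = [line.split() for line in report_text.splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    i1, i2, ip = (header.index(name) for name in (*snp_cols, p_col))
    pairs = []
    for fields in body:
        try:
            p_value = float(fields[ip])
        except ValueError:
            continue
        if p_value <= p_max:
            pairs.append((fields[i1], fields[i2]))
    snps = sorted({snp for pair in pairs for snp in pair})
    return pairs, snps


# Made-up report contents, for illustration only.
example_report = """\
CHR1 SNP1 CHR2 SNP2 STAT P
1 snpA 2 snpB 25.1 5.4e-07
1 snpA 3 snpC 1.2 0.27
2 snpB 3 snpC 18.9 1.4e-05
1 snpA 4 snpD NA NA
"""
pairs, snps = select_pairs(example_report, p_max=1e-4)
```

The resulting SNP list could then be written to a file and used to restrict the slower --epistasis run to the markers that passed the screen. How that restriction is passed to PLINK is not covered here.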