Epistasis testing
This page contains extra details on the test for epistasis implemented
in the --fast-epistasis command, designed for the detection
of SNP x SNP pairwise interactions in large-scale case-control
association studies.
This test is based on a Z-score for the difference in SNP-SNP
association (odds ratio) between cases and controls (or in cases only,
in a case-only analysis).
We follow the procedure for constructing an allelic test of a
single locus, twice collapsing three genotype categories into two
allele categories. Specifically, we count the 4N independent alleles
observed at two loci in a sample of N individuals into a 2x2 table,
following the logic below, so the allele (not the individual or
haplotype) is the unit of analysis.
          BB    Bb    bb
   AA     a     b     c
   Aa     d     e     f
   aa     g     h     i
We first count alleles at one locus, e.g. B, conditional on the genotype at A, which can be represented as a 3x2 table:
          B        b
   AA     2a+b     2c+b
   Aa     2d+e     2f+e
   aa     2g+h     2i+h
which represents 2N alleles, not N individuals. We again collapse this
3x2 table into a 2x2 table, as follows
          B              b
   A      4a+2b+2d+e     4c+2b+2f+e
   a      4g+2h+2d+e     4i+2h+2f+e
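The collapsing step above can be sketched in a few lines of Python. This is a minimal illustration, not PLINK's own code: the function name collapse_to_allele_table and the example genotype counts are made up for this sketch.

```python
def collapse_to_allele_table(genotype_counts):
    """Collapse a 3x3 genotype-count table into the 2x2 allele-count table.

    genotype_counts is [[a, b, c], [d, e, f], [g, h, i]], with rows
    AA, Aa, aa and columns BB, Bb, bb, matching the layout in the text.
    """
    (a, b, c), (d, e, f), (g, h, i) = genotype_counts
    n_AB = 4 * a + 2 * b + 2 * d + e   # allele A counted with allele B
    n_Ab = 4 * c + 2 * b + 2 * f + e   # allele A counted with allele b
    n_aB = 4 * g + 2 * h + 2 * d + e   # allele a counted with allele B
    n_ab = 4 * i + 2 * h + 2 * f + e   # allele a counted with allele b
    return ((n_AB, n_Ab), (n_aB, n_ab))


# Hypothetical genotype counts for N = 100 individuals.
genotypes = [[10, 20, 5], [15, 30, 8], [4, 6, 2]]
table = collapse_to_allele_table(genotypes)
print(table)                 # ((140, 106), (88, 66))
print(sum(map(sum, table)))  # 400, i.e. 4N
```

Each individual contributes exactly four counts to the 2x2 table, which is why the cells sum to 4N.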
Based on this 2x2 table, the odds ratio between loci A and B and its
standard error are calculated in the standard manner. When cases and
controls are present, the above procedure is performed separately in
cases and controls, and the test for epistasis is based on the
difference between the two log odds ratios:
   Z = ( log(R) - log(S) ) / sqrt( SE(R) + SE(S) )
where R and S are the odds ratios in cases and
controls respectively. Writing the four cells of the 2x2 table above as
a, b (top row) and c, d (bottom row), not to be confused with the
genotype counts a to i of the 3x3 table, each odds ratio is estimated
as ad/bc and its logarithm has variance 1/a+1/b+1/c+1/d. The terms
SE(R) and SE(S) in the denominator denote these variances. This test
follows a standard normal distribution
under the multiplicative model of no interaction.
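As a rough sketch of this calculation (not PLINK's implementation), the Python below takes one 2x2 allele-count table for cases and one for controls. It assumes the odds ratio is the usual cross-product of a 2x2 table, with the variance of its logarithm equal to the sum of the reciprocal cell counts. The function names and the example counts are hypothetical, and no provision is made for empty cells, where the odds ratio is undefined.

```python
import math


def log_odds_ratio(table):
    """Return (log odds ratio, variance of the log odds ratio) for a
    2x2 table ((a, b), (c, d)). All four cells must be non-zero."""
    (a, b), (c, d) = table
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance


def epistasis_z(case_table, control_table):
    """Z-score for the case/control difference in log odds ratios,
    with a two-sided P-value from the standard normal distribution."""
    log_r, var_r = log_odds_ratio(case_table)
    log_s, var_s = log_odds_ratio(control_table)
    z = (log_r - log_s) / math.sqrt(var_r + var_s)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p


# Made-up allele-count tables, purely for illustration.
cases = ((140, 106), (88, 66))
controls = ((120, 110), (100, 70))
z, p = epistasis_z(cases, controls)
print(f"Z = {z:.3f}, P = {p:.3f}")
```

A case-only analysis would use the log odds ratio in cases alone, divided by its own standard error.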
Note that, despite superficial similarity to a table of 2N haplotypes
(AB, Ab, aB and ab), this table
ignores phase, i.e. we are not attempting to resolve phase
for Aa/Bb individuals. Rather, given 4N independent alleles
(assuming Hardy-Weinberg and linkage equilibrium for the two test
loci), these 4N observations are simply counted following the scheme
given above, that partitions the 4N counts into a 2x2 table. Whilst an
inexact heuristic, we observe appropriate type I error rates in
simulation (see below) and equivalent power to the logistic regression
test. The correlation with a logistic regression analysis is very high
(r = 0.995, based on log10 P-value).
This table shows the type I error of the case-control epistasis
test. We considered three models that included no interaction between
two unlinked SNPs and no marginal SNP effect (model 1) or a strong
effect for one (model 2) or both SNPs (model 3). Type I error is based
on the analysis of 100,000 simulated datasets (disease prevalence =
0.01, minor allele frequency = 0.1 for both loci).
   Model    Marginal SNP effects     Nominal alpha
            (Odds Ratio)
            SNP A      SNP B         0.05        0.0005
   1        1.0        1.0           0.04750     0.00030
   2        1.0        1.4           0.04941     0.00030
   3        1.4        1.4           0.04817     0.00048
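A small simulation in the spirit of model 1 can be sketched as follows. It is only an illustration and will not reproduce the figures in the table: it uses far fewer replicates than the 100,000 datasets above, and the group size of 500 cases and 500 controls, the seed, and all function names are assumptions made for this sketch. With no marginal effects and no interaction, disease status is independent of genotype, so cases and controls are both drawn from the same population under Hardy-Weinberg and linkage equilibrium.

```python
import math
import random


def simulate_allele_table(n, maf_a, maf_b, rng):
    """Draw n individuals under Hardy-Weinberg and linkage equilibrium and
    return the 2x2 allele-count table ((AB, Ab), (aB, ab))."""
    n_AB = n_Ab = n_aB = n_ab = 0
    for _ in range(n):
        # x, y: copies of allele A and allele B carried (0, 1 or 2)
        x = (rng.random() < maf_a) + (rng.random() < maf_a)
        y = (rng.random() < maf_b) + (rng.random() < maf_b)
        n_AB += x * y
        n_Ab += x * (2 - y)
        n_aB += (2 - x) * y
        n_ab += (2 - x) * (2 - y)
    return ((n_AB, n_Ab), (n_aB, n_ab))


def z_score(case_table, control_table):
    """Difference in log odds ratios divided by its standard error."""
    def log_or_and_var(table):
        (a, b), (c, d) = table
        return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d
    log_r, var_r = log_or_and_var(case_table)
    log_s, var_s = log_or_and_var(control_table)
    return (log_r - log_s) / math.sqrt(var_r + var_s)


def type1_error(reps=1000, n=500, maf=0.1, critical=1.96, seed=1):
    """Fraction of null replicates with |Z| above the critical value."""
    rng = random.Random(seed)
    hits = used = 0
    for _ in range(reps):
        cases = simulate_allele_table(n, maf, maf, rng)
        controls = simulate_allele_table(n, maf, maf, rng)
        if 0 in cases[0] + cases[1] + controls[0] + controls[1]:
            continue  # skip replicates with an empty cell (odds ratio undefined)
        used += 1
        hits += abs(z_score(cases, controls)) > critical
    return hits / used


rate = type1_error()
print(f"empirical type I error at nominal 0.05: {rate:.4f}")
```

With this few replicates the estimate is noisy. A rate near 0.05 is consistent with the table, but only a much larger run would resolve differences in the third decimal place.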
The power to detect a large interaction effect (GRR = 2) and no
marginal single SNP effects was 0.74 (alpha = 1.2e-12; disease prevalence
= 0.01, MAF = 0.1 for both loci). Power for other two-locus models
can be estimated using the power calculator available through the
Genetic Power Calculator (GPC).
Our procedure assumes Hardy-Weinberg and linkage equilibrium for the
two SNPs hold in the population. However, simulation studies have
shown the case/control test to be very robust to deviations from the
linkage equilibrium assumption, whereas a case-only test is not (data
not shown). Analogous to adopting an allelic single locus test, we
also assume an allelic mode of gene action where any interaction term
represents an allele-by-allele effect, not a genotype-by-genotype
effect.
HINT If you use this to screen a large number of
SNPs, you should probably report the more standard logistic regression
test value also. In practice, both approaches usually give similar
results, which justifies the use of --fast-epistasis as a
screening tool for a computationally-demanding problem. Of course,
given a specific (and often extreme) threshold, --epi1, the
exact above-threshold list of SNPs will not always be the same; if you
choose to use this approach, it is probably wise to apply it to select
a subset of pairs of SNPs below a reasonably liberal --epi1
threshold to be tested with the more standard --epistasis
command.

