Epistasis
For disease-trait population-based samples, it is possible to test for
epistasis. The epistasis test can either be case-only or
case-control. All pairwise combinations of SNPs can be tested:
although this may or may not be desirable in statistical terms, it is
computationally feasible for moderate datasets using PLINK, e.g. the
4.5 billion two-locus tests generated from a 100K data set took just
over 24 hours to run, for approximately 500 individuals (with
the --fast-epistasis command).
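As a rough check on the scale quoted above, the number of unique SNP pairs in an ALL x ALL scan is n(n-1)/2. The short sketch below just does that arithmetic; the figure of 95,000 SNPs is an illustrative assumption (the text does not state how many SNPs of the 100K data set were actually analysed) that happens to give roughly 4.5 billion pairs.

```python
def n_unique_pairs(n_snps: int) -> int:
    """Number of unordered SNP pairs tested in an ALL x ALL scan."""
    return n_snps * (n_snps - 1) // 2


# A full 100,000-SNP panel gives just under 5 billion pairs.
print(n_unique_pairs(100_000))

# Assumed, illustrative count: about 95,000 SNPs gives about 4.5 billion pairs.
print(n_unique_pairs(95_000))
```

Because the count grows with the square of the number of SNPs, halving the SNP list cuts the number of tests by roughly a factor of four.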
Alternatively, sets can be specified (e.g. to test only the most
significant 100 SNPs against all other SNPs, or against themselves,
etc). The output consists only of pairwise epistatic results above a
certain significance value; also, for each SNP, a summary of all the
pairwise epistatic tests is given (e.g. maximum test, proportion of
tests significant at a certain threshold, etc).
To test for gene-by-environment interaction, see either
the section on stratified analyses for
disease traits, or the section on QTL
GxE for quantitative traits.
IMPORTANT! These tests for epistasis are currently
only applicable for population-based samples, not family-based.
SNP x SNP epistasis
To test SNP x SNP epistasis for case/control population-based samples, use the command
plink --file mydata --epistasis
which will send output to the files
plink.epi.cc
plink.epi.cc.summary
where cc = case-control; for quantitative traits, cc will be replaced by qt.
The default test uses either linear or logistic regression, depending on whether the phenotype is
a quantitative or binary trait. PLINK builds a model based on allele dosage for each SNP, A
and B, and fits it in the form
Y ~ b0 + b1.A + b2.B + b3.AB + e
The test for interaction is based on the coefficient b3.
This test therefore only considers allelic by allelic epistasis.
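To make the role of the b3 term concrete, here is a minimal sketch of the linear (quantitative trait) form of the model above, fitted by ordinary least squares in plain Python. It is not PLINK's implementation: the function names, the toy data and the solver are all assumptions made for illustration, and the logistic case used for binary traits (which needs iterative fitting) is not shown.

```python
from itertools import product


def design_row(a, b):
    """One row of the design matrix: intercept, A, B and the A*B product."""
    return [1.0, float(a), float(b), float(a * b)]


def solve(m, v):
    """Solve m x = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_linear(dosages_a, dosages_b, y):
    """Least-squares estimates (b0, b1, b2, b3) via the normal equations."""
    x = [design_row(a, b) for a, b in zip(dosages_a, dosages_b)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


# Toy, noise-free data (invented): each two-locus dosage combination once,
# generated with a known interaction coefficient of 0.3.
pairs = list(product([0, 1, 2], repeat=2))
A = [a for a, _ in pairs]
B = [b for _, b in pairs]
Y = [1.0 + 0.5 * a + 0.2 * b + 0.3 * a * b for a, b in pairs]

b0, b1, b2, b3 = fit_linear(A, B, Y)
print(round(b3, 6))  # interaction coefficient recovered from the fit
```

Only the product of the two allele dosages enters the interaction term, which is why the test captures allelic by allelic epistasis and not, for example, dominance by dominance effects.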
Currently, covariates cannot be included when using this
command. Similarly, permutation and modifier commands such
as --genotypic, --within or --sex, etc, are
not currently available.
Important: The --epistasis command is set up
for testing a potentially very large number of SNP by SNP comparisons,
most of which would not be significant or of interest. Because the output
may contain millions or billions of lines, the default is to output only
tests with p-values less than 1e-4, as specified by the --epi1
option (see below). If your dataset is much smaller and you definitely
want to see all the output, add --epi1 1. If you do not, you will most
likely see an output file that is blank except for the header (i.e. none
of the tests were significant at 1e-4).
Specifying which SNPs to test
There are different modes for specifying which SNPs are tested:
ALL x ALL
plink --file mydata --epistasis
SET1 x SET1 { where epi.set contains only 1 set }
plink --file mydata --epistasis --set-test --set epi.set
SET1 x ALL { where epi.set contains only 1 set }
plink --file mydata --epistasis --set-test --set epi.set --set-by-all
SET1 x SET2 { where epi.set contains 2 sets }
plink --file mydata --epistasis --set-test --set epi.set
For the 'symmetrical' cases (ALLxALL and SET1xSET1), only unique pairs
are analysed.
For the other two cases (SET1xALL, SET1xSET2), all pairs are
analysed (e.g. PLINK will perform SNPA x SNPB as well as SNPB x SNPA, if A
and B are in both SET1 and SET2). It will not try to analyse SNPA x
SNPA, however.
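The pairing rules just described can be sketched with Python's itertools. This only illustrates which pairs are enumerated in each mode; the function names and SNP labels are hypothetical and it does not read PLINK set files.

```python
from itertools import combinations, product


def symmetric_pairs(snps):
    """ALL x ALL or SET1 x SET1: each unordered pair appears once."""
    return list(combinations(snps, 2))


def asymmetric_pairs(first, second):
    """SET1 x ALL or SET1 x SET2: every ordered pair, skipping a SNP with itself."""
    return [(a, b) for a, b in product(first, second) if a != b]


set1 = ["snpA", "snpB"]
set2 = ["snpA", "snpB", "snpC"]

print(symmetric_pairs(set2))
print(asymmetric_pairs(set1, set2))
```

In the asymmetric listing both (snpA, snpB) and (snpB, snpA) appear, because snpA and snpB belong to both sets, while (snpA, snpA) is never produced.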
The output
The output can be controlled via
plink --file mydata --epistasis --epi1 0.0001
which records only results that are significant at p<=0.0001. (This
prevents too much output from being generated.) The output is in the form
CHR1 Chromosome of first SNP
SNP1 Identifier for first SNP
CHR2 Chromosome of second SNP
SNP2 Identifier for second SNP
OR_INT Odds ratio for interaction
STAT Chi-square statistic, 1df
P Asymptotic p-value
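For downstream filtering, a results file with these columns could be read along the following lines. This is a sketch under assumptions: the file is taken to be whitespace-delimited with a single header row, rows whose numeric fields cannot be parsed (for instance an NA placeholder, assumed here) are skipped, and the sample rows are invented purely for illustration.

```python
def read_epi(lines, max_p=1e-4):
    """Return rows with P <= max_p, sorted by P, as dictionaries."""
    rows = iter(lines)
    header = next(rows).split()
    hits = []
    for line in rows:
        if not line.strip():
            continue
        rec = dict(zip(header, line.split()))
        try:
            for key in ("OR_INT", "STAT", "P"):
                rec[key] = float(rec[key])
        except ValueError:
            continue  # non-numeric placeholder such as NA (assumed)
        if rec["P"] <= max_p:
            hits.append(rec)
    return sorted(hits, key=lambda r: r["P"])


# Invented example content; real files would come from plink.epi.cc.
SAMPLE = """\
CHR1 SNP1 CHR2 SNP2 OR_INT STAT P
1 snp_a 2 snp_b 1.92 18.3 1.9e-05
1 snp_a 3 snp_c 1.01 0.02 0.89
2 snp_b 3 snp_c NA NA NA
"""

hits = read_epi(SAMPLE.splitlines(), max_p=1e-4)
for rec in hits:
    print(rec["SNP1"], rec["SNP2"], rec["OR_INT"], rec["P"])
```

To read an actual file, the same function could be given an open file handle instead of the split sample string.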
The odds ratio for interaction is interpreted in the standard manner:
a value of 1.0 indicates no effect. To better visualise the manner of
an interaction, use the --twolocus command to produce a
report. For example:
plink --bfile mydata --twolocus rs9442385 rs4486391
generates the file
plink.twolocus
which contains counts and frequencies of the two locus genotypes,
e.g. (there is no interaction evident in this case):
All individuals
===============
                     rs4486391
                  1/1    1/4    4/4    0/0    */*
rs9442385  4/4      4      5      7      1     17
           4/3      7     15     14      0     36
           3/3      6     20     10      0     36
           0/0      0      1      0      0      1
           */*     17     41     31      1     90

                     rs4486391
                  1/1    1/4    4/4    0/0    */*
rs9442385  4/4  0.044  0.056  0.078  0.011  0.189
           4/3  0.078  0.167  0.156  0.000  0.400
           3/3  0.067  0.222  0.111  0.000  0.400
           0/0  0.000  0.011  0.000  0.000  0.011
           */*  0.189  0.456  0.344  0.011  1.000
For case/control data, two similar sets of tables are included which
stratify the two-locus genotype counts by cases and controls.
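The frequency table in the example above is simply each count divided by the grand total (90 individuals), with the */* row and column holding the marginal sums. A short sketch reproducing those numbers from the counts shown; the variable names are illustrative only.

```python
# Counts from the example: rows are rs9442385 genotypes, columns are
# rs4486391 genotypes in the order 1/1, 1/4, 4/4, 0/0 (0/0 = missing).
counts = {
    "4/4": [4, 5, 7, 1],
    "4/3": [7, 15, 14, 0],
    "3/3": [6, 20, 10, 0],
    "0/0": [0, 1, 0, 0],
}

total = sum(sum(row) for row in counts.values())

# Cell frequencies, rounded to three decimals as in the report.
freqs = {g: [round(c / total, 3) for c in row] for g, row in counts.items()}

# Marginal frequencies: the */* column (per row) and the */* row (per column).
row_margin = {g: round(sum(row) / total, 3) for g, row in counts.items()}
col_margin = [round(sum(col) / total, 3) for col in zip(*counts.values())]

print(total)
print(freqs["4/4"], row_margin["4/4"])
print(col_margin)
```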
The second part of the output reports, for each SNP in SET1 (or in ALL if no
sets were specified), the number of significant
epistatic tests that SNP featured in (i.e. either with ALL other SNPs,
with SET1, or with SET2). The significance threshold used here is set by --epi2:
plink --file mydata --epistasis --epi1 0.0001 --epi2 0.05
The output in the plink.epi.cc.summary file contains the following fields:
CHR Chromosome
SNP SNP identifier
N_SIG # significant epistatic tests (p <= "--epi2" threshold)
N_TOT # of valid tests (i.e. non-zero allele counts, etc)
PROP Proportion significant of valid tests
BEST_CHISQ Highest statistic for this SNP
BEST_CHR Chromosome of best SNP
BEST_SNP SNP identifier of best SNP
This file should be interpreted as giving only a very rough idea about
the extent of epistasis and which SNPs seem to be interacting
(although, of course, this is a naive statistic as we do not take LD
into account -- i.e. PROP does not represent the number
of independent epistatic results).
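As an illustration of how such per-SNP fields relate to the pairwise tests, the sketch below tallies N_SIG, N_TOT, PROP and the best partner from a list of valid test results. It is an assumption-laden sketch, not PLINK's code: in particular, counting each pair towards both SNPs in a symmetric scan is assumed here, and the input values are invented.

```python
from collections import defaultdict


def summarise(tests, epi2=0.05, symmetric=True):
    """tests: iterable of (snp1, snp2, chisq, p) for valid tests only."""
    tally = defaultdict(lambda: {"N_SIG": 0, "N_TOT": 0,
                                 "BEST_CHISQ": 0.0, "BEST_SNP": None})

    def update(focal, partner, chisq, p):
        row = tally[focal]
        row["N_TOT"] += 1
        if p <= epi2:
            row["N_SIG"] += 1
        if row["BEST_SNP"] is None or chisq > row["BEST_CHISQ"]:
            row["BEST_CHISQ"], row["BEST_SNP"] = chisq, partner

    for snp1, snp2, chisq, p in tests:
        update(snp1, snp2, chisq, p)
        if symmetric:  # assumed: a unique pair counts for both SNPs
            update(snp2, snp1, chisq, p)

    for row in tally.values():
        row["PROP"] = row["N_SIG"] / row["N_TOT"]
    return dict(tally)


# Invented pairwise results for three hypothetical SNPs.
tests = [
    ("snp_a", "snp_b", 18.3, 1.9e-05),
    ("snp_a", "snp_c", 0.02, 0.89),
    ("snp_b", "snp_c", 4.1, 0.043),
]
summary = summarise(tests)
for snp, row in sorted(summary.items()):
    print(snp, row["N_SIG"], row["N_TOT"], row["PROP"], row["BEST_SNP"])
```

Tests involving SNPs in strong LD with one another would each add to N_SIG here, which is the sense in which the summary is naive.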
A faster epistasis option
For disease traits only, an approximate but faster method can be used
to screen for epistasis: use the --fast-epistasis command
instead of --epistasis. This test is based on a Z-score for the
difference in SNP1-SNP2 association (odds ratio) between cases and
controls (or in cases only, in a case-only analysis). For more
details, see this page.
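One common way to build a Z-score of this kind is to compare the log odds ratios of two 2x2 tables, using the sum of reciprocal cell counts as the variance of each log odds ratio. The sketch below follows that construction; whether it matches the exact statistic PLINK computes is an assumption (the linked page is the authority), and the tables shown are invented.

```python
import math


def log_or_and_var(a, b, c, d):
    """Log odds ratio of a 2x2 table [[a, b], [c, d]] and its approximate variance."""
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def z_case_control(case_table, control_table):
    """Z-score for the difference in log odds ratio between cases and controls."""
    log_r, var_r = log_or_and_var(*case_table)
    log_s, var_s = log_or_and_var(*control_table)
    return (log_r - log_s) / math.sqrt(var_r + var_s)


def z_case_only(case_table):
    """Case-only variant: tests whether the odds ratio in cases differs from 1."""
    log_r, var_r = log_or_and_var(*case_table)
    return log_r / math.sqrt(var_r)


def p_two_sided(z):
    """Two-sided p-value for a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


# Invented 2x2 tables of SNP1 allele by SNP2 allele counts.
cases = (40, 10, 10, 40)
controls = (25, 25, 25, 25)

z = z_case_control(cases, controls)
print(round(z, 3), p_two_sided(z))
```

All cells must be non-zero for the logarithm and reciprocals to be defined; a real implementation would need to handle empty cells explicitly.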
Case-only epistasis
For case-only epistatic analysis,
plink --file mydata --fast-epistasis --case-only
sends output to (co = case-only)
plink.epi.co
plink.epi.co.summary
All other options are as described above.
Currently, only pairs of SNPs that are more than 1 Mb
apart, or on different chromosomes, are included in case-only
tests. This behavior can be changed with the --gap option,
with the distance specified in kb: for example, to specify a gap of 5 Mb,
plink --file mydata --fast-epistasis --case-only --gap 5000
This option is important, as the case-only test for epistasis assumes
that the two SNPs are in linkage equilibrium in the general
population.
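The distance rule can be written as a small predicate. This is a sketch of the rule as described in the text, with the gap given in kb as for --gap; the function name and the base-pair positions are hypothetical.

```python
def passes_gap(chr1, bp1, chr2, bp2, gap_kb=1000):
    """True if a SNP pair would be included in a case-only test.

    Pairs on different chromosomes always pass; pairs on the same
    chromosome must be more than gap_kb kilobases apart.
    """
    if chr1 != chr2:
        return True
    return abs(bp1 - bp2) > gap_kb * 1000


print(passes_gap(1, 1_000_000, 1, 1_500_000))               # 500 kb apart
print(passes_gap(1, 1_000_000, 1, 3_500_000))               # 2.5 Mb apart
print(passes_gap(1, 1_000_000, 1, 3_500_000, gap_kb=5000))  # with a 5 Mb gap
print(passes_gap(1, 1_000_000, 2, 1_000_100))               # different chromosomes
```

A fixed physical distance is only a proxy for linkage equilibrium, so a larger gap may be appropriate in regions of extended LD.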
Gene-based tests of epistasis
WARNING This test is still under heavy
development and not ready for use.