PLINK: Whole genome data analysis toolset
Epistasis
For population-based samples with a disease trait, it is possible to
test for epistasis. The epistasis test can be either case-only or
case-control. Either all pairwise combinations of SNPs can be tested
(although this is most likely not desirable, it is computationally
feasible using PLINK: the 4.5 billion two-locus tests generated from
a 100K data set took just over 24 hours to run), or sets can be
specified (e.g. to test only the 100 most significant SNPs against all
other SNPs, or against themselves, etc). The output consists only of
pairwise epistatic results that pass a certain significance threshold;
in addition, for each SNP, a summary of all the pairwise epistatic
tests is given (e.g. maximum test, proportion of tests significant at
a certain threshold, etc). A similar methodology allows for testing of
gene-environment interaction (for dichotomous environmental
variables).
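To get a feel for how quickly the number of two-locus tests grows, the following Python sketch counts pairs for an all-by-all scan and for a small set tested against everything else. The function names are hypothetical and are not part of PLINK. As a side observation, about 4.5 billion unique pairs corresponds to roughly 95,000 SNPs, which would be consistent with some SNPs of a 100K panel having been filtered out, although the text above does not say so.

```python
from math import comb


def n_unique_pairs(n_snps: int) -> int:
    """Number of unordered SNP pairs in an all-by-all scan (no self-pairs)."""
    return comb(n_snps, 2)


def n_set_vs_all(n_set: int, n_total: int) -> int:
    """Number of tests when each of n_set SNPs is paired with every other SNP."""
    return n_set * (n_total - 1)


print(n_unique_pairs(100_000))     # 4999950000
print(n_unique_pairs(95_000))      # 4512452500
print(n_set_vs_all(100, 100_000))  # 9999900
```

Restricting the scan to 100 SNPs against all others reduces the workload from billions of tests to about ten million.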
SNP x SNP epistasis
To test SNP x SNP epistasis, the command
plink --file mydata --epistasis
will send output to the files
plink.epi-cc1
plink.epi-cc2
where cc = case-control.
There are different modes for specifying which SNPs are tested:
plink --file mydata --epistasis --set epi.set
In the 'symmetrical' cases (ALLxALL and SET1xSET1), only unique pairs
are analysed.
In the other two cases (SET1xALL and SET1xSET2), all pairs are
analysed (e.g. PLINK will perform SNPA x SNPB as well as SNPB x SNPA
if A and B are in both SET1 and SET2). It will not, however, try to
analyse SNPA x SNPA.
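The pairing rules above can be illustrated with a short Python sketch. This is only an illustration of the rules as described, not PLINK's own code; the function names and SNP identifiers are made up for the example.

```python
from itertools import combinations


def unique_pairs(snps):
    """Symmetrical case (ALLxALL, SET1xSET1): each unordered pair once."""
    return list(combinations(snps, 2))


def all_pairs(first, second):
    """Asymmetrical case (SET1xALL, SET1xSET2): every ordered pair (a, b)
    with a taken from `first` and b from `second`, skipping a SNP with itself."""
    return [(a, b) for a in first for b in second if a != b]


set1 = ["snpA", "snpB"]
set2 = ["snpA", "snpB", "snpC"]

print(unique_pairs(set1))
# [('snpA', 'snpB')]
print(all_pairs(set1, set2))
# [('snpA', 'snpB'), ('snpA', 'snpC'), ('snpB', 'snpA'), ('snpB', 'snpC')]
```

Note that snpA x snpB and snpB x snpA both appear in the second list because both SNPs are members of both sets.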
The output can be controlled via
plink --file mydata --epistasis --epi1 0.0001
which means that only results significant at p <= 0.0001 are recorded.
(This prevents too much output from being generated.) The output is in
the form
Col 1 : SNP 1
Col 2 : SNP 2
Col 3 : Interaction odds ratio
Col 4 : z-score
Col 5 : p-value
The z-score is a test for a difference in SNP1-SNP2 association (odds
ratio) between cases and controls (or in cases only).
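One common way to build a z-score of this kind is sketched below in Python. It is an assumption made for illustration, not a statement of how PLINK computes its statistic: each group is summarised by a 2x2 table of counts for the two SNPs, the log odds ratio and its standard error are computed per group, and the difference is divided by its standard error. The two-sided normal p-value, the table layout and the function names are likewise assumptions.

```python
from math import erfc, log, sqrt


def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its standard error for a 2x2 table [[a, b], [c, d]]."""
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    return log((a * d) / (b * c)), sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def epistasis_z(case_table, control_table):
    """z-score and two-sided p-value for the difference in SNP1-SNP2
    association between cases and controls (illustrative only)."""
    lor_case, se_case = log_odds_ratio(*case_table)
    lor_ctrl, se_ctrl = log_odds_ratio(*control_table)
    z = (lor_case - lor_ctrl) / sqrt(se_case ** 2 + se_ctrl ** 2)
    p = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return z, p


# Hypothetical counts: strong SNP1-SNP2 association in cases, none in controls.
z, p = epistasis_z((40, 10, 10, 40), (25, 25, 25, 25))
print(round(z, 2))  # 4.33
```

For a case-only version under the same assumptions, the control term would simply be dropped, so that z is the case log odds ratio divided by its standard error.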
The second part of the output gives, for each SNP in SET1 (or in ALL
if no sets were specified), the number of significant epistatic tests
that the SNP featured in (i.e. either with ALL other SNPs, with SET1,
or with SET2). The --epi2 threshold determines what counts as
significant:
plink --file mydata --epistasis --epi1 0.0001 --epi2 0.05
The output is
Col 1 : Chromosome
Col 2 : SNP
Col 3 : # significant epistatic tests (p <= "--epi2" threshold)
Col 4 : # of valid tests (i.e. non-zero allele counts, etc)
Col 5 : proportion significant of valid tests
This will give a rough idea about the extent of epistasis and which
SNPs seem to be interacting (although, of course, this is a naive
statistic as we do not take LD into account -- i.e. Col 3 does not
represent the number of *independent* epistatic results).
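The per-SNP summary columns can be reproduced from a list of pairwise results with a few lines of Python. The sketch below assumes a hypothetical in-memory input of (SNP1, SNP2, p-value) tuples, with None marking a test that was not valid; it does not parse PLINK's actual output files. The count_both switch is also an assumption: it credits a test to both SNPs, as would be needed when only unique pairs were analysed.

```python
from collections import defaultdict


def summarise(pair_results, epi2=0.05, count_both=False):
    """Return {snp: (n_significant, n_valid, proportion)}.

    pair_results: iterable of (snp1, snp2, p), p is None for an invalid test.
    SNPs with no valid tests are omitted from the result.
    """
    counts = defaultdict(lambda: [0, 0])  # snp -> [n_significant, n_valid]
    for snp1, snp2, p in pair_results:
        if p is None:
            continue
        for snp in (snp1, snp2) if count_both else (snp1,):
            counts[snp][1] += 1
            if p <= epi2:
                counts[snp][0] += 1
    return {s: (sig, valid, sig / valid) for s, (sig, valid) in counts.items()}


results = [
    ("rs1", "rs2", 0.01),
    ("rs1", "rs3", 0.20),
    ("rs1", "rs4", None),
    ("rs2", "rs3", 0.04),
]
print(summarise(results, epi2=0.05))
# {'rs1': (1, 2, 0.5), 'rs2': (1, 1, 1.0)}
```

As the text says, these counts are naive: tests involving SNPs in LD with one another are not independent.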
Case-only epistasis
For case-only epistatic analysis,
plink --file mydata --epistasis --case-only
sends output to (co = case-only)
plink.epi-co1
plink.epi-co2
All other options are as described above.
Take note! Currently, in case-only analysis, all pairs of SNPs are
tested, regardless of whether they are near each other on the
chromosome. Nearby SNPs are often in LD, which a case-only test can
report as apparent interaction. An option to automatically skip SNPs
that are too close, in terms of physical distance, would probably be
desirable.
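Until such an option exists, pairs of nearby SNPs could be screened out before or after the analysis. The Python sketch below shows one possible filter. Everything in it is hypothetical: the function names, the positions dictionary (SNP name to chromosome and base-pair position, as could be read from a map file) and the 1 Mb default, which is an arbitrary illustrative value rather than a recommendation.

```python
def too_close(pos1, pos2, min_distance_bp=1_000_000):
    """True if two (chromosome, bp) positions lie on the same chromosome
    and are fewer than min_distance_bp base pairs apart."""
    chrom1, bp1 = pos1
    chrom2, bp2 = pos2
    return chrom1 == chrom2 and abs(bp1 - bp2) < min_distance_bp


def filter_pairs(pairs, positions, min_distance_bp=1_000_000):
    """Keep only SNP pairs that are on different chromosomes or far apart."""
    return [
        (a, b)
        for a, b in pairs
        if not too_close(positions[a], positions[b], min_distance_bp)
    ]


positions = {
    "rs1": (1, 1_000),
    "rs2": (1, 500_000),
    "rs3": (1, 5_000_000),
    "rs4": (2, 1_200),
}
pairs = [("rs1", "rs2"), ("rs1", "rs3"), ("rs1", "rs4")]
print(filter_pairs(pairs, positions))
# [('rs1', 'rs3'), ('rs1', 'rs4')]
```

A distance cut-off is only a rough proxy for LD, so the threshold would need to be chosen with the population and marker density in mind.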
Gene-based tests
A gene-based test is available; it is performed using the
statistical package R. The following command will automatically
generate the script and data file required for the R analysis and, if
possible, will also call R directly to start the analysis.
The --genepi option initiates this analysis. It is
always necessary to specify a set-file (--set
filename) that contains at least two sets
(i.e. specifying the SNPs in two or more genes; the analysis is
between pairs of genes, not pairs of SNPs).
plink --file mydata --genepi --R --set gene.set
sends output to
plink.genepi-
plink.epi-co2