PLINK: Whole genome data analysis toolset
Last original PLINK release is v1.07 (10-Oct-2009); PLINK 1.9 is now available for beta-testing

Whole genome association analysis toolset


Epistasis

For disease-trait population-based samples, it is possible to test for epistasis. The epistasis test can either be case-only or case-control. All pairwise combinations of SNPs can be tested: although this may or may not be desirable in statistical terms, it is computationally feasible for moderate datasets using PLINK, e.g. the 4.5 billion two-locus tests generated from a 100K data set took just over 24 hours to run, for approximately 500 individuals (with the --fast-epistasis command).

Alternatively, sets can be specified (e.g. to test only the most significant 100 SNPs against all other SNPs, or against themselves, etc). The output consists only of pairwise epistatic results above a certain significance value; also, for each SNP, a summary of all the pairwise epistatic tests is given (e.g. maximum test, proportion of tests significant at a certain threshold, etc).

To test for gene-by-environment interaction, see either the section on stratified analyses for disease traits, or the section on QTL GxE for quantitative traits.

IMPORTANT! These tests for epistasis are currently only applicable for population-based samples, not family-based.

SNP x SNP epistasis

To test SNP x SNP epistasis for case/control population-based samples, use the command
plink --file mydata --epistasis

which will send output to the files
     plink.epi.cc
     plink.epi.cc.summary
where cc = case-control; for quantitative traits, cc will be replaced by qt.

The default test uses either linear or logistic regression, depending on whether the phenotype is a quantitative or binary trait. PLINK builds a model based on allele dosage for each SNP, A and B, and fits it in the form
     Y ~ b0 + b1.A + b2.B + b3.AB + e 
The test for interaction is based on the coefficient b3. This test therefore only considers allelic by allelic epistasis. Currently, covariates cannot be included when using this command. Similarly, permutation and modifier commands such as --genotypic, --within or --sex are not currently available.
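As an illustration of the model above, here is a minimal sketch for a quantitative trait, fitted by ordinary least squares in plain Python. It is not PLINK's implementation: the function names and the toy data are hypothetical, and it only estimates b3 rather than computing the test statistic and p-value that PLINK reports.

```python
# Sketch of the allelic interaction model Y ~ b0 + b1.A + b2.B + b3.AB for a
# quantitative trait. Assumption: plain least squares, not PLINK's own code.

def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_interaction_model(dosage_a, dosage_b, phenotype):
    """Return [b0, b1, b2, b3] from the normal equations (X'X) b = X'y."""
    design = [[1.0, a, b, a * b] for a, b in zip(dosage_a, dosage_b)]
    k = 4
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, phenotype))
           for i in range(k)]
    return solve_linear_system(xtx, xty)


# Toy data (made up): every combination of 0/1/2 allele dosages at two SNPs,
# with a noise-free phenotype built from known coefficients.
dosage_a = [a for a in (0, 1, 2) for _ in (0, 1, 2)]
dosage_b = [b for _ in (0, 1, 2) for b in (0, 1, 2)]
phenotype = [1.0 + 0.5 * a - 0.25 * b + 0.75 * a * b
             for a, b in zip(dosage_a, dosage_b)]

b0, b1, b2, b3 = fit_interaction_model(dosage_a, dosage_b, phenotype)
print(f"estimated interaction coefficient b3 = {b3:.3f}")
```

Because the toy phenotype has no noise, the fit should recover the coefficients used to build it; with real data, b3 would be judged against its standard error.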

Important: The --epistasis command is set up for testing a potentially very large number of SNP by SNP comparisons, most of which would not be significant or of interest. Because the output may contain millions or billions of lines, the default is to only output tests with p-values less than 1e-4, as specified by the --epi1 option (see below). If your dataset is much smaller and you definitely want to see all the output, add --epi1 1 . If you do not, odds are you'll see a blank output file except for the header (i.e. immediately telling you that none of the tests were significant at 1e-4).

Specifying which SNPs to test

There are different modes for specifying which SNPs are tested:
ALL x ALL
plink --file mydata --epistasis

SET1 x SET1  { where epi.set contains only 1 set }
plink --file mydata --epistasis --set-test --set epi.set

SET1 x ALL  { where epi.set contains only 1 set } 
plink --file mydata --epistasis --set-test --set epi.set --set-by-all

SET1 x SET2  { where epi.set contains 2 sets }  
plink --file mydata --epistasis --set-test --set epi.set

For the 'symmetrical' cases (ALLxALL and SET1xSET1), only unique pairs are analysed.

For the other two cases (SET1xALL, SET1xSET2), all pairs are analysed (e.g. PLINK will perform SNPA x SNPB as well as SNPB x SNPA, if A and B are in both SET1 and SET2). It will not try to analyse SNPA x SNPA, however.
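To get a feel for how many tests each mode generates, the pairing rules above can be sketched as a small counting function. This is only an illustration based on the description here: the function name and mode labels are hypothetical, and it assumes that every SNP in SET1 is also part of ALL.

```python
def count_epistasis_tests(mode, n_all=0, set1=(), set2=()):
    """Count the SNP pairs tested under each mode, per the rules described above."""
    set1, set2 = set(set1), set(set2)
    if mode == "ALLxALL":
        # Symmetrical case: unique unordered pairs only.
        return n_all * (n_all - 1) // 2
    if mode == "SET1xSET1":
        n = len(set1)
        return n * (n - 1) // 2
    if mode == "SET1xALL":
        # Every SET1 SNP against every other SNP; a SNP is never paired with itself.
        return len(set1) * (n_all - 1)
    if mode == "SET1xSET2":
        # All ordered pairs, minus the self-pairs for SNPs present in both sets.
        return len(set1) * len(set2) - len(set1 & set2)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical example: 100,000 SNPs, with the first 100 forming SET1.
top_snps = [f"snp{i}" for i in range(100)]
all_by_all = count_epistasis_tests("ALLxALL", n_all=100_000)
set_by_all = count_epistasis_tests("SET1xALL", n_all=100_000, set1=top_snps)
print(all_by_all, set_by_all)
```

Restricting the first SNP to a small set reduces the number of tests by several orders of magnitude, which is the main practical reason to use sets.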

The output

The output can be controlled via
plink --file mydata --epistasis --epi1 0.0001

which means that only results with p <= 0.0001 are recorded (this prevents too much output from being generated). The output is in the form
     CHR1    Chromosome of first SNP
     SNP1    Identifier for first SNP
     CHR2    Chromosome of second SNP
     SNP2    Identifier for second SNP
     OR_INT  Odds ratio for interaction
     STAT    Chi-square statistic, 1df
     P       Asymptotic p-value
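The fields listed above can be post-processed with a short script. The following is a rough sketch that assumes the file is whitespace-delimited with a single header row and that untestable pairs may carry NA values; the function name and the sample rows are invented for illustration and are not real PLINK output.

```python
def read_epistasis_results(lines, max_p=1.0):
    """Parse epistasis result lines and keep the rows with P <= max_p."""
    rows = iter(lines)
    header = next(rows).split()
    results = []
    for line in rows:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip blank or malformed lines
        record = dict(zip(header, fields))
        try:
            record["P"] = float(record["P"])
        except ValueError:
            continue  # skip rows without a numeric p-value (assumed "NA")
        if record["P"] <= max_p:
            results.append(record)
    return results


# Invented sample in the column layout described above.
sample = """\
 CHR1   SNP1  CHR2   SNP2   OR_INT    STAT         P
    1   snpA     2   snpB    1.850   18.20  1.99e-05
    1   snpA     3   snpC    0.910    0.45     0.502
    2   snpB     3   snpC       NA      NA        NA
""".splitlines()

hits = read_epistasis_results(sample, max_p=1e-4)
for hit in hits:
    print(hit["SNP1"], hit["SNP2"], hit["OR_INT"], hit["P"])
```

To read a real file, pass an open file handle instead of the sample list, e.g. `read_epistasis_results(open("plink.epi.cc"))`.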
The odds ratio for interaction is interpreted in the standard manner: a value of 1.0 indicates no effect. To better visualise the manner of an interaction, use the --twolocus command to produce a report. For example:
plink --bfile mydata --twolocus rs9442385 rs4486391

generates the file
     plink.twolocus
which contains counts and frequencies of the two locus genotypes, e.g. (there is no interaction evident in this case):
All individuals
===============
                    rs4486391
                    1/1    1/4    4/4    0/0    */*
  rs9442385  4/4      4      5      7      1     17
             4/3      7     15     14      0     36
             3/3      6     20     10      0     36
             0/0      0      1      0      0      1
             */*     17     41     31      1     90

                    rs4486391
                    1/1    1/4    4/4    0/0    */*
  rs9442385  4/4  0.044  0.056  0.078  0.011  0.189
             4/3  0.078  0.167  0.156  0.000  0.400
             3/3  0.067  0.222  0.111  0.000  0.400
             0/0  0.000  0.011  0.000  0.000  0.011
             */*  0.189  0.456  0.344  0.011  1.000
For case/control data, two similar sets of tables are included, which stratify the two-locus genotype counts by cases and controls.
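The frequency table in the example report is simply each count divided by the grand total (90 individuals here). A minimal sketch of that arithmetic, using the counts from the table above and a hypothetical helper name:

```python
# Two-locus genotype counts copied from the example report above
# (rows: rs9442385 genotypes, columns: rs4486391 genotypes; 0/0 = missing).
counts = {
    "4/4": {"1/1": 4, "1/4": 5, "4/4": 7, "0/0": 1},
    "4/3": {"1/1": 7, "1/4": 15, "4/4": 14, "0/0": 0},
    "3/3": {"1/1": 6, "1/4": 20, "4/4": 10, "0/0": 0},
    "0/0": {"1/1": 0, "1/4": 1, "4/4": 0, "0/0": 0},
}


def genotype_frequencies(table):
    """Divide every cell by the grand total of the table."""
    total = sum(sum(row.values()) for row in table.values())
    return {
        row_name: {col: n / total for col, n in row.items()}
        for row_name, row in table.items()
    }


total = sum(sum(row.values()) for row in counts.values())
freqs = genotype_frequencies(counts)
for row_name, row in freqs.items():
    print(row_name, "  ".join(f"{value:.3f}" for value in row.values()))
```

The printed rows correspond to the cell entries of the second table, without the marginal */* row and column.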

The second part of the output reports, for each SNP in SET1 (or in ALL if no sets were specified), the number of significant epistatic tests that SNP featured in (i.e. either with ALL other SNPs, with SET1, or with SET2). The threshold --epi2 determines what counts as significant here:
plink --file mydata --epistasis --epi1 0.0001 --epi2 0.05

The output in the plink.epi.cc.summary file contains the following fields:
     CHR        Chromosome
     SNP        SNP identifier
     N_SIG      # significant epistatic tests (p <= "--epi2" threshold)
     N_TOT      # of valid tests (i.e. non-zero allele counts, etc)
     PROP       Proportion significant of valid tests
     BEST_CHISQ Highest statistic for this SNP 
     BEST_CHR   Chromosome of best SNP
     BEST_SNP   SNP identifier of best SNP
This file should be interpreted as giving only a very rough idea about the extent of epistasis and which SNPs seem to be interacting (although, of course, this is a naive statistic as we do not take LD into account -- i.e. PROP does not represent the number of independent epistatic results).
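To make the summary fields concrete, the sketch below derives them for one SNP from a list of its pairwise results. It follows the field descriptions above and is not PLINK's own code; the function name, the tuple layout and the example values are all hypothetical.

```python
def summarise_snp(tests, epi2=0.05):
    """Summarise one SNP's valid pairwise tests.

    tests: list of (partner_chr, partner_snp, chisq, p) tuples, one per valid test.
    """
    n_tot = len(tests)
    n_sig = sum(1 for _, _, _, p in tests if p <= epi2)
    best = max(tests, key=lambda t: t[2]) if tests else None
    return {
        "N_SIG": n_sig,
        "N_TOT": n_tot,
        # Proportion of valid tests that were significant; undefined with no tests.
        "PROP": n_sig / n_tot if n_tot else None,
        "BEST_CHISQ": best[2] if best else None,
        "BEST_CHR": best[0] if best else None,
        "BEST_SNP": best[1] if best else None,
    }


# Invented results for a single SNP tested against three partners.
example_tests = [
    ("2", "snpB", 18.20, 1.99e-05),
    ("3", "snpC", 0.45, 0.502),
    ("4", "snpD", 4.10, 0.043),
]
summary = summarise_snp(example_tests, epi2=0.05)
print(summary)
```

As noted above, partner SNPs in LD with one another will inflate N_SIG and PROP, so these values are best read as a rough screen.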

A faster epistasis option
For disease traits only, an approximate but faster method can be used to screen for epistasis: use the --fast-epistasis command instead of --epistasis. This test is based on a Z-score for the difference in SNP1-SNP2 association (odds ratio) between cases and controls (or in cases only, in a case-only analysis). For more details, see this page.
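One common way to build such a Z-score is to compare the log odds ratios of two 2x2 tables, using the usual large-sample variance of a log odds ratio. The sketch below assumes that form; it is not taken from the PLINK source, and the function names and the allele-pair counts are hypothetical. How PLINK collapses the two-locus genotype counts into 2x2 tables is not covered here.

```python
import math


def log_odds_ratio(table):
    """Return (log odds ratio, approximate variance) for a 2x2 table ((a, b), (c, d))."""
    (a, b), (c, d) = table
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance


def interaction_z(case_table, control_table):
    """Z-score for the difference in SNP1-SNP2 log odds ratio, cases vs controls."""
    lor_case, var_case = log_odds_ratio(case_table)
    lor_ctrl, var_ctrl = log_odds_ratio(control_table)
    return (lor_case - lor_ctrl) / math.sqrt(var_case + var_ctrl)


def case_only_z(case_table):
    """Case-only variant: tests whether the SNP1-SNP2 association in cases is zero."""
    lor_case, var_case = log_odds_ratio(case_table)
    return lor_case / math.sqrt(var_case)


# Invented 2x2 tables of allele-pair counts (rows: SNP1 allele, columns: SNP2 allele).
cases = ((120, 80), (70, 130))
controls = ((100, 100), (95, 105))
z = interaction_z(cases, controls)
print(f"Z = {z:.2f}, Z^2 = {z * z:.2f}")
```

Under this form, Z squared would be compared against a chi-square distribution with 1 df. Tables containing zero cells would need special handling, which this sketch omits.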

Case-only epistasis

For case-only epistatic analysis,
plink --file mydata --fast-epistasis --case-only

sends output to (co = case-only)
     plink.epi.co
     plink.epi.co.summary
All other options are as described above.

By default, the case-only analysis includes only pairs of SNPs that are more than 1 Mb apart or on different chromosomes. This behavior can be changed with the --gap option, with the distance specified in kb: for example, to specify a gap of 5 Mb,
plink --file mydata --fast-epistasis --case-only --gap 5000

This option is important, as the case-only test for epistasis assumes that the two SNPs are in linkage equilibrium in the general population.
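The inclusion rule described above can be written as a one-line check. This is a sketch of the stated rule only, with a hypothetical function name and made-up positions; it assumes base-pair coordinates and a strict "more than" comparison, as in the wording above.

```python
def passes_gap(chr1, bp1, chr2, bp2, gap_kb=1000):
    """Return True if a SNP pair would be included in a case-only test.

    Pairs on different chromosomes always pass; pairs on the same chromosome
    must be more than gap_kb kilobases apart (default 1000 kb = 1 Mb).
    """
    if chr1 != chr2:
        return True
    return abs(bp1 - bp2) > gap_kb * 1000


# Made-up positions: two SNPs 1.5 Mb apart on chromosome 1.
print(passes_gap(1, 1_000_000, 1, 2_500_000))               # default 1 Mb gap
print(passes_gap(1, 1_000_000, 1, 2_500_000, gap_kb=5000))  # with --gap 5000
print(passes_gap(1, 1_000_000, 2, 1_000_100))               # different chromosomes
```

A larger gap is more conservative about residual LD between nearby SNPs, at the cost of testing fewer pairs.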

Gene-based tests of epistasis

WARNING: This test is still under heavy development and not ready for use.
 
This document last modified Wednesday, 25-Jan-2017 11:39:26 EST