LD calculations

PLINK includes a set of options to calculate pairwise linkage
disequilibrium between SNPs, and to present or process this
information in various ways. Also see the functions
on haplotype analysis.

Pairwise LD measures for a single pair of SNPs

The command --ld followed by two SNP identifiers prints the
following LD statistics to the LOG file, for a single pair of SNPs:
r-squared, D', the estimated haplotype frequencies and those expected
under linkage equilibrium, and indicates which haplotypes are in phase
(i.e. occurring more often than expected by chance). For example:
plink --bfile mydata --ld rs2840528 rs7545940
gives the following output
     LD information for SNP pair [ rs2840528 rs7545940 ]
        R-sq = 0.592     D' = 0.936
        Haplotype     Frequency    Expectation under LE
        ---------     ---------    --------------------
            GC          0.013            0.199
            AC          0.435            0.245
            GT          0.441            0.250
            AT          0.111            0.307
        In phase alleles are GT/AC
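As a rough check, the R-sq and D' values above can be recomputed from the four listed haplotype frequencies. The sketch below is in Python, which is not part of PLINK. The function name ld_from_haplotypes is hypothetical, and the usual textbook definitions of D, r-squared and D' are assumed. Because the printed frequencies are rounded to three decimals, the results should be close to, but not exactly equal to, the values PLINK reports.

```python
# Haplotype frequencies copied from the --ld output above (rounded values).
hap_freq = {"GC": 0.013, "AC": 0.435, "GT": 0.441, "AT": 0.111}


def ld_from_haplotypes(freq, hap):
    """Return (D, r_squared, D_prime), taking `hap` as the reference haplotype.

    Assumes two biallelic SNPs and the standard textbook definitions;
    this is an illustration, not PLINK's own code.
    """
    a, b = hap[0], hap[1]
    # Marginal frequency of allele `a` at the first SNP and `b` at the second.
    p_a = sum(f for h, f in freq.items() if h[0] == a)
    p_b = sum(f for h, f in freq.items() if h[1] == b)
    # D is the excess of the observed haplotype frequency over the
    # frequency expected under linkage equilibrium.
    d = freq[hap] - p_a * p_b
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D' rescales |D| by its largest possible value given the allele frequencies.
    if d < 0:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    else:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    d_prime = abs(d) / d_max
    return d, r_squared, d_prime


d, r_squared, d_prime = ld_from_haplotypes(hap_freq, "GC")
print(f"R-sq = {r_squared:.3f}  D' = {d_prime:.3f}")

# Haplotypes seen more often than expected under LE are the in-phase ones.
in_phase = [h for h in hap_freq if ld_from_haplotypes(hap_freq, h)[0] > 0]
print("In phase:", in_phase)
```

D is negative for the GC haplotype, which is consistent with GT and AC being reported as the in-phase haplotypes.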
The LD statistics presented here are based on haplotype frequencies
estimated via the EM algorithm. Only founders are used in these
calculations.

Pairwise LD measures for multiple SNPs (genome-wide)

Correlations based on genotype allele counts (i.e. without phasing, and
for founders only) can be obtained with the commands
plink --file mydata --r
or
plink --file mydata --r2
That is, for each pair of SNPs this calculates the correlation between two
variables, coded 0, 1 or 2 to represent the number of non-reference
alleles at each SNP. The squared correlation based on genotypic allele
counts is therefore not identical to the r-sq as estimated from
haplotype frequencies (see above), although it will typically be very
similar. Because it is faster to calculate, it provides a good way to
screen for strong LD.  The estimated value for the example in the
section above (rs2840528,rs7545940) is 0.5748 (versus 0.592).
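To make the genotype-based measure concrete, the following Python sketch computes the Pearson correlation between two SNPs coded as 0, 1 or 2 allele counts. The function name genotype_r and the example genotypes are made up for illustration, and skipping individuals with a missing genotype at either SNP is an assumption of the sketch, not a statement about how PLINK handles missing data.

```python
from math import sqrt


def genotype_r(x, y):
    """Pearson correlation between two SNPs coded as 0, 1 or 2 allele counts.

    Individuals with a missing genotype (None) at either SNP are skipped;
    this handling is an assumption of the sketch.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    # Sums of cross-products and squares about the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / sqrt(var_x * var_y)


# Made-up genotypes for six founders at two SNPs (None = missing).
snp1 = [0, 1, 2, 1, 0, 2]
snp2 = [0, 1, 2, 2, 0, None]
r = genotype_r(snp1, snp2)
print(f"r = {r:.4f}, r-squared = {r * r:.4f}")
```

No haplotype phasing is involved here, which is why this quantity is quicker to compute than the EM-based estimate.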
Both commands create a file called
	plink.ld
with a list of R or R-squared values in it.

Filtering the output

By default, several filters are imposed on which pairwise calculations
are performed and reported. To only analyse SNPs that are not more
than 10 SNPs apart, for example, use the option (default is 10 SNPs)
     --ld-window 10
to specify a kb window in addition (default 1Mb)
     --ld-window-kb 1000
and to report only values above a given threshold (this applies only when the --r2 
command, not the --r command, is used; default is 0.2)
     --ld-window-r2 0.2
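One way to picture how the three filters combine is as a single yes/no test applied to each SNP pair. The Python sketch below is only an illustration: the function name is_reported and its arguments are hypothetical, only same-chromosome pairs are considered, and whether PLINK treats each boundary as inclusive is an assumption.

```python
def is_reported(idx_a, idx_b, bp_a, bp_b, r2,
                window=10, window_kb=1000, window_r2=0.2):
    """Decide whether a same-chromosome SNP pair passes all three filters.

    idx_* are the positions of the SNPs in map order and bp_* are their
    base-pair positions. Inclusive boundaries are an assumption here.
    """
    close_in_snps = abs(idx_b - idx_a) <= window        # --ld-window
    close_in_kb = abs(bp_b - bp_a) <= window_kb * 1000  # --ld-window-kb
    strong_enough = r2 >= window_r2                     # --ld-window-r2
    return close_in_snps and close_in_kb and strong_enough


# A pair 5 SNPs and 500kb apart with r-squared 0.1 is dropped by default ...
print(is_reported(0, 5, 100_000, 600_000, 0.1))
# ... but kept once the r-squared threshold is lowered to 0.
print(is_reported(0, 5, 100_000, 600_000, 0.1, window_r2=0))
```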
The default for --ld-window-r2 is set at 0.2 to reduce the
size of output files when many comparisons are made: to get all pairs
reported, set --ld-window-r2 to 0.

Obtaining LD values for a specific SNP versus all others

To obtain all LD values for a set of SNPs versus one specific SNP, use the --ld-snp
command in conjunction with --r2. For example, to get a list of all values for 
every SNP within 1Mb of rs12345, use the command
    plink --file mydata \
          --r2 \
          --ld-snp rs12345 \
          --ld-window-kb 1000 \
          --ld-window 99999 \
          --ld-window-r2 0
The --ld-window and --ld-window-r2 settings effectively mean that output 
will be shown for all other SNPs within 1Mb of rs12345.
Similar to the --ld-snp command, but for multiple seed SNPs:
to obtain all LD values from a group of SNPs with other SNPs, use the
command
     --ld-snp-list mysnps.txt
where mysnps.txt is a list of SNPs.

Obtaining a matrix of LD values

Alternatively, it is possible to add the --matrix option,
which creates a matrix of LD values rather than a list: in this case,
all SNP pairs are calculated and reported, even for SNPs on different 
chromosomes.  
NOTE To force all SNP-by-SNP cross-chromosome comparisons 
with the standard output format (i.e. without --matrix) add the flag
     --inter-chr
instead.  This can be combined
with --ld-window-r2, for example to list all
inter-chromosomal SNP pairs with very high R-squared
values.  Warning: this command could take an excessively long
time to run if applied to large datasets with many SNPs.

Functions to select tag SNPs for specified SNP sets

The command
 plink --bfile mydata --show-tags mysnps.txt
where mysnps.txt is just a list of SNP IDs, generates a file
     plink.tags
that lists all the SNPs in the dataset that tag the SNPs
in mysnps.txt (including the SNPs in the original file).
A message is also written to the LOG file that indicates how many new
SNPs were added
     Reading SNPs to tag from [ mysnps.txt ]
     Read 10 SNPs to tag, of which 10 are unique and present
     In total, added 2 tag SNPs
     Writing tag list to [ plink.tags ]
meaning that plink.tags will contain 12 SNPs. This command
could be useful, for example, if one wants to generate a list of SNPs
that tag all known coding SNPs, or a list of known disease-associated
SNPs.
If the option
     --list-all
is also added, then an additional file is generated that gives some
more details for each target SNP (i.e. each SNP listed
in mysnps.txt, in the above example) regarding how many and
which tags were set for it. The file is named
     plink.tags.list
and has the following fields
       SNP   Target SNP ID
       CHR   Chromosome code
        BP   Physical position (base-pair)
      NTAG   Number of other SNPs that tag this SNP
      LEFT   Physical position of left-most (5') tagging SNP (bp)
     RIGHT   Physical position of right-most (3') tagging SNP (bp)
    KBSPAN   Kilobase size of region implied by LEFT-RIGHT
      TAGS   List of SNPs that tag target
For example:
            SNP  CHR         BP NTAG       LEFT      RIGHT   KBSPAN TAGS
      rs2542334   22   16694612    2   16693517   16695440    1.923 rs415170|rs2587108
      rs2587108   22   16695440    2   16693517   16695440    1.923 rs415170|rs2542334
       rs873387   22   16713566    0   16713566   16713566        0 NONE
        rs11917   22   16717565    2   16717565   16742194   24.629 rs1057721|rs2075444
      rs1057721   22   16718397    2   16717565   16742194   24.629 rs11917|rs2075444
      rs9605422   22   16737494    0   16737494   16737494        0 NONE
      rs2075444   22   16742194    2   16717565   16742194   24.629 rs11917|rs1057721
      rs4819644   22   16744470    0   16744470   16744470        0 NONE
      rs2083882   22   16769795    0   16769795   16769795        0 NONE
      rs5992907   22   16796453    5   16796453   16830384   33.931 rs400509|rs396012|rs415651|rs384215|rs453557
       rs400509   22   16800853    3   16796453   16813039   16.586 rs5992907|rs396012|rs384215
       rs396012   22   16806587    3   16796453   16813039   16.586 rs5992907|rs400509|rs384215
      rs7293187   22   16807274    0   16807274   16807274        0 NONE
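If the file needs to be post-processed, each data row can be split into its eight fields. The Python sketch below assumes whitespace-separated columns in the order shown above, with TAGS given either as a '|'-separated list or as the word NONE. The function name parse_tags_list_row is hypothetical. It also checks that KBSPAN matches the LEFT-RIGHT distance expressed in kilobases, as the field description suggests.

```python
def parse_tags_list_row(line):
    """Split one data row of plink.tags.list into a dict.

    Assumes whitespace-separated columns in the order
    SNP CHR BP NTAG LEFT RIGHT KBSPAN TAGS.
    """
    snp, chrom, bp, ntag, left, right, kbspan, tags = line.split()
    return {
        "SNP": snp,
        "CHR": chrom,
        "BP": int(bp),
        "NTAG": int(ntag),
        "LEFT": int(left),
        "RIGHT": int(right),
        "KBSPAN": float(kbspan),
        # NONE marks a target with no tagging SNPs.
        "TAGS": [] if tags == "NONE" else tags.split("|"),
    }


# Two rows copied from the example listing above.
row = parse_tags_list_row(
    "rs2542334   22   16694612    2   16693517   16695440    1.923 rs415170|rs2587108"
)
untagged = parse_tags_list_row(
    "rs873387   22   16713566    0   16713566   16713566        0 NONE"
)

# Distance between the left-most and right-most tagging SNPs, in kb.
span_kb = (row["RIGHT"] - row["LEFT"]) / 1000
print(row["SNP"], row["TAGS"], span_kb)
```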
The settings for declaring that a SNP tags another SNP can be varied with the 
commands
     --tag-r2 0.5
to specify the minimum r-squared (based on the genotypic correlation,
see above) required to declare that one SNP tags another; here it is
set to 0.5 (the default is 0.8). Also,
     --tag-kb 1000
will constrain the search for tags to be within a megabase (the default
is 250kb).
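The effect of the two thresholds can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not PLINK's implementation: the function name find_tags, the SNP IDs, the positions and the r-squared values are all made up, a single chromosome is assumed, and both thresholds are treated as inclusive.

```python
def find_tags(targets, positions, r2, tag_r2=0.8, tag_kb=250):
    """Return the targets plus every SNP that tags at least one target.

    positions maps SNP ID -> base-pair position (one chromosome assumed);
    r2 maps frozenset({snp_a, snp_b}) -> genotypic r-squared.
    """
    selected = set(targets)
    for target in targets:
        for snp, bp in positions.items():
            if snp == target:
                continue
            within_kb = abs(bp - positions[target]) <= tag_kb * 1000
            pair_r2 = r2.get(frozenset((snp, target)), 0.0)
            if within_kb and pair_r2 >= tag_r2:
                selected.add(snp)
    return selected


# Hypothetical data: one target SNP and three candidate tags.
positions = {"rsA": 1_000, "rsB": 5_000, "rsC": 400_000, "rsD": 9_000}
r2 = {
    frozenset(("rsA", "rsB")): 0.90,  # close and strongly correlated
    frozenset(("rsA", "rsC")): 0.95,  # strongly correlated but 399kb away
    frozenset(("rsA", "rsD")): 0.60,  # close but below the default r-squared
}

print(sorted(find_tags(["rsA"], positions, r2)))
print(sorted(find_tags(["rsA"], positions, r2, tag_r2=0.5, tag_kb=1000)))
```

With the default settings only rsB qualifies as a tag for rsA; relaxing both thresholds, as with --tag-r2 0.5 and --tag-kb 1000, admits rsC and rsD as well.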
HINT If you specify the filename for
the --show-tags command to be the keyword all, then
PLINK will only generate the plink.tags.list file, but for
all SNPs in the dataset.  (This means that you cannot have a file
actually called all used as the input for
the --show-tags command of course).
NOTE You can add the --tag-mode2 command to
specify an alternative input and output format. In this case, we
assume the input file contains two columns, with the second field being 
either 0 or 1 to indicate whether or not this is a target SNP:
     rs00001  0
     rs00002  0
     rs00003  1
     rs00004  0
     rs00005  1
     rs00006  0
The output is in a similar form, except that tagging SNPs will now have a 1 in the second field:
     rs00001  0
     rs00002  0
     rs00003  1
     rs00004  1
     rs00005  1
     rs00006  1
i.e. the example above is equivalent to the original-format input file
     rs00003  
     rs00005  
and output file
     rs00003  
     rs00004  
     rs00005  
     rs00006  
indicating that SNPs rs00004 and rs00006 have been added as tags.
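The correspondence between the two formats can be checked with a few lines of Python. The helper name mode2_to_list is hypothetical; it simply collects the SNP IDs whose second field is 1, using the example input and output shown above.

```python
def mode2_to_list(lines):
    """Return the SNP IDs flagged with 1 in a two-column --tag-mode2 listing."""
    flagged = []
    for line in lines:
        fields = line.split()
        if len(fields) == 2 and fields[1] == "1":
            flagged.append(fields[0])
    return flagged


mode2_input = [
    "rs00001  0", "rs00002  0", "rs00003  1",
    "rs00004  0", "rs00005  1", "rs00006  0",
]
mode2_output = [
    "rs00001  0", "rs00002  0", "rs00003  1",
    "rs00004  1", "rs00005  1", "rs00006  1",
]

targets = mode2_to_list(mode2_input)
tagged = mode2_to_list(mode2_output)
# SNPs flagged in the output but not in the input are the newly added tags.
added = [snp for snp in tagged if snp not in targets]
print(targets, tagged, added)
```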
NOTE This function does not pick the minimal set of
SNPs required to tag all common variation in a region, in the way
tagging algorithms such as Tagger typically work. Rather,
this utility function is designed merely to indicate which other SNPs
tag one or more of a pre-specified list of SNPs.

Haplotype block estimation

The command
 plink --bfile mydata --blocks
generates two files
     plink.blocks
and
     plink.blocks.det
Haplotype blocks are estimated following the default procedure in Haploview. Note
that only individuals with a non-missing phenotype are included in
this analysis.
By default, pairwise LD is only calculated for SNPs within 200kb. If
needed, this parameter can be changed via the --ld-window-kb
option.
The first file lists each block (2 or more SNPs) on a row, starting
with an asterisk symbol (*), for example:
     * rs7527871 rs2840528 rs7545940
     * rs2296442 rs2246732
     * rs10752728 rs897635
     * rs10489588 rs9661525 rs2993510
This format can be used with the --hap command, for example
to test each haplotype in each block for association, or to estimate
the haplotype frequencies:
 plink --bfile mydata --hap plink.blocks --hap-freq
The second file, plink.blocks.det is similar to the first, but 
contains some additional information:
     CHR      Chromosome identifier
     BP1      The start position (base-pair units) of this block
     BP2      The end position (base-pair units) of this block
     KB       The kilobase distance spanned by this block
     NSNPS    The number of SNPs in this block
     SNPS     List of SNPs in this block
for example
     CHR          BP1          BP2           KB  NSNPS SNPS
       1      2313888      2331789       17.902      3 rs7527871|rs2840528|rs7545940
       1      2462779      2482556       19.778      2 rs2296442|rs2246732
       1      2867411      2869431        2.021      2 rs10752728|rs897635
       1      2974991      2979823        4.833      3 rs10489588|rs9661525|rs2993510
       ....
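A row of plink.blocks.det can be read back in the same way as the other report files. The Python sketch below assumes whitespace-separated columns in the order shown above and a '|'-separated SNPS field; the function name parse_blocks_det_row is hypothetical. In the four example rows, KB equals (BP2 - BP1 + 1) / 1000, i.e. an inclusive span; that relationship is inferred from these rows only and is not stated in the field descriptions.

```python
def parse_blocks_det_row(line):
    """Split one data row of plink.blocks.det into a dict.

    Assumes whitespace-separated columns in the order
    CHR BP1 BP2 KB NSNPS SNPS.
    """
    chrom, bp1, bp2, kb, nsnps, snps = line.split()
    return {
        "CHR": chrom,
        "BP1": int(bp1),
        "BP2": int(bp2),
        "KB": float(kb),
        "NSNPS": int(nsnps),
        "SNPS": snps.split("|"),
    }


# Rows copied from the example listing above.
example_rows = [
    "1      2313888      2331789       17.902      3 rs7527871|rs2840528|rs7545940",
    "1      2462779      2482556       19.778      2 rs2296442|rs2246732",
    "1      2867411      2869431        2.021      2 rs10752728|rs897635",
    "1      2974991      2979823        4.833      3 rs10489588|rs9661525|rs2993510",
]
blocks = [parse_blocks_det_row(r) for r in example_rows]

for block in blocks:
    # Inclusive span in kb, as inferred from the example rows.
    span_kb = (block["BP2"] - block["BP1"] + 1) / 1000
    print(block["SNPS"], block["KB"], span_kb)
```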