PLINK includes a set of options to calculate pairwise linkage disequilibrium between SNPs, and to present or process this information in various ways. Also see the functions on haplotype analysis.

Pairwise LD measures for a single pair of SNPs

The command --ld followed by two SNP identifiers prints the following LD statistics to the LOG file for that single pair of SNPs: r-squared, D', the estimated haplotype frequencies and those expected under linkage equilibrium. It also indicates which haplotypes are in phase (i.e. occurring more often than expected by chance). For example:

plink --bfile mydata --ld rs2840528 rs7545940

gives the following output

     LD information for SNP pair [ rs2840528 rs7545940 ]

        R-sq = 0.592     D' = 0.936

        Haplotype     Frequency    Expectation under LE
        ---------     ---------    --------------------
            GC          0.013            0.199
            AC          0.435            0.245
            GT          0.441            0.250
            AT          0.111            0.307

        In phase alleles are GT/AC

The LD statistics presented here are based on haplotype frequencies estimated via the EM algorithm. Only founders are used in these calculations.
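
As an illustration of how the statistics in this output relate to each other, the following Python sketch recomputes D, D' and r-squared from four haplotype frequencies using the standard textbook definitions. It is not PLINK's own code: the function name ld_from_haplotypes is hypothetical, the EM estimation step is skipped entirely, and both SNPs are assumed to be biallelic with all four haplotypes listed. Because only the rounded frequencies printed above are available, the results match the printed values only approximately.

```python
def ld_from_haplotypes(freqs):
    """Compute pairwise LD statistics from haplotype frequencies.

    freqs maps a two-letter haplotype (allele at SNP 1, allele at SNP 2)
    to its estimated frequency. Hypothetical helper, not part of PLINK.
    """
    # Renormalise, since rounded frequencies may not sum exactly to 1.
    total = sum(freqs.values())
    freqs = {hap: f / total for hap, f in freqs.items()}

    # Marginal allele frequencies at each SNP.
    p_first, p_second = {}, {}
    for hap, f in freqs.items():
        p_first[hap[0]] = p_first.get(hap[0], 0.0) + f
        p_second[hap[1]] = p_second.get(hap[1], 0.0) + f

    # Expected haplotype frequencies under linkage equilibrium.
    expected = {hap: p_first[hap[0]] * p_second[hap[1]] for hap in freqs}

    # D is defined relative to one reference haplotype (the first one given).
    ref = next(iter(freqs))
    a1, b1 = ref[0], ref[1]
    a2 = next(a for a in p_first if a != a1)
    b2 = next(b for b in p_second if b != b1)
    d = freqs[ref] - expected[ref]

    # D' scales |D| by its maximum possible value given the allele frequencies.
    if d < 0:
        d_max = min(p_first[a1] * p_second[b1], p_first[a2] * p_second[b2])
    else:
        d_max = min(p_first[a1] * p_second[b2], p_first[a2] * p_second[b1])

    return {
        "D": d,
        "D_prime": abs(d) / d_max,
        "r2": d * d / (p_first[a1] * p_first[a2] * p_second[b1] * p_second[b2]),
        "expected": expected,
        # Haplotypes seen more often than expected under equilibrium.
        "in_phase": sorted(h for h in freqs if freqs[h] > expected[h]),
    }


result = ld_from_haplotypes({"GC": 0.013, "AC": 0.435, "GT": 0.441, "AT": 0.111})
print(round(result["r2"], 3), round(result["D_prime"], 3), result["in_phase"])
```

For the frequencies above, the sketch identifies AC and GT as the in-phase haplotypes, in agreement with the PLINK output.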

Pairwise LD measures for multiple SNPs (genome-wide)

Correlations based on genotype allele counts (i.e. without phasing, and for founders only) can be obtained with the commands

plink --file mydata --r

plink --file mydata --r2

That is, for each pair of SNPs this calculates the correlation between two variables, each coded 0, 1 or 2 to represent the number of non-reference alleles at that SNP. The squared correlation based on genotypic allele counts is therefore not identical to the r-sq as estimated from haplotype frequencies (see above), although it will typically be very similar. Because it is faster to calculate, it provides a good way to screen for strong LD. The estimated value for the example in the section above (rs2840528, rs7545940) is 0.5748 (versus 0.592).
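
The genotype-based correlation described here can be sketched in Python as an ordinary Pearson correlation between two vectors of allele counts. The function name genotype_r and the handling of missing genotypes (pairs with a missing value are simply dropped) are assumptions made for this illustration, not a description of PLINK's internals; both SNPs are assumed to be polymorphic in the sample.

```python
from math import sqrt


def genotype_r(x, y):
    """Pearson correlation between two genotype vectors coded 0, 1 or 2.

    None marks a missing genotype; individuals missing at either SNP are
    dropped (an assumption of this sketch).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / sqrt(var_x * var_y)


# Made-up allele counts for six individuals at two SNPs.
snp1 = [0, 1, 2, 1, 0, None]
snp2 = [0, 2, 1, 1, 0, 2]
r = genotype_r(snp1, snp2)
print(r, r * r)  # r corresponds to --r, r squared to --r2
```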

Both commands create a file called

	plink.ld

with a list of R or R-squared values in it.

Filtering the output

By default, several filters are imposed on which pairwise comparisons are calculated and reported. For example, to analyse only SNPs that are not more than 10 SNPs apart, use the option (default is 10 SNPs)

     --ld-window 10

to specify a kb window in addition (default 1Mb)

     --ld-window-kb 1000

and to report only values above a particular threshold (this applies only when --r2, not --r, is used; the default is 0.2)

     --ld-window-r2 0.2

The default for --ld-window-r2 is set at 0.2 to reduce the size of output files when many comparisons are made: to get all pairs reported, set --ld-window-r2 to 0.
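
The way these three filters combine can be sketched as a single predicate over one SNP pair. This is only an illustration of the description above: the function passes_filters and its argument names are hypothetical, and whether PLINK treats each boundary as inclusive or exclusive is an assumption here (inclusive is used, which is consistent with a threshold of 0 reporting all pairs).

```python
def passes_filters(idx_a, bp_a, idx_b, bp_b, r2,
                   window=10, window_kb=1000, min_r2=0.2):
    """Return True if a SNP pair would be reported under the three filters.

    idx_* are positions of the SNPs in map order, bp_* are base-pair
    positions. Defaults mirror --ld-window 10, --ld-window-kb 1000 and
    --ld-window-r2 0.2. Boundary handling is an assumption of this sketch.
    """
    if abs(idx_b - idx_a) > window:          # --ld-window (SNP count)
        return False
    if abs(bp_b - bp_a) > window_kb * 1000:  # --ld-window-kb (distance)
        return False
    return r2 >= min_r2                      # --ld-window-r2 (threshold)


print(passes_filters(0, 1_000, 5, 50_000, 0.5))    # close and in strong LD
print(passes_filters(0, 1_000, 20, 50_000, 0.9))   # too many SNPs apart
print(passes_filters(0, 1_000, 5, 50_000, 0.05, min_r2=0))  # threshold off
```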

Obtaining LD values for a specific SNP versus all others

To obtain all LD values for a set of SNPs versus one specific SNP, use the --ld-snp command in conjunction with --r2. For example, to get a list of all values for every SNP within 1Mb of rs12345, use the command

    plink --file mydata \
          --r2 \
          --ld-snp rs12345 \
          --ld-window-kb 1000 \
          --ld-window 99999 \
          --ld-window-r2 0

Setting --ld-window to a very large value and --ld-window-r2 to 0 effectively means that output will be shown for all other SNPs within 1Mb of rs12345.

The --ld-snp-list command is similar to --ld-snp, but takes multiple seed SNPs. To obtain all LD values for a group of SNPs versus other SNPs, use the command

     --ld-snp-list mysnps.txt

where mysnps.txt is a list of SNPs.

Obtaining a matrix of LD values

Alternatively, it is possible to add the --matrix option, which creates a matrix of LD values rather than a list: in this case, all SNP pairs are calculated and reported, even for SNPs on different chromosomes.

Note To force all SNP-by-SNP cross-chromosome comparisons with the standard output format (i.e. without --matrix) add the flag

     --inter-chr

instead. This can be combined with --ld-window-r2, for example to list all inter-chromosomal SNP pairs with very high R-squared values. Warning: this command could take an excessively long time to run if applied to large datasets with many SNPs.

Functions to select tag SNPs for specified SNP sets

The command

plink --bfile mydata --show-tags mysnps.txt

where mysnps.txt is just a list of SNP IDs, generates a file

     plink.tags

that lists all the SNPs in the dataset that tag the SNPs in mysnps.txt (including the SNPs in the original file). A message is also written to the LOG file that indicates how many new SNPs were added

     Reading SNPs to tag from [ mysnps.txt ]
     Read 10 SNPs to tag, of which 10 are unique and present
     In total, added 2 tag SNPs
     Writing tag list to [ plink.tags ]

meaning that plink.tags will contain 12 SNPs. This command could be useful, for example, if one wants to generate a list of SNPs that tag all known coding SNPs, or a list of known disease-associated SNPs.

If the option

     --list-all

is also added, then an additional file is generated that gives some more details for each target SNP (i.e. each SNP listed in mysnps.txt, in the above example) regarding how many and which tags were set for it. The file is named

     plink.tags.list

and has the following fields

       SNP   Target SNP ID
       CHR   Chromosome code
        BP   Physical position (base-pair)
      NTAG   Number of other SNPs that tag this SNP
      LEFT   Physical position of left-most (5') tagging SNP (bp)
     RIGHT   Physical position of right-most (3') tagging SNP (bp)
    KBSPAN   Kilobase size of region implied by LEFT-RIGHT
      TAGS   List of SNPs that tag target

For example:


            SNP  CHR         BP NTAG       LEFT      RIGHT   KBSPAN TAGS
      rs2542334   22   16694612    2   16693517   16695440    1.923 rs415170|rs2587108
      rs2587108   22   16695440    2   16693517   16695440    1.923 rs415170|rs2542334
       rs873387   22   16713566    0   16713566   16713566        0 NONE
        rs11917   22   16717565    2   16717565   16742194   24.629 rs1057721|rs2075444
      rs1057721   22   16718397    2   16717565   16742194   24.629 rs11917|rs2075444
      rs9605422   22   16737494    0   16737494   16737494        0 NONE
      rs2075444   22   16742194    2   16717565   16742194   24.629 rs11917|rs1057721
      rs4819644   22   16744470    0   16744470   16744470        0 NONE
      rs2083882   22   16769795    0   16769795   16769795        0 NONE
      rs5992907   22   16796453    5   16796453   16830384   33.931 rs400509|rs396012|rs415651|rs384215|rs453557
       rs400509   22   16800853    3   16796453   16813039   16.586 rs5992907|rs396012|rs384215
       rs396012   22   16806587    3   16796453   16813039   16.586 rs5992907|rs400509|rs384215
      rs7293187   22   16807274    0   16807274   16807274        0 NONE
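
A file in this layout can be read with a few lines of Python. The sketch below assumes a whitespace-delimited file with the header row shown above, a |-separated TAGS field, and the keyword NONE for SNPs without tags; the function name parse_tags_list is hypothetical. In the rows shown, KBSPAN equals (RIGHT - LEFT) / 1000, which the usage lines check; that relationship is inferred from the example rather than from a stated definition.

```python
def parse_tags_list(lines):
    """Parse the lines of a plink.tags.list file into a list of dicts."""
    header = lines[0].split()
    records = []
    for line in lines[1:]:
        if not line.strip():
            continue  # skip blank lines
        fields = dict(zip(header, line.split()))
        tags = [] if fields["TAGS"] == "NONE" else fields["TAGS"].split("|")
        records.append({
            "snp": fields["SNP"],
            "chr": fields["CHR"],
            "bp": int(fields["BP"]),
            "ntag": int(fields["NTAG"]),
            "left": int(fields["LEFT"]),
            "right": int(fields["RIGHT"]),
            "kbspan": float(fields["KBSPAN"]),
            "tags": tags,
        })
    return records


example = [
    "SNP  CHR         BP NTAG       LEFT      RIGHT   KBSPAN TAGS",
    "rs2542334   22   16694612    2   16693517   16695440    1.923 rs415170|rs2587108",
    "rs873387   22   16713566    0   16713566   16713566        0 NONE",
]
for rec in parse_tags_list(example):
    span = (rec["right"] - rec["left"]) / 1000
    print(rec["snp"], rec["tags"], span)
```

In practice the lines would come from open("plink.tags.list").read().splitlines().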

The settings for declaring that a SNP tags another SNP can be varied with the commands

     --tag-r2 0.5

to specify the minimum r-squared (based on the genotypic correlation, see above) required to declare that one SNP tags another; here it is set to 0.5 (the default is 0.8). Also,

     --tag-kb 1000

will constrain the search for tags to be within a megabase (the default is 250kb).

HINT If you specify the filename for the --show-tags command to be the keyword all, then PLINK will only generate the plink.tags.list file, but for all SNPs in the dataset. (This means that you cannot have a file actually called all used as the input for the --show-tags command of course).

NOTE You can add the --tag-mode2 command to specify an alternative input and output format. In this case, the input file is assumed to contain two columns, with the second field being either 0 or 1 to indicate whether or not the SNP is a target SNP. The output is in a similar form, except that tagging SNPs will now also have a 1 in the second field.

For example, a two-column input file in which only rs00003 and rs00005 have a 1 in the second field would be equivalent to the original-format input file

     rs00003  
     rs00005

and a two-column output file in which rs00004 and rs00006 also have a 1 in the second field would indicate that SNPs rs00004 and rs00006 have been added as tags.
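
The relationship between the two formats can be sketched in Python as a pair of conversion helpers. Both function names are hypothetical, and the six-SNP list rs00001 to rs00006 is made up for illustration; only the two-column layout (SNP ID, then 0 or 1) is taken from the description above.

```python
def to_mode2(all_snps, flagged):
    """Build two-column lines: SNP ID followed by 1 if flagged, else 0."""
    flagged = set(flagged)
    return [f"{snp} {1 if snp in flagged else 0}" for snp in all_snps]


def from_mode2(lines):
    """Return the SNP IDs whose second field is 1."""
    snps = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 2 and parts[1] == "1":
            snps.append(parts[0])
    return snps


# Made-up SNP list; rs00003 and rs00005 are the targets.
all_snps = [f"rs{i:05d}" for i in range(1, 7)]
mode2_input = to_mode2(all_snps, ["rs00003", "rs00005"])
print(mode2_input)
print(from_mode2(mode2_input))  # back to the one-column target list
```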

NOTE This function does not pick the minimal set of SNPs required to tag all common variation in a region, in the way tagging algorithms (such as Tagger) typically work. Rather, this utility function is designed merely to indicate which other SNPs tag one or more of a pre-specified list of SNPs.

Haplotype block estimation

The command

plink --bfile mydata --blocks

generates two files

     plink.blocks

and

     plink.blocks.det

Haplotype blocks are estimated following the default procedure in Haploview. Note that only individuals with a non-missing phenotype are included in this analysis.

By default, pairwise LD is only calculated for SNPs within 200kb. If needed, this parameter can be changed via the --ld-window-kb option.

The first file lists each block (2 or more SNPs) on a row, starting with an asterisk symbol (*), for example:

     * rs7527871 rs2840528 rs7545940
     * rs2296442 rs2246732
     * rs10752728 rs897635
     * rs10489588 rs9661525 rs2993510

This format can be used with the --hap command, for example to test each haplotype in each block for association, or to estimate the haplotype frequencies:

plink --bfile mydata --hap plink.blocks --hap-freq

The second file, plink.blocks.det, is similar to the first, but contains some additional information:

     CHR      Chromosome identifier
     BP1      The start position (base-pair units) of this block
     BP2      The end position (base-pair units) of this block
     KB       The kilobase distance spanned by this block
     NSNPS    The number of SNPs in this block
     SNPS     List of SNPs in this block

for example

     CHR          BP1          BP2           KB  NSNPS SNPS
       1      2313888      2331789       17.902      3 rs7527871|rs2840528|rs7545940
       1      2462779      2482556       19.778      2 rs2296442|rs2246732
       1      2867411      2869431        2.021      2 rs10752728|rs897635
       1      2974991      2979823        4.833      3 rs10489588|rs9661525|rs2993510
       ....
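
The detailed file can be read in the same spirit. The Python sketch below assumes whitespace-delimited columns in the order shown above and a |-separated SNPS field; the function name parse_blocks_det is hypothetical. In the four rows shown, KB equals (BP2 - BP1 + 1) / 1000, which the usage lines check; that formula is inferred from the example rows, not from a stated definition.

```python
def parse_blocks_det(lines):
    """Parse plink.blocks.det lines into a list of dicts."""
    records = []
    for line in lines:
        fields = line.split()
        # Skip the header row, blank lines and anything without six fields.
        if len(fields) != 6 or fields[0] == "CHR":
            continue
        records.append({
            "chr": fields[0],
            "bp1": int(fields[1]),
            "bp2": int(fields[2]),
            "kb": float(fields[3]),
            "nsnps": int(fields[4]),
            "snps": fields[5].split("|"),
        })
    return records


example_det = [
    "     CHR          BP1          BP2           KB  NSNPS SNPS",
    "       1      2313888      2331789       17.902      3 rs7527871|rs2840528|rs7545940",
    "       1      2462779      2482556       19.778      2 rs2296442|rs2246732",
    "       1      2867411      2869431        2.021      2 rs10752728|rs897635",
    "       1      2974991      2979823        4.833      3 rs10489588|rs9661525|rs2993510",
]
for rec in parse_blocks_det(example_det):
    inferred_kb = (rec["bp2"] - rec["bp1"] + 1) / 1000
    print(rec["chr"], rec["nsnps"], rec["kb"], inferred_kb)
```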

This document last modified Wednesday, 25-Jan-2017 11:39:27 EST
