PLINK: Whole genome data analysis toolset


IBS/IBD estimation

As well as the standard summary statistics described above, PLINK offers some less common ones, such as estimated inbreeding coefficients for each individual and genome-wide identity-by-state (IBS) and identity-by-descent (IBD) estimates for all pairs of individuals. The latter can be used to detect sample contamination, swaps and duplications, as well as pedigree errors and unknown familial relationships (e.g. sibling pairs in a case/control population-based sample).

All these analyses require a large number of SNPs!

Pairwise IBD estimation

In a homogeneous sample, it is possible to estimate genome-wide IBD sharing from IBS information, as long as a large number of SNPs are available (probably 1,000 independent SNPs at a bare minimum; ideally 100K or more).

plink --file mydata --genome

which creates the file

     plink.genome

The output currently has the following columns:

     ID (family, individual) for individual 1 (2 cols)
     ID (family, individual) for individual 2 (2 cols)
     P(IBD=0)
     P(IBD=1)
     P(IBD=2)
     pi-hat = proportion of alleles shared IBD = P(IBD=1)/2 + P(IBD=2)
     Number of IBS 0 loci
     Number of IBS 1 loci 
     Number of IBS 2 loci
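As a sketch of how a row of this file might be read programmatically, the Python below splits one line into the eleven columns in the order listed above and recomputes pi-hat from P(IBD=1) and P(IBD=2). It assumes whitespace-separated fields, exactly eleven columns and no header line; the names `GenomePair`, `parse_genome_line` and `pi_hat` are hypothetical, and the example row uses made-up numbers.

```python
from typing import NamedTuple


class GenomePair(NamedTuple):
    """One plink.genome row, in the column order listed above (assumed layout)."""
    fid1: str
    iid1: str
    fid2: str
    iid2: str
    z0: float      # P(IBD=0)
    z1: float      # P(IBD=1)
    z2: float      # P(IBD=2)
    pi_hat: float  # proportion of alleles shared IBD
    ibs0: int      # number of IBS 0 loci
    ibs1: int      # number of IBS 1 loci
    ibs2: int      # number of IBS 2 loci


def parse_genome_line(line: str) -> GenomePair:
    """Split a whitespace-separated row into typed fields."""
    f = line.split()
    if len(f) != 11:
        raise ValueError(f"expected 11 columns, got {len(f)}")
    return GenomePair(f[0], f[1], f[2], f[3],
                      float(f[4]), float(f[5]), float(f[6]), float(f[7]),
                      int(f[8]), int(f[9]), int(f[10]))


def pi_hat(z1: float, z2: float) -> float:
    """pi-hat = P(IBD=1)/2 + P(IBD=2)."""
    return z1 / 2 + z2


# Made-up row resembling a full-sibling pair.
row = parse_genome_line("F1 I1 F1 I2 0.25 0.50 0.25 0.50 310 4120 5570")
print(row.pi_hat, pi_hat(row.z1, row.z2))  # 0.5 0.5
```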

Therefore, on a *nix system,

sort -nr --key=8,8 plink.genome | head -n 50

lists the 50 most closely related pairs (sorted on pi-hat, the 8th column).

Hint: Calculating the average pi-hat for each individual and looking for outliers is also useful. In particular, sample contamination leads to an excess of heterozygous calls, which leads to fewer IBS 0 loci, which in turn leads to over-estimated IBD with everyone else in the sample. [this example to be described in more detail].
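A minimal sketch of this check in Python, assuming the pi-hat values have already been read from plink.genome as (FID1, IID1, FID2, IID2, pi-hat) tuples. The outlier rule (mean pi-hat more than `k` standard deviations above the sample mean) and the function names `mean_pi_hat` and `flag_outliers` are illustrative assumptions, not something PLINK itself defines; the five-person toy sample is invented.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, pstdev


def mean_pi_hat(rows):
    """Average pi-hat over every pair each individual appears in.

    rows: iterable of (fid1, iid1, fid2, iid2, pi_hat) tuples.
    """
    values = defaultdict(list)
    for fid1, iid1, fid2, iid2, ph in rows:
        values[(fid1, iid1)].append(ph)
        values[(fid2, iid2)].append(ph)
    return {ind: mean(v) for ind, v in values.items()}


def flag_outliers(means, k=3.0):
    """Individuals whose mean pi-hat exceeds the sample mean by more than k SDs.

    The threshold rule is an arbitrary illustrative choice.
    """
    mu = mean(means.values())
    sd = pstdev(means.values())
    return sorted(ind for ind, m in means.items() if sd > 0 and m > mu + k * sd)


# Toy sample: individual E looks "related" to everyone, as a contaminated
# sample might; all other pairs have low pi-hat.
ids = [("FAM", x) for x in "ABCDE"]
rows = [(a[0], a[1], b[0], b[1], 0.10 if "E" in (a[1], b[1]) else 0.01)
        for a, b in combinations(ids, 2)]
means = mean_pi_hat(rows)
print(flag_outliers(means, k=1.5))  # [('FAM', 'E')]
```

With only five individuals the largest possible z-score is 2, so the toy example uses `k=1.5`; in a real sample of hundreds of people a stricter cutoff would be more appropriate.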

Inbreeding coefficients

Given a large number of SNPs in a homogeneous sample, it is possible to calculate inbreeding coefficients based on the observed versus expected number of homozygous genotypes.

plink --file mydata --inbreeding

Output is in

     plink.het

and contains the fields

     Family ID
     Individual ID
     # observed homozygous genotypes
     # expected homozygous genotypes
     # non-missing genotypes
     F inbreeding coefficient
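The last column can be related to the three count columns with the usual method-of-moments estimator F = (O - E) / (N - E), where O and E are the observed and expected homozygous counts and N is the number of non-missing genotypes. The Python below is a sketch assuming that this is the estimator reported; `expected_hom` uses the plain Hardy-Weinberg expectation 1 - 2pq per SNP, without any small-sample correction PLINK may apply. Both function names are hypothetical.

```python
def expected_hom(freqs):
    """Expected homozygous count under Hardy-Weinberg: sum of 1 - 2pq.

    freqs: allele frequency p for each non-missing SNP of the individual.
    Simplified: no small-sample correction of the frequencies.
    """
    return sum(1.0 - 2.0 * p * (1.0 - p) for p in freqs)


def inbreeding_f(obs_hom, exp_hom, n_nonmissing):
    """Method-of-moments F = (O - E) / (N - E)."""
    return (obs_hom - exp_hom) / (n_nonmissing - exp_hom)


# Four SNPs with p = 0.5: two homozygous genotypes are expected.
e = expected_hom([0.5] * 4)
print(e)                        # 2.0
print(inbreeding_f(3, e, 4))    # 0.5  (excess homozygosity)
print(inbreeding_f(1, e, 4))    # -0.5 (excess heterozygosity)
```

F is 0 when the observed count matches expectation, 1 when every genotype is homozygous, and negative when there are more heterozygotes than expected, which is the pattern a contaminated sample would tend to show.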

Warning: Only perform this analysis on autosomal markers. X/Y markers can be excluded by giving them negative values in the 4th (base-pair position) column of the MAP file.
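One way to prepare such a MAP file is sketched below in Python. It assumes a standard four-column MAP file (chromosome, SNP identifier, genetic distance, base-pair position) and that sex chromosomes are coded as `X`, `Y`, `23` or `24`; other codes (e.g. for pseudo-autosomal or mitochondrial markers) are not handled. The function name is hypothetical, and the input is an in-memory list so the example is self-contained; in practice the lines would be read from the MAP file and written to a new one.

```python
SEX_CHROMOSOMES = {"X", "Y", "23", "24"}  # assumed chromosome codes


def exclude_sex_chromosomes(map_lines):
    """Yield MAP lines with the base-pair position negated for X/Y markers.

    Positions that are already zero or negative are left unchanged;
    all other lines are passed through as they are.
    """
    for line in map_lines:
        fields = line.split()
        if len(fields) == 4 and fields[0].upper() in SEX_CHROMOSOMES:
            bp = int(fields[3])
            if bp > 0:
                fields[3] = str(-bp)
            yield "\t".join(fields)
        else:
            yield line.rstrip("\n")


example = ["1\trs1\t0\t1000", "X\trs2\t0\t5000", "24\trs3\t0\t700"]
for out in exclude_sex_chromosomes(example):
    print(out)
```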

TODO Set to automatically exclude these sex-chromosome markers.

Runs of homozygosity

A simple screen for runs of homozygous genotypes within any one individual is provided by the command:

plink --file mydata --homo-run-snps 500

which will output, to the file

     plink.ihet

all regions of 500 or more contiguous SNPs that contain no heterozygous genotype.

Longer-than-expected stretches might be consistent with a deletion (i.e. the individual is actually hemizygous for this region) or with autozygosity due to recent inbreeding.
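The screen described above amounts to a single pass over an individual's genotypes, resetting a counter at each heterozygote. The Python below is a sketch of that idea, not PLINK's own code. It assumes the genotypes for one individual on one chromosome are already in position order and coded as `"hom"`, `"het"` or `None` (missing), and it assumes missing calls do not break a run; the function name and the toy data are hypothetical.

```python
def homozygous_runs(genotypes, min_snps=500):
    """Return inclusive (start, end) SNP indices of runs with no heterozygote.

    Only runs of at least min_snps consecutive SNPs are reported.
    Missing calls (None) are assumed not to break a run.
    """
    runs = []
    start = 0  # index of the first SNP in the current run
    for i, g in enumerate(genotypes):
        if g == "het":
            if i - start >= min_snps:      # run covers start .. i-1
                runs.append((start, i - 1))
            start = i + 1
    if len(genotypes) - start >= min_snps:  # run reaching the last SNP
        runs.append((start, len(genotypes) - 1))
    return runs


calls = ["hom"] * 6 + ["het"] + ["hom", None, "hom"] + ["het"] + ["hom"] * 5
print(homozygous_runs(calls, min_snps=5))  # [(0, 5), (11, 15)]
print(homozygous_runs(calls, min_snps=3))  # [(0, 5), (7, 9), (11, 15)]
```

Whether a missing call should interrupt a run is a design choice; treating it as neutral, as here, lets sporadic genotyping failures inside a long homozygous stretch go unnoticed.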

TODO Describe format of output for this analysis.