PLINK: Whole genome data analysis toolset
IBS/IBD estimation
As well as the standard summary statistics described above,
PLINK offers some more novel measures, such as estimated
inbreeding coefficients for each individual and genome-wide
identity-by-state (IBS) and identity-by-descent (IBD) estimates for all pairs of
individuals. The latter can be used to detect sample contamination,
swaps and duplications, as well as pedigree errors and unknown familial
relationships (e.g. sibling pairs in a case/control population-based
sample).
All these analyses require a large number of SNPs!
Pairwise IBD estimation
In a homogeneous sample, it is possible to calculate genome-wide IBD
given IBS information, as long as a large number of SNPs are available
(probably 1000 independent SNPs at a bare minimum; ideally 100K or
more).
plink --file mydata --genome
which creates the file
plink.genome
Current output has columns:
ID (family, individual) for individual 1 (2 cols)
ID (family, individual) for individual 2 (2 cols)
P(IBD=0)
P(IBD=1)
P(IBD=2)
pi-hat = proportion of alleles shared IBD = P(IBD=1)/2 + P(IBD=2)
Number of IBS 0 loci
Number of IBS 1 loci
Number of IBS 2 loci
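As a quick arithmetic check of the pi-hat definition above, the following sketch computes it from three IBD probabilities. The function name pi_hat and the input values are made up for illustration; they are not PLINK output.

```shell
# pi-hat = P(IBD=1)/2 + P(IBD=2)
# Arguments: P(IBD=0) P(IBD=1) P(IBD=2)  (the first is unused by the formula)
pi_hat() {
    awk -v z1="$2" -v z2="$3" 'BEGIN { printf "%.4f\n", z1 / 2 + z2 }'
}

pi_hat 0.25 0.50 0.25   # expected values for full siblings: prints 0.5000
pi_hat 0.00 1.00 0.00   # parent-offspring pair: prints 0.5000
pi_hat 1.00 0.00 0.00   # unrelated pair: prints 0.0000
```

Note that full siblings and parent-offspring pairs share the same pi-hat, so the three separate probabilities are needed to tell them apart.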
Therefore, on a *nix system, since pi-hat is the eighth column in the layout listed above,
sort -k8,8nr plink.genome | head -n 50
is a good way of getting the 50 most closely related pairs.
Hint Calculating the average pi-hat for each individual
and looking for outliers is also useful (in particular, sample
contamination will lead to too many hets, which leads to fewer IBS 0
calls, which leads to over-estimated IBD with all other people in the
sample). [this example to be described in more detail].
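The per-individual average mentioned in the hint could be computed along the following lines. This is a minimal sketch, assuming the file is whitespace-separated with the columns in the order listed above (so pi-hat is field 8); the function name mean_pihat and the toy input are hypothetical. Lines whose eighth field is not numeric (for example a header line, if one is present) are skipped.

```shell
# Mean pi-hat per individual from a plink.genome-style file.
# Assumed column order: FID1 IID1 FID2 IID2 P(IBD=0) P(IBD=1) P(IBD=2) pi-hat IBS0 IBS1 IBS2
mean_pihat() {
    awk '
        # Skip any line whose 8th field is not a number (e.g. a header).
        $8 !~ /^-?[0-9.]+([eE][-+]?[0-9]+)?$/ { next }
        {
            a = $1 " " $2               # individual 1 (family ID, individual ID)
            b = $3 " " $4               # individual 2
            sum[a] += $8; n[a]++        # each pair counts towards both members
            sum[b] += $8; n[b]++
        }
        END {
            for (id in sum) printf "%s %.4f\n", id, sum[id] / n[id]
        }
    ' "$1" | sort -k3,3nr               # highest average first
}

# Toy input: made-up values for illustration only.
cat > toy.genome <<'EOF'
F1 A F2 B 1.00 0.00 0.00 0.00 50 300 650
F1 A F3 C 0.80 0.20 0.00 0.10 40 310 650
F2 B F3 C 0.60 0.40 0.00 0.20 30 320 650
EOF

mean_pihat toy.genome
```

Individuals at the top of this list, with an average well above the rest of the sample, are the candidates to inspect for contamination.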
Inbreeding coefficients
Given a large number of SNPs, in a homogeneous sample, it is possible
to calculate inbreeding coefficients (i.e. based on the observed
versus expected number of homozygous genotypes).
plink --file mydata --inbreeding
Output is in
plink.het
and contains the fields
Family ID
Individual ID
# observed homozygous genotypes
# expected homozygous genotypes
# non-missing genotypes
F inbreeding coefficient
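To see how the counts relate to F, one common method-of-moments estimator is F = (O - E) / (N - E), where O is the observed number of homozygous genotypes, E the expected number and N the number of non-missing genotypes. Whether the F column is computed in exactly this way is an assumption here, so treat the sketch below as a sanity check rather than as the program's definition. The function name inbreeding_f and the counts are made up.

```shell
# F = (O - E) / (N - E)
# Arguments: observed homozygous count, expected homozygous count, non-missing genotypes
inbreeding_f() {
    awk -v o="$1" -v e="$2" -v n="$3" 'BEGIN {
        if (n - e == 0) { print "NA"; exit }    # avoid division by zero
        printf "%.4f\n", (o - e) / (n - e)
    }'
}

inbreeding_f 700 650 1000   # excess homozygosity: prints 0.1429
inbreeding_f 650 650 1000   # exactly as expected: prints 0.0000
inbreeding_f 600 650 1000   # fewer homozygotes than expected: prints -0.1429
```

Under this estimator a negative value simply means fewer homozygous genotypes were observed than expected, which can arise from sampling error or, as noted in the hint above, from sample contamination.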
Warning Only perform this analysis for autosomal markers. One way to
exclude the X/Y markers is to give them negative values in the 4th
column of the MAP file.
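The MAP file edit could be scripted roughly as follows. This is a sketch under stated assumptions: a four-column, whitespace-separated MAP file (chromosome, SNP identifier, genetic distance, position) in which sex chromosomes are coded X/Y or 23/24. The function name exclude_sex_markers and the toy markers are hypothetical; check the chromosome codes in your own file before relying on it.

```shell
# Negate the 4th column for sex-chromosome markers so that they are excluded.
# Modified lines are rewritten with single spaces between fields.
exclude_sex_markers() {
    awk '
        ($1 == "X" || $1 == "Y" || $1 == "23" || $1 == "24") && $4 > 0 { $4 = -$4 }
        { print }
    ' "$1"
}

# Toy MAP file: made-up markers for illustration only.
cat > toy.map <<'EOF'
1 rs0001 0 10000
X rs0002 0 20000
23 rs0003 0 30000
Y rs0004 0 40000
EOF

exclude_sex_markers toy.map > toy_autosomal.map
cat toy_autosomal.map
```

Writing to a new file rather than editing in place keeps the original MAP file available for analyses that do need the sex chromosomes.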
TODO Set to automatically exclude these sex-chromosome markers.
Runs of homozygosity
A simple screen for runs of homozygous genotypes within any one individual is provided by the command:
plink --file mydata --homo-run-snps 500
which will output, to the file
plink.ihet
all regions with 500 or more contiguous SNPs containing no heterozygous genotype.
Longer-than-expected stretches might be consistent with a deletion
(i.e. the individual is actually hemizygous for this region) or with
autozygosity due to recent inbreeding.
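The idea behind such a screen can be illustrated with a short scan over one individual's genotypes. This is not PLINK's implementation, only a sketch of the principle. The input layout (one marker per line in map order, as "SNP allele1 allele2"), the function name homo_runs and the toy genotypes are all assumptions made for this example. Any genotype whose two alleles are equal extends the run, which means a missing genotype coded as two identical symbols would also extend it; the real program may treat missing data differently.

```shell
# Report runs of at least `min` consecutive non-heterozygous genotypes.
# Usage: homo_runs FILE MIN
# Output: first SNP, last SNP and length of each qualifying run.
homo_runs() {
    awk -v min="$2" '
        function emit() {
            if (len >= min) print start, last, len
            len = 0
        }
        $2 != $3 { emit(); next }       # a heterozygote ends the current run
        {
            if (len == 0) start = $1    # first SNP of a new run
            last = $1
            len++
        }
        END { emit() }                  # report a run that reaches the end of the file
    ' "$1"
}

# Toy genotypes for one individual: made-up values for illustration only.
cat > toy.geno <<'EOF'
rs1 A G
rs2 A A
rs3 C C
rs4 T T
rs5 C T
rs6 G G
rs7 G G
EOF

homo_runs toy.geno 3    # prints: rs2 rs4 3
```

With real data the threshold would be far larger, such as the 500 SNPs used in the command above.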
TODO Describe format of output for this analysis.