PLINK: Whole genome data analysis toolset
Summary statistics
PLINK will generate a number of standard summary statistics
that are useful for quality control (e.g. missing genotype rate, minor
allele frequency, Hardy-Weinberg equilibrium failures and
non-Mendelian transmission rates). These can also be used as thresholds
for subsequent analyses (described in the next section).
Missing genotypes
To generate a list of genotyping/missing rate statistics:
plink --file data --missing
will create two files:
plink.imiss
plink.lmiss
which detail missingness by individual and by SNP. For individuals, the
format is:
Family ID (FID)
Individual ID (IID)
Missing phenotype? (Y/N)
Number of missing SNPs (N_MISS)
Proportion of missing SNPs (%_MISS)
For each SNP, the format is:
SNP ID (SNP)
Chromosome (CHR)
Number of individuals missing this SNP (N_MISS)
Proportion of sample missing for this SNP (%_MISS)
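To illustrate what N_MISS and the missing proportion measure, here is a minimal Python sketch that computes both per individual and per SNP from a small in-memory genotype matrix. It assumes PED-style allele codes with "0" as the missing allele, and it treats a genotype as missing if either allele is missing. The data layout and function names are hypothetical, not PLINK's internals.

```python
from typing import List, Tuple

Genotype = Tuple[str, str]
MISSING = "0"  # assumed PED-style missing-allele code


def is_missing(g: Genotype) -> bool:
    # Treat a genotype as missing if either allele carries the missing code.
    return g[0] == MISSING or g[1] == MISSING


def individual_missingness(geno: List[List[Genotype]]) -> List[Tuple[int, float]]:
    """(N_MISS, proportion missing) for each individual (one row per person)."""
    out = []
    for row in geno:
        n_miss = sum(is_missing(g) for g in row)
        out.append((n_miss, n_miss / len(row)))
    return out


def snp_missingness(geno: List[List[Genotype]]) -> List[Tuple[int, float]]:
    """(N_MISS, proportion missing) for each SNP (one column per SNP)."""
    n_ind = len(geno)
    out = []
    for j in range(len(geno[0])):
        n_miss = sum(is_missing(row[j]) for row in geno)
        out.append((n_miss, n_miss / n_ind))
    return out


# Hypothetical data: 2 individuals x 3 SNPs.
geno = [
    [("A", "G"), ("0", "0"), ("C", "C")],
    [("G", "G"), ("T", "T"), ("0", "0")],
]
print(individual_missingness(geno))
print(snp_missingness(geno))
```

Here each individual is missing one of three SNPs, while the second and third SNPs are each missing in half of the sample.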
Hardy-Weinberg Equilibrium
To generate a list of genotype counts and Hardy-Weinberg test
statistics for each SNP, with a particular threshold:
plink --file data --hardy 0.01
will create a file:
plink.hwe
For case/control samples, this file gives the genotype counts and test
statistics for the whole sample, for cases only and for controls only.
For quantitative traits, only the whole-sample results will be given.
In addition, the following output will appear in the terminal, detailing
how many SNPs failed the Hardy-Weinberg test, for the sample as a whole,
and (when PLINK has detected a disease phenotype) for cases and
controls separately.
Writing Hardy-Weinberg test results to [ plink.hwe ]
22 markers failed HWE test ( p <= 0.05 )
16 markers failed HWE test in cases
12 markers failed HWE test in controls
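The terminal summary above counts SNPs whose HWE p-value falls at or below a threshold. The following minimal Python sketch shows that idea, assuming a 1-degree-of-freedom Pearson chi-square goodness-of-fit test on the three genotype counts. This is one common form of the chi-square HWE test; PLINK's own computation may differ in detail. The function names and example counts are hypothetical.

```python
import math
from typing import List, Tuple


def hwe_chisq(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Pearson chi-square test of HWE on genotype counts (1 df).

    Returns (statistic, p-value). Assumes at least one individual and both
    alleles observed; otherwise some expected counts are zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square with 1 df: erfc(sqrt(x / 2)).
    pval = math.erfc(math.sqrt(chisq / 2.0))
    return chisq, pval


def count_failures(snps: List[Tuple[int, int, int]], threshold: float = 0.05) -> int:
    # Count SNPs with p <= threshold, as in "failed HWE test ( p <= 0.05 )".
    return sum(1 for counts in snps if hwe_chisq(*counts)[1] <= threshold)


# Hypothetical genotype counts (AA, AB, BB) for three SNPs.
snps = [(25, 50, 25), (50, 0, 50), (40, 45, 15)]
print(count_failures(snps))
```

Only the second SNP (no heterozygotes at allele frequency 0.5) is flagged; the other two are close to their HWE expectations.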
WARNING! Currently everybody is included in the
Hardy-Weinberg calculations -- for family data, it would be better
to only consider founders (i.e. independent genotypes). This option will
be added in the future.
WARNING! Currently, the H-W test statistic is the standard
contingency table chi-square statistic: it has been shown that exact tests
have more desirable properties, particularly when one allele is rare. This
alternative approach will be adopted in the future.
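As a sketch of what an exact HWE test involves (not PLINK's implementation), the Python below conditions on the observed allele counts, enumerates every possible heterozygote count, and computes each configuration's probability under HWE. The p-value is the total probability of configurations no more likely than the one observed. This simple log-factorial version is unoptimized; the function name is hypothetical.

```python
import math
from typing import Dict


def hwe_exact_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Exact HWE p-value from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n_bb + n_ab  # copies of allele B

    def log_prob(het: int) -> float:
        # P(het | n, n_a) = n! / (hom_a! het! hom_b!) * 2^het * n_a! n_b! / (2n)!
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (math.lgamma(n + 1) - math.lgamma(hom_a + 1) - math.lgamma(het + 1)
                - math.lgamma(hom_b + 1) + het * math.log(2)
                + math.lgamma(n_a + 1) + math.lgamma(n_b + 1)
                - math.lgamma(2 * n + 1))

    # Heterozygote counts share the parity of n_a and cannot exceed the rarer allele count.
    rare = min(n_a, n_b)
    probs: Dict[int, float] = {
        h: math.exp(log_prob(h)) for h in range(rare % 2, rare + 1, 2)
    }
    p_obs = probs[n_ab]
    # Small relative tolerance so ties are not lost to rounding error.
    pval = sum(pr for pr in probs.values() if pr <= p_obs * (1 + 1e-7))
    return min(1.0, pval)


print(hwe_exact_p(1, 0, 1))    # one AA and one BB, no heterozygotes
print(hwe_exact_p(50, 0, 50))  # strong heterozygote deficit
```

Unlike the chi-square approximation, this remains valid when expected genotype counts are small, which is the rare-allele case the warning mentions.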
Allele frequency
To generate a list of minor allele frequencies (MAF) for each SNP:
plink --file data --freq
will create a file:
plink.frq
with five columns:
Chromosome
SNP identifier
Allele 1 code
Allele 2 code
Minor allele frequency
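To illustrate how a minor allele frequency can be derived from genotype calls, here is a minimal Python sketch for one biallelic SNP. It assumes PED-style allele codes with "0" as missing and skips missing alleles when counting. It does not attempt to reproduce how PLINK assigns the "Allele 1" and "Allele 2" columns; the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

MISSING = "0"  # assumed PED-style missing-allele code


def minor_allele_frequency(
    genotypes: Iterable[Tuple[str, str]]
) -> Optional[Tuple[str, str, float]]:
    """Return (minor allele, major allele, MAF) for one biallelic SNP.

    Missing alleles are skipped; returns None if no alleles are observed.
    """
    counts = Counter(a for g in genotypes for a in g if a != MISSING)
    total = sum(counts.values())
    if total == 0:
        return None
    ranked = counts.most_common()  # sorted by count, most frequent first
    major, _ = ranked[0]
    if len(ranked) == 1:
        # Monomorphic SNP: use the missing code as a placeholder minor allele.
        return MISSING, major, 0.0
    minor, n_minor = ranked[1]
    return minor, major, n_minor / total


# Hypothetical calls for one SNP across four individuals (one missing).
calls = [("A", "G"), ("G", "G"), ("A", "G"), ("0", "0")]
print(minor_allele_frequency(calls))
```

With two A and four G alleles observed, A is the minor allele at frequency 1/3.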
Mendel errors
To generate a list of Mendel errors (offspring genotypes that are
inconsistent with the genotypes of their parents):
plink --file data --mendel
will create files:
plink.mendel
plink.imendel
plink.lmendel
The *.mendel file contains all Mendel errors (i.e. one line per
error); the *.imendel file contains a summary of per-family
error rates; the *.lmendel file contains a summary of per-SNP
error rates.
Note: output from this option is not yet fully implemented.
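The core of a Mendel error check is asking whether a child's genotype can be formed from one paternal and one maternal allele. A minimal Python sketch for autosomal SNPs follows, together with a per-family and per-SNP tally in the spirit of the summary files. It assumes PED-style allele codes with "0" as missing and skips trios with any missing genotype. The data layout and function names are hypothetical and are not PLINK's output format.

```python
from collections import Counter
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]
MISSING = "0"  # assumed PED-style missing-allele code


def is_mendel_error(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """True if an autosomal child genotype cannot come from these parents."""
    for g in (child, father, mother):
        if MISSING in g:
            return False  # incomplete trios are not flagged
    c1, c2 = child
    # Consistent if one child allele is in the father and the other in the mother.
    return not ((c1 in father and c2 in mother) or (c2 in father and c1 in mother))


def summarize_errors(
    errors: List[Tuple[str, str]]
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Count (family ID, SNP ID) error records per family and per SNP."""
    per_family = Counter(fid for fid, _ in errors)
    per_snp = Counter(snp for _, snp in errors)
    return dict(per_family), dict(per_snp)


# Hypothetical trios: (family ID, SNP ID, child, father, mother).
trios = [
    ("F1", "rs1", ("A", "A"), ("A", "G"), ("G", "G")),  # mother has no A: error
    ("F1", "rs2", ("A", "G"), ("A", "A"), ("G", "G")),  # consistent
    ("F2", "rs1", ("C", "T"), ("C", "C"), ("C", "C")),  # no parent has T: error
    ("F2", "rs2", ("0", "0"), ("A", "A"), ("A", "G")),  # child missing: skipped
]
errors = [(fid, snp) for fid, snp, c, f, m in trios if is_mendel_error(c, f, m)]
per_family, per_snp = summarize_errors(errors)
print(errors, per_family, per_snp)
```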
Pedigree errors
PLINK will spot some basic pedigree errors when performing a
family-based test (--tdt option); otherwise, pedigree
structure (family and individual IDs) is completely ignored (i.e. all
individuals are assumed to be unrelated).
For a more comprehensive evaluation of pedigree errors (invalid or
incompletely specified pedigree structures) please use a different
software package such as PEDSTATS or famtypes.
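To illustrate what "basic" pedigree checks can look like, here is a minimal Python sketch over PED-style records (family ID, individual ID, father ID, mother ID, sex, with "0" for an unknown parent and sex coded 1 = male, 2 = female). The specific checks (duplicate IDs, self-parentage, parents absent from the family, parents of the wrong sex) are hypothetical examples chosen for illustration, not a list of the checks PLINK performs.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Tuple


class PedRecord(NamedTuple):
    fid: str  # family ID
    iid: str  # individual ID
    pat: str  # father's IID, "0" if unknown
    mat: str  # mother's IID, "0" if unknown
    sex: str  # "1" = male, "2" = female, anything else = unknown


def basic_pedigree_errors(records: List[PedRecord]) -> List[str]:
    seen = Counter((r.fid, r.iid) for r in records)
    problems = [f"{f}/{i}: duplicate individual ID" for (f, i), n in seen.items() if n > 1]
    by_id: Dict[Tuple[str, str], PedRecord] = {(r.fid, r.iid): r for r in records}
    for r in records:
        for role, parent_id, expected_sex in (("father", r.pat, "1"), ("mother", r.mat, "2")):
            if parent_id == "0":
                continue  # unknown parent: nothing to check
            if parent_id == r.iid:
                problems.append(f"{r.fid}/{r.iid}: listed as own {role}")
                continue
            parent = by_id.get((r.fid, parent_id))
            if parent is None:
                problems.append(f"{r.fid}/{r.iid}: {role} {parent_id} not in family")
            elif parent.sex != expected_sex:
                # Also flags parents whose sex is coded as unknown.
                problems.append(f"{r.fid}/{r.iid}: {role} {parent_id} has sex code {parent.sex}")
    return problems


# Hypothetical pedigree with three deliberate errors.
records = [
    PedRecord("F1", "1", "0", "0", "1"),
    PedRecord("F1", "2", "0", "0", "2"),
    PedRecord("F1", "3", "1", "2", "1"),  # consistent
    PedRecord("F1", "4", "2", "1", "2"),  # father/mother swapped
    PedRecord("F1", "5", "9", "0", "1"),  # father not in the family
]
for problem in basic_pedigree_errors(records):
    print(problem)
```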