PLINK: Whole genome data analysis toolset
Inclusion thresholds
This section describes options that can be used to filter out individuals or SNPs on the basis of the summary statistics described on the previous summary statistics page.

Allele frequency

Subsequent analyses can be set to automatically exclude SNPs on the basis of MAF (minor allele frequency):

plink --file mydata --maf 0.05

means only include SNPs with MAF >= 0.05. The default value is 0.02.
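
To make the threshold concrete, the following Python sketch computes the minor allele frequency for one SNP and applies the same >= comparison. It is an illustration only, not PLINK's implementation: the function names, the representation of genotypes as allele pairs, the "0" missing-allele code and the restriction to biallelic SNPs are all assumptions made for this example.

```python
from collections import Counter

MISSING = "0"  # assumed missing-allele code for this sketch


def minor_allele_frequency(genotypes):
    """Return the MAF for one biallelic SNP given (allele1, allele2) pairs.

    Missing alleles are ignored. A SNP with fewer than two observed
    allele types is treated as monomorphic (MAF 0.0).
    """
    counts = Counter(
        allele
        for pair in genotypes
        for allele in pair
        if allele != MISSING
    )
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0
    return min(counts.values()) / total


def passes_maf(genotypes, threshold=0.05):
    """Keep the SNP only if its MAF is at least the threshold."""
    return minor_allele_frequency(genotypes) >= threshold
```

For example, a SNP with one minor allele among 40 observed alleles has MAF 0.025, so it would be dropped at a threshold of 0.05 but kept at 0.02.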


Missing rate per SNP

Subsequent analyses can be set to automatically exclude SNPs on the basis of their missing genotype rate:

plink --file mydata --geno 1

means include all SNPs irrespective of their missing rate. The default value is 0.1 (i.e. exclude SNPs with more than 10% missing genotypes).
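
As a rough sketch of what this threshold measures, the Python below computes the fraction of individuals with a missing genotype at a single SNP and keeps the SNP unless that fraction exceeds the threshold. The function names and the "0" missing code are assumptions for the example, and this is not PLINK's own code.

```python
def snp_missing_rate(genotypes, missing="0"):
    """Fraction of individuals with a missing genotype at one SNP.

    `genotypes` is a list of (allele1, allele2) pairs, one per individual.
    """
    if not genotypes:
        return 0.0
    n_missing = sum(1 for a1, a2 in genotypes if missing in (a1, a2))
    return n_missing / len(genotypes)


def passes_geno(genotypes, threshold=0.1):
    """Keep the SNP unless its missing rate exceeds the threshold."""
    return snp_missing_rate(genotypes) <= threshold
```

With a threshold of 1, the missing rate can never exceed the threshold, so every SNP passes.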


Missing rate per person

Subsequent analyses can be set to automatically exclude individuals on the basis of their missing genotype rate:

plink --file mydata --mind 0.1

means exclude individuals with more than 10% missing genotypes (this is the default value). A line in the terminal output will indicate how many individuals were removed due to low genotyping. If any individuals were removed, a file called plink.irem will be created, listing the Family and Individual IDs of the removed individuals. Any subsequent analyses will therefore be performed without these individuals.
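
The per-person filter can be pictured as a partition of the sample into kept and removed individuals. The Python sketch below is illustrative only and does not reproduce PLINK's code; the function name, the dictionary keyed by (family ID, individual ID) and the "0" missing code are assumptions for this example.

```python
def split_by_missingness(people, threshold=0.1, missing="0"):
    """Partition individuals by their per-person missing genotype rate.

    `people` maps (family_id, individual_id) to a list of
    (allele1, allele2) pairs, one pair per SNP.
    Returns (kept_ids, removed_ids).
    """
    kept, removed = [], []
    for ids, genotypes in people.items():
        n_missing = sum(1 for a1, a2 in genotypes if missing in (a1, a2))
        rate = n_missing / len(genotypes) if genotypes else 0.0
        # Only individuals strictly above the threshold are removed.
        (removed if rate > threshold else kept).append(ids)
    return kept, removed
```

The removed list plays the role of the ID pairs that would be reported for excluded individuals.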

One might instead wish to create a new PED file with these individuals removed. This is done in two steps:

plink --file data --mind 0.2 --out data

plink --file data --remove data.irem --out cleaned

will generate files
     cleaned-extract.ped
     cleaned-extract.map
with all individuals with more than 20% missing genotypes removed.
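
Conceptually, the second step drops every PED line whose first two columns (Family ID and Individual ID) appear in the removal list. The Python sketch below illustrates that idea on lists of text lines. It is not how PLINK performs the removal, the function names are hypothetical, and it assumes the removal list holds one whitespace-separated Family ID and Individual ID pair per line.

```python
def read_removal_ids(irem_lines):
    """Parse lines of 'FamilyID IndividualID' into a set of ID pairs."""
    ids = set()
    for line in irem_lines:
        fields = line.split()
        if len(fields) >= 2:
            ids.add((fields[0], fields[1]))
    return ids


def drop_individuals(ped_lines, removal_ids):
    """Return the PED lines whose first two columns are not in removal_ids."""
    kept = []
    for line in ped_lines:
        fields = line.split()
        if len(fields) >= 2 and (fields[0], fields[1]) in removal_ids:
            continue  # this individual is on the removal list
        kept.append(line)
    return kept
```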


TODO Hardy-Weinberg disequilibrium


TODO Mendel error rate
