PLINK: Whole genome data analysis toolset
Inclusion thresholds
This section describes options that can be used to filter out
individuals or SNPs on the basis of the summary statistic measures
described on the previous summary statistics page.
Allele frequency
Subsequent analyses can be set to automatically exclude SNPs on the
basis of MAF (minor allele frequency):
plink --file mydata --maf 0.05
means only include SNPs with MAF >= 0.05. The default value is 0.02.
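To make the threshold concrete, here is a minimal Python sketch of an MAF filter. It is not PLINK's implementation: the function names are hypothetical, the SNP is assumed to be biallelic, and missing alleles are assumed to be coded as "0".

```python
MISSING = "0"  # assumption: a missing allele is coded as "0"


def minor_allele_frequency(genotypes):
    """Return the minor allele frequency for one biallelic SNP.

    `genotypes` is a list of (allele1, allele2) pairs, one per individual.
    Genotypes with a missing allele are skipped. Returns None if no
    genotype was called.
    """
    counts = {}
    for a1, a2 in genotypes:
        if MISSING in (a1, a2):
            continue
        for allele in (a1, a2):
            counts[allele] = counts.get(allele, 0) + 1
    total = sum(counts.values())
    if total == 0:
        return None
    if len(counts) < 2:
        return 0.0  # monomorphic: the minor allele was never observed
    return min(counts.values()) / total


def passes_maf(genotypes, threshold):
    """True if the SNP would be kept, i.e. its MAF >= threshold."""
    maf = minor_allele_frequency(genotypes)
    return maf is not None and maf >= threshold


# Nine A/A individuals, one A/G individual, one missing genotype.
snp = [("A", "A")] * 9 + [("A", "G"), ("0", "0")]
print(minor_allele_frequency(snp))
print(passes_maf(snp, 0.05))
```

The comparison is inclusive, matching the "MAF >= 0.05" wording above.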
Missing rate per SNP
Subsequent analyses can be set to automatically exclude SNPs on the
basis of missing genotype rate:
plink --file mydata --geno 1
means include all SNPs irrespective of their missing rate. The
default value is 0.1 (i.e. exclude SNPs with more than 10% missing
genotypes).
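The per-SNP missing rate can be sketched the same way. This is an illustration only, not PLINK's code: the function names are hypothetical, and a genotype is assumed to count as missing when either allele is coded as "0".

```python
MISSING = "0"  # assumption: a missing allele is coded as "0"


def snp_missing_rate(genotypes):
    """Fraction of individuals with a missing genotype at one SNP."""
    if not genotypes:
        return 0.0
    missing = sum(1 for a1, a2 in genotypes if MISSING in (a1, a2))
    return missing / len(genotypes)


def passes_geno(genotypes, threshold):
    """True if the SNP would be kept, i.e. its missing rate <= threshold."""
    return snp_missing_rate(genotypes) <= threshold


# One of four genotypes is missing, so the missing rate is 0.25.
snp = [("A", "A"), ("0", "0"), ("A", "G"), ("G", "G")]
print(snp_missing_rate(snp))
print(passes_geno(snp, 0.1))  # excluded at the 10% threshold
print(passes_geno(snp, 1))    # a threshold of 1 keeps every SNP
```

Because a missing rate can never exceed 1, a threshold of 1 excludes nothing, which is why the command above keeps all SNPs.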
Missing rate per person
Subsequent analyses can be set to automatically exclude individuals on
the basis of missing genotype rate:
plink --file mydata --mind 0.1
means exclude individuals with more than 10% missing genotypes (this is
the default value). A line in the terminal output will appear, indicating
how many individuals were removed due to low genotyping. If any
individuals were removed, a file called plink.irem will be created,
listing the Family and Individual IDs of these removed individuals. Any
subsequent analyses will therefore be performed without these individuals.
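As a rough illustration of the per-person filter, the following Python sketch returns the (Family ID, Individual ID) pairs that would end up in the list of removed individuals. The data layout and function name are assumptions made for this example, and missing alleles are assumed to be coded as "0".

```python
MISSING = "0"  # assumption: a missing allele is coded as "0"


def individuals_to_remove(samples, threshold):
    """Return the IDs of individuals whose missing rate exceeds threshold.

    `samples` maps (family_id, individual_id) to a list of
    (allele1, allele2) pairs, one pair per SNP.
    """
    removed = []
    for ids, genotypes in samples.items():
        if not genotypes:
            continue  # no SNPs to assess for this individual
        missing = sum(1 for a1, a2 in genotypes if MISSING in (a1, a2))
        if missing / len(genotypes) > threshold:
            removed.append(ids)
    return removed


samples = {
    ("F1", "1"): [("A", "A"), ("A", "G"), ("G", "G"), ("A", "A")],
    ("F2", "1"): [("0", "0"), ("A", "G"), ("0", "0"), ("A", "A")],
}
# Individual F2/1 is missing 2 of 4 genotypes (50%).
for family_id, individual_id in individuals_to_remove(samples, 0.1):
    print(family_id, individual_id)
```

The comparison is strict, following the "more than 10%" wording: an individual exactly at the threshold is kept.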
One might instead wish to create a new PED file with these individuals
removed. This is done in two steps:
plink --file data --mind 0.2 --out data
plink --file data --remove data.irem --out cleaned
will generate files
cleaned-extract.ped
cleaned-extract.map
with all individuals with more than 20% missing genotypes removed.
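The effect of the second step can be sketched in Python as follows. This is only an illustration of the idea, not how PLINK does it. It assumes the removal list holds one whitespace-separated Family ID and Individual ID per line, and that each PED line starts with those same two fields. The function names are hypothetical.

```python
def read_remove_list(lines):
    """Collect (family_id, individual_id) pairs from removal-list lines."""
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            ids.add((fields[0], fields[1]))
    return ids


def drop_individuals(ped_lines, remove_ids):
    """Keep only the PED lines whose first two fields are not listed."""
    kept = []
    for line in ped_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if (fields[0], fields[1]) not in remove_ids:
            kept.append(line)
    return kept


irem_lines = ["F2 1"]
ped_lines = [
    "F1 1 0 0 1 2 A A A G",
    "F2 1 0 0 2 1 0 0 A G",
]
cleaned = drop_individuals(ped_lines, read_remove_list(irem_lines))
for line in cleaned:
    print(line)
```

Matching on both IDs matters because an Individual ID is only guaranteed to be unique within its family.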
TODO: Hardy-Weinberg disequilibrium
TODO: Mendel error rate