A basic QC pipeline
We assume whole-genome SNP data for a population-based case/control study. The basic QC steps are:
- Exclude known bad SNPs with multiple hits, monomorphic SNPs, and SNPs with < 90% genotyping rate
- Sex check
- Per-individual heterozygosity
- Individuals with call rate < 95%
- Close relatives, duplications
- Population stratification outliers
- Create homogeneous group
- Genome-wide IBD calculations
- Contamination (too much low-level relatedness)
- Exclude SNPs with genotyping rate < 95%
- Exclude SNPs with HWD failure p < 1e-6
- Exclude SNPs with MAF < 0.01
- Exclude SNPs with CHI-MISSING p < 1e-3
- Exclude SNPs with MISHAP p < 1e-10
- Exclude SNPs with PLATE-ASSOCIATION p < 1e-10 for at least one plate
- Perform final cluster solution(s)
Notes: The commands assume a Unix-based system; refer to the main PLINK website for more detailed explanations of the commands
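The three threshold-based SNP filters in the list above (genotyping rate < 95%, HWD p < 1e-6, MAF < 0.01) correspond to PLINK's --geno, --hwe and --maf options. The following is a minimal sketch, assuming the three filters are applied in a single pass rather than as separate steps; the function name run_snp_filters, the PLINK variable and the fileset names are illustrative and not part of the original pipeline.

```shell
# Path to the PLINK binary; override with e.g. PLINK=/path/to/plink
PLINK=${PLINK:-plink}

# Apply the threshold-based SNP filters to binary fileset $1 and write
# the filtered binary fileset to $2 (both names are placeholders).
#   --geno 0.05  drops SNPs with more than 5% missing genotypes
#   --hwe 1e-6   drops SNPs failing the Hardy-Weinberg test at p < 1e-6
#   --maf 0.01   drops SNPs with minor allele frequency below 0.01
run_snp_filters() {
    "$PLINK" --bfile "$1" \
        --geno 0.05 \
        --hwe 1e-6 \
        --maf 0.01 \
        --make-bed \
        --out "$2"
}
```

For example, run_snp_filters data_in data_out, where both names stand for whichever filesets the earlier steps produced.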
Bad SNPs
Let's assume you have a list of bad SNPs in the file remove.snp.list and wish to remove them,
along with monomorphic SNPs. This is also the stage at which to exclude any known failing SNPs
(genotyping rate below 90%, for example) and invalid samples (individuals who need to be excluded
from the study for other reasons). The list of these individuals is in remove.individuals.list
plink --bfile data0 \
      --remove remove.individuals.list \
      --exclude remove.snp.list \
      --maf 1e-50 \
      --geno 0.1 \
      --make-bed \
      --out data1
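A quick sanity check after this step is to compare how many SNPs and individuals remain. In a binary fileset the .bim file has one line per SNP and the .fam file has one line per individual, so line counts are enough. A minimal sketch; the helper name count_fileset is illustrative and not a PLINK feature.

```shell
# Report the number of SNPs (.bim lines) and individuals (.fam lines)
# in the binary fileset whose prefix is given as $1.
count_fileset() {
    prefix=$1
    # tr strips the padding that some wc implementations add
    snps=$(wc -l < "$prefix.bim" | tr -d ' ')
    inds=$(wc -l < "$prefix.fam" | tr -d ' ')
    echo "$prefix: $snps SNPs, $inds individuals"
}
```

Running count_fileset data0 and then count_fileset data1 shows how many SNPs and individuals the filters removed.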
Incorrect sex assignment?
To check that all individuals are of the sex they are reported to be, we run a
check based on X chromosome SNP heterozygosity:
plink --bfile data1 \
      --check-sex \
      --out data1-sexcheck
Hint: If the data are generated on two chips per sample, it is worth running this
step separately for each chip and ensuring concordance between SNP-based sex calls.
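The check writes its results to data1-sexcheck.sexcheck, with columns FID, IID, PEDSEX, SNPSEX, STATUS and F; STATUS is PROBLEM when the reported and SNP-based sex disagree or cannot be resolved. A minimal sketch for pulling out the flagged individuals, assuming that column order; the helper name list_sex_problems and the output file name in the usage line are illustrative.

```shell
# Print "FID IID" for every individual flagged PROBLEM in a .sexcheck
# file ($1). The first line is the header and is skipped.
list_sex_problems() {
    awk 'NR > 1 && $5 == "PROBLEM" { print $1, $2 }' "$1"
}
```

For example, list_sex_problems data1-sexcheck.sexcheck > sexcheck-fail.list gives a two-column list in the format that --remove accepts, should you decide to drop these individuals after reviewing them.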
This document last modified Wednesday, 25-Jan-2017 11:39:26 EST