PLINK: Whole genome data analysis toolset

WGAS SOP

A Broad/MGH suite of scripts and resources for whole genome association analysis

A basic QC pipeline

We assume whole-genome SNP data for a population-based case/control study. The basic QC steps are:
  1. Exclude known bad SNPs with multiple hits, monomorphic SNPs and SNPs with < 90% genotyping rate
  2. Sex check
  3. Per-individual heterozygosity
  4. Individuals with call rate < 95%

  5. Close relatives, duplications
  6. Population stratification outliers
  7. Create homogeneous group
  8. Genome-wide IBD calculations
  9. Contamination (too much low-level relatedness)

  10. Exclude SNPs with genotyping rate < 95%
  11. Exclude SNPs with HWD failure p < 1e-6
  12. Exclude SNPs with MAF < 0.01

  13. Exclude SNPs with CHI-MISSING p < 1e-3
  14. Exclude SNPs with MISHAP p < 1e-10
  15. Exclude SNPs with PLATE-ASSOCIATION p < 1e-10 for at least one plate
  16. Perform final cluster solution(s)

Notes: Unix-based; refer to the main PLINK website for a more detailed explanation of the commands.

Bad SNPs

Let's assume you have a list of bad SNPs in the file remove.snp.list and wish to remove them, along with monomorphic SNPs. This is also the stage at which to exclude any known failing SNPs (genotyping rate below 90%, for example) and invalid samples (individuals who need to be excluded from the study for other reasons). The list of such individuals is in remove.individuals.list. In the command below, --maf 1e-50 drops monomorphic SNPs (those with a minor allele frequency of zero) and --geno 0.1 drops SNPs with more than 10% missing genotypes.

plink --bfile data0 \
      --remove remove.individuals.list \
      --exclude remove.snp.list \
      --maf 1e-50 \
      --geno 0.1 \
      --make-bed \
      --out data1
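
To see what this step removed, it can help to compare the number of SNPs and individuals before and after filtering. The following is a minimal sketch, not part of PLINK itself: it relies only on the fact that a binary fileset's .bim file has one line per SNP and its .fam file has one line per individual. The helper names count_lines and report_counts are made up for illustration.

```shell
# Hypothetical helpers (not PLINK commands): count rows in a PLINK binary fileset.

# Print the number of lines in a file, without surrounding whitespace.
count_lines() {
    wc -l < "$1" | tr -d ' '
}

# Print SNP and individual counts for a fileset prefix (e.g. data1).
# .bim holds one SNP per line; .fam holds one individual per line.
report_counts() {
    prefix=$1
    echo "$prefix: $(count_lines "$prefix.bim") SNPs, $(count_lines "$prefix.fam") individuals"
}

# Compare the filesets before and after filtering, if they are present.
for p in data0 data1; do
    if [ -f "$p.bim" ] && [ -f "$p.fam" ]; then
        report_counts "$p"
    fi
done
```

The difference between the two lines of output is the number of SNPs and individuals dropped by the exclusion lists and the --maf and --geno filters combined.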

Incorrect sex assignment?

To check that all individuals are of the sex they are reported to be, we run a check based on X chromosome SNP heterozygosity:

plink --bfile data1 \
      --check-sex \
      --out data1-sexcheck

Hint: If the data are generated on two chips per sample, it is worth running this step separately for each chip and ensuring concordance between SNP-based sex calls.
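
The check writes its results to data1-sexcheck.sexcheck, with one row per individual and a STATUS column reading OK or PROBLEM. A minimal sketch for pulling out the flagged individuals follows. It assumes the usual column order (FID, IID, PEDSEX, SNPSEX, STATUS, F); the function name list_sex_problems and the output file sex-problems.list are hypothetical.

```shell
# Hypothetical helper (not a PLINK command): list individuals whose reported
# sex disagrees with the SNP-based call.
# Assumes .sexcheck columns in this order: FID IID PEDSEX SNPSEX STATUS F
list_sex_problems() {
    # Skip the header row, then print FID and IID for every PROBLEM row.
    awk 'NR > 1 && $5 == "PROBLEM" { print $1, $2 }' "$1"
}

# Write the flagged individuals to a file, if the check has been run.
if [ -f data1-sexcheck.sexcheck ]; then
    list_sex_problems data1-sexcheck.sexcheck > sex-problems.list
fi
```

The resulting list has the same two-column FID/IID form that --remove reads, should you decide to exclude these individuals.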

 

This document last modified Wednesday, 25-Jan-2017 11:39:26 EST