PLINK: Whole genome data analysis toolset
Last original PLINK release is v1.07 (10-Oct-2009); PLINK 1.9 is now available for beta-testing

Whole genome association analysis toolset


Order of major operations in PLINK

This section contains a rough flow-chart of some of the main operations in PLINK. In particular, it indicates the order in which certain operations are performed (e.g. whether SNPs are excluded before or after files are merged) and the points at which PLINK halts, e.g. after certain commands. As a consequence, some combinations of commands cannot be performed in a single run.

Most of these steps are optional, i.e. they occur only if the corresponding command has been given on the command line.

  • Parse the command line for commands and options
  • Check version; report unused options and warnings
  • Define the chromosome set (human by default, or --mouse, --rice, etc.)

  • Run ID-helper utility (--id-dict and --id-match), then QUIT

  • Run SNP annotation web-lookup (--lookup and --lookup-gene), then QUIT

  • Run compression/decompression utility (--compress and --decompress), then QUIT

  • Read input, either:
    • Dummy dataset (--dummy), or
    • Simulated dataset (--simulate), or
    • Result files for meta-analysis (--meta-analysis), or
    • Result files for gene-based report (--gene-report), or
    • Result and annotation files (--annotate), or
    • Maps for CNVs (--cfile, --cnv-list), or
    • Binary fileset (--bfile), or
    • PED fileset (--file), or
    • LGEN fileset (--lfile), or
    • Transposed fileset (--tfile), or
    • Maps for generic variants (--gfile), or
    • Map and dosage files (--dosage)

  • For commands that do not work directly on basic SNP or CNV data (e.g. --meta-analysis, --annotate, --dosage, --gene-report), call the corresponding function directly, then QUIT

  • At this stage, the following filters are applied as the data are loaded (note: some other filters not listed below, e.g. --snps, --extract, --remove and --filter-males, are applied later):
    • --chr
    • --snp, --window
    • --from, --to
    • --from-kb, --to-kb, etc
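The load-time filters above can be pictured with a minimal Python sketch. It is a hypothetical model, not PLINK code: the `MapEntry` record and `load_filter` function are illustrative names, and treating the kb window as inclusive, with 1 kb = 1,000 bp, is an assumption. It shows the practical consequence of filtering at load time: SNPs dropped here are never seen by any later step.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MapEntry:
    chrom: str  # chromosome code, e.g. "1" or "X"
    snp: str    # SNP identifier
    bp: int     # base-pair position


def load_filter(entries: List[MapEntry], chrom: Optional[str] = None,
                from_kb: Optional[float] = None,
                to_kb: Optional[float] = None) -> List[MapEntry]:
    """Keep map entries passing --chr / --from-kb / --to-kb style filters.

    Hypothetical model: the window is inclusive and 1 kb = 1,000 bp.
    """
    kept = []
    for e in entries:
        if chrom is not None and e.chrom != chrom:
            continue  # wrong chromosome (--chr)
        if from_kb is not None and e.bp < from_kb * 1000:
            continue  # before the window start (--from-kb)
        if to_kb is not None and e.bp > to_kb * 1000:
            continue  # past the window end (--to-kb)
        kept.append(e)
    return kept


snps = [
    MapEntry("1", "rs1", 1_500_000),
    MapEntry("1", "rs2", 2_500_000),
    MapEntry("2", "rs3", 1_800_000),
]
print([e.snp for e in load_filter(snps, chrom="1", from_kb=1000, to_kb=2000)])
# prints ['rs1']
```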

  • Check for duplicate individual or SNP names
  • Merge one or more filesets (--merge, --bmerge, --merge-list)
  • Swap in alternate phenotype file (--pheno), or make a new phenotype (--make-pheno)
  • Remove individuals with missing phenotypes (--prune)
  • Update SNP information (--update-map)
  • Update FAM information (--update-ids, --update-sex, ...)
  • Update allele information (--update-alleles)
  • Flip strand (--flip)
  • Recode alleles 1234/ACGT (--alleleACGT, --allele1234)
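As a rough illustration of the last two steps, the following sketch complements alleles for a flipped SNP and then applies the A=1, C=2, G=3, T=4 recoding associated with --allele1234, in the order given above. The helper names `flip`, `to_1234` and `prepare_alleles` are hypothetical, this is not PLINK code, and passing any other allele code (such as 0 for missing) through unchanged is an assumption of the sketch.

```python
from typing import Tuple

# Complementary bases used for a strand flip.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Assumed --allele1234 coding: A=1, C=2, G=3, T=4.
ACGT_TO_1234 = {"A": "1", "C": "2", "G": "3", "T": "4"}


def flip(allele: str) -> str:
    """Return the complementary base; other codes (e.g. '0') pass through."""
    return COMPLEMENT.get(allele, allele)


def to_1234(allele: str) -> str:
    """Recode A/C/G/T as 1/2/3/4; other codes pass through unchanged."""
    return ACGT_TO_1234.get(allele, allele)


def prepare_alleles(genotype: Tuple[str, str], flip_snp: bool) -> Tuple[str, str]:
    """Flip first (if requested), then recode, following the flow-chart order."""
    a1, a2 = genotype
    if flip_snp:
        a1, a2 = flip(a1), flip(a2)
    return to_1234(a1), to_1234(a2)


print(prepare_alleles(("A", "G"), flip_snp=True))   # ('4', '2')
print(prepare_alleles(("A", "G"), flip_snp=False))  # ('1', '3')
print(prepare_alleles(("0", "0"), flip_snp=True))   # ('0', '0')
```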

  • If --exclude-before-extract is specified:
    • extract any SNPs (--extract)
    • then exclude any SNPs (--exclude)
  • Otherwise:
    • exclude any SNPs (--exclude)
    • then extract any SNPs (--extract)
  • If --keep-before-remove is specified:
    • keep any individuals (--keep)
    • then remove any individuals (--remove)
  • Otherwise:
    • remove any individuals (--remove)
    • then keep any individuals (--keep)

  • Filter SNPs based on attributes (--attrib)
  • Filter individuals based on attributes (--attrib-indiv)
  • Filter SNPs based on quality scores (--qual-scores)
  • Filter genotypes based on quality scores (--qual-geno-scores)
  • Random thinning of SNPs (--thin)
  • Read --genome-lists
  • Read list of obligatory missing genotypes (--oblig-missing)
  • Filter based on a variable (--filter)
  • Filter based on sex, phenotype, etc (--filter-males, --filter-cases, ...)
  • Read covariate file (--covar)
  • Read cluster file (--within)
  • Zero-out specific genotypes (--zero-cluster)

  • Process rare CNV data
    • Read CNV list, map to genomic positions
    • Filter on genes, sizes, types, etc (--cnv-intersect, --cnv-del, --cnv-kb, etc)
    • Write back any genes, regions intersected (--cnv-report-regions)
    • Filter CNVs based on frequency (--cnv-freq-exclude-above, etc)
    • Report basic count of CNVs in LOG file
    • Write a new CNV list, map file (--cnv-write, --cnv-make-map)
    • Calculate per-individual CNV summary statistics
    • Calculate per-position CNV summaries
    • Make summary displays (--cnv-track, --cnv-seglist)
    • Find overlapping CNVs as pools (--segment-group)
    • Perform association / genome-wide burden test (--mperm, --cnv-indiv-perm)
    • QUIT
  • Process generic variant data (--gfile)
    • Read GVAR data (possibly on top of an existing, standard fileset)
    • Calculate frequency statistics for each allele/CNP state
    • Perform linear/logistic regression of phenotype on CNP states
    • QUIT

  • Main SNP filters
    • Count founders and nonfounders
    • Calculate per-individual genotyping rate; remove individuals whose missing rate exceeds the threshold (--missing, --mind)
    • Calculate allele frequencies (or read them from a file, --read-freq)
    • Determine the per-SNP missing genotype rate (after individuals have been removed); exclude SNPs whose missing rate exceeds the threshold (--geno)
    • Determine minor (reference) allele
    • List heterozygous haploid genotypes found; by default these are set to missing
    • List SNPs with no founder genotypes observed
    • Write allele frequencies to file (--freq)
    • Calculate HWE statistics per SNP (--hardy, --hwe); QUIT after --hardy
    • Report genotyping rate per SNP and per individual as calculated above (--missing)
    • Remove SNPs below the MAF filter (--maf)
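The order of the main SNP filters matters because the per-SNP missing rate for --geno is computed only after --mind has removed individuals. The Python sketch below is a simplified, hypothetical model of that sequence, not PLINK's implementation: the function names, the 0/1/2 allele-count encoding and the exact comparison operators are assumptions, and it ignores founder status, Hardy-Weinberg filtering and the other steps in the list.

```python
from typing import List, Optional, Tuple

Genotype = Optional[int]  # copies of allele 1 (0, 1 or 2); None = missing


def missing_rate(values: List[Genotype]) -> float:
    """Fraction of missing calls (0.0 for an empty list)."""
    return sum(v is None for v in values) / len(values) if values else 0.0


def minor_allele_freq(values: List[Genotype]) -> float:
    """Minor allele frequency from non-missing calls (0.0 if none are called)."""
    called = [v for v in values if v is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def basic_qc(geno: List[List[Genotype]], mind: float, max_geno: float,
             min_maf: float) -> Tuple[List[int], List[int]]:
    """Return (kept individual indices, kept SNP indices)."""
    n_snps = len(geno[0]) if geno else 0
    # 1. --mind: drop individuals whose missing rate exceeds `mind`.
    people = [i for i, row in enumerate(geno) if missing_rate(row) <= mind]
    # 2. --geno: per-SNP missing rate, measured on the remaining individuals only.
    snps = [j for j in range(n_snps)
            if missing_rate([geno[i][j] for i in people]) <= max_geno]
    # 3. --maf: drop SNPs whose minor allele frequency is below `min_maf`.
    snps = [j for j in snps
            if minor_allele_freq([geno[i][j] for i in people]) >= min_maf]
    return people, snps


geno = [
    [0, 1, None],
    [None, None, None],  # individual with no genotype calls
    [1, 0, 2],
    [2, 0, 1],
]
print(basic_qc(geno, mind=0.5, max_geno=0.2, min_maf=0.2))  # ([0, 2, 3], [0])
print(basic_qc(geno, mind=1.0, max_geno=0.2, min_maf=0.2))  # ([0, 1, 2, 3], [])
```

In the first call the completely missing second individual is removed before SNP missingness is measured, so SNP column 2 is lost to --geno and column 1 to --maf while column 0 survives. In the second call every individual is kept, and each SNP then exceeds the 0.2 missing-rate threshold, so nothing survives.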

  • Re-report basic case/control counts to LOG
  • Re-specify reference alleles (--reference-allele)
  • Make family units, if needed; perform Mendel checks (--mendel, --me, --tdt, etc.)
  • Reset paternal and maternal IDs of non-founders whose parents are not present (--make-founders)
  • Perform sex-check (--check-sex)
  • Create pseudo case/control units from trio data (--tucc)
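The --make-founders step can be sketched as follows. This is a hypothetical Python model, not PLINK code: `FamRow` and `make_founders` are illustrative names, and checking each parent independently (so that a present father is kept even when the mother is absent) is an assumption; PLINK's exact rule when only one parent is missing may differ.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class FamRow:
    fid: str  # family ID
    iid: str  # individual ID
    pat: str  # paternal ID ("0" = unknown)
    mat: str  # maternal ID ("0" = unknown)


def make_founders(rows: List[FamRow]) -> List[FamRow]:
    """Set parental IDs to "0" when that parent is not in the dataset.

    Assumption: each parent is checked independently.
    """
    present = {(r.fid, r.iid) for r in rows}
    out = []
    for r in rows:
        pat = r.pat if (r.fid, r.pat) in present else "0"
        mat = r.mat if (r.fid, r.mat) in present else "0"
        out.append(replace(r, pat=pat, mat=mat))
    return out


fam = [
    FamRow("F1", "P1", "0", "0"),    # father, a founder
    FamRow("F1", "C1", "P1", "M1"),  # child; mother M1 is absent
]
for row in make_founders(fam):
    print(row.iid, row.pat, row.mat)
# P1 0 0
# C1 P1 0
```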

  • Write permuted phenotype file (--make-perm-pheno), then QUIT
  • Write table of SNP/set scoring (--set-table), then QUIT
  • Write covariate file (--write-covar), then QUIT
  • Write cluster file (--write-cluster), then QUIT
  • Write snplist file (--write-snplist), then QUIT
  • Write binary fileset (--make-bed), then QUIT
  • Write other file formats for genotype data (--recode, --recodeA, --list, --two-locus, etc), then QUIT
  • Create and output a SET file given ranges (--make-set), then QUIT

  • LD-based clumping of association results (--clump), then QUIT
  • Generate lists of SNPs tagging other SNPs (--show-tags), then QUIT
  • Generate haplotype blocks (--blocks), then QUIT

  • Determine whether conditioning SNPs are used (--condition)
  • Perform IBS, cluster analysis and MDS analysis (--cluster, --mds-plot, --neighbour), then QUIT
  • Test for differences in IBS between groups (--ibs-test), then QUIT
  • Calculate genome-wide IBS and IBD (--genome), then QUIT
  • Calculate F inbreeding statistic (--het)
  • Calculate runs of homozygosity (--homozyg), then QUIT
  • Perform LD-based pruning of SNPs (--indep, --indep-pairwise), then QUIT
  • Perform LD-based scan for strand flips (--flipscan), then QUIT
  • Calculate and display pairwise LD (--r2, --ld), then QUIT
  • General haplotype estimation (association, phase reports, frequencies; --hap)
    • Phasing
    • Report haplotype frequencies
    • Report haplotype phases
    • Perform mis-hap test for non-random missingness
    • Proxy association and imputation
    • QUIT

  • SNP-by-SNP epistasis tests (--epistasis), then QUIT
  • Score per-individual risk profiles (--score), then QUIT
  • Run R-plugin on dataset (--R), then QUIT

  • For main association tests, loop over all phenotypes (--all-pheno)
    • Perform association test (--mh, --model, --assoc, --fisher, --linear, --logistic, --homog, --qfam, --tdt, --poo, --dfam, --gxe, etc.)
    • Perform haplotype association test (--hap-assoc, --hap-tdt)
    • Perform conditional haplotype test (--chap), then QUIT
    • Perform test of missingness by case/control status (--test-missing)
    • If specified, repeat the above tests with permuted datasets
    • Go to next phenotype

  • Perform PLINK segmental sharing test

  • Definitely QUIT
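To make the QUIT behaviour concrete, the following toy model walks a handful of the steps listed above in flow-chart order and stops at the first requested command that ends the run. It is hypothetical and not PLINK's dispatch code: `PIPELINE` and `run_plan` are illustrative names, and only a small subset of commands is encoded. It reproduces why, according to this chart, --make-bed and --assoc cannot be combined in one run: the fileset is written and PLINK quits before the association step is reached.

```python
from typing import Iterable, List, Tuple

# A small, partial subset of the flow chart: (flag, does PLINK QUIT after it?).
# Order and QUIT flags are taken from the list above; everything else is omitted.
PIPELINE: List[Tuple[str, bool]] = [
    ("--freq", False),
    ("--hardy", True),
    ("--make-bed", True),
    ("--genome", True),
    ("--het", False),
    ("--homozyg", True),
    ("--assoc", False),
]


def run_plan(flags: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (steps that would run, requested steps never reached).

    Flags outside this toy subset are also listed as never reached.
    """
    requested = list(flags)
    ran = []
    for flag, quits in PIPELINE:
        if flag in requested:
            ran.append(flag)
            if quits:
                break  # the flow chart stops here
    never_reached = [f for f in requested if f not in ran]
    return ran, never_reached


print(run_plan(["--make-bed", "--assoc"]))       # (['--make-bed'], ['--assoc'])
print(run_plan(["--het", "--genome"]))           # (['--genome'], ['--het'])
print(run_plan(["--freq", "--het", "--assoc"]))  # (['--freq', '--het', '--assoc'], [])
```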
 

This document last modified Wednesday, 25-Jan-2017 11:39:26 EST