PLINK: Whole genome data analysis toolset
Last original PLINK release is v1.07 (10-Oct-2009); PLINK 1.9 is now available for beta-testing

Whole genome association analysis toolset

Changelog

This page contains a version history recording changes and additions to PLINK.

V1.07 (10-Oct-2009)

  • Added --meta-analysis function
  • Added --annotate function
  • Added --dosage, --write-dosage, etc
  • Added --Z-genome and ability to read compressed .genome.gz
  • Added --simulate-qt to simulate quantitative trait data
  • Added --recode-rlist and --recode-lgen/--with-reference
  • Added preliminary ZLIB support for reading and writing compressed files directly
  • Added --standard-beta for --linear
  • Added --counts to modify --assoc output
  • Added dosage-based score analysis
     
  • Changed how options are passed to commands
  • Changed order of --reference-allele and --freq, so that the reference is now fixed prior to calculating and reporting allele frequencies
  • Improved speed of binary file merging
  • MAF-based SNP removal for --indep-pairwise option
     
  • Fixed bug in --blocks routine
  • Fixed bug in --adjust with quantitative trait --assoc
  • Fixed bug in --fisher and --adjust
  • Fixed bug in --set-test with selection of SNPs in LE

V1.06 (24-Apr-2009)

  • Added --show-tags and modifiers, including --list-all
  • Added --blocks command
  • Added --hap-logistic and --hap-linear (with support for covariates and permutation)
  • Added --hap-omnibus that modifies --hap-linear and --hap-logistic
  • Added multiple sizes for sliding window (--hap-window 1,2,3)
  • Fixed issue with D' calculation in --ld command
  • Added --inter-chr option
  • Now enable --ci to work with --chap

  • Added --id-dict, --id-match, --id-replace, --id-alias, --id-table, --id-lookup, etc

  • Added SVD speedup using LAPACK compile-time option

  • Added --reference {file} to complement --lfile
  • Added --allele-count and --compound-genotype support for LGEN files
  • Added --attrib and --attrib-indiv options
  • Added --thin command to prune SNPs randomly

  • Added --hide-covar option for GLM tests

  • Added --make-perm-pheno {N} to create plink.pphe

  • Added --cnv-check-no-overlap
  • Fixed bug that stopped --cnv-method2 from working

  • Can now make sets on-the-fly; added --write-set option
  • Added --make-set-complement {label}
  • Now added a group field to ranges; added --make-set-collapse-group, --make-set-complement-group
  • Fixed issue when mapping ranges to SNPs, would not properly include all end SNPs
  • Fixed --range so that it can take files with more than 3 columns (--extract file.txt --range)

  • Added --update-pheno (i.e. similar to --pheno, but doesn't set to missing all non-included people)
  • Fixed bug in --update-sex and --update-parents
  • Fixed problem with --update-chr

  • Changed basic --assoc OR calculation to avoid int size limits in v. large samples
  • Changed --model --perm to only consider ALLELIC, DOM, REC (not GENO)

  • Added chromosome XY for --dog
  • Added --mouse support

  • Fixed input to now allow comment lines (starting with #) in MAP files
  • Added --R-port command
  • Now skips zeroing of Mendel errors (MEs) when in rel-check/genome mode
  • Added --q-score-range and --q-score-file options to modify --score
  • Added --simulate-haps, and format/error checks for --simulate files

V1.05 (11-Dec-2008)
  • Added support for multiple return values from R-plugins; changed basic protocol
  • Added --beta modifier to --logistic, to return beta coefficients, not odds ratios
  • Fixed minor inconsistencies in --cnv-freq-include-exact: now --cnv-freq-exclude-above 1 equals --cnv-freq-include-exact 1 when used with default overlap thresholds
  • Fixed problem with HWE output
  • Added --cnv-test-region option for CNV mapping
  • Added the --subset command to work with --set
  • Changed --clump-best to preferentially take the same SNP
  • Added option to downweight set-based tests with --lambda
  • Added option --reference-allele to specify manually which is reference allele A1, i.e. instead of minor allele
  • Added support for --rice chromosomes
  • Added --ld-snp-list option
  • Added --make-set-collapse
  • Added --update-alleles, --update-chr, --update-name, --update-ids, --update-sex and --update-parents
  • The --pfilter command now works for --mh
  • Fixed problem with the --hap-tdt command (wasn't working in 1.04 at all)
  • Added support for SNP-specific and genotype-specific quality score filters (--qual-scores and --qual-geno-scores, with corresponding commands --qual-threshold and --qual-max-threshold and --qual-geno-threshold and --qual-geno-max-threshold)
  • Made default --score behavior to impute missing genotype scores based on sample frequency, unless --score-no-mean-imputation specified
  • Fixed minor issue with --simulation routine (now we do not assume HWE holds within cases and controls, but only in the population as a whole).
  • Output format of .genome file is changed
  • Modified --genome to display the type of relationship, and expected IBD sharing level, if pair are in same family; also changed output format
  • Added --rel-check modifier to --genome, so that only pairs within the same FID code are considered
  • Modified --read-genome option to accept different format .genome files, by looking for header rows rather than assuming fixed column number/order
  • Added --set-table command to make a SNP by SET matrix
  • Added --tucc option to make pseudo case/control units from trio data
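
The change to the default --score behavior listed above (missing genotypes are imputed from the sample frequency unless --score-no-mean-imputation is given) can be illustrated with a minimal Python sketch. This is not PLINK's code: the function name, the data layout (allele counts 0/1/2, with None for a missing genotype) and the per-allele averaging are assumptions made for illustration only.

```python
from typing import List, Optional, Sequence


def profile_score(
    genotypes: Sequence[Sequence[Optional[int]]],
    weights: Sequence[float],
    mean_impute: bool = True,
) -> List[float]:
    """Per-individual score: weighted allele counts averaged over alleles used.

    genotypes[i][j] is the count (0, 1 or 2) of the scored allele for
    individual i at SNP j, or None if that genotype is missing.
    Layout and averaging are hypothetical, not taken from PLINK.
    """
    n_snps = len(weights)

    # Sample frequency of the scored allele at each SNP (non-missing calls only).
    freqs = []
    for j in range(n_snps):
        calls = [g[j] for g in genotypes if g[j] is not None]
        freqs.append(sum(calls) / (2 * len(calls)) if calls else 0.0)

    scores = []
    for g in genotypes:
        total, n_alleles = 0.0, 0
        for j, w in enumerate(weights):
            if g[j] is not None:
                total += w * g[j]
                n_alleles += 2
            elif mean_impute:
                # Expected allele count given the sample frequency.
                total += w * 2 * freqs[j]
                n_alleles += 2
            # Without imputation the SNP is skipped for this individual.
        scores.append(total / n_alleles if n_alleles else 0.0)
    return scores
```

Calling `profile_score(genotypes, weights, mean_impute=False)` in this sketch mimics the effect of turning imputation off: a missing genotype then contributes nothing to either the sum or the allele count.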

V1.04 (26-Aug-2008)
  • Added --gene-report function (with --gene-list, etc)
  • Added --cnv-subset option
  • Added --cnv-verbose-report-regions option
  • Added --clump-best, --clump-range and --clump-range-border
  • Added new LD-aware set-based test and functions --set-p, --set-r2
  • Added ability to include --covar with --gvar
  • Added --flip-subset option, to flip strand only for some individuals
  • Added --flip-scan procedure, to identify likely strand-flip errors
  • Added --mperm-save and --mperm-save-all options
  • Now reports sample summary after filtering for QTs, as for case/control data
  • Now --pfilter works on .adjusted output files also
  • Changed default behavior to not set phenotype to missing if sex code is missing
  • Changed --recode, etc, output names to plink.ped, etc, rather than plink.recode.ped
  • Added --must-have-sex option to set phenotype to missing when recoding data
  • Added --make-pheno {file} * feature to set as cases people in {file}, else control
  • N_GENO field is now always reported in the missing data output
  • Added fields PHOM and PHET to .hom output (and modified --read-homozyg also)
  • Added function --compound-genotype to allow AG, 11, 00, etc in PED files only
  • Changed default thresholds (to be similar to --all)
  • Added --output-missing-phenotype and --output-missing-genotype options
  • Added --keep-allele-order to stop flipping of allele codes when minor allele frequency is greater than 0.5
  • Added --proxy-b-r2 to specify alternate proxy parameters for rarer alleles
  • Added allele codes and frequencies to --homog output files
  • Added allele code fields to --hardy and --model output
  • Fixed female Y chromosome genotype rate counting
  • Fixed a problem with --proxy-glm and missing haplotypic data
  • Fixed --R function to send counts of minor, not major, allele
  • Fixed issue with missing genotypes and --gvar association statistics
  • Fixed minor bug with --hardy2 and --hwe2
  • Fixed problem when --sex used with --chap test
  • Fixed problem with --qfam routine when there are no valid observations
  • Fixed a bug with the --score function
  • Fixed a bug with --hap-impute when used on the X chromosome
  • Fixed a bug in the CNV frequency filter commands
  • Corrected mislabelling of distances/similarities for IBM clustering
  • Removed Hotelling's T2 test from this version
  • Updated the --lookup routines; changed the backend server considerably
  • Updated Rserve client code for use with Rserve version 0.5.2
  • Added gene-lists to resources section on web
  • Updated HapMap resource to latest release

V1.03 (10-Jun-2008)
  • Added teaching material/tutorial to resources section of web
  • Added --write-cluster, which can handle strings
  • Added --cnv-freq-exclude-exact and --cnv-freq-include-exact
  • Added --cnv-region-overlap
  • Now displays type and score in --segment-group for CNVs
  • Fixed problem with --read-freq
  • Fixed problem with --hethom and X chromosome data
  • Fixed problem when --condition and --genotypic used together
  • Added --genome-minimal and --read-genome-minimal
  • Now possible to --filter on strings and lists of strings
  • Added --make-pheno command to generate a binary phenotype given string filter
  • Allow --keep and --remove files to have additional columns beyond two
  • Additional case/control statistics given in LOG after filtering
  • Fixed a bug in the --hap-tdt and --proxy-tdt analyses
  • Added the --make-set and --make-set-border commands
  • Added --lookup-kb and --lookup-gene-kb
  • Added --lookup-gene-list (to create a SET file)
  • Added additional output information to SNP and gene lookups
  • Added --ld-snp command to modify the behaviour of --r2
  • LD pruning now considers non-autosomal markers
  • Fixed some issues with non-human data and the IBS/IBD calculation (previously skipped chromosomes over 22)

V1.02 (27-Mar-2008)
  • Added beta versions of CNV and generic variant commands, described here
  • Created a PDF version of the web page
  • Added --hethom flag to modify --genotypic
  • Added --seed to specify a fixed random seed
  • Added --recode-allele to modify --recodeA
  • Fixed issue with --clump-index-first option
  • Enabled PED files to be input from standard input (--ped -)
  • Fixed potential error in --chap output when test not defined

V1.01 (28-Jan-2008)
  • Added --dummy-coding modifier for --write-covar
  • Added --update-map
  • Outputs phenotype names for --all-pheno if given
  • Reworked --mds-plot and --mds-cluster option to work with --within and without re-running the clustering
  • Fixed --qfam issues with permutation test
  • Changed defaults for --proxy-assoc and --proxy-impute
  • Changed direction of allele coding for proxy association options
  • Changed --proxy-r2-filter command (3 parameters) and naming
  • Changed syntax for proxy association options, --proxy-r2, etc
  • Added --proxy-glm method
  • Fixed problems with --hap-impute
  • Fixed problem with --hap-window
  • Resolved issue with hyphens in SNP names conflicting with their use as range delimiter (--d)
  • Fixed issue with numeric chromosome codes greater than 22 and --file
  • Changed output format of TDT and CMH commands
  • Make monomorphic SNPs have missing alleles in output if forced
  • Fixed minor problem with --bmerge when more than 2 alleles seen per SNP
  • Physical position output correctly with --genotypic option
  • Changed threshold to print NA in logistic
  • Changed headers BETA or OR in GLM output for clarity
  • Now --recodeA and --recodeAD count number of minor alleles
  • Added --sheep option
  • Fixed problem with --homozyg

V1.00 (4-Dec-2007)
  • Added conditional haplotype-based testing (--chap)
  • Added simple data simulation option (--simulate)
  • Added/extended SNP imputation functions (--proxy-assoc and --proxy-impute)
  • Added LD-based results clumping procedure (--clump)
  • Added option to select specific covariates (--values)
  • Added ability to specify lists and ranges of SNPs (--snps)
  • Added ability to select ranges based on regions (--range)
  • Added proxy selection features based on LD (--proxy-r2-filter)
  • Added simple "risk-profile" tool (--score)
  • Fixed issue with scaling of covariates in GLMs
  • Added --rerun option to repeat analysis given LOG file
  • Added --write-snplist option
  • Fixed direction-of-effect error in haplotypic QTL test
  • Enabled --fisher to work with --model
  • Made variance inflation factor default value less stringent
  • Fixed some problems with haplotype TDT
  • Fixed problem with slightly different p-values for QTL tests from --adjust
  • Fixed bug in --all-pheno option when used with disease traits
  • Fixed bug in --epistasis routine regarding handling of missing data
V0.99s (26-July-2007)
  • Added SNP annotation --lookup set of options
  • Added proxy association functions (--proxy-assoc, etc)
  • Added extensible R plugin functionality (--R)
  • Added --lfile option for long-format input
  • Fixed problem with all-male or all-female X chromosome test
  • Added r-squared calculation for two SNPs based on haplotype frequencies
  • Added geno-grouping speedup to E-M algorithm; fixed minor problem with treatment of missing genotype data
  • Added --oblig-missing and --oblig-cluster options, to specify obligatory-missing genotypes
  • Added --impute-sex option
  • Added concordance calculation to --merge-mode 6 and 7
  • Added haplotype support for X and haploid chromosomes
  • Added haplotype support for quantitative trait analysis
  • Mendel error filter now zeroes out the people implicated, as per the heuristic described here
  • Fixed output commands to use user-defined missing phenotype and genotype values
  • Added dominant and recessive models for --linear and --logistic
  • Improved convergence of EM haplotyping routine
  • Fixed minor bug in --parameters function
  • Added --lambda option to fix genomic control factor
  • Added --log10 option to change output in *.adjusted
  • Added --horse species option
  • Added --qq-plot function
  • Added --loop-assoc option
  • Added --distance-matrix option
  • Changed implementation and interface of the --homozyg-* methods
  • Enabled permutation and set-tests with --dfam
  • Added ability to constrain --cluster with --within
  • Added --recode-bimbam, --recode-fastphase and --recode-structure options
  • Fixed minor issue with --het command
  • Added --liability option
  • Fixed issue with --genotypic and --covar
  • Fixed issue with --dfam
V0.99r (29-April-2007)
  • Added --parameters and --tests options
  • Added --zero-cluster option
  • Added --no-fid, --no-parents, --no-sex and --no-pheno options
  • Added --with-phenotype flag to modify --write-covar
  • Now give a warning if fileroots contain a fullstop/period character
  • Added --fisher for Fisher's exact test; use this in --test-missing
  • Added --set-test option
  • DFAM can include unrelateds (possibly in clusters) as well as families in a combined test
  • Improved multicollinearity check in linear model tests
  • Added --all-pheno option for some tests
  • Enabled permutation for --mh
  • Added XY and MT chromosome support
  • Fixed problem with --hap-window introduced in 0.99q
  • Fixed --homog for X and changed output format
  • Fixed problem with --out and --script introduced in 0.99q
V0.99q (3-March-2007)
  • Support for PED files larger than 4GB
  • Added --tfile to load transposed (row=SNP,column=person) files (i.e. as from --recode --transpose)
  • Added --recodeA option (like --recodeAD but only output additive components)
  • Added --write-covar option and also ability to include covariate files when recoding or making binary files
  • Added simple filters: --filter-cases, --filter-controls, --filter-males, --filter-females, --filter-founders and --filter-nonfounders
  • Added weighted multimarker tests with --whap
  • Added X chromosome and haploid models for --linear and --logistic with --xchr-model
  • Added --set-me-missing: by default, remaining Mendel errors (i.e. for SNPs/individuals not removed) are now not set to zero when recoding (--make-bed, etc) a file and filtering on --me.
  • Fixed bug in loading of covariates which made missing phenotypes no longer missing
  • Changed implementation of --fast-epistasis
  • Fixed minor --bmerge issue with monomorphic alleles in offspring-only subsamples
  • Added --allele1234 and --alleleACGT options
  • Fixed CMH output to NA rather than -9
  • Added web-based context-specific warnings

V0.99p (16-January-2007)
  • Fixed bug in loading of covariates which made missing phenotypes no longer missing (e.g. -9 phenotype would have been treated literally as -9)
  • Fixed bug in --bmerge function when merged-in SNPs already exist
  • Added --transpose option to modify --recode
  • Fixed bug in --genotypic option that led to incorrect results
  • Added --test-all option for --linear and --logistic
  • Changed --fast-epistasis to use correlational test
  • Added --ci support for --linear and --logistic
  • Added --mds-plot option
  • Now allow --remove and --keep together (similarly for --extract and --exclude)
  • Added --genome-lists option to facilitate parallelization of --genome
  • Added lower pool size in pool segment output, with --pool-size option
  • Added odds ratio calculation for --model tests
  • Modified --qfam within test (only model W)
  • Added --check-sex option
  • Cleaned up excessive memory use issue when merging multiple files
  • Added speed-up and bug fixes to QFAM routines
  • Added gPLINK compatibility via --gplink flag
  • Now treats half-missing genotypes, e.g. A 0 as missing rather than giving an error (haploid genotypes should still be coded as homozygous)
  • Recode file options (--make-bed, --recode, etc) now do not automatically set haploid heterozygous genotypes to missing, unless --set-hh-missing specified
  • No longer sets p-values <1e-16 to 0
  • Now use t-statistic for QTL test
  • Improved verbose segmental output (separate files)
  • Added --filter and --mfilter options

V0.99o (27-November-2006)
  • Permutation applicable to --test-missing option
  • Added --twolocus output option
  • Added --overlap option
  • Added --logistic and --linear options
  • Added --genotypic and --interaction options
  • Reframed --homozyg tests
  • Added epistasis using linear (QT) and logistic regression models
  • Fixed bug in haplotype-based TDT test (counted transmissions to unaffecteds)
  • IBD estimation adjusted, and fixed a minor bug
V0.99n (11-October-2006)
  • Added option to print warning when duplicate individual or marker IDs are found
  • Added --read-segment option
  • Changed output format of HWE and genotypic/model association tests
  • Implemented new bias-correct IBD estimators
  • Fixed minor bug that could cause problems when merging datasets on some platforms
  • Large restructuring of haplotype inference code
  • Added --test-mishap option
  • Added --indep-pairwise option
  • Added --hap-window option
  • Added --ld-window option
  • Added --plist option
  • Added --read-genome option
  • Added --map3 option
V0.99m (23-August-2006)
  • Added --gene extraction option
  • Fixed bug affecting labels after set pruning
  • Added --list output option
  • Added --counts option to modify --freq
  • Fixed bug in the Hotelling's T(2) test handling of missing genotypes
  • Added permutation options for --model
  • Fixed minor bug introduced in v0.99l that caused crash when attempting a set-based TDT analysis
  • Altered some field headers in various output files for greater consistency
V0.99l (27-July-2006)
  • Added --bmerge option to merge in a binary file
  • Added framework for QFAM test (option not yet available in release version)
  • Added Wigginton et al (AJHG, 2005) exact Hardy-Weinberg calculation
  • Added --from-kb etc options to select regions
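
For readers unfamiliar with the exact Hardy-Weinberg calculation mentioned above, the following Python sketch shows one way to compute such an exact p-value from genotype counts. It enumerates the possible heterozygote counts given the observed allele counts and sums the probabilities of all outcomes no more likely than the observed one. This is an illustrative sketch with a hypothetical function name, not PLINK's source code.

```python
def hwe_exact_pvalue(obs_hets: int, obs_hom1: int, obs_hom2: int) -> float:
    """Exact HWE p-value from genotype counts (illustrative sketch)."""
    n_genotypes = obs_hets + obs_hom1 + obs_hom2
    if n_genotypes == 0:
        return 1.0
    # Number of copies of the rarer allele.
    rare = 2 * min(obs_hom1, obs_hom2) + obs_hets

    # Unnormalised probability of each possible heterozygote count.
    het_probs = [0.0] * (rare + 1)

    # Start near the expected count; it must share parity with `rare`.
    mid = rare * (2 * n_genotypes - rare) // (2 * n_genotypes)
    if mid % 2 != rare % 2:
        mid += 1
    het_probs[mid] = 1.0

    # Recurrence towards fewer heterozygotes.
    hets, hom_r = mid, (rare - mid) // 2
    hom_c = n_genotypes - hets - hom_r
    while hets > 1:
        het_probs[hets - 2] = (
            het_probs[hets] * hets * (hets - 1)
            / (4.0 * (hom_r + 1) * (hom_c + 1))
        )
        hets -= 2
        hom_r += 1
        hom_c += 1

    # Recurrence towards more heterozygotes.
    hets, hom_r = mid, (rare - mid) // 2
    hom_c = n_genotypes - hets - hom_r
    while hets <= rare - 2:
        het_probs[hets + 2] = (
            het_probs[hets] * 4.0 * hom_r * hom_c
            / ((hets + 2) * (hets + 1))
        )
        hets += 2
        hom_r -= 1
        hom_c -= 1

    # Sum the outcomes that are no more likely than the observed one.
    total = sum(het_probs)
    observed = het_probs[obs_hets]
    p_value = sum(q for q in het_probs if q <= observed) / total
    return min(1.0, p_value)
```

Working with unnormalised probabilities and a two-direction recurrence avoids computing large factorials directly, which is the main practical appeal of this formulation.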
V0.99j (14-July-2006)
  • Added --window option to extract a +/- X kb region around a given SNP
  • Fixed bug which made set VIF pruning fail with a set containing a single SNP
  • Redirected ambiguous sex and no-non-missing-founders messages to files (plink.nosex and plink.nof) rather than to plink.log
  • Fixed bug in HWE tests which meant non-founders were included
V0.99i (5-July-2006)
  • Improved parsing of PED and haplotype specification files; fixed some minor bugs since 0.99h in this regard, mainly DOS versus UNIX issues
  • Fixed bug in haploTDT routine
  • Implemented gene-based canonical correlation test within PLINK (previously, an R script was generated and this analysis was performed externally)
  • Added feature to scan genome and extract a set of SNPs that are relatively uncorrelated with each other (sliding window based on VIF; implemented in the --indep option)
V0.99h (29-June-2006)
  • Added option to prune SNPs based on LD (i.e. select an independent subset of SNPs) using the --indep option
  • Fixed bug that occurred when creating a binary map file if a SNP had no non-missing alleles (i.e. previously one allele field was left blank, meaning that the file would not be properly read in subsequently)
  • Improved Hotelling's T(2) calculation -- now it better handles highly or completely correlated SNPs
  • Added singular value decomposition routines and variance inflation calculation
  • Added --allow-no-sex option to differently handle individuals with ambiguous sex codes
  • Fixed bug in --r and --r2 routines
V0.99g (20-June-2006)
  • Implemented web-based version checking
  • Fixed error in which families counted twice when filtering on Mendel errors and performing TDT also
  • Added column count check for PED files
  • Allowed comments in PED files (lines starting #) for basic input and merge commands
  • Fixed specification of --gap for case-only epistasis tests -- using kb now, not bp
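
The PED column count check and comment-line handling listed above can be sketched in Python as follows. The sketch assumes the standard PED layout of six leading fields (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) followed by two alleles per SNP; the function name is hypothetical, and skipping blank lines is an assumption of this sketch rather than documented PLINK behavior.

```python
from typing import Iterable, List, Tuple


def check_ped_columns(lines: Iterable[str], n_snps: int) -> List[Tuple[int, int]]:
    """Return (line_number, field_count) for each PED line of the wrong width."""
    expected = 6 + 2 * n_snps  # six leading fields plus two alleles per SNP
    problems = []
    for line_number, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Skip comment lines (starting with #); blank lines are also
        # skipped here, which is an assumption of this sketch.
        if not stripped or stripped.startswith("#"):
            continue
        n_fields = len(stripped.split())
        if n_fields != expected:
            problems.append((line_number, n_fields))
    return problems
```

Given the number of SNPs from the corresponding MAP file, an empty result means every non-comment line has the expected width.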
V0.99f (12-June-2006)
  • Fixed bug in TDT in version 0.99e (but not prior versions) that meant transmissions to unaffecteds as well as affecteds were counted
  • Improved parsing of --merge-list for end-of-file
V0.99e (9-June-2006)
  • Improved efficiency of haplotype phase routine
  • Added nearest neighbour identification in --neighbour routine, and fixed a minor bug
  • Added support for haplotypic TDT test
  • Fixed error in homozygosity-run analysis
  • Fixed error in handling of monomorphic variants when creating a binary map file
  • Added --snp option to select single SNPs
  • Added out-of-memory warning
V0.99c (23-May-2006)
  • Fixed error in conversion from SNP-major to individual-major data representations that affected Mendel error check routines
V0.99b (16-May-2006)
  • Fixed error in Hardy-Weinberg calculations for quantitative traits
  • Implemented --nudge and --impossible features for IBD calculation
V0.99 (30-Apr-2006)
  • Major internal restructuring to hold data in either row-major or column-major formats, depending on choice of analysis (i.e. order genotypes either by individual or by SNP in memory).
  • Added ability to stratify summary statistics by a cluster variable
  • Improved parsing of haplotypes (creates .mishap file for mis-specified haplotypes)
  • Fixed bug in CMH tests (problem with individuals who were not assigned to a cluster)
  • Fixed problem with extracting SNPs and individuals with binary PED files
V0.98 (19-Apr-2006)
  • Added support for adjusted significance test calculation (Bonferroni, FDR, Sidak, etc)
  • Added --script feature to allow long command lines
  • Added --1 feature to allow for 0/1 coding of affection variables
  • Added --tab feature to control field delimiters in recoded PED files
  • Added proper support for combined label-swapping and gene-dropping permutation (--swap-parents, --swap-sibs and --swap-unrel)
  • Corrected bug in filters for binary files that aren't in genomic order (i.e. those that result from merge operations).
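
The adjusted significance values named above (Bonferroni, Sidak, FDR) follow standard formulas, sketched below in Python. The function names are hypothetical, only the single-step Sidak and Benjamini-Hochberg FDR variants are shown, and the sketch does not reproduce PLINK's output format or its full set of corrections.

```python
from typing import List, Sequence


def bonferroni(pvalues: Sequence[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]


def sidak_single_step(pvalues: Sequence[float]) -> List[float]:
    """Single-step Sidak correction: 1 - (1 - p)^n."""
    n = len(pvalues)
    return [1.0 - (1.0 - p) ** n for p in pvalues]


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg step-up FDR values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

All three functions take the unadjusted p-values from a set of tests and return adjusted values in the same order, so they can be tabulated side by side.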
V0.97 (10-Apr-2006)
  • Added Hotelling's T2 test for multilocus SNP data
  • Added a test for interaction with quantitative traits and a dichotomous covariate
  • Added --merge-list option to merge more than two filesets simultaneously
  • Fixed bug in quantitative trait association test (not dealing with missing phenotypes properly)
  • Fixed some minor bugs with parsing the command line
V0.96 (30-Mar-2006)
  • Fixed bug in --remove option
  • Added Breslow-Day test of homogeneous odds ratios
  • Added option to skip nearby SNPs in case-only epistasis test
  • Added time/date stamps to output
  • Records output in *.log file; most remaining output echoed to STDOUT instead of STDERR (aside from errors and warnings)
  • Improved parsing of command lines (checking numeric inputs, etc)
  • Added --mcc option to specify number of cases:controls in clustering, e.g. for 3:1 matching of cases to controls
V0.95 (20-Mar-2006)
  • Added Cochran-Mantel-Haenszel tests (2x2xK and IxJxK)
  • Added homogeneity of odds ratio between clusters test (partitioning chi-square)
  • Added support for gene-dropping simulation
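
As background for the 2x2xK Cochran-Mantel-Haenszel test listed above, here is a minimal Python sketch of the textbook statistic and the Mantel-Haenszel common odds ratio. Each stratum (cluster) is given as a 2x2 table (a, b, c, d). No continuity correction is applied in this sketch; whether PLINK applies one is not stated in this changelog. The function and type names are hypothetical.

```python
from typing import Sequence, Tuple

# One 2x2 table per stratum: (a, b) is the first row, (c, d) the second.
Table = Tuple[int, int, int, int]


def cmh_2x2xk(tables: Sequence[Table]) -> Tuple[float, float]:
    """Return (chi-square with 1 df, Mantel-Haenszel common odds ratio)."""
    diff = 0.0    # sum over strata of (observed a - expected a)
    var = 0.0     # sum over strata of the hypergeometric variance of a
    or_num = 0.0  # sum of a*d/n
    or_den = 0.0  # sum of b*c/n
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # stratum carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff += a - row1 * col1 / n
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
    chi2 = diff * diff / var if var > 0 else float("nan")
    odds_ratio = or_num / or_den if or_den > 0 else float("nan")
    return chi2, odds_ratio
```

Because observed-minus-expected counts are summed across strata before squaring, consistent effects in the same direction reinforce each other, while the stratification guards against confounding by cluster membership.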
V0.94 (7-Mar-2006)
  • Added feature to perform error checking of command line options (scan for unused options)
  • Ability to include external matching criteria for --cluster added
  • Ability to specify merge modes and a diff function for PED files
V0.93 (1-Mar-2006)
  • X chromosome support added for basic association test & quantitative traits
  • Threshold for --genome output based on pi-hat exceeding --min
V0.92 (22-Feb-2006)
  • X chromosome support added for case/control tests, quantitative trait association, TDT, genotypic correlations, allele frequency statistics. Not yet implemented for the population stratification, inbreeding or epistasis tests.
  • --chr and --from X and --to X options added
  • Some problems with the --merge option corrected
  • Now only considers founders for the allele frequency and HWE tests
 
This document last modified Wednesday, 25-Jan-2017 11:39:26 EST