PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

The focus of PLINK is purely on analysis of genotype/phenotype data, so there is no support for steps prior to this (e.g. study design and planning, generating genotype or CNV calls from raw data). Through integration with gPLINK and Haploview, there is some support for the subsequent visualization, annotation and storage of results.

PLINK (one syllable) is being developed by Shaun Purcell whilst at the Center for Human Genetic Research (CHGR), Massachusetts General Hospital (MGH), and the Broad Institute of Harvard & MIT, with the support of others.

New in 1.07: meta-analysis, result annotation and analysis of dosage data.

Data management

Read data in a variety of formats
Recode and reorder files
Merge two or more files
Extract subsets (SNPs or individuals)
Flip strand of SNPs
Compress data in a binary file format

Summary statistics for quality control

Allele and genotype frequencies, HWE tests
Missing genotype rates
Inbreeding, IBS and IBD statistics for individuals and pairs of individuals
Non-Mendelian transmission in family data
Sex checks based on X chromosome SNPs
Tests of non-random genotyping failure
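
As an illustration of the kind of per-SNP quality-control statistic listed above, the following is a minimal sketch of a Hardy-Weinberg equilibrium (HWE) test from genotype counts. It uses the asymptotic 1-df chi-square goodness-of-fit test, which is an assumption made for simplicity and may differ from the test PLINK itself reports; the function name `hwe_chi_square` and its arguments are hypothetical and not part of PLINK.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Asymptotic HWE test from genotype counts (hypothetical helper).

    Returns (chi2, p_value) for a 1-df chi-square goodness-of-fit test.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")

    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        # Monomorphic SNP: no departure from HWE can be measured.
        return 0.0, 1.0

    # Expected genotype counts under HWE: p^2, 2pq, q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Upper tail of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Counts of 25, 50 and 25 match the HWE expectation for an allele frequency of 0.5, so the statistic is zero; a sample with no heterozygotes at all gives a large statistic and a very small p-value.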

Population stratification detection

Complete linkage hierarchical clustering
Handles virtually unlimited numbers of SNPs
Multidimensional scaling analysis to visualise substructure
Significance test for whether two individuals belong to the same population
Constrain cluster solution by phenotype, cluster size and/or external matching criteria
Perform subsequent association analyses conditional on cluster solution

Basic association testing

Case/control
- Standard allelic test
- Fisher's exact test
- Cochran-Armitage trend test
- Mantel-Haenszel and Breslow-Day tests for stratified samples
- Dominant/recessive and general models
- Model comparison tests (e.g. general versus multiplicative)
Family-based association (TDT, sibship tests)
Quantitative traits, association and interaction
Association conditional on one or more SNPs
Asymptotic and empirical p-values
Flexible clustered permutation scheme
Analysis of genotype probability data and fractional allele counts (post-imputation)
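
To make the "standard allelic test" in the list above concrete, here is a minimal sketch that compares allele counts between cases and controls with a 1-df chi-square test on the 2x2 table and also reports the odds ratio. It is a textbook formulation written for illustration, not PLINK's source code; the function name `allelic_test` and its parameters are hypothetical.

```python
import math


def allelic_test(case_a, case_b, ctrl_a, ctrl_b):
    """Chi-square test (1 df) on a 2x2 table of allele counts.

    Hypothetical helper: case_a/case_b are counts of alleles A and B in
    cases, ctrl_a/ctrl_b the same in controls.
    Returns (chi2, p_value, odds_ratio).
    """
    row_case = case_a + case_b
    row_ctrl = ctrl_a + ctrl_b
    col_a = case_a + ctrl_a
    col_b = case_b + ctrl_b
    n = row_case + row_ctrl
    if 0 in (row_case, row_ctrl, col_a, col_b):
        raise ValueError("every row and column of the table must be non-empty")

    # Pearson chi-square for a 2x2 table, without continuity correction.
    diff = case_a * ctrl_b - case_b * ctrl_a
    chi2 = n * diff ** 2 / (row_case * row_ctrl * col_a * col_b)

    # Upper tail of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))

    # Odds of allele A in cases relative to controls.
    denom = case_b * ctrl_a
    odds_ratio = (case_a * ctrl_b) / denom if denom else math.inf
    return chi2, p_value, odds_ratio
```

For example, 60 A and 40 B alleles in cases against 40 A and 60 B alleles in controls gives a chi-square of 8 and an odds ratio of 2.25.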

Multimarker predictors, haplotypic tests

Suite of flexible, conditional haplotype tests
Case/control and TDT association on the probabilistic haplotype phase
A set of "proxy association" methods to study single SNP associations in their local haplotypic context
Imputation heuristic, to test untyped SNPs given a reference panel

Copy number variant analysis

Joint SNP and CNV tests for common copy number variants
Filtering and summary procedures for segmental (rare) CNV data
Case/control comparison tests for global CNV properties
Permutation-based association procedure for identifying specific loci

Additional tests

Gene-based tests of association
Screen for epistasis
Gene-environment interaction with continuous and dichotomous environments

Meta-analysis

Automatically combine several generically-formatted summary files, for millions of SNPs
Fixed and random effects models
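
The fixed-effects model mentioned above is commonly computed by inverse-variance weighting of per-study effect estimates. The sketch below shows that calculation for a single SNP, assuming each study supplies an effect size (for example a log odds ratio) and its standard error. The function name `fixed_effects_meta` and the input layout are hypothetical and do not reflect PLINK's file formats; the random-effects model is not covered here.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis for one SNP.

    Hypothetical helper: betas are per-study effect sizes (e.g. log odds
    ratios) and ses their standard errors.
    Returns (beta, se, z, p_value) for the pooled estimate.
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need one standard error per effect size")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)

    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se

    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p_value
```

With two equally precise studies the pooled effect is the simple average of the two estimates, and the pooled standard error shrinks by a factor of the square root of two.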

Result annotation and reporting

Post-analysis annotation of result files
LD-based and region-based grouping of results across multiple studies

Additional features

Extensible via R function plug-ins
Web-based SNP and gene annotation lookup feature
Simple SNP simulation feature
ID helper tools, for tracking and working with project data
See the main documentation for a full list of features

This document last modified Wednesday, 25-Jan-2017 11:39:27 EST