PLINK: Whole genome data analysis toolset
Last original PLINK release is v1.07 (10-Oct-2009); PLINK 1.9 is now available for beta-testing

Whole genome association analysis toolset


This page contains some important information regarding how to set up and use PLINK. Individuals familiar with using command line programs can probably skip most of this page.

Download

PLINK is now available for free download. Below are links to ZIP files containing binaries compiled on various platforms as well as the C/C++ source code. Linux/Unix users should download the source code and compile (see notes below).

These downloads also contain a version of gPLINK, an (optional) GUI for PLINK. Please see these pages for instructions on use of gPLINK.

Remember This release is considered a stable release, although please remember that we cannot guarantee that it, just like most computer programs, does not contain bugs...

Platform               File                        Version
Linux (x86_64)         plink-1.07-x86_64.zip       v1.07
Linux (i686)           plink-1.07-i686.zip         v1.07
MS-DOS                 plink-1.07-dos.zip          v1.07
Apple Mac (Intel)      plink-1.07-mac-intel.zip    v1.07
C/C++ source (.zip)    plink-1.07-src.zip          v1.07

One more thing... If you download PLINK please either join the very low-volume e-mail list (link from Introduction page) or drop an e-mail to plink AT chgr dot mgh dot harvard dot edu letting me know you've downloaded a copy.

For old versions of PLINK please visit the archive.

Debian users PLINK is available as a Debian package, see these notes. Note, the executable is named snplink in the Debian plink package.

Development version source code

You can download the very latest development source code in this ZIP file. This is really, strongly not recommended for most users. The code posted here could change on a daily basis and is not versioned.

Development source code versions have a p suffix, meaning pre-release. For example, if the current release is 1.04, the next stable release will be 1.05 and the development code will be 1.05p. Note that 1.05 may differ from 1.05p and, as noted before, the 1.05p development code may change from day to day in any case.

The principal reason for including the source code here is to allow access for specific users to specific, new features. These features are described here.

General installation notes

The PLINK executable file should be placed in either the current working directory or somewhere in the command path. This means that typing
plink

or
./plink

at the command line prompt will run PLINK, no matter which current directory you happen to be in. PLINK is a command line program -- clicking on an icon with the mouse will get you nowhere.

Below, on this page, is a general overview of how to use the command line to run PLINK. The next sections give details about how to install PLINK on different platforms.

Windows/MS-DOS notes

Unzipping the downloaded ZIP file should reveal a single executable program plink.exe. The Windows/MS-DOS version of PLINK is also a command line program, and is run by typing
plink {options...}

not by clicking on the icon with the mouse. Open a DOS window by selecting "Command Prompt" from the start menu, or entering "command" or "cmd" in the "Run..." option of the start menu.

The folders c:\windows\ or c:\winnt\ are typically in the path, so these are good places to copy the file plink.exe to. You can copy the plink.exe file using Windows, as you would copy-and-paste any file (e.g. using the right-button menu or the keyboard shortcuts control-C (copy) and control-V (paste)).

Alternatively, if you know that you will only ever run PLINK on files in a single folder, then you can paste plink.exe into that folder, e.g. C:\work\genetics\. The disadvantage of this approach is that PLINK will not be available from the command line if you are in a folder other than this one.

Once you have copied plink.exe to the correct location, you can test whether or not PLINK is available (i.e. in your command path) by simply typing
plink

at the command line. You should see something like the following message:
     Microsoft Windows XP [Version 5.1.2600]
     (C) Copyright 1985-2001 Microsoft Corp.

     C:\>plink

     @----------------------------------------------------------@
     |         PLINK!       |    v0.99l     |   27/Jul/2006     |
     |----------------------------------------------------------|
     |  (C) 2006 Shaun Purcell, GNU General Public License, v2  |
     |----------------------------------------------------------|
     |       http://pngu.mgh.harvard.edu/purcell/plink/         |
     @----------------------------------------------------------@
 
     Web-based version check ( --noweb to skip )
     Connecting to web...  OK, v0.99l is current
 
     *** Pre-Release Testing Version ***
 
     Writing this text to log file [ plink.log ]
     Analysis started: Fri Jul 28 10:07:57 2006
 
     Options in effect:
 
 
     ERROR: No file [ plink.ped ] exists.
Do not worry about this error message -- normally you would specify your own PED/MAP file names to analyse (i.e. the default input filename is plink.ped).

Please ask your system administrator for help if you do not understand this.

HINT In MS-DOS, you can increase the width of the window to avoid output lines wrapping around and being hard to read. To do this under Windows XP DOS: right click on the top title/menu bar of the window and select Properties / Layout / Window Size / Width -- increase the width value (e.g. to 120, or as large as possible without the window getting too big to fit on your screen!).

UNIX/Linux notes

If you are not familiar with the concept of the path variable, ask your system administrator to help. In a UNIX/Linux environment, this would mean either copying the PLINK executable to a folder such as
     /usr/local/bin/
or
     ~/bin/
assuming these directories exist and are in the path. To see which directories are in the path, typing
echo $PATH

at the command prompt will often work. To create a directory, say called bin in your home directory and add it to the path, try
mkdir ~/bin

export PATH=$PATH:~/bin/

although this will depend on which shell you are using. Some shells do not include the current directory in the path: in this case, you might need to prefix all PLINK commands with the characters ./, e.g.
./plink --file mydata --assoc
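
The path set-up described above can be sketched as a short script. This is only an illustration for sh/bash-style shells: the helper name dir_on_path is made up for this example, and other shells (e.g. csh) use different syntax for setting the path.

```shell
# Return success if directory $1 appears as an entry in $PATH
# (with or without a trailing slash). dir_on_path is a hypothetical helper.
dir_on_path() {
    case ":$PATH:" in
        *":${1%/}:"*|*":${1%/}/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Create ~/bin if needed and append it to the path only if it is not there yet.
mkdir -p "$HOME/bin" 2>/dev/null || echo "could not create $HOME/bin"
if ! dir_on_path "$HOME/bin"; then
    PATH="$PATH:$HOME/bin"
    export PATH
fi

# Report where the shell finds plink, if anywhere.
if command -v plink >/dev/null 2>&1; then
    echo "plink found at: $(command -v plink)"
else
    echo "plink is not on the path yet; copy the executable to $HOME/bin"
fi
```

The change to PATH only lasts for the current session; to make it permanent you would normally add the export line to your shell start-up file.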

 

Source code compilation

PLINK is also distributed as C/C++ source code, which you can compile for your particular system using any standard C/C++ compiler. Download the .zip or .tar.gz files and perform the following steps:
tar -xzvf plink-0.99s-src.tar.gz

or
unzip plink-0.99s-src.zip

or use a graphical tool such as WinZip to extract the contents of the archive. This should create a directory called
     plink-0.99s-src
(the exact version number might be different, of course). On the command line, move to that directory; make is run from there once the Makefile has been edited as described below:
cd plink-0.99s-src

You will need a C/C++ compiler installed on your system for the next step. Most Linux distributions include gcc/g++ or provide it as a standard package. Ask your system administrator about installing a C/C++ compiler if you do not have one already (Windows, MS-DOS users).

Hint PLINK has not been exhaustively tested on different compilers. We suggest you use a recent download of MinGW for Windows, or at least gcc 4.1.

WARNING We suggest using the most recent stable release of the compiler available on your platform to avoid compilation problems. For most platforms this means gcc 4.2 as of writing this. Some issues with specific older compilers on specific platforms have been detected, e.g. gcc 3.3.3 on a SGI Altix 3700 system.
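
Before going further, it can help to check which compiler (if any) is on your path and which version it reports. The following is a rough sketch for sh/bash-style shells; find_cxx is a hypothetical helper name, and the list of compiler names it tries (g++, then c++) is only an assumption about common set-ups.

```shell
# Echo the name of the first C++ compiler found on the path, or fail if none.
# find_cxx is a hypothetical helper, not part of PLINK.
find_cxx() {
    for cxx in g++ c++; do
        if command -v "$cxx" >/dev/null 2>&1; then
            echo "$cxx"
            return 0
        fi
    done
    return 1
}

if cxx=$(find_cxx); then
    # -dumpversion prints just the compiler's version number.
    echo "Using $cxx, version $("$cxx" -dumpversion)"
else
    echo "No C++ compiler found on the path"
fi
```

Compare the version reported with the versions suggested above.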

Use a standard text editor such as emacs, pico or WordPad to edit the Makefile to suit your particular platform: the top of the Makefile should look like this:
     # ---------------------------------------------------------------------
     # 
     #   Makefile for PLINK 
     #    
     #   Supported platforms
     #       Unix / Linux                LINUX
     #       Windows                     WIN
     #       Mac                         MAC
     #       Solaris                     SOLARIS
     #  
     #   Compilation options
     #       R plugins                   WITH_R_PLUGINS
     #       Web-based version check     WITH_WEBCHECK
     #       Ensure 32-bit binary        FORCE_32BIT 
     #       (Ignored)                   WITH_ZLIB
     #       Link to LAPACK              WITH_LAPACK
     #       Force dynamic linking       FORCE_DYNAMIC
     #
     # ---------------------------------------------------------------------

     # Set this variable to either UNIX, MAC or WIN
     SYS = UNIX

     # Leave blank after "=" to disable; put "= 1" to enable
     WITH_R_PLUGINS = 1
     WITH_WEBCHECK = 1
     FORCE_32BIT =
     WITH_ZLIB =
     WITH_LAPACK =
     FORCE_DYNAMIC =

     # Put C++ compiler here; Windows has it's own specific version
     CXX_UNIX = g++
     CXX_WIN = c:\bin\mingw\bin\mingw32-g++.exe

     # Any other compiler flags here ( -Wall, -g, etc)
     CXXFLAGS =

     # Misc
     LIB_LAPACK = /usr/lib/liblapack.so.3

     # --------------------------------------------------------------------
     # Do not edit below this line
     # --------------------------------------------------------------------
The steps to edit this:
  1. Change the SYS variable to your platform, e.g. WIN for Windows
  2. For the next set of options, put either a 1 or leave blank to turn on or off these options, respectively.
    • WITH_R_PLUGINS This enables support for R plugins using Rserve as described here. Currently this only works for Unix-based machines.
    • If you want to disable the web-based version check option (not recommended) or if compilation fails with this on, you might try removing the 1 after WITH_WEBCHECK
    • When compiling on a 64-bit machine, the FORCE_32BIT option, when set, forces a 32-bit binary (this assumes all necessary libraries, etc., are in place)
    • Other options listed here are described below.
  3. Edit the CXX_* variable to point to the C/C++ compiler you wish to use
  4. To pass any extra options to the compiler (e.g. location of libraries, etc), you can edit CXXFLAGS
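
If you would rather not edit the Makefile by hand, the steps above can be approximated with sed. This is a sketch only: set_makefile_options is a made-up helper, it writes an edited copy rather than changing the Makefile in place, and the three-line file used here is a stand-in for the real Makefile so that the example runs anywhere.

```shell
# Write an edited copy of a Makefile with SYS changed and WITH_WEBCHECK disabled.
# $1 = input Makefile, $2 = output file, $3 = platform (UNIX, MAC or WIN)
# set_makefile_options is a hypothetical helper, not part of PLINK.
set_makefile_options() {
    sed -e "s/^SYS[[:space:]]*=.*/SYS = $3/" \
        -e 's/^WITH_WEBCHECK[[:space:]]*=.*/WITH_WEBCHECK =/' \
        "$1" > "$2"
}

# Stand-in for the top of the real Makefile (illustration only).
demo=$(mktemp)
printf 'SYS = UNIX\nWITH_R_PLUGINS = 1\nWITH_WEBCHECK = 1\n' > "$demo"

set_makefile_options "$demo" "$demo.new" MAC
cat "$demo.new"
```

With the real Makefile you would inspect the edited copy and, if it looks right, move it over the original before running make.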
LAPACK support
As described here, linking to the LAPACK library can greatly speed up MDS analysis of population stratification. This may take a little tweaking:
  • Obtain and compile LAPACK, here. This requires the gfortran compiler. I cannot assist in any technical difficulties you have with this: ask your IT staff. It is quite possible that LAPACK is already installed somewhere in your institution.
  • Determine where the LAPACK library file is located, and whether it is a shared (e.g. liblapack.so.3) or static (e.g. lapack_LINUX.a) library. (Libraries ending .a are static; libraries ending .so.* are shared, or dynamically linked.) If the LAPACK libraries are shared libraries, then set the FORCE_DYNAMIC flag to have 1 after it in the PLINK Makefile.
  • Set the variable LIB_LAPACK to point to the LAPACK libraries. This may vary by machine and the precise installation of LAPACK. For example, on one machine, I have three static LAPACK libraries in the directory I compiled LAPACK in:
         ~/src/plink> ls ../lapack-3.2/*a
         ../lapack-3.2/blas_LINUX.a  ../lapack-3.2/lapack_LINUX.a  ../lapack-3.2/tmglib_LINUX.a
    
    In this case, set (all one line)
      LIB_LAPACK = ../lapack-3.2/lapack_LINUX.a 
      LIB_LAPACK += ../lapack-3.2/blas_LINUX.a 
      LIB_LAPACK += ../lapack-3.2/tmglib_LINUX.a
    
    On this machine, it was also necessary to add
      LIB_LAPACK += -lgfortran
    
    On a different (Linux) machine, the LAPACK library was a shared one, in /usr/lib/liblapack.so.3, that worked as a single file. In this case, the necessary changes were to set the WITH_LAPACK and FORCE_DYNAMIC flags, then set
       LIB_LAPACK = /usr/lib/liblapack.so.3
    
Doubtless there is a better way to configure this, but for now I present the above as a quick-fix way of achieving LAPACK support. A little tweaking by somebody who knows what they are doing should suffice. I will not be able to provide detailed help for platforms I am unfamiliar with: you are on your own I'm afraid! You are likely to see some linker errors when compiling if things are not right.
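
The static/shared rule given above (.a versus .so.*) is easy to apply by eye, but a small sketch makes it explicit. lib_kind is a hypothetical helper that looks only at the file name; it does not inspect the file contents.

```shell
# Classify a library by file name: .a is static, .so or .so.N is shared.
# lib_kind is a hypothetical helper, not part of PLINK or LAPACK.
lib_kind() {
    case "$1" in
        *.a)         echo static ;;
        *.so|*.so.*) echo shared ;;
        *)           echo unknown ;;
    esac
}

lib_kind /usr/lib/liblapack.so.3        # prints "shared": set FORCE_DYNAMIC
lib_kind ../lapack-3.2/lapack_LINUX.a   # prints "static"
```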

Starting compilation

You should then just type
make

and PLINK should (hopefully) start compiling. You should use the GNU version of make, which is called gmake on some platforms (e.g. FreeBSD). It is also possible that you have installed make but it is not in your path and/or your version of make.exe is called something slightly different, in which case use the full path, e.g. change the following to suit your system:
c:\mingw\bin\mingw32-make

NOTE Often problems in compilation will reflect system-specific / compiler-specific problems: unfortunately, we are not able to give detailed advice on how to resolve them. If things do not work and you are unsure, you will need to enlist the help of your systems/IT department.
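
If you are not sure whether your make is the GNU version, something like the following sketch may help on Unix-like systems. pick_make is a hypothetical helper; it relies on GNU make printing "GNU Make" in its --version output, and tries gmake before make.

```shell
# Echo the name of a GNU make found on the path (gmake is tried first), or fail.
# pick_make is a hypothetical helper, not part of PLINK.
pick_make() {
    for m in gmake make; do
        if command -v "$m" >/dev/null 2>&1 &&
           "$m" --version 2>/dev/null | grep -q 'GNU Make'; then
            echo "$m"
            return 0
        fi
    done
    return 1
}

if mk=$(pick_make); then
    echo "GNU make is available as: $mk"
else
    echo "No GNU make found on the path"
fi
```

You would then run whichever command is reported from inside the PLINK source directory.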

You should see something like the following output (abbreviated)
g++ -O3 -I. -DUNIX -static -c plink.cpp
g++ -O3 -I. -DUNIX -static -c options.cpp
g++ -O3 -I. -DUNIX -static -c input.cpp
...
g++ -O3 -static -o plink plink.o options.o input.o binput.o 
helper.o genome.o snpfilter.o indfilter.o locus.o multi.o 
regress.o crandom.o cluster.o output.o informative.o affpair.o 
assoc.o bins.o epi.o phase.o trio.o sharing.o genepi.o sets.o 
perm.o mh.o genedrop.o gxe.o merge.o hotel.o multiple.o
After a minute or so, this will have created an executable binary file called plink (or plink.exe for Windows/MSDOS users).
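
A quick way to confirm that the build produced something runnable is sketched below. check_build is a hypothetical helper; the --noweb option is the one mentioned in the PLINK banner for skipping the web-based version check.

```shell
# Return success if $1 is an executable file; print a hint otherwise.
# check_build is a hypothetical helper, not part of PLINK.
check_build() {
    if [ -x "$1" ] && [ ! -d "$1" ]; then
        echo "$1 looks like an executable"
        return 0
    fi
    echo "$1 is missing or not executable; re-read the make output for errors" >&2
    return 1
}

if check_build ./plink; then
    # Show the banner only; with no input files PLINK will stop with an error.
    ./plink --noweb | head -n 12
fi
```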

Running PLINK from the command line

A typical session might involve running several commands, e.g. to produce summary statistics on missing data, to exclude some SNPs based on these results, and to run an association analysis. Each command involves a separate instantiation of plink -- note that PLINK does not remember any parameter settings between different runs or store any other information. In other words, if you want to perform two association tests with different PED files, but only including SNPs that are above a certain minor allele frequency in both runs, you would use the following:
plink --ped file1.ped --map file1.map --maf 0.05 --assoc

plink --ped file2.ped --map file2.map --maf 0.05 --assoc

In other words, the following sequence would not work:
plink --ped file1.ped --map file1.map --maf 0.05

plink --ped file1.ped --map file1.map --assoc

    MAF returns to default {0.01}
plink --ped file2.ped --map file2.map --assoc

    As above
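
Because every run must repeat every option, a small shell function can keep the settings in one place. This is a sketch under some assumptions: run_assoc is a made-up name, the file stems are placeholders, and it uses the --out option (not described on this page) to give each run its own output file root so that the second run does not overwrite the first.

```shell
# Run the same MAF filter and association test on each dataset named on the
# command line; the options are repeated on every call because PLINK keeps no
# state between runs. run_assoc is a hypothetical helper.
run_assoc() {
    for stem in "$@"; do
        plink --ped "$stem.ped" --map "$stem.map" --maf 0.05 --assoc --out "$stem"
    done
}
```

Calling run_assoc file1 file2 would then be equivalent to the two working commands shown earlier, with the results named after each input.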

 

Viewing PLINK output files

UPDATE We are developing the tool gPLINK to integrate PLINK with Haploview. Haploview 4.0 provides a number of features for viewing, filtering and plotting PLINK results files. This is intended to supplant the methods suggested below.

All the output files that PLINK generates are plain-text, space-delimited files. Most files will have the same number of fields per line and will have the field names in the first line, facilitating use of a spreadsheet or statistics package to view and process the results.

For small results files, simply printing the files to the terminal or viewing in a text-editor should work well. In Windows/MS-DOS use the type command, e.g.
type mydata.assoc

to view a results file. Alternatively, you can call up WordPad from the command line as follows:
write mydata.assoc

If you are using a Unix/Linux system, then commands such as cat, more or less can be used to display the results; alternatively, use a text-editor such as pico, emacs or vi.

Of course, Unix/Linux users also have available the entire range of text-processing tools (grep, gawk, perl, sort, head, etc) and shell-scripting tools, as well as powerful text-editors (emacs, etc) that are ideal for processing very large result files. Another alternative is to use a statistics package such as the R package which will provide powerful visualisation tools also.
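
As an illustration of the text-processing approach, the sketch below lists the rows with the smallest p-values from a results file. It assumes only what is stated above (space-delimited fields, field names on the first line) plus a column named P; top_hits is a hypothetical helper, the sample rows are invented toy values rather than real results, and sort -g is an extension that may not exist in every version of sort.

```shell
# Print the header plus the n rows with the smallest values in the column named P.
# $1 = results file, $2 = number of rows (default 10). Hypothetical helper.
top_hits() {
    file=$1
    n=${2:-10}
    # Find the index of the column named P in the header line.
    col=$(awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "P") print i; exit }' "$file")
    [ -n "$col" ] || { echo "no column named P in $file" >&2; return 1; }
    head -n 1 "$file"
    # Skip the header, drop NA rows, then sort numerically (-g copes with 1e-05).
    tail -n +2 "$file" | awk -v c="$col" '$c != "NA"' |
        LC_ALL=C sort -g -k "$col,$col" | head -n "$n"
}

# Toy file with invented values, for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
 CHR  SNP   BP  A1  CHISQ       P
   1  rs1  100   A   0.50  0.4795
   1  rs2  200   G  19.51   1e-05
   1  rs3  300   C     NA      NA
   1  rs4  400   T   6.63    0.01
EOF

top_hits "$sample" 2
```

On the toy file this prints the header followed by the rs2 and rs4 rows, in that order.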

Windows/MS-DOS users have fewer options for handling very large results files: For moderate size files (e.g. up to 50K SNPs), you could use Excel. For larger files, you can either install cygwin to provide a Linux-like environment, or use a statistics package such as the R package.

Personal opinion... Although a MS-DOS version of PLINK is supported, we would, in general, advise any researchers planning on performing many large-scale analyses to look into adopting a Linux environment, if they are not already using this.
 
This document last modified Wednesday, 25-Jan-2017 11:39:26 EST