PLINK: Whole genome data analysis toolset

gPLINK


gPLINK is a freely-available, Java-based software package that:
  • is a GUI that allows construction of many common PLINK operations
  • provides a simple project management tool and analysis log
  • allows for data and computation to be on a separate server (via SSH)
  • facilitates intergration with Haploview
This site provides:

 

Quick links

Take the quick tour

gPLINK documentation

Downloading gPLINK

Screenshots

PLINK

Haploview

IMPORTANT The gPLINK GUI is currently still under development: this current version is for early-adopters only.



Documentation

In this Section, we cover:  

Overview of gPLINK

gPLINK is a Java program that provides a simple interface to the more commonly-used PLINK commands (i.e. instead of using the command line options). gPLINK provides menus and dialogs to create valid PLINK commands, executes them, keeps a record of all commands run in a project, keeps track of input and output files, allows annotation of result files and facilitates integration with Haploview.

Alternatively, gPLINK can be used to collate previously-generated PLINK analyses, organising the results and allowing for easier browsing with Haploview. Using gPLINK in this "browse-only" mode, it can provide a means for distributing results of analyses to a wider set of collaborators, for example.

Please refer to the main PLINK documentation pages for a more detailed description of the different analytic options and the output file formats.

gPLINK provides can be used to initiate analyses from the five major domains of PLINK commands; the menu options are shown in the figure below:


We will be adding new PLINK commands to this list; this figure represents only those available in the initial version 0.1 release.

HINT It is also possible to enter whole PLINK command lines (i.e. bypassing the menus) via gPLINK, which can be useful if the exact PLINK option has not yet been incorporated into gPLINK.

In gPLINK, a project corresponds to a folder, either on the local machine or on a remote machine. All output will be written to that folder. Each operation must be assigned a unique fileroot name, which is used to track operations. gPLINK keeps track of the commands run, and which files were used for input and which were created as output, storing this information in a metafile in the project folder.

Local versus remote modes

You can run gPLINK in one of two modes: either local or remote.

In local mode, everything resides on the same machine: PLINK, gPLINK, Haploview, the data and all computation.

In remote mode, PLINK, the data files and all computation is assumed to reside on a separate Unix/Linux machine, connected to the local machine via SSH networking. The user runs the two Java-based tools gPLINK and Haploview on the local machine, issuing commands to the remote machine that actually does all the work; select results files can be downloaded to the local machine for subsequent viewing, either using Haploview or any other local software.

In remote mode, gPLINK will use two project folders: one remote folder where all the original data and results are stored, and one local temporary version. Any file in the temporary folder can be deleteted after the session finishes.

If the remote server is also the head node of a cluster, and if jobs are sent to the cluster with a simple prefix on a command line, then gPLINK can also send PLINK jobs off to the cluster to be performed.

To utilise remote mode, you need In remote mode, gPLINK can also be used as a browser of precomputed results, providing a simple means to distribute or summarize WGAS results.

Starting a new project

The first step is always to create a local project folder (using the standard operating system, as you would create any other folder/directory). If running in local mode, you will typically want to populate this folder with your datasets. If running in remote mode, this local folder will typically be empty, and it will just be used as a place to store temporary files.

On opening gPLINK, you first select which local project folder you'll be using, with the File -> Set project folder option.

Next, you indicate whether the project will be local or remote. See the tour for more details on these steps.

Configuring your project

After you have opened your project folder, a configuration dialog will pop up. Here you should set the PLINK path to point to your current copy of PLINK (plink.exe in Windows, otherwise plink). If in remote mode, you will be pointing to the copy of PLINK you wish to use on the remote server.

If you intend to use Haploview, you need to set the Haploview .jar path to where the .jar file. You need Haploview version 4.0 or later to integrate with gPLINK.

The Editor options allows you to pick what command you wish to call when you view input or output files. This is an advanced feature and the defaults should work fine in most cases.

Again, see the tour for more details on these steps.

Starting PLINK jobs

By this stage, you have started the project and configured gPLINK. You only need to configure gPLINK once at the start of each project.

To initiate a PLINK job, select the appropriate menu option from the PLINK menu. A dialog will be shown, in which you must: If there are files with the same root and the appropriate extensions, then you can select this fileset from the top combo box; alternatively, the files can be specified separately. If you select an alternate phenotype, you must return the panel to either the binary or standard fileset panel before completing the analysis.

Additionally, you can often optionally After clicking OK, you will be shown the corresponding PLINK command line that gPLINK has generated given your choices. You are also given the option to add a description to this operation: this can be any text that will help you track what you are doing (i.e. when you return a month later to the project and can't remember what all those cryptic filenames mean...)

After adding the description, you will be returned to the main window; the newly generated operation will be added to the list of operations. This does not necessarily mean that the PLINK command will be finished. Depending on the size of the data and the analysis chosen, PLINK analyses can take from seconds to days or more. The command will run in the background. You can close gPLINK and the PLINK command (or commands) will continue to run in the background. When you next open up gPLINK, if the analysis has finished, gPLINK should automatically detect this and connect up the ouput to the entry in the list of operations. Clicking on the operation will display the log file associated with that PLINK run; if the PLINK job has finished, then the log file will be complete, ending either in a fatal error message or a line saying when the analysis was finished.

Viewing output files

You can view the result files by expanding the tree for the desired operation, going to the list of output files and right-clicking on the entry. You will be given a choice to view the file in the default viewer or an alternate viewer. The default viewer depends on the machine type: it will be WordPad, TextEdit or emacs for Windows, Mac and Linux/other systems respectively. The alternate viewer can be set to anything (e.g. Excel) via the Configuration menu option.

Integration with Haploview

If it is an appropriate file type, and if in the Configuration panel you pointed gPLINK to an instance of Haploview (version 4), then an Open in Haploview option will also appear. For results files, this will bring up the results-viewer panel of Haploview. You can filter and sort results here as well as merging multiple result files together. You can also generate plots of results very easily (which are interactive in the sense that if you hover the mouse over a point, it will tell you which SNP, or individual, the point represents; clicking on the point will take you to the relevant entry in the results table).

It is also possible to extract subsets of your whole genome SNP data files for viewing in Haploview (i.e. viewing data rather than results of PLINK runs). Use the Data management -> Generate fileset -> Haploview fileset option for this. Then right-clicking on either the .info or .ped file and selecting to view in Haploview will load the data into Haploview. Note: use the filters to select manageable subsets of the data for viewing in Haploview (i.e. restrict the number of SNPs you wish to include).

Miscellanea and known issues

How do I kill a PLINK process initiated by gPLINK?

When you run a PLINK command from gPLINK a separate PLINK program is started independent of gPLINK. This means that you can close gPLINK and your operation will run to completion. It also means that if you decide to kill your PLINK operation you will need to do so through your operating system (for example kill in Linux or through Task Manager in Windows).

Manual Rescan folder option

The menu option Rescan folder checks the project folder for files created by PLINK commands. Typically this rescanning is performed automatically, for both local and remote projects, every couple of seconds or so. This option is for the inpatient, therefore.

Quirks, known issues

In no particular order:

Download and installation

You first need up-to-date versions of Java, PLINK and Haploview on your computer.

Please follow all 4 of these steps:

  1. You will need Java 1.5 installed on your machine: this is freely available from the Sun website, for all common platforms. To download Java, follow this link and select Download Now.

  2. You need PLINK version 0.99p or greater to work with gPLINK, which can be downloaded from here.

  3. Haploview can be obtained from this page

  4. After completing the above three steps, download the latest gPLINK (version 1.05) by clicking here for the JAR file (as a zipped archive). Alternatively, you could test version 2.024 by clicking here for the JAR file. Note that this is in beta still and potentially more buggie then version 1.05.

If you download this file, please refer to the GPL v2 licence.

If you have downloaded the ZIP file, you must first extract the contents (a single JAR file).

In Windows, double-clicking on the gPLINK.jar file should probably work

If not (and on all other platforms), typing at the command line prompt
     java -jar gPLINK.jar
will start gPLINK (you must be in the directory where gPLINK.jar is, or specify it's location explicitly; likewise, java must be in your path; please ask your system administrator if you have problems with this).

Source Code

For interested developers the source code can be found here (gPLINK-2.024-src.zip). Note that most users do not need the source code!

Screenshots

Main interface, tracking a project

Common PLINK analysis options available via pull-down menus


Set analysis parameters, filters and thresholds


This document last modified