gPLINK
gPLINK is a freely-available, Java-based software
package that:
- is a GUI that allows construction of many common PLINK
operations
- provides a simple project management tool and analysis log
- allows for data and computation to be on a separate server (via SSH)
- facilitates integration with Haploview
Documentation
Overview of gPLINK
gPLINK is a Java program that provides a simple form interface to the
more commonly-used PLINK commands (i.e. instead of using the
command line options). gPLINK provides menus and dialogs to create valid PLINK
commands, executes them, keeps a record of all commands run in a
project, keeps track of input and output files, allows
annotation of result files and facilitates integration with
Haploview.
Important: only the more common PLINK commands are
included in gPLINK forms. However, it is possible to enter
whole PLINK command lines via PLINK -> Create Plink Command,
which can be useful if the PLINK option you need
has not been incorporated into gPLINK.
Alternatively, gPLINK can be used to collate previously-generated
PLINK analyses, organizing the results and allowing for easier
browsing with Haploview. Used in this "browse-only" mode,
gPLINK can provide a means of distributing analysis results to a wider set
of collaborators, for example.
Please refer to the main PLINK documentation pages
for a more detailed description of the different analytic options and the
output file formats.
gPLINK can be used to initiate analyses from the five major domains
of PLINK commands; the menu options are shown in the figure below:
gPLINK is currently considered stable; however, from time to time there may be updates to add new PLINK commands.
In gPLINK, a project corresponds to a folder, either on the local machine or
on a remote machine. All output will be written to that folder. Each operation must
be assigned a unique fileroot name, which is used to track operations.
gPLINK keeps track of the commands run, and which files were
used for input and which were created as output, storing this information in a
metafile in the project folder.
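Because every operation needs its own fileroot, it can help to check a candidate name against the project folder before starting a job. The following is a minimal sketch and not part of gPLINK: the function name fileroot_is_free is hypothetical, and the rule that a fileroot counts as taken whenever a file named ROOT.* exists is an assumption made for illustration.

```shell
# Hypothetical helper: succeeds (exit status 0) if no file in the project
# folder is named ROOT.<something>, i.e. the fileroot looks unused.
fileroot_is_free() {
    project_dir=$1
    root=$2
    for f in "$project_dir/$root".*; do
        # If the pattern matched nothing, $f is the literal pattern
        # and the existence test below fails, so the loop falls through.
        [ -e "$f" ] && return 1
    done
    return 0
}
```

For example, fileroot_is_free ~/myproject assoc1 && echo "assoc1 is unused" prints the message only when no assoc1.* file exists in ~/myproject (both names are placeholders).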
gPLINK has three main panes: Folder viewer, Operation viewer and Log viewer.
- Folder viewer shows a list of the files in your current project. Left-clicking on the Folder viewer lets you open files, launch Haploview, track the hierarchy of a file and edit file notations.
- Operation viewer displays the operations recorded by gPLINK, their associated input and output files, and any operation notes. Left-clicking on the Operation viewer pops up options to open files, edit operation notes, link or unlink files to or from an operation, create a new PLINK operation, and delete an operation.
- Log viewer shows the log file associated with the operation selected in the Operation viewer.
Local versus remote modes
You can run gPLINK in one of two modes: either local or remote.
In local mode,
everything resides on the same machine: PLINK, gPLINK, Haploview, the data and all computation.
In remote mode, PLINK, the data files and all PLINK computation reside on a
separate Unix/Linux machine, connected to the local machine via SSH. The
user runs the two Java-based tools gPLINK and Haploview on the local machine,
issuing
commands to the remote machine that actually does all the work; select results files
can be downloaded to the local machine for subsequent viewing, either using Haploview
or any other local software.
In remote mode, gPLINK will use two project folders: one remote folder where all the
original data and results are stored, and one local temporary version. Any file in the
temporary folder can be deleted after the session finishes.
If the remote server is also the head node of a cluster, and if jobs are sent to the
cluster with a simple prefix on a command line, then gPLINK can also send
PLINK jobs
off to the cluster to be performed.
To use remote mode, you need:
- To have a Linux/Unix server with PLINK installed
- To have access to this server via SSH (secure shell)
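Both requirements can be checked from a terminal before configuring gPLINK. This is only a sketch: it assumes an OpenSSH-style ssh client on the local machine and that plink is on the remote PATH (if it is not, substitute the full path you intend to give gPLINK). The function name remote_has_plink is hypothetical.

```shell
# Hypothetical check: succeeds if we can log in to the server over SSH
# and the remote shell can locate an executable called "plink".
remote_has_plink() {
    dest=$1     # SSH destination, e.g. user@host (placeholder)
    ssh "$dest" 'command -v plink' >/dev/null 2>&1
}
```

A possible use is remote_has_plink user@server.example.org && echo "PLINK found on server", where user@server.example.org is a placeholder for your own account and host. A failure means either the SSH login or the lookup of plink did not succeed; the sketch does not distinguish the two.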
Starting a new project
The first step is always to create a local project folder by creating a new folder/directory
(using your operating system's standard tools,
as you would create any other folder/directory). If running in local
mode, you will typically want to populate this folder with your datasets. If running
in remote mode, this local folder will typically be empty, and it will just be used as
a place to store temporary files.
On opening gPLINK, you first select which local project folder you'll be using,
with the File -> Set project folder option.
Next, you indicate whether the project will be local or remote. See the tour for more details on these steps.
Configuring your project
After you have opened your project folder, a configuration dialog will pop
up. Here you should set the PLINK path to point to
your current copy of PLINK (plink.exe in Windows, otherwise
plink). If in remote mode, you will be pointing to the copy of PLINK
you wish to use on the remote server.
If you intend to use Haploview, you need to set the Haploview .jar
path to where the .jar file is located. You need
Haploview version 4.0 or later to integrate with gPLINK.
The Editor options allow you to pick what command you wish to
call when you view input or output files. This is an advanced feature and
the defaults should work fine in most cases.
Again, see the tour for more details on these steps.
Starting PLINK jobs
By this stage, you have started the project and configured gPLINK. You only need to
configure gPLINK once at the start of each project.
To initiate a PLINK job, select the appropriate menu option from the
PLINK menu. A dialog will be shown, in which you must:
- Specify the binary or standard fileset to be used for input
- Specify a unique name for the output files
If there are files with the same root and the appropriate extensions, then you can select
this fileset from the top combo box; alternatively, the files can be specified separately.
If you select an alternate phenotype, you must return the panel to either the binary or
standard fileset panel before completing the analysis.
Additionally, you can often optionally:
- Select an alternate phenotype file
- Set filters and thresholds
- Change other parameters relevant to the requested analysis
After clicking OK, you will be shown the corresponding PLINK
command line that gPLINK has generated given your choices. You are
also given the option to add a description to this operation: this can be
any text that will help you track what you are doing (i.e. when you return
a month later to the project and can't remember what all those cryptic filenames
mean...)
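As a rough illustration of the kind of command line involved, the sketch below picks the PLINK input option from the files present for a given fileroot: --bfile for a complete binary fileset (.bed, .bim, .fam) and --file for a standard one (.ped, .map). The function name input_option and the names mydata and assoc1 are placeholders, and the command gPLINK actually generates may contain further options.

```shell
# Print the PLINK input option for a fileroot, or fail if neither
# fileset is complete.
input_option() {
    root=$1
    if [ -f "$root.bed" ] && [ -f "$root.bim" ] && [ -f "$root.fam" ]; then
        echo "--bfile $root"      # binary fileset
    elif [ -f "$root.ped" ] && [ -f "$root.map" ]; then
        echo "--file $root"       # standard fileset
    else
        return 1
    fi
}
```

With a binary fileset named mydata in the current directory, echo "plink $(input_option mydata) --assoc --out assoc1" would print a basic association command that writes its output under the fileroot assoc1.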
After adding the description, you will be returned to the main window; the newly
generated operation will be added to the list of operations. This does not necessarily
mean that the PLINK command will be finished. Depending on the size of the
data and the analysis chosen, PLINK analysis can take from seconds to
days or more. The command will run in the background. You can close gPLINK and
the PLINK command (or commands) will continue to run in the background.
When you next open up gPLINK, if the analysis has finished,
gPLINK should automatically detect this and connect the output
to the entry in the list of operations. Clicking on the operation will
display the log file associated with that PLINK
run; if the PLINK job has finished, then the log file will be complete, ending
either in a fatal error message or a line saying when the analysis was finished.
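The same check can be made by hand on the log file. This sketch assumes that a completed log contains a line beginning with "Analysis finished" and that a fatal error is reported on a line beginning with "ERROR"; both patterns are assumptions about the log wording and may need adjusting for your PLINK version. The function name plink_log_status is hypothetical.

```shell
# Classify a PLINK log file as missing, finished, error or running.
plink_log_status() {
    log=$1
    if [ ! -f "$log" ]; then
        echo "missing"
    elif grep -q '^Analysis finished' "$log"; then
        echo "finished"
    elif grep -q '^ERROR' "$log"; then
        echo "error"
    else
        echo "running"      # log exists but has no closing line yet
    fi
}
```

For instance, plink_log_status assoc1.log reports the state of the run whose output fileroot is assoc1 (a placeholder name). Note that "running" only means that no closing line was found; a job that was killed would look the same.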
Viewing output files
You can view the result files by expanding the tree for the desired
operation, going to the list of output files and
left-clicking on the selected entry. Alternatively you can select the file from the Folder
viewer in a similar manner. You will be given a choice to view the file in the
default viewer or an alternate viewer. The default viewer depends on the machine type:
it will be WordPad, TextEdit or emacs for Windows, Mac and
Linux/other systems respectively. The alternate viewer can be set to anything
(e.g. Excel)
via the Configuration menu option.
Integration with Haploview
If in the Configuration panel you pointed
gPLINK to an instance of Haploview (version 4), then an Open in
Haploview option will also appear. For results files, this will bring up the
results-viewer panel of Haploview. You can filter and sort results here as well as merging
multiple result files together. You can also generate plots of results very easily
(which are interactive in the sense that if you hover the mouse over a point, it
will tell you which SNP, or individual, the point represents; clicking on the
point will take you to the relevant entry in the results table).
It is also possible to extract subsets of your whole genome SNP data files for
viewing in Haploview (i.e. viewing data rather than results of PLINK runs). Use the
Data management -> Generate fileset -> Haploview fileset option for this. Then
right-clicking on either the .info or .ped file and selecting to
view in Haploview will load the data into Haploview. Note: use the filters to select
manageable subsets of the data for viewing in Haploview (i.e. restrict the number of SNPs
you wish to include).
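From the command line, a comparable subset can be produced with PLINK's --recodeHV option, which writes a .ped and a .info file, combined with the --chr, --from-kb and --to-kb filters. The sketch below only prints such a command rather than running it. The function name and all values in the usage note are placeholders, and whether gPLINK issues exactly this command is not confirmed here.

```shell
# Print (do not run) a PLINK command that extracts one region of a
# binary fileset as a Haploview fileset.
haploview_fileset_cmd() {
    root=$1; chr=$2; from_kb=$3; to_kb=$4; out=$5
    echo "plink --bfile $root --chr $chr --from-kb $from_kb --to-kb $to_kb --recodeHV --out $out"
}
```

Calling haploview_fileset_cmd mydata 22 20000 20500 hv1 prints a command restricted to a 500 kb window on chromosome 22, which keeps the number of SNPs manageable for Haploview.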
Miscellanea and known issues
How do I kill a PLINK process initiated by gPLINK?
When you run a PLINK command from gPLINK
a separate PLINK program is started independent of
gPLINK. This means that you can close gPLINK and your
operation will run to completion. It also means that if you decide to
kill your PLINK operation you will need to do so through your
operating system (for example kill in Linux or through Task
Manager in Windows).
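On Linux or another Unix-like system this might look as follows. It is a sketch under assumptions: the function names are hypothetical, pgrep must be installed, and the process is assumed to be named plink.

```shell
# List running processes whose name matches "plink", with their PIDs.
# pgrep exits with a non-zero status when nothing matches.
list_plink_jobs() {
    pgrep -l plink
}

# Ask one process to terminate by sending it the default signal (SIGTERM).
stop_job() {
    pid=$1
    kill "$pid"
}
```

A typical sequence is to run list_plink_jobs, note the PID of the job you want to stop, and then run stop_job with that PID. Check the PID carefully if several PLINK jobs are running at once.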
Manual Rescan folder option
The menu option Rescan folder checks the project folder
for files created by PLINK commands. Typically this rescanning
is performed automatically, for both local and remote projects, every couple
of seconds or so; the interval can be changed in the Configuration
dialog. The manual option is therefore intended for the impatient.
Quirks, known issues
In no particular order:
- All input files must have an extension (contain a period .) if
they are to appear in the operation view
- Do not attempt to use machines with different architectures
to access a project on a shared network drive that is mapped to each of those
machines. That is, in this case, gPLINK would be running in "local" mode on each machine.
- Java 1.5 is required to run gPLINK.
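The first quirk above is easy to check for in advance. This minimal sketch lists the regular files in a folder whose names contain no period; the function name files_without_extension is hypothetical, and the check is an illustration rather than a gPLINK feature.

```shell
# Print the names of regular files in a folder that have no period
# in their name and so would not appear in the operation view.
files_without_extension() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue     # skip subfolders and unmatched patterns
        name=${f##*/}               # strip the leading path
        case $name in
            *.*) ;;                 # contains a period: nothing to report
            *)   echo "$name" ;;
        esac
    done
}
```

Running files_without_extension ~/myproject (a placeholder path) prints one name per line; renaming those files to include an extension avoids the problem.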
Download and installation
You first need up-to-date versions of Java, PLINK and Haploview on your
computer.
Please follow all 4 of these steps:
- You will need Java 1.5 installed on your machine: this is freely available from the Sun
website, for all common platforms. To download Java, follow this link
and select Download Now.
- You need PLINK version 0.99p or greater to work with gPLINK,
which can be downloaded from here.
- A beta version of Haploview that supports gPLINK can be
obtained from this page
- After completing the above three steps, download the latest gPLINK
(version 2.050) by clicking here for the JAR file (as a
zipped archive).
If you download this file, please refer to the GPL v2 license.
Source code is available upon request, and will soon be posted.
If you have downloaded the ZIP file, you must first extract the contents (a single JAR
file).
In Windows, double-clicking on the gPLINK.jar file should work.
If not (and on all other platforms), type the following at the command line prompt:
java -jar gPLINK.jar
This will start gPLINK (you must be in the directory where gPLINK.jar is,
or specify its location explicitly; likewise, java must be in your path; please
ask your system administrator if you have problems with this).
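The two conditions just mentioned can be checked before launching. The wrapper below is a sketch for Unix-like shells: the function name launch_gplink is hypothetical, and it does nothing beyond testing that java is on the PATH and that the JAR file exists before running the launch command.

```shell
# Start gPLINK after checking that java and the JAR file can be found.
launch_gplink() {
    jar=${1:-gPLINK.jar}    # default: gPLINK.jar in the current directory
    command -v java >/dev/null 2>&1 || {
        echo "java not found in PATH" >&2
        return 1
    }
    [ -f "$jar" ] || {
        echo "cannot find $jar" >&2
        return 1
    }
    java -jar "$jar"
}
```

Use launch_gplink from the directory that contains gPLINK.jar, or pass the location explicitly, for example launch_gplink /path/to/gPLINK.jar (a placeholder path).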
Source Code
For interested developers the source code can be found here. Note that most users do not need the source code!