2. Basic information
3. Download and general notes
DocumentationIn this Section, we cover:
Overview of gPLINKgPLINK is a Java program that provides a simple interface to the more commonly-used PLINK commands (i.e. instead of using the command line options). gPLINK provides menus and dialogs to create valid PLINK commands, executes them, keeps a record of all commands run in a project, keeps track of input and output files, allows annotation of result files and facilitates integration with Haploview. Alternatively, gPLINK can be used to collate previously-generated PLINK analyses, organising the results and allowing for easier browsing with Haploview. Using gPLINK in this "browse-only" mode, it can provide a means for distributing results of analyses to a wider set of collaborators, for example. Please refer to the main PLINK documentation pages for a more detailed description of the different analytic options and the output file formats. gPLINK provides can be used to initiate analyses from the five major domains of PLINK commands; the menu options are shown in the figure below:
We will be adding new PLINK commands to this list; this figure represents only those available in the initial version 0.1 release. HINT It is also possible to enter whole PLINK command lines (i.e. bypassing the menus) via gPLINK, which can be useful if the exact PLINK option has not yet been incorporated into gPLINK. In gPLINK, a project corresponds to a folder, either on the local machine or on a remote machine. All output will be written to that folder. Each operation must be assigned a unique fileroot name, which is used to track operations. gPLINK keeps track of the commands run, and which files were used for input and which were created as output, storing this information in a metafile in the project folder.
Local versus remote modesYou can run gPLINK in one of two modes: either local or remote. In local mode, everything resides on the same machine: PLINK, gPLINK, Haploview, the data and all computation. In remote mode, PLINK, the data files and all computation is assumed to reside on a separate Unix/Linux machine, connected to the local machine via SSH networking. The user runs the two Java-based tools gPLINK and Haploview on the local machine, issuing commands to the remote machine that actually does all the work; select results files can be downloaded to the local machine for subsequent viewing, either using Haploview or any other local software. In remote mode, gPLINK will use two project folders: one remote folder where all the original data and results are stored, and one local temporary version. Any file in the temporary folder can be deleteted after the session finishes. If the remote server is also the head node of a cluster, and if jobs are sent to the cluster with a simple prefix on a command line, then gPLINK can also send PLINK jobs off to the cluster to be performed. To utilise remote mode, you need
Starting a new projectThe first step is always to create a local project folder (using the standard operating system, as you would create any other folder/directory). If running in local mode, you will typically want to populate this folder with your datasets. If running in remote mode, this local folder will typically be empty, and it will just be used as a place to store temporary files. On opening gPLINK, you first select which local project folder you'll be using, with the File -> Set project folder option. Next, you indicate whether the project will be local or remote. See the tour for more details on these steps.
Configuring your projectAfter you have opened your project folder, a configuration dialog will pop up. Here you should set the PLINK path to point to your current copy of PLINK (plink.exe in Windows, otherwise plink). If in remote mode, you will be pointing to the copy of PLINK you wish to use on the remote server. If you intend to use Haploview, you need to set the Haploview .jar path to where the .jar file. You need Haploview version 4.0 or later to integrate with gPLINK. The Editor options allows you to pick what command you wish to call when you view input or output files. This is an advanced feature and the defaults should work fine in most cases. Again, see the tour for more details on these steps.
Starting PLINK jobsBy this stage, you have started the project and configured gPLINK. You only need to configure gPLINK once at the start of each project. To initiate a PLINK job, select the appropriate menu option from the PLINK menu. A dialog will be shown, in which you must:
Viewing output filesYou can view the result files by expanding the tree for the desired operation, going to the list of output files and right-clicking on the entry. You will be given a choice to view the file in the default viewer or an alternate viewer. The default viewer depends on the machine type: it will be WordPad, TextEdit or emacs for Windows, Mac and Linux/other systems respectively. The alternate viewer can be set to anything (e.g. Excel) via the Configuration menu option.
Integration with HaploviewIf it is an appropriate file type, and if in the Configuration panel you pointed gPLINK to an instance of Haploview (version 4), then an Open in Haploview option will also appear. For results files, this will bring up the results-viewer panel of Haploview. You can filter and sort results here as well as merging multiple result files together. You can also generate plots of results very easily (which are interactive in the sense that if you hover the mouse over a point, it will tell you which SNP, or individual, the point represents; clicking on the point will take you to the relevant entry in the results table). It is also possible to extract subsets of your whole genome SNP data files for viewing in Haploview (i.e. viewing data rather than results of PLINK runs). Use the Data management -> Generate fileset -> Haploview fileset option for this. Then right-clicking on either the .info or .ped file and selecting to view in Haploview will load the data into Haploview. Note: use the filters to select manageable subsets of the data for viewing in Haploview (i.e. restrict the number of SNPs you wish to include).
Miscellanea and known issuesHow do I kill a PLINK process initiated by gPLINK? When you run a PLINK command from gPLINK a separate PLINK program is started independent of gPLINK. This means that you can close gPLINK and your operation will run to completion. It also means that if you decide to kill your PLINK operation you will need to do so through your operating system (for example kill in Linux or through Task Manager in Windows). Manual Rescan folder option The menu option Rescan folder checks the project folder for files created by PLINK commands. Typically this rescanning is performed automatically, for both local and remote projects, every couple of seconds or so. This option is for the inpatient, therefore. Quirks, known issues In no particular order:
Download and installationYou first need up-to-date versions of Java, PLINK and Haploview on your computer. Please follow all 4 of these steps:
java -jar gPLINK.jarwill start gPLINK (you must be in the directory where gPLINK.jar is, or specify it's location explicitly; likewise, java must be in your path; please ask your system administrator if you have problems with this). Source Code For interested developers the source code can be found here (gPLINK-2.024-src.zip). Note that most users do not need the source code!
ScreenshotsMain interface, tracking a project
Common PLINK analysis options available via pull-down menus
Set analysis parameters, filters and thresholds