PLINK: Whole genome data analysis toolset
Last original PLINK release is v1.07 (10-Oct-2009); PLINK 1.9 is now available for beta-testing

Whole genome association analysis toolset


Gene reporting tool

The functions listed here provide a quick and easy way to partition, by gene, any PLINK results file that indexes SNPs by chromosome and base-pair position.

Basic usage

The basic command to produce a gene-centric report of single SNP results, for example from run1.assoc, is
./plink --gene-report run1.assoc --gene-list glist-hg18

which assumes the file run1.assoc will have a standard header row containing the fields CHR and BP, which it will if it was created by the PLINK --assoc command previously. It is not necessary that the original genotype filesets be present when running this command.
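
As a quick sanity check before running the report, one can confirm that a results file has the required header fields. The following is a minimal Python sketch, not part of PLINK; the function name has_required_fields is hypothetical, and the header shown is the one from the example .assoc output on this page, assumed to be whitespace-delimited.

```python
from typing import Sequence

def has_required_fields(header_line: str,
                        required: Sequence[str] = ("CHR", "BP")) -> bool:
    """Return True if every required field name appears in the header row."""
    fields = header_line.split()  # assumes whitespace-delimited columns
    return all(name in fields for name in required)

# Header row of a results file produced by --assoc
header = " CHR        SNP         BP   A1      F_A      F_U   A2        CHISQ            P           OR "
print(has_required_fields(header))
```

The same helper can be called with a different `required` tuple if other columns are needed.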

The gene list, glist-hg18, should be a standard text file in the following format: one row per gene, giving the chromosome, start and stop positions (base-pair) and then the gene name, e.g.
     7 20140803 20223538 7A5
     19 63549983 63556677 A1BG
     10 52236330 52315441 A1CF
     8 43266741 43337485 A26A1
     15 19305252 19336667 A26B1
     21 13904368 13935777 A26B3
     ...
These files are available for download from the resources section of this web-site.
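
To make the gene-list layout concrete, here is a minimal Python sketch that reads rows in this format into simple records. It is an illustration only and not how PLINK itself parses the file: the names Gene and read_gene_list are hypothetical, fields are assumed to be whitespace-delimited, and the chromosome is kept as a string in case non-numeric codes are used.

```python
from typing import Iterable, List, NamedTuple

class Gene(NamedTuple):
    chrom: str   # chromosome code, kept as text
    start: int   # start position (base-pair)
    stop: int    # stop position (base-pair)
    name: str    # gene (or region) name

def read_gene_list(lines: Iterable[str]) -> List[Gene]:
    """Parse 'chromosome start stop name' rows, skipping incomplete rows."""
    genes = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or incomplete row
        chrom, start, stop, name = fields[:4]
        genes.append(Gene(chrom, int(start), int(stop), name))
    return genes

# Two rows taken from the example gene list above
example = """\
     7 20140803 20223538 7A5
     19 63549983 63556677 A1BG
"""
genes = read_gene_list(example.splitlines())
print(genes[1].name, genes[1].stop - genes[1].start)
```

For a real file, the lines could be supplied by iterating over an open file handle instead of the inline string.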

The --gene-report command generates a file
      plink.range.report
which simply takes the lines of the results file, and lists them by the genes specified in the gene-list file. The listing is alphabetical by gene name. For example,

     ACO2 -- chr22:40195074..40254939 ( 59.865kb ) 

           DIST  CHR        SNP         BP   A1      F_A      F_U   A2        CHISQ            P           OR 
        13.22kb   22  rs2267435   40208294    3   0.3958   0.3537    1       0.3351       0.5627        1.197 
        24.84kb   22  rs2076196   40219909    1   0.3333   0.2683    3       0.8852       0.3468        1.364 
        57.13kb   22  rs1810460   40252200    4  0.04167  0.07317    2       0.8278       0.3629       0.5507 


     ADORA2A -- chr22:23153529..23168325 ( 14.796kb ) 

           DIST  CHR        SNP         BP   A1      F_A      F_U   A2        CHISQ            P           OR 
        11.14kb   22  rs5760423   23164672    4   0.4592   0.4024    3       0.5854       0.4442        1.261 

and so on, which shows the lines of run1.assoc split by the genes the SNPs fall in. In this case, the first gene is ACO2; its location based on glist-hg18 is given, along with its length. Then the SNPs within this gene are listed. If genes overlap, then SNPs in the overlapping region will be listed more than once. If a SNP does not fall within any gene or region specified, then it will not be listed here.

An extra first field, DIST, is added to each line; it gives the distance of the SNP from the start position of the gene, in kb. (Note: if a border is added with --gene-list-border, see below, then DIST can be negative, i.e. the SNP lies before the actual start of the gene.)
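
The values in the example report are consistent with DIST being the SNP position minus the gene start, expressed in kb, and with the gene length being stop minus start. The Python sketch below works through that arithmetic; it is inferred from the sample output rather than taken from PLINK's source, and the function names dist_kb and length_kb are hypothetical.

```python
def dist_kb(snp_bp: int, gene_start: int) -> float:
    """Distance in kb from the gene start; negative if the SNP lies before it."""
    return (snp_bp - gene_start) / 1000.0

def length_kb(gene_start: int, gene_stop: int) -> float:
    """Gene length in kb, taken as stop minus start."""
    return (gene_stop - gene_start) / 1000.0

# ACO2 coordinates and the position of rs2267435, from the example report
aco2_start, aco2_stop = 40195074, 40254939
print(length_kb(aco2_start, aco2_stop))
print(dist_kb(40208294, aco2_start))

# A made-up position 5kb before the gene start gives a negative distance
print(dist_kb(aco2_start - 5000, aco2_start))
```

The first two values correspond to the 59.865kb length and the 13.22kb DIST shown for ACO2 and rs2267435 above.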

Naturally, the regions listed in the --gene-list file do not have to correspond to actual genes -- for example, they might correspond to known linkage peaks, or regions with disease-related copy number variants, etc.

Other options

The following options modify this procedure:
     --pfilter 0.01
will list only SNPs with p-values less than 0.01. This requires that the results file has a field labelled P in the header row.

The additional command
     --gene-list-border 20
will add a 20kb border to the start and stop of each gene listed in the gene file.

The additional command
     --gene-subset candidate.list
will make a report extracting only the genes listed in candidate.list from the file specified by --gene-list. For example, if the file candidate.list contained two schizophrenia candidate genes,
     DISC1
     COMT
then (assuming the genes listed here match a row in the gene-list file, glist-hg18)
plink --gene-report run1.assoc 
      --gene-list glist-hg18 
      --gene-subset candidate.list 
      --pfilter 0.05 
      --gene-list-border 50

will only report nominally significant (P < 0.05) SNPs within or near (+/- 50kb) these two genes. This is designed to be a more convenient way to quickly query a focussed set of genes, so one can keep only a single, central gene-list file.
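
How these options combine can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not PLINK's implementation: the function gene_report and its parameters are hypothetical, region boundaries are assumed to be inclusive, the p-value filter is assumed to be a strict "less than", and the SNP named snp_upstream is made up to show the effect of a border. The gene coordinates and the other SNPs come from the example report above.

```python
from typing import Dict, List, Optional, Set, Tuple

Gene = Tuple[str, int, int, str]    # (chromosome, start, stop, name)
Snp = Tuple[str, str, int, float]   # (chromosome, SNP id, base-pair position, P)

def gene_report(snps: List[Snp], genes: List[Gene], border_kb: int = 0,
                pfilter: Optional[float] = None,
                subset: Optional[Set[str]] = None) -> Dict[str, List[Tuple[str, float]]]:
    """Group SNPs by gene; each hit is (SNP id, distance from gene start in kb)."""
    border_bp = border_kb * 1000
    report: Dict[str, List[Tuple[str, float]]] = {}
    for chrom, start, stop, name in sorted(genes, key=lambda g: g[3]):  # alphabetical
        if subset is not None and name not in subset:
            continue  # gene not requested
        hits = []
        for s_chrom, snp_id, bp, p in snps:
            if s_chrom != chrom:
                continue
            if not (start - border_bp <= bp <= stop + border_bp):
                continue  # outside the gene plus border (inclusive bounds assumed)
            if pfilter is not None and not p < pfilter:
                continue  # fails the p-value threshold
            # distance is measured from the actual gene start, so it can be negative
            hits.append((snp_id, (bp - start) / 1000.0))
        report[name] = hits  # may be empty; this sketch does not model how PLINK prints that
    return report

genes = [("22", 40195074, 40254939, "ACO2"),
         ("22", 23153529, 23168325, "ADORA2A")]
snps = [("22", "snp_upstream", 40185074, 0.01),   # made-up SNP, 10kb before ACO2
        ("22", "rs2267435", 40208294, 0.5627),
        ("22", "rs2076196", 40219909, 0.3468),
        ("22", "rs1810460", 40252200, 0.3629),
        ("22", "rs5760423", 23164672, 0.4442)]

report = gene_report(snps, genes, border_kb=20, pfilter=0.4, subset={"ACO2"})
for name, hits in report.items():
    print(name, hits)
```

Because a SNP is tested against every gene in turn, a SNP lying in two overlapping regions would appear under both, in line with the behaviour described for the basic report.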

 
This document last modified Wednesday, 25-Jan-2017 11:39:28 EST