Gene reporting tool
The functions listed here provide a quick and easy way to partition,
by gene, any PLINK results file that indexes SNPs by chromosome and
base-pair position.
Basic usage
The basic command to produce a gene-centric report of single SNP results, for example from run1.assoc, is
./plink --gene-report run1.assoc --gene-list glist-hg18
This assumes that the file run1.assoc has a standard header row containing the fields
CHR and BP, which it will if it was created by the PLINK --assoc command.
The original genotype filesets do not need to be present when running this command.
The gene list, glist-hg18, should be a standard text file in the
following format: one row per gene, giving the chromosome, the start and stop
positions (base-pair) and then the gene name, e.g.
7 20140803 20223538 7A5
19 63549983 63556677 A1BG
10 52236330 52315441 A1CF
8 43266741 43337485 A26A1
15 19305252 19336667 A26B1
21 13904368 13935777 A26B3
...
These files are available for download from the resources section of this web-site.
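To make the gene-list layout concrete, the following is a minimal Python sketch of reading such a file. It is not part of PLINK; the names Gene and parse_gene_list are hypothetical, and the sketch assumes whitespace-separated fields and keeps the chromosome as a string so that non-numeric codes would also be accepted.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    chrom: str   # chromosome code, kept as text
    start: int   # start position (base-pair)
    stop: int    # stop position (base-pair)
    name: str    # gene (or region) name


def parse_gene_list(text: str) -> List[Gene]:
    """Parse gene-list rows of the form: chromosome start stop name."""
    genes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or incomplete rows (an assumption of this sketch)
        chrom, start, stop, name = fields[:4]
        genes.append(Gene(chrom, int(start), int(stop), name))
    return genes


# The first three rows of the example gene list shown above
sample = """\
7 20140803 20223538 7A5
19 63549983 63556677 A1BG
10 52236330 52315441 A1CF
"""

genes = parse_gene_list(sample)
for g in genes:
    print(g.name, g.chrom, g.start, g.stop)
```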
This generates a file
plink.range.report
which simply takes the lines of the results file, and lists them by the genes specified in the gene-list file. The listing is
alphabetical by gene name. For example,
ACO2 -- chr22:40195074..40254939 ( 59.865kb )
DIST CHR SNP BP A1 F_A F_U A2 CHISQ P OR
13.22kb 22 rs2267435 40208294 3 0.3958 0.3537 1 0.3351 0.5627 1.197
24.84kb 22 rs2076196 40219909 1 0.3333 0.2683 3 0.8852 0.3468 1.364
57.13kb 22 rs1810460 40252200 4 0.04167 0.07317 2 0.8278 0.3629 0.5507
ADORA2A -- chr22:23153529..23168325 ( 14.796kb )
DIST CHR SNP BP A1 F_A F_U A2 CHISQ P OR
11.14kb 22 rs5760423 23164672 4 0.4592 0.4024 3 0.5854 0.4442 1.261
etc, which shows the lines of run1.assoc split by the genes the SNPs fall in. In this case, the first
gene is ACO2; the location based on glist-hg18 is specified, along with the length. Then the
SNPs within this gene are listed. If genes overlap, then the SNPs will be listed more than once. If a SNP does
not fall within any gene or region specified, then it will not be listed here.
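The grouping described above can be sketched in a few lines of Python. This illustrates the logic only and is not PLINK's implementation: the function group_by_gene, the region REGION_X and the SNP rs_outside are hypothetical, the interval test is assumed to include both end points, and regions without any SNP are assumed to be omitted from the listing.

```python
from collections import defaultdict

# (chromosome, start, stop, name). ACO2 is taken from the report above;
# REGION_X is a hypothetical region that overlaps the end of ACO2.
genes = [
    ("22", 40195074, 40254939, "ACO2"),
    ("22", 40250000, 40260000, "REGION_X"),
]

# (chromosome, base-pair position, SNP id). rs_outside is a made-up SNP
# that falls in neither region.
snps = [
    ("22", 40208294, "rs2267435"),
    ("22", 40252200, "rs1810460"),
    ("22", 50000000, "rs_outside"),
]


def group_by_gene(genes, snps):
    """Map each gene name to the SNPs lying between its start and stop."""
    report = defaultdict(list)
    for chrom, start, stop, name in genes:
        for snp_chrom, bp, snp_id in snps:
            if snp_chrom == chrom and start <= bp <= stop:
                report[name].append(snp_id)
    return dict(report)


report = group_by_gene(genes, snps)
for name in sorted(report):  # alphabetical by gene name
    print(name, report[name])
```

Here rs1810460 appears under both ACO2 and REGION_X because the two regions overlap, while rs_outside is not listed at all.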
A first field, DIST, is added to each line; it gives the distance of the SNP from the start position of the gene. (Note: if
a border is added with --gene-list-border, see below, then DIST can be negative, meaning
that the SNP lies before the actual start of the gene.)
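As a check on how DIST relates to the other columns, here is a small Python sketch that recomputes it for the ACO2 lines shown in the report above. The helper dist_kb is a hypothetical name, and the sketch assumes that 1kb is 1000 base pairs, which is consistent with the values in that report.

```python
def dist_kb(snp_bp: int, gene_start: int) -> float:
    """Distance of a SNP from the gene start in kb (negative if before the start)."""
    return (snp_bp - gene_start) / 1000.0


aco2_start, aco2_stop = 40195074, 40254939

# Gene length as printed in the report header line for ACO2
print("length (kb):", (aco2_stop - aco2_start) / 1000.0)

# BP values of the three ACO2 SNPs in the report
for bp in (40208294, 40219909, 40252200):
    print(bp, dist_kb(bp, aco2_start))

# A hypothetical SNP 5kb before the gene start gives a negative distance
print(dist_kb(aco2_start - 5000, aco2_start))
```

The printed distances can be compared with the DIST column above, where they appear rounded to four significant figures.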
Naturally, the regions listed in the --gene-list file do not have to correspond to actual genes -- for
example, they might correspond to known linkage peaks, or regions with disease-related copy number variants, etc.
Other options
The following options modify this procedure:
--pfilter 0.01
will list only SNPs with p-values less than 0.01. This requires that
the results file has a field labelled P in the header row.
The additional command
--gene-list-border 20
will add a 20kb border to the start and stop of each gene listed in the gene file.
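The effect of the border can be illustrated with a short Python sketch. The function in_region and the SNP position used here are hypothetical, and the sketch assumes that the border simply widens the interval by the given number of kb on each side while DIST is still measured from the original gene start.

```python
def in_region(bp, start, stop, border_kb=0):
    """True if bp lies within [start, stop] widened by border_kb on each side."""
    border_bp = border_kb * 1000
    return start - border_bp <= bp <= stop + border_bp


# ADORA2A coordinates as given in the report above
start, stop = 23153529, 23168325

# Hypothetical SNP position 10kb before the start of the gene
upstream_bp = start - 10000

print(in_region(upstream_bp, start, stop))                # False: outside the gene
print(in_region(upstream_bp, start, stop, border_kb=20))  # True: inside the 20kb border
print((upstream_bp - start) / 1000)                       # -10.0: DIST would be negative
```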
The additional command
--gene-subset candidate.list
will make a report extracting only the genes listed
in candidate.list from the file specified by
--gene-list. For example, if the
file candidate.list contained two schizophrenia candidate
genes,
DISC1
COMT
then (assuming the genes listed here match a row in the gene-list file, glist-hg18)
plink --gene-report run1.assoc \
      --gene-list glist-hg18 \
      --gene-subset candidate.list \
      --pfilter 0.05 \
      --gene-list-border 50
will only report nominally significant (P < 0.05) SNPs within or near
(+/- 50kb) these two genes. This is designed to be a more convenient
way to quickly query a focussed set of genes, so one can keep only a
single, central gene-list file.
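How the three options combine can be sketched in Python as follows. This is an illustration under stated assumptions rather than PLINK's own code: gene_report is a hypothetical function, the gene names, coordinates, SNP ids and p-values are toy values invented for the example, and the p-value test is taken to be strictly less than the threshold.

```python
# Toy regions: name -> (chromosome, start, stop); not real gene coordinates
genes = {
    "GENE_A": ("1", 100_000, 200_000),
    "GENE_B": ("1", 500_000, 600_000),
    "GENE_C": ("2", 100_000, 200_000),
}

# Names that would be listed in the file given to --gene-subset
subset = {"GENE_A", "GENE_C"}

# Toy result rows: (chromosome, base-pair position, SNP id, p-value)
rows = [
    ("1", 60_000, "snp1", 0.010),   # 40kb before GENE_A, inside a 50kb border
    ("1", 150_000, "snp2", 0.200),  # inside GENE_A but fails the p-value filter
    ("1", 550_000, "snp3", 0.001),  # inside GENE_B, which is not in the subset
    ("2", 260_000, "snp4", 0.030),  # 60kb past GENE_C, outside the border
    ("2", 199_000, "snp5", 0.049),  # inside GENE_C and passes the filter
]


def gene_report(rows, genes, subset, pfilter, border_kb):
    """Keep SNPs with p < pfilter lying within border_kb of a gene in subset."""
    border = border_kb * 1000
    report = {}
    for name in sorted(subset):
        if name not in genes:
            continue  # subset names must match a row of the gene list
        chrom, start, stop = genes[name]
        hits = [snp for c, bp, snp, p in rows
                if c == chrom
                and start - border <= bp <= stop + border
                and p < pfilter]
        if hits:
            report[name] = hits
    return report


report = gene_report(rows, genes, subset, pfilter=0.05, border_kb=50)
print(report)
```

Only snp1 and snp5 survive: the other rows fail the p-value filter, fall outside the bordered region, or belong to a gene that is not in the subset.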