SNP annotation database lookup
This page describes how PLINK can output basic annotation information
for SNPs on common whole-genome association study (WGAS) genotyping
platforms, via a web-based lookup function.
The SNP annotation data were compiled by
Patrick Sullivan's lab;
the original data files are available here.
NOTE: All gene names must be HUGO standard gene names. For example, the
serotonin transporter is SLC6A4 (not HTT or SERT).
If you use these annotations in a publication, include the following
sentence and corresponding references:
Using the PLINK retrieval interface, SNP annotations were created using
the TAMAL database (1) based chiefly on UCSC genome browser files (2),
HapMap (3), and dbSNP (4).
1. Hemminger BM, Saelim B, Sullivan PF. TAMAL: An integrated approach to
choosing SNPs for genetic studies of human complex traits.
Bioinformatics 2006;22:626-7.
2. Hinrichs AS, Karolchik D, Baertsch R, Barber GP, Bejerano G, Clawson
H, Diekhans M, Furey TS, Harte RA, Hsu F, Hillman-Jackson J, Kuhn RM,
Pedersen JS, Pohl A, Raney BJ, Rosenbloom KR, Siepel A, Smith KE, Sugnet
CW, Sultan-Qurraie A, Thomas DJ, Trumbower H, Weber RJ, Weirauch M,
Zweig AS, Haussler D, Kent WJ. The UCSC Genome Browser Database: update
2006. Nucleic Acids Res 2006;34:D590-8.
3. Altshuler D, Brooks LD, Chakravarti A, Collins FS, Daly MJ, Donnelly
P. A haplotype map of the human genome. Nature 2005;437:1299-320.
4. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V,
Church DM, DiCuccio M, Edgar R, Federhen S, Geer LY, Helmberg W,
Kapustin Y, Kenton DL, Khovayko O, Lipman DJ, Madden TL, Maglott DR,
Ostell J, Pruitt KD, Schuler GD, Schriml LM, Sequeira E, Sherry ST,
Sirotkin K, Souvorov A, Starchenko G, Suzek TO, Tatusov R, Tatusova TA,
Wagner L, Yaschenko E. Database resources of the National Center for
Biotechnology Information. Nucleic Acids Res 2006;34:D173-80.
Basic usage for SNP lookup function
The basic command is, for example,
plink --lookup rs1475515
which writes the following information to the LOG file:
PLINK-SNP (WGAS SNP annotation courtesy of Patrick Sullivan)
Connecting to web...
SNP ID : rs1475515
Affy ID :
Affy 5.0 : no
Affy 6.0 : no
Perlegen ID :
Perlgen 600 : no
Illumina 650 : yes
Illumina 550 : no
Non-syn SNP : no
SNP Error : no
SNP Pos Duplication : no
Chromosome : 1
Strand : -
HG17 Position (bp) : 228459232
HG18 Position (bp) : 230219120
Pseudo-autosomal region? : N/A
NCBI reference allele : T
UCSC reference allele : A
Observed alleles : C/T
Human alleles : C/T
Predominant human allele : A
Chimp allele : T
Macaque allele : T
dbSNP MAF : 0.038
HapMap CEU MAF : 0
HapMap ASI MAF : 0
HapMap YRI MAF : 0.15
HapMap CEU Strand : -
HapMap CEU Allele : C
HapMap ASI Allele : C
HapMap YRI Allele : C
In gene transcript :
In gene coding region :
Nearby Genes(KB distance) :
Segmental duplication? : no
Copy Number Variant? : no
Conservation >95% pctile? : no
Conservation >99% pctile? : no
Disease-causing region? : no
miRNA target? (TargetScan) : no
miRNA target? (PicTAR) : no
Regulatory potential? : yes
Promotor region? (Stanford) : no
Promotor region? (firstEF) : no
Transfactor binding site : no
Enhancer? : no
Exon? : no
Consensus splice site? : no
5' UTR? : no
3' UTR? : no
------------------------------------------------
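Because each line of the report has the form "field : value", the text can be turned into a lookup table for downstream scripting. The following is only a minimal Python sketch, not part of PLINK: the function name parse_snp_report is hypothetical, and it assumes the report is laid out exactly as shown above, with the field name before the first colon.

```python
def parse_snp_report(text):
    """Parse one 'field : value' report into a dict of stripped strings."""
    fields = {}
    for line in text.splitlines():
        # Lines without a colon (banner, connection message, dashed
        # separator) carry no field and are skipped.
        if ":" not in line:
            continue
        # Split on the first colon only; empty values become "".
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


# A few lines copied from the report above.
sample = """\
PLINK-SNP (WGAS SNP annotation courtesy of Patrick Sullivan)
Connecting to web...
SNP ID : rs1475515
Affy ID :
Chromosome : 1
HG18 Position (bp) : 230219120
HapMap YRI MAF : 0.15
------------------------------------------------
"""

report = parse_snp_report(sample)
print(report["SNP ID"], report["HG18 Position (bp)"])
```

All values are kept as strings; converting positions or allele frequencies to numbers is left to the caller, since some fields can be empty or N/A.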
To look up a batch of SNPs rather than one at a time, use the
command
plink --lookup-list hits.list
where hits.list is just a list of SNP IDs (RS numbers); this will
generate a file
plink.snp.annot
containing multiple reports of the above kind. At most 200 SNPs can
currently be submitted at one time.
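A longer list of hits therefore has to be split into several files before submission. The Python sketch below shows one way to do this, under stated assumptions: the helper name batches, the file names hits.1.list, hits.2.list, ... and the generated SNP IDs are all hypothetical, and the batch size of 200 simply mirrors the limit quoted above.

```python
def batches(snp_ids, size=200):
    """Yield successive lists of at most `size` SNP IDs."""
    for start in range(0, len(snp_ids), size):
        yield snp_ids[start:start + size]


# Hypothetical input: 450 made-up RS numbers standing in for real hits.
all_ids = [f"rs{i}" for i in range(1, 451)]

chunks = list(batches(all_ids))
commands = []
for i, chunk in enumerate(chunks, start=1):
    # Each chunk would be written to its own file, one SNP ID per line,
    # and submitted with a separate --lookup-list run.
    commands.append(f"plink --lookup-list hits.{i}.list")

print([len(c) for c in chunks])  # [200, 200, 50]
print("\n".join(commands))
```

Each run presumably writes to the same plink.snp.annot file, so it is worth renaming the output between runs or giving each run its own output root with PLINK's --out option.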
Gene-based SNP lookup
It is possible to dump all SNPs in a gene with the command
plink --lookup-gene DISC1
which does two things: it writes some gene-centric information to the LOG file, and
it lists all the SNPs that feature on common WGAS platforms in the file
plink.snp.list
By default, SNPs within 20 kb upstream and downstream of the gene are recorded. To change
this window, add the option
--lookup-gene-kb 0
or
--lookup-gene-kb 100
for example.
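The effect of the window size can be sketched with a little arithmetic. The Python snippet below is only an illustration and makes two assumptions that this page does not confirm: that the window is measured from the transcript start and end, and that 1 kb means 1000 bp. The function names are hypothetical; the coordinates are the HG18 values reported on this page for DISC1 and rs1475515.

```python
def lookup_window(tx_start, tx_end, kb=20):
    """Return the (low, high) bp bounds of a gene padded by `kb` kilobases."""
    flank = kb * 1000  # assumption: 1 kb = 1000 bp
    return max(0, tx_start - flank), tx_end + flank


def in_window(pos, tx_start, tx_end, kb=20):
    """True if a SNP position falls inside the padded gene region."""
    low, high = lookup_window(tx_start, tx_end, kb)
    return low <= pos <= high


# HG18 transcript bounds of DISC1 as given in the gene report on this page.
DISC1_START, DISC1_END = 229829236, 229924970

print(lookup_window(DISC1_START, DISC1_END))          # default 20 kb
print(lookup_window(DISC1_START, DISC1_END, kb=0))    # gene only
print(lookup_window(DISC1_START, DISC1_END, kb=100))  # wider region

# rs1475515 (HG18 position 230219120) lies about 294 kb past the
# transcript end, so it falls outside even the 100 kb window.
print(in_window(230219120, DISC1_START, DISC1_END, kb=100))
```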
The information written to the LOG file is strongly biased towards
neuropsychiatrically relevant annotation, reflecting the research
interests of the creator. For example, the output for DISC1 is shown
below. (Note: a few fields are currently relatively redundant or
uninformative and will be removed in future releases.)
Looking up gene information (and SNPs +/- 20 kb)
Connecting to web... Writing SNP details to [ plink.snp.list ]
Gene Name : DISC1
Product : disrupted in schizophrenia 1 isoform Es
Entry : 1
CCDS Name : CCDS31056.1
KG ID : uc001hux.1
SwissProt ID : Q9NRI5-4
Hugo ID : 2888
Hugo alias :
Hugo old gene names :
Has gene name? : no
HG18 strand : +
HG18 chrom : 1
HG18 TX Start : 229829236
HG18 TX End : 229924970
HG18 CDS Start : 229829236
HG18 CDS End : 229924970
HG18 TX Length : 95734
HG18 TX Length Percentile : 96
HG17 strand : -
HG17 chrom : 0
HG17 TX Start : 0
HG17 TX End : 0
HG17 CDS Start : 0
HG17 CDS End : 0
HG17 TX Length : 0
Has HG17 pos : no
mRNA accession numbers : NM_001012958.1 ENST00000317586 OTTHUMT00000092355
Protein accession numbers : NP_001012976.1 ENSP00000320784 OTTHUMP00000035959
Pseudoautosomal HG18 : no
Pseudoautosomal HG17 : no
Brain expressed 50th percentile : yes
Brain expressed 75th percentile : yes
Correlated cortex expression : NA
Correlated lymphoblastoid expression : yes
Number association studies from SZGene : 20
Annotation from SLEP database : ? Schizophrenia [PMID=16033310]/{Schizoaffective
: disorder, susceptibility to}, 181500 (3) [OMIM=605210]
: /{Schizophrenia, susceptibility to}, 604906 (3) [OMIM=605210]
Association studies from GAD database : psych (16)
----------------------------------------------------------
It is possible to supply a list of genes to look up, with the command
plink --lookup-gene-list mygenes.txt
which dumps the SNPs from multiple genes in
SET file format. Here the file
mygenes.txt
is something like
COMT
DISC1
CACNA1C
...
These SNPs can then be extracted with the command
--extract plink.snp.list
because the END lines and gene names are simply ignored when
they do not match SNP IDs in the MAP file.
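To work with the per-gene groupings rather than a flat SNP list, the SET-format output can be read back into a dictionary. The Python sketch below assumes the usual SET layout (a set name, then one SNP ID per line, then END); the function name parse_set_file is hypothetical, and the SNP IDs rsA, rsB and rsC are placeholders, not real annotations for these genes.

```python
def parse_set_file(lines):
    """Map each set (gene) name to the list of SNP IDs that follow it."""
    sets = {}
    current = None
    for raw in lines:
        token = raw.strip()
        if not token:
            continue  # ignore blank lines between sets
        if current is None:
            # The first token after END (or at the start) names a new set.
            current = token
            sets[current] = []
        elif token == "END":
            current = None
        else:
            sets[current].append(token)
    return sets


# Placeholder content in the assumed layout of plink.snp.list.
example = """\
COMT
rsA
rsB
END

DISC1
rsC
END
"""

gene_snps = parse_set_file(example.splitlines())
print(gene_snps)
```

With a real file, the same function could be called as parse_set_file(open("plink.snp.list")), since iterating over a file yields its lines.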
Description of the annotation information
For a detailed description of the annotation fields and how they were compiled,
please see Patrick Sullivan's PDF.