PLINK: Whole genome data analysis toolset
Last original PLINK release is v1.07 (10-Oct-2009); PLINK 1.9 is now available for beta-testing

Whole genome association analysis toolset

SNP scoring routine

PLINK provides a simple means to generate scores or profiles for individuals based on an allelic scoring system involving one or more SNPs. One potential use would be to assign a single quantitative index of genetic load, perhaps to build multi-SNP prediction models, or just as a quick way to identify a list of individuals containing one or more of a set of variants of interest.

Basic usage

The basic command to generate a score is the --score option, e.g.
./plink --bfile mydata --score myprofile.raw

which takes as a parameter the name of a file (here myprofile.raw) that describes the scoring system. This file contains one or more lines, each with exactly three fields:
     SNP ID
     Reference allele
     Score (numeric)
for example:
     SNPA   A    1.95
     SNPB   C    2.04
     SNPC   C   -0.98
     SNPD   C   -0.24
The scores can be based on any measure you choose; one choice might be the log of the odds ratio for significantly associated SNPs. Running the command above would then generate a file
     plink.profile
with one individual per row and the fields:
     FID     Family ID
     IID     Individual ID
     PHENO   Phenotype for that individual
     CNT     Number of non-missing SNPs used for scoring
     CNT2    Number of named alleles
     SCORE   Total score for that individual
The score is simply the sum across SNPs of the number of reference alleles (0, 1 or 2) at each SNP multiplied by the score for that SNP. For example:
     Variant(1/2)          A/T         C/G         A/C        C/G   
     Freq. of allele 1     0.20        0.43        0.02       0.38

     Ind 1 genotype        A/A         G/G         A/C        0/0
     # ref alleles          2           0           1         2*0.38 (=expectation)

     Score           (  2*1.95    +   0*2.04  +  1*(-0.98) +  2*0.38*(-0.24) ) / 4
                    =    2.74 / 4   =  0.68
The score 2.74/4 (the average score per non-missing SNP) could then be used, e.g. as a covariate, or as a predictor of disease if it is scored in a sample that is independent of the one used to generate the original scoring weights. Obviously, a score profile based on some effect size measure from a large number of SNPs will necessarily be highly correlated with the phenotype in the original sample; that is, it provides no (straightforward) additional statistical evidence for associations in that sample.
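
The arithmetic in the worked example can be reproduced with a short Python sketch. This is only an illustration of the calculation described above, not PLINK's own code: the function name profile_score and the dictionary layout are hypothetical, and the reference-allele frequencies are read off the example table (for SNPC the reference allele C is allele 2, so its frequency is taken as 1 - 0.02).

```python
# Scoring weights from the example profile: SNP -> (reference allele, score)
weights = {
    "SNPA": ("A", 1.95),
    "SNPB": ("C", 2.04),
    "SNPC": ("C", -0.98),
    "SNPD": ("C", -0.24),
}

# Sample frequency of each SNP's reference allele (from the example table)
ref_freq = {"SNPA": 0.20, "SNPB": 0.43, "SNPC": 0.98, "SNPD": 0.38}

# Genotypes of individual 1; None stands for a missing genotype (0/0)
genotypes = {
    "SNPA": ("A", "A"),
    "SNPB": ("G", "G"),
    "SNPC": ("A", "C"),
    "SNPD": None,
}


def profile_score(genotypes, weights, ref_freq):
    """Average weighted reference-allele count, imputing missing genotypes."""
    total = 0.0
    for snp, (ref_allele, weight) in weights.items():
        genotype = genotypes.get(snp)
        if genotype is None:
            # Missing genotype: use the expected reference-allele count
            count = 2 * ref_freq[snp]
        else:
            count = genotype.count(ref_allele)
        total += count * weight
    # Divide by the number of scored SNPs, as in the worked example
    return total / len(weights)


score = profile_score(genotypes, weights, ref_freq)
print(round(score, 2))  # 0.68
```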

Multiple scores from SNP subsets

To calculate multiple scores from subsets of SNPs in a single --score file, use the following two options, each followed by a filename, e.g.
     --q-score-file snpval.dat
     --q-score-range q.ranges
in addition to --score, where snpval.dat is a file that contains a number for each SNP (e.g. the p-value from some test)
     rs00001  0.234
     rs00002  0.046
     rs00003  0.887
     ...
and q.ranges is a file in which each row corresponds to a different score and contains a label followed by the lower and upper bounds for the values given in snpval.dat, e.g.
     S1  0.00 0.01
     S2  0.00 0.20
     S3  0.10 0.50
would create three score files,
     plink.S1.profile
     plink.S2.profile
     plink.S3.profile
in which the first uses only SNPs that have a value in snpval.dat between 0.00 and 0.01; the second uses only SNPs that have a value between 0.00 and 0.20, etc.
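
As a rough illustration of how the ranges select SNPs, the following Python sketch applies the three example ranges to the three example values. The function name snps_per_range is hypothetical, and treating both bounds as inclusive is an assumption; the text above only says "between".

```python
# Per-SNP values, as in the example snpval.dat
snp_values = {"rs00001": 0.234, "rs00002": 0.046, "rs00003": 0.887}

# (label, lower bound, upper bound), as in the example q.ranges
ranges = [("S1", 0.00, 0.01), ("S2", 0.00, 0.20), ("S3", 0.10, 0.50)]


def snps_per_range(snp_values, ranges):
    """Map each range label to the SNPs whose value lies within its bounds.

    Bounds are treated as inclusive here, which is an assumption.
    """
    return {
        label: [snp for snp, value in snp_values.items() if lower <= value <= upper]
        for label, lower, upper in ranges
    }


subsets = snps_per_range(snp_values, ranges)
for label, snps in subsets.items():
    print(label, snps)
```

With these three values, S1 selects no SNPs, S2 selects rs00002 and S3 selects rs00001; each subset would then be scored separately.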

Misc. options

By default, if the genotype for a SNP in the score file is missing for a particular individual, the expected value is imputed, i.e. based on the sample allele frequency. To change this behavior, add the flag
     --score-no-mean-imputation
which means the above example would be calculated as
     Score           (  2*1.95    +   0*2.04  +  1*(-0.98)  ) / 3
                    =    2.92 / 3   =  0.97
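
The same calculation without mean imputation can be sketched in Python as follows. It only mirrors the arithmetic shown above and is not taken from PLINK itself; the function name score_without_imputation and the use of None for a missing genotype are illustrative choices.

```python
# Per-SNP scores from the example profile (SNPA, SNPB, SNPC, SNPD)
weights = [1.95, 2.04, -0.98, -0.24]

# Reference-allele counts for individual 1; None marks the missing genotype
ref_counts = [2, 0, 1, None]


def score_without_imputation(ref_counts, weights):
    """Average weighted allele count over SNPs with non-missing genotypes."""
    observed = [(c, w) for c, w in zip(ref_counts, weights) if c is not None]
    if not observed:
        return None  # no genotype available, so no score can be computed
    return sum(c * w for c, w in observed) / len(observed)


score = score_without_imputation(ref_counts, weights)
print(round(score, 2))  # 0.97
```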
 
This document last modified Wednesday, 25-Jan-2017 11:39:27 EST