Rare variant burden tests

This page describes the methods currently in PLINK to evaluate genes or sets of SNPs for association with disease, with a focus on the burden of rare and low-frequency mutations as assayed by resequencing studies.

The primary method, the variable-threshold test, is described in Price et al (American Journal of Human Genetics, 2010). Please refer to that manuscript for methodological details, and cite that manuscript as well as PLINK if you use this test in a publication.

Variable threshold test

This test is appropriate for samples of unrelated individuals, to assess association between multiple rare and low-frequency variants in one or more genes (or, more generically, any set of variants) and a dichotomous or quantitative phenotype.

For each set tested, the threshold for including variants in the test (e.g. all variants below 1% minor allele frequency) is automatically optimised. False positive rates are controlled by a permutation procedure that repeats the same optimisation for each permuted dataset. Weights for each variant (e.g. from PolyPhen2) can be included. Other covariates are not currently supported.

To perform a variable-threshold burden test, the basic command is --vt-test, along with the number of permutations to be performed:

    plink --bfile mydata --vt-test --mperm 10000

As with all PLINK commands, the genotype data can be loaded into PLINK from a variety of formats: in this example we've loaded a binary fileset (--bfile), but any other supported input format could be used.

NOTE The --mperm command, which specifies the number of permutations to be applied, is required for this test.

This analysis produces three output files. The file

    plink.vt

has the fields (for a quantitative trait):

    SET     Name of the set (or ALL if no sets specified)
    NSNP    Total number of SNPs in the set
    TC      Threshold for inclusion, number of minor alleles
    TF      Threshold for inclusion, as minor allele frequency
    NSNP2   Number of SNPs included in the test (i.e. below threshold)
    CNT1    Number of individuals with at least 1 included rare variant
    CNT0    Number of individuals with no included rare variants
    MEAN1   Phenotypic mean for individuals with 1 or more rare variants
    MEAN0   Phenotypic mean for individuals with no included rare variants

For a case/control outcome, the file lists CNTA and CNTU instead, which are the numbers of alleles from included variants in cases and controls respectively.

For each set, the variants included in the test (i.e. passing the frequency threshold) are listed in the file

    plink.vt.var

which contains the fields

    SET     Name of set
    SNP     Name of each included SNP
    WGT     Weight (if any) for this variant
    CNT     Number of minor alleles observed in sample
    F       Minor allele frequency of this variant
    ATTRIB  Any attributes specified (see below)

Finally, the file

    plink.vt.mperm

contains the fields

    SET     Set name
    EMP1    Empirical p-value for this set

NOTE By default, the test is 1-sided and assumes that a risk allele increases a quantitative score (or increases risk for disease). The command --vt-test-low will reverse this.

HINT As with other PLINK tests, for stratified samples the --within command (which takes a cluster file) can be used to constrain the permutation (swapping of phenotypes) to individuals within the same cluster, thereby preserving any between-cluster association with phenotype and/or genotype.

Defining sets

Sets define the groups of SNPs to which the test is applied, which will commonly be genes or groups of genes. Any of PLINK's set-definition options (described elsewhere in the PLINK documentation) can be used, e.g.

    --set myset.set

where myset.set is a file

    GENE1
    snp1
    snp2
    END

    GENE2
    snp2
    snp3
    snp4
    END

or, alternatively,

    --make-set myset.dat

where each line of myset.dat contains four fields:

    CHR BP1 BP2 GENE-NAME

This second form can be combined with --make-set-border to specify a kb interval around each gene to be included.
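To make the layout of a --set file concrete, the following is a minimal Python sketch of how such a file could be read into a mapping from set name to SNP list. It is not PLINK's own parser: the function name parse_set_file is hypothetical, and the sketch assumes the file is simply whitespace-delimited, with each set name followed by its SNPs and closed by the END keyword, as in the example above.

```python
def parse_set_file(text):
    """Parse set definitions of the form: NAME snp ... END (whitespace-delimited).

    Hypothetical helper for illustration only; assumes every set is closed
    by the keyword END and that set names never equal "END".
    """
    sets = {}
    current = None  # name of the set currently being read, if any
    for token in text.split():
        if current is None:
            # First token after an END (or at file start) names a new set
            current = token
            sets[current] = []
        elif token == "END":
            current = None
        else:
            sets[current].append(token)
    if current is not None:
        raise ValueError(f"set {current!r} is missing its END marker")
    return sets


example = "GENE1 snp1 snp2 END GENE2 snp2 snp3 snp4 END"
sets = parse_set_file(example)
for name, snps in sets.items():
    print(name, len(snps), " ".join(snps))
```

Note that a SNP (here snp2) may belong to more than one set; each set is then tested separately.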
If no sets are specified, then all SNPs in the file will be included in the test; otherwise, the test will be performed separately for each set.

Variant weights

Weights can be applied to each variant, for example, to represent the probability that a missense variant has a deleterious impact on protein function.

    plink --bfile mydata --vt-test --mperm 10000 --weights myweights.txt

We assume weights are coded on a scale of 0 to 1, with a lower number meaning a lower weight. Weights will be censored at these values. If the --weights command is included, then variants in the dataset but not listed in the weight file will be assigned a default weight of 0. The weights file is a text file with exactly two entries per line:

    SNP WEIGHT

If the command --ppweights is used instead of --weights, the behavior is identical except that PLINK will assume these are weights from PolyPhen2 and apply an adjustment to the weight for common SNPs (above 1% MAF), by setting it equal to 0.5 if the weight is less than 1.0.

Attributes

Optionally, attributes can be used to filter the dataset (described elsewhere in the PLINK documentation), or to append information to the results for the variable-threshold test. Specifically:

    --attrib {file}                   Include attrib info as column in plink.vt.var
    --filter-attrib {file} {attrib}   Pre-filter on attributes, e.g. only missense SNPs

Attribute files should have one SNP per line, starting with the SNP ID and followed by a list of whitespace-delimited attributes:

    SNP attrib1 attrib2 ...

For example, with the file mysnps.txt

    rs00001 missense
    rs00002 missense
    rs00003 synon
    rs00005 nonsense

the command

    plink --bfile mydata --vt-test --mperm 10000 --attrib mysnps.txt

would append this information, where present, to the plink.vt.var file:

    SET  SNP      WGT  CNT  F          ATTRIB
    ALL  rs00001  1    2    0.0002877  missense
    ALL  rs00002  1    1    0.0001438  missense
    ALL  rs00003  1    1    0.0001438  synon
    ALL  rs00004  1    1    0.0001438
    ALL  rs00005  1    3    0.0004315  nonsense
    ALL  rs00006  1    10   0.001438
    ...

whereas adding the option

    --filter-attrib mysnps.txt missense

would include only SNPs with the missense attribute in the analysis. Attributes are defined by the user rather than being hard-coded into PLINK, and so could represent any type of meta-information or be coded differently (e.g. the labels Mis, M, etc. could be used instead of, or as well as, missense).

An example of alternative input formats

Here we illustrate an alternate input format for the rare variant tests (it is also applicable to any PLINK analysis): the case in which one only has a list of minor/non-reference allele counts, for example, where each line of a file has the fields:

    FID IID SNP-ID ALLELE-COUNT

Here, one could use the following:

    ./plink --lfile data1 --reference data1.ref --allele-count { etc ... }

where we expect a FAM, MAP, LGEN and reference file, as described elsewhere in the PLINK documentation.

Other rare-variant burden tests

Two other tests that are similar to the variable-threshold test are also available in this same framework. If, instead of --vt-test, you specify

    --fw-test

then PLINK will perform the frequency-weighted Madsen/Browning test. If you instead specify

    --rv-test 0.01

then PLINK will perform a fixed-threshold rare-variant test: in this example, all minor alleles with sample frequency below 1% would be included.
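The difference between the fixed-threshold test and the variable-threshold test can be illustrated with a small Python sketch. This is a simplified illustration, not PLINK's implementation and not the exact statistic of Price et al: the burden score, the function names, and the toy data are all assumptions made for the example. It only shows the general idea that a fixed threshold uses one frequency cut-off, the variable-threshold approach takes the best score over all candidate cut-offs, and the same procedure is repeated for each permuted phenotype to obtain a 1-sided empirical p-value.

```python
import math
import random


def burden_z(genotypes, phenotype, included):
    """Simplified burden score (an assumption, not PLINK's statistic): sum of
    burden * centred phenotype, scaled by the root sum of squared burdens."""
    mean_y = sum(phenotype) / len(phenotype)
    burden = [sum(row[j] for j in included) for row in genotypes]
    denom = math.sqrt(sum(b * b for b in burden))
    if denom == 0:
        return 0.0
    return sum(b * (y - mean_y) for b, y in zip(burden, phenotype)) / denom


def minor_allele_freqs(genotypes):
    """Frequencies, assuming genotypes are coded as minor-allele counts 0/1/2."""
    n = len(genotypes)
    return [sum(row[j] for row in genotypes) / (2 * n)
            for j in range(len(genotypes[0]))]


def fixed_threshold_stat(genotypes, phenotype, mafs, threshold):
    """One cut-off: include variants with frequency strictly below it."""
    included = [j for j, f in enumerate(mafs) if f < threshold]
    return burden_z(genotypes, phenotype, included)


def variable_threshold_stat(genotypes, phenotype, mafs):
    """Try each observed frequency as a cut-off and keep the best score."""
    best = -math.inf
    for t in sorted(set(mafs)):
        included = [j for j, f in enumerate(mafs) if f <= t]
        best = max(best, burden_z(genotypes, phenotype, included))
    return best


def empirical_p(stat_fn, genotypes, phenotype, n_perm, seed=0):
    """1-sided permutation p-value; the statistic (including any threshold
    optimisation) is recomputed for every shuffled phenotype."""
    rng = random.Random(seed)
    observed = stat_fn(genotypes, phenotype)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if stat_fn(genotypes, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: 20 individuals, 3 variants (two rare, one common).
n = 20
genotypes = [[0, 0, 0] for _ in range(n)]
genotypes[0][0] = 1                      # variant 0: one carrier
genotypes[1][1] = genotypes[2][1] = 1    # variant 1: two carriers
for i in range(3, 13):
    genotypes[i][2] = 1                  # variant 2: ten carriers (common)
phenotype = [3.0] * 3 + [0.0] * 17       # rare-variant carriers score higher

mafs = minor_allele_freqs(genotypes)
vt_stat = variable_threshold_stat(genotypes, phenotype, mafs)
fixed_stat = fixed_threshold_stat(genotypes, phenotype, mafs, 0.1)
p_vt = empirical_p(lambda g, y: variable_threshold_stat(g, y, mafs),
                   genotypes, phenotype, n_perm=200)
print(vt_stat, fixed_stat, p_vt)
```

Because the threshold search is repeated inside every permutation, the empirical p-value already accounts for the optimisation over thresholds, which is the point of the permutation procedure described earlier on this page.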
The --weights (or --ppweights) command can be combined with both of these tests. For the frequency-weighted test, the weight for each variant becomes the product of the frequency weight and the user-specified weight. The --vt-test-low command also applies to these two tests. Currently, these commands produce only an empirical p-value output file, named either plink.fw.mperm or plink.rv.mperm.
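As a rough illustration of how a frequency weight and a user-specified weight might be combined, the Python sketch below multiplies the two, censors user weights to the 0 to 1 range, and gives unlisted variants a weight of 0, following the rules stated on this page. The frequency-weight formula itself is an assumption: it uses the commonly cited Madsen and Browning form (allele frequency estimated in unaffected individuals with a pseudocount), which may differ in detail from what PLINK computes. All function and variable names are hypothetical.

```python
import math


def madsen_browning_weight(minor_alleles_in_controls, n_controls, n_total):
    """Frequency weight in the commonly cited Madsen/Browning form (assumed,
    not taken from PLINK): rarer variants receive larger weights."""
    q = (minor_alleles_in_controls + 1) / (2 * n_controls + 2)
    return 1.0 / math.sqrt(n_total * q * (1.0 - q))


def censor(weight):
    """Clamp a user-supplied weight to the 0..1 scale."""
    return min(1.0, max(0.0, weight))


def combined_weights(variant_ids, freq_weights, user_weights):
    """Product of frequency weight and user weight; variants missing from
    the user weight file default to 0."""
    return {
        v: freq_weights[v] * censor(user_weights.get(v, 0.0))
        for v in variant_ids
    }


# Toy example: 100 controls out of 200 individuals in total.
freq = {
    "rs00001": madsen_browning_weight(0, 100, 200),  # unseen in controls
    "rs00002": madsen_browning_weight(9, 100, 200),  # more frequent
}
user = {"rs00001": 1.4}  # out-of-range value, censored to 1.0
combined = combined_weights(["rs00001", "rs00002"], freq, user)
for snp, weight in combined.items():
    print(snp, weight)
```

In this toy example the second variant drops out of the weighted score entirely because it has no entry in the user weight mapping, mirroring the default weight of 0 described under Variant weights.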