Genetic Power Calculator
S. Purcell & P. Sham, 2001-2009
This site provides automated power analysis for variance components
(VC) quantitative trait locus (QTL) linkage and association tests in
sibships, and other common tests. Suggestions, comments, etc to Shaun Purcell.
If you use this site, please reference the following Bioinformatics
article:
Purcell S, Cherny SS, Sham PC. (2003) Genetic Power Calculator:
design of linkage and association genetic mapping studies of complex
traits. Bioinformatics, 19(1):149-150.
Instructions for power calculations
VC model calculations are based upon formulae derived in
Sham et al (2000)
[AJHG, 66, 1616-1630]. Users of this site who are unsure of the
nature of the VC tests and power calculations are strongly
advised to consult this article.
A genetic model for a single diallelic QTL is specified in terms of:
- Variance components: For the linkage test, four variance
components are entered for additive and dominant QTL components, and
shared and nonshared residual components. These are entered as
absolute variances (i.e. they don't need to be entered as
proportions). For the association test, the total proportion
of trait variance attributable to QTL is given (i.e. between 0 and 1).
- Recombination fraction: For tests that involve families, the
recombination fraction between the marker and QTL is sometimes
required. This ranges from 0 (no recombination) to 0.5 (unlinked
loci).
- Sample Size: Sample size can range from 2 to
10,000,000.
- Sibship Size: For linkage, sibship size can range from 2 to 8. For
association, it can range from 1 (in which case the within-sibship
test is undefined) to 5.
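The relationship between absolute variance components and proportions can be sketched as follows. This is a minimal illustration, not the site's own code: the function name and the example values are hypothetical, and the implied sibling correlation uses the standard expectation that full siblings share half the additive variance, a quarter of the dominance variance and all of the shared residual variance.

```python
def summarize_variance_components(va, vd, vs, vn):
    """Convert absolute variance components to proportions of the total.

    va, vd: additive and dominance QTL variance
    vs, vn: shared and nonshared residual variance
    (hypothetical helper; the names are not taken from the site)
    """
    total = va + vd + vs + vn
    if total <= 0:
        raise ValueError("total variance must be positive")
    proportions = {
        "additive_qtl": va / total,
        "dominance_qtl": vd / total,
        "shared_residual": vs / total,
        "nonshared_residual": vn / total,
    }
    # Assumed standard full-sibling covariance: VA/2 + VD/4 + VS.
    sib_corr = (0.5 * va + 0.25 * vd + vs) / total
    return proportions, sib_corr


# Illustrative values only: any positive scale works, since the
# components are absolute variances rather than proportions.
props, sib_corr = summarize_variance_components(va=5.0, vd=0.0, vs=20.0, vn=75.0)
print(props["additive_qtl"], sib_corr)
```

Multiplying all four inputs by the same constant leaves the proportions and the implied correlation unchanged, which is why the components need not sum to 1.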
In addition, the VC association test requires some extra parameters:
- Dominance-to-additive effects ratio: This also ranges between 0 and 1:
0 represents no dominance (i.e. all QTL effects due to the
additive effects of alleles), 1 represents complete dominance, and 0.5
represents an equal contribution to the trait variance of both additive
and dominance effects. The default is to test for dominance
(a 2 df test). If the 'no dominance' box is checked, then
the d-to-a ratio is set to zero and an additive effects
only (1 df) test is used to calculate the power.
- Increaser allele frequency: The frequency of the trait increasing
allele is specified, 0 < p < 1. We assume biallelic loci.
- Marker allele '1' frequency: The frequency of the marker allele which
is in disequilibrium with the trait increasing allele (0 < m1 < 1,
biallelic marker).
- Linkage disequilibrium (LD): The amount of LD between the
QTL and the marker is specified as D-prime (0 <= d <= 1). A value
of 0 indicates that the two loci are in complete equilibrium, whereas
1 indicates that the highest possible amount of disequilibrium is
present (this amount depends on the relative allele frequencies of QTL
and marker - i.e. the two loci could never be perfectly correlated if
the allele frequencies are different at QTL and marker). D-prime is
related to other common measures of LD, such as r-squared (the squared
correlation between alleles at the two loci -- see the abovementioned
article for further detail).
- Sibling correlation: Specifying the sibling correlation
(0 < r < 1)
determines the relative balance of shared and nonshared residual variance
for sibships. Note that invalid results might be obtained if the sibling
correlation is specified to be less than half the QTL variance (i.e.
assuming d:a = 0, the sibling correlation must be at least half
the variance due to the additive effects of the QTL).
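The link between D-prime, the allele frequencies and r-squared can be illustrated with a short sketch. It assumes the usual definitions for two biallelic loci with positive disequilibrium: D = D-prime x Dmax, Dmax = min(p(1 - m1), (1 - p)m1) and r-squared = D^2 / (p(1 - p)m1(1 - m1)). The function name is hypothetical and this is not the site's own code.

```python
def ld_from_dprime(p, m1, dprime):
    """Return (D, r-squared, haplotype frequencies) for two biallelic loci.

    p      : frequency of the trait-increasing QTL allele 'A'
    m1     : frequency of marker allele '1' (in positive LD with 'A')
    dprime : D-prime, between 0 and 1
    """
    if not (0 < p < 1 and 0 < m1 < 1 and 0 <= dprime <= 1):
        raise ValueError("need 0 < p < 1, 0 < m1 < 1, 0 <= dprime <= 1")
    # Largest positive D that keeps every haplotype frequency non-negative.
    d_max = min(p * (1 - m1), (1 - p) * m1)
    d = dprime * d_max
    r2 = d * d / (p * (1 - p) * m1 * (1 - m1))
    haplotypes = {
        "A1": p * m1 + d,
        "A2": p * (1 - m1) - d,
        "a1": (1 - p) * m1 - d,
        "a2": (1 - p) * (1 - m1) + d,
    }
    return d, r2, haplotypes


# Equal allele frequencies: D-prime of 1 implies r-squared of 1.
print(ld_from_dprime(0.2, 0.2, 1.0)[1])
# Unequal allele frequencies: D-prime of 1 gives a smaller r-squared.
print(ld_from_dprime(0.2, 0.5, 1.0)[1])
```

The second call shows why the maximum attainable association depends on the allele frequencies: with p = 0.2 and m1 = 0.5, D-prime is at its maximum yet r-squared is well below 1.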
Given these parameters, the program outputs the expected
non-centrality parameter given the genetic model and the sample
specification. The power of detecting a QTL effect is given for the
different levels of type I error rate (including the 'user-defined'
type I error rate). Also, the sample size needed to obtain the
'user-defined' power is given for each level of type I error rate
(alpha). Please note that the VC tests are one-sided tests.
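As an illustration of how a non-centrality parameter translates into power, the sketch below handles only the plain two-sided 1 df chi-squared case, using the fact that a 1 df noncentral chi-squared variable is the square of a shifted standard normal. It does not reproduce the site's one-sided VC tests or its 2 df tests, and the function names are hypothetical. The rescaling helper assumes that the NCP grows linearly with the number of independent families.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_1df(ncp, alpha):
    """Power of a two-sided 1 df chi-squared test with noncentrality ncp."""
    if ncp < 0 or not (0 < alpha < 1):
        raise ValueError("need ncp >= 0 and 0 < alpha < 1")
    # The 1 df chi-squared critical value is the square of this z value.
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = ncp ** 0.5
    upper_tail = 1 - _STD_NORMAL.cdf(z_crit - shift)
    lower_tail = _STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


def rescale_sample_size(n, ncp, target_ncp):
    """Sample size giving target_ncp, assuming NCP is proportional to n."""
    if ncp <= 0:
        raise ValueError("ncp must be positive")
    return n * target_ncp / ncp


print(power_1df(7.849, 0.05))
print(rescale_sample_size(500, 3.9245, 7.849))
```

With an NCP of zero the function returns alpha itself, which is a quick sanity check that the null case is handled correctly.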
Instructions for VC QTL linkage conditional on trait
For a sample of sibships measured on a quantitative normally-distributed
trait, the expected contribution to the sample noncentrality parameter
(NCP) is calculated, conditional on their trait scores, the sample
residual correlation and the QTL effect size.
Simply paste the trait scores into the text windows (no other
variables, whitespace delimited, also no trailing whitespace
please). Enter also the residual sibling correlation (0-1) and the
desired QTL variance (0-1).
Output consists of the trait scores returned but with an additional
column added (first column) that represents the expected contribution
to the sample NCP. This is an index of potential informativeness of
that sibship: the higher the expected NCP, the more informative the
sibship. Sibships can therefore be rank-ordered by this index for
selective genotyping. This index can also be summed over all sibships
to give the sample NCP, from which power can be calculated.
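The rank-ordering and summing steps described above could be done along these lines. The sketch assumes the output has been saved as whitespace-delimited text with the expected NCP in the first column; the function name and the numbers in the example are made up for illustration and are not real output from the site.

```python
def rank_sibships(output_text):
    """Parse NCP-prefixed rows, rank by NCP, and sum the NCP column.

    Each non-empty line is assumed to be: NCP score1 score2 ...
    Returns (rows sorted by decreasing NCP, total sample NCP).
    """
    rows = []
    for line in output_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        ncp = float(fields[0])
        scores = [float(x) for x in fields[1:]]
        rows.append((ncp, scores))
    # Most informative sibships first.
    rows.sort(key=lambda row: row[0], reverse=True)
    total_ncp = sum(ncp for ncp, _ in rows)
    return rows, total_ncp


# Hypothetical example: three sib pairs with invented NCP values.
example = """0.0021 0.35 -0.10
0.0154 2.40 -1.90
0.0007 0.05 0.12
"""
ranked, total = rank_sibships(example)
for ncp, scores in ranked:
    print(ncp, scores)
print("sample NCP:", total)
```

For selective genotyping, one would take sibships from the top of the ranked list until the summed NCP of the selected subset reaches the value needed for the desired power.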
Instructions for discrete trait TDT power calculator
The main parameters that must be specified by the user are
- High risk allele frequency, for 'A' allele. Typically, this
would be rare, say under 0.10.
- The disease prevalence in the general population (K).
- The genotypic relative risks for the 'Aa' and 'AA' genotypes
relative to the baseline 'aa' genotype risk. The baseline risk r(aa)
is calculated from the parameters above. That is, the
prevalence K equals f(AA)r(AA) + f(Aa)r(Aa) + f(aa)r(aa) where
f() is the genotype frequency and r() is the genotypic risk, or
P(disease|genotype). Rearranging gives a formula for r(aa) in
terms of the other parameters. The genotypic relative risks for
the 'Aa' and 'AA' genotypes equal r(Aa)/r(aa) and r(AA)/r(aa)
respectively.
- Sample size - the total number of parent-child trios.
The output gives the baseline genotypic risk r(aa) and also the
genotypic odds ratios for the 'Aa' and 'AA' genotypes (these will be
very similar to the genotypic relative risks for rare diseases).
The expected number of heterozygous (i.e. informative) parents
per family is given and also the transmission probabilities of
the two alleles for affected offspring. The deviation of these
two probabilities from 50:50 is the basis for the TDT statistic.
Power is given for various values of alpha for the user-specified
sample size. Also, required sample size is given for various
levels of alpha for the user-specified power.
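The rearrangement for r(aa) and the genotypic odds ratios mentioned above can be sketched as follows. The sketch assumes Hardy-Weinberg genotype frequencies in the population, f(AA) = p^2, f(Aa) = 2p(1 - p), f(aa) = (1 - p)^2, which the text does not state explicitly; the function names are hypothetical.

```python
def genotypic_risks(p, prevalence, grr_het, grr_hom):
    """Return (r_aa, r_Aa, r_AA) from allele frequency, K and relative risks.

    p          : high risk allele ('A') frequency
    prevalence : population prevalence K
    grr_het    : r(Aa) / r(aa)
    grr_hom    : r(AA) / r(aa)
    """
    q = 1 - p
    # Assumed Hardy-Weinberg genotype frequencies.
    f_AA, f_Aa, f_aa = p * p, 2 * p * q, q * q
    # K = f(AA) r(AA) + f(Aa) r(Aa) + f(aa) r(aa), solved for r(aa).
    r_aa = prevalence / (f_AA * grr_hom + f_Aa * grr_het + f_aa)
    r_Aa = grr_het * r_aa
    r_AA = grr_hom * r_aa
    if max(r_aa, r_Aa, r_AA) > 1:
        raise ValueError("parameters imply a genotypic risk above 1")
    return r_aa, r_Aa, r_AA


def odds_ratio(risk, baseline_risk):
    """Genotypic odds ratio relative to the baseline genotype."""
    return (risk / (1 - risk)) / (baseline_risk / (1 - baseline_risk))


r_aa, r_Aa, r_AA = genotypic_risks(p=0.1, prevalence=0.01, grr_het=2.0, grr_hom=4.0)
print(r_aa, odds_ratio(r_Aa, r_aa), odds_ratio(r_AA, r_aa))
```

For a rare disease such as this example (K = 0.01), the odds ratios come out only slightly above the relative risks of 2 and 4.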
Instructions for discrete trait Case-Control power calculator
See the instructions above for discrete TDT for
a description of most of the model parameters. One difference is that
this procedure is for a marker B in linkage disequilibrium with the
test locus A. To specify power at the test locus, set the LD measure
(d-prime) to 1 and the allele frequencies of A and B equal.
As well as specifying the number of affected individuals (cases) the
user must specify the control:case ratio. If this equals 1, then
there are as many controls as cases. If this were 0.5, and there were
200 cases, there would be 100 controls, etc.
The output also gives the expected allele and genotype frequencies
for cases and controls. A chi-squared test statistic (and associated
power at alpha=0.05) is given for a test of Hardy-Weinberg equilibrium
in cases and controls (the presence of H-W disequilibrium in cases but
not controls can be indicative of an association).
Additionally, the haplotype frequencies and implied r-squared (measure of LD)
are displayed, along with the D-prime measure input by the user.
The rest of the output is as described above - note that the number
of cases for a specific power refers to the number of affected
individuals along with the appropriate number of controls as
specified by the control:case ratio.
For this module, we also consider the performance of genotypic tests
that assume either a recessive, dominant or general (2 df) model, as
well as the standard allelic test (which is printed last). For most
purposes, only this last allelic test (the label is highlighted in red)
will be of interest.
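The expected genotype and allele frequencies in cases and controls follow from Bayes' rule once the genotypic risks are known. The sketch below is a simplified illustration under several assumptions not taken from the site: the marker is the test locus itself (D-prime of 1 and equal allele frequencies), the population is in Hardy-Weinberg equilibrium, and controls are screened to be unaffected. The function names are hypothetical.

```python
def case_control_genotypes(p, r_aa, r_Aa, r_AA):
    """Expected genotype frequencies in cases and in unaffected controls.

    p                : frequency of the high risk allele 'A'
    r_aa, r_Aa, r_AA : genotypic risks, P(disease | genotype)
    """
    q = 1 - p
    freqs = {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}  # assumed HWE
    risks = {"AA": r_AA, "Aa": r_Aa, "aa": r_aa}
    prevalence = sum(freqs[g] * risks[g] for g in freqs)
    # P(genotype | affected) and P(genotype | unaffected) by Bayes' rule.
    cases = {g: freqs[g] * risks[g] / prevalence for g in freqs}
    controls = {g: freqs[g] * (1 - risks[g]) / (1 - prevalence) for g in freqs}
    return cases, controls


def allele_frequency(genotype_freqs):
    """Frequency of allele 'A' implied by a set of genotype frequencies."""
    return genotype_freqs["AA"] + 0.5 * genotype_freqs["Aa"]


# Illustrative risks: baseline 0.01 / 1.21 with relative risks of 2 and 4.
base = 0.01 / 1.21
cases, controls = case_control_genotypes(0.1, base, 2 * base, 4 * base)
print(allele_frequency(cases), allele_frequency(controls))
```

The difference between the two printed allele frequencies is what the allelic test is designed to detect.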
Instructions for quantitative trait TDT power calculator
The specification of parameters is somewhat different in the
case of quantitative traits. Firstly, effects are expressed
in terms of variance components rather than genotypic relative
risks. Secondly, it is possible to consider the scenario where
the test locus is a marker in LD with the true QTL.
The total QTL variance should range between 0 and 1. For example,
0.05 represents a QTL that accounts for 5% of the phenotypic
variance. For no dominance, set dominance:additive QTL effects
to 0, or 1 for complete dominance. As well as specifying
the allele frequency for the biallelic QTL, one must specify
the marker frequency for the biallelic marker, as well as
D-prime (the proportion of possible LD present) between
marker and QTL.
If the test locus were the QTL, equate p and m1 and set
d-prime to 1.
The case threshold refers to the value on a standard normal
scale above which offspring are ascertained as cases. For
example, if only individuals who score 2 standard deviations
above the mean (assuming a normally distributed trait) are
ascertained, then the case threshold equals 2. Always use the
absolute threshold value, even if cases are defined as scoring
less than a negative threshold.
The recombination fraction (0 for complete linkage to 0.5 for
no linkage) is always required. Typically, this would be
set near 0.
Instructions for quantitative trait Case-Control power calculator
See the quantitative TDT notes above for a
description of the main parameters. The number of cases and the
control:case ratio (see notes above) specify the
sample size. The thresholds specify approximately where on a standard
normal scale the cases and controls are sampled from. If cases are
selected as scoring over 2 standard deviations above the mean, the
lower case threshold would be 2, the higher case threshold could be
set to something like 5 or 6 (setting it even higher would not
influence results as we would not expect to see any individuals
scoring so high usually). If controls were sampled as being 'average',
selecting out extreme scorers, one might specify -1 and +1 as the
lower and upper control thresholds. If an equal number of controls as
cases were sampled, the control:case ratio would be set to 1 also.
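To get a feel for what a pair of thresholds implies, the fraction of a standard normal population falling between them can be computed directly. This is only a sketch of the arithmetic, with a hypothetical function name; it is not part of the site.

```python
from statistics import NormalDist


def fraction_between(lower, upper):
    """Fraction of a standard normal population scoring in [lower, upper]."""
    if lower >= upper:
        raise ValueError("lower threshold must be below upper threshold")
    std_normal = NormalDist()
    return std_normal.cdf(upper) - std_normal.cdf(lower)


# Cases scoring more than 2 SD above the mean, with an upper threshold of 6.
print(fraction_between(2, 6))
# 'Average' controls sampled between -1 and +1.
print(fraction_between(-1, 1))
# Raising the upper case threshold further makes a negligible difference.
print(fraction_between(2, 8) - fraction_between(2, 6))
```

The last line illustrates the point made above: almost no probability mass lies beyond 6 standard deviations, so the exact choice of a high upper case threshold has a negligible effect on this fraction.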
Various misc. utilities
The following utilities are undocumented and no longer supported:
- Variance Components - Relative Risk Conversion
- Two locus QTL linkage (means)
- Two locus QTL linkage (effects)
- Epistasis: genotypic means -> effects