Genetic Power Calculator

S. Purcell & P. Sham, 2001-2009

This site provides automated power analysis for variance components (VC) quantitative trait locus (QTL) linkage and association tests in sibships, and other common tests. Suggestions and comments can be sent to Shaun Purcell.

If you use this site, please reference the following Bioinformatics article:

Purcell S, Cherny SS, Sham PC. (2003) Genetic Power Calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics, 19(1):149-150.

Modules

Case-control for discrete traits	Notes
Case-control for threshold-selected quantitative traits	Notes
QTL association for sibships and singletons	Notes

TDT for discrete traits	Notes
TDT and parenTDT with ascertainment	Notes
TDT for threshold-selected quantitative traits	Notes

Epistasis power calculator	Notes

QTL linkage for sibships	Notes

Probability Function Calculator	Notes

Instructions for power calculations

VC model calculations are based upon formulae derived in Sham et al. (2000) [AJHG, 66, 1616-1630]. Users of this site who are unsure of the nature of the VC tests and power calculations are strongly advised to consult this article.

A genetic model for a single diallelic QTL is specified in terms of:

Variance components: For the linkage test, four variance components are entered: the additive and dominance QTL components, and the shared and nonshared residual components. These are entered as absolute variances (i.e. they don't need to be entered as proportions). For the association test, the total proportion of trait variance attributable to the QTL is given (i.e. between 0 and 1).
Recombination fraction: For tests that involve families, the recombination fraction between the marker and QTL is sometimes required. This ranges from 0 (no recombination) to 0.5 (unlinked loci).
Sample Size: Sample size can range from 2 to 10,000,000.
Sibship Size: For association, sibship size can range from 1 (in which case the within-sibship test is undefined) up to 5. For linkage, it can range from 2 up to 8.

In addition, the VC association test requires some extra parameters:

Dominance-to-additive effects ratio: This also ranges between 0 and 1. A value of 0 represents no dominance (i.e. all QTL effects are due to the additive effects of alleles), 1 represents complete dominance, and 0.5 represents an equal contribution to the trait variance of both additive and dominance effects. The default is to test for dominance (a 2 df test). If the 'no dominance' box is checked, then the d-to-a ratio is set to zero and an additive-effects-only (1 df) test is used to calculate the power.
Increaser allele frequency: The frequency of the trait increasing allele is specified, 0 < p < 1. We assume biallelic loci.
Marker allele '1' frequency: The frequency of the marker allele which is in disequilibrium with the trait increasing allele (0 < m1 < 1, biallelic marker).
Linkage disequilibrium (LD): The amount of LD between the QTL and the marker is specified as D-prime (0 <= d <= 1). A value of 0 indicates that the two loci are in complete equilibrium, whereas 1 indicates that the highest possible amount of disequilibrium is present (this amount depends on the relative allele frequencies of QTL and marker, i.e. the two loci can be perfectly correlated only if the allele frequencies are the same at QTL and marker). D-prime is related to other common measures of LD, such as R-squared, which equals the square of D-prime when the allele frequencies are the same at QTL and marker and is smaller otherwise (see the abovementioned article for further detail).
Sibling correlation: Specifying the sibling correlation (0 < r < 1) determines the relative balance of shared and nonshared residual variance for sibships. Note that invalid results might be obtained if the sibling correlation is specified to be less than half the QTL variance (i.e. assuming d:a = 0, the sibling correlation must be at least half the variance due to the additive effects of the QTL).
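
The constraint on the sibling correlation can be checked by hand. The following is a minimal sketch in Python, not the site's own code. It assumes the total trait variance is standardised to 1, that QTL effects are purely additive (d:a = 0), and that full siblings share half of the additive QTL variance. The function name split_residual is hypothetical.

```python
def split_residual(sib_corr, qtl_var):
    """Split the residual variance into shared and nonshared parts.

    Assumptions (not taken from the site): total trait variance is 1 and
    the QTL is purely additive, so siblings share half of the QTL variance.
    """
    if not (0.0 < sib_corr < 1.0 and 0.0 < qtl_var < 1.0):
        raise ValueError("sib_corr and qtl_var must lie strictly between 0 and 1")
    # Whatever sibling resemblance the QTL does not explain is shared residual.
    shared = sib_corr - 0.5 * qtl_var
    if shared < 0.0:
        raise ValueError("sibling correlation is less than half the QTL variance")
    # The remainder of the unit variance is nonshared residual.
    nonshared = 1.0 - qtl_var - shared
    if nonshared < 0.0:
        raise ValueError("variance components exceed the total variance of 1")
    return shared, nonshared


shared, nonshared = split_residual(sib_corr=0.3, qtl_var=0.1)
```

With a sibling correlation of 0.3 and a QTL variance of 0.1, the shared residual variance is 0.25 and the nonshared residual variance is 0.65. A sibling correlation below 0.05 would be rejected as invalid.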

Given these parameters, the program outputs the expected non-centrality parameter given the genetic model and the sample specification. The power of detecting a QTL effect is given for the different levels of type I error rate (including the 'user-defined' type I error rate). Also, the sample size needed to obtain the 'user-defined' power is given for each level of type I error rate (alpha). Please note that the VC tests are one-sided tests.
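
As an illustration of how power follows from the non-centrality parameter (NCP), the sketch below handles a 1 df test, for which the non-central chi-squared statistic is the square of a normal deviate with mean sqrt(NCP). It is not the site's code and the function names are hypothetical. The one-sided option simply places all of alpha in one tail of the normal deviate, which is an assumption about how a one-sided 1 df test is treated. The sample size calculation assumes the NCP grows in proportion to the sample size.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_1df(ncp, alpha, one_sided=False):
    """Power of a 1 df test with the given non-centrality parameter."""
    shift = math.sqrt(ncp)  # mean of the underlying normal deviate
    if one_sided:
        z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha)
        return 1.0 - _STD_NORMAL.cdf(z_crit - shift)
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    upper_tail = 1.0 - _STD_NORMAL.cdf(z_crit - shift)
    lower_tail = _STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


def required_sample_size(ncp, n, alpha, target_power, one_sided=False):
    """Sample size needed for target_power, assuming NCP is proportional to n."""
    if not (alpha < target_power < 1.0):
        raise ValueError("target_power must lie between alpha and 1")
    ncp_per_unit = ncp / n
    # Bracket the required NCP, then bisect.
    lo, hi = 0.0, 1.0
    while power_1df(hi, alpha, one_sided) < target_power:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if power_1df(mid, alpha, one_sided) < target_power:
            lo = mid
        else:
            hi = mid
    return math.ceil(hi / ncp_per_unit)
```

For example, power_1df(0.0, 0.05) returns the type I error rate itself, since a zero NCP corresponds to the null hypothesis.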

Instructions for VC QTL linkage conditional on trait

For a sample of sibships measured on a quantitative, normally distributed trait, the expected contribution of each sibship to the sample noncentrality parameter (NCP) is calculated, conditional on the trait scores, the sample residual correlation and the QTL effect size.

Simply paste the trait scores into the text windows (whitespace delimited, with no other variables and no trailing whitespace). Also enter the residual sibling correlation (0-1) and the desired QTL variance (0-1).

Output consists of the trait scores returned but with an additional column added (first column) that represents the expected contribution to the sample NCP. This is an index of potential informativeness of that sibship: the higher the expected NCP, the more informative the sibship. Sibships can therefore be rank-ordered by this index for selective genotyping. This index can also be summed over all sibships to give the sample NCP, from which power can be calculated.
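
The rank-ordering and summation described above can be sketched as follows. The sketch assumes the output is plain whitespace-delimited text with one sibship per line and the expected NCP contribution in the first column, as described. The helper name rank_sibships and the example numbers are invented for illustration only and are not output from the site.

```python
def rank_sibships(output_text):
    """Parse calculator output and rank sibships by expected NCP.

    Each non-empty line is assumed to hold the NCP contribution first,
    followed by the trait scores of the siblings.
    """
    rows = []
    for line in output_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        ncp = float(fields[0])
        scores = [float(x) for x in fields[1:]]
        rows.append((ncp, scores))
    # Most informative sibships first.
    ranked = sorted(rows, key=lambda row: row[0], reverse=True)
    total_ncp = sum(ncp for ncp, _ in rows)
    return ranked, total_ncp


# Made-up example values, for illustration only.
example = """\
0.012 0.31 -0.42
0.154 2.10 -1.75
0.047 1.20 1.35
"""
ranked, total_ncp = rank_sibships(example)
```

Here the second sibship would be genotyped first, and total_ncp is the sample NCP from which power can be calculated.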

Instructions for discrete trait TDT power calculator

The main parameters that must be specified by the user are:

High risk allele frequency, for 'A' allele. Typically, this would be rare, say under 0.10.
The disease prevalence in the general population (K).
The genotypic relative risks for the 'Aa' and 'AA' genotypes relative to the baseline 'aa' genotype risk. The baseline risk r(aa) is calculated from the parameters above. That is, the prevalence K equals f(AA)r(AA) + f(Aa)r(Aa) + f(aa)r(aa), where f() is the genotype frequency and r() is the genotypic risk, or P(disease|genotype). Rearranging gives a formula for r(aa) in terms of the other parameters. The genotypic relative risks for the 'Aa' and 'AA' genotypes equal r(Aa)/r(aa) and r(AA)/r(aa) respectively.
Sample size - the total number of parent-child trios.

The output gives the baseline genotypic risk r(aa) and also the genotypic odds ratios for the 'Aa' and 'AA' genotypes (these will be very similar to the genotypic relative risks for rare diseases). The expected number of heterozygous (i.e. informative) parents per family is given, as are the transmission probabilities of the two alleles for affected offspring. The deviation of these two probabilities from 50:50 is the basis for the TDT statistic.
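
The rearrangement for the baseline risk r(aa), and the resulting odds ratios, can be written out explicitly. The sketch below is not the site's code and the function name genotype_risks is hypothetical. It assumes Hardy-Weinberg genotype frequencies for f(), which is an assumption made here for illustration.

```python
def genotype_risks(p, prevalence, grr_het, grr_hom):
    """Baseline risk, genotypic risks and odds ratios for a biallelic locus.

    p          : frequency of the high risk allele 'A'
    prevalence : disease prevalence K
    grr_het    : genotypic relative risk r(Aa)/r(aa)
    grr_hom    : genotypic relative risk r(AA)/r(aa)
    Genotype frequencies are assumed to follow Hardy-Weinberg proportions.
    """
    q = 1.0 - p
    f_AA, f_Aa, f_aa = p * p, 2.0 * p * q, q * q
    # K = f(AA)*grr_hom*r(aa) + f(Aa)*grr_het*r(aa) + f(aa)*r(aa)
    r_aa = prevalence / (f_AA * grr_hom + f_Aa * grr_het + f_aa)
    r_Aa = grr_het * r_aa
    r_AA = grr_hom * r_aa
    if max(r_aa, r_Aa, r_AA) >= 1.0:
        raise ValueError("parameters imply a genotypic risk of 1 or more")

    def odds(risk):
        return risk / (1.0 - risk)

    return {
        "r_aa": r_aa,
        "r_Aa": r_Aa,
        "r_AA": r_AA,
        "or_Aa": odds(r_Aa) / odds(r_aa),
        "or_AA": odds(r_AA) / odds(r_aa),
    }


risks = genotype_risks(p=0.1, prevalence=0.05, grr_het=2.0, grr_hom=4.0)
```

In this example the odds ratios come out slightly larger than the relative risks of 2 and 4, and the gap shrinks as the prevalence falls.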

Power is given for various values of alpha for the user-specified sample size. Also, required sample size is given for various levels of alpha for the user-specified power.

Instructions for discrete trait Case-Control power calculator

See the instructions above for discrete TDT for a description of most of the model parameters. One difference is that this procedure is for a marker B in linkage disequilibrium with the test locus A. To specify power at the test locus, set the LD measure (d-prime) to 1 and the allele frequencies of A and B equal.

As well as specifying the number of affected individuals (cases), the user must specify the control:case ratio. If this equals 1, then there are as many controls as cases. If this were 0.5, and there were 200 cases, there would be 100 controls, etc.

The output also gives the expected allele and genotype frequencies for cases and controls. A chi-squared test statistic (and associated power at alpha=0.05) is given for a test of Hardy-Weinberg equilibrium in cases and controls (the presence of H-W disequilibrium in cases but not controls can be indicative of an association).

Additionally, the haplotype frequencies and implied r-squared (measure of LD) are displayed, along with the D-prime measure input by the user.
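
The relationship between D-prime, the haplotype frequencies and r-squared can be sketched as below. This uses the standard definitions (D is the excess of the A-B haplotype frequency over its equilibrium value, and D-prime scales D by its maximum possible value) rather than code from the site. It assumes positive LD between the high risk allele A and marker allele B, and the function name ld_from_dprime is hypothetical.

```python
def ld_from_dprime(p_a, p_b, d_prime):
    """Haplotype frequencies and r-squared implied by D-prime.

    p_a     : frequency of allele A at the test locus
    p_b     : frequency of allele B at the marker
    d_prime : D-prime between 0 and 1 (positive LD between A and B assumed)
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    if not (0.0 <= d_prime <= 1.0):
        raise ValueError("d_prime must lie between 0 and 1")
    # Largest positive D that keeps every haplotype frequency non-negative.
    d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    d = d_prime * d_max
    haplotypes = {
        "AB": p_a * p_b + d,
        "Ab": p_a * (1.0 - p_b) - d,
        "aB": (1.0 - p_a) * p_b - d,
        "ab": (1.0 - p_a) * (1.0 - p_b) + d,
    }
    r_squared = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return haplotypes, r_squared


same_freq = ld_from_dprime(0.2, 0.2, 1.0)   # r-squared of 1
diff_freq = ld_from_dprime(0.1, 0.5, 1.0)   # r-squared well below 1
```

With equal allele frequencies and D-prime of 1 the implied r-squared is 1. With frequencies of 0.1 and 0.5 the same D-prime implies an r-squared of about 0.11, which is why the two measures are reported side by side.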

The rest of the output is as described above - note that the number of cases for a specific power refers to the number of affected individuals along with the appropriate number of controls as specified by the control:case ratio.

For this module, we also consider the performance of genotypic tests that assume either a recessive, dominant or general (2 df) model, as well as the standard allelic test (which is printed last). For most purposes, only this last allelic test (the label is highlighted in red) will be of interest.

Instructions for quantitative trait TDT power calculator

The specification of parameters is somewhat different in the case of quantitative traits. Firstly, effects are expressed in terms of variance components rather than genotypic relative risks. Secondly, it is possible to consider the scenario where the test locus is a marker in LD with the true QTL.

The total QTL variance should range between 0 and 1. For example, 0.05 represents a QTL that accounts for 5% of the phenotypic variance. Set the dominance:additive QTL effects ratio to 0 for no dominance, or to 1 for complete dominance. As well as specifying the allele frequency for the biallelic QTL, one must specify the marker frequency for the biallelic marker, as well as D-prime (the proportion of possible LD present) between marker and QTL.

If the test locus were the QTL, equate p and m1 and set d-prime to 1.

The case threshold refers to the value on a standard normal scale above which offspring are ascertained as cases. For example, if only individuals who score 2 standard deviations above the mean (assuming a normally distributed trait) are ascertained, then the case threshold equals 2. Always use the absolute threshold value, even if cases are defined as scoring less than a negative threshold.

The recombination fraction (0 for complete linkage to 0.5 for no linkage) is always required. Typically, this would be set near 0.

Instructions for quantitative trait Case-Control power calculator

See the quantitative TDT notes above for a description of the main parameters. The number of cases and the control:case ratio (see notes above) specify the sample size. The thresholds specify approximately where on a standard normal scale the cases and controls are sampled from. If cases are selected as scoring over 2 standard deviations above the mean, the lower case threshold would be 2, and the upper case threshold could be set to something like 5 or 6 (setting it even higher would not influence results, as we would not usually expect to see any individuals scoring so high). If controls were sampled as being 'average', selecting out extreme scorers, one might specify -1 and +1 as the lower and upper control thresholds. If as many controls as cases were sampled, the control:case ratio would be set to 1.
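
To get a feel for the thresholds, the fraction of the population falling inside a selection band can be computed from the standard normal distribution. The following is a small sketch, assuming a trait standardised to mean 0 and variance 1. The function name band_fraction is hypothetical and not part of the site.

```python
from statistics import NormalDist


def band_fraction(lower, upper):
    """Fraction of a standard normal population scoring between two thresholds."""
    if lower >= upper:
        raise ValueError("lower threshold must be below upper threshold")
    std_normal = NormalDist()
    return std_normal.cdf(upper) - std_normal.cdf(lower)


cases_upper_5 = band_fraction(2.0, 5.0)   # cases scoring between +2 and +5
cases_upper_6 = band_fraction(2.0, 6.0)   # same, with a higher upper threshold
controls = band_fraction(-1.0, 1.0)       # 'average' controls
```

Under these assumptions about 2.3% of the population qualifies as a case, and raising the upper case threshold from 5 to 6 changes that fraction by less than one in a million. Roughly 68% of the population falls inside the -1 to +1 control band.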

Various misc. utilities

The following utilities are undocumented and no longer supported:

Variance Components - Relative Risk Conversion

Two locus QTL linkage (means)

Two locus QTL linkage (effects)

Epistasis: genotypic means -> effects