Behavioural Genetic Interactive Modules

Covariance

Overview

Covariance is a fundamental statistic that informs us about the relationship between two characteristics (such as height and weight, for example). This module aims to show how this measure of association is calculated, and how it is related to the concept of variance.

Tutorial

This module, covariance.exe is very similar to the previous module that illustrated the calculation of variance. This is unsurprising, for statistically and conceptually covariance is indeed very similar to variance - that is, it is a measure of co-variation between two traits.

This first panel controls the input of data into the module, and is similar to that of the variance module except that we are now dealing with pairs of scores rather than individual scores. Each observation represents, say, one individual and the pair of scores represents two measures for each individual, X and Y. The module will calculate the covariance between X and Y.

As before, the module tabulates scores, this time in pairs. In this case, for example, the 15^th individual in the sample has scored 6 on measure X and 45 on measure Y.

The univariate statistics such as the sum and mean for X and Y are calculated and displayed as before. In this case we see, for example, that measure X has a sum of 1374 and a mean of 52.85 in the sample of the 26 individuals entered into the module.

For bivariate data, as well as plotting a bar chart for each measure, we can also plot a scatter-plot that gives a visual representation of the association between the two measures. Each point represents an individual, the horizontal axis represents their score on X, the vertical axis their score on Y. The little lines plotted on the axes represent 1 and 2 standard deviations away from the mean.

From the scatter-plot, we can see that the two variables do not seem to be strongly related. If anything, there appears to be a slight negative association, in that individuals who score higher on X seem to, on average, score slightly lower on Y. The covariance statistic we are about to calculate will quantify this relationship more precisely.

For both X and Y we calculate the deviations from the mean, and square these deviations. These are used in the calculation of variance. The fifth column represents the quantity that is most relevant to covariance however: the cross-product of the deviation from the mean for X and the deviation from the mean for Y for each individual. For example, in the top line here we see that this individual has scored 2.15 units above the mean on X and 43.5 units above the mean on Y. The cross-product is simply 2.15 multiplied by 43.5, or 93.69.

Note how some of these cross-products are negative, unlike the squared deviations from the mean. This represents the fact that covariance represents not just a measure of strength of association, but also implies a direction of association. In this context, negative values reflect scoring higher than average on one measure but lower than average on the other measure.

In a similar manner, these squared deviations are summed, as are the cross-products. Note how, in this instance, the sum of cross-products is actually negative.

For both X and Y we can use the sum of squared deviations from the mean to calculate the variance of each measure, as in the previous module. This picture shows the variance calculated for X.

Here we calculate the covariance: whereas the variance of one measure is the average squared deviation from the mean, the covariance between two measures is the average cross-product of deviations from the mean for the two measures. In this case, the covariance is -308.6, confirming the impression given by the scatter-plot that X and Y appear to be negatively associated.

So what does -308.6 mean? Is this a strong negative association? Is this meaningfully different from a covariance of zero, implying no association? As you might expect, in order to answer such questions, we have to take into consideration the individual variances of each measure. That is, if the variances of X and Y were both around 10,000 then the absolute magnitude of the covariance, only in the hundreds, would mean that the covariation between X and Y was relatively small compared to the variation in the measures that was not shared between them. If the variances were of the order of four or five hundred, however, then the majority of the variance would appear to be shared. This kind of calculation is precisely what a correlation does: the correlation coefficient will be introduced in the next module.

Finally, the standardised scores for both X and Y are given: these are calculated in an identical manner to the variance module. [note: correct error on labels of screen-shot]

Questions

If X and Y are two variables with the same variance, would it ever be possible for the covariance between X and Y to be greater than their variances? If not, why not?
Use the module to calculate the covariance between the two measures in the following sets. What do the covariances tell you about the relationship between the two variables here?
- X : 1, 2, 3, 4, 5, 6, 7
  Y : 2, 3, 4, 5, 6, 7, 8
- X : 1, 2, 3, 4, 5, 6, 7
  Y : 2, 4, 6, 8, 10, 12, 14
- X : 60, 55, 45, 60, 45
  Y : 60, 55, 65, 50, 45
- X : 5, 5, 5, 5, 5, 5, 5
  Y : 1, 2, 3, 4, 5, 6, 7
Would it tell you anything if you were to recalculate a covariance on standardised scores instead of raw scores?

Answers

In short, no. If you think of a covariance as a measure of association, it is obvious that a trait is completely 'associated' with itself. If two measures had the same variance, it would be possible for the covariance to equal the variance. The only way this could happen if the two variables have identical deviations from the mean for every single observation, however. Two such variables could not really be considered to measure different things in any case.
The first set of numbers represent the case we mentioned above, where the covariance is equal to the two variances. That is, there is a perfect linear association between X and Y, even though Y has a greater mean than X.
The contrast with the first set of numbers is interesting here. There is also a perfect linear relationship between X and Y. In this case, X and Y have different variances, however, so the covariance is different from above.
In the third set, X and Y are unrelated. That is, the covariance is zero. Note how the cross-products of the deviations from the mean cancel themselves out, so that the sum is 0.
In the final set, we see that X has a variance of 0, as there is simply no variation in the scores: they are all 5. The covariance is necessarily 0 also: if a trait doesn't vary, it cannot covary with any other trait.
Yes it would: in fact, this is the basis of the correlation coefficient, which we shall review in the next module. As an exercise now, re-enter the values for the first two sets of numbers from the last question. Make a note of the pairs of standardised scores for both sets. Then try entering the standardised scores into the module as input in both cases.
You will find that the standardised scores are the same for both cases; when entered the variance of both X and Y will be 1, in both cases, because they have been standardised. The covariance will be 1 in both cases also. This actually represents the correlation between X and Y, as we shall see in the next module. The fact that this quantity is now the same for the first and second sets of numbers represents the 'standardised' nature of the correlation: in both cases the association between X and Y is equally strong, all that changes is the scale (variance) of one of the measures.

Please refer to the Appendix for further discussion of what covariances represent and how they are used.

Site created by S.Purcell, last updated 12.11.2000