Introduction to Mx
S.Purcell, 2000.
Note: these pages were created a long time ago... Interested parties should probably look at OpenMx or alternatives...
Mx model-fitting software by Mike Neale
Mx is a freely-available model-fitting program, very
popular in the field of behavioural genetics. It is
similar to commercial programs such as LISREL and EQS.
It offers a Graphical User Interface (MxGUI), which
allows path diagrams to be drawn describing the model
to be fit to the data. Alternatively, scripts
can be written to specify the model: this tutorial
looks at the structure of Mx scripts and output.
Mx is available for free downloading, supporting various
platforms from the Mx homepage.
General Structure of an Mx Script
An Mx script comprises of one of more groups.
Mx is primarily structured this way because it was designed
for the analysis of genetically informative data, where different
types of relatives can be thought of as forming distinct groups. In
an Mx script, one might find one group for MZ twins,
one for DZ same-sex twins, say, and possibly one for
DZ opposite-sex twins.
The groups in an Mx script do not always map onto groups of data from
different types of relatives, however. Certain groups have special functions
such as specifying the model, performing calculations or imposing
constraints upon the analysis.
The basic structure of a group is:
- TITLE
- Indicate GROUP TYPE : data / calculation / constraint
- Read and select any observed data, supply labels
- MATRICES : declare at least one matrix
- Specify numbers and parameters, starting values, equality constraints
- MODEL : define matrix formula: covariance / means / threshold / compute
- Request fit functions, statistical output and optimisation options, other options
- END command
First script : Univariate ACE twin model
Below we will walk through a script designed
fit the ACE model to twin data. The basic function
is to take two covariance matrices (one for MZ twins,
one for DZ twins) and to estimate the proportion of
variance attributable to additive genetic effects,
to shared environmental effects and nonshared
environmental effects for the trait. Click
here to see the
whole script.
! Mx ACE script for twin data
We have used the exclamation mark character to comment-out
this line, so it does not get processed by Mx. It
is good practice to put as many comments or remarks in scripts
as possible, as they will help you and others navigate scripts.
|
#define nvar 1
The #define command is a form of macro that assigns
the value 1, in this case, to the variable named nvar.
Where ever nvar appears in the script subsequently, it will
be replaced by the value 1, as determined here. In this case, the
label nvar represents the number of variables to be
analysed per twin (which, for a univariate script, is 1). This command allows
the user to easily make lots of changes in a script by just changing
the first #define line. We will see how this is used when
we consider modifying the script for multivariate analysis. This command
always occurs before the first group.
|
Group1: Defines Matrices
Calc NGroups=4
The first line here is the title of the group: we have named it
Group1: Defines Matrices - a descriptive name is
generally a good idea. The second line tells Mx what type of
group this is. Because the keyword Calc (or
Calculation) appears on this line, Mx does not expect to
read any data for this group. The NGroups command
is needed to tell Mx how many groups there are in the entire script (so
there are four in this script, including this group). The
function of this first group is to set up the basic parameters
that will be used in the modelling of twin data.
|
Begin Matrices;
! path coefficients
X Lower nvar nvar free ! additive genetic
Y Lower nvar nvar free ! shared environmental
Z Lower nvar nvar free ! non-shared environmental
The Begin Matrices; (note that Mx can often be very
fussy about punctuation - the semi-colon is not optional)
defines three matrices, labelled X, Y and
Z. As we shall see, these correspond to the path coefficients
for additive genetic, shared environmental and nonshared environmental
twin model parameters. They correspond to these entities because of
the way in which the twin covariance matrices are described in
terms of them (as we will see below). There is nothing 'fixed' about
them at this stage, however: here they are just any three matrices.
The Lower keyword tells Mx that these matrices are lower
triangular matrices - that is, the only 'free' elements in these
matrices are those below and including the diagonal. In this case, the
matrices are only 1 by 1 (specified by the nvar macro) - so
we can think of X, Y and Z as simply
being three numbers rather than matrices.
The free keyword tells Mx that the maximum-likelihood
model-fitting should attempt to estimate best-fitting values for
the free elements in these matrices, rather than assuming the values
to be fixed at either zero (by default) or some other user-specified value
(which Mx would assume if the free keyword was missing, or
replaced with the term fixed).
|
H Full 1 1
Here we define one other matrix, H. As we shall see,
this is a single number (as it is a 1 by 1 matrix, so even
though the matrix is full, it is identical to a
lower matrix in this case. As we shall see, H
is assigned to be the constant 0.5. This value is used in the
specification of the DZ covariance structure, below. In Mx, everything
must be in a matrix - it is not possible, for example, to specify
that a matrix is multiplied by a constant number (e.g 10*A).
Always one matrix must operate on another matrix, where one or both
matrices might have been predefined to represent constants
(i.e. rather than estimated or fixed parameters) previously (e.g.
T*A).
|
End Matrices;
This command pairs up with the Begin Matrices;
command, indicating that we have defined all the matrices
in this group.
|
Matrix H .5
Here we assign a value of 0.5 to the matrix H. Because
it is a 1 by 1 matrix, it is unambiguous which element should be
assigned as 0.5 (i.e. there is only one element). If H
had more than one element, however, it might be necessary to specify
which element(s) are being assigned to.
|
Begin Algebra;
We have defined our initial matrices in the
section above. Those matrices will contain the
free elements that are to be estimated in the model
fitting process. Mx also allows us to calculate other
matrices that are algebraic functions of the other
matrices. These can be used, as we shall see, to make
the specification of the model easier, or to make the
output for interpretable (e.g. when we standardise the output,
as in the final group in this script).
|
! covariance matrices
A= X*X'; ! genetic
C= Y*Y'; ! shared environmental
E= Z*Z'; ! nonshared environmental
Here we calculate three matrices that are essentially the squares
of the original three matrices - that is, a matrix post-multiplied
by its transpose. (Note that this formulation also gives the
matrix-equivalent of squaring when the matrices are larger than
1 by 1.) If the first three variables represented the path
coefficients, these can be thought of as the variance components
(in the univariate case: they would be the components of covariance
matrices in the multivariate case, as the script indicates).
That is, in path diagram terms, we have traced up the path, round
the variance of the latent variable (which is conceptually fixed
to 1, and so need not appear in the script) and back down the path.
Multiplying the values of the paths traced, gives us the path
coefficient squared as the contribution to variance from that
associated latent variable. Below we will write out the model
for the covariance structure of MZ and DZ twins directly in terms
of the variance components - that is, directly in terms of the
matrices A, C and E.
|
End Algebra;
This is required to let Mx know that we have finished
calculating new matrices.
|
Start 0 X 1 1 to X nvar nvar
Start 0 Y 1 1 to Y nvar nvar
Start 1.5 Z 1 1 to Z nvar nvar
For the original matrices, whose free elements will
be estimated during model-fitting, we have to supply
starting values. For univariate ACE twin models, it
is usually enough to simply start the genetic and
shared environmental paths at zero, and set the
nonshared environmental path to the trait standard
deviation. Obviously, the better the starting values,
the quicker model-fitting will be. Note the use of the
to keyword - if nvar were in fact different
from 1, then all the elements between X 1 1 (that is,
the top left element of X) and X nvar nvar
(that is, this bottom right) would be initialised at the set value.
This can be especially useful in the multivariate case.
|
End
This command signals that it is the end of this, the first group.
|
Group2: MZ twin pairs
Data NInput_vars=2 NObservations=1500
After the title line of the second group, we see a Data
keyword in place of Calc. Mx
therefore assumes that we will be supplying some data
of some sort for this group to model. We need to tell Mx
here about the structure of the data - the number of
input variables (in the univariate case we will have one
variable on both twins - so we have two input variables)
and the number of observations (i.e. MZ twin pairs
in this case).
|
CMatrix
2.12
1.67 2.08
The CMatrix command tells Mx that we are supplying
the data in the form of a covariance matrix. Unless otherwise
told, Mx will assume that this covariance matrix will be
a lower diagonal matrix - the element above the diagonal is
intentionally missing, as it is necessarily equal to the
corresponding below diagonal element - i.e. 1.67.
Mx can read data in other forms - notably as raw data (in which
case the keyword RE file="myfile.raw" is used, where
the file is a space-delimited ASCII file with no header or
non-numerical characters) or as a correlation matrix (in
which case KMatrix does the job).
|
Labels trait1 trait2
This command assigns text labels to the input variables. The
labels are assigned to the input variables in the order in
which they are read in: here the first variable is called
trait1 (note that this could have been anything,
such as anxscr01) - this is the trait score
of the first twin. So the second twin is labelled
trait2. Note that whether or not a twin is 'first'
or 'second' is solely reflecting their position in the variance-
covariance matrix.
|
Matrices= Group 1
We could have defined matrices in this group using
the Begin Matrices; but instead we can
specify that the matrices defined in a previous group
will be the matrices also for this group.
|
Covariances A + C + E | A + C _
A + C | A + C + E /
This statement contains the core of the model. The terms
after the Covariance statement represent an
algebraic description of the twin covariance matrix matrix. The
operators | and _ are used to adhere
matrices together to make larger matrices. We can think of the
matrices A, C and E, representing
the variance components, as the building blocks with which we
describe what we expect the MZ covariance matrix to look like.
The top left element, that is all the terms up to the first |
define the trait variance for twin 1: simply the sum of all the components
of variance, additive genetic, shared and nonshared environmental.
The next element (which is the top right element in the covariance
matrix) represents the covariance between twin 1 and twin 2. For
MZ twins, this is equal to the full complement of additive genetic
and shared environmental variation, by definition. By symmetry of
covariance matrices, the bottom left element is clearly the same.
The bottom right element, the variance for twin 2, is also the same
as the variance for twin 1. Although this need not be the case, given
that we are not modelling correlation matrices, the most parsimonious
model has no reason to believe that all 'twin 1s' should be
fundamentally different to 'twin 2s'. The / symbol is
required to tell Mx that we have finished specifying the model.
|
Option RS
This command tells Mx to print out the residual covariance
matrix for this group in the output - that is, the difference
between the observed MZ covariance matrix and the one
predicted by the model. Other optional features can be
specified here, with keywords other than RS after
Option.
|
End
The is the end of the MZ data group.
|
Group3: DZ twin pairs
Data NInput_vars=2 NObservations=2000
Similar to the above data group, we must read in the
data for DZ twins and supply a model for DZ twins.
|
CMatrix
1.98
1.25 2.06
This is the observed covariance matrix for DZ twins. Note
that the covariance is lower than the MZ covariance,
whilst the variances are approximately equal. This is pleasing,
as these are intrinsic assumptions of the ACE model, and
interpreting the data would be difficult if this did not hold.
|
Labels trait1 trait2
Matrices= Group 1
As above. This ensures that the components of variance estimated
for MZ twins are the same as those for DZ twins. This is clearly
what we want - the aetiology of a trait should be the same
whether or not an individual is an MZ or DZ twin. (In fact, if
we did not equate these values across MZ and DZ groups, the model
would not be identified, and would not be able to estimate anything
useful.) The difference between MZ and DZ twins is reflected in the
different specification of the model (see below) - we use the same
building blocks, however.
|
Covariances A + C + E | H@A + C _
H@A + C | A + C + E /
The critical difference is simply that biometrical theory
predicts that only half of the variation due to the
additive effects of genes will be shared between DZ twins.
This is why we multiply the additive component of variance by
0.5 (i.e. H)) in the covariance element. Note the
use of the Kronecker product style of multiplication (@)
which facilitates multivariate analysis.
|
Option RS
End
Group4: Standardised solution
Calculation
This group does not read any data (and so does not
contain a data statement on the second line).
The function of this group, as the title, implies is
to provide a standardised solution. That is,
we have been fitting the models to the covariance
structures of MZ and DZ twins, and so the parameters
represent the components of variance (A, C
and E or the corresponding path coefficients
X, Y and Z). This group
will calculate the proportions of variance
that are attributable to additive genetic,
shared and nonshared environment effects. Although this
is trivial in the univariate case (it could easily be
calculated by hand) this is not the case in the
multivariate case, so it is useful to include a
group such as this.
|
Matrices = group 1
Again, we need to tell Mx which set of matrices will
be used for this group.
|
Begin Algebra;
We tell Mx that we are about to calculate
some new matrices from existing ones. The details of
the procedure we are about to see below are not in
themselves important, but we shall go through them
just to get a sense of how Mx can be used.
|
! phenotypic covariance matrix
P = A + C + E;
First we calculate the phenotypic covariance
matrix. In the univariate case, this is simply a
1 by 1 matrix - the variance of the trait. In the
multivariate case, the off-diagonal elements would
represent the phenotypic (i.e. within individual)
covariances between the different measures.
|
! diagonal matrix of phenotypic
! standard deviations
D = \sqrt(\v2d(\d2v(P)));
We use some of Mx's matrix functions here to
extract the diagonal elements of the phenotypic
covariance matrix and put them in a vector (\d2v).
We then convert this vector back into a matrix (\v2d).
These two operations have essentially had the effect of making
all the off-diagonal elements zero, but preserving the diagonal
elements (i.e. the variances). We then take the square root of these
elements (\sqrt) in order to get the standard
deviations.
|
! Phenotypically standardised
! covariance matrices :
! phenotypically standardised
T = D~ * A * D~; ! additive genetic
U = D~ * C * D~; ! shared environmental
V = D~ * E * D~; ! nonshared environmental
These statements essentially standardise the three variance
component matrices. Pre- and post- multiplying by the inverse
of the diagonal matrix containing the standard deviations (D~)
has this desired effects. Again, the details are not important,
but if you want to work through the matrix algebra you will see
that it is really quite simple. This is the beauty of matrix
notation - that lots of calculations can be specified in just a
few lines. These new matrices will contain the proportions
of variance along the diagonal elements, i.e. in this case, matrix
T would represent the heritability.
|
! Correlation matrices
! for aetiological factors
! i.e. not phenotypic correlations
G = \stnd(T); ! additive genetic
S = \stnd(U); ! shared environmental
N = \stnd(V); ! nonshared environmental
If we standardised these matrices (so that we will have 1's
along the diagonal) we obtain the genetic correlation
matrix, the shared environmental correlation matrix
and the nonshared environmental correlation matrix.
Here, in the multivariate case, the off-diagonal elements will
represent the overlap of genetic and environmental factors in the
aetiologies of the traits (in the univariate case, these matrices
will simply be each a 1 by 1 matrix containing a 1, and so meaningless).
|
End Algebra;
End
...and so we tell this to Mx.
|
Mx Output for the Univariate ACE model
Click
here to see the
whole output, generated from running the above
univariate ACE model script. The output can be broken
down into several main sections:
- The input script is given at the top of the
output
- Parameter specifications : for
each group, all the matrices defined are
listed, and whether elements of these matrices
are free to be estimated or fixed is also given.
- Parameter estimates : for each group,
the best-fit parameter estimates are given for
all free elements. Note that this often
causes repetition in the output, as in the case of
a twin model, the parameters in the MZ data group will
be the same as the parameters in the DZ data group.
- Model summary : at the end of the output,
gives the model fit and other important statistics.
Mx output files do tend to be quite long, so we will not
cover the entire output here: rather, we will focus on some of
the most important parts.
The following MX script lines were read for group 1
#DEFINE NVAR 1
GROUP1: DEFINES MATRICES
DATA CALC NGROUPS=4
BEGIN MATRICES;
X LOWER NVAR NVAR FREE ! GENETIC STRUCTURE
Y LOWER NVAR NVAR FREE ! SHARED ENVIRONMENT
Z LOWER NVAR NVAR FREE ! NON-SHARED PATH CO-EFFICIENTS
...
As mentioned, the output begins with a copy of the
script that has been run.
|
G = \STND(T);
How am I supposed to take the square root of 0.?
Diagonal elements are:
0. 0.
Matrix is
0.0000E+00
Matrix not fully standardized
You may notice certain warnings or error messages: these are not
necessarily important. In this case, for example, Mx is
merely warning us that the T matrix contains zero
before the model-fitting process begins.
|
PARAMETER SPECIFICATIONS
GROUP NUMBER: 1
Group1: Defines Matrices
Next, we see the parameter specifications for the
different groups listed in order.
|
MATRIX A
This is a computed FULL matrix of order 1 by 1
It has no free parameters specified
For instance, under Group 1, we see the A matrix listed.
The output confirms that it is a full, 1 by 1 matrix,
which has no free parameters. This is because it is
calculated from the matrix X, which contains the free
parameter.
|
MATRIX X
This is a LOWER TRIANGULAR matrix of order 1 by 1
1
1 1
As mentioned above, the matrix X contains the
free parameter representing the path coefficient
for additive genetic effects. The matrix A equals
X*X'. Of the three 1s, the top and leftmost
numbers represent the row and column numbers of the
matrix X. The remaining 1 is the number assigned
to this free parameter.
|
MATRIX Y
This is a LOWER TRIANGULAR matrix of order 1 by 1
1
1 2
MATRIX Z
This is a LOWER TRIANGULAR matrix of order 1 by 1
1
1 3
Likewise, we see the 2nd and 3rd parameters being assigned
here (shared environment and nonshared environment).
|
GROUP NUMBER: 2
Group2: MZ twin pairs
Next, the parameter specification for the second group is
given. Because the script used the Matrices=Group1
command for this group, we will see the same information.
|
MATRIX X
This is a LOWER TRIANGULAR matrix of order 1 by 1
1
1 1
For example, in the second group (i.e. the data group
for MZ twins) we see that the free element of matrix X
is still assigned parameter number 1. (Note that, although
different groups can use the same letters to represent
different matrices, the parameter numbers specified in
the output apply to the entire script: in this way, we
know that we have constrained these parameters to be equal
across groups.)
|
GROUP NUMBER: 3
Group3: DZ twin pairs
The third group, the DZ twin data group.
|
MATRIX X
This is a LOWER TRIANGULAR matrix of order 1 by 1
1
1 1
This group also uses the same set of matrices.
|
GROUP NUMBER: 2
Group2: MZ twin pairs
The information so far is useful for checking that the
script has been properly constructed, but they do not
constitute output as such. It is the best-fit
parameter estimates that are of the greatest interest
as well as the goodness-of-fit statistic for the model: these
are given next, again for each group in the script.
We shall look at the values in Group 4: although these
will be identical to the other groups, Group 4, the
standardisation group, will also contain the
standardised estimates.
|
OBSERVED COVARIANCE MATRIX
TRAIT1 TRAIT2
TRAIT1 2.1200
TRAIT2 1.6700 2.0800
Before looking at the best-fit parameter estimates from
the standardised group, it is worth noticing the output
from the two data groups, 2 and 3. Mx will print out the observed
trait covariance matrix, in this case for MZ twins. That is, this
should be identical to what was entered in the script.
|
EXPECTED COVARIANCE MATRIX
TRAIT1 TRAIT2
TRAIT1 2.0512
TRAIT2 1.6225 2.0512
Additionally, the expected covariance for MZ twins will
be displayed: this is based on the best-fit parameter
estimates. This gives a very good impression as to how
the model is performing, i.e. how well it accounts for
the observed data.
|
RESIDUAL MATRIX
TRAIT1 TRAIT2
TRAIT1 0.0688
TRAIT2 0.0475 0.0288
A particularly useful representation is the residual
covariance matrix - that is, the difference between
the observed and expected covariance matrices.
|
GROUP NUMBER: 3
Group3: DZ twin pairs
These statistics are also given for DZ twins:
|
OBSERVED COVARIANCE MATRIX
TRAIT1 TRAIT2
TRAIT1 1.9800
TRAIT2 1.2500 2.0600
The observed statistics, as entered in the script.
|
EXPECTED COVARIANCE MATRIX
TRAIT1 TRAIT2
TRAIT1 2.0512
TRAIT2 1.2780 2.0512
The model-based expected statistics for DZ twins (i.e. note
how the expected variances are the same as for MZ twins).
|
RESIDUAL MATRIX
TRAIT1 TRAIT2
TRAIT1 -0.0712
TRAIT2 -0.0280 0.0088
And the discrepancy between the expected and observed.
|
GROUP NUMBER: 4
Group4: Standardised solution
The expected covariance matrices presented above are based
on the best-fit parameter estimates for the model, as
mentioned. We shall review these here. Mx lists matrices
in alphabetical order, although we shall go through the
matrices in a more logical order here.
|
MATRIX X
This is a LOWER TRIANGULAR matrix of order 1 by 1
1
1 0.8301
The three basic free parameters of this model are the
elements in the matrices X, Y and Z. These are given here
at their best-fit values. That is, the additive genetic
path coefficient is 0.8301;
|
MATRIX Y
This is a LOWER TRIANGULAR matrix of order 1 by 1
1
1 0.9662
The shared environmental path coefficient is 0.9962;
|
MATRIX Z
This is a LOWER TRIANGULAR matrix of order 1 by 1
1
1 0.6547
The nonshared environmental path coefficient is 0.6547.
|
MATRIX A
This is a computed FULL matrix of order 1 by 1
[=X*X']
1
1 0.6890
Rather than presenting results in terms of path coefficients,
it is more standard to talk in terms of proportions of
variance. The matrices A, C and E represent the components
of variance, and are simply the squares of the
path coefficients.
|
MATRIX C
This is a computed FULL matrix of order 1 by 1
[=Y*Y']
1
1 0.9335
MATRIX E
This is a computed FULL matrix of order 1 by 1
[=Z*Z']
1
1 0.4287
Note how for calculated variables, the formula is
given below the description of the matrix type
(e.g. [=Z*Z']).
|
MATRIX H
This is a FULL matrix of order 1 by 1
1
1 0.5000
Not all matrices contain parameters to be estimated, or
functions of those parameters. The matrix H is set to the
constant 0.5, used in the specification of the DZ covariance:
Mx prints this for us in any case.
|
MATRIX P
This is a computed FULL matrix of order 1 by 1
[=A+C+E]
1
1 2.0512
As mentioned, we are typically interested in
proportions of variance. Here we calculate
the total variance, P, by summing A, C and E.
|
MATRIX D
This is a computed FULL matrix of order 1 by 1
[=\SQRT(\V2D(\D2V(P)))]
1
1 1.4322
We take the square root of this to give the
standard deviation.
|
MATRIX T
This is a computed FULL matrix of order 1 by 1
[=D~*A*D~]
1
1 0.3359
And we use this to standardise our estimates (remember,
the script is written in a general, multivariate form, so
the expression looks more complicated than merely what has
been described here). The matrix T therefore indicates
that additive genetic effects account for just under
34% of the variance in the trait.
|
MATRIX U
This is a computed FULL matrix of order 1 by 1
[=D~*C*D~]
1
1 0.4551
Likewise, shared environmental influences account for
just under 46%;
|
MATRIX V
This is a computed FULL matrix of order 1 by 1
[=D~*E*D~]
1
1 0.2090
And nonshared environmental influences account for
just under 21%
|
MATRIX G
This is a computed FULL matrix of order 1 by 1
[=\STND(T)]
1
1 1.0000
The matrices G, S and N would represent the genetic
and environmental correlations in the multivariate case:
in the univariate case, these will always be 1-by-1 unit
matrices, so we shall ignore here.
|
Your model has 3 estimated parameters and 6 Observed statistics
Chi-squared fit of model >>>>>>> 2.491
Degrees of freedom >>>>>>>>>>>>> 3
Probability >>>>>>>>>>>>>>>>>>>> 0.477
Akaike's Information Criterion > -3.509
RMSEA >>>>>>>>>>>>>>>>>>>>>>>>>> 0.003
Last but not least, the goodness-of-fit statistics for the model
are given. The output tells us that we had 6 observed statistics
(i.e. four variances and two covariances from the MZ and DZ covariance
matrices) and estimated 3 parameters (i.e. additive genetic,
shared environmental and nonshared environmental components
of variance). This gives 6-3=3 degrees of freedom, as also
represented above. The chi-squared values for these data is
2.491, which is non-significant (p=0.447). This indicates
that the observed values do not significantly depart from the
expected values: the model fits. The other lines represent
other fit statistics that will not be reviewed here.
|
Modifying the Script to Test Submodels
Under construction
Multivariate ACE Model Script
Click
here to see the
whole script.
Under construction
Output from Multivariate ACE Model Script
Click here to see the
whole script.
Under construction
|