Figure 4: Path diagram for a two-factor model
An important feature of a structural model is that it should provide a parsimonious explanation of the data. We could simply use our optimization procedures to estimate the covariance matrix and means in our dataset without any model of their relationships, but this would be a theoretically vacuous exercise. As we shall see later on, we do in fact perform such an estimation when evaluating a model, because it provides a baseline (a 'saturated model') against which to evaluate the likelihood of a structural model. But a good structural model will explain the data using fewer estimated parameters than there are observed statistics.
The number of observed statistics minus the number of estimated parameters gives the degrees of freedom (DF) of the model. For the model in Figure 4, assuming we fix the variances of the latent variables to 1 so that these values do not have to be estimated, there are eight estimated parameters in total: a, b, c, d, e, f, g and h. The observed covariance matrix that we input to the model contains 10 distinct values (remember that the matrix is symmetric, so we do not count the covariances in the upper triangle, which are the same as those in the lower triangle). This gives 10 - 8 = 2 DF. In practice, OpenMx estimates means as well as covariances; for the model in Figure 4 this adds four observed values (giving a total of 14) and four estimated parameters (giving a total of 12), so the DF remain at 2.
It is important that a model has a positive DF: such a model is termed 'overidentified'. If you specify a model with more estimated parameters than observed statistics, it is underidentified and not amenable to a sensible solution.
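The counting rule above can be written out as a small helper. This is a Python sketch of the arithmetic only, not OpenMx code; the function names are hypothetical, and it assumes p observed variables whose distinct variances and covariances (and, optionally, p means) are the observed statistics. A positive count is necessary but not sufficient for a model to be identified, so the status label is only a first check.

```python
def n_observed_statistics(p: int, include_means: bool = False) -> int:
    """Distinct observed statistics for p variables.

    The covariance matrix is symmetric, so only the p variances on the
    diagonal and the p*(p-1)/2 covariances below it are counted, giving
    p*(p+1)/2 in total. Optionally add the p means.
    """
    n = p * (p + 1) // 2
    if include_means:
        n += p
    return n


def degrees_of_freedom(p: int, n_free_parameters: int, include_means: bool = False) -> int:
    """Observed statistics minus estimated (free) parameters."""
    return n_observed_statistics(p, include_means) - n_free_parameters


def identification_status(df: int) -> str:
    """Label based on the DF count alone (necessary, not sufficient)."""
    if df > 0:
        return "overidentified"
    if df == 0:
        return "just-identified"
    return "underidentified"


# Figure 4: four observed variables, eight free parameters (a-h).
print(degrees_of_freedom(4, 8))                        # 10 - 8 = 2
# With means: four extra statistics and four extra mean parameters.
print(degrees_of_freedom(4, 12, include_means=True))   # 14 - 12 = 2
```

Both calls print 2, matching the hand calculation: adding the means leaves the DF unchanged because each mean is matched by one estimated mean parameter.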
As will be discussed further below, for some models it may be necessary to fix a parameter at a given value (so that it does not count towards the number of estimated parameters), or to constrain two paths to take the same value (so that two estimated parameters become one), in order to ensure that the model is overidentified.
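In OpenMx, a parameter is fixed by marking it as not free, and two paths are constrained to be equal by giving them the same label. The Python sketch below only mimics that bookkeeping to show how fixing and equating change the count; the `Param` type, the parameter names and the one-factor, three-indicator example are hypothetical and are not taken from Figure 4.

```python
from typing import List, NamedTuple


class Param(NamedTuple):
    label: str   # paths sharing a label are estimated as a single parameter
    free: bool   # False means the value is fixed, not estimated


def count_free(params: List[Param]) -> int:
    """Number of distinct free labels, i.e. parameters actually estimated."""
    return len({p.label for p in params if p.free})


def df_covariances_only(p: int, params: List[Param]) -> int:
    """DF counting only the p*(p+1)/2 distinct variances and covariances."""
    return p * (p + 1) // 2 - count_free(params)


# Hypothetical one-factor model with three indicators x1..x3.
# The factor variance is fixed to 1, so it is not estimated.
unconstrained = [
    Param("factor_var", False),
    Param("load_x1", True), Param("load_x2", True), Param("load_x3", True),
    Param("resid_x1", True), Param("resid_x2", True), Param("resid_x3", True),
]
# Same model with the three loadings forced equal via one shared label.
equal_loadings = [
    Param("factor_var", False),
    Param("load", True), Param("load", True), Param("load", True),
    Param("resid_x1", True), Param("resid_x2", True), Param("resid_x3", True),
]

print(df_covariances_only(3, unconstrained))   # 6 - 6 = 0
print(df_covariances_only(3, equal_loadings))  # 6 - 4 = 2
```

Without the constraint, this hypothetical model uses up all six observed statistics and has zero DF; equating the three loadings turns three estimated parameters into one, leaving two DF.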
There are two types of comparison that are useful for evaluating a model. The first, which is automatically provided in OpenMx, is a comparison with the 'saturated model'. Although this is termed a 'model', it is not one in the normally accepted sense, since it contains no latent variables. It simply represents the case where the optimization procedure is used to estimate the observed statistics directly. This is the kind of exercise we did with the two scripts showing likelihood calculations, first for a pair of variables and then for three variables. The value of -2LL for the saturated model can never be higher than that for a model with latent variables, because its search for a fit is completely unconstrained. As noted above, however, the saturated model is of no theoretical interest: it has as many estimated values as observed statistics, and hence no degrees of freedom. It is useful, though, as a baseline level of -2LL against which to compare our overidentified model. If we subtract the -2LL value of the saturated model from the -2LL value of our model of interest, we get a statistic that, if the model of interest is correct and the sample is large, approximately follows a chi-square distribution. Its degrees of freedom equal the DF of the model of interest minus the DF of the saturated model (which is zero). So, in this case: