Measurement Invariance + 2nd order Growth Model (ECLS-K)

1 Overview

In this tutorial we walk through the very basics of testing measurement invariance in the context if a longitudinal factor model - and how a 2nd order growth model can then be used to describe change in an invariant factor.

This tutorial follows the example in Chapter 14 of Growth Modeling: Structural Equation and Multilevel Modeling Approaches (Grimm, Ram & Estabrook, 2017). Using 3-occasion data from the ECLS-K, we test for factorial invariance, and then use a second-order growth model to describe change in the factor scores across time.

2 Introduction to the Common Factor Model

The basic factor analysis model can be written as a matrix equation …

\[ \boldsymbol{Y_{i}} = \boldsymbol{\tau} + \boldsymbol{\Lambda}\boldsymbol{F_{i}} + \boldsymbol{U_{i}} \]

where \(\boldsymbol{Y_{i}}\) is a \(p\) x 1 vector of observed variable scores, \(\boldsymbol{\Lambda}\) is a p x q matrix of factor loadings, \(\boldsymbol{F_{i}}\) is a \(q\) x 1 vector of common factor scores, and \(\boldsymbol{U_{i}}\) is a p x 1 vector of unique factor scores.

We can rewrite the model in terms of variance-covariance and mean expectations. For example, the expected covariance matrix, \(\boldsymbol{\Sigma} = \boldsymbol{Y}'\boldsymbol{Y}\), becomes

\[ \boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}' + \boldsymbol{\Theta} \] where \(\boldsymbol{\Sigma}\) is a p x p covariance (or correlation) matrix of the observed variables, \(\boldsymbol{\Lambda}\) is a p x q matrix of factor loadings, \(\boldsymbol{\Psi}\) is a q x q covariance matrix of the latent factor variables, and \(\boldsymbol{\Theta}\) is a diagonal matrix of unique factor variances.

and the expected p x 1 mean vector, $ $ becomes \[ \boldsymbol{\mu} = \boldsymbol{\tau} + \boldsymbol{\Lambda}\boldsymbol{\alpha} \]

where \(\boldsymbol{\tau}\) is a p x 1 vector of manifest variable means, \(\boldsymbol{\Lambda}\) is a p x q matrix of factor loadings, and \(\boldsymbol{\alpha}\) is a q x 1 vector of latent variable means.

We can then extend the model to multiple-occasion settings with occasion-specific subscript, so that

\[ \boldsymbol{\Sigma_t} = \boldsymbol{\Lambda_t}\boldsymbol{\Psi_t}\boldsymbol{\Lambda_t}' + \boldsymbol{\Theta_t} \] and \[ \boldsymbol{\mu_t} = \boldsymbol{\tau_t} + \boldsymbol{\Lambda_t}\boldsymbol{\alpha_t} \]

Different levels of measurement invariance are established by testing (or requiring) that various matrices are equal.

Specifically, …
For Configural Invaraince we establish that the structure of \(\boldsymbol{\Lambda_t}\) is equivalent across occasions.

For Weak Invariance we additionally establish that the factor loadings are equivalent across occasions, \(\boldsymbol{\Lambda_t} = \boldsymbol{\Lambda}\).

For Strong Invariance we additionally establish that the manifest means are equivalent across occasions, \(\boldsymbol{\tau_t} = \boldsymbol{\tau}\). and

For Strict Invariance we additionally establish that the residual/unique variances are also equivalent across occasions \(\boldsymbol{\Theta_t} = \boldsymbol{\Theta}\).

Measurement Invariance testing is usually conducted within a Structural Equation Modeling (SEM) framework. Here we illustrate how this may be done using the lavaan package.

Lots of good information and instruction (including about invariance testing - in a multiple-group setting) can be found on the package website … http://lavaan.ugent.be.

2.0.1 Prelim - Loading libraries used in this script.

library(psych)
library(ggplot2)
library(corrplot) #plotting correlation matrices
library(lavaan)  #for fitting structural equation models
library(semPlot)  #for automatically making diagrams 

2.0.2 Prelim - Reading in Repeated Measures Data

For this example, we use an ECLS-K dataset that is in wideform. There are variables for children’s science, reading, and math aptitude scores, obtained in 3rd, 5th, and 8th grade. The three aptitude scores are considered indicators of an academic achievement latent factor.

#set filepath for data file
filepath <- "https://raw.githubusercontent.com/LRI-2/Data/main/GrowthModeling/ECLS_Science.dat"
#read in the text data file using the url() function
dat <- read.table(file=url(filepath),na.strings = ".") 

names(dat) <- c("id", "s_g3", "r_g3", "m_g3", "s_g5", "r_g5", "m_g5", "s_g8", 
                "r_g8", "m_g8", "st_g3", "rt_g3", "mt_g3", "st_g5", "rt_g5", 
                "mt_g5", "st_g8", "rt_g8", "mt_g8")

#selecting only the variables of interest
dat <- dat[ ,c("id", "s_g3", "r_g3", "m_g3", "s_g5", "r_g5", "m_g5", "s_g8", 
                "r_g8", "m_g8")]

2.0.3 Prelim - Descriptives

Lets have a quick look at the data file and the descriptives.

#data structure
head(dat,10)
id s_g3 r_g3 m_g3 s_g5 r_g5 m_g5 s_g8 r_g8 m_g8
1 NA NA NA NA NA NA NA NA NA
3 NA NA NA NA NA NA NA NA NA
8 NA NA NA NA NA NA 103.90 204.10 166.67
16 51.57 142.18 115.59 65.94 141.02 133.67 86.90 169.83 156.67
28 NA NA NA NA NA NA NA NA NA
44 NA NA NA NA NA NA NA NA NA
46 72.09 154.43 96.87 79.44 170.57 116.28 89.08 192.07 132.40
62 34.71 106.40 87.86 47.44 145.72 104.68 NA NA NA
66 NA NA NA NA NA NA NA NA NA
74 62.94 126.06 92.47 73.70 145.17 124.73 92.67 193.43 133.93
#descriptives (means, sds)
psych::describe(dat[,-1]) #-1 to remove the id column
vars n mean sd median trimmed mad min max range skew kurtosis se
s_g3 1 1442 50.99316 15.61923 50.935 50.82393 16.76079 18.37 92.66 74.29 0.0789253 -0.5926226 0.4113174
r_g3 2 1430 127.65780 29.21852 126.975 128.44100 31.33475 51.46 195.82 144.36 -0.2126704 -0.4974589 0.7726633
m_g3 3 1442 99.71625 25.53598 102.605 99.98539 27.70979 35.72 159.40 123.68 -0.0954525 -0.7022825 0.6724653
s_g5 4 1135 65.25489 16.18239 67.530 65.99278 16.51616 22.57 103.23 80.66 -0.3888726 -0.4883320 0.4803355
r_g5 5 1133 151.09049 27.30787 152.330 153.09954 26.73128 64.69 202.22 137.53 -0.6153502 -0.0358596 0.8112841
m_g5 6 1136 124.34706 25.16900 128.645 126.08475 25.13748 50.87 169.53 118.66 -0.5818852 -0.2415627 0.7467526
s_g8 7 947 84.88585 16.71213 88.930 86.88478 14.81117 29.61 107.90 78.29 -0.9948362 0.4785809 0.5430713
r_g8 8 941 172.04634 27.72934 179.700 175.53752 24.90768 89.15 208.44 119.29 -0.9835795 0.2411465 0.9039507
m_g8 9 945 142.46749 22.50262 147.360 145.06745 21.23083 67.75 172.20 104.45 -0.9360756 0.3588947 0.7320104
#correlation matrix
round(cor(dat[,-1], use = "pairwise.complete"),2)
##      s_g3 r_g3 m_g3 s_g5 r_g5 m_g5 s_g8 r_g8 m_g8
## s_g3 1.00 0.76 0.71 0.85 0.73 0.68 0.75 0.68 0.66
## r_g3 0.76 1.00 0.75 0.73 0.85 0.70 0.70 0.76 0.68
## m_g3 0.71 0.75 1.00 0.70 0.72 0.88 0.71 0.66 0.81
## s_g5 0.85 0.73 0.70 1.00 0.77 0.74 0.81 0.73 0.70
## r_g5 0.73 0.85 0.72 0.77 1.00 0.75 0.74 0.80 0.71
## m_g5 0.68 0.70 0.88 0.74 0.75 1.00 0.74 0.68 0.85
## s_g8 0.75 0.70 0.71 0.81 0.74 0.74 1.00 0.78 0.78
## r_g8 0.68 0.76 0.66 0.73 0.80 0.68 0.78 1.00 0.75
## m_g8 0.66 0.68 0.81 0.70 0.71 0.85 0.78 0.75 1.00
#visusal correlation matrix
corrplot(cor(dat[,-1], use = "pairwise.complete"), order = "original", tl.col='black', tl.cex=.75)