Measurement Invariance + 2nd order Growth Model (ECLS-K)
Nilam Ram, Kevin Grimm, et al.
1 Overview
In this tutorial we walk through the very basics of testing measurement invariance in the context of a longitudinal factor model, and show how a 2nd order growth model can then be used to describe change in an invariant factor.
This tutorial follows the example in Chapter 14 of Growth Modeling: Structural Equation and Multilevel Modeling Approaches (Grimm, Ram & Estabrook, 2017). Using 3-occasion data from the ECLS-K, we test for factorial invariance, and then use a second-order growth model to describe change in the factor scores across time.
2 Introduction to the Common Factor Model
The basic factor analysis model can be written as a matrix equation …
\[ \boldsymbol{Y_{i}} = \boldsymbol{\tau} + \boldsymbol{\Lambda}\boldsymbol{F_{i}} + \boldsymbol{U_{i}} \]
where \(\boldsymbol{Y_{i}}\) is a \(p\) x 1 vector of observed variable scores, \(\boldsymbol{\tau}\) is a \(p\) x 1 vector of intercepts, \(\boldsymbol{\Lambda}\) is a \(p\) x \(q\) matrix of factor loadings, \(\boldsymbol{F_{i}}\) is a \(q\) x 1 vector of common factor scores, and \(\boldsymbol{U_{i}}\) is a \(p\) x 1 vector of unique factor scores.
We can rewrite the model in terms of variance-covariance and mean expectations. For example, the expected covariance matrix of the observed variables, \(\boldsymbol{\Sigma}\), becomes
\[ \boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}' + \boldsymbol{\Theta} \] where \(\boldsymbol{\Sigma}\) is a p x p covariance (or correlation) matrix of the observed variables, \(\boldsymbol{\Lambda}\) is a p x q matrix of factor loadings, \(\boldsymbol{\Psi}\) is a q x q covariance matrix of the latent factor variables, and \(\boldsymbol{\Theta}\) is a diagonal matrix of unique factor variances.
The expected \(p\) x 1 mean vector, \(\boldsymbol{\mu}\), becomes \[ \boldsymbol{\mu} = \boldsymbol{\tau} + \boldsymbol{\Lambda}\boldsymbol{\alpha} \]
where \(\boldsymbol{\tau}\) is a \(p\) x 1 vector of manifest variable intercepts, \(\boldsymbol{\Lambda}\) is a \(p\) x \(q\) matrix of factor loadings, and \(\boldsymbol{\alpha}\) is a \(q\) x 1 vector of latent variable means.
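The two expectations above follow from the model equation when the common and unique factors are assumed to be uncorrelated. As a minimal sketch, the matrix products can be computed directly in base R. All numeric values below are made up for illustration (a hypothetical one-factor model with three indicators); they are not estimates from the ECLS-K data.

```r
# Hypothetical one-factor model with p = 3 indicators and q = 1 factor.
# Every value here is an illustrative assumption, not an ECLS-K estimate.
Lambda <- matrix(c(1.0, 0.8, 1.2), nrow = 3, ncol = 1)  # p x q factor loadings
Psi    <- matrix(2.0, nrow = 1, ncol = 1)               # q x q factor variance
Theta  <- diag(c(0.5, 0.4, 0.6))                        # p x p diagonal unique variances
tau    <- c(10, 20, 30)                                 # p x 1 intercepts
alpha  <- 0.5                                           # q x 1 factor mean

# Model-implied covariance matrix: Sigma = Lambda Psi Lambda' + Theta
Sigma <- Lambda %*% Psi %*% t(Lambda) + Theta

# Model-implied mean vector: mu = tau + Lambda alpha
mu <- tau + as.vector(Lambda %*% alpha)

Sigma
mu
```

With these values, the implied variance of the first indicator is its squared loading times the factor variance plus its unique variance (1 x 2 + 0.5 = 2.5), and its implied mean is its intercept plus its loading times the factor mean (10 + 1 x 0.5 = 10.5).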
We can then extend the model to multiple-occasion settings by adding occasion-specific subscripts, \(t\), so that
\[ \boldsymbol{\Sigma_t} = \boldsymbol{\Lambda_t}\boldsymbol{\Psi_t}\boldsymbol{\Lambda_t}' + \boldsymbol{\Theta_t} \] and \[ \boldsymbol{\mu_t} = \boldsymbol{\tau_t} + \boldsymbol{\Lambda_t}\boldsymbol{\alpha_t} \]
Different levels of measurement invariance are established by testing (or requiring) that various matrices are equal.
Specifically, …
For Configural Invariance we establish that the structure of \(\boldsymbol{\Lambda_t}\) is equivalent across occasions.
For Weak Invariance we additionally establish that the factor loadings are equivalent across occasions, \(\boldsymbol{\Lambda_t} = \boldsymbol{\Lambda}\).
For Strong Invariance we additionally establish that the manifest intercepts are equivalent across occasions, \(\boldsymbol{\tau_t} = \boldsymbol{\tau}\).
For Strict Invariance we additionally establish that the residual/unique variances are equivalent across occasions, \(\boldsymbol{\Theta_t} = \boldsymbol{\Theta}\).
Measurement Invariance testing is usually conducted within a Structural Equation Modeling (SEM) framework. Here we illustrate how this may be done using the lavaan package.
Lots of good information and instruction (including about invariance testing - in a multiple-group setting) can be found on the package website … http://lavaan.ugent.be.
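In lavaan model syntax, one common way to impose these equality constraints is to give the same label to parameters that should be equal across occasions. The sketch below is only an illustration of that idea for the three-occasion data used in this tutorial. The factor names (eta1 to eta3), the labels (l2, l3, t1 to t3, u1 to u3), and the identification choices (first loading fixed to 1, first-occasion factor mean fixed to 0) are assumptions made here, not necessarily the specification used in the book.

```r
# Sketch of a strict-invariance specification; labels and identification
# constraints are illustrative assumptions.
strict_syntax <- '
  # weak invariance: loadings share labels across occasions
  eta1 =~ 1*s_g3 + l2*r_g3 + l3*m_g3
  eta2 =~ 1*s_g5 + l2*r_g5 + l3*m_g5
  eta3 =~ 1*s_g8 + l2*r_g8 + l3*m_g8

  # strong invariance: intercepts share labels across occasions
  s_g3 ~ t1*1
  s_g5 ~ t1*1
  s_g8 ~ t1*1
  r_g3 ~ t2*1
  r_g5 ~ t2*1
  r_g8 ~ t2*1
  m_g3 ~ t3*1
  m_g5 ~ t3*1
  m_g8 ~ t3*1

  # strict invariance: unique variances share labels across occasions
  s_g3 ~~ u1*s_g3
  s_g5 ~~ u1*s_g5
  s_g8 ~~ u1*s_g8
  r_g3 ~~ u2*r_g3
  r_g5 ~~ u2*r_g5
  r_g8 ~~ u2*r_g8
  m_g3 ~~ u3*m_g3
  m_g5 ~~ u3*m_g5
  m_g8 ~~ u3*m_g8

  # factor means: fixed at the first occasion, free afterwards
  eta1 ~ 0*1
  eta2 ~ 1
  eta3 ~ 1
'

# If lavaan is installed, parse the syntax and list the labelled parameters.
if (requireNamespace("lavaan", quietly = TRUE)) {
  partable <- lavaan::lavaanify(strict_syntax)
  print(partable[partable$label != "", c("lhs", "op", "rhs", "label")])
}
```

Dropping the unique-variance block would give a strong-invariance specification, and additionally dropping the intercept labels would give a weak-invariance one. A string like this could then be passed to a fitting function such as `lavaan::cfa()` together with the data.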
2.0.1 Prelim - Loading libraries used in this script.
library(psych)
library(ggplot2)
library(corrplot) #plotting correlation matrices
library(lavaan) #for fitting structural equation models
library(semPlot) #for automatically making diagrams
2.0.2 Prelim - Reading in Repeated Measures Data
For this example, we use an ECLS-K dataset that is in wide format. There are variables for children’s science, reading, and math aptitude scores, obtained in 3rd, 5th, and 8th grade. The three aptitude scores are considered indicators of an academic achievement latent factor.
#set filepath for data file
filepath <- "https://raw.githubusercontent.com/LRI-2/Data/main/GrowthModeling/ECLS_Science.dat"
#read in the text data file using the url() function
dat <- read.table(file=url(filepath),na.strings = ".")
names(dat) <- c("id", "s_g3", "r_g3", "m_g3", "s_g5", "r_g5", "m_g5", "s_g8",
"r_g8", "m_g8", "st_g3", "rt_g3", "mt_g3", "st_g5", "rt_g5",
"mt_g5", "st_g8", "rt_g8", "mt_g8")
#selecting only the variables of interest
dat <- dat[ ,c("id", "s_g3", "r_g3", "m_g3", "s_g5", "r_g5", "m_g5", "s_g8",
               "r_g8", "m_g8")]
2.0.3 Prelim - Descriptives
Let's have a quick look at the data file and the descriptives.
#data structure
head(dat,10)
id | s_g3 | r_g3 | m_g3 | s_g5 | r_g5 | m_g5 | s_g8 | r_g8 | m_g8 |
---|---|---|---|---|---|---|---|---|---|
1 | NA | NA | NA | NA | NA | NA | NA | NA | NA |
3 | NA | NA | NA | NA | NA | NA | NA | NA | NA |
8 | NA | NA | NA | NA | NA | NA | 103.90 | 204.10 | 166.67 |
16 | 51.57 | 142.18 | 115.59 | 65.94 | 141.02 | 133.67 | 86.90 | 169.83 | 156.67 |
28 | NA | NA | NA | NA | NA | NA | NA | NA | NA |
44 | NA | NA | NA | NA | NA | NA | NA | NA | NA |
46 | 72.09 | 154.43 | 96.87 | 79.44 | 170.57 | 116.28 | 89.08 | 192.07 | 132.40 |
62 | 34.71 | 106.40 | 87.86 | 47.44 | 145.72 | 104.68 | NA | NA | NA |
66 | NA | NA | NA | NA | NA | NA | NA | NA | NA |
74 | 62.94 | 126.06 | 92.47 | 73.70 | 145.17 | 124.73 | 92.67 | 193.43 | 133.93 |
#descriptives (means, sds)
psych::describe(dat[,-1]) #-1 to remove the id column
variable | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
s_g3 | 1 | 1442 | 50.99316 | 15.61923 | 50.935 | 50.82393 | 16.76079 | 18.37 | 92.66 | 74.29 | 0.0789253 | -0.5926226 | 0.4113174 |
r_g3 | 2 | 1430 | 127.65780 | 29.21852 | 126.975 | 128.44100 | 31.33475 | 51.46 | 195.82 | 144.36 | -0.2126704 | -0.4974589 | 0.7726633 |
m_g3 | 3 | 1442 | 99.71625 | 25.53598 | 102.605 | 99.98539 | 27.70979 | 35.72 | 159.40 | 123.68 | -0.0954525 | -0.7022825 | 0.6724653 |
s_g5 | 4 | 1135 | 65.25489 | 16.18239 | 67.530 | 65.99278 | 16.51616 | 22.57 | 103.23 | 80.66 | -0.3888726 | -0.4883320 | 0.4803355 |
r_g5 | 5 | 1133 | 151.09049 | 27.30787 | 152.330 | 153.09954 | 26.73128 | 64.69 | 202.22 | 137.53 | -0.6153502 | -0.0358596 | 0.8112841 |
m_g5 | 6 | 1136 | 124.34706 | 25.16900 | 128.645 | 126.08475 | 25.13748 | 50.87 | 169.53 | 118.66 | -0.5818852 | -0.2415627 | 0.7467526 |
s_g8 | 7 | 947 | 84.88585 | 16.71213 | 88.930 | 86.88478 | 14.81117 | 29.61 | 107.90 | 78.29 | -0.9948362 | 0.4785809 | 0.5430713 |
r_g8 | 8 | 941 | 172.04634 | 27.72934 | 179.700 | 175.53752 | 24.90768 | 89.15 | 208.44 | 119.29 | -0.9835795 | 0.2411465 | 0.9039507 |
m_g8 | 9 | 945 | 142.46749 | 22.50262 | 147.360 | 145.06745 | 21.23083 | 67.75 | 172.20 | 104.45 | -0.9360756 | 0.3588947 | 0.7320104 |
#correlation matrix
round(cor(dat[,-1], use = "pairwise.complete"),2)
## s_g3 r_g3 m_g3 s_g5 r_g5 m_g5 s_g8 r_g8 m_g8
## s_g3 1.00 0.76 0.71 0.85 0.73 0.68 0.75 0.68 0.66
## r_g3 0.76 1.00 0.75 0.73 0.85 0.70 0.70 0.76 0.68
## m_g3 0.71 0.75 1.00 0.70 0.72 0.88 0.71 0.66 0.81
## s_g5 0.85 0.73 0.70 1.00 0.77 0.74 0.81 0.73 0.70
## r_g5 0.73 0.85 0.72 0.77 1.00 0.75 0.74 0.80 0.71
## m_g5 0.68 0.70 0.88 0.74 0.75 1.00 0.74 0.68 0.85
## s_g8 0.75 0.70 0.71 0.81 0.74 0.74 1.00 0.78 0.78
## r_g8 0.68 0.76 0.66 0.73 0.80 0.68 0.78 1.00 0.75
## m_g8 0.66 0.68 0.81 0.70 0.71 0.85 0.78 0.75 1.00
#visual correlation matrix
corrplot(cor(dat[,-1], use = "pairwise.complete"), order = "original", tl.col='black', tl.cex=.75)