Chapter 16 - Latent Change Score Modeling
Nilam Ram, Kevin Grimm et al.
1 Overview
This tutorial walks through the fitting of univariate latent change score models in the structural equation modeling framework in R using using the lavaan
package.
The example follows Chapter 16 of Grimm, Ram, and Estabrook (2017). Please refer to the chapter for further interpretations and insights about the analyses.
1.0.1 Prelim - Loading libraries used in this script.
library(psych)
library(ggplot2)
library(corrplot) #plotting correlation matrices
library(lavaan) #for fitting structural equation models
library(semPlot) #for automatically making diagrams
1.0.2 Prelim - Reading in Repeated Measures Data
We use data from the NLSY-CYA (Center for Human Resource Research, 2009) that includes repeated measures of children’s math ability (math) from the second through eighth grade.
Reading in the data
#set filepath
<- "https://raw.githubusercontent.com/LRI-2/Data/main/GrowthModeling/nlsy_math_wide_R.dat"
filepath #read in the text data file using the url() function
<- read.table(file=url(filepath),na.strings = ".")
nlsy_data
#adding names for the columns of the data set
names(nlsy_data) <- c('id', 'female', 'lb_wght', 'anti_k1', 'math2', 'math3', 'math4', 'math5', 'math6', 'math7', 'math8', 'age2', 'age3', 'age4', 'age5', 'age6', 'age7', 'age8', 'men2', 'men3', 'men4', 'men5', 'men6', 'men7', 'men8', 'spring2', 'spring3', 'spring4', 'spring5', 'spring6', 'spring7', 'spring8', 'anti2', 'anti3', 'anti4', 'anti5', 'anti6', 'anti7', 'anti8')
#reduce data down to the id variable and the math and reading variables of interest
<- nlsy_data[ ,c('id', 'math2', 'math3', 'math4', 'math5', 'math6', 'math7', 'math8')]
nlsy_data
::describe(nlsy_data) psych
vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | 1 | 933 | 532334.89711 | 3.280208e+05 | 506602.0 | 520130.77108 | 391999.4400 | 201 | 1256601 | 1256400 | 0.2841659 | -0.9078872 | 1.073892e+04 |
math2 | 2 | 335 | 32.60896 | 1.028600e+01 | 32.0 | 32.27509 | 10.3782 | 12 | 60 | 48 | 0.2686681 | -0.4600130 | 5.619840e-01 |
math3 | 3 | 431 | 39.88399 | 1.029949e+01 | 41.0 | 39.88406 | 10.3782 | 13 | 67 | 54 | -0.0520929 | -0.3259168 | 4.961091e-01 |
math4 | 4 | 378 | 46.16931 | 1.016510e+01 | 46.0 | 46.22039 | 8.8956 | 18 | 70 | 52 | -0.0595783 | -0.0759665 | 5.228366e-01 |
math5 | 5 | 372 | 49.77419 | 9.471909e+00 | 48.0 | 49.76510 | 8.8956 | 23 | 71 | 48 | 0.0425474 | -0.3381126 | 4.910956e-01 |
math6 | 6 | 390 | 52.72308 | 9.915594e+00 | 50.5 | 52.38462 | 9.6369 | 24 | 78 | 54 | 0.2510417 | -0.3808615 | 5.020956e-01 |
math7 | 7 | 173 | 55.35260 | 1.062727e+01 | 53.0 | 55.08633 | 11.8608 | 31 | 81 | 50 | 0.2148479 | -0.9709622 | 8.079765e-01 |
math8 | 8 | 142 | 57.83099 | 1.153101e+01 | 56.0 | 57.42982 | 12.6021 | 26 | 81 | 55 | 0.1590545 | -0.5222967 | 9.676607e-01 |
1.0.3 Prelim - Plotting the Repeated Measures Data
#reshaping wide to wide
<- reshape(data=nlsy_data,
data_long varying = c('math2', 'math3', 'math4', 'math5', 'math6', 'math7', 'math8'),
timevar=c("grade"),
idvar=c("id"),
direction="long", sep="")
#sorting for easy viewing
#reorder by id and day
<- data_long[order(data_long$id,data_long$grade), ]
data_long
#looking at the long data
head(data_long, 8)
id | grade | math | |
---|---|---|---|
201.2 | 201 | 2 | NA |
201.3 | 201 | 3 | 38 |
201.4 | 201 | 4 | NA |
201.5 | 201 | 5 | 55 |
201.6 | 201 | 6 | NA |
201.7 | 201 | 7 | NA |
201.8 | 201 | 8 | NA |
303.2 | 303 | 2 | 26 |
#Plotting intraindividual change MATH
ggplot(data = data_long, aes(x = grade, y = math, group = id)) +
geom_point(color="blue") +
geom_line(color="blue") +
xlab("Grade") +
ylab("PIAT Mathematics") +
scale_x_continuous(limits=c(2,8), breaks=seq(2,8,by=1)) +
scale_y_continuous(limits=c(0,90), breaks=seq(0,90,by=10))
## Warning: Removed 4310 rows containing missing values (geom_point).
## Warning: Removed 2787 row(s) containing missing values (geom_path).
The plot shows individual trajectories of the PIAT Mathematics scores across Grade 2 to 8. One can notice the attrition and only partial overlap across years - which requires some assumptions about the missing data (i.e., missing at random).
2 Dual Change Score Model
2.1 Model Specification
Following the examples in the book, we specify the Dual Change Score Model - which accomodates the nonlinear shape we see in the data. The model invokes the latent variables and builds the difference scores, incorporates the constant growth factor and the proportional change. Lines invoking each set of latent variables and arrows is indicated in the model specification.
The model diagram follows this kind of setup …
Model Specification
<- ' #opening quote
dcm_math #MATHEMATICS
#latent true scores (loadings = 1)
lm1 =~ 1*math2
lm2 =~ 1*math3
lm3 =~ 1*math4
lm4 =~ 1*math5
lm5 =~ 1*math6
lm6 =~ 1*math7
lm7 =~ 1*math8
#latent true score means (initial free, others = 0)
lm1 ~ 1
lm2 ~ 0*1
lm3 ~ 0*1
lm4 ~ 0*1
lm5 ~ 0*1
lm6 ~ 0*1
lm7 ~ 0*1
#latent true score variances (initial free, others = 0)
lm1 ~~ start(15)*lm1
lm2 ~~ 0*lm2
lm3 ~~ 0*lm3
lm4 ~~ 0*lm4
lm5 ~~ 0*lm5
lm6 ~~ 0*lm6
lm7 ~~ 0*lm7
#observed intercepts (fixed to 0)
math2 ~ 0*1
math3 ~ 0*1
math4 ~ 0*1
math5 ~ 0*1
math6 ~ 0*1
math7 ~ 0*1
math8 ~ 0*1
#observed residual variances (constrained to equality)
math2 ~~ sigma2_u*math2
math3 ~~ sigma2_u*math3
math4 ~~ sigma2_u*math4
math5 ~~ sigma2_u*math5
math6 ~~ sigma2_u*math6
math7 ~~ sigma2_u*math7
math8 ~~ sigma2_u*math8
#autoregressions (fixed = 1)
lm2 ~ 1*lm1
lm3 ~ 1*lm2
lm4 ~ 1*lm3
lm5 ~ 1*lm4
lm6 ~ 1*lm5
lm7 ~ 1*lm6
#latent change scores (fixed = 1)
dm2 =~ 1*lm2
dm3 =~ 1*lm3
dm4 =~ 1*lm4
dm5 =~ 1*lm5
dm6 =~ 1*lm6
dm7 =~ 1*lm7
#latent change score means (constrained to 0)
dm2 ~ 0*1
dm3 ~ 0*1
dm4 ~ 0*1
dm5 ~ 0*1
dm6 ~ 0*1
dm7 ~ 0*1
#latent change score variances (constrained to 0)
dm2 ~~ 0*dm2
dm3 ~~ 0*dm3
dm4 ~~ 0*dm4
dm5 ~~ 0*dm5
dm6 ~~ 0*dm6
dm7 ~~ 0*dm7
#constant change factor (loadings = 1)
g2 =~ 1*dm2 +
1*dm3 +
1*dm4 +
1*dm5 +
1*dm6 +
1*dm7
#constant change factor mean
g2 ~ start(15)*1
#constant change factor variance
g2 ~~ g2
#constant change factor covariance with the initial true score
g2 ~~ lm1
#proportional effects (constrained equal)
dm2 ~ start(-.2)*pi_m * lm1
dm3 ~ start(-.2)*pi_m * lm2
dm4 ~ start(-.2)*pi_m * lm3
dm5 ~ start(-.2)*pi_m * lm4
dm6 ~ start(-.2)*pi_m * lm5
dm7 ~ start(-.2)*pi_m * lm6
' #closing quote
2.2 Model Estimation and Interpretation
We fit the model using lavaan.
#Model fitting
<- lavaan(dcm_math,
fit_math data = nlsy_data, #note that fitting uses wide data
meanstructure = TRUE,
estimator = "ML",
missing = "fiml",
fixed.x = FALSE,
mimic="mplus",
control=list(iter.max=500),
verbose=FALSE)
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: some cases are empty and will be ignored:
## 741
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan
## WARNING: due to missing values, some pairwise combinations have less than 10%
## coverage
#Model summary
summary(fit_math, fit.measures=TRUE)
## lavaan 0.6-8 ended normally after 48 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 18
## Number of equality constraints 11
##
## Used Total
## Number of observations 932 933
## Number of missing patterns 60
##
## Model Test User Model:
##
## Test statistic 58.308
## Degrees of freedom 28
## P-value (Chi-square) 0.001
##
## Model Test Baseline Model:
##
## Test statistic 862.333
## Degrees of freedom 21
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.964
## Tucker-Lewis Index (TLI) 0.973
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -7895.605
## Loglikelihood unrestricted model (H1) -7866.451
##
## Akaike (AIC) 15805.209
## Bayesian (BIC) 15839.071
## Sample-size adjusted Bayesian (BIC) 15816.839
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.034
## 90 Percent confidence interval - lower 0.022
## 90 Percent confidence interval - upper 0.046
## P-value RMSEA <= 0.05 0.985
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.136
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## lm1 =~
## math2 1.000
## lm2 =~
## math3 1.000
## lm3 =~
## math4 1.000
## lm4 =~
## math5 1.000
## lm5 =~
## math6 1.000
## lm6 =~
## math7 1.000
## lm7 =~
## math8 1.000
## dm2 =~
## lm2 1.000
## dm3 =~
## lm3 1.000
## dm4 =~
## lm4 1.000
## dm5 =~
## lm5 1.000
## dm6 =~
## lm6 1.000
## dm7 =~
## lm7 1.000
## g2 =~
## dm2 1.000
## dm3 1.000
## dm4 1.000
## dm5 1.000
## dm6 1.000
## dm7 1.000
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## lm2 ~
## lm1 1.000
## lm3 ~
## lm2 1.000
## lm4 ~
## lm3 1.000
## lm5 ~
## lm4 1.000
## lm6 ~
## lm5 1.000
## lm7 ~
## lm6 1.000
## dm2 ~
## lm1 (pi_m) -0.241 0.018 -13.387 0.000
## dm3 ~
## lm2 (pi_m) -0.241 0.018 -13.387 0.000
## dm4 ~
## lm3 (pi_m) -0.241 0.018 -13.387 0.000
## dm5 ~
## lm4 (pi_m) -0.241 0.018 -13.387 0.000
## dm6 ~
## lm5 (pi_m) -0.241 0.018 -13.387 0.000
## dm7 ~
## lm6 (pi_m) -0.241 0.018 -13.387 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## lm1 ~~
## g2 13.746 1.699 8.092 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## lm1 32.533 0.434 74.955 0.000
## .lm2 0.000
## .lm3 0.000
## .lm4 0.000
## .lm5 0.000
## .lm6 0.000
## .lm7 0.000
## .math2 0.000
## .math3 0.000
## .math4 0.000
## .math5 0.000
## .math6 0.000
## .math7 0.000
## .math8 0.000
## .dm2 0.000
## .dm3 0.000
## .dm4 0.000
## .dm5 0.000
## .dm6 0.000
## .dm7 0.000
## g2 15.222 0.815 18.687 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## lm1 71.900 6.548 10.980 0.000
## .lm2 0.000
## .lm3 0.000
## .lm4 0.000
## .lm5 0.000
## .lm6 0.000
## .lm7 0.000
## .math2 (sg2_) 30.818 1.700 18.126 0.000
## .math3 (sg2_) 30.818 1.700 18.126 0.000
## .math4 (sg2_) 30.818 1.700 18.126 0.000
## .math5 (sg2_) 30.818 1.700 18.126 0.000
## .math6 (sg2_) 30.818 1.700 18.126 0.000
## .math7 (sg2_) 30.818 1.700 18.126 0.000
## .math8 (sg2_) 30.818 1.700 18.126 0.000
## .dm2 0.000
## .dm3 0.000
## .dm4 0.000
## .dm5 0.000
## .dm6 0.000
## .dm7 0.000
## g2 5.602 0.837 6.695 0.000
#Model Diagram
# semPaths(dcm_math, what="est",
# sizeLat = 7, sizeMan = 7, edge.label.cex = .75)
#Note that semPaths does not know how to draw this kind of model.
#So, better to map to the diagram from the book.
The change equation based on the ouput could be written as
Change in Math:
\[\Delta_{math} = 15.222 - 0.241(math_{t-1})\] We see there is both constant change (a linear component) and proportional change (a nonlinear component)
2.3 Predicted scores
#obtaining predicted scores
<- cbind(nlsy_data$id, as.data.frame(lavPredict(fit_math, type = "yhat")))
nlsy_predicted names(nlsy_predicted)[1] <- "id"
#looking at data
head(nlsy_predicted)
id | math2 | math3 | math4 | math5 | math6 | math7 | math8 |
---|---|---|---|---|---|---|---|
201 | 33.14196 | 40.99641 | 46.95784 | 51.48248 | 54.91663 | 57.52309 | 59.50137 |
303 | 24.16182 | 30.57662 | 35.44537 | 39.14069 | 41.94539 | 44.07411 | 45.68979 |
2702 | 49.24606 | 56.78947 | 62.51481 | 66.86027 | 70.15842 | 72.66167 | 74.56160 |
4303 | 37.40837 | 45.27911 | 51.25289 | 55.78691 | 59.22818 | 61.84005 | 63.82242 |
5002 | 33.84478 | 41.89858 | 48.01132 | 52.65080 | 56.17210 | 58.84472 | 60.87321 |
5005 | 35.65097 | 43.24185 | 49.00324 | 53.37605 | 56.69495 | 59.21396 | 61.12585 |
#reshaping wide to long
<- reshape(data=nlsy_predicted,
predicted_long varying = c('math2', 'math3', 'math4', 'math5', 'math6', 'math7', 'math8'),
timevar=c("grade"),
idvar=c("id"),
direction="long", sep="")
#sorting for easy viewing
#reorder by id and day
<- predicted_long[order(predicted_long$id,predicted_long$grade), ]
predicted_long
#looking at the long data
head(predicted_long, 14)
id | grade | math | |
---|---|---|---|
201.2 | 201 | 2 | 33.14196 |
201.3 | 201 | 3 | 40.99641 |
201.4 | 201 | 4 | 46.95784 |
201.5 | 201 | 5 | 51.48248 |
201.6 | 201 | 6 | 54.91663 |
201.7 | 201 | 7 | 57.52309 |
201.8 | 201 | 8 | 59.50137 |
303.2 | 303 | 2 | 24.16182 |
303.3 | 303 | 3 | 30.57662 |
303.4 | 303 | 4 | 35.44537 |
303.5 | 303 | 5 | 39.14069 |
303.6 | 303 | 6 | 41.94539 |
303.7 | 303 | 7 | 44.07411 |
303.8 | 303 | 8 | 45.68979 |
#Plotting intraindividual change MATH
ggplot(data = predicted_long, aes(x = grade, y = math, group = id)) +
#geom_point(color="blue") +
geom_line(color="blue") +
xlab("Grade") +
ylab("Predicted PIAT Mathematics") +
scale_x_continuous(limits=c(2,8), breaks=seq(2,8,by=1)) +
scale_y_continuous(limits=c(0,90), breaks=seq(0,90,by=10))
## Warning: Removed 7 row(s) containing missing values (geom_path).
These figures (similar to Figure 16.3 in the book) show the nonlinear exponential-type shape that is captured by the dual change score model. Notice also how the missing data was inferred through full information maximum likelihood uncer the missing at random assumption.
See also Ghisletta, P., & McArdle, J. J. (2012). Teacher’s Corner: Latent Curve Models and Latent Change Score Models Estimated in R. Structural Equation Modeling, 19(4): 651-682. doi:10.1080/10705511.2012.713275. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4259494/pdf/nihms412332.pdf
3 Conclusion
The latent change score models provide a framework for examining change as an outcome explicitly by constructing a series of latent change variables and modeling those. This changes our thinking from modeling the observed repeated measures outcome \(Y_it\) to thinking about the change \(\Delta Y_{it}\) explicitly. This totally expands how we model dynamics of change!
Change to Change! Go for it!