Chapter 16 – Latent Change Score Modeling

1 Overview

This tutorial walks through the fitting of univariate latent change score models in the structural equation modeling framework in R using using the lavaan package.

The example follows Chapter 16 of Grimm, Ram, and Estabrook (2017). Please refer to the chapter for further interpretations and insights about the analyses.

1.0.1 Prelim - Loading libraries used in this script.

library(psych)
library(ggplot2)
library(corrplot) #plotting correlation matrices
library(lavaan)  #for fitting structural equation models
library(semPlot)  #for automatically making diagrams

1.0.2 Prelim - Reading in Repeated Measures Data

We use data from the NLSY-CYA (Center for Human Resource Research, 2009) that includes repeated measures of children’s math ability (math) from the second through eighth grade.

Reading in the data

#set filepath
filepath <- "https://raw.githubusercontent.com/LRI-2/Data/main/GrowthModeling/nlsy_math_wide_R.dat"
#read in the text data file using the url() function
nlsy_data <- read.table(file=url(filepath),na.strings = ".")

#adding names for the columns of the data set
names(nlsy_data) <- c('id', 'female', 'lb_wght', 'anti_k1', 'math2', 'math3', 'math4', 'math5', 'math6', 'math7', 'math8', 'age2', 'age3', 'age4', 'age5', 'age6', 'age7', 'age8', 'men2', 'men3', 'men4', 'men5', 'men6', 'men7', 'men8', 'spring2', 'spring3', 'spring4', 'spring5', 'spring6', 'spring7', 'spring8', 'anti2', 'anti3', 'anti4', 'anti5', 'anti6', 'anti7', 'anti8')

#reduce data down to the id variable and the math and reading variables of interest
nlsy_data <- nlsy_data[ ,c('id', 'math2', 'math3', 'math4', 'math5', 'math6', 'math7', 'math8')]

psych::describe(nlsy_data)

	vars	n	mean	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se
id	1	933	532334.89711	3.280208e+05	506602.0	520130.77108	391999.4400	201	1256601	1256400	0.2841659	-0.9078872	1.073892e+04
math2	2	335	32.60896	1.028600e+01	32.0	32.27509	10.3782	12	60	48	0.2686681	-0.4600130	5.619840e-01
math3	3	431	39.88399	1.029949e+01	41.0	39.88406	10.3782	13	67	54	-0.0520929	-0.3259168	4.961091e-01
math4	4	378	46.16931	1.016510e+01	46.0	46.22039	8.8956	18	70	52	-0.0595783	-0.0759665	5.228366e-01
math5	5	372	49.77419	9.471909e+00	48.0	49.76510	8.8956	23	71	48	0.0425474	-0.3381126	4.910956e-01
math6	6	390	52.72308	9.915594e+00	50.5	52.38462	9.6369	24	78	54	0.2510417	-0.3808615	5.020956e-01
math7	7	173	55.35260	1.062727e+01	53.0	55.08633	11.8608	31	81	50	0.2148479	-0.9709622	8.079765e-01
math8	8	142	57.83099	1.153101e+01	56.0	57.42982	12.6021	26	81	55	0.1590545	-0.5222967	9.676607e-01

1.0.3 Prelim - Plotting the Repeated Measures Data

#reshaping wide to wide
data_long <- reshape(data=nlsy_data,
                    varying = c('math2', 'math3', 'math4', 'math5', 'math6', 'math7', 'math8'),
                    timevar=c("grade"), 
                    idvar=c("id"),
                    direction="long", sep="")

#sorting for easy viewing
#reorder by id and day
data_long <- data_long[order(data_long$id,data_long$grade), ]

#looking at the long data
head(data_long, 8)

	id	grade	math
201.2	201	2	NA
201.3	201	3	38
201.4	201	4	NA
201.5	201	5	55
201.6	201	6	NA
201.7	201	7	NA
201.8	201	8	NA
303.2	303	2	26

#Plotting intraindividual change MATH
ggplot(data = data_long, aes(x = grade, y = math, group = id)) +
  geom_point(color="blue") + 
  geom_line(color="blue") +
  xlab("Grade") + 
  ylab("PIAT Mathematics") + 
  scale_x_continuous(limits=c(2,8), breaks=seq(2,8,by=1)) +
  scale_y_continuous(limits=c(0,90), breaks=seq(0,90,by=10))

## Warning: Removed 4310 rows containing missing values (geom_point).

## Warning: Removed 2787 row(s) containing missing values (geom_path).

The plot shows individual trajectories of the PIAT Mathematics scores across Grade 2 to 8. One can notice the attrition and only partial overlap across years - which requires some assumptions about the missing data (i.e., missing at random).

2 Dual Change Score Model

2.1 Model Specification

Following the examples in the book, we specify the Dual Change Score Model - which accomodates the nonlinear shape we see in the data. The model invokes the latent variables and builds the difference scores, incorporates the constant growth factor and the proportional change. Lines invoking each set of latent variables and arrows is indicated in the model specification.

The model diagram follows this kind of setup … diagram

Model Specification

dcm_math <- ' #opening quote
#MATHEMATICS
  #latent true scores (loadings = 1)
    lm1 =~ 1*math2
    lm2 =~ 1*math3
    lm3 =~ 1*math4
    lm4 =~ 1*math5
    lm5 =~ 1*math6
    lm6 =~ 1*math7
    lm7 =~ 1*math8

  #latent true score means (initial free, others = 0)  
    lm1 ~ 1
    lm2 ~ 0*1
    lm3 ~ 0*1
    lm4 ~ 0*1
    lm5 ~ 0*1
    lm6 ~ 0*1
    lm7 ~ 0*1

  #latent true score variances (initial free, others = 0)
    lm1 ~~ start(15)*lm1
    lm2 ~~ 0*lm2
    lm3 ~~ 0*lm3
    lm4 ~~ 0*lm4
    lm5 ~~ 0*lm5
    lm6 ~~ 0*lm6
    lm7 ~~ 0*lm7

  #observed intercepts (fixed to 0)
    math2 ~ 0*1
    math3 ~ 0*1
    math4 ~ 0*1
    math5 ~ 0*1
    math6 ~ 0*1
    math7 ~ 0*1
    math8 ~ 0*1

  #observed residual variances (constrained to equality)
    math2 ~~ sigma2_u*math2
    math3 ~~ sigma2_u*math3
    math4 ~~ sigma2_u*math4
    math5 ~~ sigma2_u*math5
    math6 ~~ sigma2_u*math6
    math7 ~~ sigma2_u*math7
    math8 ~~ sigma2_u*math8

  #autoregressions (fixed = 1)
    lm2 ~ 1*lm1
    lm3 ~ 1*lm2
    lm4 ~ 1*lm3
    lm5 ~ 1*lm4
    lm6 ~ 1*lm5
    lm7 ~ 1*lm6

  #latent change scores (fixed = 1)
    dm2 =~ 1*lm2
    dm3 =~ 1*lm3
    dm4 =~ 1*lm4
    dm5 =~ 1*lm5
    dm6 =~ 1*lm6
    dm7 =~ 1*lm7

  #latent change score means (constrained to 0)  
    dm2 ~ 0*1
    dm3 ~ 0*1
    dm4 ~ 0*1
    dm5 ~ 0*1
    dm6 ~ 0*1
    dm7 ~ 0*1

  #latent change score variances (constrained to 0)  
    dm2 ~~ 0*dm2
    dm3 ~~ 0*dm3
    dm4 ~~ 0*dm4
    dm5 ~~ 0*dm5
    dm6 ~~ 0*dm6
    dm7 ~~ 0*dm7

  #constant change factor (loadings = 1)
    g2 =~ 1*dm2 + 
          1*dm3 + 
          1*dm4 + 
          1*dm5 + 
          1*dm6 + 
          1*dm7 

  #constant change factor mean  
    g2 ~ start(15)*1

  #constant change factor variance
    g2 ~~ g2

  #constant change factor covariance with the initial true score
    g2 ~~ lm1

  #proportional effects (constrained equal)
    dm2 ~ start(-.2)*pi_m * lm1
    dm3 ~ start(-.2)*pi_m * lm2
    dm4 ~ start(-.2)*pi_m * lm3
    dm5 ~ start(-.2)*pi_m * lm4
    dm6 ~ start(-.2)*pi_m * lm5
    dm7 ~ start(-.2)*pi_m * lm6
' #closing quote

2.2 Model Estimation and Interpretation

We fit the model using lavaan.

#Model fitting
fit_math <- lavaan(dcm_math,
                data = nlsy_data, #note that fitting uses wide data
                meanstructure = TRUE,
                estimator = "ML",
                missing = "fiml",
                fixed.x = FALSE,
                mimic="mplus",
                control=list(iter.max=500),
                verbose=FALSE)

## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: some cases are empty and will be ignored:
##   741

## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan
## WARNING: due to missing values, some pairwise combinations have less than 10%
## coverage

#Model summary
summary(fit_math, fit.measures=TRUE)

## lavaan 0.6-8 ended normally after 48 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        18
##   Number of equality constraints                    11
##                                                       
##                                                   Used       Total
##   Number of observations                           932         933
##   Number of missing patterns                        60            
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                                58.308
##   Degrees of freedom                                28
##   P-value (Chi-square)                           0.001
## 
## Model Test Baseline Model:
## 
##   Test statistic                               862.333
##   Degrees of freedom                                21
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.964
##   Tucker-Lewis Index (TLI)                       0.973
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -7895.605
##   Loglikelihood unrestricted model (H1)      -7866.451
##                                                       
##   Akaike (AIC)                               15805.209
##   Bayesian (BIC)                             15839.071
##   Sample-size adjusted Bayesian (BIC)        15816.839
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.034
##   90 Percent confidence interval - lower         0.022
##   90 Percent confidence interval - upper         0.046
##   P-value RMSEA <= 0.05                          0.985
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.136
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   lm1 =~                                              
##     math2             1.000                           
##   lm2 =~                                              
##     math3             1.000                           
##   lm3 =~                                              
##     math4             1.000                           
##   lm4 =~                                              
##     math5             1.000                           
##   lm5 =~                                              
##     math6             1.000                           
##   lm6 =~                                              
##     math7             1.000                           
##   lm7 =~                                              
##     math8             1.000                           
##   dm2 =~                                              
##     lm2               1.000                           
##   dm3 =~                                              
##     lm3               1.000                           
##   dm4 =~                                              
##     lm4               1.000                           
##   dm5 =~                                              
##     lm5               1.000                           
##   dm6 =~                                              
##     lm6               1.000                           
##   dm7 =~                                              
##     lm7               1.000                           
##   g2 =~                                               
##     dm2               1.000                           
##     dm3               1.000                           
##     dm4               1.000                           
##     dm5               1.000                           
##     dm6               1.000                           
##     dm7               1.000                           
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   lm2 ~                                               
##     lm1               1.000                           
##   lm3 ~                                               
##     lm2               1.000                           
##   lm4 ~                                               
##     lm3               1.000                           
##   lm5 ~                                               
##     lm4               1.000                           
##   lm6 ~                                               
##     lm5               1.000                           
##   lm7 ~                                               
##     lm6               1.000                           
##   dm2 ~                                               
##     lm1     (pi_m)   -0.241    0.018  -13.387    0.000
##   dm3 ~                                               
##     lm2     (pi_m)   -0.241    0.018  -13.387    0.000
##   dm4 ~                                               
##     lm3     (pi_m)   -0.241    0.018  -13.387    0.000
##   dm5 ~                                               
##     lm4     (pi_m)   -0.241    0.018  -13.387    0.000
##   dm6 ~                                               
##     lm5     (pi_m)   -0.241    0.018  -13.387    0.000
##   dm7 ~                                               
##     lm6     (pi_m)   -0.241    0.018  -13.387    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   lm1 ~~                                              
##     g2               13.746    1.699    8.092    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     lm1              32.533    0.434   74.955    0.000
##    .lm2               0.000                           
##    .lm3               0.000                           
##    .lm4               0.000                           
##    .lm5               0.000                           
##    .lm6               0.000                           
##    .lm7               0.000                           
##    .math2             0.000                           
##    .math3             0.000                           
##    .math4             0.000                           
##    .math5             0.000                           
##    .math6             0.000                           
##    .math7             0.000                           
##    .math8             0.000                           
##    .dm2               0.000                           
##    .dm3               0.000                           
##    .dm4               0.000                           
##    .dm5               0.000                           
##    .dm6               0.000                           
##    .dm7               0.000                           
##     g2               15.222    0.815   18.687    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     lm1              71.900    6.548   10.980    0.000
##    .lm2               0.000                           
##    .lm3               0.000                           
##    .lm4               0.000                           
##    .lm5               0.000                           
##    .lm6               0.000                           
##    .lm7               0.000                           
##    .math2   (sg2_)   30.818    1.700   18.126    0.000
##    .math3   (sg2_)   30.818    1.700   18.126    0.000
##    .math4   (sg2_)   30.818    1.700   18.126    0.000
##    .math5   (sg2_)   30.818    1.700   18.126    0.000
##    .math6   (sg2_)   30.818    1.700   18.126    0.000
##    .math7   (sg2_)   30.818    1.700   18.126    0.000
##    .math8   (sg2_)   30.818    1.700   18.126    0.000
##    .dm2               0.000                           
##    .dm3               0.000                           
##    .dm4               0.000                           
##    .dm5               0.000                           
##    .dm6               0.000                           
##    .dm7               0.000                           
##     g2                5.602    0.837    6.695    0.000

#Model Diagram 
# semPaths(dcm_math, what="est",
#          sizeLat = 7, sizeMan = 7, edge.label.cex = .75)
#Note that semPaths does not know how to draw this kind of model. 
#So, better to map to the diagram from the book.

The change equation based on the ouput could be written as

Change in Math:

\[\Delta_{math} = 15.222 - 0.241(math_{t-1})\] We see there is both constant change (a linear component) and proportional change (a nonlinear component)

2.3 Predicted scores

#obtaining predicted scores
nlsy_predicted <- cbind(nlsy_data$id, as.data.frame(lavPredict(fit_math, type = "yhat")))
names(nlsy_predicted)[1] <- "id"
#looking at data
head(nlsy_predicted)

id	math2	math3	math4	math5	math6	math7	math8
201	33.14196	40.99641	46.95784	51.48248	54.91663	57.52309	59.50137
303	24.16182	30.57662	35.44537	39.14069	41.94539	44.07411	45.68979
2702	49.24606	56.78947	62.51481	66.86027	70.15842	72.66167	74.56160
4303	37.40837	45.27911	51.25289	55.78691	59.22818	61.84005	63.82242
5002	33.84478	41.89858	48.01132	52.65080	56.17210	58.84472	60.87321
5005	35.65097	43.24185	49.00324	53.37605	56.69495	59.21396	61.12585

#reshaping wide to long
predicted_long <- reshape(data=nlsy_predicted,
                    varying = c('math2', 'math3', 'math4', 'math5', 'math6', 'math7', 'math8'),
                    timevar=c("grade"), 
                    idvar=c("id"),
                    direction="long", sep="")

#sorting for easy viewing
#reorder by id and day
predicted_long <- predicted_long[order(predicted_long$id,predicted_long$grade), ]

#looking at the long data
head(predicted_long, 14)

	id	grade	math
201.2	201	2	33.14196
201.3	201	3	40.99641
201.4	201	4	46.95784
201.5	201	5	51.48248
201.6	201	6	54.91663
201.7	201	7	57.52309
201.8	201	8	59.50137
303.2	303	2	24.16182
303.3	303	3	30.57662
303.4	303	4	35.44537
303.5	303	5	39.14069
303.6	303	6	41.94539
303.7	303	7	44.07411
303.8	303	8	45.68979

#Plotting intraindividual change MATH
ggplot(data = predicted_long, aes(x = grade, y = math, group = id)) +
  #geom_point(color="blue") + 
  geom_line(color="blue") +
  xlab("Grade") + 
  ylab("Predicted PIAT Mathematics") + 
  scale_x_continuous(limits=c(2,8), breaks=seq(2,8,by=1)) +
  scale_y_continuous(limits=c(0,90), breaks=seq(0,90,by=10))

## Warning: Removed 7 row(s) containing missing values (geom_path).

These figures (similar to Figure 16.3 in the book) show the nonlinear exponential-type shape that is captured by the dual change score model. Notice also how the missing data was inferred through full information maximum likelihood uncer the missing at random assumption.

See also Ghisletta, P., & McArdle, J. J. (2012). Teacher’s Corner: Latent Curve Models and Latent Change Score Models Estimated in R. Structural Equation Modeling, 19(4): 651-682. doi:10.1080/10705511.2012.713275. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4259494/pdf/nihms412332.pdf

3 Conclusion

The latent change score models provide a framework for examining change as an outcome explicitly by constructing a series of latent change variables and modeling those. This changes our thinking from modeling the observed repeated measures outcome \(Y_it\) to thinking about the change \(\Delta Y_{it}\) explicitly. This totally expands how we model dynamics of change!

Change to Change! Go for it!

Chapter 16 - Latent Change Score Modeling

Nilam Ram, Kevin Grimm et al.