Linear Growth Model – SEM Implementation in R

1 Overview
- 1.0.1 Preliminaries - Loading libraries used in this script.
- 1.0.2 Preliminaries - Data Preparation and Description
2 No Growth Model
3 Linear Growth Model
4 Conclusion

1 Overview

This tutorial illustrates fitting of linear growth models in the SEM framework in R using the lavaan package.

Example data and code are drawn from Chapter 3 of Grimm, Ram, and Estabrook (2017). Specifically, we examine change in children’s mathematics achievement through elementary and middle school using the NLSY-CYA Dataset. We fit both No Growth and Linear Growth moodels. Please see the book chapter for additional interpretations and insights about the analyses.

1.0.1 Preliminaries - Loading libraries used in this script.

library(psych)  #for basic functions
library(ggplot2)  #for plotting
library(lavaan) #for SEM 
library(semPlot) #for making SEM diagrams

1.0.2 Preliminaries - Data Preparation and Description

For our examples, we use the mathematics achievement scores from the NLSY-CYA Wide Data.

Load the repeated measures data

#set filepath for data file
filepath <- "https://raw.githubusercontent.com/LRI-2/Data/main/GrowthModeling/nlsy_math_wide_R.dat"
#read in the text data file using the url() function
dat <- read.table(file=url(filepath),
                  na.strings = ".")  #indicates the missing data designator
#copy data with new name 
nlsy_math_wide <- dat  

# Give the variable names
names(nlsy_math_wide)=c('id'     ,'female' ,'lb_wght','anti_k1',
                        'math2'  ,'math3'  ,'math4'  ,'math5'  ,'math6'  ,'math7'  ,'math8'  ,
                        'age2'   ,'age3'   ,'age4'   ,'age5'   ,'age6'   ,'age7'   ,'age8'   ,
                        'men2'   ,'men3'   ,'men4'   ,'men5'   ,'men6'   ,'men7'   ,'men8'   ,
                        'spring2','spring3','spring4','spring5','spring6','spring7','spring8',
                        'anti2'  ,'anti3'  ,'anti4'  ,'anti5'  ,'anti6'  ,'anti7'  ,'anti8')


#view the first few observations (and columns) in the data set 
head(nlsy_math_wide[ ,1:11], 10)

id	female	anti_k1	math2	math3	math4	math5	math6	math7	math8
201	1	0	NA	38	NA	55	NA	NA	NA
303	1	1	26	NA	NA	33	NA	NA	NA
2702	0	0	56	NA	58	NA	NA	NA	80
4303	1	0	NA	41	58	NA	NA	NA	NA
5002	0	4	NA	NA	46	NA	54	NA	66
5005	1	0	35	NA	50	NA	60	NA	59
5701	0	2	NA	62	61	NA	NA	NA	NA
6102	0	0	NA	NA	55	67	NA	81	NA
6801	1	0	NA	54	NA	62	NA	66	NA
6802	0	0	NA	55	NA	66	NA	68	NA

Our specific interest is in change across the the repeated measures of math, math2 to math8.

As noted in Chapter 2 , it is important to plot the data to obtain a better understanding of the structure and form of the observed phenomenon. Here, we want to examine the data to make sure a growth model would be an appropriate analysis for the data (i.e., we need to check that there is in fact growth to model).

Longitudinal Plot of Math across Grade at Testing We have to reshape from wide to long for plotting.

#subsetting to variables of interest
nlsy_math_sub <- nlsy_math_wide[ ,c("id", "math2", "math3", "math4", 
                                    "math5", "math6", "math7", "math8")]
#reshaping wide to long
nlsy_math_long <- reshape(data=nlsy_math_sub, 
                          timevar=c("grade"), 
                          idvar="id",
                          varying=c("math2", "math3", "math4", 
                                    "math5", "math6", "math7", "math8"),
                          direction="long", sep="")
#sorting for easy viewing
# order by id and time
nlsy_math_long <- nlsy_math_long[order(nlsy_math_long$id,nlsy_math_long$grade), ]
#remove rows with NA for math (needed to get trajoctory lines connected)
nlsy_math_long <- nlsy_math_long[which(is.na(nlsy_math_long$math) == FALSE), ]

#intraindividual change trajetories
ggplot(data=nlsy_math_long,                    #data set
       aes(x = grade, y = math, group = id)) + #setting variables
  geom_point(size=.5) + #adding points to plot
  geom_line() +  #adding lines to plot
  theme_bw() +   #changing style/background
  #setting the x-axis with breaks and labels
  scale_x_continuous(limits=c(2,8),
                     breaks = c(2,3,4,5,6,7,8), 
                     name = "Grade at Testing") +    
  #setting the y-axis with limits breaks and labels
  scale_y_continuous(limits=c(10,90), 
                     breaks = c(10,30,50,70,90), 
                     name = "PIAT Mathematics")

We see both intraindividual change and interindividual differences in change.

Of note, the data begin at Grade 2 - suggesting that we rescale the time variable so that time = 0 is located at the Grade 2 assessment.

2 No Growth Model

There appears to be systematic growth in the mathematics scores over time, as can be seen in our plot above. To begin the analysis we will fit a no growth model. This no growth model will act as our “null” model to which to compare later models.

Now, we set up the no growth model as a lavaan model

#writing out no growth model in full SEM way 
ng_math_lavaan_model <- ' 
  # latent variable definitions
      #intercept
      eta_1 =~ 1*math2
      eta_1 =~ 1*math3
      eta_1 =~ 1*math4
      eta_1 =~ 1*math5
      eta_1 =~ 1*math6
      eta_1 =~ 1*math7
      eta_1 =~ 1*math8

  # factor variances
      eta_1 ~~ eta_1

  # covariances among factors 
      #none (only 1 factor)

  # factor means 
      eta_1 ~ start(30)*1

  # manifest variances (made equivalent by naming theta)
      math2 ~~ theta*math2
      math3 ~~ theta*math3
      math4 ~~ theta*math4
      math5 ~~ theta*math5
      math6 ~~ theta*math6
      math7 ~~ theta*math7
      math8 ~~ theta*math8
  # manifest means (fixed at zero)
      math2 ~ 0*1
      math3 ~ 0*1
      math4 ~ 0*1
      math5 ~ 0*1
      math6 ~ 0*1
      math7 ~ 0*1
      math8 ~ 0*1
' #end of model definition

And fit the model to the data …

#estimating the model using sem() function
ng_math_lavaan_fit <- sem(ng_math_lavaan_model, 
                          data = nlsy_math_wide,
                          meanstructure = TRUE,
                          estimator = "ML",
                          missing = "fiml")

## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: some cases are empty and will be ignored:
##   741

## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan
## WARNING: due to missing values, some pairwise combinations have less than 10%
## coverage

We can look more specifically at the numerical summaries of the model.

#obtaining summary of the model using the object we just created
summary(ng_math_lavaan_fit, fit.measures=TRUE)

## lavaan 0.6-7 ended normally after 18 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                          9
##   Number of equality constraints                     6
##                                                       
##                                                   Used       Total
##   Number of observations                           932         933
##   Number of missing patterns                        60            
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                              1759.002
##   Degrees of freedom                                32
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                               862.333
##   Degrees of freedom                                21
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.000
##   Tucker-Lewis Index (TLI)                      -0.347
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -8745.952
##   Loglikelihood unrestricted model (H1)      -7866.451
##                                                       
##   Akaike (AIC)                               17497.903
##   Bayesian (BIC)                             17512.415
##   Sample-size adjusted Bayesian (BIC)        17502.888
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.241
##   90 Percent confidence interval - lower         0.231
##   90 Percent confidence interval - upper         0.250
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.480
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   eta_1 =~                                            
##     math2             1.000                           
##     math3             1.000                           
##     math4             1.000                           
##     math5             1.000                           
##     math6             1.000                           
##     math7             1.000                           
##     math8             1.000                           
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     eta_1            45.915    0.324  141.721    0.000
##    .math2             0.000                           
##    .math3             0.000                           
##    .math4             0.000                           
##    .math5             0.000                           
##    .math6             0.000                           
##    .math7             0.000                           
##    .math8             0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     eta_1            46.917    4.832    9.709    0.000
##    .math2   (thet)  116.682    4.548   25.656    0.000
##    .math3   (thet)  116.682    4.548   25.656    0.000
##    .math4   (thet)  116.682    4.548   25.656    0.000
##    .math5   (thet)  116.682    4.548   25.656    0.000
##    .math6   (thet)  116.682    4.548   25.656    0.000
##    .math7   (thet)  116.682    4.548   25.656    0.000
##    .math8   (thet)  116.682    4.548   25.656    0.000

#Other summaries
#parameterEstimates(ng_math_lavaan_fit)
#inspect(ng_math_lavaan_fit, what="est")

We get information about how the model fits the data, as well as some information about the data itself. Interpetation is covered in more detail in the accompanying video.

We can use the fitted model object to make a diagram (and also check if we specified and estimated the model as intended).

#diagram of fitted model
semPaths(ng_math_lavaan_fit,what = "path", whatLabels = "par")

Pulling predicted trajectories out of lavaan takes a bit of work, but is possible.

#obtaining predicted factor scores for individuals
nlsy_math_predicted <- as.data.frame(cbind(nlsy_math_wide$id,lavPredict(ng_math_lavaan_fit)))

#naming columns
names(nlsy_math_predicted) <- c("id", "eta_1")

#looking at data
head(nlsy_math_predicted)

id	eta_1
201	46.17558
303	38.59816
2702	56.16725
4303	47.51278
5002	51.06429
5005	49.05038

#calculating implied manifest scores
nlsy_math_predicted$math2 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math3 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math4 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math5 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math6 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math7 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math8 <- 1*nlsy_math_predicted$eta_1

#reshaping wide to long
nlsy_math_predicted_long <- reshape(data=nlsy_math_predicted, 
                          timevar=c("grade"), 
                          idvar="id",
                          varying=c("math2", "math3", "math4", 
                                    "math5", "math6", "math7", "math8"),
                          direction="long", sep="")
#sorting for easy viewing
# order by id and time
nlsy_math_predicted_long <- nlsy_math_predicted_long[order(nlsy_math_predicted_long$id,nlsy_math_predicted_long$grade), ]

#intraindividual change trajetories
ggplot(data=nlsy_math_predicted_long,                    #data set
       aes(x = grade, y = math, group = id)) + #setting variables
  #geom_point(size=.5) + #adding points to plot
  geom_line() +  #adding lines to plot
  theme_bw() +   #changing style/background
  #setting the x-axis with breaks and labels
  scale_x_continuous(limits=c(2,8),
                     breaks = c(2,3,4,5,6,7,8), 
                     name = "Grade at Testing") +    
  #setting the y-axis with limits breaks and labels
  scale_y_continuous(limits=c(10,90), 
                     breaks = c(10,30,50,70,90), 
                     name = "Predicted PIAT Mathematics")

We see from the plot that this no growth model characterizes each individual’s trajectory as a straight, flat line.

3 Linear Growth Model

To capture the systematic growth in the mathematics scores over time, we exapnd the model to include a linear “slope” factor.

Specifying linear growth model

#writing out linear growth model in full SEM way 
lg_math_lavaan_model <- '
  # latent variable definitions
      #intercept (note intercept is a reserved term)
      eta_1 =~ 1*math2
      eta_1 =~ 1*math3
      eta_1 =~ 1*math4
      eta_1 =~ 1*math5
      eta_1 =~ 1*math6
      eta_1 =~ 1*math7
      eta_1 =~ 1*math8

      #linear slope 
      eta_2 =~ 0*math2
      eta_2 =~ 1*math3
      eta_2 =~ 2*math4
      eta_2 =~ 3*math5
      eta_2 =~ 4*math6
      eta_2 =~ 5*math7
      eta_2 =~ 6*math8

  # factor variances
      eta_1 ~~ eta_1
      eta_2 ~~ eta_2

  # covariances among factors 
      eta_1 ~~ eta_2

  # factor means 
      eta_1 ~ start(35)*1
      eta_2 ~ start(4)*1

  # manifest variances (made equivalent by naming theta)
      math2 ~~ theta*math2
      math3 ~~ theta*math3
      math4 ~~ theta*math4
      math5 ~~ theta*math5
      math6 ~~ theta*math6
      math7 ~~ theta*math7
      math8 ~~ theta*math8
  # manifest means (fixed at zero)
      math2 ~ 0*1
      math3 ~ 0*1
      math4 ~ 0*1
      math5 ~ 0*1
      math6 ~ 0*1
      math7 ~ 0*1
      math8 ~ 0*1
' #end of model definition

And fit the model to the data …

#estimating the model using sem() function
lg_math_lavaan_fit <- sem(lg_math_lavaan_model, 
                          data = nlsy_math_wide,
                          meanstructure = TRUE,
                          estimator = "ML",
                          missing = "fiml")

## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: some cases are empty and will be ignored:
##   741

## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan
## WARNING: due to missing values, some pairwise combinations have less than 10%
## coverage

We can look more specifically at the numerical summaries of the model.

#obtaining summary of the model using the object we just created
summary(lg_math_lavaan_fit, fit.measures=TRUE)

## lavaan 0.6-7 ended normally after 27 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         12
##   Number of equality constraints                     6
##                                                       
##                                                   Used       Total
##   Number of observations                           932         933
##   Number of missing patterns                        60            
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                               204.484
##   Degrees of freedom                                29
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                               862.333
##   Degrees of freedom                                21
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.791
##   Tucker-Lewis Index (TLI)                       0.849
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -7968.693
##   Loglikelihood unrestricted model (H1)      -7866.451
##                                                       
##   Akaike (AIC)                               15949.386
##   Bayesian (BIC)                             15978.410
##   Sample-size adjusted Bayesian (BIC)        15959.354
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.081
##   90 Percent confidence interval - lower         0.070
##   90 Percent confidence interval - upper         0.091
##   P-value RMSEA <= 0.05                          0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.121
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   eta_1 =~                                            
##     math2             1.000                           
##     math3             1.000                           
##     math4             1.000                           
##     math5             1.000                           
##     math6             1.000                           
##     math7             1.000                           
##     math8             1.000                           
##   eta_2 =~                                            
##     math2             0.000                           
##     math3             1.000                           
##     math4             2.000                           
##     math5             3.000                           
##     math6             4.000                           
##     math7             5.000                           
##     math8             6.000                           
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   eta_1 ~~                                            
##     eta_2            -0.181    1.150   -0.158    0.875
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     eta_1            35.267    0.355   99.229    0.000
##     eta_2             4.339    0.088   49.136    0.000
##    .math2             0.000                           
##    .math3             0.000                           
##    .math4             0.000                           
##    .math5             0.000                           
##    .math6             0.000                           
##    .math7             0.000                           
##    .math8             0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     eta_1            64.562    5.659   11.408    0.000
##     eta_2             0.733    0.327    2.238    0.025
##    .math2   (thet)   36.230    1.867   19.410    0.000
##    .math3   (thet)   36.230    1.867   19.410    0.000
##    .math4   (thet)   36.230    1.867   19.410    0.000
##    .math5   (thet)   36.230    1.867   19.410    0.000
##    .math6   (thet)   36.230    1.867   19.410    0.000
##    .math7   (thet)   36.230    1.867   19.410    0.000
##    .math8   (thet)   36.230    1.867   19.410    0.000

#Other summaries
#parameterEstimates(lg_math_lavaan_fit)
#inspect(lg_math_lavaan_fit, what="est")

We get information about how the model fits the data, as well as some information about the data itself. Interpetation is covered in more detail in the accompanying video.

We can make a diagram to check if we specified and estimated the model as intended.

#diagram of fitted model
semPaths(lg_math_lavaan_fit,what = "path", whatLabels = "par")

Plotting the predicted trajectories again takes some work, but is possible. The prototypical trajectory would need to be made from the model output.

#obtaining predicted factor scores for individuals
nlsy_math_predicted <- as.data.frame(cbind(nlsy_math_wide$id,lavPredict(lg_math_lavaan_fit)))

#naming columns
names(nlsy_math_predicted) <- c("id", "eta_1", "eta_2")

head(nlsy_math_predicted)

id	eta_1	eta_2
201	36.94675	4.534084
303	26.03589	4.050780
2702	49.70187	4.594148
4303	41.04200	4.548063
5002	37.01241	4.496746
5005	37.68809	4.324197

#calculating implied manifest scores
nlsy_math_predicted$math2 <- 1*nlsy_math_predicted$eta_1 + 0*nlsy_math_predicted$eta_2
nlsy_math_predicted$math3 <- 1*nlsy_math_predicted$eta_1 + 1*nlsy_math_predicted$eta_2
nlsy_math_predicted$math4 <- 1*nlsy_math_predicted$eta_1 + 2*nlsy_math_predicted$eta_2
nlsy_math_predicted$math5 <- 1*nlsy_math_predicted$eta_1 + 3*nlsy_math_predicted$eta_2
nlsy_math_predicted$math6 <- 1*nlsy_math_predicted$eta_1 + 4*nlsy_math_predicted$eta_2
nlsy_math_predicted$math7 <- 1*nlsy_math_predicted$eta_1 + 5*nlsy_math_predicted$eta_2
nlsy_math_predicted$math8 <- 1*nlsy_math_predicted$eta_1 + 6*nlsy_math_predicted$eta_2

#reshaping wide to long
nlsy_math_predicted_long <- reshape(data=nlsy_math_predicted, 
                          timevar=c("grade"), 
                          idvar="id",
                          varying=c("math2", "math3", "math4", 
                                    "math5", "math6", "math7", "math8"),
                          direction="long", sep="")
#sorting for easy viewing
# order by id and time
nlsy_math_predicted_long <- nlsy_math_predicted_long[order(nlsy_math_predicted_long$id,nlsy_math_predicted_long$grade), ]

#intraindividual change trajetories
ggplot(data=nlsy_math_predicted_long[which(nlsy_math_predicted_long$id < 80000), ],                    #data set
       aes(x = grade, y = math, group = id)) + #setting variables
  #geom_point(size=.5) + #adding points to plot
  geom_line() +  #adding lines to plot
  theme_bw() +   #changing style/background
  #setting the x-axis with breaks and labels
  scale_x_continuous(limits=c(2,8),
                     breaks = c(2,3,4,5,6,7,8), 
                     name = "Grade at Testing") +    
  #setting the y-axis with limits breaks and labels
  scale_y_continuous(limits=c(10,90), 
                     breaks = c(10,30,50,70,90), 
                     name = "Predicted PIAT Mathematics")

We see from the plot that this model characterized everyone’s trajectory as a straight line, with some interindividual differences in the rate of intraindividual change.

4 Conclusion

This tutorial has presented how to set up and fit no growth and linear growth models in the SEM modeling framework using the lavaan package in R, as well as calculate and plot the predicted trajectories.

Yay for Growth Modeling! It’s sooo fun!

Linear Growth Model - SEM Implementation in R