Linear Growth Model - SEM Implementation in R
Nilam Ram, Kevin Grimm, et al.
1 Overview
This tutorial illustrates fitting of linear growth models in the SEM framework in R using the lavaan
package.
Example data and code are drawn from Chapter 3 of Grimm, Ram, and Estabrook (2017). Specifically, we examine change in children’s mathematics achievement through elementary and middle school using the NLSY-CYA Dataset. We fit both No Growth and Linear Growth moodels. Please see the book chapter for additional interpretations and insights about the analyses.
1.0.1 Preliminaries - Loading libraries used in this script.
1.0.2 Preliminaries - Data Preparation and Description
For our examples, we use the mathematics achievement scores from the NLSY-CYA Wide Data.
Load the repeated measures data
#set filepath for data file
filepath <- "https://raw.githubusercontent.com/LRI-2/Data/main/GrowthModeling/nlsy_math_wide_R.dat"
#read in the text data file using the url() function
dat <- read.table(file=url(filepath),
na.strings = ".") #indicates the missing data designator
#copy data with new name
nlsy_math_wide <- dat
# Give the variable names
names(nlsy_math_wide)=c('id' ,'female' ,'lb_wght','anti_k1',
'math2' ,'math3' ,'math4' ,'math5' ,'math6' ,'math7' ,'math8' ,
'age2' ,'age3' ,'age4' ,'age5' ,'age6' ,'age7' ,'age8' ,
'men2' ,'men3' ,'men4' ,'men5' ,'men6' ,'men7' ,'men8' ,
'spring2','spring3','spring4','spring5','spring6','spring7','spring8',
'anti2' ,'anti3' ,'anti4' ,'anti5' ,'anti6' ,'anti7' ,'anti8')
#view the first few observations (and columns) in the data set
head(nlsy_math_wide[ ,1:11], 10)
id | female | lb_wght | anti_k1 | math2 | math3 | math4 | math5 | math6 | math7 | math8 |
---|---|---|---|---|---|---|---|---|---|---|
201 | 1 | 0 | 0 | NA | 38 | NA | 55 | NA | NA | NA |
303 | 1 | 0 | 1 | 26 | NA | NA | 33 | NA | NA | NA |
2702 | 0 | 0 | 0 | 56 | NA | 58 | NA | NA | NA | 80 |
4303 | 1 | 0 | 0 | NA | 41 | 58 | NA | NA | NA | NA |
5002 | 0 | 0 | 4 | NA | NA | 46 | NA | 54 | NA | 66 |
5005 | 1 | 0 | 0 | 35 | NA | 50 | NA | 60 | NA | 59 |
5701 | 0 | 0 | 2 | NA | 62 | 61 | NA | NA | NA | NA |
6102 | 0 | 0 | 0 | NA | NA | 55 | 67 | NA | 81 | NA |
6801 | 1 | 0 | 0 | NA | 54 | NA | 62 | NA | 66 | NA |
6802 | 0 | 0 | 0 | NA | 55 | NA | 66 | NA | 68 | NA |
Our specific interest is in change across the the repeated measures of math, math2
to math8
.
As noted in Chapter 2 , it is important to plot the data to obtain a better understanding of the structure and form of the observed phenomenon. Here, we want to examine the data to make sure a growth model would be an appropriate analysis for the data (i.e., we need to check that there is in fact growth to model).
Longitudinal Plot of Math across Grade at Testing We have to reshape from wide to long for plotting.
#subsetting to variables of interest
nlsy_math_sub <- nlsy_math_wide[ ,c("id", "math2", "math3", "math4",
"math5", "math6", "math7", "math8")]
#reshaping wide to long
nlsy_math_long <- reshape(data=nlsy_math_sub,
timevar=c("grade"),
idvar="id",
varying=c("math2", "math3", "math4",
"math5", "math6", "math7", "math8"),
direction="long", sep="")
#sorting for easy viewing
# order by id and time
nlsy_math_long <- nlsy_math_long[order(nlsy_math_long$id,nlsy_math_long$grade), ]
#remove rows with NA for math (needed to get trajoctory lines connected)
nlsy_math_long <- nlsy_math_long[which(is.na(nlsy_math_long$math) == FALSE), ]
#intraindividual change trajetories
ggplot(data=nlsy_math_long, #data set
aes(x = grade, y = math, group = id)) + #setting variables
geom_point(size=.5) + #adding points to plot
geom_line() + #adding lines to plot
theme_bw() + #changing style/background
#setting the x-axis with breaks and labels
scale_x_continuous(limits=c(2,8),
breaks = c(2,3,4,5,6,7,8),
name = "Grade at Testing") +
#setting the y-axis with limits breaks and labels
scale_y_continuous(limits=c(10,90),
breaks = c(10,30,50,70,90),
name = "PIAT Mathematics")
We see both intraindividual change and interindividual differences in change.
Of note, the data begin at Grade 2 - suggesting that we rescale the time variable so that time = 0 is located at the Grade 2 assessment.
2 No Growth Model
There appears to be systematic growth in the mathematics scores over time, as can be seen in our plot above. To begin the analysis we will fit a no growth model. This no growth model will act as our “null” model to which to compare later models.
Now, we set up the no growth model as a lavaan model
#writing out no growth model in full SEM way
ng_math_lavaan_model <- '
# latent variable definitions
#intercept
eta_1 =~ 1*math2
eta_1 =~ 1*math3
eta_1 =~ 1*math4
eta_1 =~ 1*math5
eta_1 =~ 1*math6
eta_1 =~ 1*math7
eta_1 =~ 1*math8
# factor variances
eta_1 ~~ eta_1
# covariances among factors
#none (only 1 factor)
# factor means
eta_1 ~ start(30)*1
# manifest variances (made equivalent by naming theta)
math2 ~~ theta*math2
math3 ~~ theta*math3
math4 ~~ theta*math4
math5 ~~ theta*math5
math6 ~~ theta*math6
math7 ~~ theta*math7
math8 ~~ theta*math8
# manifest means (fixed at zero)
math2 ~ 0*1
math3 ~ 0*1
math4 ~ 0*1
math5 ~ 0*1
math6 ~ 0*1
math7 ~ 0*1
math8 ~ 0*1
' #end of model definition
And fit the model to the data …
#estimating the model using sem() function
ng_math_lavaan_fit <- sem(ng_math_lavaan_model,
data = nlsy_math_wide,
meanstructure = TRUE,
estimator = "ML",
missing = "fiml")
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: some cases are empty and will be ignored:
## 741
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan
## WARNING: due to missing values, some pairwise combinations have less than 10%
## coverage
We can look more specifically at the numerical summaries of the model.
#obtaining summary of the model using the object we just created
summary(ng_math_lavaan_fit, fit.measures=TRUE)
## lavaan 0.6-7 ended normally after 18 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 9
## Number of equality constraints 6
##
## Used Total
## Number of observations 932 933
## Number of missing patterns 60
##
## Model Test User Model:
##
## Test statistic 1759.002
## Degrees of freedom 32
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 862.333
## Degrees of freedom 21
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.000
## Tucker-Lewis Index (TLI) -0.347
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -8745.952
## Loglikelihood unrestricted model (H1) -7866.451
##
## Akaike (AIC) 17497.903
## Bayesian (BIC) 17512.415
## Sample-size adjusted Bayesian (BIC) 17502.888
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.241
## 90 Percent confidence interval - lower 0.231
## 90 Percent confidence interval - upper 0.250
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.480
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## eta_1 =~
## math2 1.000
## math3 1.000
## math4 1.000
## math5 1.000
## math6 1.000
## math7 1.000
## math8 1.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## eta_1 45.915 0.324 141.721 0.000
## .math2 0.000
## .math3 0.000
## .math4 0.000
## .math5 0.000
## .math6 0.000
## .math7 0.000
## .math8 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## eta_1 46.917 4.832 9.709 0.000
## .math2 (thet) 116.682 4.548 25.656 0.000
## .math3 (thet) 116.682 4.548 25.656 0.000
## .math4 (thet) 116.682 4.548 25.656 0.000
## .math5 (thet) 116.682 4.548 25.656 0.000
## .math6 (thet) 116.682 4.548 25.656 0.000
## .math7 (thet) 116.682 4.548 25.656 0.000
## .math8 (thet) 116.682 4.548 25.656 0.000
We get information about how the model fits the data, as well as some information about the data itself. Interpetation is covered in more detail in the accompanying video.
We can use the fitted model object to make a diagram (and also check if we specified and estimated the model as intended).
Pulling predicted trajectories out of lavaan takes a bit of work, but is possible.
#obtaining predicted factor scores for individuals
nlsy_math_predicted <- as.data.frame(cbind(nlsy_math_wide$id,lavPredict(ng_math_lavaan_fit)))
#naming columns
names(nlsy_math_predicted) <- c("id", "eta_1")
#looking at data
head(nlsy_math_predicted)
id | eta_1 |
---|---|
201 | 46.17558 |
303 | 38.59816 |
2702 | 56.16725 |
4303 | 47.51278 |
5002 | 51.06429 |
5005 | 49.05038 |
#calculating implied manifest scores
nlsy_math_predicted$math2 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math3 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math4 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math5 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math6 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math7 <- 1*nlsy_math_predicted$eta_1
nlsy_math_predicted$math8 <- 1*nlsy_math_predicted$eta_1
#reshaping wide to long
nlsy_math_predicted_long <- reshape(data=nlsy_math_predicted,
timevar=c("grade"),
idvar="id",
varying=c("math2", "math3", "math4",
"math5", "math6", "math7", "math8"),
direction="long", sep="")
#sorting for easy viewing
# order by id and time
nlsy_math_predicted_long <- nlsy_math_predicted_long[order(nlsy_math_predicted_long$id,nlsy_math_predicted_long$grade), ]
#intraindividual change trajetories
ggplot(data=nlsy_math_predicted_long, #data set
aes(x = grade, y = math, group = id)) + #setting variables
#geom_point(size=.5) + #adding points to plot
geom_line() + #adding lines to plot
theme_bw() + #changing style/background
#setting the x-axis with breaks and labels
scale_x_continuous(limits=c(2,8),
breaks = c(2,3,4,5,6,7,8),
name = "Grade at Testing") +
#setting the y-axis with limits breaks and labels
scale_y_continuous(limits=c(10,90),
breaks = c(10,30,50,70,90),
name = "Predicted PIAT Mathematics")
We see from the plot that this no growth model characterizes each individual’s trajectory as a straight, flat line.
3 Linear Growth Model
To capture the systematic growth in the mathematics scores over time, we exapnd the model to include a linear “slope” factor.
Specifying linear growth model
#writing out linear growth model in full SEM way
lg_math_lavaan_model <- '
# latent variable definitions
#intercept (note intercept is a reserved term)
eta_1 =~ 1*math2
eta_1 =~ 1*math3
eta_1 =~ 1*math4
eta_1 =~ 1*math5
eta_1 =~ 1*math6
eta_1 =~ 1*math7
eta_1 =~ 1*math8
#linear slope
eta_2 =~ 0*math2
eta_2 =~ 1*math3
eta_2 =~ 2*math4
eta_2 =~ 3*math5
eta_2 =~ 4*math6
eta_2 =~ 5*math7
eta_2 =~ 6*math8
# factor variances
eta_1 ~~ eta_1
eta_2 ~~ eta_2
# covariances among factors
eta_1 ~~ eta_2
# factor means
eta_1 ~ start(35)*1
eta_2 ~ start(4)*1
# manifest variances (made equivalent by naming theta)
math2 ~~ theta*math2
math3 ~~ theta*math3
math4 ~~ theta*math4
math5 ~~ theta*math5
math6 ~~ theta*math6
math7 ~~ theta*math7
math8 ~~ theta*math8
# manifest means (fixed at zero)
math2 ~ 0*1
math3 ~ 0*1
math4 ~ 0*1
math5 ~ 0*1
math6 ~ 0*1
math7 ~ 0*1
math8 ~ 0*1
' #end of model definition
And fit the model to the data …
#estimating the model using sem() function
lg_math_lavaan_fit <- sem(lg_math_lavaan_model,
data = nlsy_math_wide,
meanstructure = TRUE,
estimator = "ML",
missing = "fiml")
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: some cases are empty and will be ignored:
## 741
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan
## WARNING: due to missing values, some pairwise combinations have less than 10%
## coverage
We can look more specifically at the numerical summaries of the model.
#obtaining summary of the model using the object we just created
summary(lg_math_lavaan_fit, fit.measures=TRUE)
## lavaan 0.6-7 ended normally after 27 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 12
## Number of equality constraints 6
##
## Used Total
## Number of observations 932 933
## Number of missing patterns 60
##
## Model Test User Model:
##
## Test statistic 204.484
## Degrees of freedom 29
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 862.333
## Degrees of freedom 21
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.791
## Tucker-Lewis Index (TLI) 0.849
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -7968.693
## Loglikelihood unrestricted model (H1) -7866.451
##
## Akaike (AIC) 15949.386
## Bayesian (BIC) 15978.410
## Sample-size adjusted Bayesian (BIC) 15959.354
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.081
## 90 Percent confidence interval - lower 0.070
## 90 Percent confidence interval - upper 0.091
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.121
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## eta_1 =~
## math2 1.000
## math3 1.000
## math4 1.000
## math5 1.000
## math6 1.000
## math7 1.000
## math8 1.000
## eta_2 =~
## math2 0.000
## math3 1.000
## math4 2.000
## math5 3.000
## math6 4.000
## math7 5.000
## math8 6.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## eta_1 ~~
## eta_2 -0.181 1.150 -0.158 0.875
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## eta_1 35.267 0.355 99.229 0.000
## eta_2 4.339 0.088 49.136 0.000
## .math2 0.000
## .math3 0.000
## .math4 0.000
## .math5 0.000
## .math6 0.000
## .math7 0.000
## .math8 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## eta_1 64.562 5.659 11.408 0.000
## eta_2 0.733 0.327 2.238 0.025
## .math2 (thet) 36.230 1.867 19.410 0.000
## .math3 (thet) 36.230 1.867 19.410 0.000
## .math4 (thet) 36.230 1.867 19.410 0.000
## .math5 (thet) 36.230 1.867 19.410 0.000
## .math6 (thet) 36.230 1.867 19.410 0.000
## .math7 (thet) 36.230 1.867 19.410 0.000
## .math8 (thet) 36.230 1.867 19.410 0.000
We get information about how the model fits the data, as well as some information about the data itself. Interpetation is covered in more detail in the accompanying video.
We can make a diagram to check if we specified and estimated the model as intended.
Plotting the predicted trajectories again takes some work, but is possible. The prototypical trajectory would need to be made from the model output.
#obtaining predicted factor scores for individuals
nlsy_math_predicted <- as.data.frame(cbind(nlsy_math_wide$id,lavPredict(lg_math_lavaan_fit)))
#naming columns
names(nlsy_math_predicted) <- c("id", "eta_1", "eta_2")
head(nlsy_math_predicted)
id | eta_1 | eta_2 |
---|---|---|
201 | 36.94675 | 4.534084 |
303 | 26.03589 | 4.050780 |
2702 | 49.70187 | 4.594148 |
4303 | 41.04200 | 4.548063 |
5002 | 37.01241 | 4.496746 |
5005 | 37.68809 | 4.324197 |
#calculating implied manifest scores
nlsy_math_predicted$math2 <- 1*nlsy_math_predicted$eta_1 + 0*nlsy_math_predicted$eta_2
nlsy_math_predicted$math3 <- 1*nlsy_math_predicted$eta_1 + 1*nlsy_math_predicted$eta_2
nlsy_math_predicted$math4 <- 1*nlsy_math_predicted$eta_1 + 2*nlsy_math_predicted$eta_2
nlsy_math_predicted$math5 <- 1*nlsy_math_predicted$eta_1 + 3*nlsy_math_predicted$eta_2
nlsy_math_predicted$math6 <- 1*nlsy_math_predicted$eta_1 + 4*nlsy_math_predicted$eta_2
nlsy_math_predicted$math7 <- 1*nlsy_math_predicted$eta_1 + 5*nlsy_math_predicted$eta_2
nlsy_math_predicted$math8 <- 1*nlsy_math_predicted$eta_1 + 6*nlsy_math_predicted$eta_2
#reshaping wide to long
nlsy_math_predicted_long <- reshape(data=nlsy_math_predicted,
timevar=c("grade"),
idvar="id",
varying=c("math2", "math3", "math4",
"math5", "math6", "math7", "math8"),
direction="long", sep="")
#sorting for easy viewing
# order by id and time
nlsy_math_predicted_long <- nlsy_math_predicted_long[order(nlsy_math_predicted_long$id,nlsy_math_predicted_long$grade), ]
#intraindividual change trajetories
ggplot(data=nlsy_math_predicted_long[which(nlsy_math_predicted_long$id < 80000), ], #data set
aes(x = grade, y = math, group = id)) + #setting variables
#geom_point(size=.5) + #adding points to plot
geom_line() + #adding lines to plot
theme_bw() + #changing style/background
#setting the x-axis with breaks and labels
scale_x_continuous(limits=c(2,8),
breaks = c(2,3,4,5,6,7,8),
name = "Grade at Testing") +
#setting the y-axis with limits breaks and labels
scale_y_continuous(limits=c(10,90),
breaks = c(10,30,50,70,90),
name = "Predicted PIAT Mathematics")
We see from the plot that this model characterized everyone’s trajectory as a straight line, with some interindividual differences in the rate of intraindividual change.
4 Conclusion
This tutorial has presented how to set up and fit no growth and linear growth models in the SEM modeling framework using the lavaan
package in R, as well as calculate and plot the predicted trajectories.
Yay for Growth Modeling! It’s sooo fun!