Bivariate Growth Model - Multilevel & SEM Implementation in R
Nilam Ram, Kevin Grimm, et al.
1 Overview
This tutorial illustrates fitting of multivariate (bivariate) linear growth models in the multilevel and SEM frameworks in R.
Example data and code are drawn from Chapter 8 of Grimm, Ram, and Estabrook (2017). Specifically, using the NLSY-CYA Dataset we examine how individual differences in change in children’s mathematics achievement across grade are related to individual differences in change in children’s hyperactivity (as rated by teachers) across grade. Please see the book chapter for additional interpretations and insights about the analyses.
1.0.1 Preliminaries - Loading libraries used in this script.
1.0.2 Preliminaries - Data Description
For our examples, we use the mathematics achievement scores from the NLSY-CYA Long Data.
Load the repeated measures data (long format)
#set filepath for data file
filepath <- "https://raw.githubusercontent.com/LRI-2/Data/main/GrowthModeling/nlsy_math_hyp_long_R.dat"
#read in the text data file using the url() function
dat <- read.table(file=url(filepath),
na.strings = ".") #indicates the missing data designator
#copy data with new name
nlsy_math_hyp_long <- dat
#Add names the columns of the data set
names(nlsy_math_hyp_long) = c('id', 'female', 'lb_wght', 'anti_k1',
'math', 'comp', 'rec', 'bpi', 'as', 'anx', 'hd',
'hyp', 'dp', 'wd',
'grade', 'occ', 'age', 'men', 'spring', 'anti')
#reducing to variables of interest
nlsy_math_hyp_long <- nlsy_math_hyp_long[ ,c("id","grade","math","hyp")]
#view the first few observations in the data set
head(nlsy_math_hyp_long, 10)
id | grade | math | hyp |
---|---|---|---|
201 | 3 | 38 | 0 |
201 | 5 | 55 | 0 |
303 | 2 | 26 | 1 |
303 | 5 | 33 | 1 |
2702 | 2 | 56 | 2 |
2702 | 4 | 58 | 3 |
2702 | 8 | 80 | 3 |
4303 | 3 | 41 | 1 |
4303 | 4 | 58 | 1 |
5002 | 4 | 46 | 3 |
Our specific interest is intraindividual change in the repeated measures of math
and hyp
across grade
.
As noted in Chapter 2 , it is important to plot the data to obtain a better understanding of the structure and form of the observed phenomenon.
Longitudinal Plot of Math across Grade at Testing
#intraindividual change trajetories
ggplot(data=nlsy_math_hyp_long, #data set
aes(x = grade, y = math, group = id)) + #setting variables
geom_point(size=.5) + #adding points to plot
geom_line() + #adding lines to plot
theme_bw() + #changing style/background
#setting the x-axis with breaks and labels
scale_x_continuous(limits=c(2,8),
breaks = c(2,3,4,5,6,7,8),
name = "Grade at Testing") +
#setting the y-axis with limits breaks and labels
scale_y_continuous(limits=c(10,90),
breaks = c(10,30,50,70,90),
name = "PIAT Mathematics")
Longitudinal Plot of Hyperactivity across Grade at Testing
#intraindividual change trajetories
ggplot(data=nlsy_math_hyp_long, #data set
aes(x = grade, y = hyp, group = id)) + #setting variables
geom_point(size=.5) + #adding points to plot
geom_line() + #adding lines to plot
theme_bw() + #changing style/background
#setting the x-axis with breaks and labels
scale_x_continuous(limits=c(2,8),
breaks = c(2,3,4,5,6,7,8),
name = "Grade at Testing") +
#setting the y-axis with limits breaks and labels
scale_y_continuous(limits=c(0,5),
breaks = c(0,1,2,3,4,5),
name = "Hyperactivity")
1.0.3 Preliminaries - Data Preparation
For the multilevel model implementation we need to collect both math
and hyp
into a single outcome variable along with dummy variables that indicate which variable is represented. There are lots of ways to do this. We use a simple approach here. There are more elegant ways too.
# creating a stacked multivariate form of the data
#make math data
math <- data.frame (id = nlsy_math_hyp_long$id,
var = nlsy_math_hyp_long$math,
grade = nlsy_math_hyp_long$grade,
d_math = 1,
d_hyp = 0,
grp = 'math')
#make hyp data
hyp <- data.frame (id = nlsy_math_hyp_long$id,
var = nlsy_math_hyp_long$hyp,
grade = nlsy_math_hyp_long$grade,
d_math = 0,
d_hyp = 1,
grp = 'hyp')
#rowbinding the data
nlsy_multi_long <- rbind(math,hyp)
#reordering for easy viewing
nlsy_multi_long <- nlsy_multi_long[order(nlsy_multi_long$id,nlsy_multi_long$grade,nlsy_multi_long$d_hyp), ]
#view the first few observations in the data set
head(nlsy_multi_long, 20)
id | var | grade | d_math | d_hyp | grp | |
---|---|---|---|---|---|---|
1 | 201 | 38 | 3 | 1 | 0 | math |
2222 | 201 | 0 | 3 | 0 | 1 | hyp |
2 | 201 | 55 | 5 | 1 | 0 | math |
2223 | 201 | 0 | 5 | 0 | 1 | hyp |
3 | 303 | 26 | 2 | 1 | 0 | math |
2224 | 303 | 1 | 2 | 0 | 1 | hyp |
4 | 303 | 33 | 5 | 1 | 0 | math |
2225 | 303 | 1 | 5 | 0 | 1 | hyp |
5 | 2702 | 56 | 2 | 1 | 0 | math |
2226 | 2702 | 2 | 2 | 0 | 1 | hyp |
6 | 2702 | 58 | 4 | 1 | 0 | math |
2227 | 2702 | 3 | 4 | 0 | 1 | hyp |
7 | 2702 | 80 | 8 | 1 | 0 | math |
2228 | 2702 | 3 | 8 | 0 | 1 | hyp |
8 | 4303 | 41 | 3 | 1 | 0 | math |
2229 | 4303 | 1 | 3 | 0 | 1 | hyp |
9 | 4303 | 58 | 4 | 1 | 0 | math |
2230 | 4303 | 1 | 4 | 0 | 1 | hyp |
10 | 5002 | 46 | 4 | 1 | 0 | math |
2231 | 5002 | 3 | 4 | 0 | 1 | hyp |
Note the alternating row structure of the data with both math
and hyp
in a single variable var
.
For the SEM implementation we need wide data with math2-math8
and hyp2-hym8
all in one row per person.
Load the repeated measures data (wide format)
#set filepath for data file
filepath <- "https://raw.githubusercontent.com/LRI-2/Data/main/GrowthModeling/nlsy_math_hyp_wide_R.dat"
#read in the text data file using the url() function
dat <- read.table(file=url(filepath),
na.strings = ".") #indicates the missing data designator
#copy data with new name
nlsy_math_hyp_wide <- dat
#Add names the columns of the data set
# Give the variable names
names(nlsy_math_hyp_wide)<-c('id','female','lb_wght','anti_k1',
'math2','math3','math4','math5','math6','math7','math8',
'comp2','comp3','comp4','comp5','comp6','comp7','comp8',
'rec2','rec3','rec4','rec5','rec6','rec7','rec8',
'bpi2','bpi3','bpi4','bpi5','bpi6','bpi7','bpi8',
'asl2','asl3','asl4','asl5','asl6','asl7','asl8',
'ax2','ax3','ax4','ax5','ax6','ax7','ax8',
'hds2','hds3','hds4','hds5','hds6','hds7','hds8',
'hyp2','hyp3','hyp4','hyp5','hyp6','hyp7','hyp8',
'dpn2','dpn3','dpn4','dpn5','dpn6','dpn7','dpn8',
'wdn2','wdn3','wdn4','wdn5','wdn6','wdn7','wdn8',
'age2','age3','age4','age5','age6','age7','age8',
'men2','men3','men4','men5','men6','men7','men8',
'spring2','spring3','spring4','spring5','spring6','spring7','spring8',
'anti2','anti3','anti4','anti5','anti6','anti7','anti8')
#reducing to variables of interest
nlsy_multi_wide <- nlsy_math_hyp_wide[ ,c('id',
'math2','math3','math4','math5','math6','math7','math8',
'hyp2','hyp3','hyp4','hyp5','hyp6','hyp7','hyp8')]
#view the first few observations in the data set
head(nlsy_multi_wide, 10)
id | math2 | math3 | math4 | math5 | math6 | math7 | math8 | hyp2 | hyp3 | hyp4 | hyp5 | hyp6 | hyp7 | hyp8 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
201 | NA | 38 | NA | 55 | NA | NA | NA | NA | 0 | NA | 0 | NA | NA | NA |
303 | 26 | NA | NA | 33 | NA | NA | NA | 1 | NA | NA | 1 | NA | NA | NA |
2702 | 56 | NA | 58 | NA | NA | NA | 80 | 2 | NA | 3 | NA | NA | NA | 3 |
4303 | NA | 41 | 58 | NA | NA | NA | NA | NA | 1 | 1 | NA | NA | NA | NA |
5002 | NA | NA | 46 | NA | 54 | NA | 66 | NA | NA | 3 | NA | 2 | NA | 3 |
5005 | 35 | NA | 50 | NA | 60 | NA | 59 | 0 | NA | 3 | NA | 0 | NA | 1 |
5701 | NA | 62 | 61 | NA | NA | NA | NA | NA | 4 | 3 | NA | NA | NA | NA |
6102 | NA | NA | 55 | 67 | NA | 81 | NA | NA | NA | 2 | 0 | NA | 0 | NA |
6801 | NA | 54 | NA | 62 | NA | 66 | NA | NA | 0 | NA | 1 | NA | 1 | NA |
6802 | NA | 55 | NA | 66 | NA | 68 | NA | NA | 0 | NA | 0 | NA | 0 | NA |
We are now ready to rock!
2 Multilevel: Bivariate Growth Model using nlme()
function
To capture the systematic growth in the mathematics scores and hyperactivity scores over grade, we use the grade
variable, so that the rate of growth indicates (linear) school-related change in mathematics scores.
We use the multivariate (stacked) data in “extralong” format.
Then we fit the model with nlme using our expansive coding style. This is a difficult model to estimate so we add control parameters to the model (change interation number to 200).
# bivariate growth model with residual covariance
bivar_nlme <- nlme(var~d_math*(b_1i+b_2i*(grade-2))+
d_hyp *(h_1i+h_2i*(grade-2)),
data = nlsy_multi_long,
fixed = b_1i+b_2i+h_1i+h_2i~1,
random = b_1i+b_2i+h_1i+h_2i~1,
group = ~id,
start = c(35, 4, 1, -1),
#invoking separate residual variance for each variable
weights = varIdent(c(hyp=.3), form = ~1|grp),
# invokes off-diagonal symmetric structure for level 1 residuals
corr=corSymm(form = ~1 | id/grade),
na.action = na.exclude,
control=lmeControl(maxIter = 500, msMaxIter = 200, tolerance = 1e-4,
niter = 100, msTol = 1e-5, nlmStepMax = 500,
msVerbose = FALSE, returnObject = TRUE)) #to increase iterations
#obtaining summary of the model using the object we just created
summary(bivar_nlme)
## Nonlinear mixed-effects model fit by maximum likelihood
## Model: var ~ d_math * (b_1i + b_2i * (grade - 2)) + d_hyp * (h_1i + h_2i * (grade - 2))
## Data: nlsy_multi_long
## AIC BIC logLik
## 23614.95 23723.54 -11790.48
##
## Random effects:
## Formula: list(b_1i ~ 1, b_2i ~ 1, h_1i ~ 1, h_2i ~ 1)
## Level: id
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## b_1i 8.04424863 b_1i b_2i h_1i
## b_2i 0.86685535 -0.029
## h_1i 1.24182333 -0.298 0.091
## h_2i 0.06805861 0.196 -0.681 -0.236
## Residual 6.01055399
##
## Correlation Structure: General
## Formula: ~1 | id/grade
## Parameter estimate(s):
## Correlation:
## 1
## 2 -0.002
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | grp
## Parameter estimates:
## math hyp
## 1.0000000 0.1748208
## Fixed effects: b_1i + b_2i + h_1i + h_2i ~ 1
## Value Std.Error DF t-value p-value
## b_1i 35.25867 0.3551088 3456 99.28977 0e+00
## b_2i 4.34300 0.0873775 3456 49.70387 0e+00
## h_1i 1.90348 0.0581513 3456 32.73317 0e+00
## h_2i -0.05660 0.0142232 3456 -3.97915 1e-04
## Correlation:
## b_1i b_2i h_1i
## b_2i -0.531
## h_1i -0.166 0.037
## h_2i 0.040 -0.067 -0.608
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.32773330 -0.53353192 -0.06380224 0.55395314 3.05034041
##
## Number of Observations: 4391
## Number of Groups: 932
Cool that that all works!
Note that this is a slightly more precise model than is in the book (because it has residual covariance in it).
3 SEM: Bivariate Growth Model using lavaan
For the SEM implementation, we use the mathematics achievement and hyperactivity scores and time-invariant covariates from the Wide Data.
Specifying bivariate linear growth model
#writing out linear growth model in full SEM way
bivariate_lavaan_model <- '
# latent variable definitions
#intercept for math
eta_1 =~ 1*math2
eta_1 =~ 1*math3
eta_1 =~ 1*math4
eta_1 =~ 1*math5
eta_1 =~ 1*math6
eta_1 =~ 1*math7
eta_1 =~ 1*math8
#linear slope for math
eta_2 =~ 0*math2
eta_2 =~ 1*math3
eta_2 =~ 2*math4
eta_2 =~ 3*math5
eta_2 =~ 4*math6
eta_2 =~ 5*math7
eta_2 =~ 6*math8
#intercept for hyp
eta_3 =~ 1*hyp2
eta_3 =~ 1*hyp3
eta_3 =~ 1*hyp4
eta_3 =~ 1*hyp5
eta_3 =~ 1*hyp6
eta_3 =~ 1*hyp7
eta_3 =~ 1*hyp8
#linear slope for hyp
eta_4 =~ 0*hyp2
eta_4 =~ 1*hyp3
eta_4 =~ 2*hyp4
eta_4 =~ 3*hyp5
eta_4 =~ 4*hyp6
eta_4 =~ 5*hyp7
eta_4 =~ 6*hyp8
# factor variances
eta_1 ~~ eta_1
eta_2 ~~ eta_2
eta_3 ~~ eta_3
eta_4 ~~ eta_4
# covariances among factors
eta_1 ~~ eta_2 + eta_3 + eta_4
eta_2 ~~ eta_3 + eta_4
eta_3 ~~ eta_4
# factor means
eta_1 ~ start(35)*1
eta_2 ~ start(4)*1
eta_3 ~ start(2)*1
eta_4 ~ start(.1)*1
# manifest variances for math (made equivalent by naming theta1)
math2 ~~ theta1*math2
math3 ~~ theta1*math3
math4 ~~ theta1*math4
math5 ~~ theta1*math5
math6 ~~ theta1*math6
math7 ~~ theta1*math7
math8 ~~ theta1*math8
# manifest variances for hyp (made equivalent by naming theta2)
hyp2 ~~ theta2*hyp2
hyp3 ~~ theta2*hyp3
hyp4 ~~ theta2*hyp4
hyp5 ~~ theta2*hyp5
hyp6 ~~ theta2*hyp6
hyp7 ~~ theta2*hyp7
hyp8 ~~ theta2*hyp8
# residual covariances (made equivalent by naming theta12)
math2 ~~ theta12*hyp2
math3 ~~ theta12*hyp3
math4 ~~ theta12*hyp4
math5 ~~ theta12*hyp5
math6 ~~ theta12*hyp6
math7 ~~ theta12*hyp7
math8 ~~ theta12*hyp8
# manifest means for math (fixed at zero)
math2 ~ 0*1
math3 ~ 0*1
math4 ~ 0*1
math5 ~ 0*1
math6 ~ 0*1
math7 ~ 0*1
math8 ~ 0*1
# manifest means for hyp (fixed at zero)
hyp2 ~ 0*1
hyp3 ~ 0*1
hyp4 ~ 0*1
hyp5 ~ 0*1
hyp6 ~ 0*1
hyp7 ~ 0*1
hyp8 ~ 0*1
' #end of model definition
And estimating the model …
#estimating the model using sem() function
bivariate_lavaan_fit <- sem(bivariate_lavaan_model,
data = nlsy_multi_wide,
meanstructure = TRUE,
estimator = "ML",
missing = "fiml")
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: some cases are empty and will be ignored:
## 741
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan
## WARNING: due to missing values, some pairwise combinations have less than 10%
## coverage
We can look more specifically at the numerical summaries of the model.
#obtaining summary of the model using the object we just created
summary(bivariate_lavaan_fit, fit.measures=TRUE)
## lavaan 0.6-7 ended normally after 67 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of free parameters 35
## Number of equality constraints 18
##
## Used Total
## Number of observations 932 933
## Number of missing patterns 96
##
## Model Test User Model:
##
## Test statistic 318.885
## Degrees of freedom 102
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 1532.041
## Degrees of freedom 91
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.849
## Tucker-Lewis Index (TLI) 0.866
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -11790.476
## Loglikelihood unrestricted model (H1) -11631.033
##
## Akaike (AIC) 23614.951
## Bayesian (BIC) 23697.186
## Sample-size adjusted Bayesian (BIC) 23643.196
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.048
## 90 Percent confidence interval - lower 0.042
## 90 Percent confidence interval - upper 0.054
## P-value RMSEA <= 0.05 0.724
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.097
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## eta_1 =~
## math2 1.000
## math3 1.000
## math4 1.000
## math5 1.000
## math6 1.000
## math7 1.000
## math8 1.000
## eta_2 =~
## math2 0.000
## math3 1.000
## math4 2.000
## math5 3.000
## math6 4.000
## math7 5.000
## math8 6.000
## eta_3 =~
## hyp2 1.000
## hyp3 1.000
## hyp4 1.000
## hyp5 1.000
## hyp6 1.000
## hyp7 1.000
## hyp8 1.000
## eta_4 =~
## hyp2 0.000
## hyp3 1.000
## hyp4 2.000
## hyp5 3.000
## hyp6 4.000
## hyp7 5.000
## hyp8 6.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## eta_1 ~~
## eta_2 -0.204 1.154 -0.176 0.860
## eta_3 -2.979 0.673 -4.426 0.000
## eta_4 0.107 0.164 0.654 0.513
## eta_2 ~~
## eta_3 0.098 0.161 0.608 0.543
## eta_4 -0.040 0.038 -1.061 0.289
## eta_3 ~~
## eta_4 -0.020 0.031 -0.644 0.520
## .math2 ~~
## .hyp2 (th12) -0.011 0.233 -0.046 0.963
## .math3 ~~
## .hyp3 (th12) -0.011 0.233 -0.046 0.963
## .math4 ~~
## .hyp4 (th12) -0.011 0.233 -0.046 0.963
## .math5 ~~
## .hyp5 (th12) -0.011 0.233 -0.046 0.963
## .math6 ~~
## .hyp6 (th12) -0.011 0.233 -0.046 0.963
## .math7 ~~
## .hyp7 (th12) -0.011 0.233 -0.046 0.963
## .math8 ~~
## .hyp8 (th12) -0.011 0.233 -0.046 0.963
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|)
## eta_1 35.259 0.356 99.178 0.000
## eta_2 4.343 0.088 49.139 0.000
## eta_3 1.903 0.058 32.702 0.000
## eta_4 -0.057 0.014 -3.950 0.000
## .math2 0.000
## .math3 0.000
## .math4 0.000
## .math5 0.000
## .math6 0.000
## .math7 0.000
## .math8 0.000
## .hyp2 0.000
## .hyp3 0.000
## .hyp4 0.000
## .hyp5 0.000
## .hyp6 0.000
## .hyp7 0.000
## .hyp8 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## eta_1 64.711 5.664 11.424 0.000
## eta_2 0.752 0.329 2.282 0.023
## eta_3 1.542 0.156 9.908 0.000
## eta_4 0.005 0.009 0.539 0.590
## .math2 (tht1) 36.126 1.863 19.390 0.000
## .math3 (tht1) 36.126 1.863 19.390 0.000
## .math4 (tht1) 36.126 1.863 19.390 0.000
## .math5 (tht1) 36.126 1.863 19.390 0.000
## .math6 (tht1) 36.126 1.863 19.390 0.000
## .math7 (tht1) 36.126 1.863 19.390 0.000
## .math8 (tht1) 36.126 1.863 19.390 0.000
## .hyp2 (tht2) 1.104 0.057 19.404 0.000
## .hyp3 (tht2) 1.104 0.057 19.404 0.000
## .hyp4 (tht2) 1.104 0.057 19.404 0.000
## .hyp5 (tht2) 1.104 0.057 19.404 0.000
## .hyp6 (tht2) 1.104 0.057 19.404 0.000
## .hyp7 (tht2) 1.104 0.057 19.404 0.000
## .hyp8 (tht2) 1.104 0.057 19.404 0.000
We get information about how the model fits the data, and the model parameters.
The main parameter of interest, the slope-slope covariance, is not significantly different than zero.
#from the output above
# Estimate Std.Err z-value P(>|z|)
#eta_3 ~~
# eta_4 -0.020 0.031 -0.644 0.520
We can make a diagram to check if we specified and estimated the model as intended.
Thats cool!
4 Conclusion
This tutorial has presented how the multivariate growth model can be used to examine interrelations in change - specifically the between-person correlation of slopes.
Multivariate change is awesome, but be cautious with the presentation and inferences!