Bivariate Growth Model – Multilevel & SEM Implementation in R

1 Overview

This tutorial illustrates how to fit multivariate (bivariate) linear growth models in the multilevel and SEM frameworks in R.

Example data and code are drawn from Chapter 8 of Grimm, Ram, and Estabrook (2017). Specifically, using the NLSY-CYA Dataset we examine how individual differences in change in children’s mathematics achievement across grade are related to individual differences in change in children’s hyperactivity (as rated by teachers) across grade. Please see the book chapter for additional interpretations and insights about the analyses.

1.0.1 Preliminaries - Loading libraries used in this script.

library(psych)  #for basic functions
library(ggplot2)  #for plotting
library(nlme) #for mixed effects models
library(lavaan) #for SEM 
library(semPlot) #for making SEM diagrams

1.0.2 Preliminaries - Data Description

For our examples, we use the mathematics achievement and hyperactivity scores from the NLSY-CYA Long Data.

Load the repeated measures data (long format)

#set filepath for data file
filepath <- "https://raw.githubusercontent.com/LRI-2/Data/main/GrowthModeling/nlsy_math_hyp_long_R.dat"
#read in the text data file using the url() function
dat <- read.table(file=url(filepath),
                  na.strings = ".")  #indicates the missing data designator
#copy data with new name 
nlsy_math_hyp_long <- dat  

#Add names to the columns of the data set
names(nlsy_math_hyp_long) = c('id', 'female', 'lb_wght', 'anti_k1', 
                              'math', 'comp', 'rec', 'bpi', 'as', 'anx', 'hd', 
                              'hyp', 'dp', 'wd', 
                              'grade', 'occ', 'age', 'men', 'spring', 'anti')

#reducing to variables of interest 
nlsy_math_hyp_long <- nlsy_math_hyp_long[ ,c("id","grade","math","hyp")]

#view the first few observations in the data set 
head(nlsy_math_hyp_long, 10)
id grade math hyp
201 3 38 0
201 5 55 0
303 2 26 1
303 5 33 1
2702 2 56 2
2702 4 58 3
2702 8 80 3
4303 3 41 1
4303 4 58 1
5002 4 46 3

Our specific interest is intraindividual change in the repeated measures of math and hyp across grade.

As noted in Chapter 2, it is important to plot the data to obtain a better understanding of the structure and form of the observed phenomenon.

Longitudinal Plot of Math across Grade at Testing

#intraindividual change trajectories
ggplot(data=nlsy_math_hyp_long,                    #data set
       aes(x = grade, y = math, group = id)) + #setting variables
  geom_point(size=.5) + #adding points to plot
  geom_line() +  #adding lines to plot
  theme_bw() +   #changing style/background
  #setting the x-axis with breaks and labels
  scale_x_continuous(limits=c(2,8),
                     breaks = c(2,3,4,5,6,7,8), 
                     name = "Grade at Testing") +    
  #setting the y-axis with limits breaks and labels
  scale_y_continuous(limits=c(10,90), 
                     breaks = c(10,30,50,70,90), 
                     name = "PIAT Mathematics")

Longitudinal Plot of Hyperactivity across Grade at Testing

#intraindividual change trajectories
ggplot(data=nlsy_math_hyp_long,                    #data set
       aes(x = grade, y = hyp, group = id)) + #setting variables
  geom_point(size=.5) + #adding points to plot
  geom_line() +  #adding lines to plot
  theme_bw() +   #changing style/background
  #setting the x-axis with breaks and labels
  scale_x_continuous(limits=c(2,8),
                     breaks = c(2,3,4,5,6,7,8), 
                     name = "Grade at Testing") +    
  #setting the y-axis with limits breaks and labels
  scale_y_continuous(limits=c(0,5), 
                     breaks = c(0,1,2,3,4,5), 
                     name = "Hyperactivity")

1.0.3 Preliminaries - Data Preparation

For the multilevel model implementation we need to stack both math and hyp into a single outcome variable, along with dummy variables that indicate which variable each row represents. There are many ways to do this; we use a simple approach here, and a more elegant tidyr-based alternative is sketched after the data preview below.

# creating a stacked multivariate form of the data
#make math data
math <- data.frame (id = nlsy_math_hyp_long$id,
                    var = nlsy_math_hyp_long$math,
                    grade = nlsy_math_hyp_long$grade,
                    d_math = 1,
                    d_hyp = 0,
                    grp = 'math')

#make hyp data
hyp  <- data.frame (id = nlsy_math_hyp_long$id,
                    var = nlsy_math_hyp_long$hyp,
                    grade = nlsy_math_hyp_long$grade,
                    d_math = 0,
                    d_hyp = 1,
                    grp = 'hyp')
#rowbinding the data
nlsy_multi_long <- rbind(math,hyp)

#reordering for easy viewing
nlsy_multi_long <- nlsy_multi_long[order(nlsy_multi_long$id,nlsy_multi_long$grade,nlsy_multi_long$d_hyp), ]

#view the first few observations in the data set 
head(nlsy_multi_long, 20)
id var grade d_math d_hyp grp
1 201 38 3 1 0 math
2222 201 0 3 0 1 hyp
2 201 55 5 1 0 math
2223 201 0 5 0 1 hyp
3 303 26 2 1 0 math
2224 303 1 2 0 1 hyp
4 303 33 5 1 0 math
2225 303 1 5 0 1 hyp
5 2702 56 2 1 0 math
2226 2702 2 2 0 1 hyp
6 2702 58 4 1 0 math
2227 2702 3 4 0 1 hyp
7 2702 80 8 1 0 math
2228 2702 3 8 0 1 hyp
8 4303 41 3 1 0 math
2229 4303 1 3 0 1 hyp
9 4303 58 4 1 0 math
2230 4303 1 4 0 1 hyp
10 5002 46 4 1 0 math
2231 5002 3 4 0 1 hyp

Note the alternating row structure of the data with both math and hyp in a single variable var.
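One of the more elegant alternatives mentioned above is tidyr's pivot_longer() function. A minimal sketch (assuming the tidyr package is installed) that reproduces the same stacked structure:

#alternative stacking with tidyr (sketch)
library(tidyr)
#stack math and hyp into a single column named var
nlsy_multi_long_alt <- pivot_longer(nlsy_math_hyp_long,
                                    cols = c("math", "hyp"), #variables to stack
                                    names_to = "grp",        #which variable each row holds
                                    values_to = "var")       #the stacked scores
#dummy indicators, as in the hand-built data frame above
nlsy_multi_long_alt$d_math <- ifelse(nlsy_multi_long_alt$grp == "math", 1, 0)
nlsy_multi_long_alt$d_hyp  <- ifelse(nlsy_multi_long_alt$grp == "hyp", 1, 0)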

For the SEM implementation we need wide data, with math2-math8 and hyp2-hyp8 all in one row per person.

Load the repeated measures data (wide format)

#set filepath for data file
filepath <- "https://raw.githubusercontent.com/LRI-2/Data/main/GrowthModeling/nlsy_math_hyp_wide_R.dat"
#read in the text data file using the url() function
dat <- read.table(file=url(filepath),
                  na.strings = ".")  #indicates the missing data designator
#copy data with new name 
nlsy_math_hyp_wide <- dat  

#Add names to the columns of the data set
names(nlsy_math_hyp_wide)<-c('id','female','lb_wght','anti_k1',
                             'math2','math3','math4','math5','math6','math7','math8',
                             'comp2','comp3','comp4','comp5','comp6','comp7','comp8',
                             'rec2','rec3','rec4','rec5','rec6','rec7','rec8',
                             'bpi2','bpi3','bpi4','bpi5','bpi6','bpi7','bpi8',
                             'asl2','asl3','asl4','asl5','asl6','asl7','asl8',
                             'ax2','ax3','ax4','ax5','ax6','ax7','ax8',
                             'hds2','hds3','hds4','hds5','hds6','hds7','hds8',
                             'hyp2','hyp3','hyp4','hyp5','hyp6','hyp7','hyp8',
                             'dpn2','dpn3','dpn4','dpn5','dpn6','dpn7','dpn8',
                             'wdn2','wdn3','wdn4','wdn5','wdn6','wdn7','wdn8',
                             'age2','age3','age4','age5','age6','age7','age8',
                             'men2','men3','men4','men5','men6','men7','men8',
                             'spring2','spring3','spring4','spring5','spring6','spring7','spring8',
                             'anti2','anti3','anti4','anti5','anti6','anti7','anti8')

#reducing to variables of interest 
nlsy_multi_wide <- nlsy_math_hyp_wide[ ,c('id',
                             'math2','math3','math4','math5','math6','math7','math8',
                             'hyp2','hyp3','hyp4','hyp5','hyp6','hyp7','hyp8')]

#view the first few observations in the data set 
head(nlsy_multi_wide, 10)
id math2 math3 math4 math5 math6 math7 math8 hyp2 hyp3 hyp4 hyp5 hyp6 hyp7 hyp8
201 NA 38 NA 55 NA NA NA NA 0 NA 0 NA NA NA
303 26 NA NA 33 NA NA NA 1 NA NA 1 NA NA NA
2702 56 NA 58 NA NA NA 80 2 NA 3 NA NA NA 3
4303 NA 41 58 NA NA NA NA NA 1 1 NA NA NA NA
5002 NA NA 46 NA 54 NA 66 NA NA 3 NA 2 NA 3
5005 35 NA 50 NA 60 NA 59 0 NA 3 NA 0 NA 1
5701 NA 62 61 NA NA NA NA NA 4 3 NA NA NA NA
6102 NA NA 55 67 NA 81 NA NA NA 2 0 NA 0 NA
6801 NA 54 NA 62 NA 66 NA NA 0 NA 1 NA 1 NA
6802 NA 55 NA 66 NA 68 NA NA 0 NA 0 NA 0 NA
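As an aside, had only the long file been available, a comparable wide file could be constructed from it directly. A minimal sketch using base R's reshape() function (column names as above):

#sketch: pivoting the long data to one row per person
nlsy_wide_check <- reshape(nlsy_math_hyp_long,
                           idvar = "id",                #person identifier
                           timevar = "grade",           #occasion variable
                           v.names = c("math", "hyp"),  #repeated measures to spread
                           direction = "wide",
                           sep = "")                    #yields math2, ..., math8 and hyp2, ..., hyp8
#view the first few observations
head(nlsy_wide_check)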

We are now ready to rock!

2 Multilevel: Bivariate Growth Model using nlme() function

To capture the systematic growth in the mathematics and hyperactivity scores over grade, we use the grade variable, so that each rate of growth indicates (linear) school-related change in the respective outcome.

We use the multivariate (stacked) data in “extralong” format.
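Written informally, the level-1 model that the nlme() call below fits is

$$var_{ti} = d^{math}_{ti}\,[b_{1i} + b_{2i}(grade_{ti}-2)] + d^{hyp}_{ti}\,[h_{1i} + h_{2i}(grade_{ti}-2)] + e_{ti}$$

where $d^{math}_{ti}$ and $d^{hyp}_{ti}$ are the dummy variables that switch on the relevant part of the model for each row, the intercepts are centered at grade 2, and the person-specific coefficients $b_{1i}$, $b_{2i}$, $h_{1i}$, and $h_{2i}$ are specified as correlated random effects.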

Then we fit the model with nlme() using our expansive coding style. This is a difficult model to estimate, so we add control parameters to the model (e.g., increasing the maximum number of iterations).

# bivariate growth model with residual covariance
bivar_nlme <-  nlme(var~d_math*(b_1i+b_2i*(grade-2))+
                        d_hyp *(h_1i+h_2i*(grade-2)),
                  data      = nlsy_multi_long,
                  fixed     = b_1i+b_2i+h_1i+h_2i~1,
                  random    = b_1i+b_2i+h_1i+h_2i~1,
                  group     = ~id,
                  start     = c(35, 4, 1, -1),
                  #invoking separate residual variance for each variable
                  weights   = varIdent(c(hyp=.3), form = ~1|grp), 
                  # invokes off-diagonal symmetric structure for level 1 residuals
                  corr=corSymm(form = ~1 | id/grade), 
                  na.action = na.exclude, 
                  control=lmeControl(maxIter = 500, msMaxIter = 200, tolerance = 1e-4, 
                                    niter = 100, msTol = 1e-5, nlmStepMax = 500, 
                                    msVerbose = FALSE, returnObject = TRUE)) #to increase iterations


#obtaining summary of the model using the object we just created
summary(bivar_nlme)
## Nonlinear mixed-effects model fit by maximum likelihood
##   Model: var ~ d_math * (b_1i + b_2i * (grade - 2)) + d_hyp * (h_1i +      h_2i * (grade - 2)) 
##  Data: nlsy_multi_long 
##        AIC      BIC    logLik
##   23614.95 23723.54 -11790.48
## 
## Random effects:
##  Formula: list(b_1i ~ 1, b_2i ~ 1, h_1i ~ 1, h_2i ~ 1)
##  Level: id
##  Structure: General positive-definite, Log-Cholesky parametrization
##          StdDev     Corr                
## b_1i     8.04424863 b_1i   b_2i   h_1i  
## b_2i     0.86685535 -0.029              
## h_1i     1.24182333 -0.298  0.091       
## h_2i     0.06805861  0.196 -0.681 -0.236
## Residual 6.01055399                     
## 
## Correlation Structure: General
##  Formula: ~1 | id/grade 
##  Parameter estimate(s):
##  Correlation: 
##   1     
## 2 -0.002
## Variance function:
##  Structure: Different standard deviations per stratum
##  Formula: ~1 | grp 
##  Parameter estimates:
##      math       hyp 
## 1.0000000 0.1748208 
## Fixed effects: b_1i + b_2i + h_1i + h_2i ~ 1 
##         Value Std.Error   DF  t-value p-value
## b_1i 35.25867 0.3551088 3456 99.28977   0e+00
## b_2i  4.34300 0.0873775 3456 49.70387   0e+00
## h_1i  1.90348 0.0581513 3456 32.73317   0e+00
## h_2i -0.05660 0.0142232 3456 -3.97915   1e-04
##  Correlation: 
##      b_1i   b_2i   h_1i  
## b_2i -0.531              
## h_1i -0.166  0.037       
## h_2i  0.040 -0.067 -0.608
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -3.32773330 -0.53353192 -0.06380224  0.55395314  3.05034041 
## 
## Number of Observations: 4391
## Number of Groups: 932

Cool that it all works!

Note that this is a slightly more elaborate model than the one in the book (because it includes the residual covariance between math and hyp).
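If desired, the random-effects standard deviations/correlations and approximate confidence intervals for all of the estimates can be pulled out directly. A quick sketch using standard nlme accessors (note that intervals() will error if the approximate variance-covariance matrix of the estimates cannot be computed):

#variance-covariance summary of the random effects
VarCorr(bivar_nlme)
#approximate 95% confidence intervals for fixed effects, random effects, and residual parameters
intervals(bivar_nlme, which = "all")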

3 SEM: Bivariate Growth Model using lavaan

For the SEM implementation, we use the mathematics achievement and hyperactivity scores from the wide data.

Specifying the bivariate linear growth model

#writing out linear growth model in full SEM way 
bivariate_lavaan_model <- '
  # latent variable definitions
      #intercept for math
      eta_1 =~ 1*math2
      eta_1 =~ 1*math3
      eta_1 =~ 1*math4
      eta_1 =~ 1*math5
      eta_1 =~ 1*math6
      eta_1 =~ 1*math7
      eta_1 =~ 1*math8

      #linear slope for math
      eta_2 =~ 0*math2
      eta_2 =~ 1*math3
      eta_2 =~ 2*math4
      eta_2 =~ 3*math5
      eta_2 =~ 4*math6
      eta_2 =~ 5*math7
      eta_2 =~ 6*math8
      
      #intercept for hyp
      eta_3 =~ 1*hyp2
      eta_3 =~ 1*hyp3
      eta_3 =~ 1*hyp4
      eta_3 =~ 1*hyp5
      eta_3 =~ 1*hyp6
      eta_3 =~ 1*hyp7
      eta_3 =~ 1*hyp8

      #linear slope for hyp
      eta_4 =~ 0*hyp2
      eta_4 =~ 1*hyp3
      eta_4 =~ 2*hyp4
      eta_4 =~ 3*hyp5
      eta_4 =~ 4*hyp6
      eta_4 =~ 5*hyp7
      eta_4 =~ 6*hyp8

  # factor variances
      eta_1 ~~ eta_1
      eta_2 ~~ eta_2
      eta_3 ~~ eta_3
      eta_4 ~~ eta_4

  # covariances among factors 
      eta_1 ~~ eta_2 + eta_3 + eta_4
      eta_2 ~~ eta_3 + eta_4
      eta_3 ~~ eta_4 

  # factor means 
      eta_1 ~ start(35)*1
      eta_2 ~ start(4)*1
      eta_3 ~ start(2)*1
      eta_4 ~ start(.1)*1

  # manifest variances for math (made equivalent by naming theta1)
      math2 ~~ theta1*math2
      math3 ~~ theta1*math3
      math4 ~~ theta1*math4
      math5 ~~ theta1*math5
      math6 ~~ theta1*math6
      math7 ~~ theta1*math7
      math8 ~~ theta1*math8
      
  # manifest variances for hyp (made equivalent by naming theta2)
      hyp2 ~~ theta2*hyp2
      hyp3 ~~ theta2*hyp3
      hyp4 ~~ theta2*hyp4
      hyp5 ~~ theta2*hyp5
      hyp6 ~~ theta2*hyp6
      hyp7 ~~ theta2*hyp7
      hyp8 ~~ theta2*hyp8
      
  # residual covariances (made equivalent by naming theta12)
      math2 ~~ theta12*hyp2
      math3 ~~ theta12*hyp3
      math4 ~~ theta12*hyp4
      math5 ~~ theta12*hyp5
      math6 ~~ theta12*hyp6
      math7 ~~ theta12*hyp7
      math8 ~~ theta12*hyp8
      
  # manifest means for math (fixed at zero)
      math2 ~ 0*1
      math3 ~ 0*1
      math4 ~ 0*1
      math5 ~ 0*1
      math6 ~ 0*1
      math7 ~ 0*1
      math8 ~ 0*1
      
  # manifest means for hyp (fixed at zero)
      hyp2 ~ 0*1
      hyp3 ~ 0*1
      hyp4 ~ 0*1
      hyp5 ~ 0*1
      hyp6 ~ 0*1
      hyp7 ~ 0*1
      hyp8 ~ 0*1
' #end of model definition
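As a side note, lavaan also accepts the loadings in compact form; for example, the per-variable statements for the math intercept and slope above could equivalently be written inside the model string as:

      eta_1 =~ 1*math2 + 1*math3 + 1*math4 + 1*math5 + 1*math6 + 1*math7 + 1*math8
      eta_2 =~ 0*math2 + 1*math3 + 2*math4 + 3*math5 + 4*math6 + 5*math7 + 6*math8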

And estimating the model …

#estimating the model using sem() function
bivariate_lavaan_fit <- sem(bivariate_lavaan_model, 
                          data = nlsy_multi_wide,
                          meanstructure = TRUE,
                          estimator = "ML",
                          missing = "fiml")
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: some cases are empty and will be ignored:
##   741
## Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan
## WARNING: due to missing values, some pairwise combinations have less than 10%
## coverage
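The low-coverage warning can be inspected directly. A quick check using lavaan's lavInspect() function:

#proportion of cases jointly observed for each pair of variables
coverage <- lavInspect(bivariate_lavaan_fit, "coverage")
#peek at one corner of the 14 x 14 matrix
round(coverage[1:4, 1:4], 2)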

We can look more specifically at the numerical summaries of the model.

#obtaining summary of the model using the object we just created
summary(bivariate_lavaan_fit, fit.measures=TRUE)
## lavaan 0.6-7 ended normally after 67 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of free parameters                         35
##   Number of equality constraints                    18
##                                                       
##                                                   Used       Total
##   Number of observations                           932         933
##   Number of missing patterns                        96            
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                               318.885
##   Degrees of freedom                               102
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                              1532.041
##   Degrees of freedom                                91
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.849
##   Tucker-Lewis Index (TLI)                       0.866
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -11790.476
##   Loglikelihood unrestricted model (H1)     -11631.033
##                                                       
##   Akaike (AIC)                               23614.951
##   Bayesian (BIC)                             23697.186
##   Sample-size adjusted Bayesian (BIC)        23643.196
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.048
##   90 Percent confidence interval - lower         0.042
##   90 Percent confidence interval - upper         0.054
##   P-value RMSEA <= 0.05                          0.724
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.097
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   eta_1 =~                                            
##     math2             1.000                           
##     math3             1.000                           
##     math4             1.000                           
##     math5             1.000                           
##     math6             1.000                           
##     math7             1.000                           
##     math8             1.000                           
##   eta_2 =~                                            
##     math2             0.000                           
##     math3             1.000                           
##     math4             2.000                           
##     math5             3.000                           
##     math6             4.000                           
##     math7             5.000                           
##     math8             6.000                           
##   eta_3 =~                                            
##     hyp2              1.000                           
##     hyp3              1.000                           
##     hyp4              1.000                           
##     hyp5              1.000                           
##     hyp6              1.000                           
##     hyp7              1.000                           
##     hyp8              1.000                           
##   eta_4 =~                                            
##     hyp2              0.000                           
##     hyp3              1.000                           
##     hyp4              2.000                           
##     hyp5              3.000                           
##     hyp6              4.000                           
##     hyp7              5.000                           
##     hyp8              6.000                           
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   eta_1 ~~                                            
##     eta_2            -0.204    1.154   -0.176    0.860
##     eta_3            -2.979    0.673   -4.426    0.000
##     eta_4             0.107    0.164    0.654    0.513
##   eta_2 ~~                                            
##     eta_3             0.098    0.161    0.608    0.543
##     eta_4            -0.040    0.038   -1.061    0.289
##   eta_3 ~~                                            
##     eta_4            -0.020    0.031   -0.644    0.520
##  .math2 ~~                                            
##    .hyp2    (th12)   -0.011    0.233   -0.046    0.963
##  .math3 ~~                                            
##    .hyp3    (th12)   -0.011    0.233   -0.046    0.963
##  .math4 ~~                                            
##    .hyp4    (th12)   -0.011    0.233   -0.046    0.963
##  .math5 ~~                                            
##    .hyp5    (th12)   -0.011    0.233   -0.046    0.963
##  .math6 ~~                                            
##    .hyp6    (th12)   -0.011    0.233   -0.046    0.963
##  .math7 ~~                                            
##    .hyp7    (th12)   -0.011    0.233   -0.046    0.963
##  .math8 ~~                                            
##    .hyp8    (th12)   -0.011    0.233   -0.046    0.963
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     eta_1            35.259    0.356   99.178    0.000
##     eta_2             4.343    0.088   49.139    0.000
##     eta_3             1.903    0.058   32.702    0.000
##     eta_4            -0.057    0.014   -3.950    0.000
##    .math2             0.000                           
##    .math3             0.000                           
##    .math4             0.000                           
##    .math5             0.000                           
##    .math6             0.000                           
##    .math7             0.000                           
##    .math8             0.000                           
##    .hyp2              0.000                           
##    .hyp3              0.000                           
##    .hyp4              0.000                           
##    .hyp5              0.000                           
##    .hyp6              0.000                           
##    .hyp7              0.000                           
##    .hyp8              0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     eta_1            64.711    5.664   11.424    0.000
##     eta_2             0.752    0.329    2.282    0.023
##     eta_3             1.542    0.156    9.908    0.000
##     eta_4             0.005    0.009    0.539    0.590
##    .math2   (tht1)   36.126    1.863   19.390    0.000
##    .math3   (tht1)   36.126    1.863   19.390    0.000
##    .math4   (tht1)   36.126    1.863   19.390    0.000
##    .math5   (tht1)   36.126    1.863   19.390    0.000
##    .math6   (tht1)   36.126    1.863   19.390    0.000
##    .math7   (tht1)   36.126    1.863   19.390    0.000
##    .math8   (tht1)   36.126    1.863   19.390    0.000
##    .hyp2    (tht2)    1.104    0.057   19.404    0.000
##    .hyp3    (tht2)    1.104    0.057   19.404    0.000
##    .hyp4    (tht2)    1.104    0.057   19.404    0.000
##    .hyp5    (tht2)    1.104    0.057   19.404    0.000
##    .hyp6    (tht2)    1.104    0.057   19.404    0.000
##    .hyp7    (tht2)    1.104    0.057   19.404    0.000
##    .hyp8    (tht2)    1.104    0.057   19.404    0.000

We get information about how well the model fits the data, along with the model parameter estimates.

The main parameter of interest, the slope-slope covariance (eta_2 with eta_4), is not significantly different from zero.

#from the output above
#                   Estimate  Std.Err  z-value  P(>|z|)
#eta_2 ~~                                            
#    eta_4            -0.040    0.038   -1.061    0.289
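To express this covariance as a correlation, we can examine the standardized solution. A minimal sketch:

#standardized estimates; the eta_2-eta_4 row gives the slope-slope correlation
std <- standardizedSolution(bivariate_lavaan_fit)
std[std$op == "~~" & std$lhs == "eta_2" & std$rhs == "eta_4", ]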

We can make a diagram to check if we specified and estimated the model as intended.

#diagram of fitted model
semPaths(bivariate_lavaan_fit, what = "path", whatLabels = "par")

That's cool!

4 Conclusion

This tutorial has presented how the multivariate growth model can be used to examine interrelations in change - specifically the between-person correlation of slopes.

Multivariate change is awesome, but be cautious with the presentation and inferences!