# Multiple equation models: Estimation and marginal effects using gsem

Starting point: A hurdle model with multiple hurdles

In a sequence of posts, we are going to illustrate how to obtain correct standard errors and marginal effects for models with multiple steps.

Our inspiration for this post is an old Statalist inquiry about how to obtain marginal effects for a hurdle model with more than one hurdle (http://www.statalist.org/forums/forum/general-stata-discussion/general/1337504-estimating-marginal-effect-for-triple-hurdle-model). Hurdle models have the appealing property that their likelihood is separable. Each hurdle has its own likelihood and regressors. You can estimate each one of these hurdles separately to obtain point estimates. However, separate estimation does not yield a joint covariance matrix for all the parameters, so you cannot get marginal effects with correct standard errors this way.

In this post, we show how to get the marginal effects and standard errors for a hurdle model with two hurdles using gsem. gsem is ideal for this purpose because it allows us to estimate likelihood-based models with multiple equations.

The model

Suppose we are interested in the mean spending on dental care, given the characteristics of the individuals. Some people spend zero dollars on dental care in a year, and some people spend more than zero dollars. Only the individuals who cross a hurdle are willing to spend a positive amount on dental care. Hurdle models allow the characteristics of the individuals who spend a positive amount and of those who spend zero to differ.

There could be more than one hurdle. In the dental-care spending example, the second hurdle could be insurance coverage: uninsured, basic insurance, or premium insurance. We model the first hurdle of spending zero or a positive amount by a probit. We model the second hurdle of insurance level using an ordered probit. Finally, we model the positive amount spent using an exponential-mean model.

We are interested in the marginal effects for the mean amount spent for someone with premium insurance, given individual characteristics. The expression for this conditional mean is
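\begin{eqnarray*}
E\left(\text{expenditure}|X, {\tt insurance} = \text{premium}\right) = \Phi\left(X_p\beta_p\right)\Phi\left(X_o\beta_o - \kappa_{\text{premium}}\right)\exp\left(X_e\beta_e\right)
\end{eqnarray*}
where \(\Phi\) is the standard normal cumulative distribution function and \(\kappa_{\text{premium}}\) is the ordered-probit cutpoint for premium insurance. The conditional mean accounts for the probabilities of being in different threshold levels and for the expenditure preferences among those spending a positive amount. We use the subscripts \(p\), \(o\), and \(e\) to emphasize that the covariates and coefficients related to the probit, ordered probit, and exponential mean are different.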

Below we will use gsem to estimate the model parameters from simulated data. spend is a binary outcome for whether an individual spends money on dental care, insurance is an ordered outcome indicating insurance level, and expenditure corresponds to the amount spent on dental care.

```
. gsem  (spend       <- x1 x2 x4, probit)
>       (insurance   <- x3 x4, oprobit)
>       (expenditure <- x5 x6 x4, poisson),
>        vce(robust)
note: expenditure has noncount values;
      you are responsible for the family(poisson) interpretation

Iteration 0:   log pseudolikelihood = -171938.67
Iteration 1:   log pseudolikelihood = -79591.213
Iteration 2:   log pseudolikelihood = -78928.015
Iteration 3:   log pseudolikelihood = -78925.126
Iteration 4:   log pseudolikelihood = -78925.126

Generalized structural equation model        Number of obs     =     10,000

Response       : spend
Family         : Bernoulli
Link           : probit

Response       : insurance
Family         : ordinal
Link           : probit

Response       : expenditure
Family         : Poisson
Link           : log

Log pseudolikelihood = -78925.126

----------------------------------------------------------------------------
               |               Robust
               |      Coef.   Std. Err.     z    P>|z|  [95% Conf. Interval]
---------------+------------------------------------------------------------
spend <-       |
            x1 |   .5189993   .0161283   32.18   0.000   .4873884   .5506102
            x2 |  -.4755281     .02257  -21.07   0.000  -.5197646  -.4312917
            x4 |   .5300193   .0187114   28.33   0.000   .4933455    .566693
         _cons |   .4849085   .0288667   16.80   0.000   .4283308   .5414862
---------------+------------------------------------------------------------
insurance <-   |
            x3 |    .299793   .0084822   35.34   0.000   .2831681   .3164178
            x4 |  -.2835648   .0135266  -20.96   0.000  -.3100765  -.2570531
---------------+------------------------------------------------------------
expenditure <- |
            x5 |  -.2992792   .0192201  -15.57   0.000  -.3369499  -.2616086
            x6 |    .319377   .0483959    6.60   0.000   .2245229   .4142312
            x4 |    .448041   .0252857   17.72   0.000   .3984819   .4976001
         _cons |   1.088217   .0375369   28.99   0.000   1.014646   1.161788
---------------+------------------------------------------------------------
insurance      |
         /cut1 |   -1.28517   .0236876  -54.26   0.000  -1.331596  -1.238743
         /cut2 |  -.2925979   .0216827  -13.49   0.000  -.3350951  -.2501006
         /cut3 |   .7400875   .0230452   32.11   0.000   .6949198   .7852552
----------------------------------------------------------------------------
```

The estimated probit parameters are in the spend equation. The estimated ordered-probit parameters are in the insurance equation. The estimated expenditure parameters are in the expenditure equation. We could have obtained these point estimates using probit, oprobit, and poisson. With gsem, we do this jointly and obtain correct standard errors when computing marginal effects. In the case of the poisson model, we are using gsem to obtain an exponential mean and should interpret the results from a quasilikelihood perspective. Because of the quasilikelihood nature of the problem, we use the vce(robust) option.
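The claim about the point estimates can be checked directly. As a minimal sketch, assuming the simulated data from the appendix are in memory, the three commands below fit each equation on its own. The vce(robust) option is added here only so that the reported standard errors are of the same type as those in the gsem output.

```stata
* Fit each equation separately on the same simulated data
probit  spend       x1 x2 x4, vce(robust)
oprobit insurance   x3 x4,    vce(robust)
poisson expenditure x5 x6 x4, vce(robust)
```

The coefficients should agree with those reported by gsem. Each command, however, stores only its own covariance matrix, so margins, which works on one set of estimation results at a time, has no way to account for the sampling variation of all three equations together.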

The average of the marginal effect of x4 is
\begin{equation*}
\frac{1}{N}\sum_{i=1}^N \frac{\partial \hat{E}\left(\text{expenditure}_i|X_i, {\tt insurance}_i\right)}{\partial {\tt x4}_i}
\end{equation*}
and we estimate it by

```
. margins, vce(unconditional) predict(expression(normal(eta(spend))*
>          normal(eta(insurance)-_b[insurance_cut2:_cons])*
>          exp(eta(expenditure)))) dydx(x4)

Average marginal effects                     Number of obs     =     10,000

Expression   : Predicted normal(eta(spend))*
               normal(eta(insurance)-_b[insurance_cut2:_cons])* e,
               predict(expression(normal(eta(spend))*
               normal(eta(insurance)-_b[insurance_cut2:_cons])*
               exp(eta(expenditure))))
dy/dx w.r.t. : x4

---------------------------------------------------------------------------
          |            Unconditional
          |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------+----------------------------------------------------------------
       x4 |   .5382276   .0506354    10.63   0.000     .4389841    .6374711
---------------------------------------------------------------------------
```
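Because x4 enters all three equations, the derivative behind dydx(x4) has one term per equation. As a sketch, let \(a_i\), \(b_i\), and \(c_i\) stand for eta(spend), eta(insurance) minus the cut2 parameter, and eta(expenditure) for observation \(i\), and let \(\beta_{p,4}\), \(\beta_{o,4}\), and \(\beta_{e,4}\) be the coefficients on x4 in the respective equations. These labels are introduced here only for illustration. With \(\phi\) denoting the standard normal density, the product rule gives
\begin{eqnarray*}
\frac{\partial}{\partial {\tt x4}_i}\left\{\Phi(a_i)\Phi(b_i)\exp(c_i)\right\} &=& \beta_{p,4}\,\phi(a_i)\Phi(b_i)\exp(c_i) \\
&+& \beta_{o,4}\,\Phi(a_i)\phi(b_i)\exp(c_i) \\
&+& \beta_{e,4}\,\Phi(a_i)\Phi(b_i)\exp(c_i)
\end{eqnarray*}
The reported average marginal effect is the sample average of this quantity evaluated at the estimated parameters, which is why its standard error depends on the estimates from all three equations at once.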

Inside predict(), we used the expression() option to write the expected value of interest, with eta() denoting the linear prediction of each equation. We used the vce(unconditional) option to allow the covariates to be random instead of fixed. In other words, we are estimating a population effect instead of a sample effect.
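To see what vce(unconditional) changes, one could rerun margins with its default delta-method standard errors, which treat the covariates as fixed. The sketch below assumes the gsem results above are still the active estimation results and is written for a do-file, so it uses /// for line continuation. The expression itself is copied from the command above.

```stata
* Same average marginal effect, default delta-method VCE (covariates held fixed)
margins, predict(expression(normal(eta(spend))*                 ///
         normal(eta(insurance)-_b[insurance_cut2:_cons])*       ///
         exp(eta(expenditure)))) dydx(x4)
```

The point estimate of the average marginal effect is the same under both choices. Only the standard error, and the test statistic and confidence interval built from it, can differ.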

Final considerations

We illustrated how to use gsem to obtain the estimates and standard errors for a multiple hurdle model and its marginal effect. In subsequent posts, we will obtain these results using other Stata tools.

Appendix

Below is the code used to produce the data.

```
clear
set seed 111
set obs 10000
// Generating exogenous variables
generate x1 = rnormal()
generate x2 = int(3*rbeta(2,3))
generate x3 = rchi2(1)-2
generate x4 = ln(rchi2(4))
generate x5 = rnormal()
generate x6 = rbeta(2,3)>.6
// Generating unobservables
generate ep = rnormal() // for probit
generate eo = rnormal() // for ordered probit
generate e  = rnormal() // for lognormal equation
// Generating linear predictions
generate xbp = .5*(1 + x1 - x2 + x4)
generate xbo = .3*(1 + x3 - x4)
generate xbe = .3*(1 - x5 + x6 + x4)
// Generating outcomes
generate spend       = xbp + ep > 0
generate yotemp      = xbo + eo
generate insurance   = yotemp
generate yexp = exp(xbe + e)
replace insurance = 1 if yotemp < -1
replace insurance = 2 if yotemp> -1 & yotemp<0
replace insurance = 3 if yotemp> 0 & yotemp <1
replace insurance = 4 if yotemp>1
generate expenditure = spend*insurance*yexp
```
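A quick way to inspect the simulated outcomes, assuming the code above has just been run, is sketched below. The last line counts the noninteger values of expenditure, which are what prompt the note about noncount values in the gsem output.

```stata
* Inspect the simulated outcomes
tabulate insurance                         // frequencies of the insurance levels
summarize spend expenditure                // share of spenders and range of expenditure
count if expenditure != int(expenditure)   // noninteger expenditure values
```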