因果中介分析指南（三）：stata中mediate的用法（完结）【包含连续型处理效应的因果中介】

在这里插入图片描述

欢迎关注「阿水实证通」，前沿方法时刻看！🌟🌟🌟

承接之前推文，本文继续介绍mediate因果中介的剩余全部估计用法，原文链接如下：
因果中介分析指南（一）：估计流程与理论介绍

- 示例 7：带有计数中介变量的因果中介模型
- 示例 8：带有指数均值结果的因果中介模型
- 示例 9：带有多值处理的因果中介模型
- 示例 10：带有连续处理的因果中介模型
- 示例 11：估计控制直接效应
- 示例 12：在不同尺度上估计处理效应

示例7：带有计数中介变量的因果中介模型

我们使用一个关于出生体重的虚构数据集，演示当对计数中介变量使用泊松模型时，如何进行因果中介分析。

现在我们假设拥有的是观察数据而非实验数据。样本包括已生育孩子的女性。我们希望了解母亲的社会经济地位和教育程度是否会影响孩子的健康。结果变量是婴儿的出生体重（bweight），处理变量是母亲是否拥有大学学位（college），中介变量是孕期每天吸烟的数量（ncigs）。假设是教育程度较高的女性可能吸烟更少，而孕期吸烟对出生体重有负面影响。以下是数据集的节选：

. use https://www.stata-press.com/data/r19/birthweight, clear 

. list bweight ncigs college ses sespar age in 1/5  

     +-------------------------------------------------------+
     | bweight   ncigs   college        ses     sespar   age |
     |-------------------------------------------------------|
  1. |    3621       1        No     5.3581   3.308523    29 |
  2. |    3278       0       Yes   9.556957   4.376035    38 |
  3. |    3073       1        No   3.980829   6.580275    39 |
  4. |    3306       0       Yes   11.17643   12.12075    30 |
  5. |    4517       0       Yes   9.026146   4.738766    28 |
     +-------------------------------------------------------+

我们为结果变量bweight拟合线性模型，为中介变量ncigs拟合泊松模型，并将college指定为二分类处理变量。由于我们拥有的是完全观察数据，其中处理分配不再完全随机，因此必须考虑示例2中提到的所有类型的混杂因素，包括协变量并放宽无交互作用假设。我们在两个方程中都将若干潜在混杂因素指定为协变量。

我们假设吸烟的不利影响在有大学学位和无大学学位的女性之间没有差异，因此使用nointeract选项。

mediate (bweight sespar c.age##c.age)  (ncigs sespar c.age##c.age, poisson) (college), nointeract

Iteration 0:  EE criterion =  1.980e-21  
Iteration 1:  EE criterion =  1.979e-21  (backed up)

Causal mediation analysis                                Number of obs = 2,000

Outcome model:     Linear
Mediator model:    Poisson
Mediator variable: ncigs
Treatment type:    Binary
------------------------------------------------------------------------------
             |               Robust
     bweight | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
NIE          |
     college |
(Yes vs No)  |   167.3075   21.36134     7.83   0.000     125.4401     209.175
-------------+----------------------------------------------------------------
NDE          |
     college |
(Yes vs No)  |   347.3375   34.44561    10.08   0.000     279.8253    414.8496
-------------+----------------------------------------------------------------
TE           |
     college |
(Yes vs No)  |    514.645   28.65043    17.96   0.000     458.4912    570.7988
------------------------------------------------------------------------------
Note: Outcome equation does not include treatment–mediator interaction.

如前所述，我们对中介变量使用的模型类型不影响估计处理效应的解释。效应是结果变量尺度上的期望差异。TE表明，如果所有女性都有大学学位，新生儿的平均出生体重将比没有女性拥有大学学位时的平均出生体重高近515克。在这一体重增加中，约167克归因于教育程度较高的女性吸烟更少，而347克归因于其他机制。

示例8：带有指数均值结果的因果中介模型

这里我们对结果变量bweight使用指数均值模型。

mediate (bweight sespar c.age##c.age, expmean) (ncigs sespar c.age##c.age, poisson)(college), nointeract

Iteration 0:  EE criterion =  3.250e-13  
Iteration 1:  EE criterion =  9.851e-18  

Causal mediation analysis                                Number of obs = 2,000

Outcome model:     Exponential mean
Mediator model:    Poisson
Mediator variable: ncigs
Treatment type:    Binary
------------------------------------------------------------------------------
             |               Robust
     bweight | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
NIE          |
     college |
(Yes vs No)  |    198.978   23.53279     8.46   0.000     152.8546    245.1014
-------------+----------------------------------------------------------------
NDE          |
     college |
(Yes vs No)  |   320.3318   34.47792     9.29   0.000     252.7563    387.9072
-------------+----------------------------------------------------------------
TE           |
     college |
(Yes vs No)  |   519.3098   28.70435    18.09   0.000     463.0503    575.5693
------------------------------------------------------------------------------
Note: Outcome equation does not include treatment–mediator interaction.

由于我们仍在对连续结果建模，解释不变。TE约为519克，其中199克归因于拥有大学学位的女性吸烟更少。

示例9：带有多值处理的因果中介模型

到目前为止，我们只处理了二分类处理变量。然而，实验通常有超过两个处理组，或者观察性处理可能由多个类别组成，此时我们称处理为多值处理。

为了演示，我们回到幸福感数据，使用处理变量mexercise，该变量捕获三个处理组：对照组、个体锻炼45分钟的组和个体锻炼90分钟的组。这种设计允许研究者了解锻炼时长是否以及如何影响血清素水平，进而影响幸福感。这里我们对结果和中介变量都使用线性模型，并将多值处理变量mexercise作为我们的处理变量：

use https://www.stata-press.com/data/r19/wellbeing, clear 
mediate (wellbeing age gender i.hstatus basewell) (bonotonin basebono)  (mexercise)

Iteration 0:  EE criterion =  1.138e-26  
Iteration 1:  EE criterion =  8.739e-27  

Causal mediation analysis                                Number of obs = 2,000

Outcome model:     Linear
Mediator model:    Linear
Mediator variable: bonotonin
Treatment type:    Multivalued
------------------------------------------------------------------------------------------
                         |               Robust
               wellbeing | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------------------+----------------------------------------------------------------
NIE                      |
               mexercise |
(45 minutes vs Control)  |   5.128899   .3505171    14.63   0.000     4.441898    5.815899
(90 minutes vs Control)  |   9.780537   .2880877    33.95   0.000     9.215895    10.34518
-------------------------+----------------------------------------------------------------
NDE                      |
               mexercise |
(45 minutes vs Control)  |   1.197498   .1750038     6.84   0.000     .8544965    1.540499
(90 minutes vs Control)  |   3.051084   .2071236    14.73   0.000     2.645129    3.457039
-------------------------+----------------------------------------------------------------
TE                       |
               mexercise |
(45 minutes vs Control)  |   6.326396   .3894269    16.25   0.000     5.563134    7.089659
(90 minutes vs Control)  |   12.83162   .2967962    43.23   0.000     12.24991    13.41333
------------------------------------------------------------------------------------------
Note: Outcome equation includes treatment–mediator interaction.

由于我们将两个处理组与对照组进行比较，每个估计量现在有两个效应。从TE开始，如果人群中每个人都锻炼90分钟，幸福感预计会增加近13分。在这13分中，约10分归因于血清素水平的增加，3分归因于其他机制。45分钟处理组的结果虽然幅度预计较小，但解释方式类似。

示例10：带有连续处理的因果中介模型

除了二分类或多值处理，我们还可能有连续处理变量。对于连续处理，我们必须指定至少两个值，一个作为处理值，另一个作为对照值。我们回到出生体重数据，使用社会经济地位（ses）作为连续处理变量。以下是ses的一些描述性统计：

use https://www.stata-press.com/data/r19/birthweight 
summarize ses

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         ses |      2,000    7.804412    2.287496   1.304026   16.27844

可以看到，ses的范围从约1到16，均值约为8。然而，这些值本身说明不了太多，因为该变量是在任意尺度上测量的。因此，我们对其进行标准化，使生成的变量均值为0，标准差为1：

gen std_ses =(ses-r(mean))/r(sd)

我们将使用新变量std_ses作为处理变量。在定义处理的第三组括号中，我们使用continuous()选项。该选项告诉mediate将变量视为连续变量，并使用该选项中指定的值作为对照和处理点。第一个值是对照，剩余值是与对照比较的处理值。这里我们将均值下1个标准差指定为对照值，均值上1个标准差指定为处理值：

mediate (bweight sespar c.age##c.age, expmean)  (ncigs sespar c.age##c.age, poisson) (std_ses, continuous(-1 1)), nointeract

Iteration 0:  EE criterion =  1.470e-12  
Iteration 1:  EE criterion =  1.764e-17  

Causal mediation analysis                                Number of obs = 2,000

Outcome model:     Exponential mean
Mediator model:    Poisson
Mediator variable: ncigs
Treatment type:    Continuous

Continuous treatment levels:
  0: std_ses = -1 (control)
  1: std_ses =  1

------------------------------------------------------------------------------
             |               Robust
     bweight | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
NIE          |
     std_ses |
   (1 vs 0)  |   171.3015   14.68778    11.66   0.000      142.514     200.089
-------------+----------------------------------------------------------------
NDE          |
     std_ses |
   (1 vs 0)  |   170.0598   32.14841     5.29   0.000       107.05    233.0695
-------------+----------------------------------------------------------------
TE           |
     std_ses |
   (1 vs 0)  |   341.3613   31.73741    10.76   0.000     279.1571    403.5655
------------------------------------------------------------------------------
Note: Outcome equation does not include treatment–mediator interaction.

即使我们使用了连续处理变量，结果的解释与之前相同：如果人群中每个人的社会经济地位都比均值高1个标准差，新生儿的出生体重将比每个人的地位值比均值低1个标准差时高约341克。在这341克中，约一半归因于地位较高的女性吸烟更少，另一半归因于其他机制。

我们还可以在不止两个值上评估处理效应。这里我们使用标准化变量的均值（0）作为基准，并在-2、-1、1和2处评估处理效应：

mediate (bweight sespar c.age##c.age, expmean)(ncigs sespar c.age##c.age, poisson)  (std_ses, continuous(0 -2 -1 1 2)), nointeract

Iteration 0:  EE criterion =  1.470e-12  
Iteration 1:  EE criterion =  2.507e-17  

Causal mediation analysis                                Number of obs = 2,000

Outcome model:     Exponential mean
Mediator model:    Poisson
Mediator variable: ncigs
Treatment type:    Continuous

Continuous treatment levels:
  0: std_ses =  0 (control)
  1: std_ses = -2
  2: std_ses = -1
  3: std_ses =  1
  4: std_ses =  2

------------------------------------------------------------------------------
             |               Robust
     bweight | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
NIE          |
     std_ses |
   (1 vs 0)  |  -276.2757   27.69004    -9.98   0.000    -330.5471   -222.0042
   (2 vs 0)  |  -100.1155   9.170566   -10.92   0.000    -118.0894   -82.14148
   (3 vs 0)  |   65.84585   5.423096    12.14   0.000     55.21678    76.47493
   (4 vs 0)  |   110.1346   8.724232    12.62   0.000     93.03538    127.2337
-------------+----------------------------------------------------------------
NDE          |
     std_ses |
   (1 vs 0)  |  -170.9012   31.33649    -5.45   0.000    -232.3196   -109.4828
   (2 vs 0)  |  -86.56069   16.08129    -5.38   0.000    -118.0794   -55.04193
   (3 vs 0)  |   88.83929   16.94031     5.24   0.000      55.6369    122.0417
   (4 vs 0)  |   180.0172   34.77372     5.18   0.000     111.8619    248.1724
-------------+----------------------------------------------------------------
TE           |
     std_ses |
   (1 vs 0)  |  -447.1769   35.41401   -12.63   0.000    -516.5871   -377.7667
   (2 vs 0)  |  -186.6761   15.73291   -11.87   0.000    -217.5121   -155.8402
   (3 vs 0)  |   154.6851   16.31969     9.48   0.000     122.6991    186.6712
   (4 vs 0)  |   290.1517   33.85571     8.57   0.000     223.7958    356.5077
------------------------------------------------------------------------------
Note: Outcome equation does not include treatment–mediator interaction.

现在我们为每个处理效应获得四个效应估计值，这些估计值捕获了相对于对照点的结果期望差异。有了多个效应估计值，绘制结果会很方便。我们使用估计后命令estat effectsplot来实现：

示例11：估计控制直接效应

控制直接效应（CDE）与我们迄今为止处理的其他估计量不同。在这里，潜在结果的形式不是Y_i(t, M_i(t’)) ，而是Y_i(t\vert M_i = m)。也就是说，我们对每个处理水平都有在中介变量设定值下评估的潜在结果。因此，CDE仅使用结果方程的结果。假设处理是二分类的，中介变量值为m时的CDE为：
$\mathrm{CDE}(m) = Y_i(1 \vert M_i = m) - Y_i(0 \vert M_i = m)$

CDE可以使用估计后命令estat cde进行估计。

为了演示，我们首先使用幸福感数据拟合中介模型：

use https://www.stata-press.com/data/r19/wellbeing, clear 
mediate (bwellbeing age gender i.hstatus, probit) (bbonotonin, probit)  (exercise)

Iteration 0:  EE criterion =  4.326e-19  
Iteration 1:  EE criterion =  2.032e-32  

Causal mediation analysis                                Number of obs = 2,000

Outcome model:     Probit
Mediator model:    Probit
Mediator variable: bbonotonin
Treatment type:    Binary
----------------------------------------------------------------------------------------
                       |               Robust
            bwellbeing | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-----------------------+----------------------------------------------------------------
NIE                    |
              exercise |
(Exercise vs Control)  |   .0962832   .0288905     3.33   0.001     .0396588    .1529076
-----------------------+----------------------------------------------------------------
NDE                    |
              exercise |
(Exercise vs Control)  |   .1672304   .0358936     4.66   0.000     .0968803    .2375805
-----------------------+----------------------------------------------------------------
TE                     |
              exercise |
(Exercise vs Control)  |   .2635137   .0212346    12.41   0.000     .2218946    .3051327
----------------------------------------------------------------------------------------
Note: Outcome equation includes treatment–mediator interaction.

我们为结果和中介变量拟合了probit模型，但这里模型类型并不重要；我们可以在使用mediate拟合任何模型后使用estat cde。

如果人群中每个人的中介变量值为0（血清素水平没有改善），TE是多少？为了找出答案，我们通过在estat cde中指定mvalue(0)选项来估计CDE：

estat cde, mvalue(0)

Controlled direct effect
Number of obs = 2,000
Mediator variable: bbonotonin
Mediator value = 0

            CDE      Delta-method std.err.    Z    P>|z|     [95%conf．interval]
exercise (Exercise vs Control)    0.1605355    0.039731    4.04   0.000     0.0826641    0.2384068

该CDE约为0.16。假设人群中没有人的血清素水平至少增加10%，则每个人都锻炼时幸福感增加的概率比没有人锻炼时高0.16。

我们可以指定中介变量的多个值来执行相同的分析。这里我们希望同时估计CDE(0)和CDE(1)：

estat cde, mvalue(0 1)

Controlled direct effect                                 Number of obs = 2,000

Mediator variable: bbonotonin
Mediator values:
  1._at: bbonotonin = 0
  2._at: bbonotonin = 1

------------------------------------------------------------------------------------------
                         |            Delta-method
                         |        CDE   std. err.      z    P>|z|     [95% conf. interval]
-------------------------+----------------------------------------------------------------
            exercise@_at |
(Exercise vs Control) 1  |   .1605355    .039731     4.04   0.000     .0826641    .2384068
(Exercise vs Control) 2  |   .2224479   .0493025     4.51   0.000     .1258166    .3190791
------------------------------------------------------------------------------------------

如果我们“开启”中介变量，CDE会高出约0.06分。我们也可以使用contrast选项直接估计这一差异：

estat cde, mvalue(0 1) contrast


Controlled direct effect                                 Number of obs = 2,000

Mediator variable: bbonotonin
Mediator values:
  1._at: bbonotonin = 0
  2._at: bbonotonin = 1

-------------------------------------------------------------------------------------------------
                                |            Delta-method
                                |        CDE   std. err.      z    P>|z|     [95% conf. interval]
--------------------------------+----------------------------------------------------------------
                   _at#exercise |
(2 vs 1) (Exercise vs Control)  |   .0619124   .0630241     0.98   0.326    -.0616126    .1854373
-------------------------------------------------------------------------------------------------

示例12：在不同尺度上估计处理效应

mediate命令在结果变量的自然尺度上估计处理效应。然而，如果结果变量是二分类的，一些研究者可能希望在优势比（odds-ratio）或风险比（risk-ratio）尺度上呈现估计效应；如果结果变量是计数变量，则可能希望在发生率比（incidence-rate–ratio）尺度上呈现。估计后命令estat rr、estat or和estat irr将估计的处理效应转换到这些不同尺度。

为了了解其工作原理，我们首先使用二分类结果变量拟合以下模型。我们可以对结果变量使用probit或logit模型；这里我们使用probit：

mediate (bwellbeing age gender i.hstatus, probit) (bbonotonin, probit)  (exercise),all

Iteration 0:  EE criterion =  4.326e-19  
Iteration 1:  EE criterion =  2.262e-32  

Causal mediation analysis                                Number of obs = 2,000

Outcome model:     Probit
Mediator model:    Probit
Mediator variable: bbonotonin
Treatment type:    Binary
----------------------------------------------------------------------------------------
                       |               Robust
            bwellbeing | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-----------------------+----------------------------------------------------------------
POmeans                |
                  Y0M0 |   .3014153   .0146737    20.54   0.000     .2726554    .3301752
                  Y1M0 |   .4686457   .0328309    14.27   0.000     .4042984    .5329931
                  Y0M1 |   .3536121   .0381972     9.26   0.000     .2787469    .4284773
                  Y1M1 |   .5649289   .0154435    36.58   0.000     .5346603    .5951976
-----------------------+----------------------------------------------------------------
NIE                    |
              exercise |
(Exercise vs Control)  |   .0962832   .0288905     3.33   0.001     .0396588    .1529076
-----------------------+----------------------------------------------------------------
NDE                    |
              exercise |
(Exercise vs Control)  |   .1672304   .0358936     4.66   0.000     .0968803    .2375805
-----------------------+----------------------------------------------------------------
PNIE                   |
              exercise |
(Exercise vs Control)  |   .0521968   .0348642     1.50   0.134    -.0161357    .1205293
-----------------------+----------------------------------------------------------------
TNDE                   |
              exercise |
(Exercise vs Control)  |   .2113169    .041136     5.14   0.000     .1306917     .291942
-----------------------+----------------------------------------------------------------
TE                     |
              exercise |
(Exercise vs Control)  |   .2635137   .0212346    12.41   0.000     .2218946    .3051327
----------------------------------------------------------------------------------------
Note: Outcome equation includes treatment–mediator interaction.

由于我们的结果变量是二分类的，且结果模型是probit，潜在结果均值是平均概率，并且由于处理效应是潜在结果均值之间的差异，估计的效应可以解释为风险差异。例如，自然间接效应是潜在结果均值Y1M1和Y1M0之间的差异：

display _b[POmeans:Y1M1]-_b[POmeans:Y1M0]
0.09628322

如果我们不想在风险差异尺度上解释效应，而是想在风险比尺度上解释，我们可以简单地计算潜在结果均值的比率：

display _b[POmeans:Y1M1]/_b[POmeans:Y1M0]
1.2054499

这就是estat rr所做的：

estat rr

Transformed treatment effects                            Number of obs = 2,000
----------------------------------------------------------------------------------------
                       |               Robust
            bwellbeing | Risk ratio   std. err.      z    P>|z|     [95% conf. interval]
-----------------------+----------------------------------------------------------------
NIE                    |
              exercise |
(Exercise vs Control)  |    1.20545   .0746555     3.02   0.003      1.06766    1.361023
-----------------------+----------------------------------------------------------------
NDE                    |
              exercise |
(Exercise vs Control)  |   1.554817    .132328     5.19   0.000     1.315937    1.837062
-----------------------+----------------------------------------------------------------
PNIE                   |
              exercise |
(Exercise vs Control)  |   1.173172   .1157431     1.62   0.105      .966905    1.423442
-----------------------+----------------------------------------------------------------
TNDE                   |
              exercise |
(Exercise vs Control)  |   1.597595   .1778207     4.21   0.000     1.284469    1.987055
-----------------------+----------------------------------------------------------------
TE                     |
              exercise |
(Exercise vs Control)  |   1.874254   .1043578    11.28   0.000     1.680482     2.09037
----------------------------------------------------------------------------------------

总处理效应现在被分解为乘法分量NIE和NDE以及PNIE和TNDE。即，取它们的乘积而不是和，将得到总效应。

类似地，我们可以使用estat or在优势比尺度上表达所有效应：

estat or

Transformed treatment effects                            Number of obs = 2,000
----------------------------------------------------------------------------------------
                       |               Robust
            bwellbeing | Odds ratio   std. err.      z    P>|z|     [95% conf. interval]
-----------------------+----------------------------------------------------------------
NIE                    |
              exercise |
(Exercise vs Control)  |   1.472222   .1708025     3.33   0.001     1.172788    1.848106
-----------------------+----------------------------------------------------------------
NDE                    |
              exercise |
(Exercise vs Control)  |   2.044157   .3042049     4.80   0.000     1.527008    2.736449
-----------------------+----------------------------------------------------------------
PNIE                   |
              exercise |
(Exercise vs Control)  |   1.267908   .1933291     1.56   0.120     .9403671    1.709534
-----------------------+----------------------------------------------------------------
TNDE                   |
              exercise |
(Exercise vs Control)  |   2.373558   .4231302     4.85   0.000     1.673622    3.366217
-----------------------+----------------------------------------------------------------
TE                     |
              exercise |
(Exercise vs Control)  |   3.009452    .281479    11.78   0.000     2.505377    3.614945
----------------------------------------------------------------------------------------

在这里，优势比尺度上的总平均处理效应为3，分别由NIE和NDE的优势比1.47和2.04，以及PNIE和TNDE的优势比1.27和2.37组成。通常，在风险差异尺度上解释处理效应更直观，但在某些应用中，将其转换为风险比或优势比尺度可能是可取的。

请注意，estat rr、estat or和estat irr需要使用mediate估计潜在结果均值。如果拟合的模型不包含潜在结果均值估计，这些estat命令将重新拟合模型。重新估计不会影响结果，但计算时间会更长。

存储结果

mediate在e()中存储以下内容：

参考文献

Baron, R. M., and D. A. Kenny. 1986. The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology 51: 1173–1182.
Holland, P. W. 1986. Statistics and causal inference. Journal of the American Statistical Association 81: 945–960.
Imai, K., L. Keele, and D. Tingley. 2010. A general approach to causal mediation analysis. Psychological Methods 15: 309–334.
Imbens, G. W. 2004. Nonparametric estimation of average treatment effects under exogeneity: A review. Review of Economics and Statistics 86: 4–29.
Nguyen, T. Q., I. Schmid, E. L. Ogburn, and E. A. Stuart. 2022. Clarifying causal mediation analysis: Effect identification via three assumptions and five potential outcomes. Journal of Causal Inference 10: 246–279.
Nguyen, T. Q., I. Schmid, and E. A. Stuart. 2021. Clarifying causal mediation analysis for the applied researcher: Defining effects based on what we want to learn. Psychological Methods 26: 255–271.
Pearl, J., and D. MacKenzie. 2018. The Book of Why: The New Science of Cause and Effect. New York: Basic Books.
Robins, J. M., and S. Greenland. 1992. Identifiability and exchangeability for direct and indirect effects. Epidemiology 3: 143–155.
VanderWeele, T. J. 2015. Explanation in Causal Inference: Methods for Mediation and Interaction. New York: Oxford University Press.

在这里插入图片描述