# Mediation analysis

- My research question is about mediation. Which analysis should I use?
- What is PROCESS and how do I get started?
- How do I perform a simple mediation analysis via PROCESS?
- Which options should I choose in Process?
- How do I interpret output of a simple mediation analysis with PROCESS?
- How do I report the results from my mediation analysis?
- I want to do more than simple mediation. How do I do that?
- Process doesn’t work! What did I do wrong?
- Where can I find more information on PROCESS?
- How do I do mediation analysis via MEMORE (within-subject design)?
- What are the assumptions for the mediation analyses done in PROCESS?
- How do I perform a power analysis for a mediation analysis?

## 1. My research question is about mediation. Which analysis should I use?

If your research question is about mediation, you want to investigate whether the effect of an independent variable X on a dependent variable Y runs via a mediator M.

Most likely, your research question is about X, M and Y variables that are all measured *once*. In this case, you can do mediation analysis using PROCESS.

Do you have a within-subject design in which M and Y are measured multiple times for different levels of X? Then you can do mediation analysis using MEMORE.

Please note! In the past, the so-called Baron and Kenny approach and the Sobel test were often used to investigate research questions about mediation. These techniques are outdated and do not answer questions about mediation in a correct way. Therefore, you should not use these approaches. Instead, study mediation questions using PROCESS (or in some cases MEMORE).

## 2. What is PROCESS and how do I get started?

PROCESS is a macro developed by Andrew Hayes and can be used in SPSS. You can download it via this website. By installing a custom dialog file, you add the option PROCESS in the pull-down menu under Analyze → Regression (when you download the PROCESS material, you will find a document that describes how to install custom dialog files).

Before you use PROCESS, you should read the chapter on Mediation in Hayes, A.F. *Introduction to Mediation, Moderation, and Conditional Process Analysis: A Regression-Based Approach*. The Guilford Press.

The third edition of the book is available as **e-book** through RUQuest (you might have to login first with your RU-account to get access):

The first edition is also available as print version in the University library.

Another option: An older paper by Preacher & Hayes (2004) provides a short overview of mediation and gives an explanation about why regular regression analyses alone are not sufficient to analyze mediation and why you should use the bootstrap method of mediation analysis (which is what PROCESS does). Subsequently, a precursor of PROCESS is introduced in the paper.

Preacher, K.J. & Hayes, A.F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. *Behavior Research Methods, Instruments, & Computers, 36*(4), 717–731.

Please note! If you are going to use PROCESS, you need to read the relevant chapter(s) in the book referred to above and cite the book in your research proposal/paper.

## 3. How do I perform a simple mediation analysis via PROCESS?

PROCESS uses a regression-based approach in addition to bootstrapping.

If the dependent variable Y is continuous, PROCESS uses regression analyses to estimate the different paths in the model. If the dependent variable Y is dichotomous, PROCESS uses logistic regression for all paths involving Y and regression analyses for the other paths.

Your mediator variable should be a continuous variable. Your independent variable X can be either continuous or categorical (if X has more than 2 levels, use the *Multicategorical* button in PROCESS).

In the simplest model of mediation, there is one independent variable X, one mediator M and one dependent variable Y. In PROCESS you have to fill in X, M and Y in the correct boxes and choose *Model number 4*. Under options, tick the box next to *Show total effect model*.

The total effect model involves the relationship between the independent variable X and the dependent variable Y *without* a mediator M in the model. The effect of X on Y in this model is called the total effect and is indicated as the *c* path.

The mediation model involves the relationship between the independent variable X and the dependent variable Y with mediator M in the model. As you can see in the picture below, this model contains 3 different paths.

The ***c’* path** is the **direct effect** of independent variable X on dependent variable Y, controlled for the mediator M (so both X and M are predictors in the model). The ***a* path** is the effect of independent variable X on mediator M. The ***b* path** is the effect of mediator M on dependent variable Y (with both M and X included as predictors in the model).

The combination of path *a* and path *b* is called the **indirect effect**: the effect of independent variable X on dependent variable Y via mediator M. The indirect effect (***ab***) is the product of the *a* path times the *b* path.
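To make the paths concrete, here is a minimal Python sketch that estimates *a*, *b*, *c’* and *c* with ordinary least squares on simulated data. It is not PROCESS itself: the function name `mediation_paths`, the simulated effect sizes and the sample size are assumptions chosen only for illustration.

```python
import random
from statistics import mean

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / (len(u) - 1)

def mediation_paths(x, m, y):
    """Return OLS estimates of the a, b, c' and c paths of a simple mediation model."""
    sxx, smm = cov(x, x), cov(m, m)
    sxm, sxy, smy = cov(x, m), cov(x, y), cov(m, y)
    a = sxm / sxx                             # M regressed on X
    c = sxy / sxx                             # Y regressed on X only (total effect)
    det = sxx * smm - sxm ** 2
    c_prime = (smm * sxy - sxm * smy) / det   # X in the model Y ~ X + M (direct effect)
    b = (sxx * smy - sxm * sxy) / det         # M in the model Y ~ X + M
    return {"a": a, "b": b, "c_prime": c_prime, "c": c}

# Simulated data with made-up population values a = 0.5, b = 0.4, c' = 0.3
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)]
m = [0.5 * xi + rng.gauss(0, 1) for xi in x]
y = [0.3 * xi + 0.4 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

paths = mediation_paths(x, m, y)
indirect = paths["a"] * paths["b"]            # indirect effect ab
print(paths, "indirect:", indirect)
```

The estimates will differ from the simulated population values because of sampling error; the point of the sketch is only to show which regression produces which path.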

For a research question (or hypothesis) about mediation, the indirect effect is the most relevant outcome of the analysis to report.

## 4. Which options should I choose in Process?

There is a button called ‘Options’ which shows a couple of options that might be handy. Which ones are recommended depends on the type of analysis (i.e. mediation models or moderation models).

For mediation analysis:

- ‘Show total effect model’: Check this box to get the result for the total effect in the output (the *c* path; see question 3 for an explanation).
- ‘Standardized effects’: If you check this box, you get the standardized effects below the unstandardized effects (otherwise, all output is unstandardized). Standardized effects can be useful to get an idea of the relative contribution of each variable. However, be careful about the meaning of such comparisons between standardized coefficients: there are pros and cons of reporting standardized versus unstandardized effects. See, for example, Hayes’ discussion on pp. 541-542 of his book and the literature he refers to there.

## 5. How do I interpret output of a simple mediation analysis with PROCESS?

The output of PROCESS starts with information about the version of PROCESS you are using, followed by a summary of the model you are testing including sample size data.

Next, PROCESS gives output of several (logistic) regression analyses, which are discussed below in the order in which they are presented in the output.

In the first analysis, the effect of the independent variable X on the mediator M is tested (the heading *OUTCOME VARIABLE* contains the name of your mediator). Under *coeff*, you find a regression weight for your independent variable X. This is the value of coefficient *a* in your mediation model. You can use the p-value to determine whether the ***a* path** is significantly different from 0.

In the second analysis (the one where *OUTCOME VARIABLE* contains the name of your dependent variable), two effects are given.

The direct effect of independent variable X on dependent variable Y is given by the regression weight of your independent variable (under *coeff*). This is the *c’* coefficient and you can use the p-value to determine whether the ***c’* path** is significantly different from 0.

In the row with the mediator (under *coeff*), you can find the regression weight of the effect of mediator M on dependent variable Y. This is the *b* coefficient and you can use the p-value to determine whether the ***b* path** is significantly different from 0.

Next, below *TOTAL EFFECT MODEL*, you can find the analysis for the effect of independent variable X on dependent variable Y *without* the mediator in the model (notice that *OUTCOME VARIABLE* contains the name of your dependent variable again).

Under *coeff*, you can find the regression weight for the effect of independent variable X on dependent variable Y without the mediator in the model. This is the value of coefficient *c* in your total effect model. You can use the p-value to determine whether the ***c* path** is significantly different from 0.

At the end of your output you can find *TOTAL, DIRECT AND INDIRECT EFFECTS OF X ON Y*.

You first get a repetition of the total effect of X on Y and the direct effect of X on Y (note that these values are the same as their respective paths in the output above).

Under *Indirect effect(s) of X on Y*, you can find the **indirect effect**. In the row with the name of the mediator, the value under *effect* gives you the indirect effect (so ***ab***). Whether this indirect effect is different from 0 is not tested with regression analysis (and therefore you do not get a p-value). Instead, bootstrapping is used to test whether the indirect effect can be considered different from 0. This bootstrapping procedure gives you a BootSE (standard error) and a lower and an upper bound of the bootstrap confidence interval (the confidence level PROCESS uses is noted at the bottom of the output; the default is a 95% confidence interval).

- If the confidence interval of the indirect effect includes 0, then you cannot conclude that the indirect effect is different from 0.
- If the confidence interval of the indirect effect does not contain 0 (so is entirely above 0 or entirely below 0), the indirect effect is most likely different from 0.
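The idea behind a bootstrap confidence interval can be sketched in a few lines of Python. This is a simple percentile bootstrap under stated assumptions, not the PROCESS implementation: the function names, the number of resamples (`n_boot=1000`) and the simulated data are all made up for illustration, and PROCESS may differ in its details.

```python
import random
from statistics import mean

def indirect_effect(x, m, y):
    """Return a*b, with a from M ~ X and b from Y ~ X + M (ordinary least squares)."""
    mx, mm, my = mean(x), mean(m), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b

def percentile_bootstrap_ci(x, m, y, n_boot=1000, level=0.95, seed=0):
    """Resample cases with replacement and take percentiles of the ab estimates."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        rows = [rng.randrange(n) for _ in range(n)]   # one bootstrap sample of cases
        estimates.append(indirect_effect([x[i] for i in rows],
                                         [m[i] for i in rows],
                                         [y[i] for i in rows]))
    estimates.sort()
    tail = (1 - level) / 2
    lower = estimates[int(tail * n_boot)]
    upper = estimates[int((1 - tail) * n_boot) - 1]
    return lower, upper

# Simulated data with made-up population values a = 0.6, b = 0.5, c' = 0.2
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(300)]
m = [0.6 * xi + rng.gauss(0, 1) for xi in x]
y = [0.2 * xi + 0.5 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

lower, upper = percentile_bootstrap_ci(x, m, y)
print("95% bootstrap CI for ab:", lower, upper)
```

If 0 lies outside the interval from `lower` to `upper`, you would conclude that the indirect effect differs from 0, following the two rules in the list above.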

Fun fact! If ordinary regression analyses are used (so no logistic regression), the total effect *c* is the sum of the indirect effect and the direct effect (*c* = *ab* + *c’*, and therefore *ab* = *c* − *c’*), and the indirect effect *ab* is the product of the *a* coefficient and the *b* coefficient (*ab* = *a* × *b*). You can check this by hand with the values in your output.
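Why the total effect splits up this way can be seen by writing out the two regression equations of the mediation model and substituting the first into the second. The symbols for the intercepts ($i_M$, $i_Y$) and the residuals ($e_M$, $e_Y$) are notation assumed here for the sketch; they are not taken from the PROCESS output.

```latex
\begin{align}
M &= i_M + aX + e_M \\
Y &= i_Y + c'X + bM + e_Y \\
  &= i_Y + c'X + b\,(i_M + aX + e_M) + e_Y \\
  &= (i_Y + b\,i_M) + (c' + ab)\,X + (b\,e_M + e_Y)
\end{align}
```

The coefficient of X in the last line is $c' + ab$, which is the total effect $c$. For ordinary least squares estimates computed on the same cases, the identity $c = c' + ab$ holds exactly in the sample, not just approximately.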

## 6. How do I report the results from my mediation analysis?

Although there are multiple ways to do this, we believe it is always a good idea to put a figure of your mediation model in your results section. This figure can be similar to the second figure in question 3; instead of X, Y and Mediator, you could put the names of your variables. In the figure, you can add the values of the different paths (a, b, c, c’) with their significance at the corresponding arrows. In addition, you could visualize the value for the indirect effect within the figure including the bootstrap confidence interval. Sometimes it is also useful to include the total effect model in your figure.

Next, in your text, refer to the figure and discuss the total effect, the direct effect and the indirect effect in more detail. Don’t forget to also describe in plain language what you have found; b-values and p-values alone are not enough.

For more elaborate guidance and tips for reporting the analysis, see chapter (14.2) on this topic in the book by Hayes.

One really important remark:

In the conceptual model of mediation, arrows are drawn for the different relationships indicating causal relationships. That is, with a mediation hypothesis, you expect that X has an influence on M, and M has an influence on Y (so indirectly X has an influence on Y via M).

However, the analyses done with PROCESS (and for that matter any common statistical analysis) *only tell you something about associations* (relations) between variables. They do *not* tell you anything about the causality of those associations.

So, an indirect effect is really just an indirect effect and you should report it like that in your results section. Only in your discussion, you might want to make claims about influence or causality and mediation (in practice this distinction is often not made so clearly, but it is a really important one). As you probably remember from previous courses, you can only be sure about causal relations if you have conducted a true experiment (but please note, even if you have manipulated X, you did not manipulate M, so things are complicated). For more details on this, please read the elaborate paragraphs on this topic (1.4, 2.3) in the book by Hayes.

## 7. I want to do more than simple mediation. How do I do that?

PROCESS offers a lot of extensions beyond simple mediation.

For example, you can also choose models with which you can study multiple mediations, parallel or serial, within the same model (for example model 6). In addition, there are models which include mediation and moderation at the same time (see the appendix of Hayes’ book). Please find an overview of the model templates in the book.

MEMORE also offers options to study moderation in the context of within-subject designs and options to study serial mediation.

## 8. Process doesn’t work! What did I do wrong?

The most commonly made mistakes are these two:

- You selected the wrong model number within PROCESS, so there is a mismatch between the selected model and the selected variables. **Model 4** is for simple *mediation*, and PROCESS then expects a variable in the field for mediator(s) M. **Model 1** is for simple *moderation* (which is conceptually something completely different from mediation!), and PROCESS then expects a variable in the field for moderator variable W (and nothing in the field for mediator(s) M).
- Your variable names are too long. PROCESS can only handle variables that are 8 characters or shorter in length. Shortening your variable names beforehand (but keeping them meaningful) will solve this. In some versions of PROCESS, it is also possible in the PROCESS dialogue window to click the button ‘Long variable names’ and check the box. PROCESS then shortens the variable names in the analysis to the first 8 characters. But only do this if you are 100% certain that there are no other variables in your data set that have the same first 8 characters, or else PROCESS may accidentally select one of these variables for your analysis instead.
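If you are unsure whether shortened names will collide, you can check a list of variable names before running the analysis. The sketch below is a hypothetical helper, not part of PROCESS or SPSS: the function name `find_truncation_clashes` and the example variable names are made up, and it assumes that names are compared without regard to upper or lower case.

```python
from collections import defaultdict

def find_truncation_clashes(names, max_len=8):
    """Group variable names that share the same first `max_len` characters."""
    groups = defaultdict(list)
    for name in names:
        # Assumption: names are compared case-insensitively
        groups[name[:max_len].lower()].append(name)
    # Keep only prefixes shared by more than one variable
    return {prefix: full for prefix, full in groups.items() if len(full) > 1}

# Hypothetical variable names from a data set
names = ["wellbeing_pre", "wellbeing_post", "stress", "age"]
clashes = find_truncation_clashes(names)
print(clashes)  # {'wellbein': ['wellbeing_pre', 'wellbeing_post']}
```

Any name that appears in the result should be renamed before you rely on the ‘Long variable names’ option.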

## 9. Where can I find more information on PROCESS?

The best way to start is to read the relevant chapters of Hayes’ book on PROCESS:

Hayes, A.F. *Introduction to Mediation, Moderation, and Conditional Process Analysis: A Regression-Based Approach*. The Guilford Press. RUQuest-link to e-book (third edition)

In addition, there is a FAQ about PROCESS by Professor Andrew Hayes. Here you can find explanations for questions about topics like

- Why the total effect does not need to be significant to study mediation.
- Why the terms partial and full mediation are outdated and should not be used.
- Why path a and path b do not need to be significant to study mediation.

## 10. How do I do mediation analysis via MEMORE (within-subject design)?

MEMORE is a macro developed by Amanda Montoya and Andrew Hayes and can be used in SPSS. You can download it via https://www.akmontoya.com/spss-and-sas-macros. By installing a custom dialog file, you get the option MEMORE in the pull-down menu under Analyze → Regression. MEMORE is a variant on PROCESS for designs in which the continuous mediator M and the continuous dependent variable Y are measured multiple times for different conditions of X. Before you use MEMORE, it helps to first get a general idea of mediation for a simple design *without* repeated measures, and to read the following paper by the developers of the MEMORE macro:

Montoya, A. K., & Hayes, A. F. (2017). Two condition within-participant statistical mediation analysis: A path-analytic framework. *Psychological Methods, 22*(1), 6–27. http://dx.doi.org/10.1037/met0000086

This paper is included in the folder with MEMORE material that you can download. For the use of MEMORE within SPSS, you can have a look at the pdf file called *MEMORE SPSS documentation V2.1.pdf* (also in the folder with MEMORE material).

For more information about MEMORE, please see this site

Do you use MEMORE for your report or thesis? Do not forget to refer to the paper by Montoya and Hayes.

## 11. What are the assumptions for the mediation analyses done in PROCESS?

With respect to the regression analyses done by PROCESS, the assumptions of regression hold. Unfortunately, unlike regression via the *Regression* menu in SPSS, PROCESS does not include options to make histograms of the residuals or scatterplots of the residuals and predicted values.

A mediation model consists of a combination of several regression models (connected to the different paths of the mediation model). To test the assumptions, you can run separate regression analyses for the different paths via the Regression menu in SPSS and use the options to check the residuals. In order to test all paths, you need to run the following models in Regression and test the assumptions for each:

1. X as independent variable, M as dependent variable (analysis for path *a*).
2. X and M as independent variables, Y as dependent variable (analysis for paths *b* and *c’*, direct effect).
3. X as independent variable, Y as dependent variable (analysis for path *c*, total effect).

If you have covariates, enter these as independent variables as well in these analyses.

The only effect that remains is the indirect effect. This is a combination of paths *a* and *b*, which are found in two analyses (#1 and #2 above). Its assumptions cannot be tested directly. However, since bootstrapping is used in the PROCESS mediation analysis to determine whether the indirect effect is different from 0, there are no further assumptions that need to be tested (see the book by Hayes for more details about this).

## 12. How do I perform a power analysis for a mediation analysis?

Power analysis for the whole mediation effect (that is, the indirect effect) is a bit difficult because the indirect effect is a combination of two regression weights.

A good power analysis for a mediation analysis using PROCESS can therefore not be done via G*Power. Instead, a tool was made in R using Monte Carlo simulation.

You can find the tool here

In the article by Schoemann, you can find more information about this tool.

In the tool, you have to indicate the expected effect sizes of the relations between your X, M and Y variables. The default input method is correlations, but you can also choose standardized coefficients. Just as with any power analysis, you have to estimate the size of the effects (preferably based on literature and previous research).
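To give a feel for what a Monte Carlo power analysis does, here is a small Python sketch. It is not the code of the tool mentioned above: the function names, the number of replications and draws, and the effect sizes in the example are assumptions for illustration. It simulates many data sets with the standardized coefficients you expect, tests the indirect effect in each one with a Monte Carlo confidence interval, and reports the proportion of data sets in which that interval excludes 0.

```python
import math
import random
from statistics import mean

def fit_a_and_b(x, m, y):
    """Return (a, se_a, b, se_b) from OLS fits of M ~ X and Y ~ X + M."""
    n = len(x)
    mx, mm, my = mean(x), mean(m), mean(y)
    sxx = sum((v - mx) ** 2 for v in x)
    smm = sum((v - mm) ** 2 for v in m)
    sxm = sum((u - mx) * (v - mm) for u, v in zip(x, m))
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    smy = sum((u - mm) * (v - my) for u, v in zip(m, y))
    det = sxx * smm - sxm ** 2
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / det
    c_prime = (smm * sxy - sxm * smy) / det
    # Residual sums of squares of the two regressions
    rss_m = sum(((mi - mm) - a * (xi - mx)) ** 2 for xi, mi in zip(x, m))
    rss_y = sum(((yi - my) - c_prime * (xi - mx) - b * (mi - mm)) ** 2
                for xi, mi, yi in zip(x, m, y))
    se_a = math.sqrt(rss_m / (n - 2) / sxx)
    se_b = math.sqrt(rss_y / (n - 3) * sxx / det)
    return a, se_a, b, se_b

def ci_excludes_zero(a, se_a, b, se_b, rng, draws=1000, level=0.95):
    """Monte Carlo CI for ab: draw a and b from normal distributions, multiply."""
    products = sorted(rng.gauss(a, se_a) * rng.gauss(b, se_b) for _ in range(draws))
    tail = (1 - level) / 2
    lower = products[int(tail * draws)]
    upper = products[int((1 - tail) * draws) - 1]
    return lower > 0 or upper < 0

def estimate_power(n, a, b, c_prime, reps=200, seed=0):
    """Proportion of simulated samples of size n in which ab is detected."""
    rng = random.Random(seed)
    # Residual SDs chosen so that X, M and Y all have variance 1 (standardized)
    sd_m = math.sqrt(1 - a ** 2)
    sd_y = math.sqrt(1 - (c_prime ** 2 + b ** 2 + 2 * a * b * c_prime))
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        m = [a * xi + rng.gauss(0, sd_m) for xi in x]
        y = [c_prime * xi + b * mi + rng.gauss(0, sd_y) for xi, mi in zip(x, m)]
        hits += ci_excludes_zero(*fit_a_and_b(x, m, y), rng)
    return hits / reps

# Made-up expected effects: a = 0.3, b = 0.3, c' = 0.1
power_small = estimate_power(n=50, a=0.3, b=0.3, c_prime=0.1)
power_large = estimate_power(n=200, a=0.3, b=0.3, c_prime=0.1)
print("power at n=50:", power_small, "power at n=200:", power_large)
```

With only 200 replications the estimates are rough; for a real study you would use many more replications and, preferably, the dedicated tool.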

If the above is not suitable for you (for example because this topic is not discussed during your study programme or because your model is not in the tool), you can fall back on a less than ideal but feasible option.

In that case, take the most complex regression analysis that PROCESS runs (the one with X, M, Y and any control variables included in your model) and use G*Power to do a power analysis for that (logistic) regression analysis. *Please realize that this is not a power analysis for your whole mediation analysis; it only approximates the power analysis you actually wanted to do*.

For further reading on power analysis for mediation, check the paragraph (14.3) on this topic in the book by Hayes.

## Still can't figure it out?

Then you can contact the Statistical Methodological Advice Center (SMAP).