When size matters: advantages of weighted effect coding in observational studies

If your regression model contains a categorical predictor, the significance of the categories is commonly tested against a preselected reference category. If all categories have (roughly) the same number of observations, you can also test all categories against the grand mean using effect (ANOVA) coding. In observational studies, however, the number of observations typically varies per category. In IJPH's 'Hints & Kinks' section we show how, in such cases, all categories can be tested against the sample mean [link to open access paper: http://rdcu.be/l6c0].
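The difference between the two coding schemes can be seen in a small hand-built example. The sketch below is a minimal illustration in Python with NumPy, not the R, Stata or SPSS code offered on this page; the function names `effect_code` and `fit_ols` and the toy data are hypothetical. It assumes the usual weighted effect coding rule: members of category j score 1 on column j, members of the reference category score -n_j/n_ref on column j (plain effect coding uses -1 instead), and everyone else scores 0.

```python
import numpy as np


def effect_code(groups, reference, weighted=True):
    """Return (levels, X) with one column per non-reference category.

    Members of category j get 1 in column j. Members of the reference
    category get -n_j / n_ref (weighted effect coding) or -1 (plain
    effect coding) in every column. All other entries are 0.
    """
    groups = np.asarray(groups)
    levels = [g for g in np.unique(groups) if g != reference]
    is_ref = groups == reference
    n_ref = is_ref.sum()
    X = np.zeros((groups.size, len(levels)))
    for k, level in enumerate(levels):
        n_j = (groups == level).sum()
        X[groups == level, k] = 1.0
        X[is_ref, k] = -n_j / n_ref if weighted else -1.0
    return levels, X


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ...]."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Hypothetical toy data with unequal category sizes (3, 2 and 1 cases).
groups = ["a", "a", "a", "b", "b", "c"]
y = np.array([1.0, 2.0, 3.0, 5.0, 7.0, 10.0])

levels, X_wec = effect_code(groups, reference="a", weighted=True)
_, X_ec = effect_code(groups, reference="a", weighted=False)

b_wec = fit_ols(X_wec, y)  # intercept = sample mean
b_ec = fit_ols(X_ec, y)    # intercept = unweighted mean of category means

print("sample mean:", y.mean())
print("weighted effect coding:", b_wec)
print("plain effect coding:", b_ec)
```

With these data the category means are 2, 6 and 10. Weighted effect coding gives an intercept equal to the sample mean (28/6 ≈ 4.67), and each coefficient is the deviation of that category's mean from the sample mean. Plain effect coding gives an intercept of 6, the unweighted mean of the three category means, which no longer matches the sample when category sizes differ. The reference category's weighted effect can be recovered as -(n_b·b_b + n_c·b_c)/n_a.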

In the follow-up Hints & Kinks paper [link to open access paper: http://rdcu.be/l6fo] we extended the procedure to regression models with interactions. Weighted effect coded interactions display the extra effect on top of the main effect that is found in a model without interaction effects. This offers a promising new route to estimating interaction effects in observational data, where category sizes often differ.

Below you will find downloads for R and Stata (both user-friendly macros) and SPSS (less user-friendly, but best suited for illustrative purposes). We have also included a coding scheme for weighted effect coded interactions in Excel.

Correspondence: Manfred te Grotenhuis

UPDATE 20.10.17: The wec package is described in The R Journal

UPDATE 18.12.16: Two short videos about the two IJPH papers

UPDATE 16.12.16: The wec package has been updated and now includes the interaction between a weighted effect coded variable and a scale variable, with sum of squares as weights.

UPDATE 10.11.16: Watch a video of the presentation by Manfred te Grotenhuis on weighted effect coding, held at the Swedish Institute for Social Research (SOFI), Stockholm University, November 4, 2016.


R

The R package for weighted effect coding is called ‘wec’ and is available on CRAN (link: https://CRAN.R-project.org/package=wec). It can be installed from within R by typing: install.packages("wec"). Please make sure you run a recent version of R (3.3.2 or higher).
To reproduce the examples from the articles, follow the examples in the package documentation, or use the following scripts and data:
- Script with all functions

- Script for dummy, effect, and weighted effect coded variables

- Script for dummy, effect, and weighted effect coded INTERACTION variables

- BMI data for R


Stata

The Stata package for weighted effect coded interactions (and some other coding schemes) is called ‘igenerate’ and is available from the SSC archive. It can be installed from within Stata by typing: ssc install igenerate

The package can also be downloaded here

Detailed documentation of the Stata package, including all examples used in the article, can be downloaded here.

The data used in the article can be found here (Stata data set, version 9 and higher)


SPSS

All syntax files assume an existing folder called 'temp' on your C: drive (C:\temp).

- Syntax for dummy, effect, and weighted effect coded variables

- Syntax for dummy, effect, and weighted effect coded INTERACTION variables

- More on weighted effect coded dummy variables in regression models

- Data


Explaining weighted effect coded interactions in Excel