When size matters: advantages of weighted effect coding in observational studies
If your regression model contains a categorical predictor, then the significance of the categories is commonly tested against a preselected reference category. If all categories have (roughly) the same number of observations, you can also test all categories against the grand mean using effect (ANOVA) coding. In observational studies, however, the number of observations typically varies per category. In IJPH's 'Hints & Kinks' section we show how in such cases all categories may be tested against the sample mean [link to open access paper: http://rdcu.be/l6c0].
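To make the idea concrete, here is a minimal sketch in plain Python; it is not the code from the downloads below. It assumes the usual weighted effect coding rule: for each non-reference category the column is 1 for observations in that category, minus the ratio of that category's size to the reference category's size for observations in the reference category, and 0 otherwise. The function name `weighted_effect_codes` and the toy data are hypothetical. The sketch checks that, with this coding, an intercept equal to the sample mean and coefficients equal to each category's deviation from the sample mean reproduce the category means exactly, even though the categories differ in size.

```python
from collections import Counter


def weighted_effect_codes(categories, reference):
    """Build weighted effect coded columns (one per non-reference level).

    Hypothetical helper for illustration only.
    """
    counts = Counter(categories)
    levels = [lvl for lvl in sorted(counts) if lvl != reference]
    rows = []
    for cat in categories:
        row = []
        for lvl in levels:
            if cat == lvl:
                row.append(1.0)
            elif cat == reference:
                # Weight by relative category size so each column sums to zero.
                row.append(-counts[lvl] / counts[reference])
            else:
                row.append(0.0)
        rows.append(row)
    return levels, rows


# Hypothetical toy data with unequal category sizes (2, 3 and 5 observations).
groups = ["a"] * 2 + ["b"] * 3 + ["c"] * 5
y = [1.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]

levels, X = weighted_effect_codes(groups, reference="c")

sample_mean = sum(y) / len(y)
group_mean = {
    g: sum(v for v, c in zip(y, groups) if c == g) / groups.count(g)
    for g in set(groups)
}

# Candidate coefficients: intercept = sample mean,
# slope for each level = that level's mean minus the sample mean.
coefs = [group_mean[lvl] - sample_mean for lvl in levels]
fitted = [
    sample_mean + sum(b * x for b, x in zip(coefs, row))
    for row in X
]

# The fitted values equal the category means, which is what least squares
# yields for a single categorical predictor, so these are the OLS estimates.
for cat, fit in zip(groups, fitted):
    assert abs(fit - group_mean[cat]) < 1e-9

column_sums = [sum(row[j] for row in X) for j in range(len(levels))]
print(levels, coefs, column_sums)
```

With unweighted effect coding the reference rows would all be coded -1, and the intercept would then be the unweighted mean of the category means rather than the sample mean; the two coincide only when the categories are equally large.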
In the follow-up Hints & Kinks [link to open access paper: http://rdcu.be/l6fo] we expanded the procedure to regression models that test interactions. The weighted effect coded interactions display the extra effect on top of the main effect found in a model without the interaction. This offers a promising new route to estimate interaction effects in observational data, where unequal category sizes often prevail.
Below you will find downloads for R and Stata (both as user-friendly macros) and for SPSS (less user-friendly, but best suited for illustrative purposes). We also include a coding scheme for weighted effect coded interactions in Excel.
Correspondence: Manfred te Grotenhuis
UPDATE 20.10.17: The WEC package is described in The R Journal
UPDATE 18.12.16: Two short videos about the two IJPH papers
UPDATE 16.12.16: The WEC package has been updated and now includes the interaction between a weighted effect coded variable and a scale variable, with sum of squares as weights.
UPDATE 10.11.16: Watch a video of Manfred te Grotenhuis's presentation on weighted effect coding, held at the Swedish Institute for Social Research (SOFI), Stockholm University, November 4, 2016.
________________________________________
- Script for dummy, effect, and weighted effect coded variables
- Script for dummy, effect, and weighted effect coded INTERACTION variables
- BMI data for R
________________________________________
The Stata package for weighted effect coded interactions (and some other coding schemes) is called ‘igenerate’ and is available from the SSC archive. It can be installed from within Stata by typing: ssc install igenerate
The package can also be downloaded here
Detailed documentation of the Stata package, including all examples used in the article, can be downloaded here
The data used in the article can be found here (Stata data set, version 9 and higher)
________________________________________
In all syntax files we assume an existing folder called 'temp' on your C: drive.
- Syntax for dummy, effect, and weighted effect coded variables
- Syntax for dummy, effect, and weighted effect coded INTERACTION variables
- dummy effect coded interactions (sps, 1 kB)
- effect coded interactions (sps, 2,4 kB)
- weighted_effect_coded_interactions (sps, 6,7 kB)
- More Weighted effect coded dummy variables in regression models.
- WEC in other regression models: log_transformed_logistic_weighted_effect_coding corrected (sps, 4,8 kB)
- NEW: WEC variables interacting with continuous variables (also added to the R package 'wec'): weighted_effect_coded_interactions_times_interval (sps, 6,2 kB)
- Data
________________________________________
Explaining the weighted effect coded interactions in Excel
- Explanation of the design matrix for weighted effect coded interactions: Excel file (xlsx, 13 kB)
- Coding scheme for weighted effect coding in interaction models: computational rules, with a 3 by 3 example in Excel (xlsx, 15 kB)