Thanks to a reader who pointed out an error in Table 4 (now corrected) and suggested a minor change (done) in language in the summary. Great to get such feedback.
I am not a statistician but really enjoy trying to learn more about the quantitative methods I use. One frustration is the disconnect between the technical literature and applied practice. It means that “mistakes” (ones I have made myself) still occur in applied work, despite having been highlighted for a long time in the more technical literature. Part of the problem is that work bridging the two is quite rare. Over a couple of blogs I will try to highlight some issues that can occur even in a simple analysis.
Let’s start with the data
Here’s some data; it originates from here. We have a binary outcome (Y), a binary exposure (X), and a binary baseline covariate (Z).
So what’s the effect of X on Y?
With the exposure (X=1) the probability of Y is 0.58 and without the exposure it is 0.38. So that’s a difference of 0.58-0.38, a relative risk of 0.58 / 0.38 and an odds ratio of (0.58 / (1-0.58) ) / (0.38 / (1-0.38)). That is a difference of 0.2, a relative risk of 1.53, and an odds ratio of 2.25.
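As a rough check on the arithmetic above, the three measures can be computed from the two risks in a few lines of Python. This is only a sketch: the post itself shows no code, and the function name `effect_measures` is made up for illustration.

```python
def effect_measures(p_exposed, p_unexposed):
    """Return (risk difference, relative risk, odds ratio) for two risks."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return (
        p_exposed - p_unexposed,        # risk difference
        p_exposed / p_unexposed,        # relative risk
        odds_exposed / odds_unexposed,  # odds ratio
    )


# Unadjusted risks quoted in the text: 0.58 with the exposure, 0.38 without.
rd, rr, odds_ratio = effect_measures(0.58, 0.38)
print(f"difference = {rd:.2f}, relative risk = {rr:.2f}, odds ratio = {odds_ratio:.2f}")
```

Run as written, this should reproduce the 0.2, 1.53, and 2.25 quoted above.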
Let’s adjust for Z.
For Z = 0, with the exposure (X=1) the probability of Y is 0.4 and without the exposure it is 0.2. So that’s a difference of 0.4-0.2, a relative risk of 0.4 / 0.2 and an odds ratio of (0.4 / (1-0.4)) / (0.2 / (1-0.2)). That is 0.2, 2, and 2.67.
For Z = 1, with the exposure (X=1) the probability of Y is 0.8 and without the exposure it is 0.6. So that’s a difference of 0.8-0.6, a relative risk of 0.8 / 0.6 and an odds ratio of (0.8 / (1-0.8)) / (0.6 / (1-0.6)). That is 0.2, 1.33, and 2.67.
Let’s get an adjusted average effect using regression models (linear, Poisson, and logistic) with Y as the outcome and X and Z on the right-hand side. We get average adjusted effects: difference = 0.2, relative risk = 1.53, and odds ratio = 2.67.
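The post does not say which software fitted these models. As one hedged illustration, here is a pure-Python Newton-Raphson fit of the logistic model only, run on the four cell-level risks rather than on the raw data. The cell risks and P(Z=1) = 0.45 come from the text; P(X=1) = 0.5 is an assumption (it is not given, and because Z is balanced across X it should not change the odds ratios). The helper names `solve` and `fit_logistic` are illustrative.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def fit_logistic(rows, iterations=25):
    """Fit a logistic model to rows of (design vector, observed risk, weight)."""
    k = len(rows[0][0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for design, risk, weight in rows:
            eta = sum(b * v for b, v in zip(beta, design))
            mu = 1.0 / (1.0 + math.exp(-eta))
            for i in range(k):
                grad[i] += weight * (risk - mu) * design[i]
                for j in range(k):
                    info[i][j] += weight * mu * (1.0 - mu) * design[i] * design[j]
        # Newton-Raphson update: beta + (information matrix)^-1 * score.
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta


# (x, z, P(Y=1 | x, z), cell weight). Weights assume P(X=1) = 0.5 (assumption)
# and P(Z=1) = 0.45 in both exposure groups (from the text).
cells = [
    (0, 0, 0.2, 0.5 * 0.55),
    (1, 0, 0.4, 0.5 * 0.55),
    (0, 1, 0.6, 0.5 * 0.45),
    (1, 1, 0.8, 0.5 * 0.45),
]

adjusted = fit_logistic([([1.0, x, z], p, w) for x, z, p, w in cells])
unadjusted = fit_logistic([([1.0, x], p, w) for x, z, p, w in cells])

or_adjusted = math.exp(adjusted[1])      # odds ratio for X, adjusted for Z
or_unadjusted = math.exp(unadjusted[1])  # odds ratio for X, ignoring Z
print(f"unadjusted OR = {or_unadjusted:.2f}, adjusted OR = {or_adjusted:.2f}")
```

Fitting on weighted cell risks gives the same coefficients as fitting on individual records with those proportions, which keeps the example small. In practice a statistics package would do this step.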
So is Z a confounder?
Well, because we are taught to use a logistic regression for a binary outcome, we might answer yes, as the odds ratio for X changes from 2.25 to 2.67 when we adjust for Z.
But Z isn’t a confounder: it is not associated with X (i.e. the probability of Z=1 is 0.45 in both the X=0 and X=1 groups). When averaging two odds ratios (sort of what the adjusted regression is doing), their average will not usually be the unadjusted odds ratio, even in the absence of confounding.
This is known as “non-collapsibility” in epidemiology. It is well documented in epidemiology and the social sciences, yet it is common to see papers comparing odds ratios before and after confounder adjustment as a method of judging the extent of confounding. I’ve done this. However, even when there is confounding, a change in the odds ratio will often be part confounding and part non-collapsibility. The difference and relative risk are collapsible measures. See this and this for fuller technical discussions.
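To see non-collapsibility without any confounding, here is a small sketch that holds the conditional odds ratio for X fixed and varies how strongly Z predicts Y. The baseline odds of 0.25 (a risk of 0.2 when X=0 and Z=0), the conditional odds ratio of 8/3 (about 2.67), the odds ratio of 6 for Z, and P(Z=1) = 0.45 all follow from the numbers in this post. The other values tried for Z, and the function name `marginal_odds_ratio`, are hypothetical.

```python
def marginal_odds_ratio(baseline_odds, or_x, or_z, p_z):
    """Marginal odds ratio for X when Z is independent of X (no confounding)."""

    def risk(odds):
        return odds / (1 + odds)

    def marginal_risk(x):
        odds_z0 = baseline_odds * (or_x if x else 1)
        odds_z1 = odds_z0 * or_z
        # Average the two stratum risks over the distribution of Z.
        return (1 - p_z) * risk(odds_z0) + p_z * risk(odds_z1)

    p1, p0 = marginal_risk(1), marginal_risk(0)
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


# The conditional odds ratio for X is 8/3 in every run; only Z's effect changes.
for or_z in (1, 2, 6, 20):
    value = marginal_odds_ratio(0.25, 8 / 3, or_z, 0.45)
    print(f"odds ratio for Z = {or_z:>2}: marginal odds ratio for X = {value:.2f}")
```

When Z has no effect on Y (odds ratio 1), the marginal and conditional odds ratios agree. With the post's value of 6 the marginal odds ratio comes out at 2.25, even though Z is unrelated to X throughout.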
So don’t use a logistic regression?
No, using a logistic regression for a binary outcome is generally a good idea. There is nothing wrong here; it is just that the marginal and conditional (on Z) odds ratios will often differ even in the absence of confounding. As I did above to get the difference and relative risk, you can fit other models to binary data, but these “solutions” are based on the practice of reading effects directly from your model results. I can derive the marginal odds ratio, the risk difference and the relative risk from the adjusted logistic regression, even though the model results are conditional odds ratios.
Conditional to marginal
The table below displays the adjusted logistic regression results. Note that, for clarity, results are rounded to two decimal places.
| Term | Estimate |
|---|---|
| Constant (odds) | 0.25 |
| X (odds ratio) | 2.67 |
| Z (odds ratio) | 6 |
We now use these results to derive the odds and probability of Y for each of the four combinations of X and Z. To convert from odds to probability, use Odds / (1 + Odds).
| X | Z | Formula for odds | Odds | Probability |
|---|---|---|---|---|
| 0 | 0 | Constant | 0.25 | 0.2 |
| 1 | 0 | Constant * X | 0.67 | 0.4 |
| 0 | 1 | Constant * Z | 1.5 | 0.6 |
| 1 | 1 | Constant * X * Z | 4 | 0.8 |
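The odds and probabilities in this table can be rebuilt from the model results with a short loop. This is a sketch under one stated inference: the constant of 0.25 is worked out from the table itself (0.67 / 2.67, or equally 1.5 / 6). The odds ratio for X is entered unrounded as 8/3, and the names are illustrative.

```python
constant = 0.25  # baseline odds when X=0 and Z=0 (inferred from the table)
or_x = 8 / 3     # odds ratio for X, shown rounded as 2.67
or_z = 6.0       # odds ratio for Z


def odds_to_probability(odds):
    return odds / (1 + odds)


cell_probability = {}
for x in (0, 1):
    for z in (0, 1):
        # Multiply the constant by each odds ratio whose variable equals 1.
        odds = constant * or_x**x * or_z**z
        cell_probability[(x, z)] = odds_to_probability(odds)
        print(f"X={x} Z={z}: odds = {odds:.2f}, "
              f"probability = {cell_probability[(x, z)]:.2f}")
```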
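To get an average effect for X, we can standardise to the probability of Z in the whole sample (0.55 for Z=0 and 0.45 for Z=1): multiply each probability by the weight for its level of Z, then sum within each level of X.
For X=1 the sum is 0.4 × 0.55 + 0.8 × 0.45 = 0.22 + 0.36 = 0.58, and for X=0 it is 0.2 × 0.55 + 0.6 × 0.45 = 0.11 + 0.27 = 0.38. We have the same result as the unadjusted analysis because there is no confounding. We can work out the marginal difference, relative risk and odds ratio as before.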
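The standardisation step can be sketched in Python as a weighted sum over the levels of Z. The four cell probabilities and P(Z=1) = 0.45 are taken from the post; the variable and function names are illustrative.

```python
# P(Y=1 | X=x, Z=z) for each (x, z) combination.
cell_probability = {(0, 0): 0.2, (1, 0): 0.4, (0, 1): 0.6, (1, 1): 0.8}

# Distribution of Z in the whole sample.
p_z = {0: 0.55, 1: 0.45}


def standardised_risk(x):
    """Average the stratum-specific risks for X=x over the distribution of Z."""
    return sum(p_z[z] * cell_probability[(x, z)] for z in p_z)


risk_exposed = standardised_risk(1)
risk_unexposed = standardised_risk(0)

marginal_difference = risk_exposed - risk_unexposed
marginal_rr = risk_exposed / risk_unexposed
marginal_or = (risk_exposed / (1 - risk_exposed)) / (
    risk_unexposed / (1 - risk_unexposed)
)
print(f"risks: {risk_exposed:.2f} vs {risk_unexposed:.2f}")
print(f"difference = {marginal_difference:.2f}, relative risk = {marginal_rr:.2f}, "
      f"odds ratio = {marginal_or:.2f}")
```

The marginal odds ratio from this calculation should be 2.25, matching the unadjusted result, while the conditional odds ratio from the model stays at 2.67.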
Summary
- Be clear on the effect measures you want to estimate.
- Logistic regression is a good choice for binary outcomes. However, a change in an odds ratio between models is not always due to confounding alone.
- Stata has the margins command, and R has a margins package, that can calculate marginal effects from a logistic regression.
- Read this great free book.
Next blog, I am going to look at effect modification (aka interaction).