The odd odds ratio

That’s odd

I am not a statistician but really enjoy trying to learn more about the quantitative methods I use. One frustration is the disconnect between the technical literature and applied practice. “Mistakes” (that I have made) still occur in applied work even though the more technical literature has highlighted them for a long time. One issue is that work bridging the two is quite rare. In a couple of blog posts I try to highlight issues that can occur even in a simple analysis.

Let’s start with the data

Here’s some data, it originates from here. We have a binary outcome, a binary exposure, and a binary baseline covariate: Y, X, and Z respectively.

So what’s the effect of X on Y?

With the exposure (X=1) the probability of Y is 0.58 and without the exposure it is 0.38. So that’s a difference of 0.58-0.38, a relative risk of 0.58 / 0.38 and an odds ratio of (0.58 / (1-0.58) ) / (0.38 / (1-0.38)). That is a difference of 0.2, a relative risk of 1.53, and an odds ratio of 2.25.
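The arithmetic above can be checked with a few lines of Python. This is only a sketch; the function name effect_measures is my own and not from any package.

```python
def effect_measures(p_exposed, p_unexposed):
    """Return (difference, relative risk, odds ratio) for two probabilities."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return (
        p_exposed - p_unexposed,
        p_exposed / p_unexposed,
        odds_exposed / odds_unexposed,
    )


# Probabilities of Y with and without the exposure, taken from the text.
difference, relative_risk, odds_ratio = effect_measures(0.58, 0.38)
print(round(difference, 2), round(relative_risk, 2), round(odds_ratio, 2))
# prints: 0.2 1.53 2.25
```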

Let’s adjust for Z.

For Z = 0, with the exposure (X=1) the probability of Y is 0.4 and without the exposure it is 0.2. So that’s a difference of 0.4-0.2, a relative risk of 0.4 / 0.2 and an odds ratio of (0.4 / (1-0.4)) / (0.2 / (1-0.2)). That is 0.2, 2, and 2.67.

For Z = 1, with the exposure (X=1) the probability of Y is 0.8 and without the exposure it is 0.6. So that’s a difference of 0.8-0.6, a relative risk of 0.8 / 0.6 and an odds ratio of (0.8 / (1-0.8)) / (0.6 / (1-0.6)). That is 0.2, 1.33, and 2.67.

Let’s get an adjusted average effect using regression models (linear, Poisson, and logistic) with Y as the outcome and X and Z on the right-hand side. We get average adjusted effects: difference = 0.2, relative risk = 1.53, and odds ratio = 2.67.

So is Z a confounder?

Well, because we are taught to use a logistic regression for a binary outcome, we might answer yes, as the odds ratio for X changes from 2.25 to 2.67 when we adjust for Z.

But Z isn’t a confounder, it is not associated with X (i.e. probability of Z is 0.45 in both X=0 and X=1). When averaging two odds ratios (sort of what the adjusted regression is doing), their average will not usually be the unadjusted odds ratio even in the absence of confounding.
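A small Python sketch makes the point concrete. It takes the stratum-specific probabilities from above and averages them over Z, with Z equally common in both exposure groups so there is no confounding. Only P(Z=1) = 0.45 comes from the data; the other two values are my own additions, to show the pattern is not specific to 0.45.

```python
def odds(p):
    return p / (1 - p)


# Stratum-specific probabilities of Y from the text, indexed as RISK[z][x].
RISK = {0: {0: 0.2, 1: 0.4}, 1: {0: 0.6, 1: 0.8}}


def marginal_measures(p_z1):
    """Average the risks over Z when P(Z=1) is the same for X=0 and X=1."""
    p1 = (1 - p_z1) * RISK[0][1] + p_z1 * RISK[1][1]
    p0 = (1 - p_z1) * RISK[0][0] + p_z1 * RISK[1][0]
    return p1 - p0, odds(p1) / odds(p0)


# The conditional odds ratio is the same (8/3) in both strata of Z.
conditional_or = odds(0.4) / odds(0.2)

# 0.45 is the value in the data; 0.10 and 0.90 are illustrative assumptions.
results = {p: marginal_measures(p) for p in (0.10, 0.45, 0.90)}
for p, (rd, m_or) in results.items():
    print(f"P(Z=1)={p:.2f}: difference={rd:.2f}, "
          f"marginal OR={m_or:.2f}, conditional OR={conditional_or:.2f}")
```

In each case the difference stays at 0.2 while the marginal odds ratio sits below the conditional one, even though Z is balanced across exposure groups.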

This is known as “non-collapsibility” in epidemiology. It is well documented in epidemiology and the social sciences. Yet it is common to see papers comparing odds ratios before and after confounder adjustment as a method of judging the extent of confounding. I’ve done this. However, even when there is confounding, a change in the odds ratio will often be part confounding and part non-collapsibility. The difference and relative risk are collapsible measures. See this and this for fuller technical discussions.

So don’t use a logistic regression?

No, using a logistic regression for a binary outcome is generally a good idea. There is nothing wrong here, it is just that the marginal and conditional (on Z) odds ratios will often differ even in the absence of confounding. As I did above to get the difference and relative risk, you can fit other models to binary data, but these “solutions” are based on the practice of reading effects directly from your model results. I can derive the marginal odds ratio, the risk difference and the relative risk from the adjusted logistic regression even though the model results are conditional odds ratios.

Conditional to marginal

The table below displays the adjusted conditional logistic regression results. Note for clarity results are rounded to two decimal places.

X (Odds ratio)      2.67
Z (Odds ratio)      6
Constant (Odds)     0.25

We now use these results to derive the odds and probability of Y for each of the four combinations of X and Z. To convert from odds to probability use Odds / (1+Odds)

X   Z   Formula for odds     Odds   Probability
0   0   Constant             0.25   0.2
1   0   Constant * X         0.67   0.4
0   1   Constant * Z         1.5    0.6
1   1   Constant * X * Z     4      0.8
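The same conversion can be sketched in Python. I have assumed the exact odds ratio behind the rounded 2.67 is 8/3; the variable and function names are mine.

```python
# Results from the adjusted logistic regression in the text.
CONSTANT_ODDS = 0.25
OR_X = 8 / 3  # assumption: the exact value behind the rounded 2.67
OR_Z = 6.0


def odds_of_y(x, z):
    """Odds of Y: the constant times the odds ratio of each factor present."""
    return CONSTANT_ODDS * OR_X ** x * OR_Z ** z


def probability_of_y(x, z):
    """Convert odds to probability with Odds / (1 + Odds)."""
    cell_odds = odds_of_y(x, z)
    return cell_odds / (1 + cell_odds)


for x, z in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x, z, round(odds_of_y(x, z), 2), round(probability_of_y(x, z), 2))
```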

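To get an average effect for X, we can standardise to the probability of Z in the whole sample (0.55 for Z=0 and 0.45 for Z=1). Multiply the probability of Y for each combination of X and Z by the probability of that value of Z, then sum within each level of X.

For X=1 the sum is 0.4*0.55 + 0.8*0.45 = 0.22 + 0.36 = 0.58, and for X=0 it is 0.2*0.55 + 0.6*0.45 = 0.11 + 0.27 = 0.38. We have the same result as the unadjusted analysis as there is no confounding. We can work out the marginal difference, relative risk and odds ratio as before.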
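Here is a rough Python version of the standardisation step. It starts from the four probabilities of Y and the sample distribution of Z; the names are my own.

```python
# Probability of Y for each (X, Z) combination, from the adjusted model.
P_Y = {(0, 0): 0.2, (1, 0): 0.4, (0, 1): 0.6, (1, 1): 0.8}

# Distribution of Z in the whole sample.
P_Z = {0: 0.55, 1: 0.45}


def standardised_risk(x):
    """Weight the probability of Y at each level of Z by the sample P(Z)."""
    return sum(P_Y[(x, z)] * P_Z[z] for z in P_Z)


risk_exposed = standardised_risk(1)
risk_unexposed = standardised_risk(0)

marginal_difference = risk_exposed - risk_unexposed
marginal_rr = risk_exposed / risk_unexposed
marginal_or = (risk_exposed / (1 - risk_exposed)) / (
    risk_unexposed / (1 - risk_unexposed)
)
print(round(risk_exposed, 2), round(risk_unexposed, 2))
print(round(marginal_difference, 2), round(marginal_rr, 2), round(marginal_or, 2))
```

The marginal odds ratio comes out at 2.25, not the conditional 2.67, even though both were derived from the same adjusted model.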

In summary
  • Be clear on the effect measures you want to estimate.
  • Logistic regression is a good choice for binary outcomes. However, a change in an odds ratio between models is not always due to confounding alone.
  • Stata has the margins command, and R has the margins package, which can calculate marginal effects from a logistic regression that adjusts for covariates.
  • Read this great free book.

Next blog, I am going to look at effect modification (aka interaction).






Same model, different R squared.

Important risk factor?

Quite often I see papers that report how much of the variance in an outcome has been explained by the risk factor(s) of interest. The thought seems to be that the higher the percentage explained (the higher the R squared), the better. The authors think that important variables have been identified.

Perhaps not?

But consider this famous example.  Everyone in a rich country smokes 20 cigarettes a day. You study the reasons for lung cancer in this population. Smoking wouldn’t explain any of the variance in lung cancer, it wouldn’t be identified as a cause of lung cancer.  But it is the cause of why this country has a much higher rate of lung cancer than a rich country where nobody smokes. This is summarised as the causes of cases not necessarily being the same as causes of incidence (the rate). In population health we mostly want to change the causes of incidence.  Of course even if you’re dealing in prediction rather than cause it is still the case that predictors of cases are not necessarily the predictors of incidence.

Something magic?

So while smoking in a particular cohort of individuals might explain only 10% of the variation in lung cancer, it explains around 90% of the differences in rates between areas. Something I have seen mentioned less often is that the same analysis on the same data can give a different value of the same R squared.

Sounds like magic! Hey presto, this Stata code illustrates in detail. Using data on mortality, smoking and age, and a Poisson model of individual data (with dead or not as the outcome), I get an R squared of 9%. But I can rearrange the data, run the same model, get the same results (effect sizes and CIs) but get a completely different R squared (93%). The difference is that I changed the number of observations from 181,467 individuals to 10 groups. In the latter I controlled for the size of the groups using an offset. So at the group level the explained variance is pretty high. Given that they are essentially the same analysis, their predictive ability is actually the same. Of course R squared in Poisson and logistic models is a pseudo R squared, calculated differently to the vanilla R squared. So don’t take this as a technically accurate description, but I think the spirit of what I say is right.

*note the dataset is actually in person-years but I have pretended it is persons followed up for a year just to save the complication of writing about person-years.
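The idea can be sketched in Python with made-up numbers. To be clear about the assumptions: the four groups below are invented, not the mortality data; I use a deviance-based pseudo R squared, which may not be the one Stata reports; and the model has smoking as its only covariate, so the Poisson fit is simply the pooled death rate in each smoking group and nothing needs to be estimated iteratively.

```python
import math

# Invented groups for illustration: (smoker, persons, deaths).
# There are two age bands per smoking status; the model ignores age.
GROUPS = [(0, 4000, 20), (0, 1000, 30), (1, 3000, 60), (1, 2000, 140)]


def poisson_deviance(observed, fitted):
    """2 * sum(y * log(y / mu) - (y - mu)), taking 0 * log(0) as 0."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += 2 * (log_term - (y - mu))
    return total


def rate(rows):
    return sum(d for _, _, d in rows) / sum(n for _, n, _ in rows)


overall_rate = rate(GROUPS)  # null model: one rate for everyone
smoking_rate = {s: rate([g for g in GROUPS if g[0] == s]) for s in (0, 1)}


def deviance_r2(observed, null_fit, model_fit):
    """Return (pseudo R squared, reduction in deviance from null to model)."""
    d_null = poisson_deviance(observed, null_fit)
    d_model = poisson_deviance(observed, model_fit)
    return 1 - d_model / d_null, d_null - d_model


# Individual layout: one row per person, outcome 1 = died.
ind_y, ind_null, ind_model = [], [], []
for s, n, d in GROUPS:
    ind_y += [1] * d + [0] * (n - d)
    ind_null += [overall_rate] * n
    ind_model += [smoking_rate[s]] * n

# Grouped layout: one row per group; expected deaths = persons * rate,
# which is the job the offset does.
grp_y = [d for _, _, d in GROUPS]
grp_null = [n * overall_rate for _, n, _ in GROUPS]
grp_model = [n * smoking_rate[s] for s, n, _ in GROUPS]

r2_individual, gain_individual = deviance_r2(ind_y, ind_null, ind_model)
r2_grouped, gain_grouped = deviance_r2(grp_y, grp_null, grp_model)
print(round(r2_individual, 2), round(r2_grouped, 2))
print(round(gain_individual, 3), round(gain_grouped, 3))
```

With these invented numbers the fitted rates and the reduction in deviance are identical in the two layouts, but the pseudo R squared is roughly 0.05 for individuals and roughly 0.46 for groups, because the null deviance it is divided by is so much larger in the individual data.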

Complexity and cause



Given the difficulty of solving intricate problems such as the obesity epidemic, researchers are turning to the concept of complexity. This sees problems as a system with multiple causal paths and feedback loops rather than a more simple matter of a cause and an effect.

This concept hit the medical big time recently when a viewpoint espousing a complexity approach for public health was published in the Lancet. The Lancet is widely read and has an impact factor of a zillion. It raises important questions about an emphasis on Randomised Controlled Trials (RCTs). They have good internal validity (i.e. they are ace for establishing causality). But they may lack external validity (i.e. the intervention and any effect may not generalise for various reasons). This emphasis on RCTs may also lead to an emphasis on individual level interventions. It may also ignore multiple other pathways in the system and their interaction.

Where I differ from the authors is that I think that non-complexity causal methods recognise and can often address these issues. A good overview is given in this paper. It asks important questions of those advocating complexity while also recognising its potential importance.


Let me pick up one point: the critique of RCTs. The authors advocate for natural experiments. That’s great, as they are great for studying cause and effect. Why are they great? Well, it is because they come from the same family of methods as the RCT: counterfactual methods. They all share the same statistical justification. That is, they solve (with assumptions) the problem that we can’t rerun time to have the same populations exposed to different interventions. Another important commonality is that there is an intervention.

So the problem, it seems to me, isn’t the RCT being “linear” (otherwise you would have the same issue with natural experiments). It is the fact that they are perceived as difficult to do for upstream policy interventions. Moreover, those working in the counterfactual framework consider issues that concern complexity fans. These include mechanisms on causal pathways (i.e. mediation, which is complex to do even in RCTs), interaction (the joint effect of different interventions), moderation, time-varying interventions, time-varying confounding and time-varying outcomes (feedback loops), transportability (generalisation, contextualisation), looking at numerous outcomes, spillover effects, and the drawing of graphs to unpack these issues.

Complexity and cause

I have a lot to learn about complexity (and causal methods) as my ill-informed descriptions above testify to. Yet I do think it would be productive for those advocating complexity to engage with causal methods (and vice versa) as they are concerned with addressing similar issues.


Where is my outcome regression balancing confounders?

Update 09/11/2017

Someone mentioned the paper was a bit equation heavy to read. So I’ll give a simple example. You don’t have to download the data and code to get the idea but you can if you want to. The data comes from this paper. Stata code is here.

There are three binary variables: the outcome, the exposure and the confounder. The mean of the confounder is 38.5% in the whole population, but it differs across the two levels of the exposure (58.3% for exposure = 0 and 32.5% for exposure = 1), so we need to balance the confounder when looking at the relationship between the exposure and the outcome. One method would be to use inverse probability weighting. This balances the confounder at the population mean, i.e. 38.5%, in both levels of the exposure. It is easy and standard to compare balance before and after adjustment when using inverse probability weighting; however, you usually don’t see this in papers using an outcome regression. An outcome regression is just your standard regression model, with the outcome as the “dependent” variable and the exposure and confounder as “independent” variables.

We illustrate a method to check where the outcome regression is balancing the confounder over the two levels of the exposure. It is at 51.9%. This is not the population mean of the confounder. The “effect” of the exposure on the outcome is slightly different in the outcome regression compared to the inverse probability weighting, as they refer to different populations (one with the mean of the confounder balanced at 38.5% and the other at 51.9%). In the code I show that if you run the outcome regression with an interaction between the exposure and the confounder (so it is saturated) and then use standardisation to calculate an average effect of the exposure in a population where the confounder is balanced at 51.9%, you get the same effect of the exposure on the outcome as from the outcome regression without the interaction.
Put another way, the outcome regression isn’t balancing the interaction.

(Yes I am aware that in the code I use a linear regression for a binary outcome, but it doesn’t matter here, it was just a handy dataset!)

Original post

We’ve a new working paper (pre-print) and I would welcome comments either here on the blog or on the OSF site where the pre-print is hosted. There is also R and Stata code. We’ve not shown anything new statistically but think we have a method of checking confounder balance in outcome regressions.

The abstract is below.

“An outcome regression controlling for observed confounders remains a popular way to assess the causal effect of an exposure in epidemiology, despite more modern causal techniques for adjusting for observed confounders, such as inverse probability weighting. A feature of inverse probability weighting is that checking balance of confounders in the control and exposure groups after confounder adjustment is simple. However, researchers using outcome regressions commonly do not check confounder balance after controlling for confounders. Although outcome regressions will balance any confounder specified in the model, the confounder value the model balances at is not transparent. We show that a matrix representation of an outcome regression reveals that an outcome regression includes a weight similar to an inverse probability weight. We also show that outcome regressions may not be balancing at the sample mean of the confounders particularly if interactions are not included with the exposure, which is typically the case in outcome regressions. Finally, we show that the coefficient of the exposure in an outcome regression is simply the difference between two weighted counterfactuals. Thus, there is an important connection between traditional outcome regression and modern causal techniques.”


Measures of variance of age of death aren’t age discriminatory.

Age discrimination by a measure of variance?

A recent commentary in AJPH argued that measures like Years of Life Lost (YLL) are age discriminatory. First, because such measures give more weight to deaths at younger ages. Second, because this weighting is justified by the claim that younger lives have more value to society. I agree with the second criticism: we should value all human life. I strongly disagree with the first.

Why do I disagree?

YLL is similar to measures of variance in demography, which I have highlighted as measures of inequality. Such measures summarise the distribution of deaths by age. One key finding is that there is a close relationship between the mean age of death and the variance. So if you object to measures of variance that give more weight to young deaths, then you must also object to measures of the mean, like life expectancy, that also reflect years of life lost.

Moreover, even simpler measures like the death rate could be regarded as age discriminatory when used to make comparisons between groups. This is because the difference is often due not only to higher rates but also to a different distribution of deaths by age. So, essentially, the logic of the commentary’s argument leads to all measures of inequality being age discriminatory.

These measures highlight inequality.

Measures of variance in the age of death have highlighted countries like the USA that do poorly in both life expectancy and inequality. This is because of high levels of working age deaths, especially amongst men. This is probably related to high levels of unemployment and deprivation. In fact the USA is an outlier in that it manages to have a slightly higher life expectancy than its level of inequality would suggest. It performs well for older ages but extremely poorly for its young. The argument that measures that highlight early life mortality are somehow age discriminatory has been made elsewhere as well. However, I think the logic is flawed.

Funding and disclaimer

The MRC/CSO Social and Public Health Sciences Unit is funded by the Medical Research Council and the Scottish Government Chief Scientist Office. The views expressed are not those of the Medical Research Council or the Scottish Government.

Relative deprivation: a key theory for health inequality research?

The Black report and relative deprivation

The Black report is the famous 1980 UK report on health inequalities. Its favoured theory for why health inequalities persisted and widened post WWII was structural theory (aka materialist). In some ways this is just Peter Townsend’s relative deprivation theory applied to health inequalities. Townsend was one of the report’s authors and a leading thinker on why deprivation and poverty persisted in rich countries. You can read his classic 1979 book “Poverty in the UK” for free. While the empirical work is dated, the theoretical work is still of contemporary importance. I would recommend chapters 27, 1 and 2 for understanding relative deprivation theory. Of course Townsend wrote lots more and this collection is well worth a read. Townsend’s theory has been particularly influential in measuring poverty and measuring area deprivation. Yet it also covers the whole social gradient, and individual and household deprivation.

Defining relative deprivation

Townsend defined relative deprivation as “… the absence or inadequacy of those diets, amenities, standards, services and activities which are common or customary in society. People are deprived of the conditions of life which ordinarily define membership of society. If they lack or are denied resources to obtain access to these conditions of life and so fulfil membership of society, they are in poverty.”


Death expectancy? For studying health inequality?

What’s the opposite of life expectancy? Well I think it might be death expectancy, the area above, rather than below, the survival curve. Say we have a lifetable where the last age is 110+. There are essentially 111 years of life in the lifetable. So death expectancy at birth is 111 minus life expectancy. I am sure you could derive this directly from lifetable elements and demographers have this sorted already. Let me know. Also note I took inspiration from this paper that looked at 100 minus life expectancy.
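As a rough illustration of the area idea, here is a Python sketch. The survival curve is entirely hypothetical, and I have assumed it reaches zero at age 111, which simplifies the open-ended 110+ interval.

```python
MAX_YEARS = 111  # ages 0 to 110+, as in the text


def area_under(curve):
    """Trapezoidal area under a curve sampled at ages 0, 1, ..., MAX_YEARS."""
    return sum((a + b) / 2 for a, b in zip(curve, curve[1:]))


# Hypothetical survival curve: proportion alive at each exact age,
# falling to zero at age 111 (an assumption, not real lifetable data).
survival = [1 - (age / MAX_YEARS) ** 4 for age in range(MAX_YEARS + 1)]

life_expectancy = area_under(survival)                    # area below the curve
death_expectancy = area_under([1 - s for s in survival])  # area above the curve

print(round(life_expectancy, 1), round(death_expectancy, 1))
print(round(life_expectancy + death_expectancy, 1))  # the two areas sum to 111
```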


The link between population health and health inequality

Is life expectancy increasing? Is health inequality decreasing? These are fundamental questions for population health. It is thus natural to ask if life expectancy and inequality are related. The figure below shows they are. It compares life expectancy (the average age at death) to the distribution of the age at death around this average (inequality). If everyone died at the same age there would be no inequality. So average years of life lost per death measures inequality. Higher life expectancy is associated with lower inequality. Each dot is calculated from 5 years’ worth of data for a country, with 41 countries observed since 1950 (i.e. one dot is for Scotland in 1950-54).
