---
title: "Is regression discontinuity an instrument method?"
author: "Frank Popham"
date: "11/12/2019"
output:
  md_document:
    variant: markdown_phpextra+backtick_code_blocks
---

```{r library, echo=FALSE, warning=FALSE, message=FALSE}
library(ggdag)
library(tidyverse)
```

[This recent paper](https://www.sciencedirect.com/science/article/pii/S2352827319301545) suggested that regression discontinuity is an instrument-based method for causal inference. I am less sure. The paper contains an example of the following form: we want to know the effect of university attendance on health. Of course, university attendance is not randomized, so an observational study has to deal with confounding. If university entry is based on an exam result cut-point, then we can instrument on this, the paper suggests. The [DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph) below illustrates this.

```{r, echo=FALSE}
coords <- list(x = c(C = 3, Health = 5, Uni = 3, Exam = 1),
               y = c(C = 5, Health = 3, Uni = 3, Exam = 3))
dag <- dagify(Health ~ C + Uni,
              Uni ~ Exam + C,
              exposure = "Uni",
              outcome = "Health",
              coords = coords)
ggdag(dag) + theme_dag()
#ggdag_adjustment_set(dag)
#ggdag_instrumental(dag)
```

But is exam an instrument? Well, people from advantaged backgrounds do better in exams, so there should be an arrow from confounding (C) to exam. Also, exam results may affect health other than through university attendance, so we need an arrow from exam to health. Usually the presence of either of these arrows negates the instrument. Their addition in the DAG below implies we need to adjust for confounding and exam to get the causal effect of university on health.

```{r, echo=FALSE}
coords2 <- list(x = c(C = 3, Health = 5, Uni = 3, Exam = 1),
                y = c(C = 5, Health = 2, Uni = 3, Exam = 2))
dag2 <- dagify(Health ~ C + Uni + Exam,
               Uni ~ Exam + C,
               Exam ~ C,
               exposure = "Uni",
               outcome = "Health",
               coords = coords2)
ggdag(dag2) + theme_dag()
#ggdag_adjustment_set(dag2)
```

But this isn't quite right.
Say exam result is the only way you can get into university. Then, as in the DAG below, confounding of university attendance runs through exam. Block exam and you can estimate the effect of university on health. [As I have argued before](https://www.frankpopham.co.uk/2019/04/18/knock-knock-whos-there-its-regression-discontinuity-but-which-door/), this design is more akin to Pearl's front door criterion than an instrument. This is the way many regression discontinuity designs work. They control for the underlying relationship between the forcing variable (exam here) and the health outcome, with the cut-point of exam score for university providing the discontinuity from which the causal effect of university is estimated.

```{r, echo=FALSE}
coords2 <- list(x = c(C = 3, Health = 5, Uni = 3, Exam = 1),
                y = c(C = 5, Health = 2, Uni = 3, Exam = 2))
dag3 <- dagify(Health ~ C + Uni + Exam,
               Uni ~ Exam,
               Exam ~ C,
               exposure = "Uni",
               outcome = "Health",
               coords = coords2)
ggdag(dag3) + theme_dag()
#ggdag_adjustment_set(dag3)
```

If there are a large number of observations just either side of the cut-point, then the DAG below could be assumed, where the arrow from exam to university is removed. The assumption is that random factors at the border of the cut-point (performance on the day of the exam, for example) essentially mean that university is randomized. However, usually we don't have this number of observations, so a wider range of observations is needed and the previous DAG applies.

```{r, echo=FALSE}
dag4 <- dagify(Health ~ C + Uni + Exam,
               Exam ~ C,
               exposure = "Uni",
               outcome = "Health",
               coords = coords2)
ggdag(dag4) + theme_dag()
#ggdag_adjustment_set(dag4)
```

So, I suggest that regression discontinuity isn't an instrument method and is more front door based.

Of course, some people may not go to university even if they achieve the entry score. In this situation the intervention is not fully determined by the forcing variable.
Here regression discontinuity is an instrumental variable method, as in the DAG below, but not in the sense outlined in the paper. [A highly cited review in economics](https://www.princeton.edu/~davidlee/wp/RDDEconomics.pdf) also moves away from the instrument representation of regression discontinuity.

```{r, echo=FALSE}
coords3 <- list(x = c(C = 3, Health = 5, Uni = 3.5, Exam = 1, Instr = 2.5),
                y = c(C = 5, Health = 2, Uni = 3, Exam = 2, Instr = 3))
dag5 <- dagify(Health ~ C + Uni + Exam,
               Uni ~ Instr,
               Exam ~ C,
               Instr ~ Exam,
               exposure = "Uni",
               outcome = "Health",
               coords = coords3)
ggdag(dag5) + theme_dag()
##ggdag_adjustment_set(dag5)
```

##### Thanks to...

I used [R](https://www.r-project.org/), [RStudio](https://rstudio.com/), [the tidyverse](https://www.tidyverse.org/), and the [dagitty](https://cran.r-project.org/web/packages/dagitty/index.html) and [ggdag](https://cran.r-project.org/web/packages/ggdag/vignettes/intro-to-ggdag.html) packages to write this post. The replication R markdown file is here
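
##### A simulated sketch

The argument that a sharp regression discontinuity works by controlling for the forcing variable, rather than by instrumenting, can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the post or the paper: the sample size, the cut-point of 60, every coefficient, and the linear exam-health relationship are made up for illustration, and the true effect of university on health is set to 2. The variable names follow the DAGs; `naive_est` and `rd_est` are hypothetical names introduced here.

```r
# Simulated sharp regression discontinuity.
# All numbers below are illustrative assumptions, not taken from the post or the paper.
set.seed(2019)
n <- 5000

C      <- rnorm(n)                                   # unobserved confounder (e.g. background)
Exam   <- 50 + 10 * C + rnorm(n, sd = 5)             # forcing variable, affected by C
Uni    <- as.numeric(Exam >= 60)                     # entry fully determined by the cut-point
Health <- 2 * Uni + 0.1 * Exam + 1.5 * C + rnorm(n)  # true effect of Uni is set to 2

# Naive comparison: ignores exam, so the path from C through exam stays open
naive_est <- unname(coef(lm(Health ~ Uni))["Uni"])

# Control for the exam-health relationship; the jump at the cut-point is the estimate
rd_est <- unname(coef(lm(Health ~ Uni + Exam))["Uni"])

c(naive = naive_est, rd = rd_est)
```

With these assumed settings the naive estimate should sit well above 2, while the estimate that controls for exam should land close to 2 even though C is never used in the model. In applied work the exam-health relationship is rarely known to be linear, which is one reason analyses often fit it flexibly or restrict attention to observations near the cut-point.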