## Abstract

Observational studies, including the case-control design frequently used in epidemiology, are subject to a number of biases and possible confounding factors. Failure to adjust for them may lead to an erroneous conclusion about the existence of a causal relationship between exposure and disease. The Cochran–Mantel–Haenszel (CMH) test is widely used to measure the strength of the association between an exposure and disease or response, after stratifying on the observed covariates. Thus, observed confounders are accounted for in the analysis. In practice, there may be causal variables that are unknown or difficult to obtain. Hence, they are not incorporated into the analysis. Sensitivity analysis enables investigators to assess the robustness of the findings. A method for assessing the sensitivity of the CMH test to an omitted confounder is presented here. The technique is illustrated by re-examining two datasets: one concerns maternal hypertension as a risk factor for low birth weight infants and the other concerns the risk of rash associated with allopurinol use. The computer code performing the sensitivity analysis is provided in appendix A.

## 1. Introduction

Observational studies are commonly used to evaluate the effect of treatments or chemical exposure, e.g. the association between risk factors and a subsequent event. Unlike randomized experiments, researchers cannot control the assignment of treatments to subjects to ensure that subjects receiving different treatments or exposure are similar with respect to major covariates. The Cochran–Mantel–Haenszel (CMH; Cochran 1954; Mantel & Haenszel 1959) test is widely used to test the strength of the association after controlling for the observed confounders. The data are stratified into multiple 2×2 tables by known confounders, where in each stratum the distributions of the confounders in both groups are similar.

Since observational studies remain subject to unobserved confounding, one should explore how sensitive the results are to a possible omitted variable. Extensive work has been done on how to perform sensitivity analysis for observational data by Cornfield *et al*. (1959), Rosenbaum & Rubin (1983, 1984), Rosenbaum & Krieger (1990), Gastwirth (1992), Rosenbaum (2002) and others. Most of the literature models the association between an unobserved confounder, denoted by *U*, and treatment assignment, and then recomputes the test statistics and *p* values for a range of plausible differences in the prevalences of the confounder in the two groups. If a moderate degree of confounding could change the inference about the treatment effect on the outcome, then one questions the soundness of the conclusions. Gastwirth *et al*. (1998) proposed a dual and simultaneous sensitivity analysis for matched pair data, which extended a strengthened form of the original Cornfield inequality (Gastwirth 1988, p. 296). Both the relationship between *U* and treatment assignment and that between *U* and response are jointly modelled. This simultaneous form allows one to incorporate subject matter knowledge about both relationships into the sensitivity analysis.

This paper extends the ‘simultaneous’ approach to examine the sensitivity of the CMH test to an unobserved confounder. The rest of this paper is organized as follows. Section 2 reviews the CMH test and describes the stratified model for the simultaneous relationships. Then we apply the sensitivity analysis to a dataset examining maternal hypertension as a risk factor for low birth weight (LBW) infants and another dataset concerned with a side effect (rash) of allopurinol in §3. We discuss the potential use and future development of sensitivity analysis in observational studies in §4.

## 2. Sensitivity analysis for the CMH test

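Let *X* be the exposure, e.g. environmental hazard, genetic factor, radiation exposure; and *Y* be the outcome, e.g. disease, mortality, or in an employment discrimination case, being promoted or passing a pre-employment exam. We are interested in testing the causal relationship between *X* and *Y* using the CMH test to control for other known factors related to the response. Usually the data are organized into *J* 2×2 tables, where in each stratum the covariates have similar values. The data for the *j*th stratum (*j*=1, …, *J*) are shown below.

| | Event (*Y*=1) | No event (*Y*=0) | Total |
|---|---|---|---|
| Exposed (*X*=1) | *n*_{1j} | *m*_{1j}−*n*_{1j} | *m*_{1j} |
| Not exposed (*X*=0) | *n*_{0j} | *m*_{0j}−*n*_{0j} | *m*_{0j} |
| Total | *n*_{1j}+*n*_{0j} | *N*_{j}−*n*_{1j}−*n*_{0j} | *N*_{j} |

Let *π*_{1j} (*π*_{0j}), *j*=1, …, *J*, be the probability that a subject who is exposed (not exposed) in stratum *j* has an event. In terms of these parameters, the odds ratio between exposure and outcome in stratum *j* is

$$\theta_j = \frac{\pi_{1j}(1-\pi_{0j})}{\pi_{0j}(1-\pi_{1j})}.$$

Furthermore, assume that the odds ratio has a common value *θ*. Let (*n*_{1j}, *n*_{0j}), *j*=1, …, *J*, be the observed data for the *J* 2×2 tables. When the responses for each group in all strata follow independent binomial distributions, the full data have the product binomial likelihood function

$$L = \prod_{j=1}^{J} \binom{m_{1j}}{n_{1j}} \pi_{1j}^{n_{1j}} (1-\pi_{1j})^{m_{1j}-n_{1j}} \binom{m_{0j}}{n_{0j}} \pi_{0j}^{n_{0j}} (1-\pi_{0j})^{m_{0j}-n_{0j}}. \tag{2.1}$$

From (2.1), we have $E(n_{xj}) = m_{xj}\pi_{xj}$, $\mathrm{Var}(n_{xj}) = m_{xj}\pi_{xj}(1-\pi_{xj})$ for *x*=0, 1, and $\mathrm{Cov}(n_{1j}, n_{0j}) = 0$.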

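If all the confounders are observed and controlled by stratification, then the null hypothesis of no-exposure effect is *H*_{0}: *π*_{0j}=*π*_{1j}, *j*=1, …, *J* or *H*_{0}: *θ*=1. Cochran's (1954) statistic for testing *H*_{0}: *θ*=1 versus *H*_{1}: *θ*>1 is

$$C_0 = \frac{\sum_{j=1}^{J} w_j(\hat{\pi}_{1j}-\hat{\pi}_{0j})}{\sqrt{\sum_{j=1}^{J} w_j\,\bar{\pi}_j(1-\bar{\pi}_j)}}, \tag{2.2}$$

where *w*_{j}=*m*_{1j}*m*_{0j}/*N*_{j}, with *m*_{1j} and *m*_{0j} the numbers of exposed and non-exposed subjects in stratum *j* and *N*_{j}=*m*_{1j}+*m*_{0j}; $\hat{\pi}_{1j}$ and $\hat{\pi}_{0j}$, the observed proportions of events in the two groups, are the estimates of *π*_{1j} and *π*_{0j}, respectively; and $\bar{\pi}_j = (m_{1j}\hat{\pi}_{1j} + m_{0j}\hat{\pi}_{0j})/N_j$ is the pooled proportion of events in stratum *j*. Using the conditional central hypergeometric likelihood rather than the product binomial (2.1), Mantel & Haenszel (1959) proposed a similar test. The two tests are asymptotically equivalent and are often referred to as the CMH test. The CMH test is known to be the optimal method of combining multiple 2×2 tables with a common odds ratio (Woolson *et al*. 1986).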

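Under the null hypothesis that *θ*=1, the statistic (2.2) converges in distribution to a standard normal (Cochran 1954). More generally, the statistic

$$C = \frac{\sum_{j=1}^{J} w_j\{(\hat{\pi}_{1j}-\hat{\pi}_{0j}) - (\pi_{1j}-\pi_{0j})\}}{\sqrt{\sum_{j=1}^{J} w_j^{2}\{\pi_{1j}(1-\pi_{1j})/m_{1j} + \pi_{0j}(1-\pi_{0j})/m_{0j}\}}}, \tag{2.3}$$

where $\hat{\pi}_{xj}$ is the observed proportion of events at exposure level *x* in stratum *j*, converges to a standard normal distribution as *N*_{j}→∞ (Woolson *et al*. 1986).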
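As a minimal sketch, Cochran's statistic can be computed from stratum-level counts in a few lines of Python, using the pooled-proportion form of the null variance. The function names, the tuple layout `(n1, m1, n0, m0)` (events and group sizes for exposed and non-exposed subjects) and the example counts are illustrative assumptions; the counts are hypothetical and are not taken from either dataset analysed in this paper.

```python
import math

def cochran_statistic(tables):
    """Cochran's statistic for a list of strata.

    Each stratum is a tuple (n1, m1, n0, m0): events and group size among
    exposed subjects, then events and group size among non-exposed subjects.
    """
    num = 0.0
    var = 0.0
    for n1, m1, n0, m0 in tables:
        N = m1 + m0
        w = m1 * m0 / N                    # stratum weight w_j
        p1, p0 = n1 / m1, n0 / m0          # observed event proportions
        pbar = (n1 + n0) / N               # pooled proportion under the null
        num += w * (p1 - p0)
        var += w * pbar * (1.0 - pbar)
    return num / math.sqrt(var)

def upper_tail_p(z):
    """One-sided p value P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical two-stratum example
tables = [(10, 50, 5, 50), (8, 40, 4, 60)]
z = cochran_statistic(tables)
print(round(z, 3), round(upper_tail_p(z), 4))
```

The standard normal tail is taken from `math.erfc`, so the sketch needs nothing beyond the standard library.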

Sometimes, information on known risk factors is not available or other risk factors related to the response have not been discovered. Here we assume that there is an unobserved factor *U* and that the relationship between binary outcome *Y* and exposure *X* and confounder *U* in stratum *j* is modelled by

$$\operatorname{logit} P(Y=1 \mid X=x, U=u) = \alpha_j + \beta x + \gamma u, \tag{2.4}$$

where *α*_{j} is a stratum-specific intercept, *γ* is the effect of *U*, which is called the strength parameter (Yu & Gastwirth 2005), and *β* is the true effect of exposure *X*. The null hypothesis of no-exposure effect is *H*_{0}: *β*=0 and the alternative hypothesis is *H*_{1}: *β*≠0.

The unobserved variable *U* may also be associated with the exposure. Assuming that *U* is binary, the prevalence of *U* by exposure level in stratum *j* is modelled by

$$\operatorname{logit} P(U=1 \mid X=x) = \eta_j + \delta_j x. \tag{2.5}$$

Here, the *δ*_{j}, indicating the association between *X* and *U* in each stratum, are called the imbalance parameters (Yu & Gastwirth 2005). To simplify the calculation, we assume that the imbalance parameters *δ*_{j} are equal to *δ*. Then, $q_j = \exp(\eta_j)/\{1+\exp(\eta_j)\}$ is the prevalence of *U* in the non-exposed group in stratum *j*. When the confounder *U* is omitted in the analysis, the marginal probability of response at each exposure level *x* in stratum *j* becomes

$$\pi_{xj} = P(Y=1 \mid X=x) = \sum_{u=0}^{1} P(Y=1 \mid X=x, U=u)\,P(U=u \mid X=x),$$

where the two factors are given by (2.4) and (2.5).

The true null hypothesis of no-exposure effect is *H*_{0}: *β*=0, which is really the hypothesis of interest. Note that *H*_{0}: *π*_{0j}=*π*_{1j}, *j*=1, …, *J*, and *H*_{0}: *β*=0 are not equivalent when there is an omitted confounder. If *U* is not a confounder, i.e. *δ*=0 or *γ*=0, then the equalities *π*_{1j}=*π*_{0j}, *j*=1, …, *J*, still hold under the null hypothesis *H*_{0}: *β*=0, so that the CMH test *C*_{0} is indeed testing the effect of exposure. When *U* is indeed a confounder, it has been shown that, if both *γ*>0 and *δ*>0, then *π*_{1j}>*π*_{0j}, *j*=1, …, *J*, even when the true null hypothesis holds (Yu & Gastwirth 2005), and the CMH test would be biased. This implies that confounder *U* could induce a spurious positive association between exposure and outcome. However, if *γ*>0 and *δ*<0, then the true association between *X* and *Y* will be underestimated if the confounder is omitted.
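The direction of the bias can be made explicit with a short worked derivation. It is a sketch under the assumption that (2.4) and (2.5) are logistic models with stratum-specific intercepts, written here as $\alpha_j$ and $\eta_j$, and a common imbalance parameter $\delta$; these intercept symbols and the shorthand $\operatorname{expit}(t) = 1/(1+e^{-t})$ and $p_{xj}$ are introduced only for this illustration.

$$
\begin{aligned}
p_{xj} &= P(U=1 \mid X=x) = \operatorname{expit}(\eta_j + \delta x),\\
\pi_{xj}\big|_{\beta=0} &= p_{xj}\,\operatorname{expit}(\alpha_j+\gamma) + (1-p_{xj})\,\operatorname{expit}(\alpha_j),\\
\pi_{1j}-\pi_{0j}\big|_{\beta=0} &= \{\operatorname{expit}(\alpha_j+\gamma) - \operatorname{expit}(\alpha_j)\}\,(p_{1j}-p_{0j}).
\end{aligned}
$$

The first factor has the sign of $\gamma$ and the second has the sign of $\delta$. Under $\beta=0$ the difference is therefore positive when $\gamma$ and $\delta$ have the same sign, negative when they have opposite signs, and zero when either parameter is zero.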

If the strength parameter *γ*, the imbalance parameter *δ* and the prevalences *q*_{1}, …, *q*_{J} of the confounder are known or can be reliably estimated from other sources, the estimates of *π*_{1j} and *π*_{0j} can be calculated from the likelihood (2.1) under the null hypothesis *H*_{0}: *β*=0 (see appendix A). Then the correct CMH statistic is obtained by substituting these estimates for *π*_{1j} and *π*_{0j} in (2.3), as the effect of *U* is now incorporated into the response probabilities *π*_{xj}. In practice these quantities are unknown, so one assumes a possible range for (*q*_{1}, …, *q*_{J}, *γ*, *δ*); the test statistic and the *p* value are calculated using (2.3). A table showing the *p* values with respect to different parameters for confounding can be used to assess the plausibility that the current inference could be altered due to the effect of an unobserved factor.
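The procedure described above can be sketched in Python as follows. This is not the code of appendix A. It is a minimal illustration under stated assumptions: logistic forms for (2.4) and (2.5) with a stratum-specific intercept `alpha`, a null fit of `alpha` by a grid search with golden-section refinement (which presumes the log-likelihood is well behaved on the search interval), and a standardized statistic formed from the fitted null probabilities. All function names and the example counts are hypothetical.

```python
import math

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))

def marginal_probs(alpha, gamma, delta, q):
    """(pi0, pi1) = P(Y=1 | X=x) under beta = 0 with binary U summed out.
    Assumes 0 < q < 1, where q is the prevalence of U when X = 0."""
    eta = math.log(q / (1.0 - q))
    probs = []
    for x in (0, 1):
        pu = expit(eta + delta * x)          # P(U=1 | X=x)
        probs.append(pu * expit(alpha + gamma) + (1.0 - pu) * expit(alpha))
    return probs[0], probs[1]

def neg_loglik(alpha, table, gamma, delta, q):
    """Negative product-binomial log-likelihood for one stratum (constants dropped)."""
    n1, m1, n0, m0 = table
    pi0, pi1 = marginal_probs(alpha, gamma, delta, q)
    return -(n1 * math.log(pi1) + (m1 - n1) * math.log(1.0 - pi1)
             + n0 * math.log(pi0) + (m0 - n0) * math.log(1.0 - pi0))

def fit_alpha(table, gamma, delta, q, lo=-12.0, hi=12.0, step=0.1):
    """Null fit of the intercept: coarse grid, then golden-section refinement."""
    n_grid = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_grid + 1)]
    best = min(grid, key=lambda a: neg_loglik(a, table, gamma, delta, q))
    a, b = best - step, best + step
    r = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(80):
        c, d = b - r * (b - a), a + r * (b - a)
        if neg_loglik(c, table, gamma, delta, q) < neg_loglik(d, table, gamma, delta, q):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

def adjusted_cmh(tables, gamma, delta, q):
    """Standardized statistic at the null-fitted probabilities.
    Returns (z, two-sided p value). q may be a number or one value per stratum."""
    if not isinstance(q, (list, tuple)):
        q = [q] * len(tables)
    num = var = 0.0
    for table, qj in zip(tables, q):
        n1, m1, n0, m0 = table
        alpha = fit_alpha(table, gamma, delta, qj)
        pi0, pi1 = marginal_probs(alpha, gamma, delta, qj)
        w = m1 * m0 / (m1 + m0)
        num += w * ((n1 / m1 - n0 / m0) - (pi1 - pi0))
        var += w * w * (pi1 * (1.0 - pi1) / m1 + pi0 * (1.0 - pi0) / m0)
    z = num / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical counts (n1, m1, n0, m0) per stratum; C = exp(gamma), D = exp(delta)
tables = [(10, 50, 5, 50), (8, 40, 4, 60)]
for C, D in [(1, 1), (2, 2), (4, 4)]:
    z, p = adjusted_cmh(tables, math.log(C), math.log(D), q=0.2)
    print(C, D, round(z, 3), round(p, 4))
```

When `gamma` or `delta` is zero the fitted probabilities coincide in the two exposure groups and the statistic reduces to the unadjusted pooled-variance form, which gives a convenient check on any implementation. Looping over a grid of `C`, `D` and `q` values yields a sensitivity table of the kind described above.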

## 3. Application

The proposed sensitivity analysis is applied to data from two observational studies. In these applications, *C*=exp (*γ*) is the odds ratio measuring the effect of *U* on the binary response *Y*, and *D*=exp (*δ*) is the odds ratio of *U* being present for a one-unit increase in *X*.

### (a) Sensitivity analysis of maternal hypertension as a risk factor for LBW infants

Data on 500 singleton births in a London hospital (Hills & De Stavola 2002) were used to examine the sensitivity of the association between maternal hypertension and LBW. The dataset provides the following eight variables for each birth: identity number for mother and baby, birth weight of the baby, indicator for birth weight less than 2500 g, gestation period, indicator for gestation period less than 37 weeks, maternal age, indicator for maternal hypertension and sex of the baby (1, male; 2, female). Ten mothers with missing gestation period information were dropped from the analysis. The exposure of interest (*X*) is maternal hypertension and the event of interest (*Y*) is an LBW infant.

A mother is classified as being of ‘advanced’ maternal age if she will have reached her 35th birthday by the delivery date. The mothers are divided into ‘young’ and ‘advanced’ age groups according to this criterion. The numbers of infants with normal birth weight (NBW) and LBW by sex, gestation status, maternal age and maternal hypertension are shown in table 1. The CMH test is significant with *p*=0.015 and the odds ratio is 2.74 with 95% CI (1.23, 6.14). The Breslow–Day test for homogeneity shows that the odds ratio estimates are homogeneous across different strata (*p*=0.785). Is it safe to draw the conclusion that maternal hypertension is causally related to LBW?

Despite substantial reductions in US infant mortality during the past several decades, black–white disparities in infant mortality rates persist. An important determinant of racial/ethnic differences in infant mortality is LBW, defined as a birth weight of less than 2500 g (US Department of Health and Human Services 2000). From 1980 to 2000, the number of LBW infants per 100 live births was 5.7–6.5 for whites and 12.7–13.0 for blacks (Centers for Disease Control and Prevention 2002; http://www.cdc.gov/mmwr/preview/mmwrhtml/mm5127a1.htm). Hence, race is an important risk factor for LBW, with an apparent odds ratio of approximately 2.0–2.4 (David & Collins 1997). A recent study (Geronimus *et al*. 2007) also showed that the odds ratio of hypertension for black relative to white women is 2.11–4.04 between 15 and 65 years of age. LBW is also related to many other risk factors, e.g. the use of assisted reproductive technology and the use of alcohol or drugs. Here, we focus on race as the unobserved confounder (*U*) because it is a recognized risk factor. We assume that a mother's race is not related to the sex of her baby, but that the percentage of mothers who are black varies across the maternal age groups as well as gestation term. The possible scenarios for the percentage of black mothers by maternal age and gestation term are shown in table 2. In scenarios 1 and 2, we assume that a higher percentage of the mothers with preterm babies are black. In scenarios 3 and 4, we assume that the percentage of black mothers is higher in the ‘young’ maternal age group. In scenario 5, black mothers tend to be younger and to have more preterm babies than white mothers.
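The apparent odds ratio quoted in the paragraph above can be checked with a line of arithmetic. The sketch below reads the quoted LBW figures as percentages of live births and pairs the lower ends and the upper ends of the two ranges; both the denominator and the pairing are assumptions made for illustration, and the helper name `odds_ratio` is hypothetical.

```python
def odds_ratio(p_group, p_reference):
    """Odds ratio comparing two event probabilities."""
    return (p_group / (1.0 - p_group)) / (p_reference / (1.0 - p_reference))

# LBW proportions: (black, white) at the lower and upper ends of the quoted ranges
or_low_ends = odds_ratio(0.127, 0.057)
or_high_ends = odds_ratio(0.130, 0.065)
print(round(or_low_ends, 2), round(or_high_ends, 2))
```

Both values fall between roughly 2.1 and 2.4, in line with the range cited for race as a risk factor, which motivates the values of *C* considered in the sensitivity analysis.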

Table 3 shows the two-sided *p* values of the adjusted CMH test with respect to different values of *C* and *D* for scenarios 1–5. Note that the strength parameter *C*=exp (*γ*) is the odds ratio of a black mother having an LBW baby compared with a white mother. The imbalance parameter *D*=exp (*δ*) is the odds ratio of a mother being black if she has hypertension compared with a mother without hypertension. When *C*=1 or *D*=1, race is not a confounder and the CMH test is unchanged.

Another way to perform the sensitivity analysis is to obtain the threshold values of the sensitivity parameters *C* and *D* that increase the *p* value of the CMH test to 0.05. Figure 1 plots the combinations of *C* and *D* that correspond to *p*=0.05. From figure 1, we see that when *C*=1.5 and *D*=1.1, the two-sided *p* value reaches the 0.05 level. Thus, both table 3 and figure 1 show that the *p* value of the CMH test is sensitive to moderate changes in *C* and *D*, indicating that the significant association between maternal hypertension and LBW might have arisen from a hidden bias due to confounding. Hence, other studies that include race and ethnicity information are needed before concluding that maternal hypertension causes LBW. If such studies demonstrate that the excess of hypertension among black mothers is slight, with an odds ratio less than 1.5, then the significance of the London study would not be affected by the omission of race.

Although in Rubin's causal model, sex and race are not used as ‘treatment’ variables in causal inference because it is difficult to ascribe causality to variables that cannot be altered (Holland 1986), here race is used as a proxy for other factors reflecting socio-economic disparities. More recently, race and sex have been used in the propensity score method to form ‘similar strata’ (Zanutto *et al*. 2005) but should not be the main potential outcome of a study as they cannot be altered.

### (b) Sensitivity analysis of allopurinol as a cause of rash

Rosenbaum (2002) examined the sensitivity of a study indicating that allopurinol can cause rash (Boston Collaborative Drug Project 1972). Allopurinol is commonly used for the treatment and prevention of attacks of gout and certain types of kidney stones. It is also used to treat elevated uric acid levels in the blood and urine, which can occur in patients receiving chemotherapy for the treatment of leukaemia, lymphoma and other types of cancer. Allopurinol is usually well tolerated by most patients; however, some may experience possible side effects of skin rash, hives and itching.

Table 4 shows the joint frequencies of drug use and rash for males and females, respectively. The CMH test is highly significant with *p*<0.0001. In a randomized trial or in the absence of unobserved confounders, this would be strong evidence that allopurinol causes rash (Rosenbaum 2002, p. 131). Because the patients could also be taking amoxicillin and ampicillin, which are known to cause rash, is it possible that the rash was caused by an unobserved factor, e.g. use of amoxicillin or other drugs, instead of allopurinol?

Although information about other drug use and the users' allergic history was not collected, one can assess the sensitivity of the CMH test with respect to an unobserved confounder. Let *U* be the binary indicator for amoxicillin use, i.e. 1 for amoxicillin use and 0 for no amoxicillin use. Again, the prevalences of amoxicillin use among the non-allopurinol users are denoted by *q*_{j}, *j*=1, …, *J*. The parameter *δ* indicates the imbalance of *U* between allopurinol users and non-users. The parameter *γ* indicates the increase in the log odds of rash among users of amoxicillin. We assume two possible scenarios for the prevalences of amoxicillin use among the non-allopurinol users, i.e. *q*_{j}=10% or 20%, where *j*=1, 2.

The odds ratio of having rash for amoxicillin users compared with non-users is not known, but it is likely to be below 4. Based on table 5, we can see that the CMH test of allopurinol causing rash is not sensitive to the omitted variable (use of amoxicillin). Even when the strength parameter *C*=4 and the imbalance parameter *D*=4, the CMH test remains highly significant with *p*=0.002. This example also illustrates the additional information provided by the simultaneous sensitivity analysis compared with the earlier approaches by Cornfield *et al*. (1959) and Rosenbaum (2002), which focus primarily on the imbalance parameter and assume a confounder with infinitely large strength. Rosenbaum (2002, p. 131) used the parameter *Γ*, which is similar to *D* in our analysis, to measure the imbalance of treatment assignment with respect to the unobserved factor. The upper bounds on the *p* values based on his analysis are 0.036 and 0.30, for *Γ*=2 and 3, respectively, which indicates that the causal relationship between allopurinol use and rash could have arisen from an unobserved confounder that triples the risk of rash. While such confounding is unlikely to exist, the conclusion based on the simultaneous analysis is stronger. Even when *C*=∞, the test barely reached non-significance for *D*=3 and *q*_{j}=10%. This sensitivity analysis is more stringent, indicating that the conclusion that allopurinol causes rash is very robust to an unobserved confounder. This is consistent with the current medical practice that lists rash as the one common side effect of allopurinol use (http://www.nlm.nih.gov/medlineplus/druginfo/medmaster/a682673.html).

## 4. Discussion

Because the CMH test is widely used to analyse the data obtained in epidemiological studies, legal cases and social sciences, it is important to conduct a sensitivity analysis to assess whether an inference could *plausibly* be explained by an unobserved confounder before drawing a firm conclusion.

The proposed simultaneous sensitivity analysis approach is similar to the primal sensitivity analysis method by Rosenbaum (2002). The major difference is the introduction of the strength parameter. The primal sensitivity analyses closely parallel the theory of randomized experiments and have been developed for many statistical tests and estimators, e.g. the McNemar and Wilcoxon signed-rank tests and the Hodges–Lehmann estimator (Rosenbaum 2002). The simultaneous approach allows one to use more subject matter knowledge. However, it also requires more assumptions about both relationships. The simultaneous approach is especially useful when an observational study is challenged owing to failure to control for a particular unobserved variable, when there are plausible ranges for the values of *γ* and *δ* (Gastwirth *et al*. 1998).

At the design stage, it is also useful to examine the size and power of the corrected CMH test in the presence of omitted confounders over a range of realistic values of the strength and imbalance parameters. The calculation of size and power for the Cochran–Armitage trend test has been derived by Yu & Gastwirth (2005). A similar technique can be used for the CMH test with an omitted variable.

In future studies, we plan to extend this methodology of sensitivity analysis to data where the unobserved confounders are continuous and ordinal. In real-world observational studies, there may be more than one omitted confounder; background information on the joint distribution of the multiple confounders can then be used to replace the univariate distribution (2.5) in a sensitivity analysis.

## Acknowledgments

B.Y. was supported in part by the Intramural Research Programme of the National Institute on Aging and J.L.G. was supported in part by grant SES-0317956 from the National Science Foundation.

## Footnotes

One contribution of 13 to a Theme Issue ‘Mathematical and statistical methods for diagnoses and therapies’.

- © 2008 The Royal Society