Effects of Categorizing Continuous Variables in Decision-Analytic Models
Published in: Medical Decision Making, v. 29, no. 5, Sep./Oct. 2009, p. 549-556
Posted on RAND.org on December 31, 2008
PURPOSE. When using continuous predictor variables in discrete-state Markov modeling, it is necessary to create categories of risk and assume homogeneous disease risk within categories, which may bias model outcomes. This analysis assessed the tradeoffs between model bias and complexity and/or data limitations when categorizing continuous risk factors in Markov models.

METHODS. The authors developed a generic Markov cohort model of disease, defining bias as the percentage change in life expectancy gain from a hypothetical intervention when using 2 to 15 risk factor categories as compared with modeling the risk factor as a continuous variable. They evaluated the magnitude and sign of bias as a function of disease incidence, disease-specific mortality, and relative difference in risk among categories.

RESULTS. Bias was positive in the base case, indicating that categorization overestimated life expectancy gains. The bias approached zero as the number of risk factor categories increased and did not exceed 4% for any parameter combinations or numbers of categories considered. For any given disease-specific mortality and disease incidence, bias increased with relative risk of disease. For any given relative risk, the relationship between bias and parameters such as disease-specific mortality or disease incidence was not always monotonic.

CONCLUSIONS. Under the assumptions of a normally distributed risk factor, reasonable disease risk, and moderate values for the relative risk of disease given risk factor category, categorizing continuously valued risk factors in Markov models is associated with less than 4% absolute bias when at least 2 categories are used.
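
The bias measure described under METHODS can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the study: the three-state structure (healthy, diseased, dead), all parameter values, the log-linear link between the risk factor and disease incidence, the equal-probability categories with risk pooled to the category mean, and the use of a fine grid as a stand-in for the continuous risk factor. Its output should not be expected to match the published figures.

```python
from statistics import NormalDist


def life_expectancy(p_disease, p_die_other=0.02, p_die_disease=0.10, cycles=100):
    """Life-years over a finite horizon for a cohort that starts healthy.

    Hypothetical three-state Markov cohort model: healthy, diseased, dead.
    All transition probabilities are illustrative assumptions.
    """
    healthy, sick, life_years = 1.0, 0.0, 0.0
    survive = 1.0 - p_die_other
    for _ in range(cycles):
        life_years += healthy + sick  # fraction alive at the start of the cycle
        healthy, sick = (
            healthy * survive * (1.0 - p_disease),
            healthy * survive * p_disease + sick * survive * (1.0 - p_die_disease),
        )
    return life_years


def fine_grid_risks(n_fine=300, base_risk=0.01, rr_per_sd=2.0):
    """Per-cycle disease risk on a fine grid of a standard normal risk factor.

    The fine grid stands in for the continuous risk factor (assumption).
    """
    z = NormalDist()
    return [
        min(1.0, base_risk * rr_per_sd ** z.inv_cdf((i + 0.5) / n_fine))
        for i in range(n_fine)
    ]


def le_gain(risks, efficacy=0.3):
    """Mean life expectancy gain when an intervention cuts disease risk."""
    n = len(risks)
    untreated = sum(life_expectancy(p) for p in risks) / n
    treated = sum(life_expectancy(p * (1.0 - efficacy)) for p in risks) / n
    return treated - untreated


def categorize(risks, n_categories):
    """Pool the sorted fine-grid risks into categories with homogeneous risk."""
    n = len(risks)
    pooled = []
    for c in range(n_categories):
        members = risks[n * c // n_categories : n * (c + 1) // n_categories]
        pooled.extend([sum(members) / len(members)] * len(members))
    return pooled


def bias_percent(n_categories, risks):
    """Percentage change in life expectancy gain relative to the fine grid."""
    reference = le_gain(risks)
    return 100.0 * (le_gain(categorize(risks, n_categories)) - reference) / reference


if __name__ == "__main__":
    risks = fine_grid_risks()
    for k in (2, 5, 15):
        print(k, "categories:", round(bias_percent(k, risks), 3), "% bias")
```

Pooling to the category mean is only one way to assign a homogeneous risk to a category; using the risk at the category midpoint instead would be an equally defensible choice and could change the sign and size of the result.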