Performance of Comorbidity, Risk Adjustment, and Functional Status Measures in Expenditure Prediction for Patients With Diabetes

  1. Matthew L. Maciejewski, PHD (1,2),
  2. Chuan-Fen Liu, PHD (3,4) and
  3. Stephan D. Fihn, MD (3,4,5)
  1. Health Services Research and Development, Durham VA Medical Center, Department of Veterans Affairs, Durham, North Carolina
  2. Division of Pharmaceutical Outcomes and Policy, School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
  3. Health Services Research and Development, VA Puget Sound Health Care System, Department of Veterans Affairs, Puget Sound, Washington
  4. Department of Health Services, University of Washington, Seattle, Washington
  5. Department of Medicine, University of Washington, Seattle, Washington
  1. Corresponding author: Matthew L. Maciejewski, matthew.maciejewski{at}


OBJECTIVE—To compare the ability of generic comorbidity and risk adjustment measures, a diabetes-specific measure, and a self-reported functional status measure to explain variation in health care expenditures for individuals with diabetes.

RESEARCH DESIGN AND METHODS—This study included a retrospective cohort of 3,092 diabetic veterans participating in a multisite trial. Two comorbidity measures, four risk adjusters, a functional status measure, a diabetes complication count, and baseline expenditures were constructed from administrative and survey data. Outpatient, inpatient, and total expenditure models were estimated using ordinary least squares regression. Adjusted R2 statistics and predictive ratios were compared across measures to assess overall explanatory power and explanatory power of low- and high-cost subgroups.

RESULTS—Administrative data–based risk adjusters performed better than the comorbidity, functional status, and diabetes-specific measures in all expenditure models. The diagnostic cost groups (DCGs) measure had the greatest predictive power overall and for the low- and high-cost subgroups, while the diabetes-specific measure had the lowest predictive power. A model with DCGs and the diabetes-specific measure modestly improved predictive power.

CONCLUSIONS—Existing generic measures can be useful for diabetes-specific research and policy applications, but more predictive diabetes-specific measures are needed.

Comorbidity and risk adjustment measures are routinely used for outcomes assessment (1,2), outlier identification (3,4), performance evaluation and profiling (5–8), program evaluation, and payment setting (9). Comorbidity adjustment gained prominence when the diagnosis-based Charlson Comorbidity Index (CCI) was developed in 1987 to predict inpatient mortality (10). Subsequent comorbidity and self-reported functional status measures have improved upon the CCI for prediction of clinical and economic outcomes in general samples taken from larger patient populations that are not disease specific and from disease-specific samples (1,11–13).

Risk adjustment measures incorporating demographic, comorbidity, disease severity, or functional status information have been used to improve expenditure prediction for payment setting and to reduce health plan and provider incentives to enroll the healthiest patients (14). Choosing among the several existing, validated generic comorbidity and risk adjustment measures for an expenditure analysis is not straightforward. When choosing a measure, researchers must consider the purpose of the measure in the planned analysis, the outcome and population on which the measure's original validation was conducted, the measure's predictive power, and availability of data to construct the measure.

If the expenditure analysis is conducted on a general sample, then prior comparisons of generic measures can inform measure choice (13,15). If the expenditure analysis is conducted on a disease-specific sample, existing disease-specific measures must also be considered. Disease-specific measures may capture variation in disease severity more effectively, include data sources not captured in comorbidity or risk adjustment measures (e.g., lab data), and avoid ceiling or floor effects of generic measures (16).

The purpose of this study is to compare the predictive power of seven generic measures, one diabetes-specific measure, and baseline health care expenditures among diabetic individuals (15). The seven generic measures include two comorbidity measures (the CCI and the Seattle Index of Comorbidity [SIC]), four risk adjusters (adjusted clinical groups [ACGs], diagnostic cost groups [DCGs], the Chronic Illness and Disability Payment System [CDPS], and RxRisk), and a self-reported functional status measure (combined physical component summary [PCS] and mental component summary [MCS] scores from the short-form [SF]-36). We chose a diabetic cohort because diabetes is a highly prevalent, costly chronic condition and there are several diabetes-specific measures (17–19). For this analysis, we chose a diabetes complications count because other measures require data not captured in this study.

We also examined the accuracy of the expenditure predictions for low- and high-cost subgroups to illustrate potential tradeoffs between overall and subgroup predictive power. This is the first report to examine the potential tradeoffs in generic and disease-specific measures for risk adjustment; the criteria for evaluating measures and the relative ranking of the measures’ predictive power can inform expenditure analyses in other disease-specific samples.


RESEARCH DESIGN AND METHODS

The sample was drawn from patients in primary care clinics from seven Veterans Health Administration medical centers that participated in the Ambulatory Care Quality Improvement Project (ACQUIP) trial between 1997 and 2000 (19). Veterans were eligible to participate in the trial if they had visited their assigned primary care provider at one of the seven participating sites in the year before the beginning of the study. A total of 21,260 (62%) of 34,103 eligible veterans from all seven participating sites responded to a health-screening questionnaire and were sent a baseline SF-36 (20) (Fig. 1). Of 21,260 veterans, 4,790 had been told by a physician that they had diabetes (of either type) and provided complete SF-36 data. Diabetes-specific surveys were available from 2,287 of 4,790 veterans. We imputed SF-36 scores for another 805 veterans with diabetes who had incomplete or missing baseline SF-36 data, using either the average values from two SF-36 surveys completed by the patient before the baseline survey or a carry-forward of the SF-36 values from the closest survey before baseline. Imputed values did not significantly alter the mean PCS and MCS scores from the SF-36 in our sample. The final sample included 3,092 veterans with complete data.
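The imputation rule described above can be sketched roughly as follows. The function name and the input layout are hypothetical, and the choice to average the two most recent prior surveys when two or more exist (and to carry forward otherwise) is an assumed reading of the text, not a documented detail of the study.

```python
from typing import Optional, Sequence


def impute_baseline_score(prior_scores: Sequence[float]) -> Optional[float]:
    """Impute a missing baseline PCS or MCS score from earlier surveys.

    prior_scores holds the patient's scores from surveys completed before
    baseline, ordered from oldest to most recent (hypothetical layout).
    """
    if len(prior_scores) >= 2:
        # Assumed reading: average the two most recent pre-baseline surveys.
        return (prior_scores[-2] + prior_scores[-1]) / 2.0
    if len(prior_scores) == 1:
        # Carry forward the closest survey before baseline.
        return prior_scores[-1]
    return None  # nothing available to impute from


# Sample flow reported in the text: 2,287 observed + 805 imputed = 3,092.
assert 2287 + 805 == 3092

print(impute_baseline_score([30.0, 34.0]))  # average of two prior surveys
print(impute_baseline_score([41.5]))        # carry-forward of a single survey
```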

Construction of health care expenditures

The dependent variables were inpatient, outpatient, and total expenditures constructed for the year following the baseline year to enable prospective risk adjustment. We calculated an index date for each respondent indicating the end of the baseline year and the beginning of the follow-up year. Patient-level expenditure data were calculated from Veterans Affairs (VA) administrative data in the outpatient care files and inpatient treatment files.

The method for deriving expenditures is described elsewhere (15). Briefly, outpatient expenditures were the product of visits and unit costs for outpatient clinics, including outpatient lab tests and procedures. Inpatient expenditures for nonmedical and nonsurgical hospitalizations (e.g., psychiatry) were the product of length of stay and a per diem rate. Inpatient expenditures for medical or surgical hospitalizations were calculated using an expenditure function based on age, sex, discharge disposition, bed section(s), length of stay, and Medicare diagnosis-related group (DRG) weights. Outpatient VA pharmacy and non-VA expenditures were excluded. For individuals who died in the prediction year, the dependent variables were annualized using mortality weights based on the fraction of the prediction year in which the individual was alive.
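As a rough illustration of the arithmetic described above, the sketch below computes the two product-based expenditure components and an annualized value. All names, clinic labels, and dollar figures are hypothetical. The annualization step (dividing observed expenditures by the fraction of the year alive and keeping that fraction as a regression weight) is a common convention and is assumed here; the text does not spell out the exact formula.

```python
def outpatient_expenditure(visits_by_clinic, unit_cost_by_clinic):
    """Sum of visits times unit cost across outpatient clinic types."""
    return sum(n * unit_cost_by_clinic[clinic]
               for clinic, n in visits_by_clinic.items())


def per_diem_inpatient_expenditure(length_of_stay_days, per_diem_rate):
    """Nonmedical, nonsurgical stays: length of stay times a per diem rate."""
    return length_of_stay_days * per_diem_rate


def annualize(observed_expenditure, days_alive, days_in_year=365):
    """Return (annualized expenditure, weight) for a partially observed year.

    Assumption: expenditure is scaled up by 1 / fraction alive, and the
    fraction alive is kept as the observation's weight.
    """
    fraction_alive = days_alive / days_in_year
    if not 0 < fraction_alive <= 1:
        raise ValueError("days_alive must be in (0, days_in_year]")
    return observed_expenditure / fraction_alive, fraction_alive


# Hypothetical patient: 4 primary care visits, 2 lab visits, a 10-day stay.
outpatient = outpatient_expenditure(
    {"primary_care": 4, "lab": 2},
    {"primary_care": 100.0, "lab": 25.0},
)
inpatient = per_diem_inpatient_expenditure(10, 500.0)
annualized, weight = annualize(outpatient + inpatient, days_alive=146)
print(outpatient, inpatient, annualized, weight)
```

Under this convention, a patient observed for a short part of the year gets a large annualized value but a small weight, so the weighted mean still reflects expenditures actually incurred.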

Comorbidity, risk adjustment, functional status, and diabetes complication measures

Seven generic measures, one diabetes-specific measure, and baseline expenditures were used to predict inpatient, outpatient, and total expenditures. The two generic comorbidity measures were the CCI and the SIC. The CCI was originally developed as a means of classifying the number and seriousness of comorbid conditions to predict 1-year mortality based on diagnoses from medical charts (10). We used the Deyo modification of the CCI, constructed from inpatient diagnoses in VA administrative data (21). The SIC is based on self-reported chronic condition indicators, age, and smoking status from the ACQUIP initial screening questionnaire. The SIC was developed to predict 2-year mortality and hospital admission (22) and predicted expenditures poorly in the overall ACQUIP sample (15).

The four risk adjustment measures were DCGs, ACGs, the CDPS, and RxRisk scores constructed from VA administrative data. The DCGs, ACGs, and CDPS are based on both inpatient and outpatient diagnoses, while the RxRisk is based on medication refill data. DCGs were originally developed to predict Medicare expenditures, are the most widely implemented risk adjuster in payment settings (9), and have predicted total VA expenditures better than other risk adjusters (15). ACGs (14) are another widely used risk adjustment measure for expenditure analyses and have been applied in VA expenditure analyses (6,15). The CDPS measure was developed to predict Medicaid expenditures and was included because the source code is publicly available and VA and Medicaid patients have similar chronicity (23). The RxRisk was developed to predict primary care visits, hospitalizations, and expenditures in VA and managed-care populations (3,6,13,15); this study used the RxRisk tailored to the VA population (RxRisk-V) (24). Standardized risk scores for the ACG, DCG, CDPS, and RxRisk measures were calculated by dividing each patient's predicted score by the average total expenditure for a reference population (veterans in the ACQUIP study).
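The standardization step can be illustrated with a minimal sketch. The function name and the dollar values are hypothetical; the only logic taken from the text is the division of each patient's predicted score by the reference population's average total expenditure.

```python
def standardized_risk_scores(predicted_scores, reference_mean_expenditure):
    """Divide each predicted score by the reference population's mean.

    A standardized score above 1.0 marks a patient expected to cost more
    than the average member of the reference population.
    """
    if reference_mean_expenditure <= 0:
        raise ValueError("reference mean must be positive")
    return [score / reference_mean_expenditure for score in predicted_scores]


# Hypothetical predicted scores in dollars and a hypothetical reference mean.
scores = standardized_risk_scores([2500.0, 5000.0, 10000.0], 5000.0)
print(scores)
```

Because the reference population here is the full ACQUIP cohort rather than the diabetic subsample, a subsample mean above 1.0 simply indicates higher expected cost than the cohort average.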

The functional status measures (PCS and MCS scores [20]) were included in this analysis because of their use in previous risk adjustment studies. The PCS and MCS are standardized to a general population mean of 50 (SD 10); higher scores indicate better health. Baseline total expenditures were also modeled because they are highly correlated with expenditures in the next year and were used to control for unobserved health status differences in expenditure analyses until risk adjustment measures became widely available. The diabetes-specific measure was a count of diabetes complications, a diagnosis-based measure covering retinopathy, nephropathy, neuropathy, cerebrovascular disease, cardiovascular disease, peripheral vascular disease, and metabolic complications that was originally developed to explain mortality and the number of hospitalizations in a 4-year period (18).
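A complications count of this kind reduces to counting how many of the seven categories a patient has on record. The sketch below assumes the diagnosis codes have already been mapped to category labels; that mapping belongs to the original measure (18) and is not reproduced here, and the label strings are hypothetical.

```python
COMPLICATION_CATEGORIES = frozenset({
    "retinopathy",
    "nephropathy",
    "neuropathy",
    "cerebrovascular",
    "cardiovascular",
    "peripheral_vascular",
    "metabolic",
})


def complication_count(patient_categories):
    """Count distinct diabetes complication categories present for a patient.

    patient_categories is any iterable of category labels derived from the
    patient's diagnoses; labels outside the seven categories are ignored.
    """
    return len(COMPLICATION_CATEGORIES & set(patient_categories))


# Hypothetical patient with two complications and one unrelated condition.
print(complication_count(["retinopathy", "nephropathy", "hypertension"]))
```

The count can only take values from 0 to 7, which is one reason a measure like this has a narrow range compared with continuous risk scores.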


We regressed expenditures in prediction year t against each risk adjuster and age-sex categories at year t − 1 using weighted ordinary least squares. The 14 age-sex categories used in each model included women aged 18–44, 45–64, and >65 years (broader bands were used for women to ensure sufficient cell sizes) and men aged 18–44, 45–49, 50–54, 55–59, 60–64, 65–69, 70–74, 75–79, 80–84, and >85 years.
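A minimal sketch of this kind of model, written in plain Python so it runs without extra packages, is shown below. It is not the authors' code: the function names, the two age-sex labels, the weights, and the toy data are all hypothetical, and the data are constructed to be exactly linear so the recovered coefficients are easy to check.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def design_row(risk_score, category, categories):
    """Intercept, risk score, and dummies for all but the first category."""
    dummies = [1.0 if category == c else 0.0 for c in categories[1:]]
    return [1.0, risk_score] + dummies


def weighted_ols(rows, y, weights):
    """Solve the weighted normal equations (X'WX) beta = X'Wy."""
    k = len(rows[0])
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * r[i] * yi for r, yi, w in zip(rows, y, weights))
            for i in range(k)]
    return solve_linear_system(xtwx, xtwy)


def adjusted_r2(rows, y, weights, beta):
    """Adjusted R2 = 1 - (SSE/SST)(n - 1)/(n - k), k counting the intercept."""
    n, k = len(y), len(beta)
    y_bar = sum(w * yi for yi, w in zip(y, weights)) / sum(weights)
    sse = sum(w * (yi - sum(b * x for b, x in zip(beta, r))) ** 2
              for r, yi, w in zip(rows, y, weights))
    sst = sum(w * (yi - y_bar) ** 2 for yi, w in zip(y, weights))
    return 1.0 - (sse / sst) * (n - 1) / (n - k)


# Toy data (hypothetical): two age-sex groups, baseline-year risk scores,
# and prediction-year expenditures built as 1000 + 2000*score + 500*dummy.
categories = ["men 60-64", "men 65-69"]
scores = [0.5, 1.0, 1.5, 2.0, 0.8, 1.2]
groups = ["men 60-64", "men 65-69", "men 60-64",
          "men 65-69", "men 65-69", "men 60-64"]
weights = [1.0, 1.0, 1.0, 0.5, 1.0, 1.0]  # e.g. fraction of year observed
rows = [design_row(s, g, categories) for s, g in zip(scores, groups)]
y = [1000.0 + 2000.0 * r[1] + 500.0 * r[2] for r in rows]

beta = weighted_ols(rows, y, weights)
print(beta, adjusted_r2(rows, y, weights, beta))
```

In practice a statistics package would be used for this step; the hand-rolled solver is only there to keep the sketch self-contained. One such model would be fit per measure and per outcome (inpatient, outpatient, total), and the adjusted R2 values compared.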

Adjusted R2 values were used to compare the predictive accuracy in the cohort. Predictive accuracy for patients in the lowest and highest expenditure quintiles was assessed by predictive ratios, calculated by dividing the sum of within-quintile predicted expenditures by the sum of within-quintile actual expenditures. Risk adjusters with less overprediction in the lowest quintile and less underprediction in the highest quintile (indicated by predictive ratios closer to 1.0) are preferred.
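The predictive ratio calculation can be sketched as follows. The function name and the toy values are hypothetical, and the sketch assumes quintiles are formed by ranking patients on actual expenditures, which is consistent with Table 3 but is not spelled out in the text.

```python
def predictive_ratios_by_quintile(actual, predicted):
    """Return five ratios of summed predicted to summed actual expenditures.

    Patients are ranked by actual expenditure and split into five groups of
    (nearly) equal size, lowest-cost group first. A ratio above 1.0 means
    overprediction; below 1.0 means underprediction.
    """
    order = sorted(range(len(actual)), key=lambda i: actual[i])
    n = len(order)
    ratios = []
    for q in range(5):
        members = order[q * n // 5:(q + 1) * n // 5]
        actual_sum = sum(actual[i] for i in members)
        predicted_sum = sum(predicted[i] for i in members)
        # Guard against a quintile with no spending at all.
        ratios.append(predicted_sum / actual_sum if actual_sum else float("nan"))
    return ratios


# Toy example: a model that predicts the overall mean for everyone.
actual = [100.0, 200.0, 300.0, 400.0, 500.0,
          600.0, 700.0, 800.0, 900.0, 1000.0]
predicted = [550.0] * len(actual)
print(predictive_ratios_by_quintile(actual, predicted))
```

Even this uninformative model is unbiased overall, yet it overpredicts the lowest quintile and underpredicts the highest, which is why subgroup ratios are a useful complement to a single overall fit statistic.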


RESULTS

Nearly all veterans with diabetes (98%) were male; the average age was 64 years (Table 1). Baseline total expenditures averaged $5,737, including outpatient expenditures of $2,917 and inpatient expenditures of $2,820. Mean total expenditures in the prediction year were considerably higher ($7,410), due to a 50% increase in inpatient expenditures ($4,205). The mean PCS and MCS values were 31.2 and 44.8, respectively, which are below population norms of 50. The SIC and CCI measures had mean values of 8.0 (range 2–20) and 0.5 (0–14), respectively. The ACG, DCG, CDPS, and RxRisk standardized scores were all >1.0, indicating that these veterans with diabetes were higher risk than the reference population (ACQUIP participants). The mean number of diabetes complications was 0.7 (0–5).

All measures, except age/sex alone, had the greatest predictive power for outpatient expenditures and the least power for inpatient expenditures (Table 2). In the total expenditure model, the diabetes complications count had the lowest adjusted R2 (0.10%), followed by age/sex (0.14%), the PCS and MCS from the SF-36 (0.64%), and the SIC (0.65%). Prior-year expenditures and the CDPS were slightly better (2.8 and 3.02%, respectively), followed by the RxRisk-V (3.23%), the CCI (3.26%), and the ACG (3.56%). The DCG measure explained the most variation in total expenditures (5.6%), and a model that included DCG and diabetes complication measures explained modestly more expenditure variation (adjusted R2 = 5.8%).

Similar to the results of the total expenditure analysis, the diabetes complication (0.10%), age/sex (0.18%), and PCS-MCS (0.23%) models explained the least variation in inpatient expenditures. The SIC (0.5%), prior-year expenditures (0.6%), the CDPS (0.9%), and the RxRisk-V (1.3%) models performed slightly better. Unlike the total expenditure analyses, the ACG model performed less well than the CCI model (1.3 vs. 1.7%). The best-performing models were the DCG model (2.4%) and the combined DCG and diabetes complication count model (2.6%). None of these models performed well in absolute terms. The relative rankings of measures in outpatient expenditure prediction were generally similar to those for inpatient and total expenditures, with one exception: prior-year expenditures explained a significantly higher percentage (35%) of outpatient expenditures than other models.

Predictive ratios were compared across measures to illustrate the degree of overprediction or underprediction by expenditure quintile (Table 3). All models significantly overpredicted the lowest quintile and significantly underpredicted the highest quintile. ACGs and DCGs had less overprediction in the lowest quintile and less underprediction in the highest quintile than the other measures. The model including DCG and the diabetes complications count further reduced overprediction in the lowest quintile and underprediction in the highest expenditure quintile, suggesting that this model performed best in the low-cost group, the high-cost group, and in the overall sample.


CONCLUSIONS

Comorbidity, risk adjustment, and functional status measures vary widely in their power to explain expenditure variation and to discriminate among diabetic patients. However, the amount of variation in total expenditures explained by any measure never exceeded 6% and for most was <4%. The DCG measure explained the most variation in total expenditures and performed modestly better when combined with the diabetes complication count, possibly because the DCG measure accounts for severity to a limited but greater degree than other risk adjustment and comorbidity measures.

The diabetes complications count did not explain significant variation in inpatient, outpatient, or total expenditures because the potential range in this measure was small, the observed range was limited due to a fairly homogeneous sample, and it did not reflect broader comorbidity differences that generic measures take into account. Overall predictive power and accurate prediction of low- and high-cost groups depend more on comprehensive adjustment of comorbidities than on disease severity, even in a disease-specific sample. The generic comorbidity and risk adjustment measures examined here were originally designed to reduce confounding by measuring comorbidity broadly in general populations and are broadly applicable. A diabetes-specific measure that captured diabetes severity more fully with laboratory results and medication data may perform better in expenditure prediction than the complications count, but we did not have the data necessary to construct such a measure.

This study has several limitations. First, veterans with diabetes were identified by self-report rather than by validated diagnosis- or medication-based algorithms, so our sample may include some false-positives. Their number is likely to be minimal since self-report of diabetes is considered the gold standard with a κ statistic of 0.84 when compared with a clinical diagnosis of diabetes from chart review in the ACQUIP trial. Second, the relative ranking of measures may be sensitive to regression method and specification in studies with small samples such as ours (see online appendix I for predictive ratios under a generalized linear model with log costs [available at]). Third, non-VA expenditures and VA outpatient pharmacy expenditures were not available, and their absence may have led us to underestimate predictive power. These exclusions do not invalidate our findings because the goal of the report was to compare several measures on a metric that is consistent across measures. Fourth, the study results may not generalize to patients with type 1 diabetes because of the lack of these patients in our sample. Fifth, because our sample size was limited, we were unable to cross-validate our results by creating testing and validation samples, which would have improved the robustness and reliability of the results. Sixth, several of the comorbidity and risk adjustment measures examined here have been updated, so their predictive power may have improved and relative ranking might change. We used ACG version 4.2 software and DCG-HCC version 6.0 software in this analysis, but ACG version 8.1 software is now available and includes new DCG risk adjusters for the Centers for Medicare and Medicaid Services and other purposes. Finally, there may be variation in diagnosis and medication coding practices within and across medical centers, so diagnosis- and pharmacy-based measures may have measurement error.

To guide analysts interested in using existing, validated measures to adjust for confounding due to comorbidity and other differences, future research should compare well-established generic risk adjusters and diabetes-specific measures in mortality and other health outcomes. Such comparisons will provide a better understanding of the tradeoffs between more complete capture of comorbidities or more complete capture of disease severity because no existing generic measure captures breadth (via comorbidities) and depth (via severity) to the same degree.

Generic measures capture comorbidities but imperfectly reflect the severity of the primary condition and predict different outcomes with differing degrees of power (25). For example, the CCI was developed to predict mortality and does so better than most measures, but it is less predictive of expenditures than risk adjustment measures in this sample of veterans with diabetes and in a general veteran sample (15). The ranking of measures by predictive power in this sample of veterans with diabetes is almost identical to the ranking for a general veteran sample (15). ACGs and DCGs were developed to predict expenditures, and DCGs are generally the most predictive risk adjuster, but these measures do not predict mortality as well as comorbidity measures in a general veteran sample (13). On the other hand, disease-specific measures may capture disease severity well but may not capture comorbidities that are strongly correlated with economic outcomes. Expenditure prediction in disease-specific samples may be improved by using generic and disease-specific measures together in regression adjustment if it is practical and consistent with the measure's original objective.

Figure 1—

Analytic sample obtained from ACQUIP trial participants with diabetes. *Enough responses to allow scoring of PCS and MCS.

Table 1—

Descriptive statistics of a sample of veterans regularly using primary care

Table 2—

Alternative risk adjustment measures in prospective expenditure models

Table 3—

Assessment of under- and overprediction by total expenditure quintile


This material is based on work supported by the Department of Veterans Affairs, the Veterans Health Administration, the Office of Research and Development, the Health Services Research and Development Service (HSR&D), LIP 61-105, SDR 96-002, and IIR 99-376. M.L.M. is an investigator at the Durham VA HSR&D Center of Excellence. C.-F.L. is an investigator at the VA Puget Sound Health Care System's HSR&D Center of Excellence. S.D.F. is currently Acting Chief Quality and Performance Officer for the Veterans Health Administration.

No potential conflicts of interest relevant to this article were reported.

Comments from Sarah Krein, the manuscript editor, two anonymous reviewers, and outstanding research assistance from Mark Perkins are appreciated.


  • Published ahead of print at on 22 October 2008.

    The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs, the U.S. Government, the University of North Carolina at Chapel Hill, or the University of Washington.

    Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. See for details.

    The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

    • Accepted October 14, 2008.
    • Received June 18, 2008.



  1. Diabetes Care vol. 32 no. 1 75-80