## Development and validation

## Abstract

**OBJECTIVE**—To develop and validate an empirical equation to screen for diabetes.

**RESEARCH DESIGN AND METHODS**—A predictive equation was developed using multiple logistic regression analysis and data collected from 1,032 Egyptian subjects with no history of diabetes. The equation incorporated age, sex, BMI, postprandial time (self-reported number of hours since last food or drink other than water), and random capillary plasma glucose as independent covariates for prediction of undiagnosed diabetes. Undiagnosed diabetes was defined as a fasting plasma glucose level ≥126 mg/dl and/or a plasma glucose level 2 h after a 75-g oral glucose load ≥200 mg/dl. The equation was validated using data collected from an independent sample of 1,065 American subjects. Its performance was also compared with that of recommended and proposed static plasma glucose cut points for diabetes screening.

**RESULTS**—The predictive equation was calculated with the following logistic regression parameters: *P* = 1/(1 + *e*^{−x}), where *x* = −10.0382 + [0.0331 (age in years) + 0.0308 (random plasma glucose in mg/dl) + 0.2500 (postprandial time assessed as 0 to ≥8 h) + 0.5620 (if female) + 0.0346 (BMI)]. The cut point for the prediction of previously undiagnosed diabetes was defined as a probability value ≥0.20. The equation’s sensitivity was 65%, specificity 96%, and positive predictive value (PPV) 67%. When applied to a new sample, the equation’s sensitivity was 62%, specificity 96%, and PPV 63%.

**CONCLUSIONS**—This multivariate logistic equation improves on currently recommended methods of screening for undiagnosed diabetes and can be easily implemented in an inexpensive handheld programmable calculator to predict previously undiagnosed diabetes.

- 2-h PG, plasma glucose 2 h after a 75-g oral glucose load
- ADA, American Diabetes Association
- EPV, events per variable
- FPG, fasting plasma glucose
- OAPR, odds of being affected given a positive result
- PPV, positive predictive value
- ROC, receiver-operating characteristic

Screening for undiagnosed diabetes is controversial. In 1978, the American Diabetes Association (ADA), the Centers for Disease Control and Prevention, and the National Institutes of Health recommended against screening for diabetes in nonpregnant adults (1). In 1989 and again in 1996, the U.S. Preventive Services Task Force recommended against screening for diabetes in nonpregnant adults (1,2), and in 2001, the ADA recommended against community screening for diabetes (3). Several recent studies have shown that age, sex, BMI, and current metabolic status affect blood glucose levels and have raised concerns about the performance of diabetes screening tests (4–8).

The performance of all screening tests is dependent on the threshold or cut point used to define a positive test. In diabetes screening, choosing a higher glucose cut point reduces sensitivity (probability of a positive screening test given disease) but improves specificity (probability of a negative screening test given absence of disease). Choosing a lower glucose cut point improves sensitivity but reduces specificity. Because the optimal cut point for a positive test may depend on age, sex, BMI, and the time since last food or drink, we propose an alternative approach to interpreting capillary glucose screening tests by developing a multivariate equation using the best combination of readily available data to predict previously undiagnosed diabetes.

## RESEARCH DESIGN AND METHODS

To assess the likelihood of previously undiagnosed diabetes, a predictive equation was developed using data from 1,032 Egyptian subjects without a history of diabetes who participated in the Diabetes in Egypt Project between July 1992 and October 1993 (9). In a household examination, all subjects were assessed for age, sex, height, weight, postprandial time (self-reported number of hours since last food or drink other than water), and random capillary whole blood glucose. On a separate day, fasting plasma glucose (FPG) and plasma glucose 2 h after a 75-g oral glucose load (2-h PG) were measured. Multiple logistic regression analysis was used to develop an equation for prediction of undiagnosed diabetes based on FPG ≥126 mg/dl and/or 2-h PG ≥200 mg/dl. Diabetes risk factors included in the equation were age (years), sex (female), BMI (calculated as weight in kilograms divided by height in meters squared [kg/m^{2}]), postprandial time (0 to ≥8 h), and random capillary plasma glucose (mg/dl). Age, BMI, and capillary plasma glucose were modeled as continuous variables, postprandial time was modeled as a continuous variable between 0 and 8 h (after which random capillary glucose did not vary as a function of postprandial time), and sex was modeled as a categorical variable (0 = male and 1 = female). The final mathematical equation provides an estimate of a subject’s likelihood of previously undiagnosed diabetes expressed as a probability between 0.0 and 1.0.

The linearity assumption for logistic regression was assessed by categorizing each continuous variable into multiple dichotomous variables of equal units and plotting each variable’s coefficient against the midpoint of the variable. We also performed the Mantel-Haenszel χ^{2} test for trend. Multicollinearity was assessed using the Pearson correlation coefficient statistic. Accuracy, reliability, and precision of regression coefficients were assessed by calculating the number of events per variable (EPV)—the ratio of the number of outcome events to the number of predictor variables. An EPV number of at least 10 indicates that the estimates of regression coefficients and their CIs are reliable (10,11). The possible interactions among variables were assessed using the Breslow and Day χ^{2} test (12).

The −2 log-likelihood ratio test was used to test the overall significance of the predictive equation. The significance of the variables in the model was assessed by the Wald χ^{2} test and CIs. The fit of the model was assessed by the Hosmer-Lemeshow goodness of fit χ^{2} test (13,14). To assess outliers and detect extreme points in the design space, logistic regression diagnostics were performed by plotting the diagnostic statistic against the observation number using hat matrix diagonal and Pearson and Deviance residuals analyses (13,14).

To select the optimal cut point to define a positive test, a receiver-operating characteristic (ROC) curve was constructed by plotting sensitivity against the false-positive rate (1 − specificity) over a range of cut-point values. Generally, the best cut point is at or near the shoulder of the ROC curve, where substantial gains can be made in sensitivity with only modest reductions in specificity. Sensitivity was defined as the proportion of subjects who have the outcome and are predicted to have it (true-positive test) and calculated as [true positives/(true positives + false negatives)] × 100. Specificity was defined as the proportion of subjects who do not have the outcome and are predicted not to have it (true-negative test) and calculated as [true negatives/(true negatives + false positives)] × 100. Positive predictive value (PPV) was defined as the percentage of individuals with a positive test result who actually have the disease and was calculated as [true positives/(true positives + false positives)] × 100. The odds of being affected given a positive result (OAPR) was defined as the ratio of the number of affected to unaffected individuals among those with positive results and was calculated as true positives/false positives.
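The four measures defined above can be sketched in a few lines of Python. This is an illustrative sketch only: the class and function names are hypothetical, and the counts in the example are invented for demonstration, not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cross-classification of screening result against true disease status."""
    tp: int  # screen positive, disease present
    fp: int  # screen positive, disease absent
    fn: int  # screen negative, disease present
    tn: int  # screen negative, disease absent


def sensitivity(c: ConfusionCounts) -> float:
    """Percentage of affected subjects who screen positive."""
    return 100.0 * c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Percentage of unaffected subjects who screen negative."""
    return 100.0 * c.tn / (c.tn + c.fp)


def ppv(c: ConfusionCounts) -> float:
    """Percentage of screen-positive subjects who are affected."""
    return 100.0 * c.tp / (c.tp + c.fp)


def oapr(c: ConfusionCounts) -> float:
    """Odds of being affected given a positive result (TP / FP)."""
    return c.tp / c.fp


# Hypothetical counts for illustration only (not study data).
counts = ConfusionCounts(tp=60, fp=40, fn=40, tn=860)
print(sensitivity(counts), specificity(counts), ppv(counts), oapr(counts))
```

Sensitivity and specificity use column totals (true disease status), whereas PPV and OAPR use only the screen-positive row, which is why the latter two change with disease prevalence.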

Concordance and discordance values, derived from the logistic regression analysis, were used to measure the association of predicted probabilities and to check the ability of the model to predict outcome. The higher the value of the concordance and the lower the value of discordance, the greater the ability of the model to predict outcome. To evaluate the overall performance of the equation, we considered several measures of predictive performance, including discrimination and calibration (15–20). Discrimination was defined as the ability of the equation to distinguish high-risk subjects from low-risk subjects and is quantified by the area under the ROC curve (15,19,20). Calibration was defined as the degree to which the predicted probabilities agree with the observed probabilities and is quantified by the calibration slope, calculated as [model χ^{2} − (df − 1)]/model χ^{2} (16,20,21). Well-calibrated models have a slope of ∼1, whereas models whose predictions are too extreme have a slope of <1 (17,20).

To validate the equation, we applied it to data that had not been used to generate the equation. Thus, we applied the equation to data collected from 1,065 subjects with no history of diabetes who were studied between September 1995 and July 1998 by health care systems serving communities in Springfield, MA; Robeson County, NC; and Providence, Pawtucket, and Central Falls, RI (7). All subjects were assessed for age, sex, height, weight, postprandial time, random capillary plasma glucose, and, on a separate day, FPG and 2-h PG.

To compare the results obtained with the predictive equation and the results obtained with various recommended and proposed random capillary plasma glucose cut points, we applied the equation and those cut points to the combined Egyptian and American datasets. Capillary plasma glucose values were calculated by multiplying capillary whole blood glucose values by 1.14. All statistical analyses were performed using SAS software version 6.12 (SAS Institute, Cary, NC).
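The whole blood to plasma conversion described above is a single multiplication. The following minimal sketch uses hypothetical names; only the factor 1.14 comes from the text.

```python
# Conversion factor stated in the text: plasma = whole blood x 1.14.
WHOLE_BLOOD_TO_PLASMA = 1.14


def capillary_plasma_glucose(whole_blood_mg_dl: float) -> float:
    """Convert a capillary whole blood glucose value (mg/dl) to its plasma equivalent."""
    return whole_blood_mg_dl * WHOLE_BLOOD_TO_PLASMA


# A whole blood reading of 140 mg/dl corresponds to roughly 160 mg/dl plasma.
print(round(capillary_plasma_glucose(140)))  # 160
```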

## RESULTS

Table 1 describes the demographic characteristics of the Egyptian and American subjects. The American participants included Hispanics (58%), non-Hispanic whites (19%), African-Americans (12%), Native Americans (4%), and others (7%). The diabetes predictive equation was calculated with the following logistic regression parameters: *P* = 1/(1 + *e*^{−x}), where *x* = −10.0382 + [0.0331 (age in years) + 0.0308 (random plasma glucose in mg/dl) + 0.2500 (postprandial time assessed as 0 to ≥8 h) + 0.5620 (if female) + 0.0346 (BMI)]. Table 2 shows the maximum likelihood estimates for the logistic regression function. The overall significance of the equation by the −2 log-likelihood test was 299.6 (*P* = 0.0001) with 5 df, with 89% concordant pairs and 11% discordant pairs. The Hosmer-Lemeshow goodness of fit test was 5.27 (*P* = 0.73) with 8 df. The EPV number was 134/5 = 26.8. Because no interactions, either alone or in combination, added significantly to the equation, we did not add any of these parameters. No potential outliers were detected, and the equation met the linearity assumption for logistic regression analysis.
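As an illustration, the equation can be evaluated with the sketch below. It assumes the standard logistic function, *P* = 1/(1 + *e*^{−x}), which keeps *P* between 0 and 1, and it assumes that postprandial times above 8 h are entered as 8. The coefficients and the 0.20 cut point are those reported in this article; the function and constant names are hypothetical, and the example subject is invented.

```python
import math

# Coefficients as reported in the text.
INTERCEPT = -10.0382
COEF_AGE = 0.0331           # per year of age
COEF_GLUCOSE = 0.0308       # per mg/dl random capillary plasma glucose
COEF_POSTPRANDIAL = 0.2500  # per hour since last food or drink, 0 to 8 h
COEF_FEMALE = 0.5620        # added if the subject is female
COEF_BMI = 0.0346           # per kg/m^2
CUT_POINT = 0.20            # probability defining a positive screen


def diabetes_probability(age_years: float, glucose_mg_dl: float,
                         postprandial_h: float, female: bool,
                         bmi: float) -> float:
    """Estimated probability of previously undiagnosed diabetes."""
    # Assumption: times beyond 8 h are capped at 8, negative times floored at 0.
    hours = min(max(postprandial_h, 0.0), 8.0)
    x = (INTERCEPT
         + COEF_AGE * age_years
         + COEF_GLUCOSE * glucose_mg_dl
         + COEF_POSTPRANDIAL * hours
         + COEF_FEMALE * (1 if female else 0)
         + COEF_BMI * bmi)
    # Standard logistic transform of the linear predictor.
    return 1.0 / (1.0 + math.exp(-x))


def screens_positive(probability: float) -> bool:
    """Apply the 0.20 probability cut point."""
    return probability >= CUT_POINT


# Invented example: 50-year-old woman, glucose 150 mg/dl, 2 h after eating, BMI 30.
p = diabetes_probability(50, 150, 2, True, 30)
print(round(p, 2), screens_positive(p))
```

For this invented subject the linear predictor is about −1.66, so the estimated probability falls below the 0.20 cut point and the screen is negative.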

The probability level that provided an optimal cut point was 0.20. Based on the classification table, derived from the logistic regression and ROC curve analysis, sensitivity was 65%, specificity 96%, and PPV 67% (Fig. 1). The area under the ROC curve was 0.88. The calibration slope was (299.6 − 4)/299.6 = 0.99. When applied to a new sample of 1,065 subjects, the equation’s sensitivity was 62%, specificity 96%, and PPV 63%. These represented relatively small decrements from the original equation.
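The calibration slope and EPV figures reported above follow directly from their definitions. The sketch below simply reproduces that arithmetic; the function names are hypothetical, and the inputs (model χ² of 299.6 with 5 df, 134 outcome events, 5 predictors) are the values given in the text.

```python
def calibration_slope(model_chi2: float, df: int) -> float:
    """Calibration slope: [model chi-square - (df - 1)] / model chi-square."""
    return (model_chi2 - (df - 1)) / model_chi2


def events_per_variable(outcome_events: int, predictors: int) -> float:
    """EPV: number of outcome events divided by number of predictor variables."""
    return outcome_events / predictors


slope = calibration_slope(299.6, 5)
epv = events_per_variable(134, 5)
print(round(slope, 2), round(epv, 1))  # 0.99 26.8
```

A slope this close to 1 indicates little overfitting, and an EPV well above 10 supports the reliability of the coefficient estimates.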

The diabetes predictive equation performed better than the various proposed static random capillary plasma glucose cut points for a positive test when applied to the combined population with 10% prevalence of undiagnosed diabetes (the prevalence observed in the combined Egyptian and American data sets) (Table 3). In general, the equation yielded higher sensitivity, identified more new cases (true positives), and missed fewer new cases (false negatives) than the static capillary plasma glucose cut points ≥140, ≥150, ≥160, ≥170, and ≥180 mg/dl. The equation yielded higher specificity and identified fewer false-positive cases than the static capillary plasma glucose cut points ≥110, ≥120, ≥130, ≥140, and ≥150 mg/dl. The equation yielded higher PPV and OAPR than the static capillary plasma glucose cut points ≥110, ≥120, ≥130, ≥140, ≥150, ≥160, and ≥170 mg/dl.

## CONCLUSIONS

The performance of all screening tests depends on the cut points used to define a positive test. The choice of a higher cut point leaves more cases undetected, and the choice of a lower cut point classifies more healthy individuals as abnormal (5). Currently, there are no widely accepted or rigorously validated cut points to define positive screening tests for diabetes in nonpregnant adults (6). The ADA has recommended a random capillary whole blood glucose cut point of ≥140 mg/dl (capillary plasma glucose ≥160 mg/dl), and Rolka et al. (7) have recommended a random capillary plasma glucose cut point of ≥120 mg/dl.

Optimal cut points for random capillary glucose tests depend on age, sex, BMI, and postprandial time (6,7). Multivariate equations incorporate multiple pieces of diagnostic information and can provide a flexible alternative to static cut points for the definition of a positive test (21). We have developed a multivariate predictive equation based on age, sex, BMI, postprandial time, and capillary plasma glucose levels to assess the likelihood of previously undiagnosed diabetes. The equation was 65% sensitive and 96% specific. In validation testing, the equation was 62% sensitive and 96% specific. Predictive equations rarely perform as well with new data as with the data with which they were developed because during development, the equation maximizes the probability of predicting the values in the original dataset. When testing an equation, the important factor is the size of the decrement in performance. The relatively small decrement in sensitivity and unchanged specificity suggest that the equation has both external validity and generalizability (21).

A decision regarding acceptable levels of sensitivity and specificity involves weighing the consequences of leaving cases undetected (false negatives) against those of classifying healthy individuals as abnormal (false positives) (22,23). Like the ADA-recommended plasma glucose cut point of 160 mg/dl, the logistic equation provided high specificity (96%) (Table 3). Compared with the ADA-recommended cut point of 160 mg/dl, the logistic equation improved sensitivity (44 and 63%, respectively) (Table 3). Compared with the plasma glucose cut point of 120 mg/dl, the logistic equation improved specificity (77 and 96%, respectively) but was less sensitive (76 and 63%, respectively) (Table 3).

Highly specific screening tests minimize the number of false-positive results but increase the number of false-negative results. They are preferable if the failure to make an early diagnosis and initiate treatment does not have dire health consequences, if a disease is uncommon in the population, and if false-positive results can harm the subject physically, emotionally, or financially. Type 2 diabetes is often slowly progressive and is not associated with complications in the short term. Individuals with initial false-negative screening tests will be identified as abnormal on rescreening, particularly if they have progressive glucose intolerance. In addition, undiagnosed diabetes is uncommon: in a representative sample of the U.S. population 40–74 years of age, undiagnosed diabetes, defined by FPG ≥140 mg/dl or 2-h PG ≥200 mg/dl, was present in only 6.7% (24). False-positive screening tests require further diagnostic tests that are inconvenient, expensive, and time-consuming. For these reasons, we believe that the predictive equation, which is highly specific, is preferable to a static glucose cut point of 120 mg/dl, which is much less specific. We also believe that the predictive equation is preferable to a static glucose cut point of 160 mg/dl because, given comparable high specificity, it is much more sensitive.

PPV and OAPR are measures of the performance of a diagnostic test that depend on the prevalence of the disease in the screened population and on the sensitivity and specificity of the test (22,25,26). Unlike sensitivity and specificity, they are not properties of the screening test itself but of its application. The multivariate predictive equation provided a PPV of 64% and an OAPR of 1.75. These results were better than those obtained with all static plasma glucose cut points <180 mg/dl and indicate that among those with a positive test, 64% actually have diabetes (true positives), and that true-positive results outnumber false-positive results by 1.75 to 1 (Table 3). Tests with an OAPR <1 identify fewer true positives than false positives.
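The dependence of PPV and OAPR on prevalence can be made explicit with a short worked calculation. The sketch below uses the sensitivity (63%), specificity (96%), and prevalence (10%) reported for the combined population; the function name is hypothetical.

```python
def ppv_and_oapr(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, OAPR) from test characteristics and prevalence, all as fractions."""
    true_pos = sensitivity * prevalence                   # affected and screen positive
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # unaffected but screen positive
    return true_pos / (true_pos + false_pos), true_pos / false_pos


ppv, oapr = ppv_and_oapr(sensitivity=0.63, specificity=0.96, prevalence=0.10)
print(round(100 * ppv), round(oapr, 2))  # 64 1.75
```

Per 1,000 subjects screened at 10% prevalence, this corresponds to 63 true positives and 36 false positives. Holding sensitivity and specificity fixed, a lower prevalence reduces both PPV and OAPR.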

In summary, by incorporating relevant risk factor data, the predictive equation performs better in the general population than any single glucose cut point. The multivariate equation can be implemented with a number of inexpensive, programmable, handheld calculators. We programmed the formula and coefficients presented in RESULTS into a TI-83 graphic and scientific calculator (Texas Instruments, Dallas, TX). To obtain a probability value, the user enters the values for age (years), capillary plasma glucose (mg/dl), postprandial time (0 to ≥8 h), BMI (kg/m^{2}), and sex (0 = male and 1 = female). The calculator prompts the user by displaying the coefficient for the variable that should be entered next. The result displayed is the calculated probability that a subject has previously undiagnosed diabetes (a number between 0.0 and 1.0). The programming is available on request. Using this device and a glucose meter, a health care professional can perform a quick point-of-care assessment of the probability of undiagnosed diabetes in either a public health or clinical setting.

## Acknowledgments

This work was supported by the U.S. Agency for International Development and the Egyptian Ministry of Health under PASA (Participating Agency Service Agreement) 236-0102-P-HI-1013-00, the Michigan Diabetes Research and Training Center under grant DK-20572, and the Centers for Disease Control and Prevention.

## Footnotes

Address correspondence and reprint requests to William H. Herman, MD, MPH, University of Michigan Health System, Division of Endocrinology and Metabolism, 1500 E. Medical Center Dr., 3920 Taubman Center, Ann Arbor, MI 48109-0354. E-mail: wherman{at}umich.edu.

Received for publication 21 January 2001 and accepted in revised form 26 June 2002.

A table elsewhere in this issue shows conventional and Système International (SI) units and conversion factors for many substances.

- DIABETES CARE