Diabetes Care 31:1040-1045, 2008 DOI: 10.2337/dc07-1150 © 2008 by the American Diabetes Association
Diabetes Risk Calculator: A simple tool for detecting undiagnosed diabetes and pre-diabetes
1 Archimedes, San Francisco, California Corresponding author: David Eddy, MD, PhD, 201 Mission St., 29th Floor, San Francisco, CA 94105. E-mail: author{at}archimedesmodel.com
OBJECTIVE: The objective of this study was to develop a simple tool for the U.S. population to calculate the probability that an individual has either undiagnosed diabetes or pre-diabetes.
RESEARCH DESIGN AND METHODS: We used data from the Third National Health and Nutrition Examination Survey (NHANES) and two methods (logistic regression and classification tree analysis) to build two models. We selected the classification tree model on the basis of its equivalent accuracy but greater ease of use.
RESULTS: The resulting tool, called the Diabetes Risk Calculator, includes questions on age, waist circumference, gestational diabetes, height, race/ethnicity, hypertension, family history, and exercise. Each terminal node specifies an individual's probability of pre-diabetes or of undiagnosed diabetes. Terminal nodes can also be used categorically to designate an individual as having a high risk for 1) undiagnosed diabetes or pre-diabetes, 2) pre-diabetes, or 3) neither undiagnosed diabetes nor pre-diabetes. With these classifications, the sensitivity, specificity, positive and negative predictive values, and receiver operating characteristic area for detecting undiagnosed diabetes are 88%, 75%, 14%, 99.3%, and 0.85, respectively. For pre-diabetes or undiagnosed diabetes, the results are 75%, 65%, 49%, 85%, and 0.75, respectively. We validated the tool using v-fold cross-validation and performed an independent validation against NHANES 1999–2004 data.
CONCLUSIONS: The Diabetes Risk Calculator is the only currently available noninvasive screening tool designed and validated to detect both pre-diabetes and undiagnosed diabetes in the U.S. population.
Abbreviations: CART, classification and regression tree; DRC, Diabetes Risk Calculator; FPG, fasting plasma glucose; IFG, impaired fasting glucose; IGT, impaired glucose tolerance; NHANES, National Health and Nutrition Examination Survey; OGTT, oral glucose tolerance test; ROC, receiver operating characteristic.
The objective of this study was to develop a simple, self-administered, paper-based screening tool that could be used by the public to determine their risk of having pre-diabetes or undiagnosed diabetes and to help people decide whether they should see a physician for further evaluation. To maximize its accessibility and ease of use, the tool should use only information that is commonly known to an average individual and preferably should not require any calculations.
The prevalence of diabetes is growing rapidly, with the total number of cases worldwide projected to increase from 171 million in 2000 to 366 million by 2030 (1). In the U.S. in 2002, there were an estimated 19.3 million cases of diabetes, of which about 5.8 million were undiagnosed (2). An additional 41 million individuals are estimated to have pre-diabetes, defined as impaired fasting glucose (IFG) or impaired glucose tolerance (IGT). Pre-diabetes implies an increased risk of development of type 2 diabetes on the order of 30% over 4 years (3) and 70% over 30 years (4). Several studies have demonstrated that type 2 diabetes can be prevented or delayed with lifestyle modification or the use of pharmacotherapy in subjects with pre-diabetes (3,5). Studies have also indicated that preventing or delaying the onset of type 2 diabetes by lifestyle modification or the use of pharmacotherapy can be cost-effective (6) if costs of the interventions are controlled (4).
An important step in preventing or delaying type 2 diabetes and its complications is to identify people with pre-diabetes and undiagnosed diabetes so that they can be given appropriate care. The American Diabetes Association recommends screening for type 2 diabetes at 3-year intervals beginning at age 45, particularly in those with BMI One way to address this problem is to develop a simple, inexpensive tool that can identify people who are at high risk of having pre-diabetes or undiagnosed diabetes and motivate them to be screened. Several investigators have developed diabetes risk assessment tools. However, most of those tools apply to non-U.S. populations and none were designed to detect pre-diabetes and undiagnosed diabetes. The objective of this study was to develop a simple tool for use in the U.S. to identify people who have a high probability of having pre-diabetes or undiagnosed diabetes, using only information that is commonly known to an average individual and preferably not requiring any calculations.
Definitions
The definitions of pre-diabetes and diabetes are based on fasting plasma glucose (FPG) and glucose tolerance, as measured by a 2-h plasma oral glucose tolerance test (OGTT). IFG is defined as FPG of 100–125 mg/dl. IGT is defined as a 2-h OGTT result of 140–199 mg/dl. Diabetes is defined as FPG ≥126 mg/dl and/or a 2-h OGTT result ≥200 mg/dl. Pre-diabetes is defined as IFG and/or IGT without diabetes. Undiagnosed diabetes is defined as the presence of actual diabetes, based on FPG and/or a 2-h OGTT, in an individual who has not been told that he or she has diabetes. We use the term "elevated plasma glucose" to describe an individual who has either pre-diabetes or undiagnosed diabetes.
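The definitions above can be expressed as a short decision rule. The following Python sketch is illustrative only and is not part of the study's methods. It reads the diabetes cut-offs as "at or above" 126 and 200 mg/dl, which is an assumption consistent with the IFG and IGT ranges ending at 125 and 199 mg/dl. The function names, the label strings, and the "diagnosed diabetes" category for people who have already been told they have diabetes are hypothetical.

```python
from typing import Optional


def classify_glucose(fpg_mg_dl: float,
                     ogtt_2h_mg_dl: Optional[float] = None,
                     told_has_diabetes: bool = False) -> str:
    """Classify glucose status from FPG and an optional 2-h OGTT result.

    Cut-offs follow the Definitions section; the label strings are
    hypothetical names chosen for this sketch.
    """
    # Assumption: anyone already told they have diabetes is treated as
    # diagnosed and is outside the scope of the screening definitions.
    if told_has_diabetes:
        return "diagnosed diabetes"

    has_ogtt = ogtt_2h_mg_dl is not None

    # Diabetes: FPG >= 126 mg/dl and/or 2-h OGTT >= 200 mg/dl.
    if fpg_mg_dl >= 126 or (has_ogtt and ogtt_2h_mg_dl >= 200):
        return "undiagnosed diabetes"

    # Pre-diabetes: IFG (FPG 100-125) and/or IGT (OGTT 140-199), no diabetes.
    ifg = 100 <= fpg_mg_dl < 126
    igt = has_ogtt and 140 <= ogtt_2h_mg_dl < 200
    if ifg or igt:
        return "pre-diabetes"

    return "normal"


def has_elevated_plasma_glucose(fpg_mg_dl: float,
                                ogtt_2h_mg_dl: Optional[float] = None,
                                told_has_diabetes: bool = False) -> bool:
    """True for pre-diabetes or undiagnosed diabetes, as the term is used here."""
    status = classify_glucose(fpg_mg_dl, ogtt_2h_mg_dl, told_has_diabetes)
    return status in ("pre-diabetes", "undiagnosed diabetes")
```

For example, an individual with FPG of 110 mg/dl and a 2-h OGTT result of 130 mg/dl would be classified as having pre-diabetes (IFG without IGT or diabetes).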
Data
Explanatory variables
Strategy
Analytical methods
Classification and regression tree. The process then repeats for each of the child nodes. Subsequent splits can involve another explanatory variable or a different value of a previously used variable. A node that is not further split is referred to as a terminal node. Each terminal node is assigned to a target class according to whether the prevalence of the target class in that node exceeds a designated threshold. The same threshold is applied to all terminal nodes and determines the overall sensitivity and specificity of the tree. The classification tree is grown to its maximum size and then pruned on the basis of a criterion that balances the number of terminal nodes (complexity) against the accuracy of the tree in classifying people, sometimes termed misclassification cost.
To develop a single tree that could be used to detect either pre-diabetes or undiagnosed diabetes, we used an approach analogous to that used for the regression model. Specifically, we first developed a tree to predict undiagnosed diabetes and then applied a different threshold to predict pre-diabetes. Because one of the goals was to create a simple model, BMI and waist-to-hip ratio were dropped from the list of variables in favor of weight, height, and waist circumference, which required no calculation but still maintained the accuracy of the tool. We eliminated the cholesterol variables (high cholesterol and taking cholesterol medication) because of the large number of missing fields and their low predictive value. We also eliminated history of diabetes in any blood relative in favor of the more specific diabetes history variables (history of diabetes in a parent or sibling, in a parent only, or in a sibling only).
We used v-fold cross-validation to train and test the classification tree models, partitioning the data into 10 equal-sized subsets (9). We then derived the classification tree on each combination of nine subsets (training data) and tested it on the remaining subset (test data).
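Two of the steps described above, thresholding terminal nodes and partitioning the data for v-fold cross-validation, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used in the study: the function names, the node identifiers, the random shuffling, and the seed are all hypothetical, and fold sizes are allowed to differ by one record when the sample size is not divisible by v.

```python
import random
from typing import Dict, Iterator, List, Set, Tuple


def flag_terminal_nodes(node_prevalence: Dict[str, float],
                        threshold: float) -> Set[str]:
    """Return the terminal nodes assigned to the target class.

    A node is flagged when the prevalence of the target class among its
    members exceeds the threshold. The same threshold is applied to every
    node; using a different threshold on the same tree yields a second
    classification (for example, for pre-diabetes).
    """
    return {node for node, prev in node_prevalence.items() if prev > threshold}


def v_fold_splits(n_records: int, v: int = 10,
                  seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) once for each of the v folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # assumption: random assignment to folds
    # Striding by v gives v disjoint folds whose sizes differ by at most one.
    folds = [indices[i::v] for i in range(v)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

In use, a tree would be grown and pruned on each training set and scored on the corresponding test set, so that every record is used for testing exactly once.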
Prevalence of pre-diabetes and diabetes
The prevalences of undiagnosed diabetes and pre-diabetes in the NHANES III dataset were 4.16% and 26.14%, respectively.
Logistic regression model
Validations were performed using split datasets, in which the model was "trained" on a randomly selected subset of the data and tested on the remaining data. Validation tests were repeated for different selections of training and test data. These tests all produced models that were very similar to the original and performed nearly as well on test data as on training data.
CART
With use of these classifications, the classification tree detects undiagnosed diabetes with a sensitivity of 88%, specificity of 75%, positive predictive value of 14%, negative predictive value of 99.3%, and area under the ROC curve of 0.85. For pre-diabetes or undiagnosed diabetes, the sensitivity is 75%, specificity 65%, positive predictive value 49%, negative predictive value 85%, and area under the ROC curve 0.75. On the basis of these results, a positive result on the DRC for undiagnosed diabetes increases the odds that an individual has undiagnosed diabetes by a factor of 3.5, whereas a negative result decreases the odds by a factor of 6, for an 18-fold difference in the odds depending on the results of the test. For elevated plasma glucose, the difference in the odds between a positive and a negative result is a factor of 6.
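The odds factors quoted above are likelihood ratios, and the predictive values follow from sensitivity, specificity, and prevalence by Bayes' rule. The Python sketch below shows the arithmetic using the rounded figures reported in this article; the function names are hypothetical, and the use of the NHANES III prevalence of undiagnosed diabetes (4.16%) as the pre-test probability is an assumption made for illustration.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-), the factors by which a positive or negative
    result multiplies the pre-test odds of disease."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied at the given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Undiagnosed diabetes, using the rounded values reported in the text.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.88, specificity=0.75)
ppv, npv = predictive_values(sensitivity=0.88, specificity=0.75, prevalence=0.0416)
print(f"LR+ = {lr_pos:.2f}, 1/LR- = {1 / lr_neg:.2f}, spread = {lr_pos / lr_neg:.1f}")
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

With these rounded inputs, a positive result multiplies the odds by about 3.5 and a negative result divides them by about 6.25, and the predictive values come out near 13% and 99.3%. The ratio of the two odds factors computed this way is about 22 rather than the 18 reported; the reported figures were presumably derived from unrounded sensitivity and specificity.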
Validation of classification tree
Comparison of logistic regression model and classification tree
The classification tree performed slightly better than the logistic regression model for undiagnosed diabetes in the range of greatest interest and was almost as accurate for detecting elevated plasma glucose. Because it is considerably simpler to apply, requiring no calculations at all, we selected it as the preferred tool for our objectives.
We have developed a simple tool that uses only information known to an average individual and requires no calculations to help identify people who are at increased risk for pre-diabetes or undiagnosed diabetes. The DRC sorts people into 14 different categories and reports for each category the probability that an individual is at low risk or high risk for either undiagnosed diabetes or pre-diabetes.
To develop the tool we applied two different methods: logistic regression and CART. The versions produced by the two methods had similar accuracies, predictive values, and areas under ROC curves. We selected the tool developed by the CART method because it could be translated into a simpler tool and it provided information about the actual probabilities that an individual has pre-diabetes or undiagnosed diabetes. The tool developed by the CART method is in the form of a tree that can be easily navigated from the root to terminal nodes through a series of branches, where the path followed depends on the answers to simple yes or no questions that any individual would be able to answer. The final terminal node determines the individual's risk of undiagnosed diabetes and/or pre-diabetes.
The sensitivity and specificity of the DRC were 88% and 75% for undiagnosed diabetes and 75% and 65% for pre-diabetes or undiagnosed diabetes. To our knowledge, there are no other tools designed to find people likely to have pre-diabetes as defined by IFG or IGT. Other tools have been developed for detecting people with undiagnosed diabetes (10–17). The sensitivities for undiagnosed diabetes ranged from 72 to 86%, with the highest sensitivity observed in individuals who had one or more cardiovascular risk factors (18). The specificities for the same tools ranged from 41 to 77%. Other tools have been built to calculate the risk of future development of diabetes (10,18,19).
One of the tools designed to predict future drug-treated diabetes (10) has been used in people with one or more risk factors for cardiovascular disease to try to identify those who have either undiagnosed diabetes or IGT (11). These tools and applications were all designed for purposes different from that of the DRC.
Because one of the objectives of a good screening tool is to minimize unnecessary testing and thereby reduce the economic impact of testing, predictive value is an important measure of the tool's performance. The positive predictive values of the DRC were 14% and 49% for diabetes and elevated plasma glucose, respectively, and the negative predictive values were 99.3% and 85%, respectively. The positive predictive value for the other screening tools for undiagnosed diabetes ranged from 8 to 13% for non–high-risk populations and was 23% for individuals who have one or more cardiovascular risk factors (11). Thus, in terms of overall performance, the DRC appears to compare favorably with other available tools for detecting people with undiagnosed diabetes, in addition to its ability to detect people at high risk for pre-diabetes.
Another important distinction of the DRC is that it has been constructed for and tested in a U.S. population. The NHANES III dataset is a weighted survey and includes individuals from different ethnicities as represented in the U.S. population. An analysis showed that a risk score for undiagnosed diabetes developed originally in a strictly Caucasian population could not be applied reliably to other populations with diverse ethnic origins (12). To our knowledge, the only other tool for undiagnosed diabetes developed with NHANES data was based on an older version of the NHANES (NHANES II) (14). For convenience we will call this the "NHANES II model."
When this model was applied to the NHANES II data on which it was developed, its reported sensitivity of 79% and specificity of 65% were lower than the sensitivity and specificity calculated for the DRC applied to the NHANES III data on which the DRC was developed (88% and 75%, respectively). Furthermore, when the NHANES II model is applied to NHANES III data, its sensitivity drops to 71.7% and its specificity decreases to 54.1% (see online appendix). Models generally perform best on the data on which they were developed, and they perform better on training data than on test data. That the NHANES II model does not perform as well on NHANES III data as the DRC is therefore not unexpected. However, the fact that the DRC performs better than the NHANES II model, when each is tested against the data used to develop it, indicates a significant improvement in predictability for the DRC.
Finally, the DRC has been validated using two methods: 1) split-sample cross-validation applied to NHANES III data and 2) application of the classification tree to the NHANES 1999–2004 dataset. As expected, the sensitivity, specificity, and predictive values were somewhat lower in the validations than for the training datasets. Nonetheless, the DRC still appears to compare favorably with other tools for detecting pre-diabetes and undiagnosed diabetes. Future research will include validating the tool using an independent dataset from a diabetes prevention clinical trial, as well as determining its applicability to populations outside the U.S. In addition, development of a patient-friendly, electronic version is underway for broader use in clinical practice.
It is not possible to determine precisely the clinical value of any risk-calculating tool or, for that matter, of any diagnostic test. The purpose of such tools is to provide information that would tip the balance toward an individual choosing to undergo more definitive screening with appropriate laboratory tests.
For pre-diabetes, it is well documented that treatment can postpone and in some cases prevent the onset of diabetes. It is also well established that treatment of diabetes helps prevent complications. For these reasons, several organizations, such as the American Diabetes Association, recommend screening. Yet a high proportion of people do not receive the recommended screening tests. It is reasonable to assume that some people do not perceive their risk of pre-diabetes or diabetes to be sufficiently high to justify the inconvenience and cost. The DRC we describe in this article is intended to give them a simple method for determining whether they might have a higher risk than they perceive. For undiagnosed diabetes, a positive versus a negative result spreads the odds of having that condition by a factor of 18. For elevated plasma glucose, the spread in odds of that condition is by a factor of 6. It seems reasonable to believe that for many people this information may aid in their decision to seek care.
In summary, we have described a simple, validated, paper-based screening tool that can be used to calculate the probability that an individual has either undiagnosed diabetes or pre-diabetes using information known to an average individual, without requiring any calculations. The screening tool can be used by physicians to assess the risks of their patients or can be self-administered by individuals to assess their own risks. Use of this tool enables the identification of individuals who might benefit from confirmatory tests and treatment to delay or prevent the onset of type 2 diabetes and its complications.
Published ahead of print at http://care.diabetesjournals.org on 10 December 2007. DOI: 10.2337/dc07-1150. Additional information for this article can be found in an online appendix at http://dx.doi.org/10.2337/dc07-1150. K.E.H. has received consulting fees from GlaxoSmithKline, and D.M.E. and L.S. have received consulting fees from GlaxoSmithKline, Pfizer Inc., and Eli Lilly. See accompanying editorial, p. 1084. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C Section 1734 solely to indicate this fact. Received for publication June 18, 2007. Accepted for publication December 5, 2007.