Novel Risk Factors and the Prediction of Type 2 Diabetes in the Atherosclerosis Risk in Communities (ARIC) Study
OBJECTIVE The objective of this study was to determine potential added value of novel risk factors in predicting the development of type 2 diabetes beyond that provided by standard clinical risk factors.
RESEARCH DESIGN AND METHODS The Atherosclerosis Risk in Communities (ARIC) Study is a population-based prospective cohort study in four U.S. communities. Novel risk factors were measured either in the full cohort or in a case-control sample nested within the cohort. We started with a basic prediction model, previously validated in ARIC, and evaluated 35 novel risk factors by adding them independently to the basic model. The area under the curve (AUC), net reclassification index (NRI), and integrated discrimination index (IDI) were calculated to determine whether each of the novel risk factors improved risk prediction.
RESULTS There were 1,457 incident cases of diabetes with a mean of >7.6 years of follow-up among 12,277 participants at risk. None of the novel risk factors significantly improved the AUC. Forced expiratory volume in 1 s was the only novel risk factor that resulted in a significant NRI (0.54%; 95% CI: 0.33–0.86%). Adiponectin, leptin, γ-glutamyl transferase, ferritin, intercellular adhesion molecule 1, complement C3, white blood cell count, albumin, activated partial thromboplastin time, factor VIII, magnesium, hip circumference, heart rate, and a genetic risk score each significantly improved the IDI, but net changes were small.
CONCLUSIONS Evaluation of a large panel of novel risk factors for type 2 diabetes indicated only small improvements in risk prediction, which are unlikely to meaningfully alter clinical risk reclassification or discrimination strategies.
A number of risk prediction tools for type 2 diabetes have been developed that could be used for opportunistic screening in clinical practice; however, at this time, there is no widely accepted risk prediction score that has been developed and validated in routine clinical practice (1,2). Developing a tool that successfully identifies those at high risk of type 2 diabetes is important because the disease is largely preventable through lifestyle and/or pharmacologic interventions (3). Therefore, the successful identification of at-risk individuals, via risk prediction models, would create greater opportunities for clinicians to intervene to prevent or delay the onset of type 2 diabetes and the complications associated with this disease (4).
Within the last decade, a number of potential new risk factors for type 2 diabetes have been identified that are related to chronic inflammation, metabolic abnormalities, endothelial dysfunction, oxidative stress, and a prothrombotic state. Many of these factors have been found to be independently associated with type 2 diabetes in prospective cohort studies, including the Atherosclerosis Risk in Communities (ARIC) Study (5–21). Likewise, a number of common gene variants have been identified that are associated with type 2 diabetes in both candidate gene and genome-wide association studies. Because there is a possibility that one or more of these novel risk factors could serve in a tool for predicting type 2 diabetes, allowing clinicians to intervene and prevent the onset of disease, it is important to identify those risk factors that may refine and improve tools for risk prediction. Therefore, the purpose of this analysis is to identify novel risk factors that could improve type 2 diabetes risk prediction.
RESEARCH DESIGN AND METHODS
The ARIC Study began in 1987–1989 and recruited a population-based cohort from four U.S. communities: Forsyth County, NC; Jackson, MS; the northwest suburbs of Minneapolis, MN; and Washington County, MD (22). Participants received an extensive examination that collected medical, social, and demographic data. The baseline examinations (visit 1) were conducted between 1987 and 1989; visit 2 was held between 1990 and 1992; visit 3 between 1993 and 1995; and visit 4 between 1996 and 1998. Among participants still alive at the time of each follow-up visit, response rates for visits 2, 3, and 4 were 93, 86, and 81%, respectively.
For some analyses, we used data from a case-cohort study design previously used to examine the role of inflammation in the development of type 2 diabetes in ARIC (8). Prior to sampling, the following individuals were excluded: 2,018 with prevalent diabetes, 95 members of minority ethnic groups with small numbers, 853 individuals who did not return to any follow-up visit, 26 with no valid diabetes determination at follow-up, 7 with restrictions on stored plasma use, 12 with missing baseline anthropometric measurements, and 2,506 in previous ARIC case-control and case-cohort studies involving cardiovascular disease for whom stored plasma was either previously exhausted or held in reserve. After exclusions, the sampling frame consisted of 10,275 individuals.
Case subjects were defined as participants who met any of the following criteria for type 2 diabetes at one or more follow-up visits: 1) self-reported use of hypoglycemic medications; 2) casual serum glucose of ≥200 mg/dL; 3) fasting (>8 h) serum glucose of ≥126 mg/dL; or 4) self-reported physician diagnosis of type 2 diabetes. There were 1,155 incident cases identified among the participants in the sampling frame. Due to budget constraints, not all eligible type 2 diabetes cases were selected for the case-cohort design. Instead, a stratified random sample of cases was selected with oversampling of African Americans. A subcohort was selected from all eligible cohort members to serve as the comparison group. Because risk-prediction software could not readily accommodate the sample weights necessary for case-cohort analysis, we excluded incident case subjects who were independently selected for the cohort random sample but not the case sample (N = 23), resulting in a final sample size of 529 case and 543 control subjects.
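The four case criteria above can be expressed as a simple per-visit check. The following is a minimal sketch, not the ARIC analysis code: the function and parameter names are hypothetical, and it assumes that a glucose value of ≥200 mg/dL qualifies regardless of fasting status.

```python
from typing import Optional


def meets_case_definition(
    on_hypoglycemic_medication: bool,
    serum_glucose_mg_dl: Optional[float],
    fasting_hours: Optional[float],
    physician_diagnosis: bool,
) -> bool:
    """Return True if a single follow-up visit meets any of the four criteria."""
    # Criteria 1 and 4: self-reported medication use or physician diagnosis.
    if on_hypoglycemic_medication or physician_diagnosis:
        return True
    if serum_glucose_mg_dl is None:
        return False
    # Criterion 2: casual glucose >= 200 mg/dL (assumed to apply at any
    # fasting status).
    if serum_glucose_mg_dl >= 200:
        return True
    # Criterion 3: fasting (> 8 h) glucose >= 126 mg/dL.
    fasted = fasting_hours is not None and fasting_hours > 8
    return fasted and serum_glucose_mg_dl >= 126


def is_incident_case(visits) -> bool:
    """A participant is a case if any follow-up visit meets the definition.

    `visits` is an iterable of dicts keyed by the parameter names above.
    """
    return any(meets_case_definition(**visit) for visit in visits)
```

A glucose of 130 mg/dL therefore counts only when the participant fasted for more than 8 h, whereas a value of 205 mg/dL counts under the casual criterion either way.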
Prevalent diabetes at baseline was defined as a nonfasting glucose ≥200 mg/dL, a fasting glucose ≥126 mg/dL, self-reported diagnosis of diabetes by a physician, or current use of diabetes medications. Parental history of diabetes was defined as a report of diabetes in either parent. Subjects were asked to fast for 12 h prior to the clinical examination. Anthropometric measurements were taken with participants dressed in scrub suits without shoes. Technicians measured waist girth at the umbilical level. Blood pressure was measured three times with the subject in the sitting position after 5 min of rest using a random-zero sphygmomanometer, and the last two measurements were averaged.
After informed consent, blood was drawn from the antecubital vein of seated participants. Serum glucose was measured using a hexokinase/glucose-6-phosphate dehydrogenase method. Triglycerides were measured using an enzymatic method and HDL cholesterol (HDL-C) was measured enzymatically after dextran sulfate-Mg2+ precipitation of other lipoproteins.
Details regarding the measurement of novel risk factors are found in Supplementary Table 1.
Genotyping, quality control, and imputation procedures for the ARIC genome-wide association study have previously been described (23). The genetic risk score was created by adding together the number of genotyped or imputed risk alleles of 30 genes or regions, thus assuming an additive model of inheritance. The selection of genetic variants was based on a recent large-scale association analysis of European Americans that combined genome-wide association data from multiple studies to identify genetic variants associated with type 2 diabetes (24). The risk alleles modeled were those used in Voight et al. (24), which indexed alleles to the forward strand of National Center for Biotechnology Information Build 36. Because most of the variants were discovered and validated in Caucasian populations, the genetic risk score was created only for Caucasian study participants.
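The unweighted, additive score described above amounts to summing risk-allele counts across variants. The sketch below is illustrative only: the function name is hypothetical, the example dosages are invented, and the use of fractional dosages in [0, 2] for imputed variants is an assumption about how imputed genotypes would be represented.

```python
def genetic_risk_score(risk_allele_dosages):
    """Sum risk-allele counts across variants (unweighted, additive model).

    `risk_allele_dosages` maps a variant ID to the number of risk alleles:
    0, 1, or 2 for genotyped variants, or a fractional dosage in [0, 2]
    for imputed variants (an assumption made for this sketch).
    """
    for variant, dosage in risk_allele_dosages.items():
        if not 0 <= dosage <= 2:
            raise ValueError(f"dosage out of range for {variant}: {dosage}")
    return sum(risk_allele_dosages.values())


# Hypothetical participant; the dosages are made up for illustration.
example = {"rs7903146": 2, "rs1801282": 1, "rs9939609": 0.8}
score = genetic_risk_score(example)
```

With 30 variants, a score built this way ranges from 0 to 60, and every risk allele contributes equally regardless of its effect size.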
Baseline characteristics of the study population were examined by incident type 2 diabetes status and shown as means ± SD or N (%) and compared by t or χ2 tests. For prediction analyses, we started with a simple or basic prediction model, previously validated in ARIC (25), that includes age, parental history of diabetes, race/ethnicity, fasting glucose, fasting triglycerides, systolic blood pressure, HDL-C, height, and waist circumference, all measured at visit 1. The expanded model for the full cohort considered the following measures obtained at visit 1 and reported to be associated with incident type 2 diabetes in previous ARIC publications:
White blood cell (WBC) count
von Willebrand factor (vWF) antigen
Activated partial thromboplastin time (aPTT)
Factor VIII coagulant activity
Forced vital capacity (FVC)
Forced expiratory volume in 1 s (FEV1)
Total blood viscosity
Low frequency power heart rate variability
Genetic risk score that includes variants from the following 30 genes or regions: NOTCH2 (rs10923931), THADA (rs7578597), BCL11A (rs243021), PPARG (rs1801282), ADAMTS9 (rs6795735), IGF2BP2 (rs1470579), WFS1 (rs10010131), ZBED3 (rs4457053), CDKAL1 (rs7754840), JAZF1 (rs849134), KLF14 (rs972283), TP53INP1 (rs896854), SLC30A8 (rs13266634), CHCHD9 (rs13292136), CDKN2A/B (rs10811661), CDC123/CAMK1D (rs12779790), HHEX/IDE (rs1111875), TCF7L2 (rs7903146), KCNQ1 (rs231362), KCNJ11 (rs5215), CENTD2 (rs1552224), HMGA2 (rs1531343), TSPAN8/LGR5 (rs7961581), HNF1A (rs7957197), ZFAND6 (rs11634397), PRC1 (rs8042680), FTO (rs9939609), HNF1B (rs75210), MTNR1B (rs1387153), and IRS1 (rs7578326)
We used Cox proportional hazards regression models, with incident type 2 diabetes as the outcome, to calculate the C statistic for each individual risk factor. We defined incident type 2 diabetes as described above. The date of type 2 diabetes incidence was estimated by linear interpolation using glucose values at the ascertaining visit and the previous one (8). We constructed models by adding each novel risk factor one at a time to the basic risk prediction model. C statistics were compared between the baseline model and the model with the novel risk factor, and if the variable produced an incremental change of at least 0.005, it was included in the final model (25).
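The interpolation of the incidence date can be illustrated as follows. This is a sketch under a stated assumption: the text does not say which glucose level the interpolation targets, so the fasting threshold of 126 mg/dL is used here as a default. The function name and arguments are hypothetical.

```python
from datetime import date, timedelta


def interpolate_incidence_date(prev_date, prev_glucose,
                               asc_date, asc_glucose, threshold=126.0):
    """Estimate the date on which glucose crossed `threshold`.

    Assumes glucose rose linearly between the previous visit and the
    ascertaining visit. The 126 mg/dL default is an assumption.
    """
    if not prev_glucose < threshold <= asc_glucose:
        raise ValueError("glucose does not cross the threshold between visits")
    # Fraction of the between-visit interval elapsed at the crossing point.
    fraction = (threshold - prev_glucose) / (asc_glucose - prev_glucose)
    days_between = (asc_date - prev_date).days
    return prev_date + timedelta(days=round(fraction * days_between))


# Invented example: glucose rises from 116 to 136 mg/dL over 10 days.
estimated = interpolate_incidence_date(
    date(2000, 1, 1), 116.0, date(2000, 1, 11), 136.0
)
```

Cases ascertained only through medication use or physician diagnosis have no threshold crossing to interpolate; how those were dated is not covered by this sketch.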
We used the macro derived by Chambless et al. (26) to calculate the area under the curve (AUC), net reclassification index (NRI), and integrated discrimination index (IDI) for our risk-prediction models. The AUC is calculated via a nonparametric method, which produces the AUC at time t in a setting of risk prediction from survival analysis and takes censoring into consideration (26). All analyses were conducted using SAS version 9.2 (SAS Institute, Cary, NC).
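For intuition, a nonparametric AUC can be written as the proportion of case/non-case pairs in which the case has the higher predicted risk. The sketch below is a simplified binary-outcome version that ignores censoring and follow-up time, so it is not the time-dependent method of the macro cited above. The function names are hypothetical, and the 0.005 default is the increment used in this analysis.

```python
def c_statistic(risks_cases, risks_noncases):
    """Proportion of case/non-case pairs ranked correctly (ties count half)."""
    concordant = 0.0
    for risk_case in risks_cases:
        for risk_noncase in risks_noncases:
            if risk_case > risk_noncase:
                concordant += 1.0
            elif risk_case == risk_noncase:
                concordant += 0.5
    return concordant / (len(risks_cases) * len(risks_noncases))


def meets_increment(auc_basic, auc_expanded, min_increment=0.005):
    """True if the expanded model raises the AUC by at least `min_increment`."""
    return auc_expanded - auc_basic >= min_increment


# Invented predicted risks for three cases and four non-cases.
auc = c_statistic([0.9, 0.8, 0.4], [0.7, 0.3, 0.4, 0.1])
```

A value of 0.5 corresponds to chance ranking and 1.0 to perfect separation of cases from non-cases.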
We excluded 2,018 individuals who had prevalent type 2 diabetes at baseline, 95 individuals from underrepresented minority groups, 314 individuals with missing information on the risk factors included in the basic risk model, 267 individuals who did not fast for at least 8 h, and 821 individuals who had no follow-up time data to ascertain type 2 diabetes, thus leaving 12,277 individuals for the analysis. Because the genetic risk score was created only for Caucasian study participants, there were 8,067 individuals available for the analysis of the addition of a genetic risk score to the basic model.
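The exclusion counts above can be checked with simple arithmetic. The sketch assumes the exclusions were applied sequentially and are mutually exclusive; the implied starting cohort size is derived from the stated figures rather than quoted from the text.

```python
# Exclusion counts as stated for the total-cohort analysis.
exclusions = {
    "prevalent type 2 diabetes at baseline": 2018,
    "underrepresented minority groups": 95,
    "missing basic-model risk factors": 314,
    "fasted less than 8 h": 267,
    "no follow-up time data": 821,
}
analysis_n = 12277  # participants retained for analysis

total_excluded = sum(exclusions.values())
# Starting cohort size implied by the stated figures (derived, not quoted).
implied_baseline_n = analysis_n + total_excluded

for reason, count in exclusions.items():
    print(f"{reason}: {count}")
print(f"total excluded: {total_excluded}")
print(f"implied baseline cohort: {implied_baseline_n}")
```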
We started with the same aforementioned basic risk-prediction model (25). The expanded model for the case-control subsample considered the following novel measures obtained from visit 1 blood samples:
γ-Glutamyl transferase (GGT)
Intercellular adhesion molecule 1 (ICAM-1)
Asymmetric dimethylarginine (ADMA)
Retinol binding protein 4 (RBP-4)
Free fatty acids
For the case-control analysis, we used logistic regression models, with incident type 2 diabetes as the outcome, to calculate the C statistics for each individual risk factor. We constructed models by adding each novel risk factor independently to the basic risk-prediction model, as we did in the total cohort, and looked for an incremental change of at least 0.005. We used the nriidi macro created by Sundstrom et al. (27) to calculate the NRI, IDI, and associated P values.
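The NRI and IDI for a binary outcome can be sketched as below. This is an illustration of the general definitions, not the nriidi macro: the function names are hypothetical, and the risk-category cutoffs are placeholders because the categories used in this analysis are not stated here.

```python
from bisect import bisect_right


def risk_category(p, cutoffs):
    """Index of the risk category containing predicted probability p."""
    return bisect_right(cutoffs, p)


def net_reclassification_index(p_old, p_new, events, cutoffs):
    """Category-based NRI: net upward moves in cases plus net downward moves in non-cases."""
    up_e = down_e = up_n = down_n = 0
    for po, pn, event in zip(p_old, p_new, events):
        move = risk_category(pn, cutoffs) - risk_category(po, cutoffs)
        if event:
            up_e += move > 0
            down_e += move < 0
        else:
            up_n += move > 0
            down_n += move < 0
    n_events = sum(1 for e in events if e)
    n_nonevents = len(events) - n_events
    return (up_e - down_e) / n_events + (down_n - up_n) / n_nonevents


def integrated_discrimination_index(p_old, p_new, events):
    """Change in the mean predicted risk gap between cases and non-cases."""
    def discrimination_slope(p):
        cases = [x for x, e in zip(p, events) if e]
        noncases = [x for x, e in zip(p, events) if not e]
        return sum(cases) / len(cases) - sum(noncases) / len(noncases)
    return discrimination_slope(p_new) - discrimination_slope(p_old)


# Invented predictions for two cases and two non-cases; cutoffs are placeholders.
events = [1, 1, 0, 0]
p_basic = [0.15, 0.30, 0.15, 0.05]
p_expanded = [0.25, 0.30, 0.08, 0.05]
nri = net_reclassification_index(p_basic, p_expanded, events, (0.10, 0.20))
idi = integrated_discrimination_index(p_basic, p_expanded, events)
```

When no participant changes risk category after a marker is added, every move is zero and the category-based NRI carries no information, which is the situation in which an NRI would not be reported.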
RESULTS
There were 1,457 (11.9%) incident cases of type 2 diabetes with a mean of >7.6 years of follow-up. Unadjusted baseline characteristics of the type 2 diabetes and non–type 2 diabetes groups for the total cohort are summarized in Table 1. Individuals with incident type 2 diabetes were more likely to have a parental history of diabetes, be African American versus white, have a higher systolic blood pressure, higher mean waist circumference, greater mean height, higher levels of fasting triglycerides and glucose, and lower levels of HDL-C. In terms of novel risk factors, individuals with incident type 2 diabetes had statistically significantly different levels of all novel risk factors except albumin and low-frequency-power heart rate variability when compared with those without incident type 2 diabetes.
Multivariate-adjusted hazard rate ratios for the variables included in the basic risk model are shown in Supplementary Table 2. Supplementary Table 3 shows the correlations between novel and basic risk factors. All of the novel risk factors except for the genetic risk score were significantly correlated with at least three of the six basic risk factors, and the majority of novel risk factors were significantly correlated with five or more basic risk factors. However, it is important to note that not all of these correlations were strong. By contrast, the genetic risk score was correlated only with baseline glucose levels.
The basic model had an AUC of 0.8411 (95% CI: 0.8316–0.8457) (Table 2). No novel risk factor improved the AUC of the basic model by an increment of at least 0.005. With the exception of FEV1, none of the novel risk factors had a statistically significant NRI. The NRI when adding FEV1 to the basic model was 0.54% (95% CI: 0.33–0.86%). Finally, the addition of WBC count, albumin, aPTT, factor VIII, magnesium, heart rate, hip circumference, or the genetic risk score statistically significantly increased the IDI.
Nested case-control sample
Baseline characteristics of incident type 2 diabetes case and control subjects in the nested case-control subsample are summarized in Table 3. All novel risk factors were statistically significantly different between case and control subjects. Supplementary Table 4 shows that all of the novel risk factors were significantly correlated with at least two of the basic risk factors, and the majority of novel risk factors were significantly correlated with three or more basic risk factors. As with the total cohort, not all of these risk factors were strongly correlated.
The basic model, which included the aforementioned risk factors from the Schmidt et al. (25) analysis, had an AUC of 0.8607 (95% CI: 0.8386–0.8828) (Table 4). None of the novel risk factors improved the C statistic by at least 0.005. In terms of model fit, the addition of ICAM-1 had the greatest improvement as it came closest to achieving the 0.005 increment of change in the AUC. None of the novel risk factors exhibited a statistically significant NRI. ADMA, interleukin-18, GGT, and lactate exhibited no movement between risk categories, and therefore, an NRI calculation was not made. Adiponectin, leptin, GGT, ferritin, ICAM-1, and complement C3 had statistically significantly improved IDIs.
CONCLUSIONS
None of the novel risk factors significantly improved the AUC in the total cohort or nested case-control sample. However, FEV1 did significantly, albeit modestly, improve the NRI in the total cohort. None of the risk factors statistically significantly improved the NRI in the case-control study sample; however, the addition of adiponectin, leptin, GGT, ferritin, ICAM-1, and complement C3 did statistically significantly but moderately improve the IDI. Likewise, in the total cohort, the novel risk factors WBC count, albumin, aPTT, factor VIII, magnesium, heart rate, hip circumference, and the genetic risk score exhibited significant but modest improvements in IDI. These results suggest that of these novel risk factors, only FEV1 may be helpful for type 2 diabetes risk stratification in the ARIC cohort study. Several novel risk factors did modestly improve the IDI, which indicates that the difference in average predicted probabilities between individuals with and without type 2 diabetes significantly increased when these risk factors were added to the basic model; however, critics argue that it is unclear whether a significant IDI indicates that the novel risk factor in the model is clinically useful (28,29).
Despite the fact that many of the novel risk factors are independent risk factors for type 2 diabetes in the total cohort, none of these risk factors appeared to provide additional value to type 2 diabetes risk prediction. Previous studies that have incorporated one or more novel risk factors into a risk prediction model have been limited, and although these analyses may have found increased AUCs with the inclusion of novel risk factors, they are also often single studies in very specific populations (4,30,31). Our own study failed to replicate the contributions of WBC count, heart rate, or alanine aminotransferase to the improvement in the AUC, as found in the aforementioned studies (4,30,31). It is important to note that although novel risk factors may be associated with type 2 diabetes, it does not mean they will contribute to risk prediction, as these are separate issues of etiology and prediction (32). All of the novel risk factors modeled in the total cohort and case-control analyses were significantly associated with type 2 diabetes in ARIC; however, none of them significantly contributed to improved risk prediction when C statistics were calculated with and without the novel risk factors.
It is difficult to improve upon existing risk factors for type 2 diabetes. Specifically, when a single measurement of obesity or glycemia is included in a risk model, the AUCs already range from 0.66 to 0.77. When obesity and glycemia measures are combined with readily available clinical variables, such as those included in the basic model, the AUC increases greatly, making it difficult to improve the risk prediction (32). Furthermore, the correlation between novel risk factors and traditional risk factors must also be considered, as correlated risk factors provide less independent information about type 2 diabetes risk. We found this to be true in our own analysis, as many statistically significant correlations existed between traditional risk factors and novel risk factors in both the total cohort and the case-control analysis.
Recent advances in the identification of a number of genetic variants associated with type 2 diabetes have generated interest in the clinical utility of combining the loci associated with type 2 diabetes into a genetic risk score, which could be used for risk prediction. Prior to this analysis, the use of genetic risk scores in type 2 diabetes risk prediction models has been limited, has often involved a smaller number of genetic variants, and has yielded varied results (33).
Our own analysis did not find a statistically significant contribution to the AUC or NRI with the addition of a genetic risk score; however, it did moderately improve the IDI. The incorporation of a genetic risk score into future type 2 diabetes risk prediction models could be more useful, once an ideal set of variants is identified, as genes are not prone to the biological variability or measurement error that often accompanies other risk factors. Further, the genotype does not change over one’s lifetime, and this offers opportunities for earlier screening and identification of individuals at risk (34). In fact, de Miguel-Yanes et al. (35) found that the incorporation of a genetic score into a risk model was actually more beneficial in younger subjects. Identifying individuals at risk earlier in the disease process will allow for interventions that can either reverse the course of the disease or control its accompanying risk factors such as dyslipidemia and hypertension.
Limitations to this study include the absence of an oral glucose tolerance test or hemoglobin A1c test results to classify type 2 diabetes and the use of a single baseline value for the novel risk factors, which does not capture the variation in levels over time for risk factors. Further, not all novel risk factors are included in this analysis. We chose to only include biomarkers that had not previously been included in risk prediction analyses in ARIC and biomarkers that were measured and not self-reported.
Another limitation is the inclusion of only 30 SNPs in the genetic risk score, which account for only a small fraction of the heritability of type 2 diabetes (36). Finally, there were 35 novel risk factors evaluated, resulting in multiple testing that may yield false positives. A strength of this analysis was the availability of a large, population-based cohort of white and African American men and women with follow-up data. Further, there were standardized data collection methods for both predictors and type 2 diabetes outcomes.
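The scale of the multiple-testing concern can be gauged with a short calculation. The sketch assumes a per-test significance level of 0.05 and independent tests, neither of which is stated in the text. Because the risk factors are correlated, the familywise figure is only a rough guide.

```python
n_tests = 35   # novel risk factors evaluated
alpha = 0.05   # assumed per-test significance level (not stated in the text)

# Expected number of false positives if every null hypothesis were true.
expected_false_positives = n_tests * alpha

# Probability of at least one false positive, assuming independent tests.
familywise_error = 1 - (1 - alpha) ** n_tests

# Bonferroni-adjusted per-test threshold that caps the familywise error at alpha.
bonferroni_alpha = alpha / n_tests

print(f"expected false positives: {expected_false_positives:.2f}")
print(f"familywise error rate: {familywise_error:.3f}")
print(f"Bonferroni threshold: {bonferroni_alpha:.5f}")
```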
In conclusion, our modeling indicates that no novel risk factor contributed significantly to risk prediction, as measured by the AUC. There was a modest improvement in risk classification with the addition of FEV1 and a small improvement in the IDI with the addition of WBC count, aPTT, albumin, factor VIII, magnesium, heart rate, hip circumference, and the genetic risk score in the total cohort and adiponectin, leptin, GGT, ferritin, ICAM-1, and complement C3 in the case-control sample. However, these improvements are small and unlikely to motivate refinement of clinical risk reclassification or discrimination strategies. Further study by prospective, population-based cohort studies is needed to confirm the generalizability of these findings.
The Atherosclerosis Risk in Communities Study is carried out as a collaborative study supported by National Heart, Lung, and Blood Institute contracts (HHSN-268201100005C, HHSN-268201100006C, HHSN-268201100007C, HHSN-268201100008C, HHSN-268201100009C, HHSN-268201100010C, HHSN-268201100011C, and HHSN-268201100012C), grants R01-HL-087641, R01-HL-59367, and R01-HL-086694; National Human Genome Research Institute contract U01-HG-004402; and National Institutes of Health (NIH) contract HHSN-268200625226C, with the ARIC carotid MRI examination funded by U01-HL-075572-01. Infrastructure was partly supported by grant UL1-RR-025005, a component of the NIH and NIH Roadmap for Medical Research.
No potential conflicts of interest relevant to this article were reported.
L.A.R. analyzed data and wrote the manuscript. J.S.P. contributed to data analysis and was the primary editor. B.B.D., M.I.S., R.C.H., M.A.P., J.H.Y., and C.M.B. contributed to data collection and reviewed and edited the manuscript. L.A.R. is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Parts of this work were presented in poster form at the 45th Annual Society for Epidemiologic Research Meeting, Minneapolis, Minnesota, 27–30 June 2012.
The authors thank the staff and participants of the ARIC Study for important contributions.
This article contains Supplementary Data online at http://care.diabetesjournals.org/lookup/suppl/doi:10.2337/dc12-0609/-/DC1.
- Received March 29, 2012.
- Accepted July 1, 2012.
- © 2013 by the American Diabetes Association.
Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. See http://creativecommons.org/licenses/by-nc-nd/3.0/ for details.