Derivation and Validation of a New Cardiovascular Risk Score for People With Type 2 Diabetes

The New Zealand Diabetes Cohort Study

  1. Paul L. Drury, MA, MB, BCHIR4
  1. Department of General Practice and Primary Health Care, School of Population Health, University of Auckland, Auckland, New Zealand;
  2. Section of Epidemiology and Biostatistics, School of Population Health, University of Auckland, Auckland, New Zealand;
  3. Waitemata District Health Board, Auckland, New Zealand;
  4. Auckland Diabetes Centre, Auckland District Health Board, Auckland, New Zealand.
  1. Corresponding author: C. Raina Elley, c.elley{at}


OBJECTIVE To derive a 5-year cardiovascular disease (CVD) risk equation from usual-care data that is appropriate for people with type 2 diabetes from a wide range of ethnic groups, variable glycemic control, and high rates of albuminuria in New Zealand.

RESEARCH DESIGN AND METHODS This prospective open-cohort study used primary-care data from 36,127 people with type 2 diabetes without previous CVD to derive a CVD equation using Cox proportional hazards regression models. Data from 12,626 people from a geographically different area were used for validation. Outcome measure was time to first fatal or nonfatal cardiovascular event, derived from national hospitalization and mortality records. Risk factors were age at diagnosis, diabetes duration, sex, systolic blood pressure, smoking status, total cholesterol–to–HDL ratio, ethnicity, glycated hemoglobin (A1C), and urine albumin-to-creatinine ratio.

RESULTS Baseline median age was 59 years, 51% were women, 55% were of non-European ethnicity, and 33% had micro- or macroalbuminuria. Median follow-up was 3.9 years (141,169 person-years), including 10,030 individuals followed for at least 5 years. A total of 6,479 first cardiovascular events occurred during follow-up. The 5-year observed risk was 20.8% (95% CI 20.3–21.3). Risk increased with each 1% A1C (adjusted hazard ratio 1.06 [95% CI 1.05–1.08]), when macroalbuminuria was present (2.04 [1.89–2.21]), and in Indo-Asians (1.29 [1.14–1.46]) and Maori (1.23 [1.14–1.32]) compared with Europeans. The derived risk equations performed well on the validation cohort compared with other risk equations.

CONCLUSIONS Renal function, ethnicity, and glycemic control contribute significantly to cardiovascular risk prediction. Population-appropriate risk equations can be derived from routinely collected data.

Ethnic and socioeconomic disparities in cardiovascular disease (CVD) outcomes exist around the world. Locally derived or ethnic-specific CVD risk equations to guide management may be appropriate to help redress these disparities. Including glycemic control, albuminuria, current management, and socioeconomic status in risk equations may also improve prediction and outcomes, particularly for people with type 2 diabetes, a group at high risk of CVD (1).

The Framingham equation has been extremely useful for assessing CVD risk for the past 40 years worldwide (2). However, it does not include renal function, albuminuria, or ethnicity, which are often potent predictors of CVD (3–6). Although it includes diabetes as a dichotomous variable, risk increases continuously with increasing glycemia (7,8). The UK Prospective Diabetes Study (UKPDS) risk equations, also widely used, include glycemia and diabetes duration but not measures of renal function or treatment and only two ethnic categories (9). Several other CVD equations exist, many derived regionally, but few have included measures of glycemia, renal function, and ethnicity together to improve risk prediction (6,10–13). The Strong Heart Study equation includes albuminuria but includes diabetes only as a dichotomous variable and is specific to a single ethnicity (14). The DECODE equation did not include renal function or ethnicity, although it provided “multiplying factors” based on nationality (15). The Swedish National Diabetes Register was used to produce a prediction equation for 5-year CVD risk but without renal function or ethnicity (16). Other variations include the Systematic Coronary Risk Evaluation (SCORE) equation, which did not include diabetes, ethnicity, or renal function (17); a stroke prediction equation for Hong Kong Chinese with type 2 diabetes (18); and a “clinical grouping” approach from Norway in which people were placed into broad groups by a count of basic risk factors (one of which was self-reported diabetes) (19).

This study demonstrates how routinely collected data can be used to derive an appropriate risk equation for treatment decisions within a specific population, which may lead to more equitable outcomes. It aimed to derive a 5-year CVD risk equation for people with type 2 diabetes that included important prognostic risk factors such as glycemia, albuminuria, and the ethnic groups relevant to New Zealand.


The Diabetes Cohort Study (DCS) is a prospective open cohort using routinely collected data from a national primary-care annual review program called “Get Checked,” which commenced in New Zealand in 2000. Details of the program and data collection are published elsewhere (7). Each individual has an encrypted unique identifier that allowed linkage to national hospitalization and mortality databases to identify cardiovascular events between 1988 and 2008. We split the dataset into two cohorts: a derivation cohort from the north of New Zealand and a validation cohort from the south of New Zealand.

Participants were included if they had type 2 diabetes as determined by their primary-care physician, had commenced the “Get Checked” program between 2000 and 2006, and had all risk variables recorded during the first assessment or within 2 years. Participants were excluded if they had been admitted to hospital for CVD prior to their first study assessment, as identified from national hospital admissions data since 1988.

Minimum required risk variables included age, sex, duration of diabetes, smoking status (current, previous, or never smoked), systolic and diastolic blood pressure, fasting total cholesterol and HDL, A1C, urine albumin-to-creatinine ratio, BMI, ethnicity, and social deprivation score (20). Information on blood pressure–lowering and lipid-lowering medications was also collected where available. Ethnicity is self-assigned according to national categories (21). Those used in this analysis were European, Indo-Asian (Indian), East Asian, Maori (the indigenous people of New Zealand), Pacific Islander, and “other,” including Middle Eastern, Latin American/Hispanic, African, and others.

The primary outcome measure for the CVD risk equation was time to first recorded fatal or nonfatal CVD event (ischemic heart disease, cerebrovascular accident/transient ischemic attack, or peripheral arterial disease). Events identified from national hospital and mortality database diagnoses were coded according to the ICD-9 and ICD-10 (online appendix Table 1). Participants were followed until first admission, death, or the censor date of 20 December 2007, whichever came first. To allow comparison with the UKPDS coronary heart disease (CHD) equation, we also derived a CHD equation applying the same outcome definition as the UKPDS (7,22).

Model derivation

Cox proportional hazards regression models were used to estimate the coefficients and hazard ratios associated with the potential risk factors for first CVD event in the derivation dataset (23). The Efron approximation was used to handle ties. The inclusion of variables in the models was determined using both the Akaike information criterion, to compare the fit of models, and the significance of the variable when included in the model. For continuous variables, we investigated the nonlinearity of the association between the variable and the outcome using fractional polynomials. The need for transformations to reduce the influence of extreme values was also explored. For a small percentage of missing clinical data, we substituted data from a previous check within 2 years. Sensitivity analyses were carried out using only data without imputed values. Kaplan-Meier curves were used to compare survival functions for participants included in the analysis and those not included due to missing variables. The assumptions of proportional hazards were checked using log-log plots of survival for each category of the ordinal and nominal covariates and by plotting the scaled Schoenfeld residuals against time and then testing for a nonzero slope (23). Interactions between sex, ethnicity, and other risk variables were checked. We used coefficients from the Cox proportional hazards model as weights for the probability of CVD event in 5 years and the baseline survivor function to obtain the risk equations (11). Analyses were undertaken using Stata 10.0 and SAS 9.2.
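As an illustration of how Cox coefficients and a baseline survivor function combine into an absolute risk, the sketch below uses the standard form risk = 1 − S0(5)^exp(Σβ(x − reference)). The three hazard ratios are those reported in the abstract; the baseline survival value, the reference values, and the restriction to three covariates are assumptions made for this example and are not the published equation, which is given in online appendix Table 2.

```python
import math

# Hypothetical baseline 5-year survival at the reference covariate values.
# This number is assumed for illustration; it is not taken from the paper.
BASELINE_SURVIVAL_5Y = 0.85

# Log hazard ratios (betas). The hazard ratios are the adjusted values
# reported in the abstract; using only these three covariates is a
# simplification made for this sketch.
BETAS = {
    "a1c_percent": math.log(1.06),       # per 1% A1C
    "macroalbuminuria": math.log(2.04),  # indicator: 1 if present
    "maori": math.log(1.23),             # indicator vs. European
}

# Assumed reference (centering) values for each covariate.
REFERENCE = {"a1c_percent": 7.0, "macroalbuminuria": 0.0, "maori": 0.0}


def five_year_risk(covariates):
    """Return 1 - S0(5) ** exp(linear predictor) for one person."""
    linear_predictor = sum(
        BETAS[name] * (covariates[name] - REFERENCE[name]) for name in BETAS
    )
    return 1.0 - BASELINE_SURVIVAL_5Y ** math.exp(linear_predictor)


if __name__ == "__main__":
    person = {"a1c_percent": 9.0, "macroalbuminuria": 1.0, "maori": 1.0}
    print(f"Illustrative 5-year risk: {five_year_risk(person):.3f}")
```

Because the hazard ratios multiply on the hazard scale, a person at the reference values gets exactly 1 − S0(5), and each additional risk factor raises the exponent applied to the baseline survival.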

The initial equation was derived from a model using clinical and demographic variables known to be predictive of CVD events. A second equation was derived including CVD medications to assess the effect of medication status on the ability of the equation to predict outcomes. For this, we included those individuals with blood pressure–lowering and lipid-lowering medications recorded. The interactions between the use of medication and blood pressure and lipid profiles, respectively, were tested for inclusion in the models. Models for the prediction of CHD were similarly developed both with and without CVD medications. Within the derivation cohort, we evaluated the fit of the new models using the measure of explained variation for censored survival data (R²) proposed by Royston, and concordance using Harrell's C.
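Harrell's C, mentioned above, can be sketched in a few lines. The function below is a minimal pairwise implementation written for this example (it is not the Stata or SAS routine the authors ran): a pair of people is usable when the one with the shorter follow-up had an event, and it is concordant when that person also has the higher predicted risk. Skipping pairs with tied follow-up times is a simplifying assumption.

```python
from itertools import combinations


def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    times: follow-up time for each person
    events: 1 if the follow-up ended in an event, 0 if censored
    risk_scores: predicted risk (higher means earlier event expected)
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified version
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # shorter follow-up was censored, so ordering is unknown
        usable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


if __name__ == "__main__":
    # Toy data: the highest predicted risks belong to the earliest events.
    print(harrell_c([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.5, 0.1]))
```

A value of 0.5 corresponds to chance ordering and 1.0 to perfect ordering, which gives a scale for reading the 0.67 to 0.71 range reported in the results.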

Validation of new equations

We tested the performance of the equations on the southern New Zealand cohort by assessing the calibration and discrimination. Calibration was assessed by comparing the observed number of people with events within prespecified risk groupings with the number predicted by the models (24).
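The calibration check described here can be sketched as follows. The function groups people by predicted risk and compares the mean predicted risk with the observed event proportion in each group. It assumes everyone has complete 5-year follow-up, which is a simplification (the study data include censoring), and the equal-sized grouping and all names are choices made for this example.

```python
def calibration_table(predicted, had_event, n_groups=10):
    """Compare mean predicted and observed risk within risk-ordered groups.

    predicted: predicted 5-year risks in [0, 1]
    had_event: 1 if an event occurred within 5 years, else 0
    Assumes no censoring before 5 years (a simplification).
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    rows = []
    for g in range(n_groups):
        start = g * len(order) // n_groups
        stop = (g + 1) * len(order) // n_groups
        members = order[start:stop]
        if not members:
            continue  # fewer people than groups
        mean_predicted = sum(predicted[i] for i in members) / len(members)
        observed = sum(had_event[i] for i in members) / len(members)
        rows.append(
            {
                "group": g + 1,
                "n": len(members),
                "mean_predicted": mean_predicted,
                "observed": observed,
            }
        )
    return rows


if __name__ == "__main__":
    for row in calibration_table([0.1, 0.1, 0.3, 0.3], [0, 0, 1, 0], n_groups=2):
        print(row)
```

A well-calibrated equation gives rows in which the predicted and observed columns are close in every group, not just on average.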

To assess discrimination, the ability of the equation to distinguish between individuals who do or do not have a subsequent CVD event, we calculated the area under the receiver operating characteristic (ROC) curve (C statistic) (12). We also compared the 5-year risk predictive ability of our equation with that of Framingham and our CHD risk equation with that of the UKPDS equation using ROC curves, areas under the curve, and calibration plots (24).
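For a binary outcome, the C statistic equals the probability that a randomly chosen person who had an event was assigned a higher predicted risk than a randomly chosen person who did not. The sketch below computes it directly from that pairwise definition. It is an illustration with assumed names, not the authors' code, and it ignores censoring.

```python
def c_statistic(risk_scores, had_event):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    cases = [s for s, e in zip(risk_scores, had_event) if e]
    controls = [s for s, e in zip(risk_scores, had_event) if not e]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5  # ties count as half
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Two cases (0.8, 0.4) and two controls (0.6, 0.2): 3 of 4 pairs are ordered correctly.
    print(c_statistic([0.8, 0.6, 0.4, 0.2], [1, 0, 1, 0]))  # 0.75
```

The double loop is quadratic in cohort size, so for tens of thousands of participants a rank-based formula or a library routine would be the practical choice.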


Data were collected from 71,570 people with type 2 diabetes between January 2000 and December 2006. Of these, 62,032 (86.7%) had the minimum dataset present from at least one assessment, of whom 48,211 (77.7%) had no previous CVD. An extra 524 people (1.1%) were included in the cohort after inserting previous clinical values for missing data (0.6% of variables). The equation derivation cohort included 36,127 participants from north New Zealand (Fig. 1).

Figure 1

Flow diagram of participants through the study for CVD equation derivation.

Baseline characteristics of participants are presented in Table 1. Data on medication were available for 29,573 (81.9%) subjects, of whom 16,941 (57%) were prescribed blood pressure–lowering medication and 12,233 (42%) were prescribed lipid-lowering medication. Those without medication status recorded did not differ substantially from those who did on any major clinical variable but were more likely to be of Pacific Island ethnicity (28 vs. 18%) and from the lowest socioeconomic quintile (47 vs. 39%).

Table 1

Characteristics at baseline of the derivation cohort

Median follow-up in the derivation cohort was 3.9 years (range 0–8), equivalent to a total of 141,169 person-years, and included 10,030 individuals (28%) who were followed for at least 5 years. There were 6,479 first CVD events during follow-up, with a 5-year observed risk of 20.8% (95% CI 20.3–21.3).
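With a median follow-up shorter than 5 years, an observed 5-year risk has to account for censoring. The text does not state which estimator produced the 20.8% figure; the sketch below assumes a Kaplan-Meier estimate, which is the usual choice, and uses toy data rather than study data.

```python
def kaplan_meier_risk(times, events, horizon):
    """Return 1 - S(horizon), where S is the Kaplan-Meier survival estimate.

    times: follow-up time for each person
    events: 1 if follow-up ended in an event, 0 if censored
    horizon: time point at which to report cumulative risk
    """
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1.0 - n_events / at_risk
    return 1.0 - survival


if __name__ == "__main__":
    # Toy data: events at years 1 and 3; the other three people are censored.
    print(kaplan_meier_risk([1, 2, 3, 4, 6], [1, 0, 1, 0, 0], horizon=5))
```

People censored before the horizon still contribute to the at-risk counts up to their censoring time, which is why this differs from simply dividing events by cohort size.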

Model derivation

Final variables included in the models were age, sex, duration of known diabetes, systolic blood pressure, smoking status, total cholesterol–to–HDL ratio, ethnicity, A1C, and albumin-to-creatinine ratio. The inclusion of BMI, socioeconomic deprivation index, and diastolic blood pressure did not significantly improve the fit of the model as measured by the Akaike information criterion, and these variables were therefore not included in the final models. The assumptions of the proportional hazards were satisfied for the model. Hazard ratios for each contributing variable for the CVD model are presented in Table 2. Those for the other derived equations are available on request. A sensitivity analysis without imputed values did not change results. The CVD and CHD 5-year risk equations and measures of fit are presented in online appendix Table 2. Harrell's C varied from 0.67 to 0.71 and R² ranged from 0.20 to 0.28. Hazard ratios were similar for men and women, and no significant interactions were found between sex and any risk factor, so separate equations for men and women were not derived. Although significant interaction was found between ethnicity and some risk variables, particularly age and duration of diabetes, the inclusion of ethnicity as a risk variable overcomes this problem to a certain degree, and for clinical reasons we have elected to use one equation. For the models that included medication status, use of blood pressure–lowering medication and its interaction with systolic blood pressure were included in the model. Inclusion of lipid-lowering medication use did not improve the fit of the model.

Table 2

Adjusted hazard ratios for first cardiovascular event

Calibration and discrimination of models using the validation (southern) cohort

The areas under the ROC curves (C statistics) were 0.68 for both equations for CVD risk and 0.69 for both CHD equations (online appendix Fig. 1). The calibration of the CVD risk equation is presented in online appendix Fig. 2. These graphs compare the mean predicted risk with the mean observed risk at 5 years for each decile of predicted risk in order. The differences between predicted and observed risk were small in all deciles of risk, with the predictive model consistently underestimating risk by 1–5%. The main differences between the northern and southern cohorts at baseline were in ethnic composition, socioeconomic deprivation, and age (online appendix Table 3). The CVD incidence rates were very similar: 46 per 1,000 person-years for northern and 43 for southern cohorts (incidence rate ratio 1.07 [95% CI 1.02–1.12]).
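The incidence rates quoted above follow from events divided by person-years. The sketch below reproduces the northern rate from the counts given in the text (6,479 events over 141,169 person-years) and shows one common way to obtain a rate ratio with a confidence interval, using a Wald interval on the log scale. The southern event and person-year counts are not reported in the text, so the values used here are made up solely to give a rate of 43 per 1,000 person-years; the interval method is also an assumption.

```python
import math


def rate_per_1000(events, person_years):
    """Incidence rate per 1,000 person-years."""
    return 1000.0 * events / person_years


def rate_ratio_ci(events_a, py_a, events_b, py_b, z=1.96):
    """Incidence rate ratio (a vs. b) with a Wald CI computed on the log scale."""
    irr = (events_a / py_a) / (events_b / py_b)
    standard_error = math.sqrt(1.0 / events_a + 1.0 / events_b)
    lower = irr * math.exp(-z * standard_error)
    upper = irr * math.exp(z * standard_error)
    return irr, lower, upper


# Northern (derivation) cohort counts are taken from the text.
north_rate = rate_per_1000(6479, 141169)

# Southern counts are NOT reported in the text. These are hypothetical values
# chosen only so that the rate equals 43 per 1,000 person-years.
HYPOTHETICAL_SOUTH_EVENTS = 2150
HYPOTHETICAL_SOUTH_PERSON_YEARS = 50000

irr, lower, upper = rate_ratio_ci(
    6479, 141169, HYPOTHETICAL_SOUTH_EVENTS, HYPOTHETICAL_SOUTH_PERSON_YEARS
)

if __name__ == "__main__":
    print(f"North: {north_rate:.1f} per 1,000 person-years")
    print(f"IRR {irr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The width of the interval depends on the event counts in both cohorts, so the interval printed here should not be read as a reproduction of the published one.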

Comparison with the Framingham and UKPDS risk equations

The area under the ROC curve indicates that our CVD risk equation compares favorably with the Framingham equation (C statistic: 0.68 [95% CI 0.67–0.70] vs. 0.63 [0.62–0.65], P < 0.0001) for this population with diabetes, and our CHD risk equation performs better than the UKPDS CHD equation (C statistic: 0.69 [0.67–0.71] vs. 0.63 [0.61–0.65], P < 0.0001) (online appendix Fig. 1). There was little difference in performance between our equations with and without medications included. Compared with observed events, the DCS equation underestimated risks by 1.4–6.3%, while the Framingham equation underestimated risks by 5.5–17.3% in the southern validation cohort (online appendix Fig. 2). The ability of the new risk equation to discriminate CVD risk in this specific population is best illustrated using multiple clinical scenarios, compared with predictions using the Framingham and UKPDS equations (Table 3 and online appendix Table 4).

Table 3

DCS absolute 5-year CVD risk estimates of a 50-year-old man (nonsmoker, systolic blood pressure 140 mmHg, total cholesterol–to–HDL ratio 4.5, and diabetes duration 5 years) compared with estimates using the Framingham 5-year CVD risk equation


The DCS equations, which include measures of glycemia, renal function, and ethnicity, used routinely collected data from a diverse primary-care population to allow locally and ethnically relevant CVD risk prediction and management. After controlling for traditional risk factors, being Maori (who suffer the greatest health outcome inequalities in New Zealand) or Indo-Asian, or having micro- or macroalbuminuria (33% of this cohort), substantially increased the risk of a CVD event, and risk increased with rising A1C. Inclusion of lipid-lowering medication status was not significantly predictive of CVD risk, although blood pressure–lowering medication was predictive.

This equation was derived from a large free-living and multicultural population with type 2 diabetes, with large numbers of CVD events compared with previous and more recent studies (25) and validated on a separate cohort. The study is recent (2000–2008) and reflects the population and management changes that have occurred since previous equations were derived, mainly during the 1950s to 1990s.

CVD events and deaths were obtained from national hospital and mortality databases using a unique identifier for each patient with high rates of complete data and linkage. Events were not adjudicated by study investigators, but hospital coding of events has improved significantly over the last decade, and it could be argued that hospitalization codes are a more rigorous way to collect events than searching primary-care records, which may well underestimate the number of events and therefore risk (11,25). This potential undercounting of real events may explain why a recent study concluded that the Framingham and UKPDS equations tend to overestimate CVD risk in people with diabetes (25), whereas we found the opposite, especially in those with other risk factors such as poorly controlled glycemia or renal impairment.

This cohort represents people with type 2 diabetes without previous CVD who are seen routinely in New Zealand primary care. There were high rates of smoking, renal impairment, and poorly controlled glycemia, with many participants from ethnic groups with relatively high CVD rates. This may explain why the CVD rates were higher than those found in previous cohorts such as Framingham or in selected trial populations such as UKPDS. Misclassification of baseline CVD status is unlikely to bias our equation, as the same definition of CVD was used to exclude those with previous CVD and to identify new events. Treatment differences are unlikely to explain the high CVD event rates in this cohort, as CVD preventive treatment rates in this cohort were similar to those found internationally, and medications that may be linked to increased CVD, such as thiazolidinediones, are rarely used in New Zealand. The equation tended to slightly underestimate risk in the southern cohort. This may have been due to differences in variables not included in this model, as there are demographic differences between the two regions. The follow-up period was limited, with a maximum follow-up of 8 years and a median of 3.9 years. However, >10,000 participants were followed for at least 5 years, so the model is likely to be valid for 5-year CVD risk prediction.

The predictive ability of our equations (using C statistic, R², and Harrell's C) was lower than that reported in initial validation of other equations (6,9–12). This may be because this was a very diverse population from real clinical practice, because people with type 2 diabetes are a high-risk population where prediction is difficult, or because of the quality of routinely collected data or hospital coding of events. The main advantage of the new equation is the improvement in discrimination and calibration for this specific population, particularly with respect to high-risk ethnic groups and those with renal impairment, when compared with using predictions from the Framingham and UKPDS equations.

Many countries recognize the need to reduce health inequalities as a national health policy priority. Ensuring that ethnic population groups have high-quality care is critical to reducing inequalities. Many countries will have similar at-risk minority groups, often with poor glycemic control and high rates of renal impairment, who may have their CVD risk underestimated by current risk equations. It may be appropriate for each country to derive risk equations for its own populations to improve targeted risk management. On the basis of our data, especially Table 3, we do not think it useful to consider all those with diabetes to be at uniform high CVD risk. Within this population with type 2 diabetes, the 5-year risk ranged from less than 3% to well over 30%. Being more discriminating in our assessment and management may be more cost-effective and may allow treatment to be focused more aggressively on high-risk people. Furthermore, being able to show patients the change in their absolute risk, using an accurate risk equation, provides us with a tool that may engage and motivate patients.

Renal function, glycemic control, and ethnicity are important risk factors and should be included in locally relevant CVD risk equations used to make treatment decisions in people with diabetes. More accurate risk prediction may improve the quality of care received, avoid delays in treatment for those in whom risk was previously underestimated, and may thus help address inequalities in health outcomes.


This study was funded by the New Zealand Health Research Council (04/164R). The funders had no role in the design, data collection, analysis, interpretation of findings, or writing up of results of this study.

No potential conflicts of interest relevant to this article were reported.

This study was approved by the New Zealand Multi-Regional Ethics Committee in 2004 (WGT/04/09/077).

All authors contributed to the design and writing of the manuscript. E.R. was the biostatistician who designed and conducted the statistical analyses of this study. C.R.E. was the principal investigator and oversaw the conduct of the study. T.K. oversaw much of the consultation and data collection from organizations throughout New Zealand and advised on primary care diabetes clinical matters. P.L.D. provided diabetes clinical and epidemiological advice. D.B. provided epidemiological advice and oversaw results and interpretations pertaining to Maori.

We thank the many patients, primary health care organizations, diabetes trusts, and general practices who contributed data for this study. We particularly acknowledge our research partners, Ngati Porou Hauora. We also acknowledge Ngaire Kerse and Bruce Arroll for their assistance and advice; the data management team, Janet Pearson, Simon Moyes, and Roy Lay-Yee; and collaborators in the PREDICT Team, particularly Rod Jackson, Sue Wells, and Jo Broad for methodological advice and contribution to CVD coding definitions. We are grateful to the New Zealand Health Information Service for provision of secondary care and mortality data and to Sandy Dawson from the New Zealand Ministry of Health for his support and encouragement.


  • The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

  • Received August 3, 2009.
  • Accepted March 10, 2010.

Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered.



This Article

  1. Diabetes Care vol. 33 no. 6 1347-1352