Development and Validation of Stroke Risk Equation for Hong Kong Chinese Patients With Type 2 Diabetes

The Hong Kong Diabetes Registry

  1. Xilin Yang, PHD¹,
  2. Wing-Yee So, FRCP¹,
  3. Alice P.S. Kong, FRCP¹,²,
  4. Chung-Shun Ho, PHD³,
  5. Christopher W.K. Lam, PHD¹,³,
  6. Richard J. Stevens, PHD⁴,⁵,
  7. Ramon R. Lyu, PHD⁶,
  8. Donald D. Yin, PHD⁶,
  9. Clive S. Cockram, MD¹,
  10. Peter C.Y. Tong, PHD¹,
  11. Vivian Wong, MD⁷ and
  12. Juliana C.N. Chan, MD¹
  ¹ Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Hong Kong, China
  ² Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong, China
  ³ Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong, China
  ⁴ Oxford Centre for Diabetes, Endocrinology, and Metabolism, University of Oxford, Oxford, U.K.
  ⁵ Nuffield Department of Clinical Medicine, University of Oxford, Oxford, U.K.
  ⁶ Worldwide Outcomes Research, Merck & Co., Inc., Whitehouse Station, New Jersey
  ⁷ Hospital Authority Head Office, Hong Kong, China

  Address correspondence and reprint requests to Professor Juliana C.N. Chan, Department of Medicine and Therapeutics, The Chinese University of Hong Kong, The Prince of Wales Hospital, Shatin, NT, Hong Kong SAR, China. E-mail: jchan{at}


OBJECTIVE—We sought to develop stroke risk equations for Chinese patients with type 2 diabetes in Hong Kong.

RESEARCH DESIGN AND METHODS—A total of 7,209 Hong Kong Chinese type 2 diabetic patients without a history of stroke at baseline were analyzed. The data were randomly and evenly divided into the training subsample and the test subsample. In the training subsample, stepwise Cox models were used to develop the risk equation. Validation of the U.K. Prospective Diabetes Study (UKPDS) stroke risk engine and the current stroke equation was performed in the test dataset. The life-table method was used to check calibration, and the area under the receiver operating characteristic curve (aROC) was used to check discrimination.

RESULTS—A total of 372 patients developed incident stroke during a median of 5.37 years (interquartile range 2.88–7.78) of follow-up. Age, A1C, spot urine albumin-to-creatinine ratio (ACR), and history of coronary heart disease (CHD) were independent predictors. The performance of the UKPDS stroke engine was suboptimal in our cohort. The newly developed risk equation defined by these four predictors had adequate performance in the test subsample. The predicted stroke-free probability by the current equation was within the 95% CI of the observed probability. The aROC was 0.77 for predicting stroke within 5 years. The risk score was computed as follows: $0.0634 \times \text{age (years)} + 0.0897 \times \text{A1C} + 0.5314 \times \log_{10}(\text{ACR in mg/mmol}) + 0.5636 \times \text{history of CHD}$ (1 if yes). The 5-year stroke probability can be calculated as $1 - 0.9707^{\exp(\text{risk score} - 4.5674)}$.

CONCLUSIONS—Although the risk equation performed reasonably well in Chinese type 2 diabetic patients, external validation is required in other populations.

Stroke is among the most common causes of death worldwide (1). Chinese individuals have a higher incidence of stroke and related mortality than Caucasians, as shown in the World Health Organization MONICA project (2). Diabetic patients have a two- to fivefold increased risk of stroke, in part due to interactions between multiple risk factors (3). The Framingham Study (4) and U.K. Prospective Diabetes Study (UKPDS) (5) have developed risk equations based on data collected from the Caucasian community and diabetic patients. Although a stroke risk equation has been developed in a small cohort of Chinese men recruited from a workforce (6), there is currently no risk equation applicable to Chinese individuals with diabetes, even though the number of such individuals is projected to reach 42.3 million by 2030 (7). In this study, we validate and develop stroke risk equations to predict first stroke in Chinese type 2 diabetic patients based on data from the Hong Kong Diabetes Registry.


RESEARCH DESIGN AND METHODS

Since 1995, all diabetic patients newly referred to the Prince of Wales Hospital in Hong Kong underwent comprehensive assessments of complications and risk factors based on the European DiabCare protocol (7a). Patients with hospital admissions within 6–8 weeks before assessment accounted for <10% of all referrals. None of the analyzed patients had a history of stroke. Patients with type 1 diabetes, defined as presentation with diabetic ketoacidosis, acute symptoms with heavy ketonuria (three or more), or continuous requirement of insulin within 1 year of diagnosis, were excluded (8). The study was approved by the Chinese University of Hong Kong Clinical Research Ethics Committee, and written informed consent was obtained from all patients.

Apart from documentation of demographic data and clinical assessment of complications, fasting blood samples were taken for measurement of plasma glucose, A1C, lipid profile (total cholesterol, HDL cholesterol, and triglycerides and calculated LDL cholesterol), and renal and liver functions. A sterile, random, spot urine sample was used to measure the albumin-to-creatinine ratio (ACR). Details of assessment methods, laboratory assays, and definitions have been previously described (9). Renal function was assessed by serum creatinine and estimated glomerular filtration rate (eGFR; expressed in ml/min per 1.73 m²). The latter was calculated using the abbreviated Modification of Diet in Renal Disease study group formula: $\text{eGFR} = 186 \times (\text{SCR} \times 0.011)^{-1.154} \times (\text{age})^{-0.203} \times (0.742 \text{ if female})$, where SCR is serum creatinine expressed in μmol/l (the factor 0.011 converts μmol/l to the mg/dl used by the original formula). Peripheral vascular disease was defined by absence of foot pulses, confirmed by an ankle-to-brachial ratio <0.90 on Doppler ultrasound. Visual acuity and fundoscopy through dilated pupils were performed. Retinopathy was defined by typical changes due to diabetes, laser scars, or a history of vitrectomy. Sensory neuropathy was defined as two of three abnormal signs or symptoms: numbness in lower limbs and reduced sensation using monofilament or graduated tuning fork. CHD history was defined as having a history of angina with abnormal electrocardiogram or stress test, myocardial infarction, coronary artery bypass graft surgery, angioplasty, or heart failure.
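The eGFR formula above can be sketched in a few lines of Python. This is a minimal illustration, not the registry's own code: the function name `egfr_mdrd` and its parameter names are hypothetical, and the 0.011 conversion factor is taken as quoted in the text.

```python
def egfr_mdrd(scr_umol_per_l: float, age_years: float, female: bool) -> float:
    """Abbreviated MDRD eGFR in ml/min per 1.73 m^2 (sketch of the formula in the text)."""
    # Convert serum creatinine from umol/l to mg/dl with the factor quoted in the text.
    scr_mg_per_dl = scr_umol_per_l * 0.011
    egfr = 186.0 * scr_mg_per_dl ** -1.154 * age_years ** -0.203
    if female:
        # Sex adjustment applied only for women.
        egfr *= 0.742
    return egfr


if __name__ == "__main__":
    # Hypothetical patient: creatinine 100 umol/l, age 60 years.
    print(round(egfr_mdrd(100.0, 60.0, female=False), 1))
    print(round(egfr_mdrd(100.0, 60.0, female=True), 1))
```

Both inputs must be positive; the function does no validation of implausible values.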

Outcome measures

All clinical end points were censored on 30 July 2005. Details of hospital admissions were retrieved from the Hong Kong hospital authority central computer system, which included admissions to all public hospitals in Hong Kong and accounted for 95% of all hospital bed-days in Hong Kong. First incident stroke was defined by hospital discharge diagnoses by the ICD-9 codes including stroke (codes 430–434 and 436) or deaths from stroke (codes 430–434 and 436–438). Hemorrhagic stroke was defined as having fatal and nonfatal subarachnoid hemorrhage (code 430), intracerebral hemorrhage (code 431), or other/unspecified intracranial hemorrhage (code 432), while all others were classified as ischemic stroke. All diagnoses of stroke were confirmed by the attending physician on discharge based on clinical findings and ascertained by computed tomography of brain in accordance with the clinical guidelines of the hospital authority. Only patients confirmed to have stroke based on these guidelines were included in the analysis. Patients with transient cerebral ischemia (code 435) were not included.
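The end point definition above maps ICD-9 codes to stroke subtypes, which can be sketched as a small lookup. The function name `classify_stroke` and the handling of decimal subcodes (for example "434.91") are assumptions for illustration; the code sets themselves come from the text.

```python
from typing import Optional

# Code sets as stated in the text.
HEMORRHAGIC_CODES = {430, 431, 432}
DISCHARGE_STROKE_CODES = {430, 431, 432, 433, 434, 436}
DEATH_STROKE_CODES = DISCHARGE_STROKE_CODES | {437, 438}


def classify_stroke(icd9_code: str, fatal: bool) -> Optional[str]:
    """Return 'hemorrhagic', 'ischemic', or None if the code is not a stroke end point."""
    # Assumption: only the three-digit category before the decimal point matters.
    category = int(icd9_code.split(".")[0])
    allowed = DEATH_STROKE_CODES if fatal else DISCHARGE_STROKE_CODES
    if category not in allowed:
        return None  # e.g., transient cerebral ischemia (435) is excluded
    return "hemorrhagic" if category in HEMORRHAGIC_CODES else "ischemic"
```

Under this sketch, code 435 never counts as an event, and codes 437 and 438 count only when the record is a death from stroke.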

Statistical analysis

Statistical analysis was performed using SAS (version 9.10; SAS Institute, Cary, NC). The data were randomly assigned to two subsamples of roughly equal sizes: the training data (n = 3,668) and test data (n = 3,541). Cox proportional hazards regression with the stepwise algorithm (P < 0.05 for entry and stay) was used to select predictors at baseline for incident stroke. Baseline variables considered for inclusion in the model were age, sex, BMI, waist circumference, current and ex-smoking status, history of coronary heart disease (CHD), sensory neuropathy, retinopathy, peripheral vascular disease, known duration of diabetes, LDL-to-HDL cholesterol ratio, non-HDL cholesterol, total-to-HDL cholesterol ratio, A1C, systolic blood pressure (SBP), ACR, eGFR, white blood cell count, and blood hemoglobin. In developing the current prediction model, only measurements collected at baseline were used.
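A random assignment into two subsamples of roughly equal size can be sketched as follows. The text does not describe the SAS procedure that was actually used, so the per-patient coin flip, the seed, and the name `split_train_test` are all assumptions for illustration.

```python
import random
from typing import Iterable, List, Tuple


def split_train_test(patient_ids: Iterable[int], seed: int = 0) -> Tuple[List[int], List[int]]:
    """Assign each patient independently to training or test with probability 0.5."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train: List[int] = []
    test: List[int] = []
    for pid in patient_ids:
        (train if rng.random() < 0.5 else test).append(pid)
    return train, test


if __name__ == "__main__":
    train_ids, test_ids = split_train_test(range(7209), seed=1)
    print(len(train_ids), len(test_ids))
```

Independent assignment gives subsamples that are close to, but not exactly, equal in size, which fits the "roughly equal" wording in the text.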

Based on the Cox proportional hazards model, the risk score obtained from the final Cox model to rank stroke risk was: $\text{risk score} = X_1 \times \beta_1 + X_2 \times \beta_2 + \dots + X_p \times \beta_p$. The probability of stroke over $j$ years was: $\text{stroke probability} = 1 - S(j)^{\exp(\text{risk score} - \text{mean of the risk score})}$, where $X_1, X_2, \dots, X_p$ are baseline predictors, $\beta_1, \beta_2, \dots, \beta_p$ are, respectively, the estimated coefficients of baseline predictors 1 to $p$, and $S(j)$ is the survival function over $j$ years when the risk score takes the value of its mean.
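The two formulas above translate directly into code. The following is a minimal sketch with hypothetical function and parameter names; it takes the coefficients, the mean risk score, and the baseline survival as inputs rather than estimating them.

```python
import math
from typing import Sequence


def risk_score(x: Sequence[float], beta: Sequence[float]) -> float:
    """Linear predictor: sum of each baseline predictor times its Cox coefficient."""
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    return sum(xi * bi for xi, bi in zip(x, beta))


def event_probability(score: float, mean_score: float, baseline_survival: float) -> float:
    """Probability of the event over j years, where baseline_survival is S(j) at the mean score."""
    return 1.0 - baseline_survival ** math.exp(score - mean_score)
```

A patient whose risk score equals the mean has an event probability of exactly 1 − S(j); scores above the mean raise the probability, and scores below it lower it.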

Clinical trials have demonstrated that hydroxymethylglutaryl-CoA reductase inhibitors (statins) (10) and blockers of the renin angiotensin aldosterone system (11) reduced the risk of stroke by >20%. In our additional analyses, we included the following treatment variables in the models: lipid-lowering drugs, ACE inhibitor/angiotensin II receptor blocker (ARB), other antihypertensive drugs, oral antidiabetic drugs, and insulin.

Validation of the UKPDS stroke risk engine and the new stroke risk equation was performed in the test subsample. Calibration was checked using the same life-table method as utilized by the original group during the development of the UKPDS stroke risk engine (5). The area under the receiver operating characteristic curve (aROC) was utilized to indicate the discriminative power of the equation (12). In a follow-up study, the aROC may be influenced by observation time and censoring. Thus, we used the method described by Chambless and Diao (13) to calculate the aROC that takes into account observation time and censoring.
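For orientation, the unadjusted aROC can be computed as the proportion of event/non-event pairs in which the patient with the event has the higher risk score. The sketch below implements only this simple pairwise version with a hypothetical function name; it is not the Chambless and Diao method and ignores observation time and censoring.

```python
from typing import Sequence


def unadjusted_auc(event_scores: Sequence[float], nonevent_scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    if not event_scores or not nonevent_scores:
        raise ValueError("both groups must be non-empty")
    concordant = 0.0
    for e in event_scores:
        for n in nonevent_scores:
            if e > n:
                concordant += 1.0   # event patient ranked higher: concordant pair
            elif e == n:
                concordant += 0.5   # ties count as half
    return concordant / (len(event_scores) * len(nonevent_scores))
```

The double loop is quadratic in sample size, which is acceptable for a sketch; a rank-based implementation would be preferable for large cohorts.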


RESULTS

Between 1995 and 2005, 7,920 consecutive patients underwent comprehensive assessment. Patients with type 1 diabetes (n = 332), uncertain type 1 diabetes status (n = 5), non-Chinese or unknown nationality (n = 49), and past history of stroke (n = 325) were excluded. A total of 7,209 Chinese type 2 diabetic patients were included in the final analysis.
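The exclusion counts reconcile with the final sample size, as the following arithmetic check shows:

```latex
\begin{aligned}
\text{excluded} &= 332 + 5 + 49 + 325 = 711,\\
\text{analyzed} &= 7{,}920 - 711 = 7{,}209.
\end{aligned}
```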

Table 1 compares the clinical characteristics of patients with and without incident stroke. In this cohort, the median age (interquartile range) was 57 years (46–67) and median disease duration 5 years (1–11). After a median follow-up duration of 5.37 years (2.88–7.78), 5.16% of patients (n = 372) developed incident stroke. The incidence rate of stroke was 9.66 (95% CI 8.69–10.64) per 1,000 person-years. The training subsample and the test subsample had 3,668 (stroke: 5.18% or 190) and 3,541 (stroke: 5.14% or 182) patients, respectively. During the follow-up period, 705 patients died, including 43 deaths due to fatal stroke (among the 372 stroke events).
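The reported confidence interval for the incidence rate is approximately reproduced by a normal approximation in which the standard error of a rate is the rate divided by the square root of the number of events. The text does not state how the interval was actually computed or report total person-years, so both the method and the function names below are assumptions.

```python
import math
from typing import Tuple


def rate_ci_normal(events: int, rate: float, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation CI for an incidence rate, given the event count."""
    se = rate / math.sqrt(events)  # SE of a Poisson rate = rate / sqrt(events)
    return rate - z * se, rate + z * se


def implied_person_years(events: int, rate_per_1000: float) -> float:
    """Person-years implied by an event count and a rate per 1,000 person-years."""
    return events / rate_per_1000 * 1000.0


if __name__ == "__main__":
    low, high = rate_ci_normal(372, 9.66)
    print(round(low, 2), round(high, 2))
    print(round(implied_person_years(372, 9.66)))
```

The small difference from the published lower limit is consistent with rounding of the rate or with a slightly different interval method.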

The hazard ratios (and β coefficients) of the predictors used by the current stroke risk equation and the UKPDS stroke risk engine are listed in Table 2. Of the UKPDS stroke predictors, only age, A1C, and SBP remained significant when reestimated in our test sample, whereas sex, current smoking status, and total-to-HDL cholesterol ratio were not significant. In the stepwise algorithm, log10 ACR and history of CHD at baseline were selected as significant predictors of stroke, in addition to age and A1C, whereas SBP was not selected by the new model. In the additional analysis, drug-use variables were not selected by the stepwise algorithm.

The UKPDS stroke engine overestimated the stroke risk for Hong Kong Chinese patients with type 2 diabetes; i.e., the predicted stroke-free probability curve lay below the lower 95% confidence limit of the observed curve (Fig. 1). The stroke-free probabilities predicted by the new stroke risk equation were within the 95% CI of the observed stroke-free probability over 8 years of follow-up (Fig. 1). The unadjusted aROC for application of the UKPDS risk engine in this cohort was 0.588 (95% CI 0.549–0.626). The unadjusted aROC for the new risk equation was 0.749 (0.716–0.782). Taking into consideration follow-up time and censoring, the adjusted aROC was 0.776 within 5 years of follow-up (the median follow-up time rounded to an integer).

Using the linear equation described in research design and methods, the risk equation for predicting the first stroke event can be constructed using the parameter estimates of models 1 and 2 listed in Table 2. The risk equation from model 1 is: $\text{risk score} = 0.0634 \times \text{age (years)} + 0.0897 \times \text{A1C} + 0.5314 \times \log_{10}(\text{ACR in mg/mmol}) + 0.5636 \times \text{history of CHD}$ (1 if yes; 0 otherwise); the 5-year stroke probability is $1 - 0.9707^{\exp(\text{risk score} - 4.5674)}$. At the cutoff point of ≥5.3099 for the risk score, corresponding to 0.0606 stroke probability over 5 years of follow-up, the sensitivity was 65.7% and specificity 74.9%. Sensitivities and specificities for other cutoff points are shown in Table 3.
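The model 1 equation can be evaluated with a short script. This is an illustrative sketch only: the function names are hypothetical, A1C is assumed to be entered in percent, and ACR must be greater than zero for the logarithm to be defined.

```python
import math


def hk_stroke_risk_score(age_years: float, a1c: float, acr_mg_per_mmol: float,
                         chd_history: bool) -> float:
    """Risk score from the model 1 coefficients quoted in the text."""
    return (0.0634 * age_years
            + 0.0897 * a1c                          # assumption: A1C in percent
            + 0.5314 * math.log10(acr_mg_per_mmol)  # requires ACR > 0
            + 0.5636 * (1.0 if chd_history else 0.0))


def five_year_stroke_probability(score: float) -> float:
    """5-year stroke probability: 1 - 0.9707 ** exp(score - 4.5674)."""
    return 1.0 - 0.9707 ** math.exp(score - 4.5674)


if __name__ == "__main__":
    # The cutoff score quoted in the text should give a probability close to 0.0606.
    print(round(five_year_stroke_probability(5.3099), 4))
```

Evaluating the probability at the quoted cutoff of 5.3099 is a convenient check that the exponent has been placed correctly.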

The predictive ability of the current risk equation for hemorrhagic stroke and ischemic stroke was further estimated by using hemorrhagic and ischemic stroke end points in the test subsample. Using the risk equation, the adjusted aROC for hemorrhagic stroke (n = 32) and ischemic stroke (n = 150) were 0.770 and 0.785, respectively, for 5 years of follow-up.


CONCLUSIONS

In this prospective study, we have shown that the UKPDS risk engine did not perform well in Chinese type 2 diabetic patients due to different risk profiles. Thus, there is a need to develop a Chinese-relevant risk equation to predict incident stroke using clinical variables that are recommended for periodic assessments. These risk predictors include age, A1C, ACR, and history of CHD. The equation has adequate calibration and discrimination over a 5-year follow-up period in the test subsample with an aROC of 0.78. This is compared with 0.61–0.74 for other risk equations for stroke (6).

Two stroke equations derived from community cohorts achieved higher aROC in test samples (6,14). In the Atherosclerosis Risk in Communities Study, based on 14,685 middle-aged Americans (14), a basic model of stroke equation was derived with risk factors including current smoking, diabetes, SBP, antihypertensive therapy, prior CHD, and left ventricular hypertrophy. The inclusion of 22 additional nontraditional risk factors and markers of subclinical atherosclerotic diseases, such as BMI, waist-to-hip ratio, HDL cholesterol, serum albumin, and von Willebrand factor, modestly improved the aROC from 0.79 to 0.84 in women and from 0.76 to 0.80 in men. However, inclusion of too many variables that have only minor contributions to the total risk may increase the potential of overfitting the risk equation (15). It may also increase the probability of inaccuracy when applied to new datasets. In another study (6) consisting of a small cohort of Chinese male steel workers (n = 4,400), the aROC was 0.78 and 0.82 for ischemic and hemorrhagic stroke, respectively. However, the 95% CIs were not reported. Besides, the event rate was relatively low in this community cohort with only 49 ischemic strokes and 33 hemorrhagic strokes in the training subsample and 21 ischemic strokes and 15 hemorrhagic strokes in the validation subsample (6). This is compared with 372 strokes (190 in the training subsample and 182 in the test subsample) in our cohort.

Our risk equation shares similar features with that developed from the UKPDS (5) and a Chinese cohort of steel workers (6), all of which used commonly documented clinical and biochemical parameters. The UKPDS equation consists of seven predictors (age, sex, smoking status, total-to-HDL cholesterol ratio, SBP, A1C, and atrial fibrillation), while the equation derived from the Chinese steel workers has four predictors (age, blood pressure, total cholesterol, and smoking status). Our equation consists of four predictors (age, A1C, ACR, and history of CHD). History of CHD and ACR were included in our equation but not in the above two equations. Of note, some predictors from the UKPDS stroke equation, including smoking status, sex, and total-to-HDL cholesterol ratio, were not significant in the analysis, whereas the effect of SBP was superseded by the inclusion of ACR into the model. The Asian type 2 diabetic population has a very high prevalence of diabetic nephropathy (16). Albuminuria has been demonstrated to be an important cardiovascular risk factor, probably indicating underlying endothelial dysfunction (17). Smoking, abnormal lipids, and hypertension are common risk factors for cardiovascular and renal diseases (18). Thus, the inclusion of albuminuria, which may reflect underlying kidney damage and past history of CHD, in our equation may indicate the collective effects of these risk factors. Although association between serum creatinine and cardiovascular events including stroke has been reported (19,20), renal function was not selected by any of these three equations. In our cohort, patients with stroke had lower eGFR than those without. Only ACR but not eGFR was selected in the stroke equation. This may be due to the overwhelming prognostic value of albuminuria as an expression of endothelial dysfunction and vascular damage.

The use of antihypertensive drugs has been included in some risk equations for general populations (4,14) but not in the stroke equation in diabetic populations (5). The use of antihypertensive drugs, insulin, and ACE inhibitor/ARB was associated with a higher risk of stroke in univariate Cox models in our patients but not selected in the stepwise algorithm. Given the nonrandomized nature of the study, the effects of drugs on stroke were likely to be confounded by other variables. In this respect, the efficacy of ARB on prevention of stroke has been confirmed in randomized clinical trials (16).

Our model has demonstrated the importance and clinical utility of comprehensive assessment as a tool for risk stratification and prediction of clinical events (21). This periodic assessment may serve as a measure of quality assurance, especially in clinical settings where close monitoring of risk factors and metabolic control may not be feasible. Many international and regional audits have confirmed poor adherence by primary care or specialist physicians in carrying out these recommended assessments. In a U.S. population-based survey, only 29% of 4,000 diabetic patients had A1C levels measured in the past year, and 15% of patients did not have lipid testing over the past 2 years (22).

This study has several limitations. First, use of aspirin might increase the risk of hemorrhagic stroke in the Chinese cohort (23). After excluding patients in whom information on aspirin use was not available (enrolled before 1 December 1996), use of aspirin was not included as a predictor for stroke (P = 0.6890). Second, although atrial fibrillation is a risk factor for stroke (24), it has been included in some (25) but not all stroke risk equations (6,14). We have not recorded this parameter in our present dataset. Moons et al. (25) also reported that inclusion of electrocardiographic characteristics including atrial fibrillation did not significantly improve the predictive accuracy of stroke risk equations. Besides, the prevalence of atrial fibrillation in the diabetic population has been reported to be 0.7% in diabetic men and 0.5% in diabetic women in one study (5). Third, in this study, only baseline measurements of risk factors were used to develop and validate the equations. Inclusion of prospective measurements of A1C and blood pressure may further refine the model.

In conclusion, we developed a risk equation to predict stroke in Chinese type 2 diabetic patients. It is noteworthy that risk scores derived from one population may not adequately predict the event risk in other populations that have different incidences of the event of interest. However, the ranking may still be appropriate (26). Thus, further validation is required before our risk equation can be widely used in clinical practice.

Figure 1—The predicted stroke-free probabilities by the UKPDS stroke engine and the Hong Kong (HK) Chinese stroke risk score, as well as the 95% CIs of the observed stroke-free probability over 8 years of observation in the test dataset.

Table 1—Baseline clinical and biochemical characteristics of 7,209 Chinese type 2 diabetic patients with no history of stroke divided according to the development of first stroke during a median follow-up of 5.37 years

Table 2—Parameter estimates of the risk equation for Hong Kong Chinese type 2 diabetic patients and reestimated hazard ratios of the predictors used by the UKPDS stroke engine in the training subsample

Table 3—Sensitivity, specificity, and positive predictive values in the test data at selected risk scores and their corresponding 5-year stroke probabilities


Acknowledgments

This study was partially supported by a Merck Sharp & Dohme (MSD) University Grant and the Hong Kong Foundation for Research and Development in Diabetes, established under the auspices of the Chinese University of Hong Kong. R.J.S. was funded by the Health Foundation.

We thank L.Y. Tse, Hong Kong Department of Health, for her assistance in data retrieval and critical comments.



    • Accepted October 16, 2006.
    • Received June 19, 2006.

