Prediction of Diabetic Nephropathy Using Urine Proteomic Profiling 10 Years Prior to Development of Nephropathy

  1. Hasan H. Otu, PHD1,2,
  2. Handan Can, PHD1,2,
  3. Dimitrios Spentzos, MD1,
  4. Robert G. Nelson, MD, PHD3,
  5. Robert L. Hanson, MD, MPH3,
  6. Helen C. Looker, MBBS3,
  7. William C. Knowler, MD, DRPH3,
  8. Manuel Monroy, MD4,
  9. Towia A. Libermann, PHD1,
  10. S. Ananth Karumanchi, MD5 and
  11. Ravi Thadhani, MD, MPH4
  1. 1Genomics Center and DF/HCC Cancer Proteomics Core, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts
  2. 2Department of Genetics and Bioengineering, Yeditepe University, Istanbul, Turkey
  3. 3Diabetes Epidemiology and Clinical Research Section, Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Diseases, Phoenix, Arizona
  4. 4Department of Medicine and Renal Unit, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
  5. 5Division of Nephrology, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts
  1. Address correspondence and reprint requests to Ravi Thadhani, MD, MPH, Bullfinch 127, 55 Fruit St., Massachusetts General Hospital, Boston, MA 02114. E-mail: thadhani.r{at}


OBJECTIVE—We examined whether proteomic technologies identify novel urine proteins associated with subsequent development of diabetic nephropathy in subjects with type 2 diabetes before evidence of microalbuminuria.

RESEARCH DESIGN AND METHODS—In a nested case-control study of Pima Indians with type 2 diabetes, baseline (serum creatinine <1.2 mg/dl and urine albumin excretion <30 mg/g) and 10-year urine samples were examined. Case subjects (n = 31) developed diabetic nephropathy (urinary albumin–to–creatinine ratio >300 mg/g) over 10 years. Control subjects (n = 31) were matched to case subjects (1:1) according to diabetes duration, age, sex, and BMI but remained normoalbuminuric (albumin–to–creatinine ratio <30 mg/g) over the same 10 years. Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) was performed on baseline urine samples, and training (14 cases:14 controls) and validation (17:17) sets were tested.

RESULTS—At baseline, A1C levels differed between case and control subjects. SELDI-TOF MS detected 714 unique urine protein peaks. Of these, a 12-peak proteomic signature correctly predicted 89% of cases of diabetic nephropathy (93% sensitivity, 86% specificity) in the training set. Applying this same signature to the independent validation set yielded an accuracy rate of 74% (71% sensitivity, 76% specificity). In multivariate analyses, the 12-peak signature was independently associated with subsequent diabetic nephropathy when applied to the validation set (odds ratio [OR] 7.9 [95% CI 1.5–43.5], P = 0.017) and the entire dataset (14.5 [3.7–55.6], P = 0.001), and A1C levels were no longer significant.

CONCLUSIONS—Urine proteomic profiling identifies normoalbuminuric subjects with type 2 diabetes who subsequently develop diabetic nephropathy. Further studies are needed to characterize the specific proteins involved in this early prediction.

Diabetic nephropathy from type 2 diabetes is the most common cause of end-stage renal disease in the U.S. (1); however, less than half of all subjects with type 2 diabetes develop diabetic nephropathy. Traditionally, incipient nephropathy is defined by the appearance of microalbuminuria (urine albumin excretion 30–300 mg/24 h), which can progress to macroalbuminuria (>300 mg/24 h) and subsequently to kidney failure (2). The presence of microalbuminuria, however, does not correlate well with underlying glomerular damage, since diabetic subjects with microalbuminuria display tremendous heterogeneity when concomitant biopsies are examined (3–8). Furthermore, in type 2 diabetic subjects, the presence of microalbuminuria is often a better predictor of cardiovascular disease than of diabetic nephropathy (9).

Glomerular and tubular damage resulting from type 2 diabetes occurs over several years, and it is possible that the excretions of glomerular and tubular proteins antedate the development of macroalbuminuria and perhaps even the development of microalbuminuria. The advent of novel, highly sensitive technologies such as proteomic profiling may identify urinary proteins associated with development of diabetic nephropathy well before any clinically identifiable alteration in kidney function or urine albumin excretion occurs. To test this hypothesis, we compared urinary proteomic profiles among Pima Indians with type 2 diabetes and normal urinary albumin excretion, who were followed for 10 years, for the development of diabetic nephropathy.


Pima Indians and the closely related Tohono O’odham (Papago) Indians, who live in the Gila River Indian Community in central Arizona, participate in a comprehensive longitudinal diabetes study (10). Since 1965, each member of the population aged ≥5 years is invited to have a research examination approximately every 2 years. These examinations include measurements of venous plasma glucose, obtained 2 h after a 75-g oral glucose load, and an assessment of various complications of diabetes. Diabetes is diagnosed by World Health Organization criteria (11), and the date of diagnosis is determined from these research examinations or from review of clinical records if diabetes is diagnosed between research examinations in the course of routine medical care. A urine specimen is collected at each examination and is assayed for albumin concentration with a nephelometric immunoassay using a monospecific antiserum to human albumin (12) and for creatinine concentration using a modification of the Jaffe method (13). Albumin excretion is expressed as the ratio of urinary albumin to urinary creatinine (in milligrams per gram) from a single untimed urine specimen.

Urine samples were collected at baseline and 10 years later in 31 case subjects and 31 contemporaneous control subjects matched for age (±5 years), sex, duration of diabetes (±5 years), and BMI (±5 kg/m2). The two populations were defined as follows.

Case subjects

Case subjects included type 2 diabetic Pima Indians who were normoalbuminuric (albumin-to-creatinine ratio <30 mg/g), had a normal serum creatinine concentration (≤1.2 mg/dl) at baseline, and progressed to diabetic nephropathy within 10 years.

Control subjects

Control subjects included type 2 diabetic Pima Indians who were also normoalbuminuric and had a normal serum creatinine concentration (≤1.2 mg/dl) at baseline but remained normoalbuminuric after 10 years.

Proteomic profiling

Proteomic profiling using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) (Ciphergen, Fremont, CA) was performed on baseline urine samples collected and stored at −80°C. Protease inhibitors were not added to urine samples at the time of collection. SELDI-TOF MS was carried out in duplicate on Ciphergen ProteinChips to capture subsets of proteins based on specific characteristics including affinity, charge, and hydrophobicity. The methods involve using an optimized, fully automated protocol on a liquid-handling robot (Biomek FX; Beckman Coulter, Chaska, MN) as previously described (14).

Cationic exchange chromatography chip.

Weak cationic exchange chromatography protein arrays (CM10 ProteinChip arrays; Ciphergen) were pretreated with 10 mmol/l HCl for 5 min and then rinsed with high-performance liquid chromatography–grade water. Subsequently, the arrays were loaded onto a 192-well bioprocessor and equilibrated with 20 mmol/l ammonium acetate/0.1% Triton X-100 (Sigma), pH 6.0. Ten microliters of cell lysate and 50 μl of 20 mmol/l ammonium acetate/0.1% Triton X-100 were dispensed onto each array spot and incubated for 1 h. The incubation comprised 60 cycles of pipetting the sample mixture up and down for 30 s. Array spots were washed three times for 5 min with 75 μl of 20 mmol/l ammonium acetate/0.1% Triton X-100 and once for 5 min with 75 μl water.

Immobilized metal affinity chromatography chip.

Immobilized metal affinity capture arrays (IMAC30 ProteinChip arrays; Ciphergen) were incubated with 100 mmol/l CuSO4 for 25 min and loaded onto a 192-well bioprocessor. Subsequently, the arrays were equilibrated with 50 mmol/l NaCl and 100 mmol/l NaH2PO4, pH 7.0. Ten microliters of cell lysate and 40 μl of 50 mmol/l NaCl and 100 mmol/l NaH2PO4, pH 7.0, were dispensed onto each array spot and incubated for 1 h. Array spots were washed three times for 5 min with 75 μl of 500 mmol/l NaCl and 100 mmol/l NaH2PO4, pH 7.0, to remove nonspecifically bound proteins and then washed for 5 min with 75 μl water.

Application of matrix molecule.

Sinapinic acid (Fluka, Buchs, Switzerland), the matrix molecule, was prepared as a saturated solution in 50% acetonitrile/0.5% trifluoroacetic acid and then diluted 1:1 in 50% acetonitrile/0.5% trifluoroacetic acid. After the arrays were air-dried, sinapinic acid was dispensed (twice with 1 μl and twice with 0.75 μl) to each spot of the hydrophobic, cationic exchange, and IMAC (immobilized metal ion affinity chromatography) arrays, respectively, again using the Biomek FX equipped with a 96-channel 200-μl head. The arrays were air-dried again and immediately analyzed.

Detection of protein peaks.

Individual protein peaks, which represent polypeptides of the same or similar molecular weight, were detected using the Ciphergen Biomarker Wizard software. To identify distinct and significant peaks, a signal-to-noise ratio cutoff of 2 was required, which selects only peaks with signal levels significantly above the calculated background noise (14). Urine samples were interrogated for the full range of protein peaks with molecular mass between 2,000 and 40,000 Da. The urine protein peak data were normalized using the total ion current method as previously described (14). Following the manufacturer’s specifications, the normalization step was corrected for the baseline by excluding noise from the matrix molecule between 0 and 2,000 Da. The average intra-assay coefficient of variation was ∼20%, which is within the range for surface-enhanced laser desorption/ionization studies. All analyses were conducted with and without normalization for urine creatinine concentrations.
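The total ion current normalization described above can be sketched as follows. This is a minimal illustration and not the Ciphergen software's implementation: the function names, the representation of a spectrum as (m/z, intensity) pairs, and the choice to scale each spectrum to the mean total ion current across spectra are assumptions. Only the 2,000 to 40,000 Da window is taken from the text.

```python
from statistics import mean

# Mass window taken from the text; everything else here is illustrative.
MZ_MIN, MZ_MAX = 2_000.0, 40_000.0


def total_ion_current(spectrum):
    """Sum intensities inside the analysed window (matrix noise below 2,000 Da is excluded)."""
    return sum(i for mz, i in spectrum if MZ_MIN <= mz <= MZ_MAX)


def normalize_tic(spectra):
    """Scale each spectrum so its in-window total ion current equals the mean across spectra."""
    tics = [total_ion_current(s) for s in spectra]
    target = mean(tics)
    return [
        [(mz, i * target / tic) for mz, i in s]
        for s, tic in zip(spectra, tics)
    ]


# Two toy spectra with the same peak shape but different overall signal.
spectra = [
    [(1500.0, 90.0), (3807.04, 10.0), (9000.0, 30.0)],
    [(1500.0, 40.0), (3807.04, 20.0), (9000.0, 60.0)],
]
normalized = normalize_tic(spectra)
```

After scaling, the two toy spectra have the same total signal inside the window, so peak intensities can be compared across samples. A spectrum with no signal in the window would need separate handling, which this sketch omits.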

Dataset split.

A predictive peak signature was defined as a subset of measured peaks that could be used to predict whether a subject would develop diabetic nephropathy based on the baseline urine protein profile of that subject’s sample. To identify a predictive peak signature that could be tested on an independent set for its accuracy, subjects were randomly divided into a training set and a validation set. The training set was used to ascertain the predictive signature, which was then applied to the independent validation set that had not been used in the initial identification of the predictive signature. The training set consisted of 14 case samples and matched controls, and the validation set consisted of 17 case samples and matched controls. The mean ± SD duration from documentation of normoalbuminuria to evidence of overt nephropathy for the case samples in the training and validation sets was similar (119.64 ± 7.29 vs. 120.88 ± 5.82 months, P > 0.05).
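A random split of this kind can be sketched as below. The pair identifiers and the seed are hypothetical, and the sketch assumes that each case stays with its matched control, as the description of the two sets suggests.

```python
import random


def split_matched_pairs(pair_ids, n_train, seed=0):
    """Randomly assign matched case-control pairs to training and validation sets.

    Keeping each case with its matched control follows the text;
    the seed and the identifiers are hypothetical.
    """
    rng = random.Random(seed)
    shuffled = list(pair_ids)
    rng.shuffle(shuffled)
    return sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


pair_ids = list(range(1, 32))  # 31 matched pairs
train, validation = split_matched_pairs(pair_ids, n_train=14)
```

Splitting by pair rather than by sample keeps the 1:1 matching intact within each set and guarantees that no subject appears in both.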

Class prediction.

A set of descriptive peaks on the training set was identified using Student’s t tests and a threshold of P < 0.05. The descriptive peaks were refined using the accuracy of their subsets as predictor peaks on the training set. The best-performing subset of the descriptive peaks (the one with the highest leave-one-out accuracy) was chosen as the predictive profile and was subsequently applied to the independent validation set. Class prediction was examined using the weighted voting algorithm with leave-one-out cross-validation: a sample was left out, and a predictor set of peaks that distinguished the two groups was built and used to predict the class of the sample left out (15). The procedure was cycled through all of the samples individually. The accuracy of the predictor was calculated by counting the total number of correctly predicted samples left out. The P value for the predictor accuracy was calculated using Fisher’s exact test, and multivariate analysis to control for confounding was carried out using binary logistic regression with categorical or continuous covariates, as appropriate.
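The leave-one-out procedure with weighted voting can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the study: peaks are ranked here by absolute signal-to-noise within each fold, which stands in for the t test filter, and the function names and toy data are hypothetical.

```python
from statistics import mean, stdev


def fit_weighted_voting(X, y, n_peaks):
    """Return [(peak_index, weight, boundary)] for the n_peaks best-separating peaks."""
    model = []
    for j in range(len(X[0])):
        case = [row[j] for row, label in zip(X, y) if label == 1]
        ctrl = [row[j] for row, label in zip(X, y) if label == 0]
        spread = stdev(case) + stdev(ctrl)
        if spread == 0:
            continue  # a constant peak carries no information and cannot vote
        weight = (mean(case) - mean(ctrl)) / spread   # signal-to-noise
        boundary = (mean(case) + mean(ctrl)) / 2      # midpoint between class means
        model.append((j, weight, boundary))
    model.sort(key=lambda m: abs(m[1]), reverse=True)
    return model[:n_peaks]


def predict(model, sample):
    """Each peak votes weight * (intensity - boundary); a positive total means 'case'."""
    total = sum(w * (sample[j] - b) for j, w, b in model)
    return 1 if total > 0 else 0


def leave_one_out_accuracy(X, y, n_peaks):
    """Refit on all samples but one, predict the held-out sample, and repeat."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit_weighted_voting(X_train, y_train, n_peaks)
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)


# Toy data: peak 0 separates the classes, peak 1 is noise.
X = [[5.0, 1.0], [6.0, 2.0], [5.5, 1.5], [1.0, 1.2], [2.0, 1.8], [1.5, 1.4]]
y = [1, 1, 1, 0, 0, 0]
accuracy = leave_one_out_accuracy(X, y, n_peaks=1)
```

Rebuilding the predictor inside every fold matters: if the peaks were chosen once using all samples, the held-out sample would have influenced its own prediction and the accuracy estimate would be optimistic.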


A hierarchical clustering technique was used to construct an unweighted pair group method with arithmetic-mean tree using Pearson’s correlation as the metric of similarity (16). This tree represents the similarity between samples based on the proteome profile observed on the chips for the predictive peak set.
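A compact sketch of this clustering step is given below, using 1 minus Pearson's correlation as the distance and average linkage (the unweighted pair group method with arithmetic mean). The sample names and intensity profiles are hypothetical toy values, and the nested-tuple tree is only one possible representation.

```python
from itertools import combinations
from statistics import mean


def pearson_distance(a, b):
    """1 - Pearson correlation, so perfectly correlated profiles have distance 0."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return 1 - cov / (var_a * var_b) ** 0.5


def upgma(profiles):
    """Average-linkage clustering; returns a nested tuple of sample names."""
    names = list(profiles)
    base = {
        frozenset((a, b)): pearson_distance(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }
    # Each cluster is (tree, member_names).
    clusters = [(n, (n,)) for n in names]

    def linkage(c1, c2):
        # UPGMA distance: mean of all pairwise distances between members.
        return mean(base[frozenset((a, b))] for a in c1[1] for b in c2[1])

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        clusters = [c for c in clusters if c is not c1 and c is not c2]
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0]


# Toy profiles over three peaks: two case-like (N) and two control-like (C) samples.
profiles = {
    "N1": [2.0, 1.0, -1.0],
    "N2": [1.8, 1.1, -0.9],
    "C1": [-1.0, 0.5, 2.0],
    "C2": [-1.2, 0.4, 1.9],
}
tree = upgma(profiles)
```

Because the distance is based on correlation, samples group by the shape of their peak profile rather than by absolute intensity.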


Demographic characteristics at baseline and follow-up

Baseline and follow-up characteristics of the 31 case and 31 control subjects are shown in Table 1. At baseline, the two groups were similar with respect to most characteristics (age, sex, blood pressure, serum creatinine, and urine albumin–to–creatinine ratios) except for A1C levels, which tended to be higher in case subjects. Furthermore, at baseline, five control and four case subjects were taking some form of antihypertensive medication, and eight control and four case subjects had documented evidence of nephropathy. At follow-up, case subjects had significantly higher blood pressures, A1C levels, and, as expected, urine albumin-to-creatinine measures.

Protein profiling results

Mass spectrometric analysis of all 62 samples by SELDI-TOF MS detected 714 unique protein peaks (337 on the CM10 chip and 377 on the IMAC30 chip) in urine samples. The intensity for each of the 714 peaks on all 62 samples was analyzed based on the area under the spectra of the interrogated peak.

Training set

Using the prespecified threshold of signal-to-noise ratio of 2 and P < 0.05, 28 unique peaks differentiated 14 case samples from their respective 14 matched controls in the training set. These peaks were further refined using hierarchical clustering into a 12-peak predictive signature based on their prediction accuracy in the training set. This 12-peak signature displayed 93% sensitivity, 86% specificity, and 89% leave-one-out cross-validation accuracy (25 out of 28 predicted accurately, P < 0.001) for the development of diabetic nephropathy. Normalizing the protein signature results for urine creatinine concentrations slightly improved the accuracy from 89 to 93%. Peak intensity values along with sample parameters and prediction results can be found in online supplementary data at

Hierarchical clustering of the samples in the training set using the 12-peak signature is shown in Fig. 1. Case and control samples are represented in columns, and each row in the colorgram represents a peak in the protein signature. Peak intensity values mapped to the [−2,2] interval are color coded, with red and green indicating an increase and a decrease in the peptide’s abundance, respectively. Peaks are labeled to denote the chip surface, and the molecular weight corresponding to the size of each peptide is identified. For example, peak label CM3807_04 denotes a peptide detected on the CM10 array chip at a molecular weight of 3,807.04 Da. In Fig. 2, tracings from the SELDI-TOF spectra for one representative peak (CM3807_04) in 10 samples (5 control and 5 case) from the training group are highlighted. As suggested by the colorgram in Fig. 1, this peptide is elevated in case but not in control samples.
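The peak-label convention can be decoded mechanically, as in the short sketch below. The function name and the regular expression are illustrative. The CM prefix and the example label come from the paragraph above, and the IM prefix for the IMAC30 chip comes from the Figure 1 legend.

```python
import re

# Chip prefixes: CM from the text above, IM from the Figure 1 legend.
CHIP_BY_PREFIX = {"CM": "CM10", "IM": "IMAC30"}
LABEL_PATTERN = re.compile(r"^(CM|IM)(\d+)_(\d+)$")


def parse_peak_label(label):
    """Split a label such as 'CM3807_04' into (chip name, molecular weight in Da)."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized peak label: {label!r}")
    prefix, whole, fraction = match.groups()
    # The underscore stands in for the decimal point of the mass.
    return CHIP_BY_PREFIX[prefix], float(f"{whole}.{fraction}")


chip, mass = parse_peak_label("CM3807_04")
```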

Validation set

The 12-peak signature was tested against samples in the validation set, consisting of 17 case samples and their respective matched controls. The overall accuracy was 74% (25 of 34 correctly predicted, P < 0.01), with a sensitivity of 71% and a specificity of 76%.
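The reported sensitivity, specificity, and accuracy for the training and validation sets can be reproduced from confusion-matrix counts, as sketched below. The individual counts are not given in the text. They are inferred here from the rounded percentages and the stated totals (25 of 28 and 25 of 34 correct), so they should be read as an assumption. The text also does not say whether Fisher's exact test was one- or two-sided, so the two-sided form shown is likewise an assumption.

```python
from math import comb


def summarize(tp, fn, fp, tn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


def fisher_exact_two_sided(tp, fn, fp, tn):
    """Two-sided Fisher's exact P value: sum over tables no more likely than the observed one."""
    row1, row2, col1 = tp + fn, fp + tn, tp + fp
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k true positives with fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / total

    observed = prob(tp)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))


# Counts inferred from the reported percentages (assumption, not given in the text).
training = summarize(tp=13, fn=1, fp=2, tn=12)
validation = summarize(tp=12, fn=5, fp=4, tn=13)
p_training = fisher_exact_two_sided(13, 1, 2, 12)
```

With these inferred counts the rounded percentages match the reported values for both sets, and the training-set P value falls below 0.001.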

Multivariate analysis

The distribution of factors known to be associated with diabetic nephropathy between case and control subjects was examined in detail. Most characteristics, including blood pressure and blood pressure medication use, did not differ at baseline between the two groups. In contrast, case subjects demonstrated higher A1C levels compared with control subjects (Table 1). In a multivariate binary logistic regression model adjusting for baseline A1C, the surface-enhanced laser desorption/ionization 12-peak signature was independently predictive of diabetic nephropathy in the validation set (odds ratio [OR] 7.9 [95% CI 1.5–43.5], P = 0.017), as well as when all subjects were combined (14.5 [3.7–55.6], P = 0.001), and in both analyses A1C was no longer significantly associated with subsequent diabetic nephropathy.
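The relationship between a logistic regression coefficient and the odds ratio with its confidence interval can be illustrated as follows. The text does not state how the interval was computed. A Wald interval, exp(beta ± 1.96 × SE), is assumed here, and the function names are hypothetical.

```python
from math import exp, log


def wald_interval(beta, se, z=1.96):
    """Odds ratio and 95% Wald interval from a log-odds coefficient and its standard error."""
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


def log_scale_summary(ci_low, ci_high, z=1.96):
    """Recover the log-scale midpoint and standard error implied by a Wald interval."""
    beta = (log(ci_low) + log(ci_high)) / 2
    se = (log(ci_high) - log(ci_low)) / (2 * z)
    return beta, se


# Reported validation-set interval: 1.5 to 43.5 around an odds ratio of 7.9.
beta, se = log_scale_summary(1.5, 43.5)
odds_ratio, low, high = wald_interval(beta, se)
```

Under the Wald assumption the interval is symmetric on the log scale, so the point estimate should sit near the geometric mean of the limits. For the validation set that midpoint is about 8.1, close to the reported 7.9, a gap that would be consistent with rounding of the published limits. The width of the interval reflects the small number of subjects.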


In this well-characterized, nested case-control study of Pima Indians with type 2 diabetes, a 12-peak urine protein signature distinguished patients who went on to develop diabetic nephropathy from those who did not within a 10-year period. Importantly, the urine proteomic signature we identified was obtained from urine specimens collected when subjects were normoalbuminuric and had normal renal function and antedated the development of diabetic nephropathy by ∼10 years. Furthermore, this protein signature appeared to be independently associated with the development of diabetic nephropathy even after accounting for potential confounders, the most evident being A1C level. This 12-peak protein signature, although internally valid for this unique subset of Pima Indians, needs to be further validated in a larger group of patients with type 2 diabetes to assess its generalizability.

As a result of studies suggesting that early intervention delays the progression of diabetic nephropathy (17), practice guidelines from the American Diabetes Association (9) currently recommend routine screening for albuminuria in all newly diagnosed patients with type 2 diabetes. Nevertheless, there is an unmet need for new biomarkers due to the limitations of albuminuria to predict the development of incipient nephropathy. Proteomic approaches have the potential to identify novel low–molecular weight biomarkers in an unbiased way, since the entire proteome can be evaluated. Indeed, the peptides or proteins we identified may represent novel proteins in the prediction of diabetic nephropathy or, alternatively, may represent fragments of known proteins including albumin (18,19). Regardless of their origin, however, their presence appeared to predict subsequent development of diabetic nephropathy before the threshold of microalbuminuria was reached. Finally, given the complex biology of diabetic nephropathy, the combination of several markers in a proteomic signature is likely to be more predictive than individual biomarkers (20,21).

The primary limitation of proteomics approaches, in general, and SELDI-TOF MS, in particular, is the dynamic range of proteins within the proteome that can be interrogated. By using whole urine or serum, low–molecular weight proteins (e.g., <20–30 kDa) present in significant abundance in complex fluids can be detected, whereas other potentially interesting biomarkers may be beyond the limit of detection (20,22). Despite these limitations, SELDI-TOF MS has been successfully used to generate disease-specific protein profiles in large-scale studies. With such profiles, six independent research groups were able to distinguish ovarian, prostate, breast, and hepatocellular cancer patients from healthy individuals with high sensitivity and specificity (23–27). The use of proteomic technology in kidney transplantation revealed urinary biomarker profiles whose detection distinguished kidney transplant patients with no rejection from those with acute rejection (27,28). These results suggest that protein profiles may become a new, powerful high-throughput tool for the early detection of disease states.

This study has potential limitations that should be acknowledged. Characteristics linked with diabetic nephropathy among Pima Indians with type 2 diabetes include duration of diabetes, BMI, blood pressure, and glycemia control (29,30). Although we matched most of these baseline characteristics, we could not match for A1C levels and retain a sufficient sample size. Thus, we adjusted for A1C in the multivariate analyses and found that the 12-peak signature remained independently associated with diabetic nephropathy, while baseline A1C lost its significance. While this may be due to reduced power, it also raises the possibility that the proteomic signature was linked to both glucose control and risk of diabetic nephropathy. Although we uniformly performed proteomic analyses on baseline samples, we could not test follow-up samples because community members were not required to have extra samples collected at each 2-year interval after their baseline visit. Nevertheless, each had follow-up urine specimens examined for albumin and creatinine to verify nephropathy status. Since we did not test multiple samples, the degree of stability of proteomic profiles during the storage period is unclear. However, given the similarity of collection and storage for all samples, marked instability or urine protein variations would have led to random misclassification and reduced our ability to identify important predictive protein signatures. In light of our positive findings, we believe protein instability or urine protein variations were not significant limitations. Finally, survival bias may have affected these results because all subjects were required to have baseline and follow-up samples in order to be included in this study.

In conclusion, we applied a high-throughput proteomic approach to the evaluation of urine samples from type 2 diabetic patients and identified a protein profile that accurately predicted nephropathy in advance of an increase in albuminuria. These results warrant further studies to determine the applicability of this approach within other populations, and, more specifically, within other larger cohorts of diabetic patients with prospectively collected samples. Finally, further characterization and evaluation of the proteins in the biomarker profile are needed to demonstrate whether they are biologically important in the development of nephropathy.

Figure 1—

Hierarchical clustering of the case (N) and control (C) samples in the training set using the 12-peak signature. Rows represent individual peaks; intensity values are normalized to [−2,2] as shown in the scale at the bottom. Peak labels represent the chip on which the peak was detected (IM for IMAC30, CM for CM10) followed by the molecular weight for the detected peak. Red denotes an elevation while green denotes a decrease in expression.

Figure 2—

Trace view for one representative peak CM3807_04 from the 12-peak signature. The detection level for the peak in five case (N) and five control (C) subjects in the training set is shown. Case subjects demonstrate higher peaks than control subjects in accordance with the heat map shown in Fig. 1.

Table 1—

Clinical characteristics of the subjects at baseline and follow-up


R.T. is supported, in part, by grant DK 068465.

The authors thank Meghan Wells for technical support in the SELDI-TOF MS phase of the study.


  • A table elsewhere in this issue shows conventional and Système International (SI) units and conversion factors for many substances.

    The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C Section 1734 solely to indicate this fact.

    • Accepted December 1, 2006.
    • Received August 4, 2006.

