Comparison of Accuracy Measures of Two Screening Tests for Gestational Diabetes Mellitus

  1. Marsha van Leeuwen, MD1,
  2. Egbert J.K. Zweers, MD2,3,
  3. Brent C. Opmeer, PHD4,
  4. Evert van Ballegooie, MD, PHD5,6,
  5. Henk G. ter Brugge, MD7,
  6. Harold W. de Valk, MD, PHD2,
  7. Ben W.J. Mol, MD, PHD8,9 and
  8. Gerard H.A. Visser, MD, PHD8
  1. 1Department of Obstetrics and Gynaecology, Academic Medical Centre, Amsterdam, the Netherlands
  2. 2Department of Internal Medicine, University Medical Centre, Utrecht, the Netherlands
  3. 3Department of Internal Medicine, Bronovo Hospital, the Hague, the Netherlands
  4. 4Department of Clinical Epidemiology and Biostatistics, Academic Medical Centre, Amsterdam, the Netherlands
  5. 5Department of Internal Medicine, Isala Clinics, Zwolle, the Netherlands
  6. 6Department of Internal Medicine, Bethesda Hospital, Hoogeveen, the Netherlands
  7. 7Department of Obstetrics and Gynaecology, Isala Clinics, Zwolle, the Netherlands
  8. 8Department of Perinatology and Gynaecology, University Medical Center, Utrecht, the Netherlands
  9. 9Department of Obstetrics and Gynaecology, Máxima Medical Centre, Veldhoven, the Netherlands
  1. Address correspondence and reprint requests to Marsha van Leeuwen, MD, Academic Medical Centre, Department of Obstetrics and Gynaecology, Rm. H4-255, Meibergdreef 9, 1105 AZ Amsterdam, Netherlands. E-mail: marsha.vanleeuwen{at}amc.uva.nl

Abstract

OBJECTIVE— To compare the accuracy measures of the random glucose test and the 50-g glucose challenge test as screening tests for gestational diabetes mellitus (GDM).

RESEARCH DESIGN AND METHODS— In this prospective cohort study, pregnant women without preexisting diabetes in two perinatal centers in the Netherlands underwent a random glucose test and a 50-g glucose challenge test between 24 and 28 weeks of gestation. If one of the screening tests exceeded predefined threshold values, the 75-g oral glucose tolerance test (OGTT) was performed within 1 week. Furthermore, the OGTT was performed in a random sample of women in whom both screening tests were normal. GDM was considered present when the OGTT (reference test) exceeded predefined threshold values. Receiver operating characteristic (ROC) analysis was used to evaluate the performance of the two screening tests. The results were corrected for verification bias.

RESULTS— We included 1,301 women. The OGTT was performed in 322 women. After correction for verification bias, the random glucose test showed an area under the ROC curve of 0.69 (95% CI 0.61–0.78), whereas the glucose challenge test had an area under the curve of 0.88 (0.83–0.93). There was a significant difference in area under the curve of the two tests of 0.19 (0.11–0.27) in favor of the 50-g glucose challenge test.

CONCLUSIONS— In screening for GDM, the 50-g glucose challenge test is more useful than the random glucose test.

Gestational diabetes mellitus (GDM) is estimated to occur in 2–9% of all pregnancies (1–5). It is defined as carbohydrate intolerance with onset or first recognition during pregnancy and is associated with increased rates of adverse pregnancy outcomes, such as macrosomia; shoulder dystocia; birth-related trauma, such as fractures and nerve palsies; neonatal hypoglycemia; and jaundice. In addition, women with GDM are at substantially higher risk of developing diabetes later in life (1,6–8). Results from a randomized controlled trial show that treatment of GDM by means of dietary advice, blood glucose monitoring, and insulin therapy, if required, reduces the rate of serious perinatal complications without increasing the rate of caesarean delivery (1). Based on these results, identification through screening and subsequent treatment of women with GDM appears beneficial. However, consensus on the optimal policy for screening is lacking. The American Diabetes Association recommends screening based on risk factors for GDM (age >25 years, obesity, a close relative with diabetes, a history of GDM or a previous macrosomic infant, or specific ethnicity) followed by the 50-g 1-h oral glucose challenge test as a screening test (9–11). Other screening methods in regular use are (repeated) random glucose testing and fasting glucose measurement. It is unclear which test is the most accurate in screening women for GDM.

The diversity in screening methods may result in unidentified cases of GDM and preventable neonatal and maternal morbidity. Establishment of an optimal, evidence-based screening policy to detect and treat GDM in a timely fashion could contribute to a reduction of perinatal complications. Two regularly used screening tests in the Netherlands are the random glucose test and the 50-g glucose challenge test. The objective of the present study was to compare these two tests as screening tests for GDM as a first step in determining optimal screening policy in GDM.

RESEARCH DESIGN AND METHODS—

In a prospective cohort study, all pregnant women attending the outpatient obstetric departments at the University Medical Centre, Utrecht, and the Isala Clinics, Zwolle, in the Netherlands during a 2-year study period were invited to participate. Women known to have preexisting diabetes were excluded from the study, as were those who had not reported for prenatal care in one of the two participating hospitals before 24 weeks of gestation. Only women who delivered after 28 weeks of gestation were included in the analysis.

Data

At intake, the following information was obtained: obstetric history, family history of diabetes, ethnicity (categorized as Caucasian or non-Caucasian), height, self-reported weight (before pregnancy), age, and smoking habits (categorized as smoking or nonsmoking). BMI was calculated as weight in kilograms divided by the square of height in meters. The following data regarding pregnancy and outcome were recorded after delivery: weight gain during pregnancy, treatment with diet or insulin, duration of pregnancy in days, birth weight of the neonate in grams, Apgar score after 1 and 5 min, and arterial and venous pH from the umbilical cord.

In all women, the random glucose test was performed at intake (±12 weeks) and between the 24th and 28th week of gestation. If the random plasma glucose measured between 24 and 28 weeks of gestation was ≥6.8 mmol/l, the random glucose test was considered abnormal. If random plasma glucose measurement was not performed between the 24th and 28th week, a random plasma glucose at intake ≥6.8 mmol/l was considered indicative for GDM.

A 50-g oral glucose challenge test was performed between the 24th and 28th week of gestation. The test was performed irrespective of the time of day and of the last meal. Plasma glucose was measured 1 h after administration of a solution containing 50 g of glucose. The predefined cutoff value for an abnormal test result was a 1-h plasma glucose value of ≥7.8 mmol/l.

If either the random glucose test or the 50-g oral glucose challenge test exceeded the predefined threshold value, a 2-h 75-g oral glucose tolerance test (OGTT) was performed within 1 week to confirm or rule out the presence of GDM (reference test). The OGTT was performed in the morning after a 12-h overnight fast and 3 days of a diet containing at least 150–200 g of carbohydrate. Plasma glucose was determined before and 2 h after administration of a 75-g glucose-containing solution. GDM was considered present if venous plasma glucose equaled or exceeded the threshold values according to World Health Organization criteria (≥7.0 mmol/l after a 12-h overnight fast and/or ≥7.8 mmol/l at 2 h after administration of a 75-g glucose-containing solution). These criteria were also applied in the randomized controlled trial in which treatment of GDM was considered beneficial (1). Venous plasma glucose concentration in all tests was measured using the glucose oxidase method (Vitros; Ortho-Clinical Diagnostics, Amersham, U.K.) in the two perinatal centers.
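The referral rule described above can be written as a small decision function. The following Python sketch is illustrative only: the function and constant names are hypothetical, and two details are assumptions rather than statements from the protocol, namely that a value equal to a cutoff counts as abnormal and that a missing screening result is treated as not abnormal.

```python
from typing import Optional

# Cutoffs in mmol/l, taken from the screening protocol described in the text.
RANDOM_GLUCOSE_CUTOFF = 6.8
CHALLENGE_TEST_CUTOFF = 7.8  # 1 h after the 50-g glucose load


def needs_ogtt(random_glucose: Optional[float],
               challenge_glucose: Optional[float]) -> bool:
    """Return True if either screening result reaches its cutoff.

    Assumptions (not from the source): values equal to the cutoff are
    abnormal, and a missing result (None) is treated as not abnormal.
    """
    random_abnormal = (random_glucose is not None
                       and random_glucose >= RANDOM_GLUCOSE_CUTOFF)
    challenge_abnormal = (challenge_glucose is not None
                          and challenge_glucose >= CHALLENGE_TEST_CUTOFF)
    return random_abnormal or challenge_abnormal
```

A woman with a random glucose of 5.0 mmol/l and a challenge test value of 8.1 mmol/l would be referred under this rule, whereas one with values of 5.0 and 7.7 mmol/l would not.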

Verification bias

When a screening test is evaluated against a reference test, ideally all participating patients should undergo both the screening and the reference test. However, in practice, the reference test is seldom performed in all patients, as this test is often more invasive or expensive. If only patients with verified screening test results are used to assess the performance of the screening test, calculated accuracy measures become biased because patients with verified disease status are often only patients with an abnormal screening test result, and, therefore, they do not represent a random sample of the population in which the screening test is used. The bias that occurs is called (partial) verification bias (12).

Because, in the present study, the reference test was, according to the predefined protocol, not performed in all patients, we used the following procedure to correct for verification bias. We performed the OGTT (reference test) in an arbitrary subset of consecutive patients with two negative screening test results to determine the extent to which cases of GDM were missed by the screening tests. Subsequently, for women who were not subjected to an OGTT, we estimated the OGTT result with multiple logistic regression analysis, using the results of the random glucose test and the 50-g glucose challenge test as well as available patient characteristics. This procedure to handle missing data is called imputation and is a commonly used, adequate technique to correct for verification bias (13,14). By using multiple imputation instead of single imputation (i.e., performing the imputation procedure multiple times instead of just once), uncertainty in the imputed values is reflected by the variation in imputed values across the multiple imputed datasets and, thus, by appropriately larger SEs (15). The multiple imputation procedure was also used to impute incidental missing data on patient characteristics.

Statistical analysis

The distribution of continuous variables is reported as means ± SD. We constructed two-by-two tables for abnormal and normal test results on the random glucose test and the 50-g glucose screening test against the OGTT. These tables reflect true-positive, false-positive, true-negative, or false-negative test results for both the random glucose test and the 50-g glucose challenge test. Diagnostic accuracy (sensitivity, specificity, predictive values, and likelihood ratios) and 95% CIs were calculated. Receiver operating characteristic (ROC) analysis was used to evaluate the discriminatory power of the two screening tests. Data were analyzed using SPSS 12.0.1 (SPSS, Chicago, IL) and SAS 9.1.3.
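The accuracy measures listed above follow directly from the four cells of a two-by-two table. The Python sketch below is a minimal illustration, not the authors' SPSS or SAS code. The function name is hypothetical, the use of a simple Wald (normal approximation) 95% CI is an assumption, and the example counts for the random glucose test are back-calculated from the totals and percentages reported in this article (Table 2 is not reproduced here), so they should be read as approximate.

```python
import math


def accuracy_measures(tp, fp, fn, tn, z=1.96):
    """Accuracy measures from a 2 x 2 table of screening test vs. reference test.

    Proportions are returned as (estimate, lower, upper) using a Wald
    interval, which is an assumption; the source does not name its CI method.
    """
    def prop_ci(successes, total):
        p = successes / total
        half_width = z * math.sqrt(p * (1 - p) / total)
        return p, max(0.0, p - half_width), min(1.0, p + half_width)

    sens = prop_ci(tp, tp + fn)   # abnormal test among women with GDM
    spec = prop_ci(tn, tn + fp)   # normal test among women without GDM
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": prop_ci(tp, tp + fp),
        "npv": prop_ci(tn, tn + fn),
        # Likelihood ratios; undefined if specificity is exactly 0 or 1.
        "lr_positive": sens[0] / (1 - spec[0]),
        "lr_negative": (1 - sens[0]) / spec[0],
    }


# Back-calculated (assumed) counts for the random glucose test:
# 37 abnormal results, 48 women with GDM, 1,301 women in total.
random_glucose = accuracy_measures(tp=7, fp=30, fn=41, tn=1223)
```

With these assumed counts, the sensitivity and specificity estimates agree with the values reported for the random glucose test after rounding.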

RESULTS—

We included 1,305 women. Four women were excluded from analysis because they delivered before 28 weeks of gestation. Data from 1,301 women were used for further analysis. Patient characteristics are presented in Table 1, which allows the distribution of these characteristics to be compared across the classification groups of the reference test (OGTT).

Figure 1 displays the flow of patients in our study based on the results of the subsequent diagnostic tests. For all 1,301 women, at least one random glucose test result was obtained. The random glucose test was performed at intake and between the 24th and the 28th week of gestation in 1,169 (89.9%) and 1,295 (99.5%) of the 1,301 women, respectively. We used the results of the random glucose test obtained at intake for the six women (0.5%) in whom the random glucose measurement was not performed between the 24th and the 28th week of gestation. None of these six women had a random glucose test result ≥6.8 mmol/l. The 50-g oral glucose challenge test was performed in 1,281 women (98.5%).

Of the 1,301 women, 37 (2.8%) had an abnormal random glucose test, whereas 167 of 1,281 women (13.0%) had an abnormal 50-g glucose challenge test. There were 184 women (14.1%) with at least one abnormal test result (random glucose test or 50-g glucose challenge test or both). In 20 women (1.5%), both test results were suspicious for GDM.

The OGTT was performed in 322 women (24.8%). This included 146 of 184 women (79.3%) with an abnormal screening test result and a subgroup of 176 women with two negative screening tests (Fig. 1). Initially, GDM was diagnosed in 46 women. After correction for verification bias, 48 women were diagnosed with GDM (3.7%).

We used multiple imputation of the OGTT value for every patient in whom the OGTT was not performed. This would have been an adequate procedure if the chance of verification of a screening test result depended solely on the result of the screening test. However, we found that the chance of verification also depended on factors other than the results of the screening tests. In general, women with a history of GDM or perinatal death, women with an increased BMI, and women from the hospital in Zwolle were more likely to be verified, independent of the results of their screening tests. Due to this nonrandom verification, there was a high prevalence of GDM in women with two negative screening tests who underwent an OGTT. As a result, the prevalence of GDM in the imputed dataset became unrealistically high (up to 15%). To obtain imputed data that are in line with the incidence of GDM in the Netherlands (estimated to be ∼2–4%), we adjusted the imputation procedure by applying an additional criterion to limit the number of cases classified as having GDM. Based on the same covariates (screening tests and patient characteristics), multiple imputation was repeated 100 times, and unverified women were classified as having GDM only if their imputed OGTT values were consistently indicative of GDM (in >75 of the 100 imputations). After this adjusted multiple imputation procedure, the prevalence of GDM in our sample was 3.7%. Only two unverified women were classified as having GDM, whereas all other unverified women were assumed not to have GDM.
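The additional consistency criterion can be sketched in a few lines of Python. This is a simplified illustration, not the authors' procedure: it assumes that each unverified woman already has a model-based predicted probability of GDM (the values passed in are hypothetical) and replaces the imputation of OGTT values by a simple Bernoulli draw of GDM status from that probability. Only the decision rule, more than 75 positive results out of 100 repetitions, is taken from the text.

```python
import random


def classify_unverified(predicted_probs, n_imputations=100,
                        min_positive=75, seed=0):
    """Classify unverified women using the consistency criterion.

    predicted_probs: hypothetical predicted probabilities of GDM, one per
    woman (an assumption; the study imputed OGTT values from a regression
    model). A woman is classified as having GDM only if more than
    min_positive of the n_imputations draws indicate GDM.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    classified = []
    for p in predicted_probs:
        positives = sum(rng.random() < p for _ in range(n_imputations))
        classified.append(positives > min_positive)
    return classified
```

Under this rule only women whose imputed status is positive in the large majority of repetitions are counted as cases, which is why the criterion lowers the number of imputed cases compared with taking each single imputation at face value.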

Table 2 displays results of the comparison of the two screening tests in terms of accuracy measures calculated after correction for verification bias. After correction for verification bias, sensitivity was almost five times higher for the 50-g glucose challenge test than for the random glucose test (70.2% [95% CI 57.1–83.3] vs. 14.6% [4.6–24.6]). The random glucose test had fewer false-positive test results and was therefore more specific (97.6% [96.6–98.5] vs. 89.1% [87.4–90.9]). Positive predictive values for both tests were comparable, as were the negative predictive values. The likelihood ratio of an abnormal test result was larger for the 50-g glucose challenge test than for the random glucose test. The likelihood ratio of a normal test was smaller for the 50-g glucose challenge test. The area under the ROC curve was larger for the 50-g glucose challenge test (0.88 [0.83–0.93]) than for the random glucose test (0.69 [0.61–0.78]). There was a significant difference in the areas under the curve of the two tests of 0.19 (0.11–0.27).
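The area under the ROC curve can be read as the probability that a randomly chosen woman with GDM has a higher screening value than a randomly chosen woman without GDM, with ties counted as one half. The Python sketch below computes the area in exactly that way. It is a generic illustration with a hypothetical function name, not the routine used in the study, and it does not compute the CIs reported above.

```python
def area_under_roc(gdm_values, non_gdm_values):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Each (GDM, non-GDM) pair scores 1 if the GDM value is higher,
    0.5 if the two values are tied, and 0 otherwise.
    """
    score = 0.0
    for d in gdm_values:
        for n in non_gdm_values:
            if d > n:
                score += 1.0
            elif d == n:
                score += 0.5
    return score / (len(gdm_values) * len(non_gdm_values))
```

An area of 0.5 corresponds to a test that does not discriminate at all, and an area of 1.0 to a test that separates the two groups perfectly.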

CONCLUSIONS—

Evidence for screening for GDM is often inconsistent and difficult to interpret due to various screening methods and thresholds applied internationally. An evidence-based policy could increase the number of identified women with GDM and therefore reduce the number of neonatal and maternal complications by providing adequate monitoring and treatment for these women. For this purpose, the present study compared the random glucose test and the 50-g glucose challenge test as screening tests for GDM. The area under the curve was larger for the 50-g glucose challenge test, indicating that the 50-g glucose challenge test was a better predictor for GDM than the random glucose test.

A potential weakness in the present study is the number of missing reference tests, due to which verification bias occurred. Because verification was apparently not performed at random, characteristics other than the screening test results influenced the chance of verification. An intuitive and straightforward procedure to correct for verification bias would be to calculate the ratio of diseased to nondiseased from the results of the verified patients stratified by screening test results and to extrapolate this ratio to the unverified patients (12,16). However, this mathematical correction can only be applied if, within each screening test result, verification of patients is performed completely at random or, in other words, if the chance of verification is truly independent of other factors such as patient characteristics. In addition, this correction results in an adjustment at the sample level only: for individual unverified patients, the disease status according to the reference test remains unknown. To correct for verification bias at the individual level, imputation techniques can be used to estimate disease status while accounting for the factors that influence the chance of verification (17).
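The stratified extrapolation described above, commonly known as the Begg and Greenes correction, can be sketched as follows. The Python code is illustrative: the function names and the dictionary keys are hypothetical, and the counts in the example are invented to show the direction of the bias, not taken from this study.

```python
def corrected_accuracy(pos, neg):
    """Sensitivity and specificity corrected for partial verification.

    pos / neg describe screen-positive and screen-negative women with keys
    'screened' (all women), 'verified' (reference test done) and
    'diseased' (verified women with disease). The disease rate among
    verified women in each stratum is extrapolated to the whole stratum.
    """
    def estimated_diseased(stratum):
        rate = stratum["diseased"] / stratum["verified"]
        return stratum["screened"] * rate

    diseased_pos = estimated_diseased(pos)
    diseased_neg = estimated_diseased(neg)
    healthy_pos = pos["screened"] - diseased_pos
    healthy_neg = neg["screened"] - diseased_neg
    sensitivity = diseased_pos / (diseased_pos + diseased_neg)
    specificity = healthy_neg / (healthy_pos + healthy_neg)
    return sensitivity, specificity


def naive_accuracy(pos, neg):
    """Sensitivity and specificity from verified women only (biased)."""
    tp, fn = pos["diseased"], neg["diseased"]
    fp, tn = pos["verified"] - tp, neg["verified"] - fn
    return tp / (tp + fn), tn / (tn + fp)


# Invented example: most screen-positives but few screen-negatives verified.
example_pos = {"screened": 100, "verified": 80, "diseased": 40}
example_neg = {"screened": 900, "verified": 90, "diseased": 9}
```

In this invented example the verified-only analysis overstates sensitivity and understates specificity relative to the corrected estimates, which is the typical pattern when screen-positive patients are verified preferentially.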

There are several strategies to deal with incomplete data, also within the context of partial verification (17). In our study, the various imputation strategies consistently led to a considerably higher number of cases, implying unrealistically high prevalence rates. We therefore had to apply an additional criterion to limit the number of cases classified as having GDM: we repeated the multiple imputation procedure for the OGTT 100 times and classified women as having GDM only if their imputed OGTT values were consistently indicative of GDM (>75 of 100 times). Further research is required to evaluate which approach is preferred, thereby also accounting for the epidemiological context of the study.

The overall prevalence of GDM in the literature varies from 2 to 9% (1). In Western countries such as the Netherlands, the incidence is closer to 2% than to 9%. Hypothetically, the incidence of GDM could be systematically underestimated in the literature (if these estimates have been based solely on selectively verified patients). In that case, we also underestimated the incidence of GDM, and consequently our approach would have been suboptimal. However, it is not very plausible that the incidence of GDM has been underestimated for years, so application of the described method should have corrected properly for this verification bias (18,19).

Results from the present study show that the 50-g glucose challenge test has an almost fivefold higher sensitivity compared with random glucose testing. To our knowledge, these two screening tests have been compared in the same sample only twice before. McElduff et al. (20) found results in favor of the 50-g challenge test, whereas Mathai et al. (21) found similar sensitivity for both tests and a higher specificity for the random test if both tests were performed in the 26th to 30th week of gestation. A number of studies compared the 50-g glucose challenge test with measurement of fasting glucose. Perucchini et al. (22) found results in favor of the fasting glucose measurement, whereas Rey et al. (23) showed the 50-g glucose challenge test to be superior. Other studies investigating the test characteristics of the glucose challenge test reported sensitivities ranging from 58 to 80% (24,25) for a specificity of ∼65% (25). In these studies, thresholds for an abnormal result of the challenge test ranged from 7.2 to 7.8 mmol/l. In the present study, a predefined cutoff value for an abnormal test result was set at 7.8 mmol/l. If thresholds were set <7.8 mmol/l, sensitivity of the 50-g glucose challenge test would increase at the expense of a decreased specificity.
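The trade-off between sensitivity and specificity when the cutoff is lowered can be illustrated with a short Python sketch. The glucose values below are invented for illustration and do not come from the study. The function name is hypothetical, and a value equal to the threshold is assumed to count as abnormal.

```python
def sens_spec_at(threshold, gdm_values, non_gdm_values):
    """Sensitivity and specificity of a glucose test at a given cutoff.

    A value at or above the threshold is treated as abnormal (assumption).
    """
    sensitivity = sum(v >= threshold for v in gdm_values) / len(gdm_values)
    specificity = sum(v < threshold for v in non_gdm_values) / len(non_gdm_values)
    return sensitivity, specificity


# Invented 1-h challenge test values in mmol/l.
gdm = [7.0, 7.4, 7.9, 8.6, 9.3]
non_gdm = [5.1, 5.8, 6.4, 6.9, 7.3, 7.5, 8.0]

at_78 = sens_spec_at(7.8, gdm, non_gdm)
at_72 = sens_spec_at(7.2, gdm, non_gdm)
```

In this invented sample, lowering the cutoff from 7.8 to 7.2 mmol/l detects one additional woman with GDM but labels two additional women without GDM as abnormal.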

The random glucose test is a fast, simple, and relatively inexpensive test. Accuracy of random glucose measurement is less frequently studied than that of the glucose challenge test. Nasrat et al. (26) evaluated random glucose measurement, which revealed a sensitivity of 16% and a specificity of 96% using a threshold value of 7.0 mmol/l or 6.4 mmol/l if evaluated ≥2 h postprandial. Jowett et al. (27) also concluded that random glucose measurement is not sufficiently sensitive for screening for GDM. Results from the present study, which used a threshold value for an abnormal test result of 6.8 mmol/l, are in accordance with results from those two groups. As high sensitivity is key to any screening test, random glucose testing is not an accurate method to screen women for GDM because five of six women with GDM would still be missed.

In conclusion, we recommend that, despite easy implementation, low costs, and relatively high specificity, random glucose measurement should not be used as a screening test for GDM. Until superior screening alternatives become available, the 50-g glucose challenge test should be the preferred screening test for GDM.

Figure 1—

Screening and diagnostic test results before and after correction for verification bias. The figures in the diagram represent the number of women with the specific combination of test results before correction for verification bias; figures in parentheses represent the number of women after correction.

Table 1—

Demographics before correction for verification bias

Table 2—

Results of the 2 × 2 table and accuracy measures calculated after correction for verification bias

Acknowledgments

This research was supported by a grant from Novo Nordisk, Alphen aan den Rijn, the Netherlands, and by grant 917.46.346 in the VIDI Program of ZonMW, the Hague, the Netherlands. The funding sources had no involvement in the design, analysis, or reporting of this study.

Parts of this article were presented in abstract form at the 4th International Symposium of Diabetes and Pregnancy, which was held in Istanbul, Turkey, 29–31 March 2007.

Footnotes

  • Published ahead of print at http://care.diabetesjournals.org on 13 August 2007. DOI: 10.2337/dc07-0571.

    A table elsewhere in this issue shows conventional and Système International (SI) units and conversion factors for many substances.

    The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C Section 1734 solely to indicate this fact.

    • Accepted July 31, 2007.
    • Received March 22, 2007.

  1. Diabetes Care vol. 30 no. 11 2779-2784