A Novel Use of Structural Equation Models to Examine Factors Associated With Prediabetes Among Adults Aged 50 Years and Older

National Health and Nutrition Examination Survey 2001–2006

  1. Linda S. Geiss, MA
  1. Division of Diabetes Translation, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, Georgia
  1. Corresponding author: Barbara H. Bardenheier, bfb7{at}cdc.gov.


OBJECTIVE To use structural-equation modeling to test a hypothesized model of causal pathways related to prediabetes among older adults in the U.S.

RESEARCH DESIGN AND METHODS Cross-sectional study of 2,230 older adults (≥50 years) without diabetes included in the morning fasting sample of the 2001–2006 National Health and Nutrition Examination Surveys. Demographic data included age, income, marital status, race/ethnicity, and education. Behavioral data included physical activity (metabolic equivalent hours per week for vigorous or moderate muscle strengthening, walking/biking, and house/yard work), and poor diet (refined grains, red meat, added sugars, solid fats, and high-fat dairy). Structural-equation modeling was performed to examine the interrelationships among these variables with family history of diabetes, high blood pressure, BMI, large waist (waist circumference: women, ≥35 inches; men, ≥40 inches), triglycerides ≥200 mg/dL, and total and HDL (≥60 mg/dL) cholesterol.

RESULTS After dropping BMI and total cholesterol, our best-fit model included three single factors: socioeconomic position (SEP), physical activity, and poor diet. Large waist had the strongest direct effect on prediabetes (0.279), followed by male sex (0.270), SEP (−0.157), high blood pressure (0.122), family history of diabetes (0.070), and age (0.033). Physical activity had direct effects on HDL (0.137), triglycerides (−0.136), high blood pressure (−0.132), and large waist (−0.067); poor diet had direct effects on large waist (0.146) and triglycerides (0.148).

CONCLUSIONS Our results confirmed that, while including factors known to be associated with high risk of developing prediabetes, large waist circumference had the strongest direct effect. The direct effect of SEP on prediabetes suggests mediation by some unmeasured factor(s).

The growing prevalence of diabetes, combined with clear evidence that lifestyle change can reduce diabetes risk in high-risk individuals, has led the American Diabetes Association to recommend diabetes screening in clinical settings for adults aged ≥45 years who have no risk factors other than age (1). Persons with blood glucose levels that are higher than those considered normal but not high enough to be classified as diabetes are at increased risk for type 2 diabetes (2,3). This state is termed “prediabetes,” and its prevalence in 2005–2008, based on fasting glucose or HbA1c levels, reached 50% for U.S. adults aged ≥65 years. For this age group, diagnosed diabetes is projected to reach 26.7 million by 2050, or 55% of all diabetes cases (4). In 2007, spending on diabetes care for adults aged ≥65 years accounted for $64.8 billion (56%) of direct diabetes medical costs—$41.1 billion for institutional care alone. Identifying older adults with prediabetes may help delay or prevent type 2 diabetes, thereby reducing morbidity and healthcare costs.

After adjusting for race/ethnicity, sex, and age, prediabetes is associated with obesity (5), high blood pressure (6), lipid abnormalities (7), family history of diabetes (8), and specific physical activity and dietary patterns (9). To our knowledge, no studies have examined all of these factors simultaneously as a system of multiple pathways leading to prediabetes. Available research allows us to hypothesize a causal model that depicts the relationships of factors related to the development of prediabetes in terms of direct effects and indirect (i.e., mediator) effects. Unlike traditional regression models that treat each covariate in the model as an independent direct effect on prediabetes, we can assess all relevant pathways of reported factors as independent and/or dependent (i.e., mediator) factors leading to prediabetes at once. Focusing our analysis on adults aged ≥50 years provides information that could stimulate thinking concerning ways to reduce diabetes risk for the Medicare population and those who will soon be in the Medicare population. As such, we used structural-equation modeling to test our hypothesized model, using data from a nationally representative survey of the U.S. population.


Survey design and population

As part of the 2001–2006 National Health and Nutrition Examination Surveys (NHANES), the National Center for Health Statistics, Centers for Disease Control and Prevention, collected data representative of the U.S. civilian noninstitutionalized population (10). Survey participants were interviewed at home and invited to a mobile examination center (MEC) to undergo various examinations and laboratory measurements. The examination component consists of medical, dental, and physiological measurements, as well as laboratory tests administered by highly trained medical personnel. Those participants selected to undergo tests that require fasting between 8 and 24 h have appointments in the morning group. The response rates for those who participated in the examinations during 2001–2006 ranged from 76 to 80% (10). Among the 7,287 participants aged ≥50 years, 6,668 completed the household interview and an examination at the MEC; of those, 3,224 were in the morning fasting group, with 2,925 fasting from 8 to <24 h prior to the examination.

Glycemic status definitions and exclusions

During the home interview, participants were asked if they had ever been told by a doctor or other health professional that they had diabetes (other than during pregnancy). Based on their answer, 458 participants aged ≥50 years were classified as having diagnosed diabetes and were excluded from analyses. We excluded 193 participants who reported they did not have diabetes but had fasting glucose ≥126 mg/dL or HbA1c ≥6.5% (48 mmol/mol) and 44 participants who had inconclusive results for fasting glucose or HbA1c or both. The final sample size was 2,230 participants. The 1,221 persons who had a fasting glucose of 100–125 mg/dL or HbA1c of 5.7–6.4% (39–47 mmol/mol) were classified as having prediabetes.
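The exclusion steps and glycemic definitions above can be restated as a short sketch. The function name, the order of the checks, and the treatment of a missing laboratory value as "inconclusive" are illustrative assumptions, not the authors' code; the thresholds and counts are those given in the text.

```python
from typing import Optional

def classify_glycemia(fasting_glucose_mg_dl: Optional[float],
                      hba1c_percent: Optional[float],
                      diagnosed_diabetes: bool) -> str:
    """Hypothetical helper applying the thresholds stated in the text."""
    if diagnosed_diabetes:
        return "diagnosed diabetes"       # excluded from analysis
    if fasting_glucose_mg_dl is None or hba1c_percent is None:
        return "inconclusive"             # assumption: missing value = inconclusive
    if fasting_glucose_mg_dl >= 126 or hba1c_percent >= 6.5:
        return "undiagnosed diabetes"     # excluded from analysis
    if fasting_glucose_mg_dl >= 100 or hba1c_percent >= 5.7:
        return "prediabetes"
    return "normal glucose"

# Sample flow, using the counts reported in the text.
fasted_8_to_24_h = 2925
final_n = fasted_8_to_24_h - 458 - 193 - 44   # diagnosed, undiagnosed, inconclusive
unweighted_prediabetes_share = 1221 / final_n
```

The unweighted share computed here is a simple count ratio. It need not equal a prevalence estimated with survey weights, which is likely how the figures reported in the descriptive statistics were obtained.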

Definitions of effects assessed

For our sample of older adults, variables assessed included age, sex, race/ethnicity (non-Hispanic [NH] white, NH black, Mexican-American, and other; due to the small sample size, other Hispanics are included in the “other” category) (10); household income (eight categories starting from <$20,000, then in $5,000 increments to ≥$75,000); educational attainment (less than high school, high school or GED, and more than high school); marital status (married or living with partner, divorced/widowed/separated, and never married); number of household members (actual number 1–6 and ≥7); self-reported family history of diabetes (i.e., including living and deceased, were any biological blood relatives including grandparents, parents, brothers, sisters, ever told by a health professional that they had diabetes?); measured BMI; large waist (measured circumference of ≥35 inches for women and ≥40 inches for men); high blood pressure (average of up to three readings ≥135/80 mmHg as recommended by the U.S. Preventive Services Task Force for screening of type 2 diabetes in adults or reported to be on antihypertensive medication) (11); HDL; triglycerides; total cholesterol; and a physical activity categorical variable based on metabolic equivalent hours per week in the past 30 days (0 for none, 1 for greater than 0 up to and including the median, 2 for greater than the median) for vigorous-intensity activity, moderate-intensity activity, muscle-strengthening activities, house or yard work, walking, or biking. We defined usual intake of selected dietary components (saturated fat, solid fat, high-fat dairy, meat, and refined grains) using the 24-h recall dietary assessment, averaging with a second day of recall data when available for the 2003 to 2004 and 2005 to 2006 cycles (12). All dietary data were adjusted using the MyPyramid equivalents to obtain the Healthy Eating Index (HEI) scores for each dietary component of interest (13). Although red meat has been considered to be episodic dietary intake, a recent study examining NHANES 24-h recall found it to be usual intake (14). The proportion of missing data among respondents was 7.3% for household income and <3.5% for all other variables in our analyses.
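The coding rules for three of the observed variables can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' SAS code: the function names are hypothetical, "≥135/80 mmHg" is read as mean systolic ≥135 or mean diastolic ≥80, and the physical activity median is taken among participants reporting any activity in that domain.

```python
from statistics import median

def large_waist(waist_inches: float, sex: str) -> bool:
    """Waist cutoffs from the text: >=35 inches (women), >=40 inches (men)."""
    return waist_inches >= (35 if sex == "female" else 40)

def high_blood_pressure(readings, on_medication: bool) -> bool:
    """readings: up to three (systolic, diastolic) pairs in mmHg."""
    if on_medication:
        return True
    if not readings:
        return False
    mean_sys = sum(s for s, _ in readings) / len(readings)
    mean_dia = sum(d for _, d in readings) / len(readings)
    # Assumption: either component reaching its cutoff counts as high.
    return mean_sys >= 135 or mean_dia >= 80

def activity_categories(met_hours_per_week):
    """0 = none, 1 = above 0 up to and including the median, 2 = above the median."""
    active = [m for m in met_hours_per_week if m > 0]
    if not active:
        return [0] * len(met_hours_per_week)
    cut = median(active)  # assumption: median among the active only
    return [0 if m <= 0 else 1 if m <= cut else 2 for m in met_hours_per_week]
```

If the median were instead computed over all participants, including those with zero activity, the cut point between categories 1 and 2 would be lower, so the choice should be checked against the original analysis before reuse.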

Statistical analysis

We used structural equation modeling with factor analysis, which groups intercorrelated variables into a single factor or latent construct, and path analysis, which includes the direct and indirect effects of factors previously reported to be associated with prediabetes, as hypothesized (Fig. 1). A direct effect is depicted as an arrow emanating from an independent variable (exposure) and pointing to a dependent variable (outcome). For example, see the arrow between the latent constructs of socioeconomic position (SEP) and physical activity. An indirect effect is depicted as a mediating variable that has an arrow pointing to it from an independent variable and an arrow of its own pointing to another dependent variable. For example, physical activity as a mediating variable links SEP to the high blood pressure variable. A confounder, according to the use of these directed acyclic graphs, is depicted as a variable with direct effects on both the exposure and the dependent variable (15). Correlations between the measurement errors of two variables are represented by two-headed curving arrows, in which case only the measurement error terms are correlated.

Figure 1

Hypothesized factors in the pathway to prediabetes (fasting blood glucose [FBG]/HbA1c) among older adults aged ≥50 years, NHANES 2001–2006. Ellipse indicates latent, unobservable constructs (to be identified using factor analysis); box indicates observed variable; straight line with one arrowhead denotes direct effect.

In general, latent variable models offer three advantages: they reduce measurement error by using multiple indicators per latent construct, they can test models with multiple dependent variables, and they test multiple integrated models simultaneously rather than factors individually. In addition, structural-equation modeling examines the direct and indirect effects of mediators on dependent variables while allowing the examination of complex associations among multiple mediators (16). Conversely, in a traditional regression model, mediators would not be included because they would block the pathway between the independent variable of interest and the dependent variable. In the structural-equation model, the independent factors and combined mediated relationships can be examined simultaneously, determining the impact of each of the dependent variables in the appropriate order. Thus, the structural-equation model includes mediating effects without sacrificing indirect effects of interest. For each relationship in the structural-equation model, only observations missing data for either the independent or the dependent variable were excluded from that equation.

Analyses proceeded in two stages. First, congruent with our hypotheses, we created and confirmed the a priori factor structure. We confirmed these latent constructs for SEP, physical activity, and dietary patterns. An assumption of this analysis is that an underlying unmeasured variable is identified by the shared variance of the observed variables. The constellations of factors that comprise poor diet and physical activity patterns may best be modeled in terms of their shared variance, which represents the underlying unmeasurable source, rather than factor by factor. For the physical-activity construct, the observed decrease in physical activity participation among older adults may be partially due to ill-defined measurement (17). We therefore used a breadth of physical-activity domains and modeled them as a latent construct because the shared variance represents the entire pattern of physical activity that may influence the other variables and prediabetes. For the SEP construct, we used household income, level of education, marital status, and number of individuals in the household. Second, after confirmatory factor analysis, we added the other observed factors reported to be associated with prediabetes. The theoretical structural model tested is displayed in Fig. 1.
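The shared-variance assumption behind a latent construct can be illustrated with a small simulation. This is a sketch only: the loadings are hypothetical, the indicators are simulated as continuous and standardized (the study's indicators are categorical), and no factor model is actually fitted. Under a one-factor model with standardized indicators, the correlation between two indicators is expected to equal the product of their loadings.

```python
import random
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def simulate_indicators(loadings, n=5000, seed=0):
    """Each indicator = loading * latent + unique noise, scaled to unit variance."""
    rng = random.Random(seed)
    columns = [[] for _ in loadings]
    for _ in range(n):
        latent = rng.gauss(0, 1)  # the unobserved construct
        for column, lam in zip(columns, loadings):
            column.append(lam * latent + sqrt(1 - lam ** 2) * rng.gauss(0, 1))
    return columns

hypothetical_loadings = [0.8, 0.7, 0.6]
ind_a, ind_b, ind_c = simulate_indicators(hypothetical_loadings)
r_ab = pearson(ind_a, ind_b)  # model-implied value: 0.8 * 0.7
r_ac = pearson(ind_a, ind_c)  # model-implied value: 0.8 * 0.6
r_bc = pearson(ind_b, ind_c)  # model-implied value: 0.7 * 0.6
```

An indicator with a loading near zero would correlate only weakly with the others and would contribute little to the construct.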

Our focus was to examine the effects of modifiable factors such as physical activity, diet, lipids, obesity, high blood pressure, and, to a lesser degree, SEP (which can only be modified with great difficulty) on prediabetes. Because age, sex, and race/ethnicity are strong, nonmodifiable confounders related to most of the other factors in the model, their direct effects, while included, are not shown in the graphic of the final model. Although family history of diabetes is nonmodifiable, it is specific to diabetes risk and therefore is examined as a factor of interest.

We used SAS v9.2 (SAS Institute Inc.) for data management and descriptive statistics, along with Mplus v6.12 software for confirmatory factor analysis and testing the structural model, while accounting for the complex survey design of NHANES. P values <0.05 were considered statistically significant.
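As a minimal illustration of what accounting for survey weights means for a point estimate, the sketch below computes a weighted proportion. The weights and flags are hypothetical, and the function is not the authors' SAS or Mplus procedure. It covers the point estimate only; design-based standard errors also require the strata and primary sampling units, which are not handled here.

```python
def weighted_proportion(flags, weights):
    """Share of total sample weight carried by records where the flag is true."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w for f, w in zip(flags, weights) if f) / total

# Hypothetical records: each weight is the number of people a respondent represents.
has_condition = [True, False, True, False]
sample_weights = [1000.0, 3000.0, 2000.0, 4000.0]

weighted = weighted_proportion(has_condition, sample_weights)
unweighted = sum(has_condition) / len(has_condition)
```

In this toy example the two estimates differ because the respondents with the condition carry less weight than those without it.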

The model

The conceptual model that specifies the relations (numbered) among concepts operationalized in this study appears in Fig. 1. A priori, we predicted 27 paths to directly and/or indirectly affect prediabetes, emanating from the 10 variables labeled. We relied on the data’s inherent temporality. For example, race, sex, and family history of diabetes are determined at birth, and age is a function of when one is born. Physical activity was reported for the 30 days preceding the MEC examination, and dietary intake was reported for the 24-h period prior to or up to 10 days after the MEC examination (18). We assumed these activities were “usual” patterns. Evidence from previous studies was also used to assess the determinants of the causal pathway (e.g., family history of diabetes may lead to dyslipidemia and diabetes) (7) because prospective measures were not available in NHANES cross-sectional data.

As indices of the models’ statistical fit to the data, we used standard criteria, including comparative fit index (CFI) >0.90, root mean square error of approximation (RMSEA) <0.08, and standardized root mean square residual <0.06. We also report the weighted root mean square residual (WRMR), which is not widely used; it is a weighted average of the residuals, and a value of <1 is recommended (19). Modification indices were used to assess specific paths for the best-fitting model, with a χ2 value corresponding to P < 0.05 considered significant. Because the model had discrete dependent variables, the best method of estimation is robust weighted least squares, also known as weighted least squares with mean and variance adjustment (20). With this type of model and estimation method, SEs for the standardized path coefficients are not computed. Therefore, we report the standardized estimates and the fit statistics of the models only.
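For orientation, the textbook definitions of two of these indices and the stated cutoffs can be sketched as below. The formulas are the common χ2-based forms; Mplus applies its own adjustments under weighted least squares with mean and variance adjustment, so its output may not match these expressions exactly. The function names and the χ2 inputs used in any example call are hypothetical.

```python
from math import sqrt

def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (common N - 1 form)."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_model: float, df_model: int,
        chi2_baseline: float, df_baseline: int) -> float:
    """Comparative fit index relative to the baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_baseline - df_baseline, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base

def meets_criteria(cfi_value=None, rmsea_value=None,
                   srmr_value=None, wrmr_value=None) -> dict:
    """Compare reported indices against the cutoffs stated in the text."""
    checks = {}
    if cfi_value is not None:
        checks["CFI"] = cfi_value > 0.90
    if rmsea_value is not None:
        checks["RMSEA"] = rmsea_value < 0.08
    if srmr_value is not None:
        checks["SRMR"] = srmr_value < 0.06
    if wrmr_value is not None:
        checks["WRMR"] = wrmr_value < 1.0
    return checks
```

A call such as `meets_criteria(cfi_value=0.99, rmsea_value=0.02, srmr_value=0.02)` returns one boolean per index supplied, which makes it easy to tabulate several candidate models side by side.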


Descriptive statistics

Among those ≥50 years without diabetes, 51.7% had prediabetes. Compared with those with normal glucose, at the P < 0.01 level, those with prediabetes were more likely to be male (51.5 vs. 36.7%), overweight (39.4 vs. 34.6%), or obese (39.1 vs. 21.5%); to have high triglycerides >200 mg/dL (22.0 vs. 15.2%); and to have HDL <40 mg/dL (14.0 vs. 9.9%) (Table 1). Those with prediabetes had significantly higher average intakes of solid fats (5.92 vs. 4.78; P < 0.01, where higher score indicates greater percentage of total energy intake) and red meat (3.50 vs. 3.11; P = 0.03, where higher score indicates greater number of equivalents [e.g., ounces or cups] per 1,000 kcal) than those with normal glucose (Table 2).

Table 1

Characteristics of 2,230 older adults according to prediabetes and normal glucose status

Table 2

Dietary variables for older adults according to prediabetes and normal glucose status

Structural-equation models

Factor analysis confirmed the measurement portion of the model (CFI, 0.99; RMSEA, 0.02; standardized root mean square residual, 0.02) for the three latent constructs: 1) SEP (except that the number of family members did not load or contribute to the pattern of SEP identified in the factor analysis); 2) physical activity; and 3) poor diet (except that saturated fats and processed meats did not load or contribute to the pattern of poor diet identified in the factor analysis). In the factor analysis assessment of the model, the latent constructs were not correlated with one another; however, in the structural-equation model, the latent constructs were correlated.

The best-fit structural-equation model (CFI, 0.89; RMSEA, 0.02; WRMR, 1.19) was somewhat different from our adjusted hypothesized model (CFI, 0.77; RMSEA, 0.02; WRMR, 1.56) (Fig. 2). The following changes were made for our hypothesized model to converge. First, we dropped total cholesterol and BMI from the final model based on the fit of parameters and the modification indices. Second, we dropped the direct effect of dietary patterns on HDL based on the fit of the model. Almost all factors had direct effects from age, sex, and race/ethnicity. For simplicity, these paths are not presented in Fig. 2. All paths shown are statistically significant at the 0.05 level (standardized path coefficients are given in parentheses), except for the one path from physical activity to large waist circumference (P = 0.108). Higher SEP had a positive relationship with physical activity (0.317) and inverse relationships with poor diet (−0.457) and prediabetes (−0.157). The effect of SEP was greatest on poor dietary intake, indicated by the highest absolute value of a path coefficient. Family history of diabetes had a slightly greater effect on poor diet (0.111) compared with its relationships to HDL (−0.063), large waist circumference (0.066), and prediabetes (0.070). Poor diet had similar effects on triglycerides (0.148) and large waist circumference (0.146). The effect of physical activity on large waist circumference (−0.067) was not as strong as its effects on HDL levels (0.137), triglycerides (−0.136), and high blood pressure (−0.132). HDL had strong inverse correlations with triglycerides (−0.510) and large waist circumference (−0.291), being more highly associated in the former instance. Large waist circumference had a strong relationship with prediabetes (0.279) and also with high blood pressure (0.253); the effect of high blood pressure on prediabetes was positive, but not as strong (0.122).
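To show how direct and indirect effects combine along these paths, the sketch below applies the product-of-coefficients tracing rule to the standardized coefficients reported above. It is an illustration under assumptions, not a result from the study: the rule is exact for linear path models and only approximate for the categorical outcomes modeled here, the direction of the large waist to high blood pressure path is inferred from the text, the HDL correlations and the paths from age, sex, and race/ethnicity are omitted, and the physical activity to large waist path was not statistically significant.

```python
# Directed paths (from, to) with reported standardized coefficients.
PATHS = {
    ("SEP", "physical activity"): 0.317,
    ("SEP", "poor diet"): -0.457,
    ("SEP", "prediabetes"): -0.157,
    ("family history", "poor diet"): 0.111,
    ("family history", "HDL"): -0.063,
    ("family history", "large waist"): 0.066,
    ("family history", "prediabetes"): 0.070,
    ("poor diet", "triglycerides"): 0.148,
    ("poor diet", "large waist"): 0.146,
    ("physical activity", "large waist"): -0.067,
    ("physical activity", "HDL"): 0.137,
    ("physical activity", "triglycerides"): -0.136,
    ("physical activity", "high blood pressure"): -0.132,
    ("large waist", "prediabetes"): 0.279,
    ("large waist", "high blood pressure"): 0.253,  # direction assumed
    ("high blood pressure", "prediabetes"): 0.122,
}

def effects(source, target, paths=PATHS):
    """Return (direct, indirect, total) effects of source on target.

    Every directed route from source to target contributes the product
    of its coefficients; the graph must be acyclic.
    """
    children = {}
    for (a, b), coef in paths.items():
        children.setdefault(a, []).append((b, coef))
    direct = paths.get((source, target), 0.0)
    total = 0.0
    stack = [(source, 1.0)]
    while stack:
        node, product = stack.pop()
        for child, coef in children.get(node, []):
            if child == target:
                total += product * coef
            else:
                stack.append((child, product * coef))
    return direct, total - direct, total

sep_direct, sep_indirect, sep_total = effects("SEP", "prediabetes")
waist_direct, waist_indirect, waist_total = effects("large waist", "prediabetes")
```

Under these assumptions, the indirect effect of SEP through physical activity and poor diet has the same sign as its direct effect but is much smaller in magnitude.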

Figure 2

Final model of factors in the pathway to prediabetes (fasting blood glucose [FBG]/HbA1c) among older adults aged ≥50 years, NHANES 2001–2006. Ellipse indicates latent, unobservable constructs; box indicates observed variable; straight line with one arrowhead denotes direct effect; curved line denotes correlation. Adjusted for sex, race/ethnicity, and age. DM, diabetes mellitus.


To our knowledge, this study presents the first examination of direct and indirect effects of modifiable risk factors on prediabetes using structural-equation modeling. Appropriate fit required our original hypothesized model to undergo a few changes. Owing to the presence of other modeled variables, BMI and total cholesterol did not contribute to the model. Large waist circumference played an important role, with a relatively strong direct effect on prediabetes. Studies have shown that not all excess body weight carries equal risk. In fact, abdominal obesity, more so than generalized obesity (e.g., BMI), and adipose tissue inflammation appear to be factors in the development of insulin resistance and subsequent type 2 diabetes (21–23).

A prospective study found that high triglyceride levels and low HDL levels were both directly associated with increased risk of diabetes, whereas total cholesterol was not statistically significantly associated (7). We also found the best-fitting model did not include total cholesterol. Instead, triglycerides overrode the role of total cholesterol via significant correlated measurement errors with HDL levels, large waist circumference, and prediabetes; triglycerides neither mediated nor had direct effects on prediabetes. Our finding that family history of diabetes was directly related to HDL cholesterol (7) but not to triglycerides is supported by the literature on type 2 diabetes risk (24). The effect of family history of diabetes could be genetic and/or environmental, the latter defined by behavioral risk factors. A review of genomic studies reiterated that significant gene–diet and gene–environment interactions result in altered lipid metabolism, inflammation, and other metabolic imbalances that lead to cardiovascular disease and obesity (25). Our study supported a gene–diet relationship but not a gene–environment interaction with either SEP or physical activity since diabetes family history was not directly related to either factor in our model.

The direct effect of high blood pressure on prediabetes in our model was confounded by large waist circumference. This finding is supported by a previous study that found an increase in resistance to insulin-mediated glucose disposal in subjects with hypertension, regardless of obesity status, when compared with their weight-matched controls with normal blood pressure (26). However, the measure of obesity in that study was weight, whereas, in our study, large waist circumference was a measure of central adiposity that displaced the measure of general obesity (e.g., BMI).

Physical activity plays an important role in improving risk factors that are associated with diabetes (27). We found that higher levels of physical activity were associated with lower prevalence of high blood pressure, large waist, and dyslipidemia, consistent with studies of specific diabetes risk factors (28–31). Also, usual intake of poor diet had a stronger effect on large waist than did physical activity, with the latter effect being due to the older age group in our population. In general, the amount of physical activity suspected to be associated with decreased abdominal obesity is 13–26 MET-hours per week (32). Our sample had a median MET-hours per week of only 6.0–9.0 for all physical activities except house/yard work, for which the median was 31.5 MET-hours per week. While the latter might have been expected to produce a strong association with decreased waist, only 16.9% reported any house/yard work. Moreover, among older adults, such activity, while prevalent, is also known to have low reliability of self-report (33); data quality could not be accounted for in our modeled latent construct for total physical activity.

It has long been established that the most vulnerable groups that are disproportionately burdened by diabetes include the aged, minority race/ethnicity groups, and those with lower SEP. In our analysis of those aged ≥50 years, we found that although SEP was mediated by physical activity and dietary patterns, it still had an additional direct effect on prediabetes in the presence of the other factors. This suggests that some other factor(s) related to SEP may also mediate the relationship with prediabetes.

A limitation of our analysis is that the data are cross-sectional, and therefore our hypothesized directional relationships of laboratory measures did not always hold, perhaps reflecting the proximal timing of data collection. Our dietary intake and physical activity data may also be affected by social desirability bias in that people are likely to overreport “healthy” behaviors. Because we used factor analysis to assess latent constructs, the results are only generalizable to the population of the U.S.; to compare across countries, it would be necessary to use the same method with comparable variables.

Another limitation is that we were unable to include a measure for stress hormones. Further, we used only one measure to identify prediabetes. Although a confirmatory second measure is not strictly recommended for detecting prediabetes, its absence may bias our estimates because of the variability in test results (34). However, we defined prediabetes using two different laboratory assays, fasting glucose and HbA1c, the latter representing a measure of glycemia averaged over 3 months. Using the 24-h recall to measure poor diet is subject to bias from considerable within-person variation; the food frequency questionnaire would have been preferable but was unavailable for all NHANES waves. However, using a latent construct to model the effects of poor diet lessens the potential bias in two ways. First, as a latent construct, the dietary factors (e.g., added sugars, refined grains, saturated fats, etc.) are defined by their shared variance and were modeled based on covariance, not any individual factor alone. Second, latent constructs are adjusted for normally distributed measurement error, and measurement error to some degree has created difficulties in estimating long-term intake with 24-h recall (35). However, use of a latent construct does not account for omission of episodic dietary intake, which is a limitation of 24-h diet recall.

To our knowledge, this is the first report to test the pathway of interrelated factors leading to prediabetes. Our study found that while including other factors known to be associated with high risk of developing prediabetes, large waist circumference had the strongest direct effect on this modeled outcome. The U.S. Diabetes Prevention Program clinical trial (36) demonstrated that, among those with prediabetes, structured lifestyle interventions including at least a 7% weight loss and at least 150 min of physical activity per week reduced 3-year incidence of diabetes by 58%, and ∼71% for those aged ≥60 years. Another intervention study found that healthy eating combined with physical activity or an exercise program among viscerally obese men improved blood pressure, decreased dyslipidemia, and reduced visceral fat (37). Our model confirms previously established associations of modifiable factors such as physical activity, poor diet, large waist circumference, and high blood pressure with prediabetes in a causal model, yet our model should be confirmed with a prospective study.


No potential conflicts of interest relevant to this article were reported.

B.H.B. conducted the data analysis, researched data, contributed to discussion, wrote the manuscript, and reviewed and edited the manuscript. K.M.B., C.J.C., and L.S.G. researched data, contributed to discussion, wrote the manuscript, and reviewed and edited the manuscript. Y.J.C. and E.W.G. researched data, contributed to discussion, and reviewed and edited the manuscript. B.H.B. is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Data from this study were presented as a guided poster presentation at the 72nd Scientific Sessions of the American Diabetes Association, Philadelphia, Pennsylvania, 8–12 June 2012.

The authors thank Dr. Lawrence Barker of the Division of Diabetes Translation, U.S. Centers for Disease Control and Prevention, for valuable comments.


  • The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.

  • Received December 14, 2012.
  • Accepted March 5, 2013.

Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. See http://creativecommons.org/licenses/by-nc-nd/3.0/ for details.



  1. Diabetes Care vol. 36 no. 9 2655-2662