Race and Ethnicity

Vital constructs for diabetes research

Andrew John Karter, PhD
From the Division of Research, Northern California Region, Kaiser Permanente, Oakland, California
Address correspondence and reprint requests to Andrew John Karter, PhD, 2000 Broadway, Research Division, Kaiser Permanente, Oakland, CA 94611. E-mail: andy.j.karter{at}

Medical researchers are paying increasing attention to findings of racial or ethnic (“racial/ethnic” hereafter) differences in quality of and access to care, health outcomes, risk factors, genetic markers, and therapeutic response. However, this attention has been met with growing controversy and debate. Society’s history of discrimination, racism, and eugenics, together with continued disparities in access to and quality of care, makes this a particularly sensitive issue. In the past year, the New England Journal of Medicine (1–4) and the International Journal of Epidemiology (5,6) have published several commentaries and editorials, some criticizing and others arguing in favor of the use of race/ethnicity in medical research. The editorial board of the Archives of Pediatrics and Adolescent Medicine recently instructed submitting authors not to detail race/ethnic variation in disease or risk factors unless there is proof of the biologic, scientific, or sociologic bases for these differences (7). While social epidemiologists have justifiably criticized some molecular scientists’ espousal of “genetic determinism,” many social epidemiologists have promoted the equally unsubstantiated perspective that dismisses the influence of genetics on racial disparities in disease (5). At the heart of this controversy is a dispute about whether studying the construct of race/ethnicity has any justification in medical research at all. Concerns about the sociologic risks associated with racial/ethnic research [e.g., stigmatization, emphasis on differences rather than similarities, and racial profiling in choice of therapy (2)] have also been raised. 
Epidemiological research on race/ethnicity, however, has a long history of apparent utility, facilitating the identification of subgroups with higher rates of disease (8) and differing levels of risk factors (9) and the detection of disparities in the quality of and access to care (10,11) and differing response to pharmacotherapy (12), and providing potentially important leads about etiology and the roles of genes and environment (13,14).

This debate has now reached the diabetes scientific community as well; this year’s American Diabetes Association Scientific Sessions (June 2003) dedicated a special session to debate the use and measurement of race/ethnicity in diabetes research. Given that few disease states demonstrate such marked racial/ethnic variation as diabetes, this discussion has particular relevance for diabetes research. During the calendar year 2002, ∼6% of the articles published in Diabetes Care focused on race/ethnicity (i.e., included the words “race,” “racial,” “ethnic,” or “ethnicity” in the title or abstract). In this issue of Diabetes Care, de Rekeneire et al. (14a) report on variability in glycemic control by race/ethnicity in the Health, Aging, and Body Composition Study cohort. In the U.S., poorer glycemic control among African-American and Latino patients has been reported from several cross-sectional population-based samples (15–17), the National Health and Nutrition Examination Survey, 1988–1994 (NHANES-3) (18), the Behavioral Risk Factor Surveillance System (BRFSS) (19), the Insulin Resistance and Atherosclerosis Study (IRAS) (20), and the Translating Research Into Action for Diabetes (TRIAD) study (Dr. Arleen Brown, submitted for publication). There are numerous other examples of race/ethnic disparities specific to diabetes, including the prevalence of diabetes (21), diabetes-related complications (8), risk factors (22), and quality of diabetes care (11). Given the growing controversy, researchers need to be aware of the particular issues and methodologies and to develop a critical eye when evaluating and planning studies of racial/ethnic differences in disease outcomes, risk factors, and health services. Included below is a commentary on the recent debate about the value of race/ethnicity in medical research and a brief review of select methodological issues. 
[For more comprehensive and excellent discussions regarding methodological issues, see articles by Lin and Kelsey (14) and Risch et al. (13).]

There is a growing recognition of the importance of race/ethnicity in our research activities, and the National Institutes of Health now requires documentation of minority inclusion on all new grant submissions (23). However, some scientists suggest that there is insufficient evidence that race/ethnicity has biological or genetic significance (3,6,24–27) and promote ignoring race/ethnicity in medical research altogether. Justification of this so-called “race-neutral approach” rests largely on two contentions. First, race/ethnicity is strictly a social construct and too crudely measured to have value in public health. Second, race/ethnicity is not a biological construct because there is more interindividual genetic variation within a race than between races. Eric Lander popularized this concept with what is known as the “99.9% identical rule,” which states that “any two human beings on this Earth are 99.9% identical at the DNA level” (28). However, proponents often fail to mention that although individuals are genotypically almost identical, the one-tenth of a percent of the genome’s 3 billion letters that differ translates into roughly 3 million sequence differences, some conferring dramatically different risks of disease (e.g., cystic fibrosis or sickle cell disease). Thus, it has been argued that if studies are not designed to accommodate potential interactions between populations and genes, important population differences in genetic susceptibility, if they exist, would likely remain undetected. Given that genomic medicine is still in its infancy, incorporating analytic approaches that could refine the enormously complex task at hand could potentially benefit scientific progress.
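The arithmetic behind the “roughly 3 million” figure can be checked in a few lines. This is only a back-of-envelope sketch using the rounded values quoted in the text; the variable names are illustrative.

```python
# Back-of-envelope check of the figures quoted in the text.
GENOME_LETTERS = 3_000_000_000   # approximate genome length used in the text
DIFFERING_PER_THOUSAND = 1       # 0.1% expressed as 1 per 1,000 (keeps the math in integers)

differing_sites = GENOME_LETTERS * DIFFERING_PER_THOUSAND // 1000
print(f"{differing_sites:,} differing sites")  # 3,000,000 differing sites
```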

Others have argued for race neutrality simply because race/ethnicity is too crude to be useful as a stratifying variable (6). Although our perceptions of race/ethnicity are somewhat fluid, complex, and often ambiguous (14,29), they have nonetheless proven to be powerful predictors. Epidemiologists have a long history of benefiting from designs that stratify samples on categories that are either surrogates or crudely defined constructs. As an example, diabetes researchers have linked socioeconomic status (SES) and dietary patterns with diabetes incidence. Despite the “fuzzy” imperfection of definitions of both SES and diet, this research has contributed greatly to our understanding of the determinants of diabetes.

The process of classification and how to operationalize racial/ethnic stratification is hotly debated. Many researchers (5,13,14) promote self-identified race/ethnicity as being the most valid measure for most types of epidemiological study. The National Institutes of Health now requires documentation of minority inclusion on all new grant submissions and considers self-reported race/ethnicity status to be the preferred method of categorization (23). There is increasing support, however, for the use of race/ethnic-specific genetic markers (e.g., “microsatellite markers”) to detect and statistically correct for confounding due to population stratification (30). It is argued that this more objective and quantitative approach would avoid the uncertainty associated with crude, self-identified classification schemes that force assignment of mixed populations or individuals into one or another group. However, using unique genetic markers rather than self-reported race/ethnicity is not yet practical for large epidemiological studies. Greatly inflated sample frames would be required to accommodate the identification of sufficient individuals in minority populations to power a study to evaluate interactions (see below), and the whole sampling frame would need to be genotyped before population stratification could take place. In addition to demanding expensive data collection and lab assays, this approach fails to capture the confluence of social, cultural, behavioral, and environmental variables that are associated with self-identified race/ethnicity, thereby introducing confounding between genetic and environmental risk (13). By ignoring self-identified race/ethnicity or even when stratifying on population-specific genetic markers, environmental culprits may be missed due to our inability to disentangle the residual effects of confounding (13). 
If disease variation were due to cultural practices, then self-identified race/ethnicity would be a better adjuster than genetic markers, given that cultural practices could not be maintained over time if members could not identify one another (31,32). For these reasons, the use of self-identified race/ethnicity provides the most practical and economical resolution for handling study design problems.

Unique genetic markers, however, have become important in a special area of research (called “admixture studies”). The distribution of these markers (as a measure of degree of admixture) is correlated with the prevalence of a given disease or trait as a method to investigate the role genetics plays in that association between race/ethnicity and disease (33). For example, insulin resistance and acute insulin response were shown to vary as a function of genetic markers (acting as surrogates for proportion African admixture) (34). This approach has also been used to study type 2 diabetes in Pima Indians (35) and type 1 diabetes in African Americans (36). The intriguing findings reported from these studies sometimes suggest powerful genetic differences across races/ethnicities; but extreme caution is needed in interpretation. Analyses that pool a minimally admixed group, e.g., European Americans or Asian Americans, with a group with greater levels of admixture (e.g., African Americans or Latino Americans) may yield distorted conclusions because the large group of nonadmixed individuals exerts a powerful leveraging effect on the regression analysis. It is more appropriate to restrict admixture studies only to the admixed group (e.g., African Americans and Latino Americans) when associating levels of admixture and disease phenotype (13). Another shortcoming in admixture studies is that nongenetic factors (e.g., diet and SES) may co-vary with admixture as well, and without thorough adjustment for these environmental factors, residual confounding could bias estimates of the genetic effect (13).
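The leveraging effect described above can be illustrated with a small numerical sketch. All data below are invented for illustration and come from no cited study: the admixed group is constructed so that the trait has no within-group association with admixture, while the nonadmixed group simply has a lower mean trait value. The helper `ols_slope` is a hypothetical name for an ordinary least-squares slope.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical admixed group: trait is unrelated to admixture proportion.
admixed_x = [0.6, 0.7, 0.8, 0.9, 1.0]
admixed_y = [12.0, 13.0, 11.0, 13.0, 12.0]

# Hypothetical nonadmixed group: many subjects, all at admixture 0,
# with a lower average trait value for reasons unrelated to admixture.
nonadmixed_x = [0.0] * 20
nonadmixed_y = [9.0, 11.0] * 10

restricted_slope = ols_slope(admixed_x, admixed_y)
pooled_slope = ols_slope(admixed_x + nonadmixed_x, admixed_y + nonadmixed_y)

print(f"slope within admixed group only: {restricted_slope:.3f}")
print(f"slope after pooling both groups: {pooled_slope:.3f}")
```

In this constructed example the pooled slope is driven entirely by the difference in group means, which is why restricting the analysis to the admixed group is the more defensible choice.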

Another point of contention surrounding race/ethnicity is the classification process and the scientific as well as political appropriateness of such naming, i.e., whether to “lump or split.” Race/ethnicity is not one thing to all people and has been aptly described by Williams (37) as a “complex, multidimensional construct reflecting the confluence of biological factors and geographical origins, culture, economic, political and legal factors, as well as racism.” Naming decisions clearly need to be flexible, depending on the purpose. Particular confusion surrounds the separation of “ethnicity” (e.g., Hispanic vs. non-Hispanic) from “race,” with the former having more of a cultural and social connotation and the latter being more biological. From the perspective of the individual being surveyed, this distinction frequently seems artificial, and rather than asking a separate question about ethnicity, some surveys now use it to subdivide the usual race response choices (e.g., “non-Hispanic white” vs. “non-Hispanic black” vs. “Hispanic”). It is also argued that finer-grained (sub-) categorizations (e.g., Amish) would be more informative than groupings based on the usual continental ancestry (38). However, Risch et al. (13) and others (39) reported genetic evidence supporting categorization based on major categories of self-identified race and suggested that identifying genetic differences between these groups was “scientifically appropriate,” at least for genetic studies. For the study of behavioral or cultural exposures, though, this relatively coarse-grained classification runs the risk of missing important factors that may distinguish subgroups. Additionally, there has been an approximate fourfold increase in those reporting mixed ancestry to the U.S. Census from 1970 to 1990, and thus future data collection and analyses will need to accommodate admixed groups specifically, rather than arbitrarily subsuming mixed-race individuals under one group using some predetermined hierarchical algorithm. Currently, respondents are sometimes requested to choose the single racial classification with which they most closely self-identify. This would probably be most appropriate for studies with a sociologic focus but less appropriate for genetic studies. New “bridging methods” are being investigated that use additional questions to facilitate choosing the single most appropriate classification. Regardless of the method, it is important for researchers to be explicit in describing how subjects with mixed ancestry are handled analytically. As no single classification scheme can accommodate all studies, categories should be collapsed in such a way as to minimize heterogeneity within groups, while balancing the practical aspects of data collection and study outcomes (14).

One of the most harmful aspects of the race-neutral approach is its failure to consider the possibility of racial/ethnic SES differences in the design stage, which in effect could preclude detection in the analytic stage when interactions exist (5). Conducting separate (stratified) analyses by race/ethnicity is indicated when the effect of the exposure differs in magnitude across subgroups; such patterns are called “interactions” (or sometimes “effect modification”). For example, it was recently reported that medication response to ACE inhibitors differed by race/ethnicity (diminished effect in African Americans) (12). Pooled estimates are no longer valid when such interactions are detected, whereas race/ethnic-specific estimates are valid. However, such analyses require sufficient numbers within each race/ethnicity, and given that population-based samples are typically dominated by European Americans, researchers often are unable to test interactions between race/ethnicity and exposures or to conduct stratum-specific analyses in minority groups if differences are detected. When the contingencies of racial/ethnic differences are not allowed for at the design stage, researchers whose overall (pooled) finding is invalid because of interactions may be unable to conduct subanalyses within minority groups, because those groups include too few subjects and lack statistical power. A recent example comes from the latest AIDS vaccine trial, which showed no overall efficacy, but “subset analysis of their data showed statistically significant efficacy in blacks and another minority group, suggesting a genetic basis for the difference in response. Researchers and activists were dubious of the claims, noting the small sample size of the subgroups” (40). Use of stratified random sampling (equally sized random samples from each race/ethnic group) or over-sampling of minorities is sometimes included in a study design based on prior expectations of racial/ethnic differences. 
If no differences are detected across populations, then the data can easily be pooled to generate appropriate weighted estimates.
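As a minimal sketch of the design logic above, the following computes stratum-specific effects from equally sized samples and, separately, a pooled estimate weighted by each group's share of the target population. Every count, population share, and name here (`Stratum`, `pooled_risk_difference`, “group A/B”) is hypothetical and chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass

@dataclass
class Stratum:
    name: str
    population_share: float  # assumed share of the target population
    treated_events: int
    treated_n: int
    control_events: int
    control_n: int

    def risk_difference(self) -> float:
        """Risk in the treated arm minus risk in the control arm."""
        return (self.treated_events / self.treated_n
                - self.control_events / self.control_n)

def pooled_risk_difference(strata) -> float:
    """Population-weighted average of the stratum-specific risk differences."""
    total_share = sum(s.population_share for s in strata)
    return sum(s.population_share * s.risk_difference() for s in strata) / total_share

# Equally sized samples per group (stratified sampling); counts are invented.
strata = [
    Stratum("group A", 0.8, treated_events=30, treated_n=200,
            control_events=60, control_n=200),
    Stratum("group B", 0.2, treated_events=56, treated_n=200,
            control_events=60, control_n=200),
]

for s in strata:
    print(f"{s.name}: risk difference = {s.risk_difference():+.3f}")
print(f"population-weighted pooled estimate = {pooled_risk_difference(strata):+.3f}")
```

In this invented example the stratum-specific effects differ markedly, so the weighted pooled figure would mask an interaction; pooling is reasonable only when the stratum estimates are similar.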

In some instances, particularly in health services research, completely unadjusted models linking race/ethnicity to health are most appropriate. An example is assessing racial/ethnic differences in receiving recommended processes of diabetes care (e.g., at least one annual cholesterol test). Because we are not trying to predict processes but rather assess disparities in care, statistical adjustment for SES, demographics, or health status in an analysis would be over-adjustment given that these processes of care are minimum standards that should be available to all patients with diabetes, regardless of race/ethnicity or any other individual-level characteristic.

In etiologic research, many researchers have attempted to understand racial/ethnic differences in health that may be due to genetic differences without actually using genetic markers. Such studies have used so-called “black box epidemiology” (41), in which, after adjusting for expected alternative explanatory variables, e.g., SES, residual effects are attributed to genetic causes, an approach of questionable methodology (38). There is wide agreement that, rather than implying them by default, directly measuring the factors thought to be responsible for health differences between race/ethnic groups, such as genetic factors, is preferred (42). One can safely assume that many factors are missing (not collected or not specified) from a statistical model that attempts to explain racial/ethnic differences, and thus it would be inappropriate to assign the residual race/ethnic effect to genetics alone. Furthermore, it is rare in statistical models for residual race/ethnicity effects to be insensitive to adjustment for alternative explanatory variables. More often, crude estimates of racial/ethnic differences are attenuated after adjustment for explanatory factors such as SES and other behavioral or clinical factors. In de Rekeneire et al.’s article (14a) in this issue, for example, we see the unadjusted estimates of black-white differences in glycemic control attenuated with increasing model adjustment. One is left wondering what additional explanatory variables might attenuate the race/ethnicity effect further and to what degree. Explanatory mechanisms may be quite subtle and difficult to measure in quantitative models. Examples include the impact of internalized racism on health (43,44), differential treatment by providers when patients’ race/ethnicity is discordant from their own (45), and financial and language barriers to access even among the fully insured (46). 
Also, race/ethnicity is a lifelong attribute, and past experiences [e.g., childhood SES (47)] may be predictive of current or future health. While data on the current environment can be easily collected, past experiences are far more difficult to quantify and may be a source of considerable confounding. Clearly, if we could specify and adjust for all the explanatory variables, environmental and genetic, no differences should remain. The existence or importance of residual explanatory factors will always remain unknown; thus, it may be more relevant to focus on the relative importance of specified explanatory variables rather than the magnitude of the race/ethnicity effect. Knowledge of the dominant explanatory factors (explaining racial/ethnic differences in outcomes) would have public health value because it defines appropriate targets for interventions aimed at reducing racial/ethnic disparities.
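The attenuation of a crude difference after adjustment can be sketched with direct standardization on a single explanatory factor. The rates, SES distributions, and group labels below are invented for illustration and are not taken from de Rekeneire et al. or any other cited study; a real analysis would typically use regression with several covariates.

```python
# Hypothetical proportion with poor glycemic control, by group and SES level.
rates = {
    "group X": {"low SES": 0.40, "high SES": 0.24},
    "group Y": {"low SES": 0.36, "high SES": 0.20},
}
# Hypothetical SES distribution within each group.
ses_mix = {
    "group X": {"low SES": 0.7, "high SES": 0.3},
    "group Y": {"low SES": 0.3, "high SES": 0.7},
}
# Common (standard) SES distribution applied to both groups.
standard_mix = {"low SES": 0.5, "high SES": 0.5}

def weighted_rate(rates_by_ses, mix):
    """Average the SES-specific rates using the given SES distribution."""
    return sum(mix[level] * rate for level, rate in rates_by_ses.items())

# Crude difference: each group keeps its own SES distribution.
crude_diff = (weighted_rate(rates["group X"], ses_mix["group X"])
              - weighted_rate(rates["group Y"], ses_mix["group Y"]))
# Adjusted difference: both groups are given the same SES distribution.
adjusted_diff = (weighted_rate(rates["group X"], standard_mix)
                 - weighted_rate(rates["group Y"], standard_mix))
# Share of the crude difference accounted for by SES in this toy example.
attenuation = (crude_diff - adjusted_diff) / crude_diff

print(f"crude difference:    {crude_diff:.3f}")
print(f"adjusted difference: {adjusted_diff:.3f}")
print(f"attenuation:         {attenuation:.1%}")
```

The residual (adjusted) difference in such a calculation says nothing about its cause; it only shows how much of the crude gap the specified factor accounts for.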

Rather than examining only models that measure the effect of race/ethnicity on outcomes (e.g., relative risk of diabetes for one race/ethnic group relative to the reference group), methodologists recommend also specifying separate (stratified) models for each racial/ethnic group. There are two main reasons: 1) the relative importance of risk factors is better examined separately by race/ethnicity because the set of risks and effect sizes may vary by group, and detecting such complex interactions is often intractable (37); and 2) causal models specifying fixed attributes such as race/ethnicity are not “substitutable” and thus fail methodologically because they do not yield the answers that we actually seek (e.g., what would the health of individuals of race X be had they experienced life as race Y?) [see counterfactual model of causality (48)].

Ironically, proponents and opponents of the “race-neutral approach” are likely striving for the same overarching goal: to eliminate racial/ethnic disparities in health. Public health and medical research, however, have much to lose if the race-neutral approach is adopted. Science usually moves forward not in great leaps but in small steps, and frequently the mechanisms underlying new discoveries are not understood until later. Proponents of applying the “race-neutral approach” would evidently ignore observations of racial/ethnic differences in outcomes (such as the glycemic control differences reported in this issue) and their public health implications simply because scientists do not yet grasp the mechanisms. Rather than facilitate our progress toward understanding, such a position would impede it. We must continue to monitor our progress toward eliminating racial/ethnic disparities in health as proposed by the Healthy People 2010 Initiative (49), even if we do not yet fully understand the causes. Observations of racial/ethnic differences in outcomes, exposures, or processes of care must not be ignored, regardless of whether they are due to social differences in access or quality of care, health behaviors, or genetic susceptibility.


I thank Neil Risch, Esteban G. Burchard, Laurel Habel, Howard H. Moffet, Sarah Rowell, Ameena T. Ahmed, Andy L. Avins, Cathy Schaefer, Joseph V. Selby, and David G. Marrero for comments on early drafts of this editorial.


    • Accepted May 6, 2003.
    • Received May 2, 2003.

