Evaluating Structured Care for Diabetes: Can Calibration on Margins Help to Avoid Overestimation of the Benefits? An Illustration From French Diabetes Provider Networks Using Data From the ENTRED Survey

Chevreul, Karine; Brunn, Matthias; Cadier, Benjamin; Nolte, Ellen; Durand-Zaleski, Isabelle

doi:10.2337/dc13-2141

OBJECTIVE

While there is growing evidence on the effectiveness of structured care for diabetic patients in trial settings, standard population level evaluations may misestimate intervention benefits due to patient selection. In order to account for potential biases in measuring intervention benefits, we tested the impact of calibration on margins as a novel adjustment method in an evaluation context compared with simple poststratification.

RESEARCH DESIGN AND METHODS

We compared the results of a before–after evaluation on HbA_1c levels after 1 year of enrollment in a French diabetes provider network (DPN) using an unadjusted sample and samples adjusted by simple poststratification to results obtained after adjustment via calibration on margins to the general diabetic population’s characteristics using a national cross-sectional sample of diabetic patients.

RESULTS

Both with and without adjustment, patients in the DPN had significantly lower HbA_1c levels after 1 year of enrollment. However, the reductions in HbA_1c levels among the adjusted samples were 22–183% lower than those measured in the unadjusted sample, regardless of the poststratification method and characteristics used. Compared with simple poststratification, estimations using calibration on margins exhibited higher performance.

CONCLUSIONS

Evaluations of diabetes management interventions based on uncontrolled before–after experiments may overestimate the actual benefit for patients. This can be corrected by using poststratification approaches when data on the ultimate target population for the intervention are available. In order to more accurately estimate the effect an intervention would have if extended to the target population, calibration on margins seems to be preferable over simple poststratification in terms of performance and usability.

Introduction

Health systems increasingly rely on structured care interventions to better meet the needs of growing populations with diabetes and other chronic conditions (1,2). Such interventions, frequently also referred to as disease management care, typically include components such as enhanced coordination between providers, the systematic use of clinical guidelines, and patient education (3,4). In practice, the nature and settings in which structured care is delivered vary, ranging from discrete programs offered to a selected group of patients to multicomponent, population-based strategies offered as part of usual care (5). An example of structured diabetes care is the concept of the diabetes provider network (DPN) in France, an association of care providers with central coordination to ensure a predefined patient trajectory (6).

Structured care approaches are expected to improve the quality of care for persons with chronic conditions, enhance health outcomes, and, ultimately, reduce costs. However, although these aims are intuitively appealing, the evidence of whether they are achieved in practice remains uncertain. This is in part because good quality evidence using randomized controlled designs is typically limited to small populations or conducted in research settings and is therefore difficult to generalize (7,8). Conversely, the diffusion of structured care approaches into routine settings, for example, through rollout or implementation at population level, is rarely accompanied by rigorous evaluation (9,10). Instead, where evaluation is conducted, it often must rely on uncontrolled designs such as observational before–after studies, because randomized designs are not feasible for various reasons (10,11). Such evaluations are difficult to interpret, however, and are prone to misestimating the intervention effect, because they do not account for potential biases resulting, for example, from selective enrollment of patients likely to benefit from the intervention and from self-selection of health care professionals (12,13). This leads to differences between the characteristics of the overall target population and those of the intervention group, which is a sample of that population (called intervention sample throughout this article).

In order to improve the available evidence on the effectiveness and impact of population-wide disease management programs, it is necessary to account for these potential biases and use methods that are both scientifically robust and feasible for evaluation in daily practice.

To this end, it is appropriate to use poststratification methods commonly used in survey analysis that consist of a posteriori adjustments using auxiliary information at the target population level. These methods include simple poststratification, poststratification by regression, and, more recently, poststratification by calibration on margins introduced by Deville and Särndal (14–16). Poststratification aims to rebalance differences in the characteristics of the intervention sample and the overall target population using mathematical methods of varying complexity. Simple poststratification methods are the most commonly used and rely upon the simplest mathematical method, which assigns weights to strata of individuals using a ratio. Calibration on margins, which uses more complex functions based on an iterative process with distance functions, can also be applied (15). In addition to being relatively simple to use, this method is reported in the literature to usually perform better than simple poststratification, as demonstrated by the lower variance of the calibration estimator, because it is less biased when used with small sample sizes (17,18). Moreover, calibration on margins can be conducted on a higher number of characteristics than simple poststratification, particularly when only aggregated rather than individual-level data are available (because no poststrata can be reconstructed).

In this study, we explored the use and usefulness of calibration on margins over no adjustment or commonly used simple poststratification for the evaluation of DPNs in France. We compared the estimated effect size of the intervention by using a before–after evaluation of structured care on an unadjusted sample, on samples adjusted by simple poststratification, and on samples adjusted by calibration on margins. In addition, we sought to compare results obtained from the different poststratification methods using several different calibration functions in order to understand their relative merits and performance levels, thereby informing their potential use in future evaluations.

Research Design and Methods

Data Sources

The intervention group comprised patients enrolled in a DPN for type 1 and 2 diabetes in the Paris region of France. Enrollment was usually recommended by the patient’s general practitioner. Services provided by the DPN included patient education and workshops; systematic patient assessment and follow-up by general practitioners (annual checkups at a minimum), dietitians, nurses, and podiatrists; interdisciplinary meetings and training for the professionals involved; and the use of clinical guidelines. At patient enrollment, an initial patient assessment was performed and documented in the network database. Permission was obtained from the National Data Safety Authority to store and use patient data for evaluation purposes.

Because data on the characteristics of the French diabetic population are not available at the national level, data for the reference population were drawn from the Echantillon National Témoin Représentatif de la Population Diabétique (ENTRED) study, a cross-sectional representative national survey of people treated for diabetes in France based on a random sample of adults who had claimed at least three reimbursements for oral hypoglycemic agents or insulin from the largest French health insurance fund over a 1-year period (19). Anonymized patient-level data from the ENTRED study were obtained from the French Institute for Public Health Surveillance.

Study Population and Measures

For data availability reasons, we included patients in the DPN who were enrolled in 2007 or 2008 (n = 549) and had completed the initial and the 1-year follow-up assessments with a full data set, resulting in an intervention group sample size of 232 patients.

ENTRED data included claims data, questionnaires, and clinical data for 2007 and 2008, forming a reference group of 2,485 patients, representing the diabetic population in France.

As HbA_1c level is the most widely used diabetes outcome measure (20,21), we used HbA_1c levels after 1 year of enrollment in the DPN as the primary outcome.

Measures for adjustment included basic demographic characteristics (age and sex) and clinical indicators (diabetes duration; treatment modality; and intermediate outcomes, including hypertension, blood lipids, smoking status, BMI, and glomerular filtration rate). We also included the statutory health insurance chronic disease coverage scheme status (affections de longue durée) as a proxy reflecting a patient’s higher level of comorbidity. We were unable to consider socioeconomic status, as relevant data were not collected by the DPN.

Poststratification Methods

Simple Poststratification

Poststratification in its classic form classifies the sample by poststratum (group of individuals) on a given characteristic and weights individuals in each group up to the population total of that group. Specifically, weights are computed based on a ratio approach by dividing the proportion of individuals in a given stratum in the overall target population by the same proportion in the intervention sample (22).
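As an illustration only, the ratio approach can be sketched in a few lines of Python. The sketch assumes the usual convention that a stratum weight is the target population share divided by the sample share, and it uses the age-class percentages from Table 1; the function and variable names are hypothetical and are not part of the analysis performed in this study.

```python
# Age-class shares taken from Table 1
# (ENTRED = reference population, DPN = intervention sample).
population_share = {"<=55": 0.23, "56-70": 0.40, ">70": 0.37}
sample_share = {"<=55": 0.32, "56-70": 0.45, ">70": 0.22}


def poststratification_weights(population, sample):
    """Return one weight per stratum: population share / sample share."""
    return {stratum: population[stratum] / sample[stratum] for stratum in population}


weights = poststratification_weights(population_share, sample_share)
for stratum, weight in weights.items():
    print(f"{stratum}: weight = {weight:.2f}")
```

Under these assumptions, patients aged >70 years, who are underrepresented in the DPN, receive a weight above 1, whereas patients aged ≤55 years receive a weight below 1.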

Characteristics used for poststratification are commonly age and sex and those that are associated with the outcome and thus impact the results when they differ between the intervention sample and the reference population (23). We therefore explored five sets of one, two, or three characteristics using the standard age and sex as well as HbA_1c at baseline, because it is highly correlated with variation in outcome (correlation coefficient = 0.602; P < 0.001).

Calibration on Margins

Calibration on margins generates an adjusted sample by assigning a calibration sample weight (coefficient) to each individual based on an iterative process starting with initial weights d_k (which are usually the “sampling weights,” equal to the inverse of each individual’s probability of being included in the intervention). At each iteration, new “calibration weights” w_k that are as close as possible to the initial weights (as determined by a given distance function) are computed. Several mathematical functions are available, and we explored four of them: linear, raking ratio, logit, and truncated linear (see Supplementary Data for equations) (15). The two linear functions are based on quadratic functions, with the particularity that the truncated linear function always yields positive weights. The raking ratio and logit functions are logarithmic functions. A fifth function, hyperbolic sine, was not available in the SAS and R implementations and thus was not used in our analysis.
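For intuition, the raking ratio case can be sketched in its classical iterative proportional fitting form: the weights are rescaled one characteristic at a time until every weighted margin matches its target. The following Python code is a minimal sketch under stated assumptions, not the Calmar macro used in this study; the function names, the six-person toy sample, and the target totals are all hypothetical.

```python
def rake(initial_weights, categories, targets, max_sweeps=200, tol=1e-10):
    """Raking ratio sketch: adjust weights margin by margin until margins match.

    initial_weights: starting weights d_k, one per individual
    categories: dict mapping variable -> list of category labels, one per individual
    targets: dict mapping variable -> dict mapping category -> target weighted total
    """
    weights = list(initial_weights)
    for _ in range(max_sweeps):
        largest_gap = 0.0
        for variable, labels in categories.items():
            # Current weighted total of each category of this variable.
            totals = {}
            for w, label in zip(weights, labels):
                totals[label] = totals.get(label, 0.0) + w
            for label, total in totals.items():
                largest_gap = max(largest_gap, abs(total - targets[variable][label]))
            # Rescale each weight by target / current total of its category.
            weights = [w * targets[variable][label] / totals[label]
                       for w, label in zip(weights, labels)]
        if largest_gap < tol:
            break
    return weights


def weighted_total(weights, labels, label):
    """Sum of the weights of individuals in a given category."""
    return sum(w for w, current in zip(weights, labels) if current == label)


# Hypothetical toy sample of six individuals with two characteristics.
categories = {
    "age": ["young", "young", "young", "old", "old", "old"],
    "sex": ["m", "f", "m", "f", "m", "f"],
}
# Hypothetical target margins (each variable's totals sum to six).
targets = {
    "age": {"young": 2.4, "old": 3.6},
    "sex": {"m": 3.3, "f": 2.7},
}

calibrated = rake([1.0] * 6, categories, targets)
print(weighted_total(calibrated, categories["age"], "old"))
print(weighted_total(calibrated, categories["sex"], "m"))
```

Only the margins of the target population enter the procedure, which is why aggregated data suffice. The linear, logit, and truncated linear functions use different distance measures and are not covered by this sketch.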

To our knowledge, no literature provides specific guidance on the types of characteristics that should be used for adjusting a sample applying this method in the context of a health intervention evaluation of diabetes care. We therefore used all characteristics that are considered to be associated with differences in outcomes in diabetic patients (23) and that may explain differences in before–after assessment of the DPN population: demographic characteristics, diabetes information, and other clinical characteristics (Table 1), regardless of whether they significantly differed between the DPN and ENTRED at baseline to ensure that they would not significantly differ between the two populations after adjustment.

Table 1

Comparison of the characteristics of the DPN population with the ENTRED population at baseline

Characteristics	Reference population, ENTRED (n = 2,485)	Intervention group, DPN (n = 232)
Demographics
Age, years	64.59 ± 12.47	60.32 ± 12.51***
≤55	23%	32%***
56–70	40%	45%***
>70	37%	22%***
Sex, male	54%	55%
Diabetes information
Duration of diabetes, years
<4	23%	41%***
4–9	25%	26%***
9–16	24%	14%***
>16	28%	19%***
Treatment modus
One oral HA	42%	38%
Two or more oral HA	35%	38%
Insulin and no, one, or more oral HA	23%	24%
Other clinical characteristics
Hypertension grade§
0	63%	63%
1, 2, or 3	37%	37%
LDL, g/L
≤1.3	78%	67%***
>1.3	22%	33%***
HDL, g/L
>0.4 (female) or 0.35 (male)	87%	88%
≤0.4 (female) or 0.35 (male)	13%	12%
Triglycerides, g/L
≤1.5	63%	66%
>1.5	37%	34%
ALD, % yes	87%	84%
Smoking status, % smoker	14%	15%
BMI, kg/m²
<25	22%	19%
25–30	38%	42%
>30	40%	39%
GFR, mL/min
Level 1: >90	42%	45%
Level 2: 60–89	36%	38%
Level 3, 4, or 5: 0–59	22%	17%
DPN assessed outcome
HbA_1c, %	7.14 ± 1.19	7.75 ± 1.78***
HbA_1c, mmol/mol	54.5 ± 13	61.2 ± 19.5***
<7% (<53 mmol/mol)	50%	34%***
7–8% (53–64 mmol/mol)	30%	28%***
>8% (>64 mmol/mol)	20%	38%***

Data are mean ± SD unless otherwise noted. ALD, affections de longue durée, statutory health insurance chronic disease coverage scheme; GFR, glomerular filtration rate; HA, hypoglycemic agent.

***P ≤ 0.001.

§According to the World Health Organization (33).

We explored three sets of characteristics: one with all characteristics excluding HbA_1c at baseline; one with all characteristics including HbA_1c at baseline; and one with age, sex, and HbA_1c at baseline only. This last set was used to compare the performance of calibration and simple poststratification on the same set of adjustment characteristics.

We therefore constructed 12 adjusted samples by calibration using the four functions on three sets of characteristics.

Analysis

We first used descriptive statistics to check for differences in patient characteristics between the intervention group and the reference population.

Second, we compared results of a before–after analysis with HbA_1c level after 1 year of enrollment in the DPN as the primary outcome on the initial sample, on the calibration-adjusted samples, and on the simple poststratification-adjusted samples. Changes in HbA_1c were assessed on the mean level and on the following categories: number of patients whose HbA_1c levels changed from >7 to ≤7% (24,25) and number of patients whose HbA_1c levels fell by ≥0.5% (26,27).
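The three outcome measures can be written as weighted summaries over patients. The Python code below is a minimal sketch, assuming that each measure is a weighted mean or weighted share over all patients in the sample; the function name and the four-patient data set are hypothetical and serve only to show the calculation.

```python
def before_after_summary(baseline, followup, weights):
    """Weighted mean HbA1c reduction and the two categorical outcomes.

    Returns (mean reduction, share moving from >7% to <=7%,
    share whose HbA1c fell by >=0.5 percentage points).
    """
    total = sum(weights)
    rows = list(zip(baseline, followup, weights))
    mean_reduction = sum(w * (b - f) for b, f, w in rows) / total
    reached_target = sum(w for b, f, w in rows if b > 7.0 and f <= 7.0) / total
    fell_by_half = sum(w for b, f, w in rows if b - f >= 0.5) / total
    return mean_reduction, reached_target, fell_by_half


# Hypothetical HbA1c values (%) for four patients.
baseline = [8.2, 7.4, 6.8, 9.0]
followup = [7.0, 7.3, 6.9, 8.0]

# Unadjusted analysis: every patient has weight 1.
unadjusted = before_after_summary(baseline, followup, [1.0, 1.0, 1.0, 1.0])

# Adjusted analysis: hypothetical weights from poststratification or calibration.
adjusted = before_after_summary(baseline, followup, [0.5, 1.5, 1.5, 0.5])

print(unadjusted)
print(adjusted)
```

In this toy example the patients with the largest reductions are down-weighted, so the adjusted mean reduction is smaller than the unadjusted one.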

Finally, in order to compare the performance of simple poststratifications and the different calibration functions used in this evaluation context, we compared them in terms of SE and weight dispersion measured by the design effect, with a higher design effect expressing lower dispersion, which is considered preferable (28).

All analyses were performed using SAS version 9.2 (SAS Institute, Cary, NC) and the “Calmar” macro developed by the French National Institute for Statistics and Economic Studies (29). (Note that other software for calibration includes the R package sampling, g-Calib for SPSS, and Bascula for Blaise.) P values ≤0.05 were considered significant. All P values are two-sided. For calibration, continuous variables were transformed into categorical variables.

Results

Baseline Characteristics

Patient characteristics at baseline are presented in Table 1.

Compared with the ENTRED population, patients enrolled in the DPN were younger, had a more recent diabetes diagnosis, and had higher LDL cholesterol. Glycemic control as measured by HbA_1c was worse in the DPN patients.

Regardless of the sample used, results of the before–after analysis revealed that DPN patients had significant reductions in HbA_1c levels after 1 year of enrollment (Table 2 and Fig. 1). However, when the initial sample was adjusted, the reductions in all outcome measures (mean HbA_1c and the percentages of patients whose HbA_1c status changed from >7 to ≤7% or fell by ≥0.5%) were smaller regardless of the poststratification method and the set of characteristics used (Table 2).

Table 2

Comparison of DPN effects after 1 year, by before–after evaluation approach, on the initial sample and samples adjusted by different poststratification methods on patient characteristics (n = 232)

Type of adjustment	Characteristics used for adjustment	Function	Mean HbA_1c change after 1 year, %	95% CI	SD	SE	Relative change from no adjustment, %	Patients with HbA_1c change from >7 to ≤7%, %	Patients with HbA_1c decrease ≥0.5%, %	Design effect ×10³
1. No adjustment			0.497***	0.308–0.685	1.456	0.096	0	25.86	41.38	0.00
2. Simple poststratification
	Age/sex	NA	0.406***	0.231–0.581	1.354	0.089	−22	24.77	37.47	4.31
	HbA_1c	NA	0.250**	0.089–0.412	1.248	0.082	−99	21.53	32.87	4.31
	Age/HbA_1c	NA	0.208**	0.054–0.361	1.190	0.078	−139	21.77	31.17	4.31
	Sex/HbA_1c	NA	0.250**	0.09–0.411	1.242	0.082	−99	21.50	32.78	4.31
	Age/sex/HbA_1c	NA	0.223**	0.067–0.379	1.205	0.079	−123	21.55	30.35	4.31
3. Advanced poststratification: calibration on margins
	All patient characteristics at baseline§ without HbA_1c	Linear	0.367***	0.208–0.526	1.232	0.081	−35	24.56	37.25	6.08
		Raking	0.369***	0.211–0.528	1.226	0.081	−35	24.73	37.00	6.27
		Logit	0.371***	0.213–0.53	1.226	0.081	−34	24.82	37.13	6.25
		Truncated linear	0.366***	0.209–0.523	1.213	0.080	−36	24.43	37.38	6.09
	Age/sex/HbA_1c	Linear	0.208**	0.055–0.361	1.200	0.078	−138	21.54	31.12	5.37
		Raking	0.235**	0.081–0.388	1.207	0.078	−112	21.68	31.41	5.41
		Logit	0.235**	0.081–0.389	1.208	0.078	−111	21.68	31.41	5.41
		Truncated linear	0.208**	0.055–0.361	1.200	0.078	−138	21.54	31.12	5.37
	All patient characteristics at baseline§ with HbA_1c	Linear	0.175*	0.036–0.314	1.084	0.071	−183	19.82	29.28	6.72
		Raking	0.200**	0.062–0.339	1.068	0.070	−148	20.15	29.65	7.29
		Logit	0.207**	0.068–0.345	1.069	0.070	−141	20.12	29.80	7.19
		Truncated linear	0.181**	0.049–0.313	1.018	0.067	−175	19.75	29.89	6.81

A high design effect is considered favorable. NA, not applicable.

***P ≤ 0.001,

**P ≤ 0.01,

*P ≤ 0.05.

§Demographics, diabetes information, and other clinical characteristics.

Figure 1

Trends in mean HbA_1c level at baseline and after 1 year, by sample used, defined by the set of patient characteristics and the poststratification method.

When HbA_1c level at baseline was included as an adjustment variable in either simple poststratification or calibration on margins, the measured change in all outcome measures was markedly lower than in the samples not adjusted on HbA_1c. For example, when the sample was adjusted using the linear function for calibration on margins, reductions in mean HbA_1c levels compared with the unadjusted sample were 35% lower when HbA_1c level at baseline was not used for calibration; however, when HbA_1c at baseline was included, reductions in HbA_1c levels were 183% lower.
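The relative changes reported in Table 2 appear to correspond to the ratio of the unadjusted to the adjusted mean reduction, minus one. This formula is inferred from the tabulated values and is not stated explicitly in the text, so the Python sketch below should be read as an assumption; the function name is hypothetical.

```python
def relative_change(unadjusted_reduction, adjusted_reduction):
    """Percentage by which the unadjusted reduction exceeds the adjusted one."""
    return (unadjusted_reduction / adjusted_reduction - 1.0) * 100.0


# Mean HbA1c reductions (percentage points) as published in Table 2.
UNADJUSTED = 0.497
adjusted_reductions = {
    "simple poststratification, age/sex": 0.406,
    "calibration (linear), all characteristics without HbA1c": 0.367,
    "calibration (linear), all characteristics with HbA1c": 0.175,
}

for label, adjusted in adjusted_reductions.items():
    print(f"{label}: {relative_change(UNADJUSTED, adjusted):.0f}%")
```

With the rounded means from Table 2, this gives approximately 22%, 35%, and 184%; the small difference from the tabulated 183% is plausibly due to rounding of the published means.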

When poststratification was not performed on the initial HbA_1c level, adjustment by calibration on margins with all remaining characteristics measured lower achievement in all outcomes than simple poststratification on age and sex. Similarly, when poststratification was also performed on the initial HbA_1c level, poststratification via calibration on margins with all patient characteristics measured lower changes in all outcomes than simple poststratification. This represents relative decreases from no adjustment of 141–183% for calibration on margins compared with relative decreases of 99–139% in simple poststratification samples (Table 2). However, calibration on margins showed results similar to those of simple poststratification when only age, sex, and HbA_1c level at baseline were used for adjustment in both methods.

The results of the four calibration functions fell within a narrow range. For example, when all characteristics were used for calibration, the absolute HbA_1c level change ranged from 0.18 to 0.21 (Table 2).

Performance of Adjustment Techniques

Based on the number of characteristics and the sample sizes used in the intervention group and reference population, all tested poststratification techniques were technically feasible and yielded robust results.

The design effect of poststratification via calibration on margins was consistently higher than in simple poststratification, indicating more favorable weight dispersion, including when the same characteristics (age, sex, and HbA_1c at baseline) were used in both methods. Moreover, in calibration on margins, the higher the number of characteristics used for adjustment, the lower the observed SE and the higher the design effect.

Finally, across the four calibration functions, the design effect was of a similar range, with the raking ratio function exhibiting the highest design effect compared with the linear, logit, and truncated linear functions for all sets of characteristics (Table 2).

Conclusions

To our knowledge, this is the first study to compare a range of adjustment methods, including calibration on margins, to evaluate a structured diabetes care intervention using a national cross-sectional sample of diabetic patients as the reference population. While the positive impact of the DPN remained significant, we found that before–after analysis without poststratification may have overestimated the effect of the DPN by 22–183% in terms of observed improvements in HbA_1c levels. Furthermore, adjustment on HbA_1c levels at baseline appears to be important for not overestimating the intervention effect. When compared with simple poststratification, change in the observed improvement is usually lower using calibration on margins, mostly because it allows adjustment on a greater number of characteristics. Moreover, estimations using calibration on margins exhibited higher performance with lower SEs and higher design effects, strongly suggesting that calibration on margins is the preferable adjustment method in this context. Finally, the four calibration functions that we tested all showed comparable performance.

As the analytical approach explored here has thus far not been documented in the peer-reviewed literature in the context of evaluation, it is difficult to compare our findings with work undertaken elsewhere. While it is well known that simple before–after evaluation can lead to substantial overestimation of the intervention effect in structured care (and, in rare cases, to underestimation), the size of this misestimation is not well understood (10). However, a recent evaluation of a diabetes disease management program in Austria using a cluster-randomized controlled trial found that HbA_1c levels in the intervention group had decreased by 0.13% after 1 year. Conversely, using the same data and applying a simple before–after comparison, the effect size was measured as a decrease of 0.41% (13). Thus the before–after design had overestimated the “true” intervention effect established in the randomized controlled trial by ∼68%. While direct comparison of findings is impossible given the different methods used, the relative overestimation of the intervention effect as identified using poststratification methods appears to be within the same range. This suggests that poststratification and preferably calibration on margins may provide a useful evaluation approach to produce valid findings where more scientifically robust designs such as randomization are not possible.

We acknowledge some limitations regarding data quality, availability, and follow-up.

We included DPN patients in the analysis only if complete follow-up data and characteristics for calibration were available, which increased the likelihood of selection effects due to missing data within the DPN. Further, our analysis did not include characteristics on socioeconomic status, comorbidities, and diabetes type.

Despite using a proxy for comorbidities and the fact that the type of diabetes is in part reflected by age, disease duration, and treatment mode, we were likely unable to account for the full extent of patient selection. Moreover, there were missing data in the ENTRED reference population, which may render it a slightly biased representation of the overall target population. Despite these limitations, the data used reflect the data available in the context of provider network evaluation, and the methods tested aim to support future pragmatic evaluations of similar interventions.

In addition, the data sources in our study may have been suboptimal in terms of their ability to illustrate the advantages of calibration on margins. In fact, our intervention and reference populations differed only moderately at baseline in some of the characteristics that may be associated with the outcome (e.g., BMI and renal function). Yet greater differences in patient characteristics between the two populations would probably lead to more marked differences in effect size and test performance between simple and advanced poststratification, thereby allowing us to illustrate the higher potential of calibration on margins.

In terms of analytical methods, our study did not test the hyperbolic sine function as a fifth mathematical basis for calibration on margins. Because of the very similar results obtained by the four functions tested in terms of measured outcomes and design effect, we assume that results would not have greatly differed for the hyperbolic sine function.

Our results have implications for research and decision making. We conceived the evaluation approach tested here as a tool in situations in which gold-standard designs, such as randomized controlled trials or quasi-experimental designs (30), are not feasible for logistic or resource reasons. Calibration appears to be applicable to real-life evaluation and adapted to assessments of the effectiveness of a given intervention, particularly in the case of inclusion of more characteristics than would be feasible with simple poststratification. Indeed, if the number of characteristics used in the latter is higher than three, the number of strata increases almost exponentially, leading to a high risk of empty strata and, consequently, biased estimates. Moreover, calibration on margins can be applied when only aggregated data are available for the overall target population, while simple poststratification requires patient-level data. Given these possibilities, calibration on margins can increase the external validity of a given evaluation, as opposed to the high internal validity of evaluations using randomized controlled designs (31). While calibration on margins is not comparable to controlled evaluation designs and cannot measure phenomena such as the “placebo effect,” it may provide a useful evaluation design where randomization is not possible and program planners or funders are interested in estimating the effect of an existing intervention if rolled out to a wider population based on routinely collected data. In such a case, calibration on margins could provide clinicians, program managers and policy makers with relevant information regarding whether and how much they should invest in such a wider strategy.
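The growth in the number of strata can be made concrete with the category counts shown in Table 1. The Python sketch below assumes that the categorization in Table 1 is the one used for adjustment (the actual categorization used in the analysis may differ) and compares the number of fully crossed strata required by simple poststratification with the number of margin totals required by calibration.

```python
from math import prod

# Number of categories per characteristic, as laid out in Table 1.
category_counts = {
    "age": 3, "sex": 2, "diabetes duration": 4, "treatment": 3,
    "hypertension": 2, "LDL": 2, "HDL": 2, "triglycerides": 2,
    "ALD": 2, "smoking": 2, "BMI": 3, "GFR": 3, "HbA1c": 3,
}

# Simple poststratification needs one stratum per combination of categories.
three_characteristics = prod(category_counts[name] for name in ("age", "sex", "HbA1c"))
all_characteristics = prod(category_counts.values())

# Calibration on margins only needs one target total per category.
margin_totals = sum(category_counts.values())

print(f"Strata for age/sex/HbA1c: {three_characteristics}")
print(f"Strata for all characteristics: {all_characteristics}")
print(f"Margin totals for all characteristics: {margin_totals}")
```

Under this assumption, crossing all 13 characteristics would produce 124,416 strata for a sample of 232 patients, so almost all strata would be empty, whereas calibration would need only 33 margin totals.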

It is important to note the methodological restrictions in using this method. Calibration on margins is technically feasible when the size of the intervention population is at least ∼1/10 the size of the reference population used for adjustment (32). Thus the applicability of calibration is likely limited to settings in which the intervention group is sufficiently large compared with the reference population. Moreover, researchers using the linear function in calibration should be aware that it can yield negative weights, and thus only statistical tests for quantitative variables may be used.

Overall, this study underscores the utility of poststratification methods in a context in which structured approaches to care for chronic diseases such as diabetes are increasingly being implemented but hard to evaluate in a rigorous manner for financial or logistic reasons. Calibration on margins appears to be the preferable poststratification method mostly because it allows for adjustment on a greater number of characteristics than simple poststratification. It appears to provide an effective means for accounting for selection bias, thereby mitigating the possibility of overestimation of the intervention effect when simple before–after evaluations are undertaken in real-world settings.

Article Information

Acknowledgments. The authors thank the DPN Paris Diabète, in particular Pierre-Yves Traynard and Pierre-Albert Charbit, and the ENTRED partners for generously providing the data. The authors are very grateful to the patients for their participation in these initiatives. The authors further thank Karen Berg Brigham (URC-Eco) for her very helpful review of the manuscript.

Funding. The 2007 ENTRED study was funded by the Institute for Public Health Surveillance, the National Institute for Prevention and Health Education, the General Scheme of Health Insurance, the Independent Scheme for Employees, and the French National Authority for Health. This study was conducted with support from the DISMEVAL Consortium, funded under the European Commission’s Seventh Framework Programme (grant 223277). See www.dismeval.eu for additional information.

Duality of Interest. No potential conflicts of interest relevant to this article were reported.

Author Contributions. K.C. designed the study, analyzed data, and wrote the manuscript. M.B. analyzed data and wrote the manuscript. B.C. designed the study, performed the statistical analysis, and reviewed the manuscript. E.N. contributed to the discussion and reviewed and edited the manuscript. I.D.-Z. obtained data and reviewed and edited the manuscript. K.C. is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

References

1. Mattke S, Seid M, Ma S. Evidence for the effect of disease management: is $1 billion a year a good investment? Am J Manag Care 2007;13:670–676

2. Gress S, Baan CA, Calnan M, et al. Co-ordination and management of chronic conditions in Europe: the role of primary care—position paper of the European Forum for Primary Care. Qual Prim Care 2009;17:75–86

3. Epstein RS, Sherwood LM. From outcomes research to disease management: a guide for the perplexed. Ann Intern Med 1996;124:832–837

4. Ham C, Curry N. Integrated Care. What Is It? Does It Work? What Does It Mean for the NHS? London, The King’s Fund, 2011

5. Nolte E, McKee M. Caring for People with Chronic Conditions: A Health System Perspective. Maidenhead, Open University Press, 2008

6. Durand-Zaleski I, Obrecht O. France. In Managing Chronic Conditions: Experiences in Eight Countries. Nolte E, Knai C, McKee M, Eds. Copenhagen, World Health Organization, on behalf of the European Observatory on Health Systems and Policies, 2008, p. 55–73

7. Pimouguet C, Le Goff M, Thiébaut R, Dartigues JF, Helmer C. Effectiveness of disease-management programs for improving diabetes care: a meta-analysis. CMAJ 2011;183:E115–E127

8. Hopkins D, Lawrence I, Mansell P, et al. Improved biomedical and psychological outcomes 1 year after structured education in flexible insulin therapy for people with type 1 diabetes: the U.K. DAFNE experience. Diabetes Care 2012;35:1638–1642

9. Mattke S, Bergamo G, Balakrishnan A, Martino S, Vakkur NV. Measuring and Reporting the Performance of Disease Management Programs. Santa Monica, RAND Corporation, 2006

10. Nolte E, Conklin A, Adams J, et al. Evaluating Chronic Disease Management - Recommendations for Funders and Users. Cambridge, RAND Corporation and DISMEVAL Consortium, 2012

11. Knai C, Nolte E, Brunn M, et al. Reported barriers to evaluation in chronic care: experiences in six European countries. Health Policy 2013;110:220–228

12. Buntin MB, Jain AK, Mattke S, Lurie N. Who gets disease management? J Gen Intern Med 2009;24:649–655

13. Flamm M, Panisch S, Winkler H, Sönnichsen AC. Impact of a randomized control group on perceived effectiveness of a Disease Management Programme for diabetes type 2. Eur J Public Health 2012;22:625–629

14. Särndal C. The calibration approach in survey theory and practice. Surv Methodol 2007;33:99–119

15. Deville J, Särndal C. Calibration estimators in survey sampling. J Am Stat Assoc 1992;87:376–382

16. Lu H, Gelman A. A method for estimating design-based sampling variances for surveys with weighting, poststratification, and raking. J Off Stat 2003;19:133–151

17. McNamee R. Regression modelling and other methods to control confounding. Occup Environ Med 2005;62:500–506, 472

18. Kim JK, Park M. Calibration estimation in survey sampling. Int Stat Rev 2010;78:21–39

19. Tiv M, Viel J-F, Mauny F, et al. Medication adherence in type 2 diabetes: the ENTRED study 2007, a French Population-Based Study. PLoS ONE 2012;7:e32412

20. Egginton JS, Ridgeway JL, Shah ND, et al. Care management for Type 2 diabetes in the United States: a systematic review and meta-analysis. BMC Health Serv Res 2012;12:72

21. Steinsbekk A, Rygg LØ, Lisulo M, Rise MB, Fretheim A. Group based diabetes self-management education compared to routine treatment for people with type 2 diabetes mellitus. A systematic review with meta-analysis. BMC Health Serv Res 2012;12:213

22. Little RJA. Post-stratification: a modeler’s perspective. J Am Stat Assoc 1993;88:1001–1012

23. Armoogum J, Madre JL. Weighting or imputations? The example of nonresponses for daily trips in the French NPTS. J Transp Stat 1998;1:53–63

24. Mauras N, Beck R, Xing D, et al.; Diabetes Research in Children Network (DirecNet) Study Group. A randomized clinical trial to assess the efficacy and safety of real-time continuous glucose monitoring in the management of type 1 diabetes in young children aged 4 to <10 years. Diabetes Care 2012;35:204–210

25. Owens LA, Avalos G, Kirwan B, Carmody L, Dunne F. ATLANTIC DIP: closing the loop: a change in clinical practice can improve outcomes for women with pregestational diabetes. Diabetes Care 2012;35:1669–1671

26. DePue JD, Dunsiger S, Seiden AD, et al. Nurse-community health worker team improves diabetes care in American Samoa: results of a randomized controlled trial. Diabetes Care 2013;36:1947–1953

27. Sönnichsen AC, Rinnerberger A, Url MG, et al. Effectiveness of the Austrian disease-management-programme for type 2 diabetes: study protocol of a cluster-randomized controlled trial. Trials 2008;9:38

28. Kish L. Survey Sampling. New York, John Wiley & Sons, 1965

29. Sautory O. La macro CALMAR - redressement d’un échantillon par calage sur les marges. Paris, INSEE, 1993

30. Duru OK, Mangione CM, Chan C, et al. Evaluation of the diabetes health plan to improve diabetes care and prevention. Prev Chronic Dis 2013;10:E16

31. English M, Schellenberg J, Todd J. Assessing health system interventions: key points when considering the value of randomization. Bull World Health Organ 2011;89:907–912

32. Vivot M. Calage sur les marges aléatoires - une aventure hasardeuse? Presented at the Colloque francophone sur les sondages, 2005, Laval, Quebec, Canada

33. Whitworth JA; World Health Organization, International Society of Hypertension Writing Group. 2003 World Health Organization (WHO)/International Society of Hypertension (ISH) statement on management of hypertension. J Hypertens 2003;21:1983–1992

2014

Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. See http://creativecommons.org/licenses/by-nc-nd/3.0/ for details.
