PHBE :: Personal Health Budgets Evaluation

Q&A: Study Design

The evaluation examined an entirely new policy, one that we piloted in a number of primary care trusts (PCTs).
The study was primarily designed to assess the effectiveness and costs of personal health budgets (PHBs).

The original intention was for the study to be an individual-level randomised controlled trial (RCT). However, for a number of practical reasons, this design was not possible in some sites, where a non-randomised control design was used instead. In other sites (around 25% of the whole sample), an RCT was possible and was implemented.

To control for potential selection bias, a difference-in-difference (D-in-D) method was adopted. This is a well-established and robust method designed to address this problem. The method involved collecting baseline data for all participants before any use of a PHB. It then focused on the change in experience of people receiving either the normal support or a PHB, and compared this change between the PHB and control groups. Because changes in costs and benefits were measured from each group's baseline position, the method removed any differences in costs and benefits between the groups at baseline. We measured the impact of PHBs in terms of how the experience of the PHB group deviated from that of the control group during the period of receipt of the PHB.
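As an illustration only, the basic D-in-D calculation can be sketched in Python as follows. The function name, the record layout and the outcome values are hypothetical and are not taken from the study data; the evaluation's actual models were more elaborate than this unadjusted comparison.

```python
from statistics import mean


def diff_in_diff(records):
    """Unadjusted difference-in-difference estimate.

    records: iterable of (group, baseline, follow_up) tuples, where
    group is "phb" or "control" (hypothetical labels).
    """
    changes = {"phb": [], "control": []}
    for group, baseline, follow_up in records:
        # Each person's change from their own baseline position.
        changes[group].append(follow_up - baseline)
    # Impact = mean change in the PHB group minus mean change in controls.
    return mean(changes["phb"]) - mean(changes["control"])


# Made-up outcome scores, for illustration only.
records = [
    ("phb", 0.50, 0.70),
    ("phb", 0.60, 0.75),
    ("control", 0.40, 0.45),
    ("control", 0.50, 0.60),
]
estimate = diff_in_diff(records)
print(round(estimate, 3))
```

Note that the two groups start from different baseline levels in this made-up example, yet only the difference in changes enters the estimate.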

We also controlled for baseline characteristics such as age, condition, sex and dependency levels, in case these factors implied that people in the two arms of the study were on different trajectories of experience.

It is also important to note that, whilst RCTs are considered the gold standard for tackling selection bias in many circumstances, they do have their limitations when used for ‘social experiments’¹. In particular, it is impossible to implement a ‘blind’ (and in particular, a double-blind) RCT in the case of PHBs. They are a system-level intervention designed as much to change the system (i.e. the care professionals, funding, commissioning etc.) as the role of the patient. In this case, an RCT may not be the best model anyway, because the same care professionals and system could be interacting with both the people selected randomly for a PHB and those who continued with the usual care.

A ‘cluster’ randomised design was a possible solution, whereby interacting groups of patients and care professionals were the units of analysis to be randomised. However, we considered that there was an insufficient number of clusters in the study to use this randomisation approach.

Another salient point is that the study needed to evaluate the resource implications (i.e. costs) of using PHBs. We can only judge the overall merit of the intervention with information on the opportunity costs as well as the benefits. Costing a systemic intervention was challenging, however, and the sensitivity of the results to our assumptions was therefore thoroughly investigated.

As regards statistical uncertainty, the evaluation used bootstrap approaches, which involved repeated re-sampling of the data, as well as conventional statistical tests. This alternative approach helped to ensure that statistical significance was not driven by any assumptions inherent in conventional statistical testing.
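The general idea of bootstrapping can be sketched as below. This is a minimal percentile bootstrap of a difference in mean changes, assuming simple re-sampling with replacement within each group; the function name, parameters and data are hypothetical, and the evaluation's own bootstrap procedure may have differed.

```python
import random
from statistics import mean


def bootstrap_ci(phb_changes, control_changes,
                 n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the difference in mean change."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    estimates = []
    for _ in range(n_resamples):
        # Re-sample each group with replacement, keeping its original size.
        phb_sample = rng.choices(phb_changes, k=len(phb_changes))
        control_sample = rng.choices(control_changes, k=len(control_changes))
        estimates.append(mean(phb_sample) - mean(control_sample))
    estimates.sort()
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up changes from baseline, for illustration only.
phb_changes = [0.20, 0.15, 0.25, 0.10, 0.30]
control_changes = [0.05, 0.08, 0.00, 0.06, 0.02]
lower, upper = bootstrap_ci(phb_changes, control_changes)
print(lower, upper)
```

If the resulting interval excludes zero, the difference would be judged significant at the chosen level without relying on a distributional assumption such as normality.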

Full details of the methods were provided and published to allow open scrutiny. Conclusions in the evaluation were appropriately conditioned and caveated, allowing readers to form their own judgement.

The evaluation (and the method used) was subject to the usual rigorous, anonymous peer review and satisfied the reviewers. The study was then published in the Journal of Health Services Research & Policy, where it was again subject to peer review in the usual way.

As noted in the report, the analysis was also performed for just the randomised sites. The net monetary benefit of PHBs using the ASCOT indicator at the £30,000 level was greater (more positive) in the randomised sub-sample than in the whole sample. The result was significant at the 10% level, despite the small sample size. Moreover, in the main sensitivity analysis comparison (using alternative imputation methods) this result was significant at the 5% level. We would caution about the small sample size in this case (c. 500 people in total across both groups) and would not wish to make any conclusive statement, but this result does not suggest that selection bias was driving the overall result.
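For readers unfamiliar with the measure, net monetary benefit is conventionally calculated by valuing the gain in outcome at a willingness-to-pay threshold and subtracting the additional cost. The sketch below uses that standard definition with the £30,000 threshold mentioned above; the function name and the input figures are hypothetical and do not reproduce the study's results.

```python
def net_monetary_benefit(delta_effect, delta_cost, threshold=30_000):
    """Standard net monetary benefit: threshold * extra effect - extra cost.

    delta_effect: difference in outcome gain between the PHB and control
                  groups (e.g. on an ASCOT-based scale).
    delta_cost:   difference in cost change between the groups, in pounds.
    threshold:    willingness to pay per unit of outcome, in pounds.
    """
    return threshold * delta_effect - delta_cost


# Made-up inputs: a 0.05 gain in outcome at an extra cost of £1,000.
nmb = net_monetary_benefit(delta_effect=0.05, delta_cost=1_000)
print(round(nmb, 2))
```

A positive value indicates that, at the chosen threshold, the value of the additional benefit exceeds the additional cost.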

¹ See, for example, Cameron, A. C. and Trivedi, P. K. (2005) Microeconometrics: Methods and Applications, Cambridge University Press, New York (especially pages 52-57). Or Imbens, G. W. and Wooldridge, J. M. (2009) "Recent developments in the econometrics of program evaluation", Journal of Economic Literature, 47(1): 5-86.

Useful resources