This has been one of the fundamental questions in health policy in recent years. It can be split into two parts. First, does quality improve for targeted patients? Second, does quality improve for non-targeted patients?
The first question is relatively clear. One would think that quality would improve for the patients targeted by a pay-for-performance (P4P) program, but this may not always be the case. For instance, the financial incentives to improve quality may not be large enough. Or the provider who is incentivized by the P4P program may not have enough influence over outcomes, since patient non-adherence, social determinants of health, and the quality of other health care providers also affect the outcome of interest.
The second question is more complex. When P4P programs incentivize quality for targeted patients, non-targeted patients could be either helped or harmed. If investments in quality improvement initiatives benefit targeted and non-targeted patients alike, P4P programs could benefit non-targeted patients as well. However, if physicians and other health care providers divert effort from non-targeted to targeted patients, quality of care for non-targeted patients could fall under P4P. Seminal work by Holmstrom and Milgrom (1991) showed that agents may shift effort away from non-incentivized tasks, a substitution that has been documented not just in health care but in other industries as well.
A new paper by Britteon et al. (2025) describes the Advancing Quality (AQ) initiative implemented in the United Kingdom:
Advancing Quality (AQ) scheme was a pay-for-performance scheme introduced in the North-West region of England in October 2008…all 24 eligible hospital trusts in the region voluntarily participated in the scheme. The scheme was an English adaptation of the US Premier Hospital Quality Incentive Demonstration (HQID) scheme (Jha et al. 2012). It initially rewarded hospitals based on the quality of care provided to patients that were admitted for one of three emergency conditions (acute myocardial infarction, heart failure, pneumonia) or patients that underwent one of three elective procedures (coronary artery bypass graft, hip replacement, and knee replacement)…
During this time, hospitals scoring in the highest or second highest quartile received a 4% or 2% bonus payment on top of the amount reimbursed to the hospital for the associated activity under the national activity-based financing tariff
So, paralleling our two questions above, Britteon and co-authors examine whether quality improves (i) for patients with these six conditions and (ii) for other patients outside these six conditions. The authors use data from the Hospital Episode Statistics (HES) linked with mortality data from the UK Office for National Statistics (ONS). The key outcome of interest was the 30-day mortality rate. A difference-in-differences approach was used to evaluate the impact of the AQ program: hospitals in the North-West of England (where the AQ program was introduced) were compared to hospitals in the rest of the country. Using this approach, the authors found that:
…regional payment changes were associated with an increase in mortality of 0.321 percentage points (S.E. 0.114) for non-targeted emergency patients who were treated by physicians with no exposure to the incentives, compared to control regions. In contrast, the mortality rate for non-targeted patients reduced by 0.008 percentage points (S.E. 0.002) for every additional targeted patient treated per quarter by their physician. These findings were consistent across a range of sensitivity analyses. The findings suggest that providers diverted resources away from non-targeted patients but that patients benefitted from physicians learning from the incentives.
…the results suggest that the AQ scheme had a small adverse spillover effect on the mortality rate for non-targeted patients treated in an incentivized hospital. The magnitude of this estimated spillover effect offset previously observed improvements in the mortality rate for patients with a targeted condition…but was statistically insignificant at the 95% confidence level.
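To make the mechanics concrete, here is a minimal Python sketch of two things discussed above: a simple 2x2 difference-in-differences comparison, and a back-of-the-envelope reading of the two quoted coefficients. Everything in it is an assumption for illustration. The hospital mortality rates are made-up toy numbers, not data from the paper; the function names are hypothetical; and the paper's actual models are regression-based with controls and standard errors that this sketch ignores. The linear combination of the two coefficients is my own extrapolation, not a result reported in the quoted text.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences on group means.

    Returns (change in treated group) minus (change in control group).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


def net_spillover_pp(targeted_per_quarter,
                     base_effect_pp=0.321,
                     per_patient_effect_pp=-0.008):
    """Naive linear reading of the two quoted coefficients.

    base_effect_pp: quoted mortality change (percentage points) for
        non-targeted patients whose physician had no exposure.
    per_patient_effect_pp: quoted change per additional targeted
        patient treated per quarter by the physician.
    Assumes the two effects simply add up, which is an illustration only.
    """
    return base_effect_pp + per_patient_effect_pp * targeted_per_quarter


def break_even_volume(base_effect_pp=0.321, per_patient_effect_pp=-0.008):
    """Targeted patients per quarter at which the naive net effect is zero."""
    return -base_effect_pp / per_patient_effect_pp


# Hypothetical 30-day mortality rates (percent) per hospital.
# These are toy numbers, NOT taken from Britteon et al. (2025).
north_west_pre = [6.0, 6.4, 5.8]
north_west_post = [5.5, 5.9, 5.4]
rest_of_england_pre = [6.1, 6.3, 5.9]
rest_of_england_post = [5.9, 6.2, 5.6]

toy_did = did_estimate(north_west_pre, north_west_post,
                       rest_of_england_pre, rest_of_england_post)
print(f"Toy DiD estimate: {toy_did:.3f} percentage points")
print(f"Naive break-even volume: {break_even_volume():.1f} targeted patients per quarter")
```

Under this naive reading, a non-targeted patient would be no worse off once their physician treats roughly 40 targeted patients per quarter (0.321 / 0.008), which is one way to see how the diversion and learning effects pull in opposite directions. It should not be read as a finding of the paper.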
You can read the whole paper here.