
Evaluating the Impact of Value-Based Purchasing: A Guide for Purchasers


A recent report by the Institute of Medicine (Crossing the Quality Chasm: A New Health System for the 21st Century) identified a "chasm" between the quality of care we have and the quality of care we should have. Employers could be a powerful force for closing this gap, since they pay for much of the health care in the United States. Past research has shown that employers and employer coalitions have at least some of the tools they need to serve as a force for quality, and a growing number of pioneers indeed are developing and implementing strategies they hope will improve the quality and value of the health care they purchase. But to date we have very little evidence on the impact of such efforts:

  • What strategies are effective?
  • Under what circumstances?
  • In what markets?

This lack of evaluation presents problems at both the program and policy levels. At the program level, it means that employers and coalitions find it hard to determine whether their own initiatives, much less the initiatives they might choose to emulate, are in fact meeting their intended goals. Given rising health care costs, strategies that cannot prove their impact are likely to be abandoned. At the policy level, the lack of evaluation leaves unanswered the question of whether one can rely on the current market mechanism for improving quality.

The health services research community can play an important role in helping to evaluate past value-based purchasing efforts. A recent AHRQ Program Announcement, "Impact of Payment and Organization on Cost, Quality and Equity," seeks to encourage such research. In the meantime, however, employers and coalitions have indicated that they need some tools to help them do "real time" tracking and assessment of the impact of their own efforts, so they can tell quickly which to keep and which to drop, and how to fine-tune their strategies. Evaluating the Impact of Value-Based Purchasing Initiatives: A Guide for Purchasers is intended to meet this immediate purchaser need. We hope employers and coalition leaders will try it out, tell us how well it is working and how it could be improved, and use it to improve and enhance their value-based purchasing efforts.

—Irene Fraser, Ph.D., Director
Center for Organization and Delivery Studies
Agency for Healthcare Research and Quality


One of the unique aspects of health care in the United States is the manner in which it is financed. Unlike many other countries, the United States does not have a publicly administered universal insurance program. Instead, health care is financed through a multitude of public and private insurance programs administered by purchasers. In the United States, the primary group purchasers include Medicare and Medicaid for public health care programs, and employers and purchasing cooperatives for private health care programs. Purchasers are defined as follows:

"Purchasers" are public and private sector entities that subsidize, arrange and contract for—and in many cases bear the risk for—the cost of health care services received by a group of beneficiaries.

In the past decade, many public and private health care purchasers have become more active in the purchasing process. Rather than simply writing checks to health insurers or health care providers, they are attempting to measure, monitor, and improve the quality they are receiving for the health care dollar spent. Although there are numerous ways purchasers have approached these tasks, they have been collectively termed value-based purchasing (VBP).

The need and opportunity for VBP has probably never been greater than it is right now. After several years of moderate growth in health care costs, recent estimates indicate that costs grew by nearly 7 percent from 1999 to 2000, outpacing the growth of the economy for the first time in almost a decade (Levit, Smith et al., 2002). Reflecting that growth, private employers estimate that their health care costs in 2002 will be nearly 14 percent more than they were in 2001 (Ceniceros, 2001). These increases are the highest in more than a decade and come at a time of economic uncertainty, prompting many employers to question the value of these expenditures.

At the same time, a growing body of evidence suggests that a significant percentage of the money that employers and other purchasers are spending on health care pays for poor quality in the form of overuse, misuse (e.g., medical errors), and waste (Midwest Business Group on Health et al., 2002; Kohn et al., 2000). For example, according to a study by the Midwest Business Group on Health, the Juran Institute, and The Severyn Group, a conservative estimate of the direct cost of poor quality care for employers would be $1,350 per employee per year, while the indirect cost of poor quality care, including lost time and productivity, is at least $340 per employee per year (Midwest Business Group on Health et al., 2002).
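The per-employee figures cited above translate into a simple back-of-the-envelope calculation of what poor-quality care costs a given employer. The sketch below uses the study's estimates directly; the workforce size is a hypothetical input chosen purely for illustration.

```python
# Back-of-the-envelope estimate of the annual cost of poor-quality care,
# using the per-employee figures cited above (Midwest Business Group on
# Health et al., 2002). The workforce size is a hypothetical assumption.

DIRECT_COST_PER_EMPLOYEE = 1350    # conservative direct cost, $/employee/year
INDIRECT_COST_PER_EMPLOYEE = 340   # lost time and productivity, $/employee/year

def poor_quality_cost(num_employees: int) -> int:
    """Total estimated annual cost of poor-quality care for an employer."""
    return num_employees * (DIRECT_COST_PER_EMPLOYEE + INDIRECT_COST_PER_EMPLOYEE)

# A hypothetical employer with 10,000 covered employees:
print(poor_quality_cost(10_000))   # 16900000, i.e., $16.9 million per year
```

Even at these conservative estimates, the stakes scale quickly with workforce size, which is one reason large purchasers have led the push toward value-based purchasing.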

Despite the level of activity across the country, the impact of VBP activities on health care quality and costs has not been well established. The evaluation of these activities is a critical step for purchasers eager to identify and adopt beneficial tactics and avoid those found to be ineffective.


Employers Are the Nation's Biggest Purchasers
About 153 million American workers and their dependents receive health insurance as an employment benefit (Gabel et al., 2001). Although some analysts regard the connection between employment and health insurance as an historical accident or question the wisdom of tying health insurance to employment (Battistella and Burchfield, 2000), the fact remains that, collectively, public and private employers are the largest purchasers of health insurance (and consequently health care) in the United States. On average, employers spend $4,164 per employee on health benefits (Battistella and Burchfield, 2000).


Purpose of the Guide

This guide has two primary objectives:

  • To encourage purchasers, and especially employers, to conduct formal evaluations of their VBP activities.
  • To facilitate that effort by presenting an evaluation process that purchasers can adopt and adapt to their projects.

For many value-based purchasers, one barrier to conducting an evaluation has been the lack of a resource that combines the formality of scientific research principles with real world examples and illustrations. This guide is intended to fill that gap with a tool that is accessible and useful to those without research experience but still informative for researchers.

Another important barrier to both the implementation of value-based purchasing principles and the evaluation of value-based purchasing activities is the fact that most purchasers are too busy with their own initiatives to focus on sharing any knowledge they have gained with other purchasers. To the extent that this guide helps to establish a common framework for evaluating value-based purchasing initiatives, it may serve to encourage the documentation and sharing of knowledge so that in the future, purchasers can learn more from others instead of having to "reinvent the wheel."

Finally, this guide is intended as a resource to be used and shared by decisionmakers as well as those who work on their behalf, whether contractually or otherwise. Specifically, it has been designed so that a senior-level health benefits manager with little research experience can learn about the issues and select a specific analytical strategy without knowing exactly how that strategy might be executed in practice. The manager can then consult with internal or external analysts, who will also find this guide useful for understanding the important analytic, data collection, and measurement details involved in executing the chosen research design. In this way, the guide can help to fill a void that often prevents formal evaluation of value-based purchasing activities from occurring, namely the disconnect between those responsible for making timely business decisions and those experienced in conducting research.

Organization of the Guide

The first part of the guide covers the basics of value-based purchasing: what it is, why purchasers pursue it, and the kinds of activities it involves.

The second part walks through the five major steps involved in evaluating VBP activities, including a detailed review of several different research methods that purchasers may want to consider.

By following these steps, purchasers will be able to develop much-needed evidence about the effectiveness of various VBP initiatives.

Citations for publications that provide further detail on the topics addressed in this guide as well as specific examples of purchasers engaged in VBP activities may be found in the bibliography.


I. The Basics of Value-Based Purchasing

This section reviews the purpose of value-based purchasing, common goals of VBP initiatives, the kinds of activities that purchasers pursue, and the reasons for a greater emphasis on evaluation.

What Is Value-Based Purchasing?

While there are different ways to define value-based purchasing, at its broadest the term basically refers to any purchasing practices aimed at improving the value of health care services, where value is a function of both quality and cost. It can be helpful to think about value as the result of quality divided by cost:

Value = Quality ÷ Cost

This equation shows that value increases as quality increases (holding cost constant) and as cost decreases (holding quality constant).
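The behavior of this ratio can be sketched in a few lines of code. The quality scores and per-employee costs below are hypothetical numbers invented for illustration, not figures from the guide.

```python
# Illustrative sketch of the value ratio described above:
# Value = Quality / Cost, where higher is better.
# All quality scores and costs here are hypothetical.

def value(quality_score: float, cost_per_employee: float) -> float:
    """Return value as quality divided by cost."""
    return quality_score / cost_per_employee

# Two hypothetical health plans with equal cost but different quality:
plan_a = value(quality_score=80, cost_per_employee=4000)
plan_b = value(quality_score=90, cost_per_employee=4000)

# Holding cost constant, the higher-quality plan delivers more value:
assert plan_b > plan_a

# Holding quality constant, lower cost also raises value:
assert value(80, 3500) > value(80, 4000)
```

The point of the ratio is qualitative, not numeric: a purchaser can improve value either by raising the quality obtained for a given outlay or by lowering the outlay for a given level of quality.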

For the purpose of this guide, value-based purchasing emphasizes activities that aim to improve the quality of care that patients and other consumers of health care services receive. It does not emphasize the various strategies that purchasers use to reduce their costs, even if they are holding quality constant. While many purchasers adopt VBP strategies in an effort to lower their expenses in the long term, it is important to recognize that, although improvements in quality can and often do reduce costs, they may also increase costs or be cost-neutral.

Why Be a Value-Based Purchaser?

As the definition suggests, the main reason to practice value-based purchasing is to get more for your money. On a practical level, this means that your VBP activities should be geared towards influencing one or more final or intermediate outcomes.

One Definition of Value-based Purchasing
"The concept of value-based health care purchasing is that buyers should hold providers of health care accountable for both cost and quality of care. Value-based purchasing brings together information on the quality of health care, including patient outcomes and health status, with data on the dollar outlays going towards health. It focuses on managing the use of the health care system to reduce inappropriate care and to identify and reward the best-performing providers. This strategy can be contrasted with more limited efforts to negotiate price discounts, which reduce costs but do little to ensure that quality of care is improved."
Meyer, Rybowski, and Eichler, 1997, p.1
Reason 1: To Improve Final Outcomes

For the purposes of this guide, final outcomes are the results that purchasers ultimately care about: health status, satisfaction with health plans and care delivery, costs, and, for purchasers that are employers, the ability to compete in the market for labor. This section reviews the specific goals associated with these outcomes.

Improved Health Status. Since the primary reason to pursue a VBP strategy is to improve quality, one of the most important outcomes involves changes in the health status of individuals and, in some cases, communities. Ideally, purchasers gauge these changes by evaluating both clinical measures of health status as well as measures that reflect the patient's perspective. The challenge is to be realistic about what impact VBP activities can have on health status and to identify other factors that may influence your results.

Greater Satisfaction With Health Plans and Care Delivery. Most purchasers regard increased satisfaction with health plans and health care as an important final outcome, if for no other reason than to remain competitive in the market for labor. While measures of enrollee or patient satisfaction are not necessarily a reliable reflection of clinical quality, they are easier to assess and understand and are often reported as a proxy for quality.

Lower Costs. For most purchasers, a primary goal of VBP activities is to reduce expenditures associated with health care. Typically, purchasers focus on their own costs, whether measured by premiums or by payments to providers. Some look at costs more broadly by including the initiative's financial impact on the company (e.g., savings achieved by increased productivity or reduced absenteeism). It is also possible to consider savings to patients and their families; for example, an initiative to improve care for asthma may reduce the need for frequent emergency room visits, which impose a measurable cost on families in terms of time and money. Finally, since many VBP activities have an impact beyond an employer and its employees, an evaluation could also attempt to assess changes and shifts in costs at the community level.

Greater Competitiveness in the Labor Market. The many health care purchasers that are employers offer health care benefits as part of the total wage and compensation package for employees. Employer-sponsored health care coverage is the norm in this country for those of working age (18-65) and their dependents and is seen by many employers as a necessity for attracting qualified labor. To the extent that they change the nature of the health benefits package, VBP activities can be related to labor force outcomes such as employee turnover, wages, and the ability to hire new employees (Dowd and Finch, 2001). For example, an employer that significantly reduces its subsidization of employment-based health insurance (and requires more cost-sharing of employees) might experience increased turnover as employees search for jobs with more generous health benefits. While most VBP activities will not have an impact on labor market outcomes, any activities that add, reduce, or alter benefits for existing and new employees might have implications for the employer's competitiveness in the labor market.

Reason 2: To Improve Intermediate Outcomes

Unfortunately, making a direct connection between VBP activities and final outcomes is not always easy. For example, differences in the health status of patients and populations are often difficult to measure; even when they can be detected, there are many other factors aside from VBP activities that could have influenced the change. As a result, you often need to infer the effects of VBP activities from intermediate outcome measures.

Generally speaking, intermediate outcomes are measurable results that have been shown to influence final outcomes. In one study, for example, performance on intermediate outcome measures predicted the impact of selective referrals to hospitals on mortality rates, an important measure of health status (Dudley et al., 2000). However, these kinds of measures do not determine final outcomes (i.e., an improvement in intermediate outcomes will not guarantee better final outcomes).

This section discusses some important intermediate outcomes: the selection of high-quality plans and providers by consumers, the utilization of health care services, the prevalence of healthy behaviors, and medical errors.

More Consumers Choose High-Quality Plans and Providers. A common objective of VBP activities is to encourage beneficiaries, patients, or enrollees to select high-quality health plans and providers (e.g., hospitals, medical groups, nursing homes). The theory is that if individuals can identify and choose high-quality providers and plans, they are more likely to experience improvements in their health status.

More Appropriate Utilization of Health Care Services. Many VBP activities focus on reducing inappropriate utilization (e.g., unnecessary Caesarean sections, antibiotic prescriptions for viral infections) and/or increasing appropriate utilization (e.g., compliance with recommended immunizations and preventive care screenings such as mammograms). There is ample evidence linking the appropriate utilization of services to better health status as well as lower costs, but the determination of appropriateness is not always straightforward.

More Evidence of Healthy Behaviors. Many purchasers, especially employers, change their benefits package or collaborate with health plans and providers in an attempt to influence the health-related behaviors of enrollees and patients, such as smoking, alcohol use, and exercise. A change in the prevalence of these behaviors is a relevant intermediate outcome because of its direct relationship with costs and health status. For example, smoking, excessive alcohol consumption, and other inadvisable or risky behaviors are related inversely to health status and positively to health care expenditures. Similarly, exercise is related positively to health status and inversely to health care expenditures. Thus, to the extent that these programs are successful, they would likely be correlated with better health status and lower expenditures.

Fewer Medical Errors. Some prominent purchasers are beginning to respond to recent concerns about patient safety by developing VBP programs aimed at minimizing medical errors (Leapfrog Group, 2001). Medical errors include errors of omission (e.g., a failure to diagnose a health care problem requiring treatment) as well as errors of commission (e.g., a surgery performed on the wrong knee, an overdose of an appropriate medication). While this is a new focus for value-based purchasing, initiatives that lead to a reduction in health care mistakes are expected to improve quality and lower overall costs.

What Do Value-Based Purchasers Do?

Because quality is a broad concept with many dimensions, value-based purchasing encompasses a wide range of initiatives designed to achieve a variety of short-term and long-term objectives. Table 1 lists a number of purchasers engaged in VBP activities; the studies cited in this table provide useful information about the specific VBP activities in which these purchasers are engaged as well as, in some cases, insights into the impact of those efforts. This literature suggests that VBP activities generally focus on three groups:

      • Those who are eligible for or receive health care (e.g., employees, patients).
      • Those who provide health care (e.g., health plans, physicians, hospitals).
      • The third parties who pay for health care (e.g., insurance companies).
Table 1. Examples in the Literature of Purchasers Engaged in Value-Based Purchasing
Purchaser(s) | Authors of studies (see references for full citation)
New Jersey Medicaid Office of Managed Care | Farley et al. 2002 (also see Medicaid programs in five States, below)
General Motors Corporation | Scanlon et al. 2002; Meyer et al. 1999; Meyer et al. 1998; Meyer et al. 1997
Buyer's Health Care Action Group | Schultz et al. 2001; Feldman et al. 2000; Robinow 1997
Medicaid programs in five States (Arizona, Kansas, Michigan, New Jersey, and West Virginia) | Fossett et al. 2000
Pacific Business Group on Health | Castles et al. 1999; Schauffler et al. 1999; Meyer et al. 1997; Rodriguez and Schauffler 1996
Members of the National Business Coalition on Health | Fraser et al. 1999
California Public Employees Retirement System (CalPERS) | Meyer et al. 1999; Meyer et al. 1998
The Alliance, Denver, Colorado | Meyer et al. 1999; Meyer et al. 1998
Missouri Consolidated Health Care Plan | Meyer et al. 1999; Meyer et al. 1998
Cleveland Health Quality Choice | Meyer et al. 1999; Meyer et al. 1998
University of California | Buchmueller and Feldstein 1997
Dallas-Fort Worth Business Group on Health | Meyer et al. 1997
Chicago Business Group on Health | Meyer et al. 1997
Gateway (St. Louis) Purchasing Association (now Gateway Purchasers for Health) | Meyer et al. 1997
Digital Equipment Corporation | Meyer et al. 1997
GTE Corporation | Meyer et al. 1997
Community Health Purchasing Corporation (Iowa) | Meyer et al. 1997
Pacific Bell | Meyer et al. 1997
Purchasers in 15 U.S. communities | Lipson and De Sa 1996


Despite the variety at the tactical level, there are essentially two paths or strategies that purchasers can follow to influence final outcomes. Strategy 1 focuses on influencing the decisions or behavior of individuals (i.e., employees, beneficiaries, or patients), while Strategy 2 aims to change the behavior or performance of health care entities, usually providers and/or plans. Many large purchasers pursue both strategies at the same time, but smaller ones often lack the resources to take on multiple activities.

Watch Out for Unintended Consequences
When you conduct an evaluation, it is critical to identify the intended consequences of a VBP activity so that you can measure and monitor the appropriate final and intermediate outcomes and determine whether a relationship exists between the VBP activity and these outcomes. However, VBP activities sometimes result in outcomes that were not intended or predicted. Since these unintended consequences could undermine your broader goals, it is important to seek out and assess them as part of your evaluation of the overall impact of a VBP program, even though the evaluation may not be designed for that specific purpose.
For example, imagine a VBP activity that rewarded providers that had low mortality rates for CABG (coronary artery bypass graft) surgery. Providers might achieve these goals by genuinely improving quality of care. However, such initiatives might also discourage providers from treating the sickest patients, an outcome of value-based purchasing that is not desirable.



Figure 1. A Bird's-Eye View of Value-Based Purchasing Strategies

Strategy 1: Change the Behavior and Decisions of Individuals

The first strategy is to encourage people to make choices that will lead to higher quality care and better health. While the primary goal of this approach is to affect the health care-related decisions of consumers, this market-oriented strategy has an implicit objective to change the behavior of health plans and providers, which would be expected to improve their performance in order to attract enrollees and patients.

At a tactical level, this would include activities such as consumer information campaigns (e.g., general education about health care quality, the distribution of specific data on the performance of providers or health plans), as well as the use of financial incentives or cost sharing to encourage the selection and use of providers and health plans that can document their ability to provide high-quality care. For example, based on evidence that mortality rates as well as other measures of quality are positively related to surgical volume, some employers use selective contracting or incentives to encourage employees to go to hospitals that perform a large number of surgical procedures.

To assess the impact of this kind of activity, you could look at how many employees are receiving surgery at high-volume facilities or enrolling in highly rated health plans. A more involved evaluation could track the impact of this strategy on quality and costs and identify any unintended consequences.

Initiatives designed to support this strategy, such as programs to educate and inform consumers about quality, are often the first choice of relatively smaller purchasers (e.g., medium-size employers) that lack the market clout needed to deal directly with providers and plans. Among larger employers, some adopt this "consumer-empowerment" strategy because they regard it as consistent with their human resources philosophy. Others prefer it because they want to maintain an arms-length relationship with the business of health care. However, it is important to recognize that, compared to Strategy 2, this strategy has a less direct impact on the delivery of care because it depends on the ability of consumers to drive changes in the health care market.

Examples of Ongoing VBP Initiatives To Inform Consumers
Reporting Health Plan Quality: The AboutHealthQuality Web Site
As part of its VBP activities, the New York Business Group on Health is a leader of the New York State Health Accountability Foundation, a public-private partnership co-founded by IPRO (an independent quality evaluation organization) and funded in part by New York State. The Foundation sponsors the AboutHealthQuality Web site, which offers consumers comparative information on the performance of health plans throughout the New York metropolitan area. For more information, access www.abouthealthquality.org.
Reporting Hospital Quality: The Hospital Profiling Project
A hospital project initiated several years ago by Ford Motor Company in Southeast Michigan has evolved into a multi-purchaser, five-city initiative to collect data and report on the quality of inpatient care. Participating employers distribute the results to employees and retirees through publications and Web sites. For more information, access www.hospitalprofiles.org.
Reporting Medical Group Quality: The Consumer Assessment Survey
For a number of years, the Pacific Business Group on Health (PBGH; a coalition of large West Coast purchasers) has measured and reported on the quality of care delivered by medical groups in California. Using the Physician Value Check Survey, and more recently the Consumer Assessment Survey, PBGH has produced annual public reports with measures of consumer satisfaction and the quality of preventive care at the level of group practices. For more information, access www.healthscope.org.
Reporting Health Care Quality: The TalkingQuality Web Site
In March 2002, AHRQ, the Centers for Medicare & Medicaid Services (CMS), and the U.S. Office of Personnel Management launched a new Web site, TalkingQuality.ahrq.gov, that provides easy-to-use information on health care quality. The site is designed for organizations and professionals who are experienced in producing quality reports for a variety of audiences, as well as those who are not. Detailed information is found within the site about the entire process of communicating information from the initial conceptualization of the idea through the project's implementation and, finally, the evaluation phase. For additional information, visit the site at https://talkingquality.ahrq.gov.


Strategy 2: Change the Performance of Health Care Organizations and Practitioners

The second strategy for improving intermediate and final outcomes is to effect changes in the performance of health care organizations and practitioners. In their study of the pioneers of VBP, Meyer et al. (1997) identified four types of organizations that VBP activities target for purposes of changing provider behavior or performance: health plans, health care systems, hospitals, and physician groups. Activities include:

    • Standardizing benefits across health plans in order to facilitate apples-to-apples comparisons of value.
    • Requiring that providers be accredited.
    • Encouraging plans or providers to adopt specific disease management programs intended to improve health outcomes.
    • Requiring that health plans report measures from the Health Plan Employer Data and Information Set (HEDIS®) and/or the Consumer Assessment of Health Plans (CAHPS®).
    • Requiring that hospitals report mortality or complication rates.
    • Monitoring these reports to identify areas in need of improvement.
    • Incorporating quality standards into contracts with health plans or care systems (contractually linked groups of primary care physicians, specialists, and hospitals).

Activities focused on third-party payers, including contracting with Preferred Provider Organizations (PPOs) to obtain discounted health care fees from physicians, hospitals, and other health care providers, are considered to be value-based purchasing only if quality is a key component in the contracting.

As with strategies directed at individuals, the measurement of the direct impact of these programs on final outcomes such as health status is often difficult. It is usually more feasible to link these types of VBP activities to intermediate outcome measures that have been shown to be associated with final outcomes, such as improvements in the percentage of diabetics screened for potential complications or the percentage of women screened for breast cancer.

Example: The Leapfrog Group's Quality Standards
With a membership of about 90 health care purchasers, the Leapfrog Group is a national organization committed to reducing medical errors and improving the value of health care. To improve patient safety, Leapfrog members are trying to change what providers do by insisting that hospitals implement the following practices:
  • Use of computerized physician order entry systems.
  • Staffing of intensive care units (ICUs) with physicians who are certified or eligible to be certified in critical care medicine.
  • Evidence-based hospital referrals for five high-risk surgeries and two high-risk neonatal conditions.
To evaluate their success, the members plan to look at intermediate outcome measures such as the change in the percentage of a health plan's network hospitals that have electronic order entry systems or proper ICU staffing (Leapfrog Group, 2001). A more complex evaluation would track the impact of the referral policy on health status and plan costs. (Leapfrog members are also pursuing the first strategy described above by initiating efforts to educate consumers about patient safety.) For more information, visit http://www.leapfroggroup.org.


A Goal Within Each Strategy: Reduce Imbalances in Information

While many VBP activities are intended to have a direct influence on the behaviors or decisions of consumers or health care organizations, other VBP activities take an indirect approach by trying to reduce or eliminate asymmetric information between health care providers and plans on the one hand, and purchasers and consumers on the other. In a perfectly informed market, organizations and individuals would be able to select plans and providers that correspond to their preferences for quality and costs. However, asymmetric information poses a barrier to improving the final outcomes that purchasers are concerned about. To the extent that VBP activities reduce information asymmetries, they should improve the functioning of market mechanisms and lead to intermediate and final outcomes that more closely satisfy organizational and consumer objectives.

For example, some large purchasers issue requests for proposals (RFPs) or requests for information (RFIs) that require health insurance plans (or care systems) to bid on standardized benefit packages. While this tactic may not have a direct impact on the decisions of providers or consumers, it does provide the purchaser with comparative information that it can use to influence those decisions and, ultimately, improve value. For instance, if you could evaluate variations in premium quotations and quality provisions across competing plans offering standardized benefit packages, you would be in a better position to identify, contract with, and offer employees incentives to select the plans or care systems that offer the best price and quality.

The V-8 Initiative: A Concerted Effort To Reduce Asymmetries in Information
In 1997, a group of eight regional employer coalitions decided to develop a standardized RFI that would help employers and other purchasers identify and select high-quality health plans. This group, called the V-8, was recently joined by General Motors and Marriott International; other participants include accrediting bodies and government agencies, including the Centers for Medicare & Medicaid Services.
The standardized RFI, which has been in use for 3 years, makes it easier for purchasers to gather useful data on their health plans and make apples-to-apples comparisons. The V-8 participants also regard it as a tool for leveraging employer clout to enhance quality. In addition, the RFI benefits health plans and provider systems because it should eventually reduce the variety of information requests they receive from purchasers.


II. Why Evaluate Value-Based Purchasing Activities?

With health insurance premiums on the rise (Gabel et al., 2001) and cost pressures continuing to mount, both public and private purchasers are increasingly concerned about developing sound strategies, and chief financial officers and corporate executives are demanding evidence of their impact on both quality and costs. As a result, perhaps the most important reason to evaluate the impact of VBP activities is to produce objective and defensible estimates that will allow you to determine how best to use scarce organizational resources to maximize the value of health care. Armed with detailed analysis and sound information from evaluations, you will be able to decide whether to continue what you are doing, try other tactics, or pursue a completely different strategy.

The evaluation of VBP activities is also important to developing a body of evidence that all purchasers can draw on when choosing among purchasing strategies and specific activities. To the extent that purchasers are willing to use the methods described in this guide to evaluate VBP activities and to share the results of these evaluations with other purchasers, the entire purchasing community should benefit by knowing how effective different VBP activities have been in practice and the conditions under which those evaluations have been conducted.

A third reason to evaluate VBP activities is to demonstrate to health plans and providers whether the initiative is making a difference. Since many activities require that health care organizations contribute money, staff time, or other resources, purchasers often need to be able to justify that investment. Also, if you plan to incorporate VBP activities into your negotiations with plans or providers, evaluation results can help you explain your demands or defend any information you share about their cost or quality performance relative to others.

Of course, VBP activities vary widely among purchasers. And purchasers will vary in the degree of rigor and the amount of resources that they wish to apply to evaluation. Because of this substantial heterogeneity, this guide is not intended as a detailed manual but as a broad overview of the options for evaluation design and the many issues that value-based purchasers need to consider. Purchasers interested in pursuing any of these options may want to seek out more detailed sources as well as partners with experience in evaluating purchasing activities.

Assessing the Return on Investment
Some purchasers, especially private employers, have become increasingly interested in determining the return on their investment in value-based purchasing activities. With smaller budgets forcing them to choose among competing uses for their scarce resources, more and more employers will need a way to decide whether to initiate or continue supporting VBP efforts.
A calculation of return on investment (ROI) is possible. Conceptually, all final outcome variables could be expressed as dollar values, which would allow you to compute an ROI where the investment would be represented by the cost of the VBP activity and the return would be measured by the changes in health care and other costs plus the value of changes in health care quality and other outcomes of interest, such as productivity. This approach is analogous to a cost-benefit analysis in which all outcome variables are expressed in dollar terms.
However, to make this calculation, you have to be willing and able to associate a financial value with a change in quality measures, such as an improvement in enrollee satisfaction, better access to care, or a reduction in medical errors. You also have to decide how broadly or narrowly to measure the "return." For example, if your VBP activity succeeded in improving the performance of all local hospitals, do you look only at the impact on your employees (few of whom may have used the hospital in a given period of time) or do you consider the benefits to the community? Similarly, if your program targeted improvements in care for a chronic disease, your return may not be measurable in the form of lower premiums but in lower absenteeism and higher productivity.
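The ROI arithmetic described above can be sketched in a few lines of code. All figures and benefit categories below are hypothetical assumptions chosen for illustration, not data from this guide; the point is only that once each outcome has been assigned a dollar value, ROI reduces to a simple ratio:

```python
# Minimal ROI sketch for a VBP activity. The cost and benefit figures are
# hypothetical; in practice, assigning dollar values to quality changes is
# the hard part, as discussed in the text.

def vbp_roi(activity_cost, benefits):
    """Return ROI as (total benefits - cost) / cost."""
    total_benefit = sum(benefits.values())
    return (total_benefit - activity_cost) / activity_cost

benefits = {
    "reduced_claims_costs": 120_000,    # lower claims/premiums attributed to the activity
    "productivity_gains": 60_000,       # e.g., lower absenteeism, monetized
    "monetized_quality_gains": 30_000,  # dollar value assigned to quality improvements
}

roi = vbp_roi(activity_cost=150_000, benefits=benefits)
print(f"ROI: {roi:.0%}")  # (210,000 - 150,000) / 150,000 = 40%
```

Note that the result is entirely driven by the valuation choices fed into it: widening the "return" to include community benefits, or narrowing it to employee-only effects, can flip the sign of the answer.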


Steps for Evaluating Value-Based Purchasing Activities

This section discusses what value-based purchasers need to do to evaluate the impact of their activities and provides a straightforward overview of the research designs that purchasers may want to consider and the factors that need to be weighed in their decisions.

A thorough and useful evaluation of VBP activities requires the following five steps:

    1. Define your value-based purchasing activities and their goals.
    2. Determine the necessity, appropriateness, and feasibility of an evaluation.
    3. Choose a research design to assess the impact of VBP activities.
    4. Implement the research.
      • Task 1: Identify appropriate measures.
      • Task 2: Collect the data.
      • Task 3: Analyze the data.

    5. Summarize the results and interpret implications for purchasing activities.

Although this guide discusses each step sequentially, these steps should not be regarded as independent. Quite often, the decisions made in one step are influenced by choices in other steps. For example, although the steps imply that you would choose the evaluation design before selecting measures and collecting data, the choice of study design is often driven by what data are available to the purchaser. (Go to "How Do You Choose a Research Design?").

Applying the Principles of Program Evaluation
This guide applies the principles of program evaluation to the evaluation of VBP activities. The term "program evaluation" refers to the process of using research techniques to systematically investigate the impact or worth of programmatic activities, interventions, or policy initiatives. Program evaluation is a standard component of the curriculum in many policy and business schools in the United States.
While program evaluation is typically conducted in the context of public policy, the literature and methods are well established and the approaches are easily adapted to the needs of value-based purchasers. The steps and advice offered in this guide reflect the widely accepted principles of program evaluation.
If you would like more information, the list of references at the end of this guide suggests relevant textbooks and other useful references. However, keep in mind that none of these resources focuses exclusively on the VBP activities conducted by health care purchasers.


Step 1. Define Your Value-Based Purchasing Activities and Their Goals

The first step is to list your VBP activities and link them to the final and intermediate outcomes that each activity is meant to achieve. While it sounds simple, this crucial step is often complicated by two factors. First, not all VBP activities can be expected to have separately identifiable outcomes. And second, purchasers may have trouble deciding which of those outcomes really matter to them.

First Challenge: Sorting Out Related Activities

VBP activities are frequently composed of multiple elements that collectively are intended to have an effect on outcomes. For example, before contracting with health plans on behalf of employees, many employers issue detailed requests for proposals (RFPs) to obtain information and bids. These RFPs usually contain multiple provisions and requirements, which together may be intended to produce several outcomes, such as adequate access to health care providers and acceptable levels of quality at the minimum possible cost. However, it may be difficult to independently link each of the separate provisions in the RFP to their intended outcomes. For outcomes such as access, the VBP activity might be defined as the entire RFP rather than its separate provisions. For other outcomes, such as the quality of preventive care, the VBP activity might be defined as a specific provision, such as the requirement to report HEDIS® data.

The process of linking activities to intended outcomes will help you determine whether the elements of VBP activities should be combined or examined separately. Generally speaking, elements that collectively are designed to achieve common outcomes should be lumped together as a single VBP activity for purposes of evaluation, while elements that are expected to have individual effects on outcomes should be considered separate VBP activities.

You may want to begin by documenting all of the major and minor VBP activities in which the purchasing entity is engaged and their associated objectives. While you may already know which activity you want to evaluate, this task will enable you to identify other VBP practices that could affect your results. Typically, the leadership of an employer's benefits department or the procurement department for a government purchaser is the starting point for information on VBP efforts. However, coordination among the units responsible for health care purchasing within an organization is crucial: Many large employers engage in regional purchasing and contracting; and many government purchasing programs, such as State Medicaid programs, involve multiple agencies such as the health department, insurance department, and department of social services.

Recognizing that each purchaser may define VBP activities differently, Table 2 provides a list of some of the most common VBP activities purchasers are currently engaged in and examples of the intermediate and final outcomes that these activities could influence. To ensure that the evaluation process is feasible and that the process produces useful information, be sure that the outcomes you identify for each activity are measurable. For more detail on this issue, refer to Step 4, which specifically addresses the challenges associated with measuring intermediate and final outcomes for the evaluation of VBP activities.

Table 2. Examples of Value-Based Purchasing Activities and Outcomes

Activity: Requiring health plans to contract with hospitals that perform high volumes of coronary artery bypass graft (CABG) surgeries
  Potential intermediate outcomes:
      • Fewer CABG surgeries performed at low-volume hospitals.
  Potential final outcomes:
      • More positive ratings from CABG patients regarding their experiences with care.
      • Fewer complications following bypass surgery.
      • Lower costs for CABG patients.
      • Improved health status for CABG patients.

Activity: Providing employees with comparative health plan report cards based on HEDIS® and CAHPS® data
  Potential intermediate outcomes:
      • Increased awareness of variations in quality.
      • Increased enrollment in highly rated health plans.
  Potential final outcomes:
      • Increased satisfaction with health plan choices.

Activity: Refusing to contract with health maintenance organizations (HMOs) that are not accredited by the National Committee for Quality Assurance (NCQA)
  Potential intermediate outcomes:
      • Increased number of local HMOs applying for and achieving NCQA accreditation.
      • Higher utilization of preventive care and screening services.
  Potential final outcomes:
      • Improved health status of HMO enrollees.

Activity: Requiring hospitals to use computerized order entry systems as suggested by the Institute for Safe Medication Practices
  Potential intermediate outcomes:
      • Decrease in serious prescribing errors.
      • Decrease in adverse drug interactions.
  Potential final outcomes:
      • Fewer complications due to medication errors.

Activity: Requiring hospitals to staff intensive care units (ICUs) with physicians certified in critical care medicine
  Potential intermediate outcomes:
      • Increase in the number of hospitals that have certified critical care physicians staffing their ICUs.
  Potential final outcomes:
      • Lower rate of complications for patients receiving care in the ICU.
      • Lower mortality rate for patients receiving care in the ICU.
      • Lower costs for ICU care.

Activity: Requiring health plans to develop comprehensive diabetes disease management programs that comply with guidelines established by the American Diabetes Association
  Potential intermediate outcomes:
      • Improved scores on the comprehensive diabetes care measures contained in the HEDIS® data set.
  Potential final outcomes:
      • Improved health status for diabetics.
      • Improved satisfaction with care for diabetics.
      • Lower costs for diabetes care.

Second Challenge: Deciding What Matters

Your definition of relevant outcomes for VBP activities will also depend on what matters the most to the purchasing organization. One issue is your time horizon; if you have a long-term perspective for your VBP initiatives, you may be able to focus on objectives that would not be feasible or observable in the short term. Another question is whether you want to adopt a narrow perspective that considers only outcomes that directly affect you as a purchaser, or a broad perspective that also includes indirect outcomes (i.e., the activity's impact on the larger community). If you choose a narrow scope, you may consider certain outcomes to be irrelevant. For example, if you wanted to know the relationship between a VBP activity and the employer's costs, changes in employees' out-of-pocket costs would not be a pertinent outcome. Similarly, a definition of relevant health outcomes from the employer perspective might emphasize lost productivity due to poor health as opposed to a more general measure of employee health status. Under a business definition, employers would only value health outcomes to the extent that poor outcomes hurt the employer in the labor market or through lost productivity.

Expert Advice: Define Outcomes Broadly
Experts in cost-effectiveness evaluations (for example, Gold et al., 1996) recommend conducting these assessments from the societal perspective, which entails the broadest inclusion criteria for measuring costs and outcomes. Even if you prefer a relatively narrow scope, you may want to define outcomes broadly if only because one of the primary objectives of providing health benefits is to attract and retain workers. From that point of view, effects related to employee co-payments or health outcomes such as mortality and morbidity have value beyond their direct effects on firm productivity.


Step 2. Determine the Necessity, Appropriateness, and Feasibility of an Evaluation

The purpose of an evaluation is to provide information that purchasers can use to design and fine-tune their purchasing strategies. For instance, purchasers might use the information gained from evaluations of VBP activities to improve their position in negotiations with contractors and vendors, to account for the level of organizational resources allocated to VBP activities, and to determine whether to expand current levels of activity.

For that reason, the decision to conduct an evaluation should be driven by the likelihood that the findings and lessons learned from the evaluation can and will significantly inform future decisions. Thus, the second step in the evaluation process involves an internal assessment of the value of formally evaluating the various VBP activities identified in the first step. To estimate the "value," you would want to consider both the likely benefit of the information expected from the evaluation as well as the costs of conducting the evaluation. You can then proceed with the evaluation process for those activities that provide sufficient utility given the costs. Although this process appears rather formal, usually purchasers can quickly narrow the list of all VBP activities to a subset of VBP activities for which a formal evaluation would be appropriate.

In addition to this exercise, there are a number of issues that purchasers should try to resolve before going forward with an evaluation. This section discusses some questions that can help you decide whether an evaluation would be both feasible and useful.

How Well Was the VBP Activity Implemented?

Despite the best of plans, VBP activities do not always happen the way you envision them. It is important to assess whether your VBP activity was actually implemented as planned, because the answer influences whether and how the evaluation should be conducted and how the results should be interpreted. For example, suppose a purchaser developed and issued a health plan report card for employees in an effort to steer enrollment to better performing plans, but because of budgetary concerns and production delays, only a handful of employees received or had access to the report card during the open enrollment period.

In this case, the purchaser must first assess how the VBP activity was implemented; presuming that it was implemented appropriately could lead the purchaser to conclude that report cards could not be effective, which may not be true. The lack of observed effectiveness could reflect shortcomings in the implementation of the VBP activity rather than the inefficacy of the activity itself. The purchaser can then consider whether to postpone the evaluation or modify it to work with what did happen. In the illustration presented here, the purchaser might choose to pursue a more limited evaluation focused on those employees who did see the report card.

How Strong a Relationship Do You Want To See?

Before embarking on an evaluation, you will need to decide what kind of relationship between VBP activities and outcomes you want to see. Depending on the activity and how you expect to use the findings, it may be sufficient to simply establish a correlation between a VBP activity and an outcome, without really knowing how strong that correlation is or why it exists. In other cases, you might require evidence of a causal relationship. Generally speaking, greater rigor from a research perspective requires more resources (possibly including outside consultants) and more time. If neither of those is available, a definitive study may not be an option.

Is It Too Soon To See an Effect?

The research designs discussed in this guide typically assume that the effects of the VBP program are realized immediately after the program is initiated. But it can take years for an effect to take place. In addition, if the VBP activity continues over a period of years, the effects may be cumulative. As a result, negative findings may simply reflect an evaluation that occurred too early.

To incorporate lags into the research design, you will need multiple years of data as well as a hypothesis regarding the appropriate lag, although the lag time can sometimes be determined statistically. For researchers, the primary concern when investigating lagged effects is that the longer the lag between the VBP intervention and the hypothesized effects, the greater the chance that a confounding factor or event is responsible for the finding. Moreover, many VBP programs evolve over time. It can be difficult for an evaluation to determine whether an effect detected in year 2 is a lagged effect of the year 1 intervention or a contemporaneous effect of the year 2 program.

A related problem is that effects may wane over time. In some cases, VBP activities have a larger effect initially because the participants start out enthusiastic and the new activity has garnered substantial attention. Over time, individuals and organizations may lapse into less attentive and active pursuit of the program's goals.

Step 3. Choose a Research Design To Assess the Impact of VBP Activities

A research design is a detailed plan for the systematic investigation of a phenomenon. In much evaluation research, the primary purpose is to investigate the impact of some intervention, program, service, or set of activities on one or more dependent or outcome variables that can be observed. A number of different research designs lend themselves to the task of assessing whether a VBP activity had some short-term impact and/or achieved a longer-term outcome of interest. Each represents a somewhat different way of gauging the degree to which the intervention led to a positive or negative change in the variables of interest.

Broadly speaking, research designs can be categorized into two groups: those that use qualitative methods and those that use quantitative methods. These approaches are different, but they can complement each other and are often used in combination. The exact distinction between the two approaches is less important than understanding that both have their own strengths and weaknesses and that each is appropriate under certain conditions. This section of the guide describes common qualitative and quantitative research designs and methods that are useful for evaluating VBP activities and offers some guidance for choosing among them.

Sources of Information on Research Designs
For a more formal discussion of research designs, please refer to the following resources and other readings listed in the bibliography:
  • Babbie E. The Practice of Social Research, 8th ed. Belmont, CA: Wadsworth Publishing Company; 1998.
  • Bailey DM. Research for the Health Professional: A Practical Guide, 2nd ed. Philadelphia, PA: F.A. Davis Company; 1997.
  • Campbell DT, Stanley JC. Experimental and Quasi-Experimental Designs for Research. Dallas, TX: Houghton Mifflin Company; 1963.
  • Fink A. Evaluation Fundamentals: Guiding Health Programs, Research and Policy. Newbury Park, CA: Sage Publications; 1993.
  • Milstein RL, Wetterhall SF, et al. Framework for Program Evaluation. Morbidity and Mortality Weekly Report 1999;48(No. RR-11).
  • Patton MQ. Utilization-Focused Evaluation, 3rd ed. Thousand Oaks, CA: Sage Publications; 1997.
  • Shortell S, Richardson WC. Health Program Evaluation. Saint Louis, MO: Mosby; 1978.
  • Yeaton W, Camberg L. Program Evaluation for Managers. Boston, MA: Management Decision and Research Center. Health Services Research and Development Services, Office of Research and Development, Department of Veterans Affairs; 1997.


Qualitative Research Designs

Qualitative research methods play a valuable role in evaluations by shedding light on uncertain situations, and revealing and clarifying important relationships that quantitative methods often miss or ignore (Sofaer, 1999). For instance, through qualitative research, evaluators can learn whether employees truly understand the material presented in report cards, whether that material is relevant to their information needs, and why health plans may not be using the information in the ways that purchasers expected. These methods can also support the development of testable hypotheses that evaluators can then explore further by collecting and analyzing quantitative data. Finally, qualitative research can help to explain findings from quantitative studies. For example, data from a quantitative analysis might show no improvements in quality a year after an intervention, but a qualitative analysis conducted at the same time or soon afterwards might reveal changes in attitudes, behaviors, or processes that are likely to lead to measurable improvements.

While there are a variety of qualitative methods, three research designs are especially relevant for evaluating VBP activities:

      • Case studies.
      • Focus groups.
      • Interviews.

This section offers a brief overview of each of these three approaches. For additional information, please consult the citations provided in the bibliography.

Case Studies. Case studies involve one or more short but intensive exposures to one or more settings (such as a city) or groups of organizations linked by some common activity or experience. For example, evaluators could conduct site visits to all of the health plans involved in a VBP activity to learn how each organization is responding to the purchaser's initiative and how the activity affects the different departments within each organization (e.g., quality improvement managers, physicians, nurse managers).

Although a case study can focus on only one setting or entity, most studies identify and investigate a sample of cases that are believed to be particularly important on a relevant dimension (e.g., the health plans enroll at least 10 percent of the employer's covered lives) and appear to lend themselves to useful comparisons and insights (e.g., the plans vary in geography or in their care delivery models). The primary tools used to analyze cases include interviews with key informants, structured observations, and the collection and analysis of documents.

The choice of sample and the methods used to conduct case studies play an important role in determining the usefulness of this approach. For more detail on the available options, consult Babbie, 1998; Ragin, 1999; and Sofaer, 1999.

Particularly in the early stages of a VBP activity, case studies can be useful for identifying challenges and assessing the likelihood of success. For example, as noted earlier, the Leapfrog Group is a coalition of purchasers that is trying to reduce medical errors, increase patient safety, and improve the quality of health care by, among other things, encouraging hospitals to use computerized prescription order entry (CPOE) systems. Since Leapfrog members recognize that the adoption of these systems takes time, one approach they are using to assess whether this VBP activity will be successful is to conduct site visits to learn about hospitals' implementation plans for CPOE.

Advantages of this approach. Case studies are useful for developing hypotheses about the relationship between VBP activities and intended outcomes. While they cannot establish causality, they can provide insights valuable for decisionmaking purposes. Through case studies, for example, purchasers could learn that a VBP activity focused on health plans is making little progress because it lacks an educational component that targets physicians. Depending on the design and objectives, case studies can also be conducted quickly and inexpensively to provide an initial status report on the effects of a purchaser's initiatives.

Drawbacks of this approach. Thorough case studies that involve multiple site visits and interviews can be time consuming and costly. Moreover, though all study designs are subject to researcher bias, it is harder to identify and control for such bias in case studies. Finally, because cases are often selected non-randomly, they typically do not represent a larger population. Consequently, the findings may not be generalizable to other cases outside the sample.

Example: Case Studies
Use of Performance Measures for Quality Improvement in Managed Care Organizations
Description of the Research Activity. Researchers conducted case studies to better understand how managed care plans use performance measures for quality improvement and to identify the strengths and weaknesses of standardized performance measures that are currently being used, such as HEDIS® measures and CAHPS® measures. The results are intended to be of interest to purchasers that value health plans that engage in quality improvement activities for the benefit of all plan members.
Evaluators. The evaluation was done by academic researchers from Pennsylvania State University, the RAND Corporation, and AHRQ.
Research Design. The evaluation involved case studies of a non-random sample of 24 managed care plans in four States: Pennsylvania, Maryland, Kansas, and Washington.
Methods. After developing and pilot testing a set of interview protocols tailored to each type of respondent (pilot tests were conducted with four plans in New Jersey), evaluators developed a single interview instrument that could be administered to all respondents and then used this instrument to conduct exploratory qualitative research. The questions covered a variety of topics related to organizational and operational characteristics that affect the clinical and service quality improvement activities of the health plan.
Two study authors conducted separate 1-hour tape-recorded telephone interviews with multiple respondents from each health plan. They interviewed 42 respondents for an overall response rate of 58.3 percent, with a mean of 1.8 respondents per plan. Respondents included chief executive officers, medical directors, and quality improvement directors. One interviewer drafted notes from the tape-recorded interviews and gave these notes to the other interviewer to review for accuracy. The interviewers then used the final version of the notes to create a detailed spreadsheet entry for each interview. The spreadsheet facilitated frequency counts and calculations for quantifiable data and aided in sorting and grouping interviews for qualitative analysis. To develop the reported findings, the authors of the study held several discussions to achieve a consensus.
Results. The evaluators found that all of the participating managed care organizations used performance measures for quality improvement, but the degree and sophistication of use varied. Many of the respondent plans used performance measures to target quality improvement initiatives, evaluate current performance, establish goals for quality improvement, identify the root cause of problems, and monitor performance. The results suggest that performance measurement is useful for improving quality in addition to informing external constituents.
However, additional research is needed to understand how to maximize the benefit of measurement, and to quantify the degree of variation in quality improvement activities and the organizational and operational characteristics associated with successful quality improvement programs.
Advantages and Disadvantages of the Evaluation Strategy. The primary advantage of the exploratory case study design was the ability to obtain in-depth information about quality improvement strategies and programs for a sample of managed care organizations. Since no database with such information exists, and since a significant amount of detail was required, individual phone interviews with multiple members of the same organization proved to be very valuable. The information obtained through the interviews and secondary data analysis led to the formulation of hypotheses that can be more formally tested.
The major disadvantage of this approach was the inability to generalize the results to a larger population of managed care organizations. The case study design could not adequately control for important organizational and market characteristics that might have a differential impact on organizations.
Source: Scanlon DP, Darby C, Rolph E et al., The Role of Performance Measures for Improving Quality in Managed Care Organizations. Health Services Research 2001;36(3):619-41. 


Focus Groups. Focus groups bring together individuals who meet a specific set of criteria (e.g., health plan enrollees who have diabetes, hourly employees enrolled in health maintenance organizations [HMOs]) to discuss a set of topics or questions as a group. This research design aims to shed light on an uncertain situation by eliciting feedback from one or more groups and gleaning insights from the various perspectives that participants share. It is a useful technique for getting general reactions to VBP activities and a sense of how well the programs meet the needs of their target. For example, to evaluate a VBP program to reduce health risks associated with diabetes, you could convene a focus group of patients to learn how much they know about the ongoing program and what they think of it. Similarly, you could convene a group of physicians to solicit their opinions and ideas about the program. Focus groups can also provide insights into findings from quantitative studies; for example, evaluators could use focus groups to help elaborate on and interpret survey findings.

An evaluation using this research design generally includes two or more focus group sessions. (One session is never sufficient because there is no way to know whether the results have been biased in some way—for example, if one person dominated the group or if the participants were somehow different than their counterparts in other markets.) During each of these sessions, an experienced moderator (sometimes called a facilitator) leads the group of approximately seven to ten people through a series of discussion topics. This person is responsible for encouraging everyone to share their own views, making respondents feel relaxed and comfortable with the process, drawing out any pertinent concerns or issues, and recording responses in a nonjudgmental way. To guide the conversation, the moderator uses an open-ended interview guide, or protocol, that is basically an outline of topics or questions that need to be covered.

Sessions typically take 1 to 2 hours. Participants are usually compensated for their time in some way, such as a small cash payment or gift certificate. After the session, the moderator analyzes the discussion and produces a report that captures any recurring themes, concerns, or feelings of the participants.

Advantages of This Approach. Focus groups are a relatively easy way to learn about a particular issue from key stakeholders. In particular, they can be very effective at creating a relaxed atmosphere that encourages people to share opinions, interact with others, and express views they might suppress in more formal one-on-one interviews. As a result, the method facilitates a sharing of information, ideas, opinions, and experiences that can result in unexpected insights.

Another benefit is that focus groups tend to be less expensive than the one-on-one interviews that would be required to get a comparable level of feedback. The turnaround time for results is also likely to be faster.

Drawbacks of This Approach. Because the sample size for a focus group is so small and the sample has not been randomized, you cannot use the findings to make inferences about the larger population. Consequently, while you may gain insights into various opinions, you cannot determine how widespread an opinion is or how deeply held it may be. Another issue is that focus groups can only capture the thoughts, feelings, and opinions of people who are able and willing to verbalize their views. As a result, this technique does not capture the perspectives of others whose contribution could be very valuable, such as people with speech or hearing problems, those who are very young or very old, those who are shy about speaking openly in public, or those who could not participate for other reasons (e.g., because they are ill, cannot afford to take the time off from work, or do not have child care).

Depending on how many sessions you conduct, focus groups can become expensive, especially if you use a professional firm to recruit or conduct the sessions. (Travel costs may also pose an obstacle if the potential participants are scattered around the country.) For busy stakeholders, scheduling can also be a significant barrier to implementing a focus group; one-on-one interviews may offer greater flexibility.

Example: Focus Groups
Consumers' Responses to Reports on Health Plan Performance
Purchasers. The California Public Employees' Retirement System (CalPERS), the Missouri Consolidated Health Care Plan, and General Motors.
Description of the Research Activity. The "Report on Report Cards" was a 2-year study that aimed at documenting and assessing two VBP programs of five prominent public and private purchasers: their reports for employees on the performance of contracted health plans and their use of financial incentives to promote quality. (In addition to the three purchasers listed above, the study evaluated VBP activities of The Alliance in Denver, Colorado, and the Cleveland Health Quality Choice program in Ohio.) As part of this study, researchers conducted a series of focus group sessions with employees to learn about their experiences with and attitudes towards health plan report cards. These sessions covered several topics, including the information sources that consumers used to choose a health plan, the factors they considered in their decision, whether and how they used the report cards, their reactions to the report cards, and their suggestions for improvement.
Evaluators. The study was conducted by the Washington, DC-based Economic and Social Research Institute (ESRI) and its contractors, with funding from the Robert Wood Johnson Foundation. An experienced moderator with Lake Snell Perry & Associates, also based in Washington, DC, produced the focus group findings.
Research Design. The evaluators conducted focus groups with employees receiving insurance coverage through three of the five purchasers studied: the California Public Employees' Retirement System, the Missouri Consolidated Health Care Plan, and General Motors.
Methods. An experienced moderator with a strong background in health care issues conducted eight focus groups between October 27 and November 10, 1998: three for CalPERS in Sacramento, Fresno, and San Diego; two for the public purchaser in Missouri in St. Louis and Jefferson City; and three for General Motors (two in Dayton, OH, and one in Detroit, MI). In preparation for these sessions, the moderator consulted with the ESRI researchers to establish the goals of the focus group design and to determine what issues to cover. The moderator then drafted a protocol that was reviewed and approved by ESRI. Employees were invited to participate; in only one site, CalPERS, were the participants randomly recruited by a professional recruiting facility. The other two purchasers identified the participants themselves.
Each session lasted 1 to 2 hours and included 8 to 10 participants; all sessions were recorded and transcribed. After each set of sessions, the moderator produced a report summarizing the findings and worked with ESRI to identify cross-cutting themes and recurrent suggestions.
Results. By introducing the consumers' perspective, the focus groups made a valuable contribution to this evaluation of VBP activities. In particular, the focus group findings offered insights into the barriers that report cards face in being embraced by each group of employees and what needs to be done to make the reports more widely accepted and used. While the feedback from each session reflected a unique set of experiences and perspectives, as well as the specific content and style of each purchaser's performance report, the analysis of the focus groups also uncovered a number of similarities across the sessions. These cross-cutting themes included skepticism about performance information, uncertainty about how to factor quality information into decisions, appreciation for the convenience that the reports can offer, problems understanding the presentation of data, and concern about information overload.
Advantages and Disadvantages of the Evaluation Strategy. The focus groups were an important complement to this study's extensive interviews with purchasers, health plans, and consumer representatives (e.g., union leaders) at each site. In addition to offering a fresh perspective, the findings allowed the evaluators to see the ways in which employees' perceptions and reactions to the report cards were consistent with or different from the impressions of the stakeholders interviewed for the study.
The downside of the focus groups was the inability to generalize the findings to all employees, let alone all consumers. There was no way to know whether the participants were more or less familiar with or inclined towards the report cards than their colleagues would be. A second problem was that, because of the costs associated with the focus group sessions, the study was not able to include comparable research in all five sites.
Source: Meyer JA, Wicks EK, Rybowski LS et al., Report on Report Cards: Initiatives of Health Coalitions and State Government Employers to Report on Health Plan Performance and Use Financial Incentives. Vol. II. Washington, DC: Economic and Social Research Institute; 1999. (See also Meyer et al., 1998.)


Interviews. Interviews are a means of collecting various kinds of information, including facts, impressions, opinions, and concerns. Through interviews, evaluators can gain insights into the intent of a VBP activity, learn how the activity is really being implemented, probe the perceptions and responses of key stakeholders such as health plans or providers, and identify issues and barriers not apparent on the surface. Whether interviews are conducted on their own or as part of a case study, their findings can play a critical role in identifying questions and hypotheses worthy of further research and appraising the potential value of a quantitative research design.

Depending on the goals and design of the evaluation, individual stakeholders may take part in one or more interviews over a period of time. These interviews are typically conducted by one or two people, either in person or over the telephone. When two researchers are present, one usually takes responsibility for asking the questions while the other takes notes. Some researchers prefer to rely on recordings and/or transcriptions. Prior to the interview, the researchers develop a protocol, or interview guide, consisting of open-ended questions that are designed to keep the conversation focused on the topic at hand. Depending on the purpose of the interview and the nature of the respondents, the interviewers may follow the protocol very closely or just use it as a prompt when needed.

Another option: cognitive interviews. A cognitive interview is a variation on a standard, one-on-one interview that aims to elicit opinions and other information. What makes a cognitive interview different is that it is designed to find out whether and how the respondent understands and thinks about a given set of materials, tasks, or activities. Cognitive interviewers may observe how a respondent navigates through a set of materials, or may use techniques such as "think-aloud" exercises, in which the respondent openly expresses thoughts and questions while reviewing the materials. Traditionally, these techniques have been used by survey developers to assess whether respondents will understand survey questions in the ways in which they are intended. More recently, evaluators have been using cognitive interviews to learn how people perceive, interpret, and use information on health care quality.

Advantages of This Approach. As an evaluation method, interviews offer the benefit of flexibility; they can be formal or informal, detailed or superficial, long or short. They also offer a rich source of individual data that can be mined for useful insights, common themes, and potential trends that may invite further investigation. Because of their intimate, one-on-one structure, interviews can also elicit more honest feedback and assessments than may be available through a focus group, where participants may feel reluctant to disagree with others or offer their own opinions and ideas.

Drawbacks of This Approach. Depending on the number of interviews, the difficulty of recruiting respondents, and the design of the study (e.g., if the interviews need to be done in person), interviews can be both time-consuming and costly. If the sample of respondents is not representative, the evaluators are also limited in their ability to generalize their findings. For example, if the researchers are able to interview all but one of the medical directors of the health plans involved in a regional VBP activity, they can be fairly confident that their findings reflect the "population" of health plans. But if the interviewers also spoke to a handful of the doctors practicing in the area, they would not be able to draw any broad conclusions about the "population" of physician practices.


Example: Cognitive Interviews
Medicare Beneficiaries' Use of Comparative Information When Choosing Health Plans
Purchaser. Centers for Medicare & Medicaid Services (CMS), formerly the Health Care Financing Administration (HCFA).
Description of the Research Activity. The Medicare program provides beneficiaries with information about health plan costs, benefits, and performance (expressed in HEDIS® and CAHPS® measures) via the Medicare Web site and a toll-free telephone number that beneficiaries can call to request the information. Researchers attempted to understand how Medicare beneficiaries used the comparative information when evaluating their health plan choices and what they thought of the information (e.g., the amount of information, how easy it was to understand, its usefulness, the presentation, whether beneficiaries trust it, what they like, and what improvements they would like to see). The results offer insights into how Medicare beneficiaries incorporate various pieces of information into their decisionmaking process and suggest ways to improve the existing information.
Evaluators. The evaluation was conducted by academic researchers from Research Triangle Institute and Pennsylvania State University.
Research Design. The evaluators conducted cognitive interviews with 25 Medicare beneficiaries from three counties in Pennsylvania.
Methods. The researchers first assembled booklets that used the same format and presented the same information as the Medicare Compare booklets available through the 1-800-Medicare number. They also developed and pilot-tested an interview protocol. Using the booklets and protocol, the researchers then conducted exploratory qualitative research using a convenience sample of Medicare beneficiaries.
The study participants were presented with a booklet that included descriptive information about the options available through Medicare, as well as comparative cost, benefits, and quality information from the Medicare Compare database for plans available in their county. The interviewer asked the participants to imagine that they were choosing a health plan for themselves using the information provided, and to "think-aloud" while comparing the plans and making their ultimate choice. The interviewer observed, used scripted probes when applicable, took handwritten notes, and audiotaped the interviews. The interviews lasted 1.5 to 2 hours each.
To identify themes and make the data systematically comparable across interviews, the evaluators subjected the qualitative data collected during the interviews to content analysis. They used the transcripts to create a spreadsheet that included data from all 25 interviews, then used the spreadsheet to group similar/dissimilar responses and obtain frequency counts across interviews.
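The coding-and-counting step of a content analysis like this one can be sketched in a few lines. In this illustration, assume each transcript has already been reduced to a set of theme codes; the theme names and data are hypothetical, not findings from the Medicare study:

```python
from collections import Counter

# Hypothetical theme codes assigned to each transcript during coding
# (three interviews shown for brevity; the study had 25).
coded_interviews = [
    {"costs_first", "trusts_info", "confused_by_ratings"},
    {"costs_first", "wants_medigap_comparison"},
    {"trusts_info", "costs_first"},
]

# Frequency count across interviews: in how many interviews
# did each theme appear at least once?
theme_counts = Counter()
for themes in coded_interviews:
    theme_counts.update(themes)

for theme, n in theme_counts.most_common():
    print(f"{theme}: {n} of {len(coded_interviews)} interviews")
```

Grouping similar responses under shared codes before counting is what makes the qualitative data "systematically comparable" across interviews.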
Results. All subjects read through the information booklets sequentially. Most compared the plans on the specific costs and benefits that mattered to them personally, but used the performance measures to confirm or supplement their choices. Participants spent the most time on the costs and benefits section and the least on the specific quality ratings, but rated quality just as important as costs for picking a particular plan. Overall, the majority felt confident in their ability to pick the best plan. Most were generally satisfied with the amount of information provided, said it was useful for comparing plans, and trusted it. The subjects generally liked the information, but many felt the costs and benefits could be presented better and would like comparisons of Medi-Gap plans to be added. Finally, respondents had some common areas of confusion and misunderstandings regarding the information presented to them.
Advantages and Disadvantages of the Evaluation Strategy. The primary advantage of the study design is the ability to obtain in-depth information about how beneficiaries incorporate different types of information into their decision-making process, how they navigate the materials, how they understand and interpret what they see, and what they think of how the data was presented. This feedback also suggested ways to improve existing information to make it more useful to beneficiaries. Two disadvantages of the design are that the results are not generalizable to the entire Medicare population and the study participants were not making binding health plan choices.
Source: Uhrig JD. Beneficiaries' Use of Quality Reports for Choosing Medicare Health Plans. [Ph.D. Dissertation]. Pennsylvania State University; 2001.


Quantitative Research Designs

The purpose of quantitative approaches is to establish numerical evidence regarding correlation or causality between VBP activities and their intended outcomes. While there are a number of ways in which this goal may be accomplished, quantitative research designs basically vary in two ways:

    1. Timing of observations.
    2. Use of a comparison group.

One key difference is in the timing of observations relative to the intervention (i.e., the implementation of a VBP activity) and to other observations. For example, some quantitative designs call for the collection of baseline data before the intervention as well as data after the intervention. Also, some designs collect data once, while others specify that data be collected in multiple periods, both before and after the intervention.

The designs also differ in their use of a comparison (or control) group, which is a group that is similar to the intervention group but is not affected by the VBP activity. From a research perspective, a related issue is whether the individuals or organizations affected by an intervention were randomly selected; however, randomization is often not feasible for VBP activities.

These two variables determine whether the design allows for comparisons, which enable the evaluator to isolate or disentangle the impact of an intervention from other events or phenomena that could affect the outcomes of a VBP activity. VBP activities take place in a complicated environment with ongoing changes in purchaser and provider behavior, most of which have nothing to do with any VBP activities. Having a comparison point before the intervention or a comparison group adds validity or strength to a research design's ability to capture the causal effects of VBP activities.

This section describes the five quantitative research designs that are likely to be the most helpful for evaluating VBP activities:

Cross-Sectional Design With No Comparison Group. The cross-sectional design without a comparison group basically involves measuring variables of interest in the intervention group (or the population affected by VBP activities) one or more times after the VBP intervention occurs. For example, to measure the impact of a disease management program, evaluators could collect and analyze detailed measures of health status or health care utilization. There is no pre-test or comparison point from before the intervention, nor is there any comparison group not receiving the intervention. Thus, the outcomes must be interpreted relative to an internally defined set of standards or external benchmarks. (However, if external benchmarks are used, you may interpret them as a comparison group, depending on how they were derived.)

Figure 2. Cross-Sectional Design With No Comparison Group

To analyze data collected for this design, evaluators typically use simple descriptive statistics and statistical tests of the difference in means and frequencies. If there are multiple observation points after the intervention, researchers can also use statistical tests to compare the indicators or measures with each other over time. Finally, multivariate statistical techniques allow researchers to see if the outcomes of interest vary across subgroups (e.g., by the gender or age of a patient, or by clinic site within a large health care system).
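A difference-in-means test of the kind described above can be sketched in plain Python. This is a minimal illustration of Welch's t statistic for two post-intervention observation points; the outcome values are hypothetical:

```python
import statistics
from math import sqrt

def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference in means of two
    independent samples (unequal variances allowed)."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    se = sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / se

# Hypothetical outcome measures at two post-intervention points
period_1 = [8.1, 7.9, 8.4, 8.0, 8.2, 7.8]
period_2 = [7.5, 7.7, 7.4, 7.8, 7.6, 7.3]

t = welch_t(period_1, period_2)
```

The statistic would then be compared against the t distribution (with the appropriate degrees of freedom) to judge whether the change between the two periods is statistically significant; in practice a statistical package would supply the p-value.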

Advantages of This Approach. Since the evaluators are gathering only post-intervention observations for the intervention group, the data requirements are lower than those required for other design approaches. In addition, this design lends itself to an in-depth analysis of the intervention group and may provide important and useful insights regarding the functioning of the program. The basic cross-sectional design is especially useful in early stages of implementation, as it can provide valuable information regarding how a VBP program was implemented in practice, whether the activity appears to be correlated with the intended outcomes, and how the program might be improved before a more rigorous evaluation is undertaken.

Drawbacks of This Approach. Because there is no pre-test or comparison group, this design cannot be used to make statements about the impact of VBP activities relative to what was occurring before they were implemented. Also, since this design cannot disentangle the effects of the VBP activity from any of the other many forces that might influence the outcome variables of interest (such as a time trend that would have occurred anyway), you cannot use this approach to establish causal relationships.

Example: Cross-Sectional Design With No Comparison Group
An Evaluation of a Defined-Contribution Model
Purchaser. The Buyers Health Care Action Group (BHCAG), a health benefit purchasing alliance in the Minneapolis-St. Paul area.
Description of the Research Activity. BHCAG is a coalition of employers that created an innovative direct-contracting model designed to provide local service delivery organizations (known as care systems) with the incentive to compete on the basis of premium cost and quality. The purchasing model incorporates risk-adjusted payments, standardized benefits, and the dissemination of a report card containing satisfaction and quality information. Based on competitive bids, the care systems are placed into one of three cost tiers, with higher premiums required for care systems in the middle- and high-cost tiers. One design feature of this model is the use of "level dollar" (also known as fixed or defined) premium contributions by employers. This policy exposes employees to the marginal difference in premiums. In theory, when employees pay more of the marginal cost of insurance, their choices should be more efficient; the expectation is that individuals who do not value insurance at its full marginal cost will choose cheaper alternatives. The purpose of the evaluation was to determine whether a defined contribution model increases employees' sensitivity to premiums.
Evaluators. Academic researchers from Cornell University and the University of Minnesota evaluated the model by determining whether employees responded to premium and quality differences across the care systems.
Research Design. The evaluators conducted a cross-sectional survey of employees enrolled in BHCAG's program.
Methods. To collect information on enrollment, premiums, and provider group characteristics, as well as demographic data and measures of socioeconomic status and health status for employees, the evaluators fielded a post-intervention telephone survey and reviewed administrative files. They then constructed regression models to predict the probability that single (nonmarried, no dependents) employees enrolled in one of the care systems as a function of the out-of-pocket premium, characteristics of the care system, employee characteristics, and report card ratings. Conditional logistic regression methods incorporating characteristics of provider groups and employees were used to estimate care system choice models.
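The core of a conditional logit choice model is easy to sketch: each care system's attributes are weighted by coefficients to form a "utility," and choice probabilities follow from the ratio of exponentiated utilities. The attributes and coefficients below are hypothetical illustrations, not estimates from the BHCAG study:

```python
from math import exp

def choice_probabilities(alternatives, beta):
    """Conditional logit: P(choose j) = exp(x_j . beta) / sum_k exp(x_k . beta).
    Each alternative is a dict of attribute -> value."""
    utilities = [sum(beta[attr] * alt[attr] for attr in beta)
                 for alt in alternatives]
    denom = sum(exp(u) for u in utilities)
    return [exp(u) / denom for u in utilities]

# Hypothetical care systems: monthly out-of-pocket premium (dollars)
# and report-card quality rating (1-5 scale).
care_systems = [
    {"premium": 20, "quality": 3.5},
    {"premium": 45, "quality": 4.5},
    {"premium": 70, "quality": 4.0},
]

# Hypothetical coefficients: negative on premium (price sensitivity),
# positive on quality.
beta = {"premium": -0.03, "quality": 0.8}

probs = choice_probabilities(care_systems, beta)
```

Estimation works in the reverse direction: the coefficients are fitted so that the predicted probabilities best match the enrollment choices actually observed, which is how the evaluators could quantify employees' sensitivity to premiums and quality.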
Results. The empirical results indicate that single employees are, on average, very responsive to premiums. The sensitivity to premiums is reduced for older employees and for those who value high-quality care. However, employees with more experience with the health care market are more price-responsive. Employees are also sensitive to differences in the quality of care systems, as presented in the report card, and to differences in convenience measures, such as the distance to clinics. Based on these findings, it appears that a defined-contribution policy may make employees more cost-conscious in their health care decisions.
Advantages and Disadvantages of the Evaluation Strategy. This evaluation was able to control for factors such as age and gender that may have biased the effect of premiums. Nonetheless, the effects of other important factors, such as the restriction that primary care physicians can be affiliated with only one care system or variations in employers' fixed-dollar contribution policies, cannot be separately identified in the data. The lack of pre-intervention data and a comparison group also limits the inferences that can be made.
Source: Schultz J, Thiede Call K, Feldman R et al., Do Employees Use Report Cards to Assess Health Care Provider Systems? Health Services Research 2001;36(3):509-30.



Pretest/Posttest (or Before/After). This approach allows evaluators to compare individual or organizational data collected after an intervention to data collected prior to the intervention. The assumption is that the analytic units (i.e., individuals or organizations) under observation would look the same at both points in time in the absence of any intervention. In general, as long as the data were collected and aggregated in the same way at the two points in time, this design can be implemented.

Figure 3. Pretest/Posttest (or Before/After)

Evaluators typically use a multivariate analytical approach to determine whether any differences in the health care outcomes measured before and after the intervention are statistically significant. However, when data are not available at the individual level, it becomes more difficult to conduct multivariate statistical analysis because the number of observations is insufficient. For example, if you wanted to know whether and how a VBP activity affected the health plan premiums of the few plans you offer to employees, an analysis of premiums before and after the intervention would have limited statistical reliability because the data came from only those plans. In some cases, statistical tests can be conducted with aggregate data if appropriate denominators exist. For example, if you had data on the population at risk in the time periods before and after an intervention, you could test whether a VBP activity was associated with a statistically significant change in admissions per thousand.
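The admissions-per-thousand example above amounts to a two-proportion z test on aggregate data. A minimal sketch, with all counts hypothetical:

```python
from math import sqrt

def two_proportion_z(events_a, n_a, events_b, n_b):
    """z statistic for the difference between two rates, given event
    counts and the populations at risk (the denominators)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    p_pool = (events_a + events_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical admission counts out of covered members,
# before vs. after the VBP activity.
z = two_proportion_z(events_a=820, n_a=10_000,   # 82 per thousand before
                     events_b=730, n_b=10_000)   # 73 per thousand after
```

A |z| above roughly 1.96 would indicate a change in the admission rate that is statistically significant at the 5-percent level, which is why having the population at risk as a denominator matters: without it, no such test is possible.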

Advantages of This Approach. The strength of this approach is that researchers can use pre-intervention information to help interpret the information collected after the purchaser has implemented a VBP activity. As with the other quantitative designs, more than one observation can be made post-intervention. By collecting multiple observations after the intervention, researchers are better equipped to investigate such questions as whether intervention effects continue over time and whether or not there is a lag between when the intervention is implemented and when effects become observable.

Drawbacks of This Approach. This design suffers from several weaknesses, as there are many "rival hypotheses" or alternative explanations for changes from the pre-test to the post-test outcome observations. First, the design does not control for time trends, so it is possible that any observed change would have taken place even in the absence of the intervention because of a trend already underway. For example, if HMO premiums were falling for reasons unrelated to the VBP activity, this design would tend to attribute those falling premiums to the VBP activity when in fact they would have occurred anyhow. Another weakness of this design is that it cannot tell whether another intervention or external event that was occurring simultaneously with the VBP intervention—rather than the VBP activity itself—could be responsible for the apparent change between the pre-test and post-test observations.

A third weakness of this design applies when the people or organizations involved in the evaluation know they are being observed or measured in some way. In some cases, the act of being observed or studied before the intervention leads to a change in the subsequent observation, even without any effect from the intervention. For example, simply completing a survey regarding attitudes and satisfaction before the intervention might lead to improved attitudes and satisfaction (i.e., the act of measurement becomes a sort of intervention itself). This effect (referred to as the Hawthorne effect or a testing effect) can lead to erroneous conclusions about the apparent impact of an intervention.

Caveat: Watching for the Hawthorne Effect
In quantitative as well as qualitative research, the Hawthorne effect refers to the possibility that the process of evaluation may affect the results (Babbie, 1998). In other words, if individuals and organizations know that they are being monitored, their performance may improve regardless of whether the program or intervention is actually effective. This effect tends to lead to estimates that are more favorable to the intervention than would otherwise be expected.
The Hawthorne effect should be anticipated in evaluations of VBP efforts, although it is not necessarily undesirable if the ultimate goal is improvement. For example, if health plans know that the purchaser is analyzing certain HEDIS® data to evaluate a VBP initiative, plans may be more prone to improve on these measures than if they did not know what measures the purchaser would be examining. When analyzing the data, evaluators may have to try to distinguish whether plans improved simply because they were being watched or because of the direct effect of the VBP activities.


Example: Pre-Test/Post-Test
An Evaluation of a Type II Diabetes Disease Management Program in an HMO Setting
Health Care Provider. Geisinger Health System, a large mixed-model HMO in Pennsylvania.
Description of Research Activity. In 1996, a steering group of primary care physicians and endocrinologists, clinical nurse specialists, dieticians certified in diabetes, and HMO representatives initiated a diabetes disease management program aimed at better outcomes for patients with diabetes mellitus. This program consisted of several components, including self-management education, coverage for glucometers, extensive database linkage and management, and strong leadership commitment. As part of the program, the group formulated practice guidelines based on widely published professional literature and trained physicians in adopting and following the guidelines. Highly trained and specialized education nurses taught self-management to patients. Program participation was voluntary, but the HMO offered coverage for glucose meters and strips as an incentive for enrollees. The purpose of the evaluation was to assess the impact of the diabetes management program on relevant outcomes.
Evaluators. Initially, the HMO analyzed the data internally, but the process was not as formal as the plan leadership desired. Subsequently, the plan consulted with Pennsylvania State University researchers to carry out an outcomes evaluation of the program.
Research Design. Both the internal analysts and the consultants used a pre-test/post-test design to track changes in clinical, cost, and health status measures over time.
Data and Measures. The group collected an extensive set of clinical, administrative, and patient quality-of-life data at inception and at periodic followup points for enrolled patients and later linked the data to allow for statistical analysis. The internal analysis compared mean HbA1c levels and diabetes expenditures before and after the intervention. The steering committee received these reports on a monthly basis.
The more formal analysis examined diabetes outcome-related variables such as HbA1c levels and cardiovascular clinical variables such as HDL and LDL, self-reported health status (based on responses to the SF-36® survey), and degree of compliance with the guidelines as evaluated by the specialized nurse's review of the patients' medical charts.
Methods. For continuous variables, the evaluators used t-tests to detect differences in the means of the outcomes variables of interest (e.g., HbA1c levels) from pre-intervention to post-intervention. They used chi-square tests for categorical variables, such as the occurrence of episodes of hypoglycemia.
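The chi-square test for a categorical pre/post comparison can be sketched in a few lines. This is the standard statistic for a 2x2 table (1 degree of freedom, no continuity correction); the counts below are hypothetical illustrations, not Geisinger data:

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 table laid out as:
           event   no event
    pre      a        b
    post     c        d
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Hypothetical counts of patients with at least one hypoglycemia episode:
#                 episode   no episode
# pre-program         30         170
# post-program        18         182
chi2 = chi_square_2x2(30, 170, 18, 182)
```

The resulting statistic is compared against the chi-square critical value of 3.84 (1 df, 5-percent level) to decide whether the change in episode frequency is statistically significant.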
Results. The most salient finding so far is the significant improvement in the clinical indicators of the program (such as HbA1c, HDL, and LDL). This does not appear to translate into patient-perceived improvements in health status, physically or emotionally, although there have been some improvements on the mental health and vitality scores, as illustrated in the data below.
SF-36 Domain                  Time 1 Mean (SD)   Time 2 Mean (SD)
Physical Functioning          67.95 (27.2)       68.97 (28.55)
Role-Physical                 59.71 (41.49)      60.81 (42.62)
Body Pain Index               65.27 (26.78)      63.78 (26.44)
General Health Perceptions    60.19 (19.95)      59.68 (21.04)
Vitality                      53.34 (20.24)      56.11 (19.97)¹
Social Functioning            81.69 (22.34)      82.23 (23.37)
Role-Emotional                72.86 (38.92)      74.63 (37.83)
Mental Health Index           69.95 (18.34)      72.99 (16.73)¹

¹ Improvement noted in the text (vitality and mental health).
Advantages and Disadvantages of the Evaluation Strategy. A comparison of clinical indicators over time was a clear and easy way to assess the results of the intervention. However, because the project design lacked a comparison group, it is not certain that the results are attributable to the intervention (i.e., the disease management program). Although a comparison group would be desirable, ideally a randomly assigned one, this would require that some patients not be eligible for the diabetes management program; for a number of reasons, the health plan would not consider this option.
Source: Geisinger Health System.


Cross-Sectional Design With Comparison Group (or Static Group Comparison). This design is similar to the cross-sectional design discussed earlier in that observations are made only after the intervention has been implemented. However, in this variation, a comparison group is introduced. That is, evaluators identify and observe a comparison group that is similar to the group or population under study, but has not received the VBP intervention. The assumption is that what is observed for the comparison group is what would have been observed in the intervention group in the absence of the intervention. In this sense, the comparison group provides a measure of what was "expected" in the absence of an intervention, which can be compared with what was actually observed for the intervention group.


Figure 4. Cross-Sectional Design With Comparison Group (or Static Group Comparison)

To implement this design, researchers gather observations at the same point in time for the treatment and comparison group, using the same measurement approaches and variable definitions. They can make one observation after the intervention, or multiple observations over time. As with the pre-test/post-test design, evaluators would use multivariate analysis to test for statistically significant differences in outcomes between the intervention group and the comparison group. If data are not available at the individual level (e.g., data only exist at the hospital level), there may be another level of observation that will permit multivariate analysis. For example, in a situation where an intervention occurred at 100 hospitals and the comparison group was composed of 100 hospitals that did not receive the intervention, you could conduct the analysis using the hospital rather than the individual as the unit of observation as long as you can control for hospital characteristics and hospital-level measures of casemix.
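A sketch of the hospital-level analysis described above, using simulated data: an ordinary least squares regression with an intervention indicator and a casemix control. The variable names, effect sizes, and data here are hypothetical, chosen only to illustrate the approach.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # 100 intervention hospitals + 100 comparison hospitals

treated = np.repeat([1.0, 0.0], 100)   # intervention indicator
casemix = rng.normal(1.0, 0.2, n)      # hypothetical severity index
# Simulated mortality rate: higher with sicker casemix, lower with treatment
mortality = 5.0 + 3.0 * casemix - 0.8 * treated + rng.normal(0, 0.5, n)

# Design matrix: intercept, treatment indicator, casemix control
X = np.column_stack([np.ones(n), treated, casemix])
coef, *_ = np.linalg.lstsq(X, mortality, rcond=None)
print(f"estimated intervention effect: {coef[1]:.2f}")  # near the true -0.8
```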

When this is not possible, statistical testing may still be feasible if the evaluators know the appropriate sample so that they can construct standard errors around the estimated means. For example, suppose that the outcome of interest is mortality per 1,000 admissions and only aggregate data are available (e.g., mortality rates for all hospitals in the intervention and comparison groups). The evaluators can test whether mortality per 1,000 admissions in the intervention group differs from that in the comparison group (assuming a similar casemix) if they have access to data on the number of admissions or deaths for hospitals in the intervention and comparison groups.
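This kind of aggregate comparison can be sketched as a standard two-proportion z-test. The admission and death counts below are hypothetical.

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical aggregate counts: deaths and admissions in each group
deaths_t, admits_t = 180, 25_000   # intervention hospitals
deaths_c, admits_c = 240, 26_000   # comparison hospitals

p_t = deaths_t / admits_t
p_c = deaths_c / admits_c
p_pool = (deaths_t + deaths_c) / (admits_t + admits_c)

# Standard error under the null hypothesis of equal mortality rates
se = sqrt(p_pool * (1 - p_pool) * (1 / admits_t + 1 / admits_c))
z = (p_t - p_c) / se
p_value = 2 * norm.sf(abs(z))  # two-sided test

print(f"mortality per 1,000 admissions: {1000*p_t:.1f} vs {1000*p_c:.1f}")
print(f"z = {z:.2f}, p = {p_value:.4f}")
```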

In still other cases, the individual data may not be available and may not even be meaningful to consider; so statistical tests are not possible. For example, suppose that you want to know the impact of a VBP activity on premiums; the intervention group consists of one employer offering one plan and the comparison group consists of another employer offering another plan. You can observe if the price of the plan offered to the VBP employer is lower than the price in the comparison group, but you cannot statistically test this proposition. A statistical analysis can only be done using the plan as the unit of observation if there are a sufficient number of plans (and perhaps employers) in the treatment and comparison group so that the analysis can use aggregated data at the level of the plans, or even the employers.


Selecting a Comparison Group
Evaluators can use any of several methods to select a comparison, or control, group. In a case control approach, the comparison group is chosen to match the intervention group on specific characteristics thought to be important. Another approach is to pick a population that is thought to be similar to the intervention group and for which data are available for comparison purposes. A third approach is to use national or regionally available statistics as standards for comparison (e.g., NCQA's Quality Compass database of HEDIS® measures).
Sometimes groups within a population are randomly assigned either to receive the intervention or to be in a comparison group. Although random assignment to groups provides the highest level of control and strength regarding the ability to establish causal relationships, it is difficult to use this approach with VBP activities since the randomization would likely have to be at the level of a clinic, hospital, or subset of employees. For both business and political reasons, it is rarely feasible to treat these organizations or individuals differently.


Advantages of This Approach. In contrast to designs that do not use a comparison group, this design allows the evaluator to draw stronger inferences regarding the impact of the VBP activity. In addition, since this design does not involve observations made before the intervention is implemented, only post-intervention data are required. As a result, this design can be used in cases where a purchaser did not consider conducting an evaluation until after the intervention had already been implemented.

Drawbacks of This Approach. The weaknesses in this design relate to the extent to which the comparison group can be assumed to be just like the intervention group except for exposure to the intervention. Any observed difference between the treatment and comparison group could represent an intervention effect; yet observed differences are also cause for concern, as they might reflect group rather than intervention effects. In some cases, these concerns may be addressed statistically. For example, if the intervention group and comparison group each consists of many hospitals, the analysis might control for hospital characteristics (although this requires a sufficient number of hospitals).

Casemix adjustment is a prime example of the kind of statistical controls that may be necessary. Yet, even after casemix adjustment, there may be other differences between the two groups that are not observable to researchers. The literature on small area variations suggests that these differences could be very substantial (Wennberg and Gittelsohn, 1973; Feldstein, 1993). Thus, any differences in observed outcomes may in fact be due to selection bias (i.e., the comparison and intervention groups differ on variables not observed by the researcher) or to differences in exposures other than the VBP activities.

Options Available for Casemix Adjustment
There are a variety of systems available today to meet the diverse needs for casemix adjustment. Purchasers should be aware that the adjustments for costs may differ from the adjustments for quality outcomes. Also, different quality measures may require different types of casemix adjustment.
The most straightforward casemix adjustment is for age and gender. Other adjustments include comorbidity indices; several systems exist at the general population level, such as the "Johns Hopkins ACG (Adjusted Clinical Groups) Case-Mix System" (Johns Hopkins University, 2001). Other indices exist for casemix adjustments in specific clinical areas. This step can be performed at the analysis phase or when constructing the variables.
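The simplest age-and-gender adjustment can be sketched as indirect standardization: apply stratum-specific reference rates to each group's population to obtain expected events, then compare observed to expected. All of the rates and counts below are hypothetical.

```python
# Hypothetical reference mortality rates per admission, by age/gender stratum
reference_rates = {
    ("<65", "F"): 0.004, ("<65", "M"): 0.006,
    ("65+", "F"): 0.012, ("65+", "M"): 0.016,
}

# Hypothetical admissions by stratum for one hospital group, plus observed deaths
admissions = {("<65", "F"): 4000, ("<65", "M"): 3500,
              ("65+", "F"): 2500, ("65+", "M"): 2000}
observed_deaths = 92

# Expected deaths if this group had the reference rates in each stratum
expected_deaths = sum(reference_rates[s] * n for s, n in admissions.items())
oe_ratio = observed_deaths / expected_deaths  # <1: better than casemix predicts

print(f"expected = {expected_deaths:.1f}, O/E = {oe_ratio:.2f}")
```

An O/E ratio computed this way lets groups with different age and gender mixes be compared on a common footing.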


Another concern is spillover effects (or contamination of the comparison group). If the treatment and comparison groups are in close proximity, the VBP activity may affect both groups. For example, in a situation where salaried workers receive health plan performance information while hourly workers serve as the comparison group, the information might spread to the hourly workers, potentially muting the effects of disseminating the report card. Similarly, if an intervention involves changing provider behavior with respect to HMO patients, the providers may change how they handle all patients, including non-HMO patients. If these providers treat patients in both the intervention group and comparison group, an evaluation could underestimate the effects of the VBP activity.

A related concern arises when activities similar in spirit are occurring in the comparison group. If this is the case, this research design will capture the extent to which the effects of specific VBP activities differ from those of ongoing activities in the comparison group. But this is different than asking how the VBP activity altered outcomes relative to a scenario where no activities occurred.

Example: Cross-Sectional Design With Comparison Group
An Evaluation of the Impact of a CAHPS® Report Card on Medicaid Enrollment
Purchaser. New Jersey Medicaid Office of Managed Care.
Description of the Research Activity. In the summer and fall of 1997, the New Jersey Medicaid Office of Managed Care conducted its first CAHPS® survey of enrollees in its mandatory Medicaid managed care program. The office subsequently published a seven-page brochure, "Choosing an HMO," that compared available Medicaid HMOs based on the collected CAHPS® data. The brochure was included with the enrollment materials sent to half of the newly eligible Medicaid cases during a four-week period in the spring of 1998. All newly eligible cases received the standard enrollment materials, but randomization was used to determine which new cases received the CAHPS® material; the experimental group consisted of 2,649 cases and the comparison group consisted of 2,568 cases. An evaluation was done to determine the impact of the CAHPS® report on the enrollment decisions of newly eligible Medicaid cases.
Evaluators. The evaluation was conducted by academic researchers from the RAND Corporation and Pennsylvania State University.
Research Design. The researchers used a cross-sectional design with a random comparison group to determine whether availability of the CAHPS® report affected enrollees' decisions.
Data and Measures. To examine the impact of the CAHPS® report on plan enrollment and the utility of the CAHPS® report in plan selection, the researchers relied on data from two sources. Plan selection and demographic data for the comparison and experimental groups came from the New Jersey Medicaid Office. Survey data came from a post-enrollment survey of a random sample of newly eligible Medicaid cases in the comparison and experimental groups.
Methods. After the enrollment process was completed, the analysts drew a sample of 2,550 cases from the experimental and comparison groups and surveyed these cases about the CAHPS® report and their enrollment decisions. The followup survey was specifically designed to assess whether the experimental group received the CAHPS® report and incorporated it into the plan enrollment process. Because the experimental design randomly assigned new Medicaid cases to the experimental and comparison groups, t-tests could be used to test for statistically significant differences in the mean values of outcomes for the two groups. The analysts also employed logistic multivariate regression to examine the probability of having seen and used the CAHPS® reports, and to assess why one plan that scored relatively low on the CAHPS® report achieved a high level of enrollment.
Results. Only half of the newly eligible Medicaid cases reported having looked at the CAHPS® report. There were no statistical differences in the pattern of plan enrollment between those who received the CAHPS® report and those who did not. One plan, referred to as the dominant plan, had relatively low ratings on the CAHPS® report but still achieved significant enrollment; this suggests that something about this HMO that was not evident to the evaluators was appealing to Medicaid beneficiaries. When analysts examined enrollment patterns for a subset of the sample that did not choose the dominant HMO and reported looking at the CAHPS® report, they found that these individuals chose better plans on average than did a comparison group. The results suggest that for report cards to be effective at changing plan enrollment, considerable efforts are needed to make sure that consumers receive and read these reports.
Advantages and Disadvantages of the Evaluation Strategy. The primary advantage of this evaluation and research design was the ability to randomly distribute the CAHPS® report to a subset of new Medicaid cases. Randomization controlled for differences in important individual characteristics and allowed the researchers to focus on the effect of the report card.
A major disadvantage was that despite randomization, the study design could not guarantee that all members of the experimental group looked at or even received the CAHPS® report. Since only half of the experimental group admitted to examining the report, it would have been difficult for the evaluation to ascertain an effect in the intervention group. This prompted the analysts to focus the analysis on the non-random subset of the experimental group that admitted to examining the report. An additional limitation was the inability of the design to control for and explain the finding that one dominant plan received significant enrollment despite poor performance on the CAHPS® reports.
Source: Farley DO, Short PF, Elliot MN, et al. Effects of CAHPS® Health Plan Performance Information on Plan Choices by New Jersey Medicaid Beneficiaries. Health Services Research 2002 (in press).


Nonequivalent Comparison Group. This approach combines the strengths of the pre-test/post-test design with that of the cross-sectional with comparison group design. In the nonequivalent comparison group design, analysts make both pre-intervention and post-intervention observations for the intervention group as well as for a comparison group that is not receiving the intervention. This design uses the comparison group to control for factors that threaten the validity of the pre-test/post-test design. Similarly, it uses differences between the comparison and intervention group prior to the intervention to control for unobserved factors that would have confounded the cross-sectional with comparison group design.


Figure 5. Nonequivalent Comparison Group


Advantages of This Approach. The primary benefit of this design is that it controls for several of the "rival hypotheses" that threaten the other designs described earlier. To the extent that the two groups are the same except for the experience of the intervention, this design controls for trend effects (assuming both groups are subject to the same underlying trends) and for simultaneous historical events or exposures (i.e., the possibility that something else occurring at the same time as the intervention is responsible for any observed changes).

Drawbacks of This Approach. This design is subject to the threat of spillover or contamination effects, which could cause analysts to underestimate any effects from the intervention. In addition, in the absence of random assignment to the intervention and comparison group, it is possible that the two groups are not identical (or "nonequivalent"), leaving aside their exposure to the intervention. However, unlike the cross-sectional comparison group design, differences between the groups are only a threat to validity if they vary over time. The multivariate longitudinal analysis will adjust for any difference between groups that is constant over time. For example, if the intervention group were in an urban area and the comparison group in a rural area, one would expect health care costs to differ between the groups. But as long as the differences are reasonably constant over time, they will not bias the analysis.
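The logic of the last point is the difference-in-differences estimate: the change in the intervention group minus the change in the comparison group, so that any stable between-group difference cancels out. A minimal sketch with hypothetical per-member monthly cost means:

```python
# Hypothetical mean monthly health care costs (dollars) per member
urban_treated = {"pre": 310.0, "post": 325.0}     # intervention group
rural_comparison = {"pre": 250.0, "post": 280.0}  # comparison group

change_treated = urban_treated["post"] - urban_treated["pre"]           # +15
change_comparison = rural_comparison["post"] - rural_comparison["pre"]  # +30

# The constant urban/rural cost gap cancels; only the differential change remains
did_estimate = change_treated - change_comparison  # costs grew $15 less
print(f"difference-in-differences estimate: {did_estimate:+.0f} dollars")
```

The $60 urban/rural level difference never enters the estimate; only a gap that changes over time would bias it.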

This design may also suffer from the threat of selection bias, which occurs when researchers cannot observe all differences between groups or fully understand if those differences are constant or variable over time. However, if random assignment to groups is possible, this design becomes a randomized controlled trial, which is considered the gold standard design for establishing cause-effect relationships in intervention research because it controls for selection bias and all other threats to internal validity.

Randomization could occur at the individual level or at the facility/site level. For example, to facilitate evaluation, individuals could be randomly selected to participate in a disease management program that is a component of a VBP activity, or employers with multiple locations may opt to implement VBP activities in only selected sites. However, for most types of VBP activities and interventions, randomization will not be possible.

Example: Nonequivalent Comparison Group
An Evaluation of the Impact of an HMO Report Card
Purchaser. General Motors Corporation (GM).
Description of Research Activity. During the fall 1996 open-enrollment period (for 1997 enrollment), GM issued its first health plan performance report card to active salaried employees. The report card contained ratings on eight dimensions for each HMO available to active employees: NCQA accreditation status, benchmark HMO, patient satisfaction, medical-surgical care, women's health, preventive care, access to care, and operational performance. For the five dimensions based on HEDIS data, each plan received a designation of one to three diamonds, signifying "below expected performance," "average performance," or "superior performance." Some plans that could not provide HEDIS data received a "no data" designation. Because of the terms of GM's contract with the United Auto Workers (UAW), the company did not provide the report card to active hourly workers. An evaluation was conducted to measure the impact of the report card on enrollment while controlling for other important factors that might affect employees' decisions, such as out-of-pocket price.
Evaluators. The evaluation was conducted by researchers affiliated with Pennsylvania State University and the University of Michigan.
Research Design. The researchers used a nonequivalent comparison group design.
Data and Measures. GM's benefit consultant provided enrollment data files, including plan offerings by ZIP Code of residence and out-of-pocket prices by coverage category, for the period before the release of the report card (1996) and after the release (1997). Employee identification data were encrypted to protect confidentiality.
Methods. The analysts constructed regression models to predict the probability that an employee would enroll in one of the plans available as a function of the out-of-pocket price of the plan and the report card ratings. For statistical reasons, observations on individual employees were aggregated to calculate health plan market shares. The evaluators also performed a regression analysis to see whether and how health plan market share was related to out-of-pocket price and the report card rating variables. Additionally, since hourly employees did not receive the report cards but had access to the same plans with no out-of-pocket cost, the regression analysis included the market share of plans for hourly employees in order to control for important time-varying factors unobserved by the researchers.
Results. The results indicate that out-of-pocket price is a significant predictor of the health plans that employees select. The results also suggest that, although employees did not appear to enroll in plans rated highly by the report card, they did seem to avoid plans with many below average ratings, but the effect was not large. The primary implication for purchasers is that report card efforts can influence health plan choices and that employees may be more sensitive to negative ratings than positive ratings.
Advantages and Disadvantages of Evaluation Strategy. The primary advantage of this analytic approach is the ability to isolate the separate effect of price and report card ratings on the probability of enrollment. Less rigorous methods may have improperly attributed plan switching to the report card. The primary disadvantage is the technical sophistication and time involved in performing such an analysis, and the assumption that hourly and salaried employees have similar plan preferences.
Source: Scanlon DP, Chernew M, McLaughlin C et al., The Impact of Health Plan Report Cards on Managed Care Enrollment. Journal of Health Economics 2002;21(1):19-42.


Time Series. The time series design addresses the important issue of underlying trends. In this approach, evaluators capture information on trends underway by making multiple observations before the intervention is implemented. They then make one or more observations after the intervention is implemented, and conduct an analysis to establish the trend and test whether the VBP activity caused a deviation from the trend.

Figure 6. Time Series

Most time series analyses compare aggregate data (usually in the form of some proportion or rate) over time. The unit of time represented by each observation will vary across evaluations, depending upon available data and the type of intervention being evaluated. Within a single evaluation, however, the units of time (whether they are years, quarters, months, or weeks) should be the same for all observation points in the time series.

The basic specification for this analytical approach assumes that the VBP activity affects the level of the outcome and that this effect persists over time. It also assumes that the VBP activity does not alter the trend. Modified specifications could allow the VBP activity to affect the trend and the level, and even more complex specifications could test for persistence of the effect.
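The basic specification can be sketched as a regression with a time trend and a post-intervention indicator (a level shift); adding a post-intervention slope term would let the activity alter the trend as well. The data below are simulated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
months = np.arange(24)               # 24 monthly observations
post = (months >= 12).astype(float)  # intervention begins at month 12

# Simulated outcome: upward trend of 0.5/month plus a level drop of 4.0
y = 50 + 0.5 * months - 4.0 * post + rng.normal(0, 0.5, 24)

# Basic specification: level + trend + post-intervention level shift
X = np.column_stack([np.ones(24), months, post])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
trend, level_shift = coef[1], coef[2]
print(f"trend = {trend:.2f}/month, level shift = {level_shift:.2f}")
```

Because the trend is estimated from the pre-intervention points, the level-shift coefficient measures the deviation from that trend rather than the raw before/after difference.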

Advantages of This Approach. The strength of this design is its ability to establish whether or not a change in the outcomes being measured is the result of a trend already underway or the intervention under investigation. This approach can be contrasted with a pre-test/post-test design, which cannot reveal whether the single observation after an intervention is the continuation of a trend.

Drawbacks of This Approach. The extent to which this design adequately controls for external time trends depends on the number of periods observed prior to the intervention (and to a lesser extent after the intervention) and the stability of the trend. In addition, any external historical factor or exposure that occurs contemporaneously with the VBP intervention will confound the results. Another weakness of this design is that it requires that data be gathered or estimated in the same way and available over multiple periods of time, including a significant time period before the intervention.

Example: Time Series
Use of a Control Chart To Begin and Maintain an Asthma Disease Management Initiative
Health Care Provider. Allegiance L.L.C., a physician-hospital organization (PHO) in Ann Arbor, MI.
Description of the Research Activity. This PHO contracted with two different HMOs to assume complete financial risk for the expenditures of a managed care population. As part of its effort to improve care management, the PHO established a goal of reducing hospitalizations and emergency room visits due to asthma. Beginning in 1995, it initiated a number of interventions, including grand rounds, newsletters, semi-annual feedback reports to primary care physicians listing their patients who might benefit from use of a steroid inhaler (based on pharmacy refill data), and peer pressure from physician leaders on colleagues with low rates of steroid inhaler utilization.
Despite these initiatives, little consistent progress was made on any asthma-related metric. Still concerned about improving asthma care, the physicians and hospital partners approved funding for an asthma nurse position (one FTE) that began in June 1999. The justification was that some of the $500,000 spent annually on asthma-related hospitalizations could be reduced through increased use of steroid inhalers (Donohue et al., 1997), which would result from patient and physician detailing by the asthma nurse. Other interventions initiated around June 1999 included a monthly feedback report to physicians and supplemental academic detailing by several utilization management nurses. The purpose of the evaluation was to assess the impact of this care improvement initiative.
Evaluators. Analysis of the asthma initiative is done internally. The executive committee of the PHO, subsequently referred to as the "decisionmakers," examines the analytic evidence when approving each year's budget.
Research Design. The evaluators used a time series design relying on control charts to track relevant outcomes in the months before and after the new initiative.
Measures used were:
  1. The percentage of bronchodilator patients (those taking three or more canisters in a 6-month period) who were also taking a steroid inhaler (according to pharmacy claims data).
  2. The percentage of asthma patients visiting an emergency room or hospitalized (according to medical claims data).
Methods. The evaluators used control charts to track progress on the measures. The mean of the data points plus the upper and lower control limits (3 sigma) are represented by horizontal lines.
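A control chart of this kind can be sketched as follows, with hypothetical monthly percentages: the center line is the mean of the baseline points, and the control limits sit three standard deviations above and below it. Points beyond the limits (or a sustained shift) signal a change in the process rather than random variation.

```python
# Hypothetical monthly percentages of bronchodilator patients
# who are also on a steroid inhaler
baseline = [38, 41, 40, 37, 42, 39, 40, 38, 41, 39, 40, 38]  # pre-intervention
followup = [44, 47, 49, 48, 50, 51]                          # post-intervention

mean = sum(baseline) / len(baseline)
sigma = (sum((x - mean) ** 2 for x in baseline) / len(baseline)) ** 0.5
ucl, lcl = mean + 3 * sigma, mean - 3 * sigma  # 3-sigma control limits

out_of_control = [x for x in followup if x > ucl or x < lcl]
print(f"center = {mean:.1f}, UCL = {ucl:.1f}, LCL = {lcl:.1f}")
print(f"follow-up points beyond the limits: {out_of_control}")
```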
Results. Thus far, there is no demonstrable improvement in the number of hospitalizations or emergency room visits. However, the proportion of bronchodilator patients on steroid inhalers increased, coincident with the staffing of the asthma nurse and other interventions begun in June 1999. (The control chart below displays a shift in data points, including several above the upper control limit, which indicates statistically significant changes in the delivery process.) This temporal improvement was sufficient to convince the decisionmakers to continue funding the nurse position despite considerable downsizing in the organization.
Advantages and Disadvantages of the Evaluation Strategy. Use of control charts permitted simple yet frequent assessments. But the lack of a concurrent comparison group weakened the argument for causality. This method also makes it difficult to determine which of several simultaneous interventions had the biggest impact.
Source: Allegiance L.L.C.



III. How Do You Choose a Research Design?

A number of factors will influence which of the research designs (alone, or in combination) would be best-suited for an evaluation of your VBP initiatives. Purchasers, possibly working with other stakeholders, can get started by trying to reach agreement on the following questions:

    • What Do You Want To Learn and How Do You Expect To Use the Information? The first task is to identify the research designs that can provide you with useful information. For example, some purchasers conduct evaluations to learn how the initiative is being perceived by stakeholders and to identify any barriers; in those cases, interviews or focus groups are likely to be most useful. Others want to gather data to get a general idea of whether the program is on the right track; so simple quantitative analyses are often appropriate. Still others pursue an evaluation in order to decide whether to continue the investment in a VBP activity; this goal may call for a quantitative research design that would support a solid analysis of the costs and benefits of the initiative.
    • What Kind of Evidence Do You Need? One of the most important criteria for choosing a research design is the kind of relationship you want to see. In some cases, it may be sufficient to see evidence of a possible correlation; for example, a purchaser that has implemented an initiative to spur providers to adopt computer systems that double-check prescriptions may be satisfied to know that hospitals are investing more in information technology. In other cases, purchasers may want evidence of causation, i.e., results that demonstrate that the VBP activity is having its desired effect. To learn this, purchasers must choose the study design that will be the strongest for showing whether or not the activity causes the result the purchaser wants to see.
      Purchasers must keep in mind that statistical analyses vary in their ability to detect an effect if one exists. And for all of the quantitative research designs, the statistical power will depend on the size of the effect and the size of the sample. For example, it is easier to detect a 10 percent decrease in mortality than a 5 percent decrease; and, whatever the effect is, it will be easier to detect with 1,000 observations than with 100. The earlier example from the New Jersey Medicaid program shows how power was reduced, despite randomization, because only half of the intervention group actually remembered receiving the report card.
    • Do You Need To Defend the Results to an External Audience? A related issue involves the level of certainty you want to have about the results. If providers or health plans (or your own managers) are likely to scrutinize and question the findings, you may need to choose a design that can adjust for or explain the effects of variables other than the VBP activity. Your ability to implement one of these designs will depend on whether you have baseline data, comparison groups, adequate sample sizes, and randomized assignment.
    • How Much Money Can You Put Towards the Evaluation? Some evaluation designs are more expensive than others; so it is important to know what your limits are. That said, other considerations may be more important than financial concerns. For example, if you need a strong analytic study with defensible results but cannot afford one, paying for a cheaper study that produces questionable results would not be a worthwhile option.
    • Do You Have Access to Other Resources? Some purchasers can overcome financial limitations by taking advantage of resources available within the organization or through partners in the VBP activity. For example, academic researchers at local universities may be willing to donate their time to an evaluation (especially if the findings can be published); the purchasing organization may be able to provide the analysts with office space and computers. A related question is whether you have access to analysts who can handle sophisticated evaluation designs. While you can always find appropriate researchers if you have the funds to look outside of your organization, this option is not available to all value-based purchasers.
    • How Much Time Do You Have for the Evaluation? The answer to this question will be driven primarily by when you need the results, but it may also depend on budgets and staff availability. The options available to you if you need an answer in 6 months are very different from what you can do if you can wait for 3 years. The collection of primary data is especially time-consuming.
    • What Kinds of Data Are Available to You? The choice of research design is often circumscribed by the nature and scope of the clinical or administrative data that are readily available. To the extent that the data you need are controlled by health care organizations, you may need to consider how much cooperation you can anticipate from local providers and health plans. One way to address this issue is to plan ahead for the evaluation by incorporating requests for data into contract negotiations. However, a significant amount of data is now readily available thanks to standardized measurement tools such as HEDIS® and CAHPS® (go to Step 4 for a discussion of these two tools).
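The statistical power point raised above can be made concrete with a standard two-proportion sample-size calculation (two-sided alpha = 0.05, 80 percent power; the baseline mortality rate is hypothetical): detecting the smaller decrease requires roughly four times the sample per group.

```python
from math import ceil, sqrt
from scipy.stats import norm

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group to compare two proportions."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

baseline = 0.10  # hypothetical 10% mortality rate
n_big = n_per_group(baseline, baseline * 0.90)    # 10% relative decrease
n_small = n_per_group(baseline, baseline * 0.95)  # 5% relative decrease
print(f"n per group: {n_big} (10% decrease) vs {n_small} (5% decrease)")
```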

Step 4. Implement the Research

Once the research design has been selected, the evaluation itself can begin. This process may take many forms, but it generally requires three tasks:

    • Identifying appropriate measures.
    • Collecting the data.
    • Analyzing the data.

Since this guide is designed for the decisionmaking purchaser, rather than the analysts who may actually implement an evaluation, this section simply reviews some of the issues and resources that purchasers should be aware of with respect to choosing measures and collecting the data. It assumes that the data analysis will be handled by experienced researchers, whether internal or external to the organization.

Task 1: Identify Appropriate Measures

During the process of selecting a research design, purchasers often have to consider how they expect to define and measure the outcomes in which they are interested. For example, if a VBP activity was intended to improve quality of care for employees with heart disease, how exactly will you measure quality? Will you look at measures of health status, patient satisfaction, or clinical processes?

The specific definitions of quality and cost are less important than the recognition that both are important for defining, measuring, and focusing on value. Although this point may seem obvious, it is a crucial step in thinking about how to assess the impact of VBP activities because it draws attention to both the costs of those activities and the extent to which those activities improve the quality or reduce the costs of care. This section offers a broad discussion of measurement issues for both final and intermediate outcomes of interest to value-based purchasers.

It is important to remember that the measurement strategy must fit the intended research design, with quantitative research designs and methods generally imposing more formal measurement requirements. For example, a pre-test/post-test research design will require the ability to measure specific outcomes before and after the intervention. As mentioned previously, data availability and measurement issues can preclude the selection of specific research designs.

Measuring Impact on Health Status. Evaluators rely on a wide range of measures to capture the impact of VBP activities on health outcomes. But purchasers must think carefully about which definitions and measures they want to use, particularly if health plans or providers may challenge their decisions.

Health status outcomes are not easy to measure, in part because it is not clear which perspective to take (i.e., the patient's or the clinician's) and which domains of health to evaluate. On the one hand, you could evaluate health outcomes using clinical measures, such as weight, cholesterol level, and other commonly used metrics. However, clinical measures do not capture the perspective of the person whose health is being evaluated, and therefore can miss very important aspects of health, such as mental and social well-being. Many researchers recognize the importance of both the clinical and patient perspective in defining the health status of individuals, and use a combination of both approaches to arrive at a final judgment.

Available measures of health outcomes include the following:

    • Mortality.
    • Morbidity.
    • Health status.

Measuring Mortality. Mortality rates, i.e., the rate of death for a given population, are sometimes regarded as a measure of health. For example, researchers often compare mortality rates and life expectancies to gauge the health status of different countries. When they find significant differences in these rates, they may say that one country is healthier than the other. But differences in mortality rates do not always point to the cause of those differences, which undermines their usefulness as a measure of health.

In theory, purchasers could also use mortality rates as a measure of the health status of their covered populations. However, the benefit of using this measure is questionable because only a small percentage of a population dies in a given year, particularly when the population is younger and healthier as in an employment-based setting.

That said, mortality rates are a feasible and useful measure for certain VBP activities. For example, to evaluate a VBP initiative that tries to steer bypass surgery patients to high-volume providers, it would be reasonable to assess the impact on mortality rates for those patients. When used for this purpose, mortality rates should be age-adjusted and based on reasonable time windows (i.e., annual mortality).

Measuring Morbidity. Morbidity is a term used to describe the average level of illness in a population. Morbidity accounts for pain, chronic illness, acute illness, mental illness, etc. On a societal level, researchers often measure morbidity by the prevalence of chronic disease in a population or by measures that are correlated with illness, such as missed school days and job-based disability claims. Not surprisingly, morbidity occurs more frequently than mortality.

From a purchaser's perspective, morbidity can be regarded as a function of both the prevalence of chronic illness and the level of functioning of those with chronic illness. In most cases, VBP activities are more likely to affect the latter than the former. For instance, a VBP program aimed at diabetics cannot be expected to reduce the prevalence of diabetes in a given population. In fact, to the extent that the activity is designed to facilitate the identification of diabetics, the activity itself may reveal a higher prevalence than existed at baseline. However, the VBP activity could reduce some of the negative effects associated with chronic conditions. For example, a VBP initiative aimed at asthmatics might help patients take better control of the disease, reducing complications due to asthma and lowering the number of unnecessary emergency room visits and missed school or work days. Thus, for the purposes of assessing VBP, appropriate measures of morbidity would include indicators such as hospital readmission rates or correlated measures such as absenteeism from school or work.

In the long term, it may be possible for some VBP activities to affect the prevalence of chronic illness in a population. For example, many people believe that early intervention programs focusing on diet, exercise, and regular screening can prevent or reduce the level of chronic illnesses such as diabetes. Early screening and detection can also minimize major complications associated with diseases like cancer.

Measuring Health Status. The set of tools for measuring health outcomes for populations includes survey instruments designed to measure self-reported health status. The most common of these survey instruments is the SF-36® (QualityMetric, 2001). These instruments, which can be administered to populations including those covered by purchasers, have been demonstrated to measure health status broadly. Similar but more focused instruments have been developed for measuring health status for people with specific conditions, such as depression or asthma.

Although purchasers could use health status assessment instruments to measure final outcomes, it may not be reasonable to expect VBP activities to have a strong influence on these outcomes at the population level. However, for specific VBP activities, such as those that target care for chronic illnesses, it may be feasible to use the assessment instruments developed specifically for those conditions to detect differences in health status for relevant segments of the population.

Health status measures such as the SF-36® do not capture patient preferences for various health states, but other standard tools permit preference weighting of health states. For example, researchers might consider the Quality of Well-Being Scale or the Health Utilities Index. These indices, which generate measures of quality-adjusted life years, are the approach recommended by the Panel on Cost-Effectiveness in Health and Medicine (Gold et al., 1996). Many disease-specific indices exist as well, although most are not preference weighted. (For more information, go to Gold et al., 1996.)

Measuring Impact on Satisfaction With Health Plans and Care Delivery. The difficulty with measuring satisfaction with health plans and care delivery is that the scope of services, activities, and benefits encompassed by these two topics is quite large. As a result, it is a real challenge to develop a single, meaningful measure and to cover all relevant domains without making the questionnaire unreasonably long. These challenges become even greater when purchasers want to learn what is causing satisfaction to be less than optimal so that they can identify and make appropriate changes in policy.

However, satisfaction and opinion surveys of this type do exist and are used by many purchasers. For example, the CAHPS survey, which is discussed in Task 2, provides measures in several domains that are relevant to consumers, including a measure that reflects overall satisfaction with one's health care plan.

Measuring Impact on Costs. The measurement of costs will likely include several different types of costs:

    • The cost of the VBP activity itself.
    • Health care costs.
    • Costs outside the health care system.
    • Labor market costs, such as lost productivity.

Measuring VBP Activity Cost. Measurement of this type of cost depends on the accounting systems involved and the extent to which resources devoted to value-based purchasing are shared with other activities. As a basic principle for cost measurement, the evaluator would identify all resources devoted to value-based purchasing and then assign a cost to those resources. One option is to take a narrow perspective that focuses only on costs borne by the purchaser. Typically, the cost of staff resources would be the percent of time that each staff member devoted to the activity multiplied by the relevant wage rate (plus fringe benefits). Other resources might include computer time, office space, printing, supplies and any outside consulting expenses. Some purchasers might want to take a broader perspective that includes provider-level costs associated with data preparation and implementation of the VBP initiative.
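The narrow-perspective staff costing described above (percent of time multiplied by the wage rate, plus fringe benefits, plus other resources) can be sketched in a few lines. All figures and the fringe rate below are illustrative assumptions, not real data:

```python
# Sketch of a narrow-perspective VBP activity cost estimate.
# Wages, time shares, fringe rate, and other costs are illustrative assumptions.

FRINGE_RATE = 0.30  # assumed fringe benefits as a share of wages

# (annual wage, fraction of time devoted to the VBP activity)
staff = [
    (90_000, 0.25),  # program manager
    (60_000, 0.50),  # data analyst
    (45_000, 0.10),  # administrative support
]

# Staff cost = wage (plus fringe) times the share of time on the activity
staff_cost = sum(wage * (1 + FRINGE_RATE) * share for wage, share in staff)

# Other resources devoted to the activity
other_costs = {
    "outside_consulting": 20_000,
    "printing_and_supplies": 3_500,
    "computing_and_office_space": 5_000,
}

total_cost = staff_cost + sum(other_costs.values())
print(f"Staff cost: ${staff_cost:,.0f}")   # prints "Staff cost: $74,100"
print(f"Total cost: ${total_cost:,.0f}")   # prints "Total cost: $102,600"
```

A purchaser taking the broader perspective would simply add further line items (e.g., provider-level data preparation costs) to the dictionary of other costs.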

When relevant resources are used for activities other than value-based purchasing, you will have to decide how much of those resources to allocate to the VBP activity. One approach is to identify all of the costs that would disappear, in the long run, if the VBP activity were not conducted. For example, office space used by staff associated with the VBP activity might be considered a fixed cost that would be incurred even without the VBP activity, and therefore should not be included as a VBP cost.

Measuring Health Care Costs. If the VBP activity is intended to have broad effects on health care expenditures, competition among health plans, or even employee enrollment decisions, it is reasonable to use premiums as a measure of costs. Premiums are usually easy to measure if contracting health plans are providing a full range of administrative and risk-bearing services. Purchasers that use third-party administrators or other support entities should include the costs for those services in premium costs.

If benefit designs differ between the treatment and comparison groups, the evaluators will have to make adjustments, either directly in the measurement of premiums or in the research design. If direct adjustments are going to be made, they should be based on actuarial assumptions. One type of adjustment in the research design that could control for differences in benefit design would be the inclusion of indicator variables representing various aspects of benefit design as covariates in multivariate analyses. This is only feasible if there are a sufficient number of observations.

In the nonequivalent comparison group design, benefit design differences will only matter if they affect the trends in premiums. Any impact on the level of premiums will be captured by the trend in the comparison group. For example, if the differences in benefit design between the treatment and comparison group cause premiums to differ by a constant amount, the analysis will control for that difference.
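The logic of the nonequivalent comparison group design can be expressed as a simple difference-in-differences calculation: the comparison group's trend absorbs any constant premium difference, so only the difference in trends is attributed to the VBP activity. The premium figures below are illustrative assumptions:

```python
# Difference-in-differences sketch for the nonequivalent comparison group design.
# Average monthly premiums (illustrative assumptions).

treatment_pre, treatment_post = 310.0, 330.0    # group exposed to the VBP activity
comparison_pre, comparison_post = 300.0, 335.0  # comparison group

treatment_trend = treatment_post - treatment_pre     # +20
comparison_trend = comparison_post - comparison_pre  # +35

# A constant benefit-design difference shifts both periods equally,
# so it cancels out of each trend; the VBP effect is the trend gap.
vbp_effect = treatment_trend - comparison_trend
print(f"Estimated VBP effect on monthly premiums: {vbp_effect:+.0f}")
# prints "Estimated VBP effect on monthly premiums: -15"
```

Here the treatment group's premiums grew $15 per month less than the comparison group's, which the design attributes to the VBP activity.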

In some analyses, evaluators may wish to measure health care costs using data on health care expenditures. Relative to premium data, this has the advantage of allowing detailed analyses of sub-populations or cohorts with particular clinical conditions. However, this approach does not capture any savings in administrative costs at the insurer level or any gains from competition among insurers. Measurement of spending on health care services at the individual level typically requires access to claims data. From the purchaser's perspective, the appropriate cost measure is the amount actually paid for services, including the contribution of the employee/patient. In some cases, such as pharmaceutical rebates, some effort may be required to determine true expenditures. (See Gold et al., 1996, for details on various approaches to measuring health care costs.)

It is important to remember that expenditures may not reflect true costs. Payments may be above or below what the providers actually spend to deliver the service. If the evaluators wish to measure true resource use, they will have to conduct a more detailed accounting of the process of care delivery, identifying resources used to deliver care and valuing those resources. In some settings, this can be done with provider accounting systems. In other cases, charges adjusted by cost-to-charge ratios could be appropriate. Either way, the evaluators must also pay attention to how overhead costs are allocated to various activities and services.

The duration of observation might have important implications for the observed impact of value-based purchasing on medical care costs. Many VBP activities generate some short-term costs. For example, programs to improve compliance with medications might increase short-term expenditures. Some VBP activities, such as asthma management programs, may produce offsetting savings even in the short run. For others, such as diabetes control programs, it may be many years until any medical savings are realized. Because it can take a long time for key gains to be realized, evaluators may want to rely on simulation techniques if they wish to construct a full analysis of the impact of certain VBP activities.

Measuring Costs Outside the Health Care System. Cost measurement from a broad perspective would entail measuring non-health care related resources, costs of informal care, and costs of patient (and family) time related to the consumption of medical care and a change in health status. If these costs are likely to be important, they should be included. However, because these costs are often borne by the patient and family, they may be captured in quality measures such as satisfaction. Gold et al. (1996) describe measurement strategies for these variables, but purchasers may want to measure these variables as costs only if the VBP activity is likely to affect them and they are not adequately captured by quality measures.


Choosing Measures That Match the Intervention
Sometimes, purchasers implement limited, focused interventions and then wish to detect an effect on broader, aggregate outcomes. But evaluations must focus on measures that reflect appropriate and relevant outcomes of the VBP activity. For example, suppose a purchaser implements a diabetes case management program that succeeds in reducing expenditures for diabetics by 10 percent. Even if diabetics represent 20 percent of the population, it would be difficult to detect a measurable impact at an aggregate level (e.g., by measuring premiums or population health status), particularly if there is a lot of variance in expenditures for the remaining population. More appropriate measures could include the annual costs of care for diabetics, their satisfaction with care, and complication rates for diabetics.
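The dilution problem described in the box is easy to quantify: the aggregate effect is the subgroup's reduction scaled by its share of total spending. The population sizes and per-person costs below are illustrative assumptions:

```python
# Why a focused intervention is hard to detect in aggregate spending measures.
# Population shares and per-person costs are illustrative assumptions.

population = 10_000
diabetic_share = 0.20     # diabetics as a share of the covered population
cost_diabetic = 8_000.0   # assumed annual cost per diabetic
cost_other = 3_000.0      # assumed annual cost per non-diabetic
reduction = 0.10          # 10% cost reduction achieved among diabetics

n_diab = int(population * diabetic_share)
n_other = population - n_diab

total_before = n_diab * cost_diabetic + n_other * cost_other
total_after = n_diab * cost_diabetic * (1 - reduction) + n_other * cost_other

aggregate_change = (total_before - total_after) / total_before
print(f"Aggregate spending change: {aggregate_change:.1%}")
# prints "Aggregate spending change: 4.0%"
```

A 10 percent reduction for the diabetic subgroup shrinks to a 4 percent change in aggregate spending under these assumptions, and even less if diabetics account for a smaller share of spending. A change that small is easily swamped by year-to-year variance in the rest of the population, which is why subgroup-specific measures are more appropriate.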


Measuring Impact on Labor Market Outcomes. Among the most difficult costs to measure are the costs associated with decreased labor productivity as a result of employees seeking care or experiencing poor health. This includes the costs associated with absenteeism, decreased productivity while working, and labor turnover. In theory, these costs could be measured by the value of lost production associated with the absenteeism and lost productivity, plus the administrative costs of replacing workers or adjusting production processes. In practice, measuring these costs is a serious challenge. Some evaluators assign a cost to absenteeism by valuing missed days from work at the wage rate of the workers. A more thorough analysis would use accounting principles to assess the impact of absenteeism on production costs.
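The simple wage-based valuation of absenteeism mentioned above can be sketched directly; the missed-day counts and daily wages below are illustrative assumptions:

```python
# Wage-based valuation of absenteeism: missed workdays valued at the daily wage.
# Counts and wages are illustrative assumptions.

employees = [
    # (missed workdays this year, daily wage)
    (4, 240.0),
    (0, 310.0),
    (9, 175.0),
]

absenteeism_cost = sum(days * wage for days, wage in employees)
print(f"Estimated cost of absenteeism: ${absenteeism_cost:,.2f}")
# prints "Estimated cost of absenteeism: $2,535.00"
```

As the text notes, this is a first approximation; a more thorough analysis would use accounting data to trace the actual impact of absences on production costs.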

In some cases, the evaluators might want to treat variables such as missed workdays as measures of quality. If so, they must be careful not to double-count these variables by also including them in the calculation of costs. Gold et al. (1996) recommend that, if you are using a quality-adjusted life year measure of quality, production costs should be excluded, or at least reported separately. But measures of the impact on production costs can be important variables for many VBP activities, especially from an employer's perspective. If VBP activities may have a measurable impact on these variables, evaluators should try to measure the effects and include them with costs unless the effects are explicitly captured by quality variables.

Measuring Impact on Utilization of Services. If you want to use utilization as an indicator of quality, several measurement options exist. The easiest is to simply measure the use of the target service. For example, one could measure Cesarean section rates or mammography rates. Presumably, the VBP initiative would try to decrease the former and increase the latter, but this measurement strategy does not attempt to distinguish between appropriate and inappropriate changes in the rates of either service. The HEDIS® system follows this approach.

An alternative approach would be to conduct a detailed analysis of care, perhaps using medical records. This tends to be very expensive, but it is feasible and has been used in a variety of studies. For some illnesses, there are quality-of-care assessment tools that can be applied.

The utilization measures most commonly used fall into the following categories:

      • Inpatient hospital admissions, days, and length of stay.
      • Emergency room use.
      • Outpatient hospital services.
      • Outpatient physician visits.
      • Referrals for specialty physician consultation.
      • Pharmaceutical utilization.

These measures may be broken down by patient characteristics (e.g., gender, age, race), provider characteristics (e.g., high-volume vs. low-volume hospital), delivery setting (e.g., group vs. solo practice), diagnosis, and procedure.

Purchasers should recognize that savings from reduced resource use do not necessarily flow back to them. For example, if providers get paid full capitation rates, they will capture any savings unless the capitation rates are lowered. Similarly, if hospital admissions are paid on a per case basis, as with diagnosis-related groups, reductions in length of stay will not generate savings for the purchaser.

Measuring Impact on Health-Related Behaviors. Evaluators can assess whether VBP programs encourage low-risk behaviors by measuring changes in the number or percent of employees who engage in these behaviors.

Measuring Impact on Patients' Decisions. To assess whether a VBP activity has affected the choices that patients make, evaluators can look for changes in the percentage of patients or employees choosing providers or health plans that have been identified as "top" or preferred performers on report cards or quality evaluations.

Dealing With Multidimensional Outcomes
Evaluators are often interested in several outcomes. One important question to resolve is whether to examine these outcomes separately or in aggregate. This is similar to the issue that arises in any cost-effectiveness analysis when the intervention affects several aspects of health status. The Panel on Cost-Effectiveness in Health and Medicine (go to Gold et al., 1996) recommends that multiple outcomes be aggregated into a quality-adjusted life year scale. Conceptually, this kind of aggregation is possible for evaluations of VBP activities, but it would require measuring the many aspects of health status that might be affected. This is difficult to do but may be feasible for VBP activities that closely resemble a clinical intervention.
In other cases, when the outcomes of interest are intermediate health outcomes (including process or structure measures), aggregation is more complex. One approach is to not aggregate the various outcomes. The evaluation would report the various outcomes, and users of the research findings would need to draw their own conclusions about whether the investment was justified. This approach is most feasible when the number of outcomes is small and the users are comfortable weighing measures of quality against measures of costs. An alternative aggregation methodology involves combining outcomes into one or more domains of performance based upon subjective values. For more on this topic, go to the discussion of grouping HEDIS® measures.


Task 2: Collect the Data

Data may take the form of qualitative information or quantifiable values. It can be obtained from a variety of primary sources (where the evaluator collects the data) and secondary sources (where the data are collected by someone else but used by the evaluator). Regardless of the type or source of data, the quality of the program evaluation will depend on the data's reliability and validity. Since final and intermediate outcomes are the focus of VBP activities, the manner in which you measure these outcomes is crucial for producing credible and useful evaluation results. However, the importance of measurement accuracy applies equally to non-outcome data that are used in evaluations to control for other confounding factors.

For the purposes of evaluating VBP activities, common primary sources of data include administrative claims data, medical records, stakeholders (e.g., the health plans involved in guideline development), and health care consumers. Since this guide presumes that purchasers interested in gathering data from primary sources are likely to consult and contract with outside experts, this section is limited to a discussion of two major secondary sources:

    • HEDIS®.
    • CAHPS®.

A Quick Look at HEDIS®. HEDIS® is a set of about 60 process and outcome measures designed to capture dimensions of health plan quality. Initially developed by a group of large private employers, HEDIS® is now administered by the National Committee for Quality Assurance, Washington, DC. To date, HEDIS® has been used primarily to monitor the performance of HMOs, although research is currently being conducted to examine the feasibility of HEDIS® indicators for PPOs and other types of insurance products. Because each of the approximately 60 performance measures includes specific guidelines for data collection and reporting, the results are standardized. This allows purchasers and others to compare the performance of any health plan to the performance of other health plans nationally, regionally, and locally.

With the exception of the CAHPS® composites that recently became part of the HEDIS® reporting requirements (see more on this below), HEDIS® measures do not capture final outcomes. However, expert panels selected the measures included in HEDIS® because research evidence indicates that they are correlated with both costs and health status. For example, some of the HEDIS® measures capture the utilization rate of health care services and surgical procedures that are often overused, resulting in unnecessary costs and risks to patients. The Cesarean section (C-section) rate is an example of one such measure:

        • First, C-sections are more costly than vaginal deliveries.
        • Second, the scientific literature suggests that many C-sections are unwarranted (since vaginal deliveries are possible) and that those unwarranted C-sections put women at unnecessary risk for infection and other post-surgical complications (Sakala, 1993).

Other HEDIS® measures capture utilization rates for preventive care services and for screenings that are recommended for subsets of enrolled populations. For example, HEDIS® includes rates of mammography screening for women, prostate cancer screening for men, and immunizations for infants and adolescents. Although preventive care and health screenings do not directly capture any of the final outcomes in which purchasers are interested, they are thought to be correlated with health status and costs since screenings can lead to early detection and less expensive treatment, and prevention can lead to the avoidance of illness. HEDIS® also includes several clinical measures of treatment for selected diseases, such as rates of prescription of beta-blockers following a heart attack or readmission rates following discharge for a mental health diagnosis.

The collection, analysis, and dissemination of HEDIS® data have been a major focus of employers' VBP activities in the last 8 years. More recently, employers have been analyzing HEDIS® results to evaluate the impact of those and other activities. There are many approaches purchasers can use to measure the effects of VBP initiatives on HEDIS® scores. For example, you could examine whether a plan's scores surpass minimally accepted standards, or compare a plan's scores to regional or national averages or the Nation's top performing plans. Another option is to look for changes in a plan's HEDIS® scores from one period to the next.

However, for the individual purchaser, it is not clear how well HEDIS® serves as a source of data for evaluation purposes. One complication of using HEDIS® to assess the impact of VBP activities is the fact that there are more than 100 rates (i.e., a single measure may include separate rates for men and women or for people of different ages). It is not uncommon for plans to perform well on some rates but not on others, making it difficult to conclude anything about the overall performance of the plan across all rates. (Go to the box for more on this topic.) In addition, because many purchasers collect and analyze HEDIS® data, there is no way to know whether changes in performance were due to the collective focus of all purchasers or the specific activities of a single purchaser.

Additional information about HEDIS® can be found on the NCQA Web site at www.ncqa.org.

Managing HEDIS® Measures Through Aggregation
To address the issue of multiple measures, some purchasers combine multiple HEDIS® measures into one overall score or a smaller number of scores representing fewer dimensions, such as prevention, access, and surgical care. They then use these aggregated scores to compare plans to each other, or to compare a specific plan's performance from one period to the next.
To create these dimensions, or categories, one could assign weights to selected HEDIS® measures and group those measures into domains of performance either using ad hoc methods or factor analytic approaches. The exact weight for each of the measures in a domain would reflect organizational objectives and perhaps scientific literature regarding the importance of the measure in affecting health. Once the weights are determined, each plan or provider can receive a score based on its relative performance. This approach is similar to the strategy that some purchasers use to combine individual health plan performance measures into consumer-friendly categories for reporting purposes.
The advantage of this approach is that it allows for the many HEDIS® measures to be collapsed into a smaller subset. The disadvantage is that this approach masks the heterogeneity of the individual HEDIS® measures and makes it difficult to identify specific areas for improvement. For more on reporting categories, go to Scanlon et al., 2001; and https://talkingquality.ahrq.gov.
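The weighting-and-grouping approach described in the box amounts to a weighted average of measure rates within each domain. The measure names, weights, and plan rates below are hypothetical, chosen only to illustrate the mechanics:

```python
# Sketch of aggregating individual HEDIS-style measure rates into domain scores.
# Measure names, domain groupings, weights, and rates are illustrative assumptions.

# Hypothetical plan rates (0-100) on selected measures
plan_rates = {
    "mammography_screening": 78.0,
    "childhood_immunization": 85.0,
    "beta_blocker_after_mi": 92.0,
    "mental_health_followup": 64.0,
}

# Subjective grouping into domains; weights within each domain sum to 1
# and would reflect organizational objectives or the scientific literature.
domains = {
    "prevention": {"mammography_screening": 0.5, "childhood_immunization": 0.5},
    "treatment": {"beta_blocker_after_mi": 0.6, "mental_health_followup": 0.4},
}

domain_scores = {
    name: sum(plan_rates[m] * w for m, w in weights.items())
    for name, weights in domains.items()
}
for name, score in domain_scores.items():
    print(f"{name}: {score:.1f}")
# prevention: 81.5
# treatment: 80.8
```

Comparing these domain scores across plans, or across periods for a single plan, is simpler than juggling dozens of individual rates, at the cost of masking which specific measures drive a change.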


A Quick Look at CAHPS®. CAHPS®, sponsored by AHRQ, is a family of survey instruments designed to obtain consumer assessments of the quality of the health care services they receive. The core survey contains about 50 standard items that focus on multiple dimensions of the care and services provided by health plans, including getting needed care, getting care quickly, doctors' communication skills, courteousness and helpfulness of staff, and customer service, among other domains. CAHPS® instruments have been developed and tested for adults and children who are covered by commercial insurers, Medicare, and Medicaid. Supplemental items have also been developed to identify and obtain data on the care provided for children and adults with chronic conditions. Other supplemental sets include items on interpreter services and transportation. Though CAHPS® was originally designed for consumer assessment of health plans, an upcoming version has been developed to obtain consumer assessments of providers within group practices. CAHPS® II (beginning in 2002) will also focus on the use of CAHPS® information for quality improvement purposes.

The CAHPS® Survey and Reporting Kit contains complete instructions for implementation of the surveys, templates for reporting results to consumers, instructions for data analysis, and other issues such as presenting CAHPS® results to the media.

Like HEDIS®, the CAHPS® survey includes standardized questions and specific protocols for administering the survey so that each plan's results can be compared to the performance of other plans nationally, regionally, or locally. As noted, the NCQA has incorporated the CAHPS® composite measures into its data reporting requirements for HEDIS®. With the exception of the ratings of care and health plan services, most of the CAHPS® items are not direct measures of the other final outcomes discussed in this guide. However, research findings suggest that most of the CAHPS® measures are correlated with some of these outcomes, most likely health status and labor market outcomes. To the extent that CAHPS® captures the quality and appropriateness of clinical care, for example, the survey results would be correlated with health status. Similarly, since CAHPS® asks for enrollees' opinions about their health care plans, these results may be related to labor market outcomes. For example, if employees report that they are happy with their health care plans, one might expect lower employee turnover, although other factors can also lead to turnover.

Because CAHPS® comprises so many items, the use of CAHPS® for assessing the impact of VBP activities faces barriers similar to those discussed above for HEDIS®. Namely, individual items have to be aggregated in order to be useful. However, the CAHPS® developers have conducted considerable research regarding the appropriate aggregation of CAHPS® measures and issued guidelines for purchasers and others to follow (CAHPS® 2.0, 1999). In addition, because CAHPS® asks plan enrollees about the care and services of their health plans, the results may not be relevant to specific VBP activities that are more provider-oriented. This issue may be resolved by the upcoming introduction of G-CAHPS® (group-level CAHPS®), which focuses on consumers' experiences with physicians and medical practices.

To obtain the CAHPS® Survey and Reporting Kit free-of-charge, or to learn more about CAHPS®, contact the CAHPS® Survey Users Network (SUN) at 1-800-492-9261 or at http://www.cahps-sun.org. The SUN also provides limited technical assistance.

Task 3: Analyze the Data

For most evaluations, the analyst is not the same individual or group of individuals who made the initial decision to embark on the evaluation. In many cases, the purchaser may wish to contract with external consultants or individuals affiliated with academic institutions to assist in the analysis. This is particularly true for more complex analyses that require statistical expertise and familiarity with methods and software for conducting experimental and observational research. Outside analysts also offer the benefit of objectivity, since they have no stake in the results of their research.

Ideally, the analysts should be involved in all of the steps outlined in this guide, particularly in the choice of research design and issues of data collection and measurement, but in practice this is not always the case. Regardless of whether the analysts have been involved in the development and planning of the evaluation, it is important that they understand the details of the VBP program, the short-term and long-term objectives of the purchaser, and how the purchaser hopes to use the findings so that the analysis will result in information that is germane and useful.
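One observational design of the kind this task alludes to can be sketched as a simple difference-in-differences calculation, comparing plans exposed to a VBP incentive against unexposed plans, before and after the program. The scores below are hypothetical; a full analysis would use regression to adjust for other factors.

```python
# Difference-in-differences sketch with hypothetical quality scores
# (e.g., a HEDIS(R)-style rate, in percent) for four plans per group.
from statistics import mean

exposed_before   = [62, 65, 60, 63]
exposed_after    = [70, 73, 69, 71]
unexposed_before = [61, 64, 59, 62]
unexposed_after  = [64, 66, 62, 65]

change_exposed   = mean(exposed_after) - mean(exposed_before)      # 8.25 points
change_unexposed = mean(unexposed_after) - mean(unexposed_before)  # 2.75 points

# The unexposed group's change proxies for the secular trend; subtracting it
# isolates the change plausibly attributable to the VBP activity.
did = change_exposed - change_unexposed
print(f"difference-in-differences estimate: {did:.2f} points")
```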


Step 5. Summarize the Results and Interpret Implications for Purchasing Activities

Once the analysis has produced evidence regarding the impact of VBP activities on relevant outcomes, the next step is to ensure that those findings are communicated in a way that is helpful to you, and potentially to the larger community of value-based purchasers and health services researchers. For this to happen, the purchaser must first make sure that the analysts do not simply hand over hundreds of pages of output from regression models. Rather, the analysts should be directed to present senior-level management with a succinct list of key results and findings that are pertinent to the overarching goals and objectives of the organization. This document would be similar in concept to a legal brief or one-page business memo, both of which are designed to facilitate quick and accurate decisionmaking.

The second part of this step is to use these findings to draw out the implications for the VBP activity; this task may be performed by the analysts or by the purchaser, but in practice it is often neglected. In some cases, the results of an evaluation never reach this step because of problems with the research or with how the findings were communicated (e.g., when analysts provide senior-level decisionmakers with information that is voluminous, confusing, or impossible to sort through). But purchasers need to determine what the results of the analysis mean for the VBP activity: whether it is working, where it is failing, and whether and how it can be refined. Ultimately, this is the step where the transition from analysis to decisionmaking occurs, using the results of the VBP evaluation as the bridge.

The final task, of course, is for the purchaser to incorporate the results of the VBP evaluation into decisions. Because all organizations have different structures and processes for making decisions, and because information from the evaluation is just one of many inputs, this guide does not delve into this topic. However, purchasers are strongly encouraged to involve key stakeholder groups in discussing how to interpret and use the results. A key principle of "utilization-focused" evaluation (i.e., an evaluation that is attempting to produce results that will be useful to specific audiences) is that people outside of the evaluation team need to be involved in discussions of draft results and in decisions that derive from those results.


References

  • Babbie E. The Practice of Social Research, 8th ed. Belmont, CA: Wadsworth Publishing Company; 1998.
  • Bailey DM. Research for the Health Professional: A Practical Guide, 2nd ed. Philadelphia, PA: F.A. Davis Company; 1997.
  • Battistella R, Burchfield D. The Future of Employment-Based Health Insurance. Journal of Healthcare Management 2000;45(1):46-57.
  • Buchmueller TC, Feldstein P. The Effect of Price on Switching Among Health Plans. Journal of Health Economics 1997;16(2):231-47.
  • CAHPS® 2.0 Survey and Reporting Kit. Rockville, MD: Agency for Health Care Policy and Research; 1999. AHRQ Publication No. 99-0039.
  • Campbell DT, Stanley JC. Experimental and Quasi-Experimental Designs for Research. Dallas, TX: Houghton Mifflin Company; 1963.
  • Castles AG, Milstein A, Damberg CL. Using Employer Purchasing Power to Improve the Quality of Perinatal Care. Pediatrics 1999;103(1):248-54.
  • Ceniceros R. Employees Bear More Costs. Business Insurance 2001;35(43):3,44.
  • Donahue JG, Weiss ST, Livingston JM et al., Inhaled Steroids and the Risk of Hospitalization for Asthma. Journal of the American Medical Association 1997;277(11):887-91.
  • Dowd BE, Finch M. Employers as Catalysts for Health Care Quality: Theory and Practice [mimeo]. Division of Health Services Research and Policy, University of Minnesota. Prepared for an Agency for Healthcare Research and Quality conference, "Understanding How Employers Can Be Catalysts for Quality: Insights for a Research Agenda," April 4, 2001, Washington, DC; 2001.
  • Dudley RA, Bae RY, Johansen KL et al., When and How Should Purchasers Seek to Selectively Refer Patients to High Quality Hospitals? Prepared for a National Academy of Sciences workshop, "Interpreting the Volume-Outcome Relationship in the Context of Health Care Quality," May 11, 2000, Washington, DC; 2000.
  • Farley DO, Short PF, Elliot MN et al., Effects of CAHPS® Health Plan Performance Information on Plan Choices by New Jersey Medicaid Beneficiaries. Health Services Research 2002 (in press).
  • Feldman R, Christianson J, Schultz J. Do Consumers Use Information to Choose a Health-Care Provider System? The Milbank Quarterly 2000;78(1):47-77.
  • Feldstein PJ. Health Care Economics, 4th ed. Albany, NY: Delmar Publishers; 1993.
  • Fink A. Evaluation Fundamentals: Guiding Health Programs, Research and Policy. Newbury Park, CA: Sage Publications; 1993.
  • Fossett JW, Goggin M, Hall JS et al., Managing Medicaid Managed Care: Are States Becoming Prudent Purchasers? Health Affairs 2000;19(4):36-49.
  • Fraser I, McNamara P, Lehman G et al., The Pursuit of Quality by Business Coalitions: A National Survey. Health Affairs 1999;18(6):158-65.
  • Gabel J, Levitt L, Pickreign J et al., Job-Based Health Insurance in 2001: Inflation Hits Double Digits, Managed Care Retreats. Health Affairs 2001;20(5):180-6.
  • Gold MR, Siegel JE, Russell LB et al., Eds. Cost-Effectiveness in Health and Medicine. New York, NY: Oxford University Press; 1996.
  • Kohn LT, Corrigan JM, Donaldson MS, Eds. To Err Is Human: Building a Safer Health System. Washington, DC: Committee on Quality of Health Care in America, Institute of Medicine; 2000.
  • Leapfrog Group. Fact Sheet. Washington, DC; 2001. Available at: http://www.leapfroggroup.org/FactSheets/LF_FactSheet.pdf
  • Levit K, Smith C et al., Inflation Spurs Health Spending in 2000. Health Affairs 2002;21(1):172-81.
  • Lipson D, De Sa J. Impact of Purchasing Strategies on Local Health Care Systems. Health Affairs 1996;15(2):62-76.
  • Meyer J, Rybowski L, Eichler R. Theory and Reality of Value-Based Purchasing: Lessons from the Pioneers. Rockville, MD: Agency for Health Care Policy and Research; 1997. AHCPR Publication No. 98-0004.
  • Meyer JA, Wicks EK, Rybowski LS et al., Report on Report Cards: Initiatives of Health Coalitions and State Government Employers to Report on Health Plan Performance and Use Financial Incentives. vol. I. Washington, DC: Economic and Social Research Institute; 1998.
  • Meyer JA, Wicks EK, Rybowski LS et al., Report on Report Cards: Initiatives of Health Coalitions and State Government Employers to Report on Health Plan Performance and Use Financial Incentives. vol. II. Washington, DC: Economic and Social Research Institute; 1999.
  • Midwest Business Group on Health, Juran Institute, The Severyn Group. Reducing the Costs of Poor Quality Health Care Through Responsible Purchasing Leadership. [Draft]. Chicago, IL: Midwest Business Group on Health; 2002.
  • Milstein RL, Wetterhall SF et al., Framework for Program Evaluation. Morbidity and Mortality Weekly Report 1999;48(RR-11).
  • Patton MQ. Utilization-Focused Evaluation, 3rd ed. Thousand Oaks, CA: Sage Publications; 1997.
  • Quality vs. Costs? A Survey of Healthcare Purchasing Habits and Concerns. Healthcare Financial Management 2000;54(7):68-72.
  • QualityMetric Incorporated. SF-36® Health Survey. Lincoln, RI; 2001. http://www.qmetric.com
  • Ragin CC. The Distinctiveness of Case-oriented Research. Health Services Research 1999;34(5; Part II):1137-51.
  • Robinow A. Ensuring Health Care Quality: A Purchaser's Perspective—A Health Care Coalition. Clinical Therapeutics 1997;19(6):1545-54.
  • Rodriguez T, Schauffler H. Exercising Purchasing Power for Preventive Care. Health Affairs 1996;15(1):73-85.
  • Sakala C. Medically Unnecessary Caesarean Section Births: Introduction to a Symposium. Social Science and Medicine 1993;37(10):1177-98.
  • Scanlon DP, Chernew M, McLaughlin C et al., The Impact of Health Plan Report Cards on Managed Care Enrollment. Journal of Health Economics 2002;21(1):19-42.
  • Scanlon DP, Darby C, Rolph E et al., The Role of Performance Measures for Improving Quality in Managed Care Organizations. Health Services Research 2001;36(3):619-41.
  • Schauffler HH, Brown C, Milstein A. Raising the Bar: The Use of Performance Guarantees by the Pacific Business Group on Health. Health Affairs 1999;18(2):134-42.
  • Schultz J, Thiede Call K, Feldman R et al., Do Employees Use Report Cards to Assess Health Care Provider Systems? Health Services Research 2001;36(3):509-30.
  • Shortell S, Richardson WC. Health Program Evaluation. Saint Louis, MO: Mosby; 1978.
  • Sofaer S. Qualitative Methods: What Are They and Why Use Them? Health Services Research 1999;34(5; Part II):1101-18.
  • Uhrig JD. Beneficiaries' Use of Quality Reports for Choosing Medicare Health Plans. [Ph.D. Dissertation]. The Pennsylvania State University; 2001.
  • Wennberg JE, Gittelsohn A. Small Area Variations in Health Care Delivery. Science 1973;182(4117):1102-8.
  • Yeaton W, Camberg L. Program Evaluation for Managers. Boston, MA: Management Decision and Research Center, Health Services Research and Development Services, Office of Research and Development, Department of Veterans Affairs; 1997.


Selected Resources and Web Sites for Purchasers' Quality Improvement Activities

In addition to this guide, AHRQ has several other resources that may be helpful for purchasers seeking to improve the quality of health care. The Web sites listed below provide more information about these resources.

AHRQ Quality Indicators (QIs)

The AHRQ QIs software is a set of measures of health care quality that is designed for use in conjunction with hospital administrative data to highlight potential quality concerns, identify areas that need further study and investigation, and track changes over time. More details on the AHRQ Quality Indicators are available at http://www.qualityindicators.ahrq.gov.


CONQUEST (COmputerized Needs-oriented QUality Measurement Evaluation SysTem)

CONQUEST is quality improvement software that draws on two databases—one for clinical performance measures and one for conditions. It helps users identify, understand, compare, evaluate, and select measures to assess and improve clinical performance.


CAHPS®

CAHPS® is an easy-to-use kit of survey and report tools that provides reliable and valid information to help consumers and purchasers assess and choose among health plans and providers. All CAHPS® products are available from the CAHPS® Survey Users Network at http://www.cahps-sun.org.

National CAHPS® Benchmarking Database (NCBD)

Initiated in 1998, the NCBD provides benchmarks to facilitate comparisons across health plans by users of the CAHPS® survey. Users can access the database at http://ncbd.cahps.org.

Making Health Care Safer: A Critical Analysis of Patient Safety Practices

This evidence report, sponsored by AHRQ, reviews the evidence on 79 patient safety practices. Making Health Care Safer: A Critical Analysis of Patient Safety Practices describes 11 practices that the researchers considered highly proven to work but that are not performed routinely in the Nation's hospitals and nursing homes. The report is available online at http://www.ahrq.gov/clinic/ptsafety/ or in printed format from the AHRQ Publications Clearinghouse.

National Guideline Clearinghouse™ (NGC)

The National Guideline Clearinghouse is a comprehensive database that provides objective, detailed information on evidence-based clinical practice guidelines at http://www.guideline.gov.

TalkingQuality Web Site

Launched in March 2002, the TalkingQuality Web site provides easy-to-use information on health care quality at https://talkingquality.ahrq.gov. The site is sponsored by AHRQ, CMS, and the U.S. Office of Personnel Management.


Bibliography

  • Academy for Health Services Research and Health Policy. http://www.academyhealth.org/
  • Agency for Healthcare Research and Quality: Reauthorization Fact Sheet. Rockville, MD: Agency for Healthcare Research and Quality; 1999. AHRQ Publication No. 00-P002. Available at: http://www.ahrq.gov/about/ahrqfact.htm
  • AHRQ Profile: Quality Research for Quality Healthcare. Rockville, MD: Agency for Healthcare Research and Quality; 2000. AHRQ Publication No. 00-P005. Available at: http://www.ahrq.gov/about/profile.htm
  • American Evaluation Association. 2001. http://www.eval.org
  • Babbie E. The Practice of Social Research, 8th ed. Belmont, CA: Wadsworth Publishing Company; 1998.
  • Bailey DM. Research for the Health Professional: A Practical Guide, 2nd ed. Philadelphia, PA: F.A. Davis Company; 1997.
  • Bailit M. Ominous Signs and Portents: A Purchaser's View of Health Care Market Trends. Health Affairs 1997;16(6):85-8.
  • Battistella R, Burchfield D. The Future of Employment-Based Health Insurance. Journal of Healthcare Management 2000;45(1):46-57.
  • Bodenheimer T, Sullivan, K. How Large Employers Are Shaping the Health Care Marketplace. New England Journal of Medicine 1998;338(15):1084-7.
  • Brand R, Dudley R, Johansen K et al., Selective Referral to High-Volume Hospitals. Journal of the American Medical Association 2000;283(9):1159-65.
  • Buchmueller TC, Feldstein P. The Effect of Price on Switching Among Health Plans. Journal of Health Economics 1997;16(2):231-47.
  • Burke B. Evaluating for a Change: Reflections on Participatory Methodology. New Directions for Evaluation 1998;80:43-56.
  • CAHPS® 2.0 Survey and Reporting Kit. Rockville, MD: Agency for Health Care Policy and Research; 1999. AHRQ Publication No. 99-0039.
  • Campbell DT, Stanley JC. Experimental and Quasi-Experimental Designs for Research. Dallas, TX: Houghton Mifflin Company; 1963.
  • Carroll S, Meyer J, Rybowski L et al., Employer Coalition Initiatives in Health Care Purchasing (vols. I and II). Washington, DC: Economic and Social Research Institute; 1996.
  • Castles AG, Milstein A, Damberg CL. Using Employer Purchasing Power to Improve the Quality of Perinatal Care. Pediatrics 1999;103(1):248-54.
  • Ceniceros R. Employees Bear More Costs. Business Insurance 2001;35(43):3,44.
  • Christianson J, Feldman R, Weiner J et al., Early Experience with a New Model of Employer Group Purchasing in Minnesota.Health Affairs 1999;18(6):100-14.
  • Committee on Quality of Health Care in America, Institute of Medicine. Crossing the Quality Chasm: A New Health System for the 21st Century. Washington, DC: Institute of Medicine; 2001.
  • CONQUEST Fact Sheet. COmputerized Needs-Oriented QUality Measurement Evaluation SysTem. Rockville, MD: Agency for Health Care Policy and Research; 1999. AHCPR Publication No. 99-P001. Available at: http://www.ahrq.gov/qual/conquest/conqfact.htm
  • Donahue JG, Weiss ST, Livingston JM et al., Inhaled Steroids and the Risk of Hospitalization for Asthma. Journal of the American Medical Association 1997;277(11):887-91.
  • Donaldson M. Accountability for Quality in Managed Care. Journal on Quality Improvement 1998;21(12):711-25.
  • Dowd BE, Finch M. Employers as Catalysts for Health Care Quality: Theory and Practice [mimeo]. Division of Health Services Research and Policy, University of Minnesota. Prepared for an Agency for Healthcare Research and Quality conference, "Understanding How Employers Can Be Catalysts for Quality: Insights for a Research Agenda," April 4, 2001, Washington, DC; 2001.
  • Dudley RA, Bae RY, Johansen KL et al., When and How Should Purchasers Seek to Selectively Refer Patients to High Quality Hospitals? Prepared for a National Academy of Sciences workshop, "Interpreting the Volume-Outcome Relationship in the Context of Health Care Quality," May 11, 2000, Washington, DC; 2000.
  • Farley DO, Short PF, Elliot MN et al., Effects of CAHPS® Health Plan Performance Information on Plan Choices by New Jersey Medicaid Beneficiaries. Health Services Research 2002 (in press).
  • Feldman R, Christianson J, Schultz J. Do Consumers Use Information to Choose a Health-Care Provider System? The Milbank Quarterly 2000;78(1):47-77.
  • Feldstein PJ. Health Care Economics, 4th ed. Albany, NY: Delmar Publishers; 1993.
  • Fink A. Evaluation Fundamentals: Guiding Health Programs, Research and Policy. Newbury Park, CA: Sage Publications; 1993.
  • Fossett JW, Goggin M, Hall JS et al., Managing Medicaid Managed Care: Are States Becoming Prudent Purchasers? Health Affairs 2000;19(4):36-49.
  • Fraser I, McNamara P. Employers: Quality Takers or Quality Makers? Medical Care Research and Review 2000;57(2):33-52.
  • Fraser I, McNamara P, Lehman G et al., The Pursuit of Quality by Business Coalitions: A National Survey. Health Affairs 1999;18(6):158-65.
  • Gabel J, Levitt L, Pickreign J et al., Job-Based Health Insurance in 2001: Inflation Hits Double Digits, Managed Care Retreats. Health Affairs 2001;20(5):180-6.
  • Gabel J, Ginsburg P, Hart K. Small Employers and Their Health Benefits, 1988-1996: An Awkward Adolescence. Health Affairs 1997;16(5):103-10.
  • Galvin RS. An Employer's View of the U.S. Health Care Market. Health Affairs 1999;19(1):166-70.
  • Gold MR, Siegel JE, Russell LB et al., Eds. Cost-Effectiveness in Health and Medicine. New York, NY: Oxford University Press; 1996.
  • Grimaldi P. How Purchasers Evaluate Health Plans. Nursing Management 1997;28(10):31-2.
  • Healthcare Cost and Utilization Project (HCUP), 1988-97: A Federal-State-Industry Partnership in Health Data. Rockville, MD: Agency for Healthcare Research and Quality; 2000. Available at: http://www.hcup-us.ahrq.gov/overview.jsp
  • Johns Hopkins University. The Johns Hopkins ACG Case-Mix System. Baltimore, MD; 2001. http://www.acg.jhsph.edu
  • Kamlet M. The Comparative Benefits Modeling Project: A Framework for Cost Utility Analysis of Government Health Care Programs. Office of Disease Prevention and Health Promotion, Public Health Service, U.S. Department of Health and Human Services, Washington, DC; 1992.
  • Keller J. Business Coalition Initiatives Related to Behavioral Healthcare Purchasing and Quality Improvement. Behavioral Healthcare Tomorrow 1995;4(4):49-52.
  • Kohn LT, Corrigan JM, Donaldson MS, Eds. To Err Is Human: Building a Safer Health System. Washington, DC: Committee on Quality of Health Care in America, Institute of Medicine; 2000.
  • Leapfrog Group. Fact Sheet. Washington, DC; 2001. Available at: http://www.leapfroggroup.org/FactSheets/LF_FactSheet.pdf
  • Legnini M, Perry M, Robertson N et al., Where Does Performance Measurement Go From Here? Health Affairs 2000;19(3):173-7.
  • Levit K, Smith C et al., Inflation Spurs Health Spending in 2000. Health Affairs 2002;21(1):172-81.
  • Lipson D, De Sa J. Impact of Purchasing Strategies on Local Health Care Systems. Health Affairs 1996;15(2):62-76.
  • Lo Sasso AT, Perloff L, Schield J et al., Beyond Cost: "Responsible Purchasing" of Managed Care by Employers. Health Affairs 1999;18(6):212-23.
  • Mandelblatt JS, Fryback DG, Weinstein MC et al., Assessing the Effectiveness of Health Interventions. In: Cost-Effectiveness in Health and Medicine. New York, NY: Oxford University Press, pp. 135-75; 1996.
  • Marquis M, Long SH. Who Helps Employers Design Their Health Insurance Benefits? Health Affairs 2000;19(1):133-8.
  • McGee J, Kanouse D, Sofaer S et al., Making Survey Results Easy to Report to Consumers: How Reporting Needs Guided Survey Design in CAHPS®. Medical Care 1999;37(3):MS32-40, supplement.
  • Meyer J, Rybowski L, Eichler R. Theory and Reality of Value-Based Purchasing: Lessons from the Pioneers. Rockville, MD: Agency for Health Care Policy and Research; 1997. AHCPR Publication No. 98-0004.
  • Meyer JA, Wicks EK, Rybowski LS et al., Report on Report Cards: Initiatives of Health Coalitions and State Government Employers to Report on Health Plan Performance and Use Financial Incentives. vol. I. Washington, DC: Economic and Social Research Institute; 1998.
  • Meyer JA, Wicks EK, Rybowski LS et al., Report on Report Cards: Initiatives of Health Coalitions and State Government Employers to Report on Health Plan Performance and Use Financial Incentives. vol. II. Washington, DC: Economic and Social Research Institute; 1999.
  • Midwest Business Group on Health, Juran Institute, The Severyn Group. Reducing the Costs of Poor Quality Health Care Through Responsible Purchasing Leadership. [Draft]. Chicago, IL: Midwest Business Group on Health; 2002.
  • Milstein RL, Wetterhall SF et al., Framework for Program Evaluation. Morbidity and Mortality Weekly Report 1999;48(RR-11).
  • Patton MQ. Utilization-Focused Evaluation, 3rd ed. Thousand Oaks, CA: Sage Publications; 1997.
  • Quality Interagency Coordination Task Force (QuIC). Fact Sheet. AHRQ, Rockville, MD: Agency for Healthcare Research and Quality; 2001. AHRQ Publication No. 00-P027. Available at: http://www.ahrq.gov/qual/quicfact.htm
  • Quality Measurement Advisory Service (QMAS). Arm in Arm: A Guide to Implementing a Coordinated Quality Measurement Program. Seattle, WA: Foundation for Health Care Quality; 1999.
  • Quality Measurement Advisory Service (QMAS). Measuring Health Care Quality for Value-Based Purchasing; 1997. http://www.qmas.org/tools/guide-management/26total.htm
  • Quality Measurement Advisory Service (QMAS). Organizing and Financing a Health Care Quality Measurement Initiative: A Guide For Getting Started; 1996. http://www.qmas.org/tools/guide-organizing/16total.htm
  • Quality vs. Costs? A Survey of Healthcare Purchasing Habits and Concerns. Healthcare Financial Management 2000;54(7):68-72.
  • QualityMetric Incorporated. SF-36® Health Survey. Lincoln, RI; 2001. http://www.qmetric.com
  • Ragin CC. The Distinctiveness of Case-oriented Research. Health Services Research 1999;34(5; Part II):1137-51.
  • Reinertsen J. Collaborating Outside the Box: When Employers and Providers Take on Environmental Barriers to Guideline Implementation. Journal on Quality Improvement 1995;21(11):612-8.
  • Robinow A. Ensuring Health Care Quality: A Purchaser's Perspective—A Health Care Coalition. Clinical Therapeutics1997;19(6):1545-54.
  • Rodriguez T, Schauffler H. Exercising Purchasing Power for Preventive Care. Health Affairs 1996;15(1):73-85.
  • Rossi PH, Freeman HE, Lipsey MW. Evaluation: A Systematic Approach, 6th ed. Newbury Park, CA: Sage Publications; 1999.
  • Sakala C. Medically Unnecessary Caesarean Section Births: Introduction to a Symposium. Social Science and Medicine1993;37(10):1177-98.
  • Scanlon DP, Chernew M, McLaughlin C et al., The Impact of Health Plan Report Cards on Managed Care Enrollment. Journal of Health Economics 2002;21(1):19-42.
  • Scanlon DP, Darby C, Rolph E et al., The Role of Performance Measures for Improving Quality in Managed Care Organizations. Health Services Research 2001;36(3):619-41.
  • Schauffler HH, Brown C, Milstein A. Raising the Bar: The Use of Performance Guarantees by the Pacific Business Group on Health. Health Affairs 1999;18(2):134-42.
  • Schultz J, Thiede Call K, Feldman R et al., Do Employees Use Report Cards to Assess Health Care Provider Systems? Health Services Research 2001;36(3):509-30.
  • Shortell S, Richardson WC. Health Program Evaluation. Saint Louis, MO: Mosby; 1978.
  • Smith VK, Des Jardins T, Peterson KA. Exemplary Practices in Primary Care Case Management: A Review of State Medicaid PCCM Programs. Center for Health Care Strategies; 2000. Available at: http://www.chcs.org/publications/purchasing.html
  • Sofaer S. Qualitative Methods: What Are They and Why Use Them? Health Services Research 1999;34(5; Part II):1101-18.
  • South Carolina Department of Health and Environmental Control. Evaluation Resources. Available at: http://www.scdhec.net/HS/epi/Centered/resources.htm
  • Stone E, Bailit M, Greenberg M et al., Comprehensive Health Data Systems Spanning the Public-Private Divide: The Massachusetts Experience. American Journal of Preventive Medicine 1998;14(3S):40-5.
  • Thorpe KE, Florence CS, Gray B. Market Incentives, Plan Choices, and Price Increases. Health Affairs 1999;19(1):194-202.
  • Uhrig JD. Beneficiaries' Use of Quality Reports for Choosing Medicare Health Plans. [Ph.D. Dissertation]. The Pennsylvania State University; 2001.
  • U.S. General Accounting Office. Health Insurance: Management Strategies Used by Large Employers to Control Costs. GAO/HEHS-97-71. Washington, DC: U.S. Government Printing Office; 1997.
  • Wennberg JE, Gittelsohn A. Small Area Variations in Health Care Delivery. Science 1973;182(4117):1102-8.
  • Yeaton W, Camberg L. Program Evaluation for Managers. Boston, MA: Management Decision and Research Center, Health Services Research and Development Services, Office of Research and Development, Department of Veterans Affairs; 1997.