### Evaluating the Impact of Value-Based Purchasing: A Guide for Purchasers

##### Preface

A recent report by the Institute of Medicine (Crossing the Quality Chasm: A New Health System for the 21st Century) identified a "chasm" between the quality of care we have and the quality of care we should have. Employers could be a powerful force for closing this gap, since they pay for much of the health care in the United States. Past research has shown that employers and employer coalitions have at least some of the tools they need to serve as a force for quality, and a growing number of pioneers indeed are developing and implementing strategies they hope will improve the quality and value of the health care they purchase. But to date we have very little evidence on the impact of such efforts:

• What strategies are effective?
• Under what circumstances?
• In what markets?

This lack of evaluation presents problems at both the program and policy levels. At the program level, it means that employers and coalitions find it hard to determine whether their own initiatives, much less the initiatives they might choose to emulate, are in fact meeting their intended goals. Given rising health care costs, strategies that cannot prove their impact are likely to be abandoned. At the policy level, the lack of evaluation leaves unanswered the question of whether one can rely on the current market mechanism for improving quality.

The health services research community can play an important role in helping to evaluate past value-based purchasing efforts. A recent AHRQ Program Announcement, "Impact of Payment and Organization on Cost, Quality and Equity," seeks to encourage such research. In the meantime, however, employers and coalitions have indicated that they need some tools to help them do "real time" tracking and assessment of the impact of their own efforts, so they can tell quickly which to keep and which to drop, and how to fine-tune their strategies. Evaluating the Impact of Value-Based Purchasing Initiatives: A Guide for Purchasers is intended to meet this immediate purchaser need. We hope employers and coalition leaders will try it out, tell us how well it is working and how it could be improved, and use it to improve and enhance their value-based purchasing efforts.

—Irene Fraser, Ph.D., Director
Center for Organization and Delivery Studies
Agency for Healthcare Research and Quality

##### Introduction

One of the unique aspects of health care in the United States is the manner in which it is financed. Unlike many other countries, the United States does not have a publicly administered universal insurance program. Instead, health care is financed through a multitude of public and private insurance programs administered by purchasers. In the United States, the primary group purchasers include Medicare and Medicaid for public health care programs, and employers and purchasing cooperatives for private health care programs. Purchasers are defined as follows:

"Purchasers" are public and private sector entities that subsidize, arrange and contract for—and in many cases bear the risk for—the cost of health care services received by a group of beneficiaries.

In the past decade, many public and private health care purchasers have become more active in the purchasing process. Rather than simply writing checks to health insurers or health care providers, they are attempting to measure, monitor, and improve the quality they are receiving for the health care dollar spent. Although there are numerous ways purchasers have approached these tasks, they have been collectively termed value-based purchasing (VBP).

The need and opportunity for VBP has probably never been greater than it is right now. After several years of moderate growth in health care costs, recent estimates indicate that costs grew by nearly 7 percent from 1999 to 2000, outpacing the growth of the economy for the first time in almost a decade (Levit, Smith et al., 2002). Reflecting that growth, private employers estimate that their health care costs in 2002 will be nearly 14 percent more than they were in 2001 (Ceniceros, 2001). These increases are the highest in more than a decade and come at a time of economic uncertainty, prompting many employers to question the value of these expenditures.

At the same time, a growing body of evidence suggests that a significant percentage of the money that employers and other purchasers are spending on health care pays for poor quality in the form of overuse, misuse (e.g., medical errors), and waste (Midwest Business Group on Health et al., 2002; Kohn et al., 2000). For example, according to a study by the Midwest Business Group on Health, the Juran Institute, and The Severyn Group, a conservative estimate of the direct cost of poor quality care for employers would be $1,350 per employee per year, while the indirect cost of poor quality care, including lost time and productivity, is at least $340 per employee per year (Midwest Business Group on Health et al., 2002).

Despite the level of activity across the country, the impact of VBP activities on health care quality and costs has not been well established. The evaluation of these activities is a critical step for purchasers eager to identify and adopt beneficial tactics and avoid those found to be ineffective.

### III. How Do You Choose a Research Design?

A number of factors will influence which of the research designs (alone, or in combination) would be best-suited for an evaluation of your VBP initiatives. Purchasers, possibly working with other stakeholders, can get started by trying to reach agreement on the following questions:

• What Do You Want To Learn and How Do You Expect To Use the Information? The first task is to identify the research designs that can provide you with useful information. For example, some purchasers conduct evaluations to learn how the initiative is being perceived by stakeholders and to identify any barriers; in those cases, interviews or focus groups are likely to be most useful. Others want to gather data to get a general idea of whether the program is on the right track; for them, simple quantitative analyses are often appropriate. Still others pursue an evaluation in order to decide whether to continue the investment in a VBP activity; this goal may call for a quantitative research design that would support a solid analysis of the costs and benefits of the initiative.
• What Kind of Evidence Do You Need? One of the most important criteria for choosing a research design is the kind of relationship you want to see. In some cases, it may be sufficient to see evidence of a possible correlation; for example, a purchaser that has implemented an initiative to spur providers to adopt computer systems that double-check prescriptions may be satisfied to know that hospitals are investing more in information technology. In other cases, purchasers may want evidence of causation, i.e., results that demonstrate that the VBP activity is having its desired effect. To learn this, purchasers must choose the study design that will be the strongest for showing whether or not the activity causes the result the purchaser wants to see.
Purchasers must keep in mind that statistical analyses vary in their ability to detect an effect if one exists. And for all of the quantitative research designs, the statistical power will depend on the size of the effect and the size of the sample. For example, it is easier to detect a 10 percent decrease in mortality as opposed to a 5 percent decrease; and, whatever the effect is, it will be easier to detect with 1,000 observations than with 100. The earlier example from the New Jersey Medicaid program shows how power was reduced, despite randomization, because only half of the intervention group actually remembered receiving the report card.
• Do You Need To Defend the Results to an External Audience? A related issue involves the level of certainty you want to have about the results. If providers or health plans (or your own managers) are likely to scrutinize and question the findings, you may need to choose a design that can adjust for or explain the effects of variables other than the VBP activity. Your ability to implement one of these designs will depend on whether you have baseline data, comparison groups, adequate sample sizes, and randomized assignment.
• How Much Money Can You Put Towards the Evaluation? Some evaluation designs are more expensive than others, so it is important to know what your limits are. That said, other considerations may be more important than financial concerns. For example, if you need a strong analytic study with defensible results but cannot afford one, paying for a cheaper study that produces questionable results would not be a worthwhile option.
• Do You Have Access to Other Resources? Some purchasers can overcome financial limitations by taking advantage of resources available within the organization or through partners in the VBP activity. For example, academic researchers at local universities may be willing to donate their time to an evaluation (especially if the findings can be published); the purchasing organization may be able to provide the analysts with office space and computers. A related question is whether you have access to analysts who can handle sophisticated evaluation designs. While you can always find appropriate researchers if you have the funds to look outside of your organization, this option is not available to all value-based purchasers.
• How Much Time Do You Have for the Evaluation? The answer to this question will be driven primarily by when you need the results, but it may also depend on budgets and staff availability. The options available to you if you need an answer in 6 months are very different from what you can do if you can wait for 3 years. The collection of primary data is especially time-consuming.
• What Kinds of Data Are Available to You? The choice of research design is often circumscribed by the nature and scope of the clinical or administrative data that are readily available. To the extent that the data you need are controlled by health care organizations, you may need to consider how much cooperation you can anticipate from local providers and health plans. One way to address this issue is to plan ahead for the evaluation by incorporating requests for data into contract negotiations. However, a significant amount of data is now readily available thanks to standardized measurement tools such as HEDIS® and CAHPS® (go to Step 4 for a discussion of these two tools).
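
The point made above about statistical power (larger effects and larger samples are easier to detect) can be illustrated with a rough calculation. The sketch below uses a normal approximation for a two-sided comparison of two proportions. The 10 percent baseline mortality rate, the 5 percent significance level, the reading of "decrease" as a relative decrease, and the function name `approx_power` are all assumptions chosen for illustration; none of them come from this guide.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(p_control, p_treated, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standard error of the difference in proportions (unpooled approximation).
    se = sqrt(p_control * (1 - p_control) / n_per_group
              + p_treated * (1 - p_treated) / n_per_group)
    shift = abs(p_control - p_treated) / se
    # Probability that the test statistic falls beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

baseline = 0.10  # hypothetical baseline mortality rate
for relative_drop in (0.05, 0.10):
    for n in (100, 1000):
        power = approx_power(baseline, baseline * (1 - relative_drop), n)
        print(f"{relative_drop:.0%} decrease, n={n} per group: power ~ {power:.2f}")
```

Under these assumptions, power rises with both the size of the decrease and the number of observations. An analyst planning a real evaluation would replace the assumed rates with figures from the purchaser's own population.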

### Step 4. Implement the Research

Once the research design has been selected, the evaluation itself can begin. This process may take many forms, but it generally requires three tasks:

• Identifying appropriate measures.
• Collecting the data.
• Analyzing the data.

Since this guide is designed for the decisionmaking purchaser, rather than the analysts who may actually implement an evaluation, this section simply reviews some of the issues and resources that purchasers should be aware of with respect to choosing measures and collecting the data. It assumes that the data analysis will be handled by experienced researchers, whether internal or external to the organization.

##### Task 1: Identify Appropriate Measures

During the process of selecting a research design, purchasers often have to consider how they expect to define and measure the outcomes in which they are interested. For example, if a VBP activity was intended to improve quality of care for employees with heart disease, how exactly will you measure quality? Will you look at measures of health status, patient satisfaction, or clinical processes?

The specific definitions of quality and cost are less important than the recognition that both are important for defining, measuring, and focusing on value. Although this point may seem obvious, it is a crucial step in thinking about how to assess the impact of VBP activities because it draws attention to both the costs of those activities and the extent to which those activities improve the quality or reduce the costs of care. This section offers a broad discussion of measurement issues for both final and intermediate outcomes of interest to value-based purchasers.

It is important to remember that the measurement strategy must fit the intended research design, with quantitative research designs and methods generally imposing more formal measurement requirements. For example, a pre-test/post-test research design will require the ability to measure specific outcomes before and after the intervention. As mentioned previously, data availability and measurement issues can preclude the selection of specific research designs.

Measuring Impact on Health Status. Evaluators rely on a wide range of measures to capture the impact of VBP activities on health outcomes. But purchasers must think carefully about which of those definitions and measures they want to use, particularly if health plans or providers may challenge their decisions.

Health status outcomes are not easy to measure, in part because it is not clear which perspective to take (i.e., the patient's or the clinician's) and which domains of health to evaluate. On the one hand, you could evaluate health outcomes using clinical measures, such as weight, cholesterol level, and other commonly used metrics. However, clinical measures do not capture the perspective of the person whose health is being evaluated, and therefore can miss very important aspects of health, such as mental and social well-being. Many researchers recognize the importance of both the clinical and patient perspective in defining the health status of individuals, and use a combination of both approaches to arrive at a final judgment.

Available measures of health outcomes include the following:

Measuring Mortality. Mortality rates, i.e., the rate of death for a given population, are sometimes regarded as a measure of health. For example, researchers often compare mortality rates and life expectancy rates to gauge the health status of different countries. When they find significant differences in these rates, they may say that one country is healthier than the other. But differences in the mortality rate do not always point to the cause of the differences, which undermines its usefulness as a measure of health.

In theory, purchasers could also use mortality rates as a measure of the health status of their covered populations. However, the benefit of using this measure is questionable because only a small percentage of a population dies in a given year, particularly when the population is younger and healthier as in an employment-based setting.

That said, mortality rates are a feasible and useful measure for certain VBP activities. For example, to evaluate a VBP initiative that tries to steer bypass surgery patients to high-volume providers, it would be reasonable to assess the impact on mortality rates for those patients. When used for this purpose, mortality rates should be age-adjusted and based on reasonable time windows (e.g., annual mortality).
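
One common way to age-adjust a rate is direct standardization: compute the rate within each age group, then weight those rates by a fixed standard population. The sketch below assumes that method, which this guide does not prescribe, and every count, weight, and name in it is hypothetical.

```python
def age_adjusted_rate(deaths, population, standard_weights):
    """Directly standardized rate: age-specific rates weighted by a standard population."""
    total_weight = sum(standard_weights.values())
    adjusted = 0.0
    for age_group, weight in standard_weights.items():
        rate = deaths[age_group] / population[age_group]
        adjusted += rate * weight / total_weight
    return adjusted

# Hypothetical one-year counts for bypass surgery patients at two provider groups.
standard = {"under_65": 0.6, "65_plus": 0.4}  # assumed standard age mix
high_volume = age_adjusted_rate(
    deaths={"under_65": 4, "65_plus": 18},
    population={"under_65": 400, "65_plus": 600},
    standard_weights=standard,
)
low_volume = age_adjusted_rate(
    deaths={"under_65": 9, "65_plus": 8},
    population={"under_65": 600, "65_plus": 200},
    standard_weights=standard,
)
print(f"High-volume, age-adjusted: {high_volume:.3%}")
print(f"Low-volume, age-adjusted:  {low_volume:.3%}")
```

In this made-up data the crude rates (22 of 1,000 versus 17 of 800) slightly favor the low-volume group, while the age-adjusted rates favor the high-volume group, because the high-volume providers treat an older mix of patients.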

Measuring Morbidity. Morbidity is a term used to describe the average level of illness in a population. Morbidity accounts for pain, chronic illness, acute illness, mental illness, etc. On a societal level, researchers often measure morbidity by the prevalence of chronic disease in a population or by measures that are correlated with illness, such as missed school days and job-based disability claims. Not surprisingly, morbidity occurs more frequently than mortality.

From a purchaser's perspective, morbidity can be regarded as a function of both the prevalence of chronic illness and the level of functioning of those with chronic illness. In most cases, VBP activities are more likely to affect the latter than the former. For instance, a VBP program aimed at diabetics cannot be expected to reduce the prevalence of diabetes in a given population. In fact, to the extent that the activity is designed to facilitate the identification of diabetics, the activity itself may reveal a higher prevalence than existed at baseline. However, the VBP activity could reduce some of the negative effects associated with chronic conditions. For example, a VBP initiative aimed at asthmatics might help patients take better control of the disease, reducing complications due to asthma and lowering the number of unnecessary emergency room visits and missed school or work days. Thus, for the purposes of assessing VBP, appropriate measures of morbidity would include indicators such as hospital readmission rates or correlated measures such as absenteeism from school or work.

In the long term, it may be possible for some VBP activities to affect the prevalence of chronic illness in a population. For example, many people believe that early intervention programs focusing on diet, exercise, and regular screening can prevent or reduce the level of chronic illnesses such as diabetes. Early screening and detection can also minimize major complications associated with diseases like cancer.

Measuring Health Status. The set of tools for measuring health outcomes for populations includes survey instruments designed to measure self-reported health status. The most common of these survey instruments is the SF-36® (QualityMetric, 2001). These instruments, which can be administered to populations including those covered by purchasers, have been demonstrated to measure health status broadly. Similar but more focused instruments have been developed for measuring health status for people with specific conditions, such as depression or asthma.

Although purchasers could use health status assessment instruments to measure final outcomes, it may not be reasonable to expect VBP activities to have a strong influence on these outcomes at the population level. However, for specific VBP activities, such as those that target care for chronic illnesses, it may be feasible to use the assessment instruments developed specifically for those conditions to detect differences in health status for relevant segments of the population.

Health status measures such as the SF-36® do not capture patient preferences for various health states, but other standard tools permit preference weighting of health states. For example, researchers might consider the Quality of Well-Being index or the Health Utilities Index. These indices, which generate measures of quality-adjusted life years, are the approach recommended by the Panel on Cost-Effectiveness in Health and Medicine (Gold et al., 1996). Many disease-specific indices exist as well, although most are not preference weighted. (For more information, go to Gold et al., 1996.)

Measuring Impact on Satisfaction With Health Plans and Care Delivery. The difficulty with measuring satisfaction with health plans and care delivery is that the scope of services, activities, and benefits encompassed by these two topics is quite large. As a result, it is a real challenge to develop a single, meaningful measure and to cover all relevant domains without making the questionnaire unreasonably long. These challenges become even greater when purchasers want to learn what is causing satisfaction to be less than optimal so that they can identify and make appropriate changes in policy.

However, satisfaction and opinion surveys of this type do exist and are used by many purchasers. For example, the CAHPS survey, which is discussed in Task 2, provides measures in several domains that are relevant to consumers, including a measure that reflects overall satisfaction with one's health care plan.

Measuring Impact on Costs. The measurement of costs will likely include several different types of costs:

Measuring VBP Activity Cost. Measurement of this type of cost depends on the accounting systems involved and the extent to which resources devoted to value-based purchasing are shared with other activities. As a basic principle for cost measurement, the evaluator would identify all resources devoted to value-based purchasing and then assign a cost to those resources. One option is to take a narrow perspective that focuses only on costs borne by the purchaser. Typically, the cost of staff resources would be the percent of time that each staff member devoted to the activity multiplied by the relevant wage rate (plus fringe benefits). Other resources might include computer time, office space, printing, supplies and any outside consulting expenses. Some purchasers might want to take a broader perspective that includes provider-level costs associated with data preparation and implementation of the VBP initiative.

When relevant resources are used for activities other than value-based purchasing, you will have to decide how much of those resources to allocate to the VBP activity. One approach is to identify all of the costs that would disappear, in the long run, if the VBP activity were not conducted. For example, office space used by staff associated with the VBP activity might be considered a fixed cost that would be incurred even without the VBP activity, and therefore should not be included as a VBP cost.
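
The costing principle described above can be sketched in a few lines: staff cost is the share of time spent on the activity multiplied by the wage rate plus fringe benefits, and only resources that would disappear without the activity are added. All wages, time shares, the fringe rate, and the other cost figures below are hypothetical.

```python
def staff_cost(annual_wage, share_of_time, fringe_rate):
    """Share of time on the activity times wage, grossed up for fringe benefits."""
    return annual_wage * (1 + fringe_rate) * share_of_time

# Hypothetical staff: (annual wage, share of time spent on the VBP activity).
staff = [(80_000, 0.50), (55_000, 0.25), (40_000, 0.10)]
fringe_rate = 0.30  # assumed fringe benefits as a fraction of wages

# Other resources that would disappear without the activity (assumed figures).
other_incremental = {"printing": 4_000, "consulting": 15_000, "computer_time": 2_500}
# Office space is treated as a fixed cost here, so it is deliberately left out.

total_staff = sum(staff_cost(wage, share, fringe_rate) for wage, share in staff)
total_cost = total_staff + sum(other_incremental.values())
print(f"Staff cost: ${total_staff:,.0f}")
print(f"Total VBP activity cost: ${total_cost:,.0f}")
```

A purchaser taking the broader perspective would add provider-level costs as further entries rather than change the calculation.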

Measuring Health Care Costs. If the VBP activity is intended to have broad effects on health care expenditures, competition among health plans, or even employee enrollment decisions, it is reasonable to use premiums as a measure of costs. Premiums are usually easy to measure if contracting health plans are providing a full range of administrative and risk-bearing services. Purchasers that use third-party administrators or other support entities should include the costs for those services in premium costs.

If benefit designs differ between the treatment and comparison groups, the evaluators will have to make adjustments, either directly in the measurement of premiums or in the research design. If direct adjustments are going to be made, they should be based on actuarial assumptions. One type of adjustment in the research design that could control for differences in benefit design would be the inclusion of indicator variables representing various aspects of benefit design as covariates in multivariate analyses. This is only feasible if there are a sufficient number of observations.

In the nonequivalent comparison group design, benefit design differences will only matter if they affect the trends in premiums. Any impact on the level of premiums will be captured by the trend in the comparison group. For example, if the differences in benefit design between the treatment and comparison groups cause premiums to differ by a constant amount, the analysis will control for that difference.
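
A small numerical sketch shows why a constant premium gap drops out. It assumes the comparison is made as a difference in differences (the change in the treatment group minus the change in the comparison group), which is one common way to analyze this design. The premiums and the assumed effect are invented.

```python
def difference_in_differences(treat_before, treat_after, comp_before, comp_after):
    """Change in the treatment group minus change in the comparison group."""
    return (treat_after - treat_before) - (comp_after - comp_before)

# Hypothetical monthly premiums per employee.
comp_before, comp_after = 300.0, 330.0  # comparison group trend: +30
vbp_effect = -12.0                      # assumed true effect of the VBP activity

for benefit_gap in (0.0, 25.0):  # constant premium gap due to richer benefits
    treat_before = comp_before + benefit_gap
    treat_after = comp_after + benefit_gap + vbp_effect
    estimate = difference_in_differences(treat_before, treat_after, comp_before, comp_after)
    print(f"benefit gap {benefit_gap:>5.1f}: estimated effect = {estimate:.1f}")
```

The estimate is the same with or without the 25-dollar gap. If the richer benefit design instead made premiums grow faster, the gap would no longer be constant and the estimate would be biased, which is the case the text warns about.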

In some analyses, evaluators may wish to measure health care costs using data on health care expenditures. Relative to premium data, this has the advantage of allowing detailed analyses of sub-populations or cohorts with particular clinical conditions. However, this approach does not capture any savings in administrative costs at the insurer level or any gains from competition among insurers. Measurement of spending on health care services at the individual level typically requires access to claims data. From the purchaser's perspective, the appropriate cost measure is the amount actually paid for services, including the contribution of the employee/patient. In some cases, such as pharmaceutical rebates, some effort may be required to determine true expenditures. (See Gold et al., 1996, for details on various approaches to measuring health care costs.)

It is important to remember that expenditures may not reflect true costs. Payments may be above or below what the providers actually spend to deliver the service. If the evaluators wish to measure true resource use, they will have to conduct a more detailed accounting of the process of care delivery, identifying resources used to deliver care and valuing those resources. In some settings, this can be done with provider accounting systems. In other cases, charges adjusted by cost-to-charge ratios could be appropriate. Either way, the evaluators must also pay attention to how overhead costs are allocated to various activities and services.

The duration of observation might have important implications for the observed impact of value-based purchasing on medical care costs. Many VBP activities generate some short-term costs. For example, programs to improve compliance with medications might increase short-term expenditures. Some VBP activities, such as asthma management programs, may produce offsetting savings even in the short run. For others, such as diabetes control programs, it may be many years until any medical savings are realized. Because it can take a long time for key gains to be realized, evaluators may want to rely on simulation techniques if they wish to construct a full analysis of the impact of certain VBP activities.

Measuring Costs Outside the Health Care System. Cost measurement from a broad perspective would entail measuring non-health care related resources, costs of informal care, and costs of patient (and family) time related to the consumption of medical care and a change in health status. If these costs are likely to be important, they should be included. However, because these costs are often borne by the patient and family, they may be captured in quality measures such as satisfaction. Gold et al. (1996) describe measurement strategies for these variables, but purchasers may want to measure these variables as costs only if the VBP activity is likely to affect them and they are not adequately captured by quality measures.

Sometimes, purchasers implement limited, focused interventions and then wish to detect an effect on broader, aggregate outcomes. But evaluations must focus on measures that reflect appropriate and relevant outcomes of the VBP activity. For example, suppose a purchaser implements a diabetes case management program that succeeds in reducing expenditures for diabetics by 10 percent. Even if diabetics represent 20 percent of the population, it would be difficult to detect a measurable impact at an aggregate level (e.g., by measuring premiums or population health status), particularly if there is a lot of variance in expenditures for the remaining population. More appropriate measures could include the annual costs of care for diabetics, their satisfaction with care, and complication rates for diabetics.
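
The dilution in the diabetes example can be worked through with simple arithmetic. The 20 percent population share and the 10 percent reduction come from the example above. The assumption that a diabetic costs twice as much per person as other enrollees is purely illustrative, as is the function name.

```python
def aggregate_reduction(pop_share, relative_cost, subgroup_reduction):
    """Fractional drop in total spending when only one subgroup's spending falls.

    relative_cost is the subgroup's per-person spending expressed as a
    multiple of everyone else's per-person spending.
    """
    subgroup_spend = pop_share * relative_cost
    other_spend = (1 - pop_share) * 1.0
    spend_share = subgroup_spend / (subgroup_spend + other_spend)
    return spend_share * subgroup_reduction

# Relative per-person cost of 2.0 is an assumption for illustration only.
overall = aggregate_reduction(pop_share=0.20, relative_cost=2.0, subgroup_reduction=0.10)
print(f"Reduction in total spending: {overall:.1%}")
```

Under that assumption, a 10 percent saving among diabetics moves total spending by only about 3 percent, a change that ordinary year-to-year variation in premiums could easily hide.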

Measuring Impact on Labor Market Outcomes. Among the most difficult costs to measure are the costs associated with decreased labor productivity as a result of employees seeking care or experiencing poor health. This includes the costs associated with absenteeism, decreased productivity while working, and labor turnover. In theory, these costs could be measured by the value of lost production associated with the absenteeism and lost productivity, plus the administrative costs of replacing workers or adjusting production processes. In practice, measuring these costs is a serious challenge. Some evaluators assign a cost to absenteeism by valuing missed days from work at the wage rate of the workers. A more thorough analysis would use accounting principles to assess the impact of absenteeism on production costs.
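
The simpler of the two approaches, valuing missed workdays at the wage rate, might look like the following sketch. The worker groups, day counts, and daily wages are hypothetical.

```python
# Hypothetical groups: (missed workdays per year, daily wage in dollars).
before = {"hourly": (1200, 160.0), "salaried": (300, 320.0)}
after = {"hourly": (1050, 160.0), "salaried": (280, 320.0)}

def absenteeism_cost(groups):
    """Value missed workdays at each group's daily wage rate."""
    return sum(days * wage for days, wage in groups.values())

savings = absenteeism_cost(before) - absenteeism_cost(after)
print(f"Estimated reduction in absenteeism cost: ${savings:,.0f}")
```

A before-and-after difference like this one only describes the change. Whether the VBP activity caused it depends on the research design, and the wage rate may understate or overstate the true effect on production costs.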

In some cases, the evaluators might want to treat variables such as missed workdays as measures of quality. If so, they must be careful not to double-count these variables by also including them in the calculation of costs. Gold et al. (1996) recommend that, if you are using a quality-adjusted life year measure of quality, production costs should be excluded, or at least reported separately. But measures of the impact on production costs can be important variables for many VBP activities, especially from an employer's perspective. If VBP activities may have a measurable impact on these variables, evaluators should try to measure the effects and include them with costs unless the effects are explicitly captured by quality variables.

Measuring Impact on Utilization of Services. If you want to use utilization as an indicator of quality, several measurement options exist. The easiest is to simply measure the use of the target service. For example, one could measure cesarean section rates or mammography rates. Presumably, the VBP initiative would try to decrease the former and increase the latter, but this measurement strategy does not attempt to distinguish between appropriate and inappropriate changes in the rates of either service. The HEDIS® system follows this approach.

An alternative approach would be to conduct a detailed analysis of care, perhaps using medical records. This tends to be very expensive, but it is feasible and has been used in a variety of studies. For some illnesses, there are quality-of-care assessment tools that can be applied.

The utilization measures most commonly used fall into the following categories:

• Inpatient hospital admissions, days, and length of stay.
• Emergency room use.
• Outpatient hospital services.
• Outpatient physician visits.
• Referrals for specialty physician consultation.
• Pharmaceutical utilization.

These measures may be broken down by patient characteristics (e.g., gender, age, race), provider characteristics (e.g., high-volume vs. low-volume hospital), delivery setting (e.g., group vs. solo practice), diagnosis, and procedure.

Purchasers should recognize that savings from reduced resource use do not necessarily flow back to them. For example, if providers get paid full capitation rates, they will capture any savings unless the capitation rates are lowered. Similarly, if hospital admissions are paid on a per case basis, as with diagnosis-related groups, reductions in length of stay will not generate savings for the purchaser.

Measuring Impact on Health-Related Behaviors. Evaluators can assess whether VBP programs encourage low-risk behaviors by measuring changes in the number or percent of employees who engage in these behaviors.

Measuring Impact on Patients' Decisions. To assess whether a VBP activity has affected the choices that patients make, evaluators can look for changes in the percentage of patients or employees choosing providers or health plans that have been identified as "top" or preferred performers on report cards or quality evaluations.

In other cases, when the outcomes of interest are intermediate health outcomes (including process or structure measures), aggregation is more complex. One approach is to not aggregate the various outcomes. The evaluation would report the various outcomes, and users of the research findings would need to draw their own conclusions about whether the investment was justified. This approach is most feasible when the number of outcomes is small and the users are comfortable weighing measures of quality against measures of costs. An alternative aggregation methodology involves combining outcomes into one or more domains of performance based upon subjective values. For more on this topic, go to the discussion of grouping HEDIS® measures.

##### Task 2: Collect the Data

Data may take the form of qualitative information or quantifiable values. They can be obtained from a variety of primary sources (where the evaluator collects the data) and secondary sources (where the data are collected by someone else but used by the evaluator). Regardless of the type or source of data, the quality of the program evaluation will depend on the data's reliability and validity. Since final and intermediate outcomes are the focus of VBP activities, the manner in which you measure these outcomes is crucial for producing credible and useful evaluation results. However, the importance of measurement accuracy applies equally to non-outcome data that are used in evaluations to control for confounding factors.

For the purposes of evaluating VBP activities, common primary sources of data include administrative claims data, medical records, stakeholders (e.g., the health plans involved in guideline development), and health care consumers. Since this guide presumes that purchasers interested in gathering data from primary sources are likely to consult and contract with outside experts, this section is limited to a discussion of two major secondary sources:

A Quick Look at HEDIS®. HEDIS® is a set of about 60 process and outcome measures designed to capture dimensions of health plan quality. Initially developed by a group of large private employers, HEDIS® is now administered by the National Committee for Quality Assurance, Washington, DC. To date, HEDIS® has been used primarily to monitor the performance of HMOs, although research is currently being conducted to examine the feasibility of HEDIS® indicators for PPOs and other types of insurance products. Because each of the approximately 60 performance measures includes specific guidelines for data collection and reporting, the results are standardized. This allows purchasers and others to compare the performance of any health plan to the performance of other health plans nationally, regionally, and locally.

With the exception of the CAHPS® composites that recently became part of the HEDIS® reporting requirements (see more on this below), HEDIS® measures do not capture final outcomes. However, expert panels selected the measures included in HEDIS® because research evidence indicates that they are correlated with both costs and health status. For example, some of the HEDIS® measures capture the utilization rate of health care services and surgical procedures that are often overused, resulting in unnecessary costs and risks to patients. The Cesarean section (C-section) rate is an example of one such measure:

• First, C-sections are more costly than vaginal deliveries.
• Second, the scientific literature suggests that many C-sections are unwarranted (since vaginal deliveries are possible) and that those unwarranted C-sections put women at unnecessary risk for infection and other post-surgical complications (Sakala, 1993).

Other HEDIS® measures capture utilization rates for preventive care services and for screenings that are recommended for subsets of enrolled populations. For example, HEDIS® includes rates of mammography screening for women, prostate cancer screening for men, and immunizations for infants and adolescents. Although preventive care and health screenings do not directly capture any of the final outcomes in which purchasers are interested, they are thought to be correlated with health status and costs since screenings can lead to early detection and less expensive treatment, and prevention can lead to the avoidance of illness. HEDIS® also includes several clinical measures of treatment for selected diseases, such as rates of prescription of beta-blockers following a heart attack or readmission rates following discharge for a mental health diagnosis.

The collection, analysis, and dissemination of HEDIS® data have been a major focus of employers' VBP activities in the last 8 years. More recently, employers have been analyzing HEDIS® results to evaluate the impact of those and other activities. There are many approaches purchasers can use to measure the effects of VBP initiatives on HEDIS® scores. For example, you could examine whether a plan's scores surpass minimally accepted standards, or compare a plan's scores to regional or national averages or the Nation's top performing plans. Another option is to look for changes in a plan's HEDIS® scores from one period to the next.
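The comparisons described above can be sketched in a few lines. The example below is only an illustration: the measure names, scores, and benchmark values are hypothetical, the `compare_scores` helper is not part of HEDIS®, and it assumes that a higher rate is better for every measure shown (which would not hold for a measure such as the C-section rate).

```python
def compare_scores(current, prior, benchmark):
    """For each measure, report the gap to a benchmark and to the prior period."""
    results = {}
    for measure, score in current.items():
        results[measure] = {
            "score": score,
            "vs_benchmark": round(score - benchmark[measure], 1),
            "vs_prior": round(score - prior[measure], 1),
        }
    return results


# Hypothetical rates (percent) for one plan; higher is assumed to be better.
current = {"mammography": 74.0, "beta_blocker": 88.5, "immunization": 65.0}
prior = {"mammography": 70.5, "beta_blocker": 89.0, "immunization": 61.0}
benchmark = {"mammography": 72.0, "beta_blocker": 90.0, "immunization": 66.5}

results = compare_scores(current, prior, benchmark)
for measure, row in results.items():
    print(f"{measure}: {row['score']:.1f} "
          f"(vs. benchmark {row['vs_benchmark']:+.1f}, "
          f"vs. prior period {row['vs_prior']:+.1f})")
```

In this made-up case the plan improves on two measures over time yet still trails the benchmark on two, which shows why the choice of comparison matters when interpreting results.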

However, for the individual purchaser, it is not clear how well HEDIS® serves as a source of data for evaluation purposes. One complication of using HEDIS® to assess the impact of VBP activities is the fact that there are more than 100 rates (i.e., a single measure may include separate rates for men and women or for people of different ages). It is not uncommon for plans to perform well on some rates but not on others, making it difficult to conclude anything about the overall performance of the plan across all rates. (Go to the box for more on this topic.) In addition, because many purchasers collect and analyze HEDIS® data, there is no way to know whether changes in performance were due to the collective focus of all purchasers or the specific activities of a single purchaser.

Additional information about HEDIS® can be found on the NCQA Web site at www.ncqa.org.

The advantage of this approach is that it allows for the many HEDIS® measures to be collapsed into a smaller subset. The disadvantage is that this approach masks the heterogeneity of the individual HEDIS® measures and makes it difficult to identify specific areas for improvement. For more on reporting categories, go to Scanlon et al., 2001; and https://talkingquality.ahrq.gov.
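The tradeoff in grouping measures can be seen in a small sketch. The domains, measure assignments, rates, and equal weighting below are all assumptions made for illustration. They are not an official HEDIS® grouping, and a purchaser could reasonably choose different categories or weights.

```python
from statistics import mean


def domain_summary(rates, domains):
    """Average the rates in each domain and record the spread that the average hides."""
    summary = {}
    for domain, measures in domains.items():
        values = [rates[m] for m in measures]
        summary[domain] = {
            "average": mean(values),
            "spread": max(values) - min(values),
        }
    return summary


# Hypothetical rates (percent) and a hypothetical two-domain grouping.
rates = {
    "mammography": 80.0,
    "prostate_screening": 60.0,
    "immunization": 72.0,
    "beta_blocker": 68.0,
}
domains = {
    "screening": ["mammography", "prostate_screening"],
    "prevention_and_treatment": ["immunization", "beta_blocker"],
}

summary = domain_summary(rates, domains)
for domain, row in summary.items():
    print(f"{domain}: average {row['average']:.1f}, spread {row['spread']:.1f}")
```

Both hypothetical domains average 70, yet one of them hides a 20-point gap between its measures. Reporting the spread alongside the average is one simple way to flag areas that may need a closer look.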

A Quick Look at CAHPS®. CAHPS®, sponsored by AHRQ, is a family of survey instruments designed to obtain consumer assessments of the quality of the health care services they receive. The core survey contains about 50 standard items that focus on multiple dimensions of the care and services provided by health plans, including getting needed care, getting care quickly, doctors' communication skills, courteousness and helpfulness of staff, customer service, and other domains. CAHPS® instruments have been developed and tested for adults and children who are covered by commercial insurers, Medicare, and Medicaid. Supplemental items have also been developed to identify and obtain data on the care provided for children and adults with chronic conditions. Other supplemental sets include items on interpreter services and transportation. Though CAHPS® was originally designed for consumer assessment of health plans, an upcoming version has been developed to obtain consumer assessment of providers within group practices. CAHPS® II (beginning in 2002) will also focus on use of CAHPS® information for quality improvement purposes.

The CAHPS® Survey and Reporting Kit contains complete instructions for implementation of the surveys, templates for reporting results to consumers, instructions for data analysis, and other issues such as presenting CAHPS® results to the media.

Like HEDIS®, the CAHPS® survey includes standardized questions and specific protocols for administering the survey so that each plan's results can be compared to the performance of other plans nationally, regionally, or locally. As noted, the NCQA has incorporated the CAHPS® composite measures into its data reporting requirements for HEDIS®. With the exception of the ratings of care and health plan services, most of the CAHPS® items are not direct measures of the other final outcomes discussed in this guide. However, research findings suggest that most of the CAHPS® measures are correlated with some of these outcomes, most likely health status and labor market outcomes. To the extent that CAHPS® captures the quality and appropriateness of clinical care, for example, the survey results would be correlated with health status. Similarly, since CAHPS® asks for enrollees' opinions about their health care plans, these results may be related to labor market outcomes. For example, if employees report that they are happy with their health care plans, one might expect lower employee turnover, although other factors can also lead to turnover.

Because CAHPS® comprises so many items, the use of CAHPS® for assessing the impact of VBP activities faces barriers similar to those discussed above for HEDIS®. Namely, individual items have to be aggregated in order to be useful. However, the CAHPS® developers have conducted considerable research regarding the appropriate aggregation of CAHPS® measures and issued guidelines for purchasers and others to follow (CAHPS® 2.0, 1999). In addition, because CAHPS® asks plan enrollees about the care and services of their health plans, the results may not be relevant to specific VBP activities that are more provider-oriented. This issue may be resolved by the upcoming introduction of G-CAHPS® (group-level CAHPS®), which focuses on consumers' experiences with physicians and medical practices.

To obtain the CAHPS® Survey and Reporting Kit free of charge, or to learn more about CAHPS®, contact the CAHPS® Survey Users Network (SUN) at 1-800-492-9261 or at http://www.cahps-sun.org. The SUN also provides limited technical assistance.

Task 3: Analyze the Data

For most evaluations, the analyst is not the same individual or group of individuals who made the initial decision to embark on the evaluation. In many cases, the purchaser may wish to contract with external consultants or individuals affiliated with academic institutions to assist in the analysis. This is particularly true for more complex analyses that require statistical expertise and familiarity with methods and software for conducting experimental and observational research. Outside analysts also offer the benefit of objectivity, since they have no stake in the results of their research.

Ideally, the analysts should be involved in all of the steps outlined in this guide, particularly in the choice of research design and issues of data collection and measurement, but in practice this is not always the case. Regardless of whether the analysts have been involved in the development and planning of the evaluation, it is important that they understand the details of the VBP program, the short-term and long-term objectives of the purchaser, and how the purchaser hopes to use the findings so that the analysis will result in information that is germane and useful.

Step 5. Summarize the Results and Interpret Implications for Purchasing Activities

Once the analysis has produced evidence regarding the impact of VBP activities on relevant outcomes, the next step is to ensure that those findings are communicated in a way that is helpful to you, and potentially to the larger community of value-based purchasers and health services researchers. For this to happen, the purchaser must first make sure that the analysts do not simply hand over hundreds of pages of output from regression models. Rather, the analysts should be directed to present senior-level management with a succinct list of key results and findings that are pertinent to the overarching goals and objectives of the organization. This document would be similar in concept to a legal brief or one-page business memo, both of which are designed to facilitate quick and accurate decisionmaking.

The second part of this step is to use these findings to draw out the implications for the VBP activity; this task may be performed by the analysts or by the purchaser. However, in practice, this work is often neglected or forgotten. In some cases, the results of an evaluation never make it to this step because of problems with the research or how the findings have been communicated (e.g., when analysts provide senior-level decisionmakers with information that is voluminous, too confusing to understand, or impossible to sort through). But purchasers need to determine what the results of the analysis mean for the VBP activity: whether it is working, where it is failing, whether and how it can be refined. Ultimately, this is the step where the transition from analysis to decisionmaking occurs, using the results of the VBP evaluation as the bridge.

The final task, of course, is for the purchaser to incorporate the results of the VBP evaluation into decisions. Because all organizations have different structures and processes for making decisions, and because information from the evaluation is just one of many inputs, this guide does not delve into this topic. However, purchasers are strongly encouraged to involve key stakeholder groups in discussing how to interpret and use the results. A key principle of "utilization-focused" evaluation (i.e., an evaluation that is attempting to produce results that will be useful to specific audiences) is that people outside of the evaluation team need to be involved in discussions of draft results and in decisions that derive from those results.

### Selected Resources and Web Sites for Purchasers' Quality Improvement Activities

In addition to this guide, AHRQ has several other resources that may be helpful for purchasers seeking to improve the quality of health care. The Web sites listed below provide more information about these resources.

##### AHRQ Quality Indicators (QIs)

The AHRQ QIs software is a set of measures of health care quality that is designed for use in conjunction with hospital administrative data to highlight potential quality concerns, identify areas that need further study and investigation, and track changes over time. More details on the AHRQ Quality Indicators are available at http://www.qualityindicators.ahrq.gov.

##### CONQUEST

CONQUEST (COmputerized Needs-oriented QUality Measurement Evaluation SysTem) is quality improvement software that draws on two databases—one for clinical performance measures and one for conditions. CONQUEST helps users identify, understand, compare, evaluate, and select measures to assess and improve clinical performance.

##### CAHPS®

CAHPS® is an easy-to-use kit of survey and report tools that provides reliable and valid information to help consumers and purchasers assess and choose among health plans and providers. All CAHPS® products are available from the CAHPS® Survey Users Network at http://www.cahps-sun.org.

##### National CAHPS® Benchmarking Database (NCBD)

Initiated in 1998, the NCBD provides benchmarks to facilitate comparisons across health plans by users of the CAHPS® survey. Users can access the database at http://ncbd.cahps.org.

##### Making Health Care Safer: A Critical Analysis of Patient Safety Practices

This AHRQ evidence report reviews the evidence on a total of 79 patient safety practices. Making Health Care Safer: A Critical Analysis of Patient Safety Practices describes 11 practices that the researchers considered highly proven to work but which are not performed routinely in the Nation's hospitals and nursing homes. The report is available online at http://www.ahrq.gov/clinic/ptsafety/ or in printed format from the AHRQ Publications Clearinghouse.

##### National Guideline Clearinghouse™ (NGC)

The National Guideline Clearinghouse is a comprehensive database that provides objective, detailed information on evidence-based clinical practice guidelines at http://www.guideline.gov.

##### TalkingQuality Web Site

Launched in March 2002, the TalkingQuality Web site provides easy-to-use information on health care quality at https://talkingquality.ahrq.gov. The site is sponsored by AHRQ, CMS, and the U.S. Office of Personnel Management.

