Institute of Medicine (US) Committee on Regional Health Data Networks; Donaldson MS, Lohr KN, editors. Health Data in the Information Age: Use, Disclosure, and Privacy. Washington (DC): National Academies Press (US); 1994.

2 Health Databases and Health Database Organizations: Uses, Benefits, and Concerns

No one engaged in any part of health care delivery or planning today can fail to sense the immense changes on the horizon, even if the silhouettes of those changes, let alone the details, are in dispute. 1 Beyond debate, however, is the need for much more and much better information on use of health care services and on the outcomes of that care. The needs are quite broad: health care reform; evaluation of clinical care and health care delivery; administration of health plans, groups, and facilities; and public health planning.

Policymakers, researchers, health professionals, purchasers, patients, and others continue to be frustrated in their attempts to acquire health information. They may not be able to determine with confidence the outcomes, quality, effectiveness, appropriateness, and costs of care for different segments of the population, for different settings, services, and providers, and for different mechanisms of health care delivery and reimbursement. When this is so, they can say little, with confidence, about the value of the investment in health care for population subgroups, regions, or the nation as a whole.

In principle, this information can be acquired through numerous avenues, such as surveys, electronic financial transactions for health insurance claims, computer-based patient records (CPRs), and disease registries. In practice, no one system will suit every need or produce information appropriate for every question. As introduced in Chapter 1, however, health database organizations (HDOs) hold considerable promise as a reasonably comprehensive source of the information needed to:

assess the health of the public and patterns of illness and injury;

identify unmet regional health needs;

document patterns of health care expenditures on inappropriate, wasteful, or potentially harmful services;

find cost-effective care providers; and

improve the quality of care in hospitals, practitioners' offices, clinics, and various other health care settings.

The latter half of this chapter outlines these and other benefits of HDOs, the databases they access or control, and the analytic and information dissemination activities they undertake. It also discusses the applications that user groups might have for different types of databases. The committee advances some views on how major concerns about these databases, chiefly relating to the quality of their data, might be addressed, and it makes two recommendations. In preparation for those sections, the chapter next offers some definitions of key concepts and terms, explores the basic construct of HDOs (which the committee sees as the administrative and operational structure for regional health databases), and provides some examples of the variety of entities that now exist, were being implemented as this report was written, or are envisioned for the future.

Definitions

Even among experts, terms such as database and network are not used in the same manner. For this report, the committee advances the following working definitions for certain major concepts, building to its view of an HDO.

Database

The term database embraces many different concepts: from paper records maintained by a single practitioner to the vast computerized collections of insurance claims for Medicare beneficiaries; from files of computerized patient encounter forms maintained by health plans to discharge abstract databases of all hospitals in a given state; from cancer and trauma registries maintained by health institutions and researchers to major national health survey data of federal agencies. As commonly used and meant in this report, a database (or, sometimes, data bank, data set, or data file) is "a large collection of data in a computer, organized so that it can be expanded, updated, and retrieved rapidly for various uses" (Webster's New World Dictionary, 2nd ed.).

Although databases may eventually be linked (or linkable) to primary medical records held by health care practitioners, this report addresses databases composed of secondary records. 2 Secondary files are generated from primary records or are separate from any patient encounter (as in the case of eligibility or enrollment files for health plans and public programs). They are not under the control of a practitioner or anyone designated by the practitioner, nor are they under the management of any health institution (e.g., the medical records department of a hospital). Furthermore, they are not intended to be the major source of information about specific patients for the treating physician. Secondary databases facilitate reuse of data that have been gathered for another purpose (e.g., patient care, billing, or research) but that, in new applications, may generate new knowledge.

The committee distinguishes between databases composed of secondary records and CPRs or CPR systems (IOM, 1991a; Ball and Collen, 1992), but its broader vision of computer-based health information systems includes direct ties to CPR systems. Many experts argue that until CPR systems are linked in some fashion to such data repositories or networks, neither will be complete or reach their full health care, research, or policymaking potential. 3

This chapter cites several examples of health databases used today for many purposes, but the ones noted are highly selective and intended to illustrate particular applications or kinds of data maintained. To understand the range of databases that HDOs might access and why there might be concern about protection of personal data, readers are referred to the many inventories of health databases. Publications from the National Association of Health Data Organizations (NAHDO) describe state and insurance databases (NAHDO, 1988, 1993). For databases related to federal programs supported by the Department of Health and Human Services (DHHS), readers can consult publications and manuals from the Health Care Financing Administration (HCFA, for Medicare and Medicaid), the Public Health Service (PHS, for surveys conducted by the National Center for Health Statistics; see also Gable, 1990; IOM/CBASSE, 1992; NCHS, 1993; Smith, 1993), and the Agency for Health Care Policy and Research (AHCPR, for the National Medical Expenditure Surveys and Patient Outcome Research Teams [PORTs]; AHCPR, 1990a). Major research databases include those developed for the RAND Corporation's Health Insurance Experiment (a large-scale social experiment conducted in the late 1970s and early 1980s on the utilization, expenditures, and outcomes effects of different levels of cost sharing [Newhouse and Insurance Experiment Group, 1993]), which were turned into a large number of carefully documented public-use tapes.

Key Attributes of Databases

In reviewing the considerable variation in databases that might be accessed, controlled, or acquired by HDOs, the committee sought a simple way to characterize them by key attributes. It decided on two critical dimensions of databases: comprehensiveness and inclusiveness. (Because these terms are used with distinct meanings in this report, they are italicized whenever used.)

Comprehensiveness. Comprehensiveness describes the completeness of records of patient care events and information relevant to an individual patient (Table 2-1). 4 It refers to the amount of information one has on an individual both for each patient encounter with the health care system and for all of a patient's encounters over time (USDHHS, 1991, refers to this as completeness). A record that is comprehensive contains: demographic data, administrative data, health risks and health status, patient medical history, current management of health conditions, and outcomes data. Each category is described briefly below.

TABLE 2-1

Comprehensiveness: Data Elements as a Critical Dimension of Health Care Databases.

Demographic data consist of facts such as age (or date of birth), gender, race and ethnic origin, marital status, address of residence, names of and other information about immediate family members, and emergency information. Information about employment status (and employer), schooling and education, and some indicator of socioeconomic class might also appear.

Administrative data include facts about health insurance such as eligibility and membership, dual coverage (when relevant), and required copayments and deductibles for a given benefit package. With respect to services provided (e.g., diagnostic tests or outpatient procedures), such data also typically include charges and perhaps amounts paid. Administrative data commonly identify providers with a unique identifier and possibly give additional provider-specific facts; the latter might include kind of practitioner (physician, podiatrist, psychologist), physician specialty, and nature of institution (general or specialty hospital, physician office or clinic, home care agency, nursing home, and so forth).

Health risks and health status. Health risk information reflects behavior and lifestyle (e.g., whether an individual uses tobacco products or engages regularly in strenuous exercise) and facts about family history and genetic factors (e.g., whether an individual has first-degree family members with a specific type of cancer or a propensity for musculoskeletal disease).

Health status (or health-related quality of life), generally reported by individuals themselves, reflects domains of health such as physical functioning, mental and emotional well-being, cognitive functioning, social and role functioning, and perceptions of one's health in the past, present, and future and compared with that of one's peers. Health status and quality-of-life measures are commonly considered outcomes of health care, but evaluators and researchers also need such information to account in their analyses for the mix of patients and the range of severity of health conditions.

Patient medical history involves data on previous medical encounters such as hospital admissions, surgical procedures, pregnancies and live births, and the like; it also includes information on past medical problems and possibly family history or events (e.g., alcoholism or parental divorce). Again, although such facts are significant for good patient care, they may also be important for case-mix and severity adjustment.

Current medical management includes the content of encounter forms or parts of the patient record. Such information might reflect health screening, current health problems and diagnoses, allergies (especially those to medications), diagnostic or therapeutic procedures performed, laboratory tests carried out, medications prescribed, and counseling provided.

Outcomes data encompass a wide choice of measures of the effects of health care and the aftermath of various health problems across a spectrum from death to high levels of functioning and well-being; they can also reflect health care events such as readmission to hospital or unexpected complications or side effects of care. Finally, they often include measures of satisfaction with care. Outcomes assessed weeks or months after health care events, and by means of reports directly from individuals (or family members), are desirable, although these are likely to be the least commonly found in the secondary databases under consideration here.

The more comprehensive the database is, the more current and possibly more sensitive information about individuals is likely to be. This suggests that comprehensiveness as envisioned here will have a direct correlation with concerns about privacy and confidentiality. By analogy, the Department of Defense treats information with increasingly higher levels of security as it becomes more comprehensive, even when the aggregated information is not considered sensitive (Ware, 1993).

Some patient events are unlikely to appear in databases (depending on how they originate); missing from the databases considered here are services that may have been advised but neither sought nor rendered: screening examinations not given, physician follow-up visits not advised or kept, and prescriptions given but not filled. Other reasons for missing data involve out-of-area care for an individual who is otherwise in the database; an example is medical services provided in Florida to New York residents when they are on vacation or living part of the year out of state. Yet another arises when patients do not make claims against health insurance policies for services they receive (regardless of where the services are rendered); such services may not be recorded through any of the usual claims processing mechanisms used to generate the database.

Furthermore, databases may never be sufficiently comprehensive for research or outcomes analysis, especially if the choice of core data elements is parsimonious. Thus, when the question at hand is health status and outcomes long after health care has been rendered, HDO staff or outside researchers may need the capability and authority to contact individuals (providers and possibly patients) for information about outcomes and satisfaction with care. Such outreach activities would require some adequate funding mechanism.

Inclusiveness. Inclusiveness refers to which populations in a geographic area are included in a database. The more inclusive a database, the more it approaches coverage of 100 percent of the population that its developers intend to include. Databases that aim to provide information on the health of the community ought to include an enumeration of all residents of the community (e.g., metropolitan area, state) so that the information accurately reflects the entire population of the region, regardless of insurance category. Conversely, inclusiveness is reduced when membership is restricted to certain subgroups or when individuals expected to be in the database are missing (Table 2-2). For instance, a database that is intended to include all residents in a local area may include only those who are insured and file claims for services; it misses those not insured and those who, although insured, do not use health services. An insurance claims database that does not include members of a health maintenance organization (HMO) because no claims are filed will also not be inclusive for the geographic area.

TABLE 2-2

Inclusiveness: Populations Covered as a Critical Dimension of Health Care Databases.

Databases may be (and often are) designed to include only subsets of the entire population of a geographic area: those eligible for certain kinds of insurance, such as enrollees (subscribers, their spouses and dependents) in commercial insurance plans; persons receiving care from specific kinds of providers or in certain settings (e.g., prehospital emergency care from emergency medical services and hospital emergency departments); persons with a given set of conditions (e.g., a cancer or trauma registry); an age group such as those age 65 and older (e.g., Medicare beneficiary files); 5 residents of a defined geographic area or political jurisdiction or scientifically selected samples of individuals, as in major health surveys. Clearly these categories are not mutually exclusive; individuals (as well as providers) can and do appear in more than one such database. The potential benefits of the database, however, will increase as the database moves toward being inclusive of the entire population of a defined geographic area.

HDOs will have to be clear about what groups are missing when describing their databases and the results of their analyses. Perhaps more important, HDOs should seek ways to ensure that all relevant populations are included, so that their analyses accurately reflect the population of the region and, thereby, yield estimates of the levels of underuse of health care in their respective regions.

Table 2-3 summarizes these two attributes. 6 The dummy matrix, although empty, illustrates how databases can be described, evaluated, and differentiated from each other. Cell a represents patient populations and data elements that are included in a database. Cell b depicts the individuals who are missing from a database that is otherwise fairly comprehensive. Cell c represents patient nonevents and missing data in a database that is otherwise reasonably inclusive. Cell d represents missing individuals and missing data. To the extent cells b, c, and especially d are large, the database in question will be less able to provide extensive, or unbiased, information; the sizes of cells b, c, and d are, therefore, three determinants of database quality.

TABLE 2-3

Characteristics of Databases According to Two Critical Dimensions.
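The four cells described for Table 2-3 can be illustrated with a small calculation. The following Python sketch is only one possible reading of the matrix, not the committee's method: it assumes that inclusiveness is measured by counting people in the intended population and that comprehensiveness is measured by counting the six categories of data listed earlier in this chapter. The function name, the category labels, and the person identifiers are all hypothetical.

```python
from dataclasses import dataclass

# Assumption: the six categories of a comprehensive record named in the text,
# written as hypothetical labels.
REQUIRED_ELEMENTS = frozenset({
    "demographic", "administrative", "health_risks_and_status",
    "medical_history", "current_management", "outcomes",
})


@dataclass(frozen=True)
class CellSizes:
    a: int  # people present, data elements present
    b: int  # people missing, data elements present
    c: int  # people present, data elements missing
    d: int  # people missing, data elements missing


def table_2_3_cells(intended_population, people_in_database,
                    elements_in_database,
                    required_elements=REQUIRED_ELEMENTS):
    """Size each cell in person-by-element units (an illustrative measure)."""
    intended = set(intended_population)
    required = set(required_elements)

    # People outside the intended population do not count toward inclusiveness.
    present_people = len(intended & set(people_in_database))
    missing_people = len(intended) - present_people

    present_elements = len(required & set(elements_in_database))
    missing_elements = len(required) - present_elements

    return CellSizes(
        a=present_people * present_elements,
        b=missing_people * present_elements,
        c=present_people * missing_elements,
        d=missing_people * missing_elements,
    )


cells = table_2_3_cells(
    intended_population={"p1", "p2", "p3", "p4"},
    people_in_database={"p1", "p2", "p3"},
    elements_in_database={"demographic", "administrative"},
)
print(cells)
```

Under this reading, a database improves as cell a grows relative to the total, and cells b, c, and d shrink toward zero.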

Other Characteristics of Databases

The more comprehensive and inclusive databases are, the more they facilitate detailed and sophisticated uses and, in turn, entail both greater anticipated benefits and possible harms. The magnitude of either benefits or harms can depend on several other important properties of databases, however, as noted below.

Linkage over time. The ability to analyze patterns, quality, and costs of care over a period of time may be very important to users. They may want to construct episodes of care or develop other longitudinal profiles; cases in point (respectively) involve all the care provided to a specific patient for a discrete course of illness or injury, regardless of site or setting, and compilations of information on services provided by a local HMO over rolling five-year periods. Such studies require not only unique identifiers for patients and providers (see below) but also a record structure that permits analysts to link dates and times with patient care events, problems, and diagnoses.
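As a rough illustration of why dated events and patient identifiers are both needed, the following Python sketch groups care events into episodes. It is a minimal example under stated assumptions, not a method drawn from this report: the 30-day gap rule, the tuple layout, and the function name are hypothetical choices made only for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def build_episodes(events, max_gap_days=30):
    """Group care events into episodes.

    events: iterable of (patient_id, problem, service_date, site) tuples.
    Assumption: visits for the same patient and problem belong to one
    episode when no more than max_gap_days separate consecutive visits.
    """
    by_key = defaultdict(list)
    for patient_id, problem, service_date, site in events:
        by_key[(patient_id, problem)].append((service_date, site))

    episodes = []
    for (patient_id, problem), visits in by_key.items():
        visits.sort()  # chronological order, regardless of site or setting
        current = [visits[0]]
        for visit in visits[1:]:
            if visit[0] - current[-1][0] <= timedelta(days=max_gap_days):
                current.append(visit)
            else:
                episodes.append((patient_id, problem, current))
                current = [visit]
        episodes.append((patient_id, problem, current))
    return episodes


events = [
    ("A", "fracture", date(1993, 1, 10), "clinic"),
    ("A", "fracture", date(1993, 1, 3), "hospital"),
    ("A", "fracture", date(1993, 6, 1), "clinic"),
]
for episode in build_episodes(events):
    print(episode)
```

Without a consistent patient identifier the grouping key cannot be formed, and without service dates the visits cannot be ordered or split into episodes.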

Timeliness. Facts based on patient-provider interactions and other relevant information (e.g., employment, health plan, health status, or outcomes) should be entered or updated frequently enough to permit their timely use and analysis. If databases are to be of assistance with direct patient care, then information must be sufficiently up to date that caregivers can rely on it in all clinical decision-making situations.

Accuracy and completeness. Data used for clinical care—decision making about a given individual—must be of far greater accuracy and completeness than those required for administrative uses. Databases used for clinical decision making must, in describing an individual, describe only that individual and do so accurately. For instance, missing or out-of-date data or files that commingle data for more than one individual under a single identifier have grave potential for harm. In addition, correcting errors found at a later time must be possible; ideally, alerting past users of the database to those errors and corrections ought to be possible as well.

Control, ownership, and governance. Whether a given database has been established by the public or the private sector (or is some hybrid) will have important implications for inclusiveness and access. For instance, databases addressed in this report may be publicly supported—especially at the state level—and may be operated and administered by a private entity. Some state hospital discharge databases—such as the Health Care Policy Corporation in Iowa and the Massachusetts Health Data Consortium—are of this kind. Alternatively, they may be developed, maintained, and financed wholly in the private sector, such as those developed by professional or health care organizations, insurers, or business coalitions. A database created by state or federal law can require participation; that is, it can demand that health professionals, institutions, and patients participate in providing data. For example, Washington state has passed legislation that mandates development of a statewide data system by a health services commission that will identify a set of health care data elements to be submitted by all providers (e.g., hospitals and physicians) (Engrossed Second Substitute Senate Bill 5304, 1993). To the extent databases are developed and maintained in the public sector or are networked with public-sector databases (especially at the federal level), they will be subject to regulations that differ from those affecting databases operated purely within the private sector for the benefit of private sponsors. Given the evolving nature of state and national health care reform plans and programs, movement toward electronic data interchange (EDI), progress toward CPRs, and emergence of various hybrid arrangements for financing and delivering health care, the development of HDOs is taking place in very different (and perhaps unpredictable) environments that will likely have disparate effects over time.

Origin of data. Databases can vary widely in the source(s) of their information. For example, data may come from hospital discharge abstracts, self-completed questionnaires from patients or survey respondents, insurance claims submissions, employer files, computer-based pharmacy files, CPRs, and other sources.

Hospital discharge abstracts are common sources of publicly held data: 36 states have mandates for the collection, analysis, and dissemination of hospital-level information for prudent purchasing, decision making, education of the public, and rate regulation. Such databases may be maintained by a variety of entities, including: the Department of Insurance (North Carolina), a freestanding health data commission (Iowa and Pennsylvania), a rate-setting commission (Massachusetts), or the Department of Health (Minnesota, New Jersey) (NAHDO, 1993). One well-known model is that of the New York Statewide Planning and Research Cooperative System, called SPARCS, which has been an influential source of information for research on hospital-specific mortality (Hannan et al., 1989b).

An example of a survey database is the Medicare Current Beneficiary Survey, a longitudinal panel survey that the HCFA Office of the Actuary launched in September 1991. Individuals sampled from the Medicare enrolled population are interviewed three times a year. The survey includes demographic and behavioral data, health status and functioning, insurance coverage, financial resources, family support, source of payment, use of Medicare and non-Medicare services, and access and satisfaction. Information from the survey can be linked to Medicare claims and other administrative data.

Person-identified and person-identifiable data. For purposes of this report, person-identified data contain pieces of information or facts that singly or collectively refer to one person and permit positive (or probable) identification of that individual. An obvious piece of identifying information is an individual's name. Other identifiers may be biometric, such as a fingerprint, a retinal print, or a DNA pattern.

The committee uses the term person-identifiable to characterize information that definitely or probably can be said to refer to a specific person. It includes items of information (e.g., the fact of a physician visit on a given day) that will allow identification of an individual when combined with other facts (e.g., zip code of residence, age, or gender). To render data non-person-identifiable, some data managers convert facts to a more general form before releasing those data to others. For instance, date of birth may be converted to age, date of admission to month of admission, or date of physician visit to intervals between visits.
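The three conversions mentioned above (date of birth to age, date of admission to month of admission, and visit dates to intervals between visits) can be sketched in Python as follows. The field names and function names are hypothetical, and the example shows only the generalization step; it does not by itself guarantee that a record is non-person-identifiable.

```python
from datetime import date


def age_in_years(date_of_birth, as_of):
    """Convert a date of birth to completed years of age on a reference date."""
    years = as_of.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (as_of.month, as_of.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def admission_month(admission_date):
    """Convert an exact admission date to year and month only."""
    return admission_date.strftime("%Y-%m")


def visit_intervals(visit_dates):
    """Convert exact visit dates to the number of days between successive visits."""
    ordered = sorted(visit_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def generalize(record, as_of):
    """Return a released copy of a record holding only the generalized facts.

    Assumption: the record is a dict with the hypothetical keys used below.
    """
    return {
        "age": age_in_years(record["date_of_birth"], as_of),
        "admission_month": admission_month(record["admission_date"]),
        "days_between_visits": visit_intervals(record["visit_dates"]),
    }


record = {
    "date_of_birth": date(1950, 6, 15),
    "admission_date": date(1993, 3, 9),
    "visit_dates": [date(1993, 1, 5), date(1993, 1, 19), date(1993, 3, 2)],
}
print(generalize(record, as_of=date(1993, 6, 14)))
```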

Concerns about misuse or improper disclosure of person-identifiable data are likely to escalate as more health information is stored in computer files. Ultimately, protecting patient identity in the commonly understood sense may become very difficult given increasing computer capabilities, creative cross-linkages among data sets, and the usual curiosity of human beings.

Unique, universal person-identifiers. A unique identifier (1) applies to one and only one person and (2) does not change over time. It includes the biometric identifiers noted above as well as numeric or alphanumeric codes. Health insurers, plans, and entitlement programs assign identifiers; among them are the Social Security number (the basis of the health insurance claim numbers used by Medicare) and other alphanumeric codes typically used by Medicaid programs and commercial insurers. Such identifiers may be neither reliably unique nor universal in the sense of linking health databases. Providers also assign identifiers to patients—usually a medical record or account number—but they are not universal, as they are not used beyond that specific provider, and generally they cannot be matched to identifiers assigned by other providers, plans, or programs. The term universal as used here does not apply to identifiers that could link health and nonhealth (e.g., financial) databases.

The extent to which unique and universal identifiers are available for individuals in the database—for instance, all persons in a geographic area, or all users of the health system—may prove to be a critical factor in the utility of that database. They are a prerequisite for the construction of longitudinal records on individual patients that can reflect their health care events and outcomes across sites and time. Ideally, inclusive population-based databases will have unique universal identifiers for all members of the relevant population group, so that nonusers of the health care system can be taken into account in various analytic applications. The need for a universal identifier and the debate about the use of the Social Security number (or its derivatives) for this purpose are discussed in detail in Chapter 4.

Nonvolitional identifying information—for example, fingerprints or retinal prints—may also be important, particularly for HDOs that intend to contribute to direct patient care. These markers allow positive identification of individuals, such as trauma victims, who cannot identify themselves, presuming of course that the data about individuals are in the database. They may also help to ensure that a patient record corresponds to the presenting patient both in delivering patient care and in verifying eligibility for benefits.

Unique identifiers for health care providers and practitioners. This characteristic pertains to individual practitioners, particularly physicians; hospitals and other inpatient or residential facilities or institutions; HMOs as well as independent practice associations (IPAs), preferred provider organizations (PPOs) and similar organized, integrated health systems; and various other providers such as pharmacies (and pharmacy chains) and home health agencies. HCFA assigns a universal physician identification number, or UPIN, for records of care to Medicare patients. Because not all providers see Medicare patients (e.g., pediatricians do not), however, UPINs are not a means of identifying all practicing physicians in the country.

As with patient identifiers, unique identifiers for providers will ideally be consistent over time and used for one and only one individual institution or clinician. Failing that, HDOs will need to find ways to link multiple identifiers (e.g., when a physician belongs to more than one health plan or bills from different addresses with different tax numbers) and to assign individual identifiers to a group using a single number (e.g., when all physicians in an HMO use the HMO's identification number).
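The two linking problems just described can be sketched with a simple crosswalk. The Python example below is a hypothetical illustration: the source-system labels, the identifier values, and the use of a clinician name to resolve a shared group number are all assumptions, and a real HDO would need a far more careful matching procedure.

```python
def build_crosswalk(links):
    """Map each (source_system, source_id) pair to one provider key.

    links: iterable of (source_system, source_id, provider_key) tuples.
    Raises ValueError if one source identifier is linked to two providers,
    since each identifier should refer to one and only one provider.
    """
    crosswalk = {}
    for source_system, source_id, provider_key in links:
        key = (source_system, source_id)
        existing = crosswalk.get(key)
        if existing is not None and existing != provider_key:
            raise ValueError(f"{key} is linked to both {existing} and {provider_key}")
        crosswalk[key] = provider_key
    return crosswalk


def resolve_provider(crosswalk, group_rosters, source_system, source_id,
                     clinician_name=None):
    """Return the provider key for a record, or None if it cannot be resolved.

    group_rosters maps a shared group identifier to a dict of
    clinician name -> provider key (assumption: the record carries a name).
    """
    key = (source_system, source_id)
    if key in crosswalk:
        return crosswalk[key]
    roster = group_rosters.get(key)
    if roster is not None and clinician_name is not None:
        return roster.get(clinician_name)
    return None


# One physician billing under a health plan number and a tax number.
crosswalk = build_crosswalk([
    ("plan_x", "1001", "prov-1"),
    ("tax", "55-1", "prov-1"),
])
# All physicians in one HMO billing under the HMO's number.
group_rosters = {("hmo", "H9"): {"Dr. Lee": "prov-2", "Dr. Ortiz": "prov-3"}}

print(resolve_provider(crosswalk, group_rosters, "tax", "55-1"))
print(resolve_provider(crosswalk, group_rosters, "hmo", "H9", "Dr. Lee"))
```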

Data Network

A data network can be thought of as a set of databases that: (1) are hosted on several computer systems interconnected with one another and to terminals and (2) serve some community of users. Such a network will typically have a number of attributes. First, the databases are dispersed over several machines; each database or group of databases resides on one or more computer systems. Second, the computer systems are often, but not necessarily, physically distant from one another. Third, all the machines in the network are linked so that information can be transmitted from one machine to another. Finally, each machine has software to permit exchange of information among individual systems in the network and, in turn, to allow individual users of the network to query the many databases and to receive, analyze, and aggregate these data. This report focuses on networks in which one or more common data elements (e.g., patient name, provider identity, facility name) is a link parameter that relates records in one database to those in others.
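The role of a link parameter can be shown with a short Python sketch. It assumes two small in-memory databases represented as lists of dictionaries and a hypothetical common element named patient_id; the function name and sample records are invented for illustration and do not describe any actual system.

```python
from collections import defaultdict


def link_records(left, right, link_parameter):
    """Relate records in one database to those in another.

    Returns one merged record for every pair of records that share the
    same value of link_parameter. Records without a match are omitted.
    """
    # Index the right-hand database by the link parameter for fast lookup.
    index = defaultdict(list)
    for row in right:
        index[row[link_parameter]].append(row)

    linked = []
    for row in left:
        for match in index.get(row[link_parameter], []):
            merged = dict(match)
            merged.update(row)  # left-hand values win if field names collide
            linked.append(merged)
    return linked


hospital_discharges = [
    {"patient_id": "p1", "diagnosis": "asthma"},
    {"patient_id": "p3", "diagnosis": "fracture"},
]
pharmacy_files = [
    {"patient_id": "p1", "drug": "albuterol"},
    {"patient_id": "p2", "drug": "insulin"},
]

print(link_records(hospital_discharges, pharmacy_files, "patient_id"))
```

Only the patient who appears in both databases yields a linked record, which is why the reliability of the common data element matters so much.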

Databases in data networks may be linked by various physical or other arrangements. These include telecommunications (e.g., microwave channels, local-area networks, the public-switched network, satellite circuits), physical transfer of magnetic tapes or disks, and dial-up connections. This report is intended to apply to any or all of these mechanisms for linking databases; that is, the term network does not imply here that an electrical connection between computers must be in place (in contrast to the common terminology of computer professionals, for whom network usually does include electrical linkage).

Hospitals, pharmacies, physicians' offices, insurance companies, public program offices, and employers all generate inputs to databases that are interconnected in such networks. This committee, however, is particularly interested in data networks with linked databases that have, at a minimum, two specific characteristics: (1) their linking implies or involves movement of health data outside the care setting in which they have been generated and (2) they include person-identified or person-identifiable data.

Health Database Organization

The Concept of HDOs

The committee chose the phrase health database organization (HDO) to refer to entities that have access to (and possibly control of) databases and that have as their chief mission the public release of data and of results of analyses done on the databases under their control. For purposes of this report, prototypical HDOs have the characteristics outlined in Chapter 1; these properties may not, however, be present to the same degree in existing or emerging HDOs today. As conceptualized by the committee, HDOs have a number of crucial characteristics.

They operate under a single, common authority.

They acquire and maintain information from a wide variety of sources in the health sector—for example, institutions and facilities, agencies and clinics, providers such as pharmacies, and physicians in private practice. They might also obtain information from other sources not directly connected with personal health care, such as the administrative files or databases on persons covered by a specific insurance plan or employed by a given company. In all these cases, HDOs might add and update information periodically (from hourly to annually) or on a case-specific basis (e.g., on all patients with a certain diagnosis or on all providers of a certain type). They put these databases to multiple uses (some of which may not yet be imagined), in contrast to administrative or research databases created to perform specific tasks or to answer only specific questions.

Files accessible to HDOs will include person-identified or person-identifiable data.

HDOs will serve a specific geographic area that is defined chiefly by geographic or political boundaries (e.g., metropolitan area, county, state) and will include those who reside in or receive services in that area, or both.

HDO population files will be inclusive, meaning that they include all members of a defined population—for instance, in a region—so that denominators are known and population-based rates of service utilization and health outcomes can be calculated.
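A minimal sketch of why a known denominator matters is given here: once the size of the defined population is known, a count of services converts directly into a population-based rate. The function name and the counts are hypothetical and serve only as illustration; they are not drawn from any database described in this report.

```python
def rate_per_1000(events: int, population: int) -> float:
    """Events per 1,000 members of the defined population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000.0 * events / population


# Hypothetical counts for one region (illustrative only).
region = {"residents": 250_000, "hospital_discharges": 27_500}

discharge_rate = rate_per_1000(region["hospital_discharges"], region["residents"])
print(f"{discharge_rate:.1f} discharges per 1,000 residents")
```

Without an inclusive population file, the denominator would have to be estimated, and rates from different regions would not be directly comparable.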

The data will be comprehensive in the kinds of data included about individuals and will include not only administrative and clinical information, but also information about health status and satisfaction with care.

HDOs will process, store, analyze, and otherwise manipulate data electronically.

Files held by HDOs can be designed for interactive access in real time to assist with patient care when primary records are unavailable to a treating physician. They are not, however, typically viewed as primary patient records (e.g., a computer-based patient record), and they are not meant to be simply passive archives or warehouses for health information.

For maximum accountability, security, protection, and control over access to data, HDOs should have an organizational structure, a corporate or legal existence, and a physical location; for example, they would have a governing board, a staff, a building, and a mailing address. They would conduct business, articulate a mission statement, promulgate policies, implement procedures, and carry out manipulations and analyses of data, and they could be held accountable for their actions. Assuming these characteristics exist, the committee targets most of its recommendations at such HDOs. Some organizations may develop the functions described above, but not as their primary mission. The committee intends its recommendations to apply to those HDO-like units as well. One might also imagine proprietary programs, systems, or entities with units that function as HDOs and that would be controlled by the same general principles.

Although the committee adopted the simplified construct offered above for its study, it was aware that more complex entities may arise. The variations that may emerge—for instance, bifurcated legal structures that include a network operator and a user organization—may result in consortia of legal entities. To the extent that this trend decentralizes authority and undermines common operating rules, the issues addressed in this report will become far more serious and possibly unresolvable.

The committee examined the repository function of an HDO. In this role, information collected at the level of the patient or about patients, providers, plans, and clinical encounters is accessed, stored, and made available for others, such as providers, researchers, insurers, and planners, to analyze. In some cases HDOs may have additional functions, such as claims transfer and adjudication, but these were not the subject of the committee's work or recommendations.

Throughout its discussions, the committee focused on regional databases and HDOs. In this context, the term regional is meant to suggest that HDOs and their constituent networks and databases pertain to a defined population of individuals living in, or receiving health care in, some specifiable geographic area. These may be city centered, such as the established metropolitan statistical areas that comprise cities and their surrounding counties or suburbs; they may be statewide (and not cross state borders). In some uses, regional conveys the idea of a multistate territory (e.g., the Mountain States or the Mid-Atlantic region), but most of this committee's work has been directed at smaller regions. Far-thinking experts envision a time when regional entities will be linked across the nation, even if their governance and operations remain close to home. This creates the very long-range view of a national health data repository operated by a single organization or a federation of regional or state entities. Especially in the short term, however, HDOs may have overlapping geographic and population boundaries; that is, there might be several in a metropolitan area or within a state's boundaries that include different subpopulations.

The committee elucidated these concepts precisely because regional HDOs are only now emerging in the United States. Some have been legislated, and others are under consideration for legislative mandate by several cities and states, but none is in full operation. The committee believes, however, that such entities will become repositories of an immense array of health information, with holdings far more extensive than those of any of today's data systems. Thus, the issues raised in Chapters 3 and 4 of this report are explored with an eye to the policies and procedures these emerging HDOs might establish today to realize their many potential benefits while protecting against or minimizing possible harms to individuals (whether patients or practitioners), institutions, or society in general.

HDOs Under Development

Described below are several HDOs currently under development that represent the kinds of entities the committee considered during this study. Only selected characteristics of these programs are given, as a means of illustrating specific points that reflect the attributes of prototypical HDOs as defined earlier in this chapter.

Hospital Consortium of Greater Rochester. In existence since the late 1970s, the Rochester Area Hospital Corporation (RAHC) was originally established to enhance cooperative links among the community hospitals and to put community resources to their best use. A recent initiative has led to its reorganization as the Hospital Consortium of Greater Rochester (HCGR) and to the continuing development of a community-wide health information network that has HDO characteristics. Recent community discussions have focused on the creation of a health care commission that will include representatives of the area's eight hospitals, physicians, employers, the two major third-party payers, and residents (Gates, 1993a; personal communication, Beverly Voos, President and CEO, RHI Group, November 1993). While the function of this commission is still being discussed, the initiative could include the use of a database maintained by the Rochester Healthcare Information Group (RHI Group), a wholly owned, for-profit subsidiary.

From 1980 to 1987, RAHC administered an experimental payment program with both state and federal funding. Under this program, it established a community-wide hospital data system and administered an annual global budget using a community database. That database contains demographic, clinical, and financial data on all acute-care discharges from RAHC hospitals from 1980 onward. It has approximately 100 data elements per patient record in the following categories: Social Security number, demographics, clinical information, patient classifications, provider identification, payer data, and resource use data. Reports are provided to HCGR and to member hospitals on an ad hoc basis. Regional and national comparisons can be made using statewide data and the National Hospital Discharge Survey.

The database holds more than a million patient discharge records, beginning with 1980. Reports can provide case-mix analyses (which compare lengths of stay by diagnosis, payer, age, and hospital); trend reports on mortality statistics and readmission statistics by year and by diagnosis; payer analysis (which uses cases by insurers by years to analyze age and length of stay); resource utilization analysis comparing routine daily care and ancillary care; severity-of-illness analysis; market share analysis using zip codes; and physician caseload and hospital case mix.
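A case-mix comparison of the kind mentioned above can be sketched as a grouped average of length of stay. The records, field layout, and function name below are hypothetical; the report does not describe how the RHI Group reports are actually computed.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical discharge records: (diagnosis group, payer, length of stay in days).
discharges = [
    ("pneumonia", "Medicare", 8),
    ("pneumonia", "Medicare", 6),
    ("pneumonia", "commercial", 5),
    ("hip fracture", "Medicare", 12),
    ("hip fracture", "commercial", 9),
    ("hip fracture", "commercial", 11),
]


def mean_stay_by(records, key_index):
    """Average length of stay grouped by one field of each record."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[2])
    return {key: mean(stays) for key, stays in groups.items()}


by_diagnosis = mean_stay_by(discharges, 0)  # grouped by diagnosis
by_payer = mean_stay_by(discharges, 1)      # grouped by payer

for diagnosis, days in by_diagnosis.items():
    print(f"{diagnosis}: {days:.1f} days")
```

The same grouping could be applied to age band or hospital, which is how a single discharge file can support several of the report types listed above.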

The RHI Group database now includes ambulatory surgery and will soon include outpatient clinic visits as well. Other database components include patients awaiting discharge from a hospital to a nursing home and a perinatal database that is under development. Eventually, it is expected that the database will include information from every clinical setting. RHI Group is able to track patient care over time because it has Social Security numbers in the database.
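Tracking care over time depends on a common identifier that appears on every record for the same person. The sketch below groups encounters from different settings by such an identifier and orders them by date. The field name `person_id`, the values, and the function are assumptions for illustration; the report says only that Social Security numbers serve this linking role in the RHI Group database.

```python
from collections import defaultdict
from datetime import date

# Hypothetical encounter records from different care settings. "person_id"
# stands in for whatever common identifier the database carries.
encounters = [
    {"person_id": "A17", "date": date(1992, 3, 2), "setting": "inpatient"},
    {"person_id": "B04", "date": date(1992, 5, 9), "setting": "ambulatory surgery"},
    {"person_id": "A17", "date": date(1992, 1, 15), "setting": "outpatient clinic"},
    {"person_id": "A17", "date": date(1992, 6, 30), "setting": "nursing home"},
]


def build_histories(records):
    """Group encounters by person and order each person's encounters by date."""
    histories = defaultdict(list)
    for record in records:
        histories[record["person_id"]].append(record)
    for person_records in histories.values():
        person_records.sort(key=lambda r: r["date"])
    return dict(histories)


histories = build_histories(encounters)
for rec in histories["A17"]:
    print(rec["date"].isoformat(), rec["setting"])
```

The same property that makes longitudinal tracking possible is what makes such files person-identifiable, which is why the privacy questions taken up in Chapter 4 arise.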

Henry Ford Health System. The Center for Clinical Effectiveness at the Henry Ford Health System in Detroit is developing systems to track patients' long-term functional status six months or more after treatment as well as costs of their care (Gates, 1993b). This focus has provided the impetus for the development of a uniform electronic data collection system. The developers plan to integrate the collection of data from many operational units of the hospitals and sites of care (e.g., ambulatory care physician visits, tumor registries, patient satisfaction surveys) to make data available for a variety of uses within Henry Ford's large integrated health care system, ranging from patient reminders and managed care activities to outcomes research that would be supported by a central data repository. From the standpoint of inclusiveness, such a system would include only patients at Henry Ford sites. That system includes over 400,000 HMO members of the Health Alliance Plan and 920 physicians. It owns four hospitals, operates two nursing homes, and has joint ventures to manage four other hospitals. The Henry Ford Health System is an independent, not-for-profit provider network (Anderson, 1993).

The New York Single-Payer Demonstration Program. New York State is implementing a three-year program to improve administrative efficiency of hospitals and other providers (some free-standing clinics and physicians) by coordinating, automating, and standardizing claims processing, billing, and payment systems. The initiative is not literally a single-payer effort but rather a single-claim demonstration to translate insurance claims and billing forms in whatever format they are submitted and forward them to payers. In terms of inclusiveness, the databases will be statewide and will include patients hospitalized in New York State; with respect to comprehensiveness, they will contain primarily hospital data with the addition of physician and clinic data from billing forms. The state does not plan to maintain a data repository, but the potential exists for it to direct data to such a repository in the future.

The Vermont Health Care Authority. The Vermont Health Care Authority (VHCA) is the creation of 1992 state legislation (Vermont Health Reform Act). It draws on earlier state efforts to share health care information, particularly the Vermont Program for Quality in Health Care, a project that has been under way since the late 1980s (Keller, 1993). The VHCA program will be inclusive (covering all Vermont residents) and comprehensive (all health care services that Vermont residents receive from providers both in state and out of state). The initiative will include a lifetime patient record—essentially a unified health care database—linked to an information repository.

The unified database is to be developed by a subsidiary group, Vermont Health Care Information Consortium, using files of all providers, a uniform insurance claims form, and electronic claims submission. The claims-driven health care database is intended to provide policy-related information such as aggregate levels of expenditures and utilization by sectors; it will include Medicare, Medicaid, Blue Cross and Blue Shield, and other provider or insurer groups (e.g., HMOs).

The information repository, when linked to the lifetime health record, is meant to be an integrated system that improves access, controls costs, gives consumers health care information, and improves quality of care. These outcomes are to be achieved through two proposed mechanisms that are similar to those examined in Chapter 3 of this report: (1) feedback programs to share data on quality and practice patterns with one-third of Vermont's practicing physicians and (2) public disclosure of information about providers.

As of late 1993, the role of state agencies was not yet clear, but governance of the not-for-profit consortium will include a public-private partnership, with representation of state government (the Department of Health and the governor's office), Vermont employers, the Vermont Business Roundtable, Blue Cross and Blue Shield, health care providers (hospitals, physicians, the state's medical school), and consumer and patient advocates. It will have an advisory committee and several subcommittees for activities focused on patient advocacy and confidentiality, a business plan, financial issues, technical concerns, and data elements.

Community Health Management Information System. In the early 1990s the John A. Hartford Foundation launched a program of support for innovative, community-based development efforts to meet the shared information needs of all health system stakeholders at the local level: purchasers, consumers, providers, payers, and regulators. The Hartford initiative has focused on several regions of the country; grantees are located now in the states (or cities) of Iowa, Minnesota, New York, Ohio, Tennessee (Memphis), Vermont, and Washington. The program concept—generally known as the Community Health Management Information System (CHMIS)—has been described by Benton International (BI, 1991a, 1991b, 1992), which developed the CHMIS design and functional specifications that were being adapted by the local sites. 7

The CHMIS is based on two components. The first is a transaction system. To facilitate point-of-service transactions for the patient and to speed claims processing, computer terminals at each provider site will be used to access patient information based on a personal identification code (PIC) and a magnetized card similar to those used by automatic teller machines. At the time of service, patients or clerks will key in a PIC to allow access to eligibility, coverage, and billing information. This approach is comparable to those followed by prescription medication plans that use terminals in pharmacies to confirm that a customer is eligible for plan coverage and to determine what charges should be paid. Electronic switches or clearinghouses process bills and insurance forms from hospitals, physicians, laboratories, pharmacies, and other sites by electronically forwarding claims and encounter information to the insurance carrier, health plan, or third-party administrator.

The second component is a data repository, the main focus of this committee's interest. Certain information about patients, providers, plans, and clinical encounters will be routed to and stored in a data repository (or made available through distributed databases); these data will then be available for use and analyses by providers, payers, purchasers, consumers, regulators, and researchers (see William M. Mercer, Inc., 1993). The repositories are intended to be inclusive—all individuals receiving care in a defined region or state. They are also designed to be comprehensive—including demographic, eligibility, clinical, health risk, and health status information.

As envisioned in the BI specifications for CHMISs, data sources are varied:

Transaction-based information. In fee-for-service systems, encounter data typically associated with insurance claims, such as procedure and diagnostic codes, constitute much of the general data set and will be acquired directly from the provider as part of the claims transaction. In prepaid capitated systems (group and staff model HMOs) that do not normally produce insurance claims forms, special arrangements will be designed to obtain needed data. The system is intended to incorporate Medicare and Medicaid claims data at some point.

Patient satisfaction surveys and health status questionnaires. This information will be obtained from the patient (or possibly a family member in the case of, e.g., minor children), on either a routine or a sample basis. Survey instruments and questionnaires might also contain inquiries about lifestyle and health habits.

Special studies. The general data set will be augmented by specified clinical data acquired from providers to permit researchers and others to conduct special studies of specific health conditions or other topics. The subjects of special studies would likely change over time, but clearly could include matters related to quality of, satisfaction with, access to, and costs of health care in the relevant community or state.

Although not fully operational as this report was being prepared in 1993, the Hartford CHMISs were moving forcefully toward implementation. Two different operating models seem to predominate. One model is based on a state legislative requirement for all providers to send data to a public agency that contracts with the CHMIS operating entity; Washington State and Vermont fall into this group. A second model is voluntary, relying on recognition by providers and payers of the system's benefit to them. In such cases local business or health care purchasing coalitions may require that all providers with whom they contract submit data through the CHMIS; the Memphis Business Group on Health exemplifies this approach.

The Benefits of Health Databases

The gains expected from imaginative but responsible uses of the information held by HDOs accrue not only to various interest groups but also to populations generally, whether in a metropolitan or substate region, a given state, or the nation as a whole. The size of the potential benefits, whether to the community at large or to specific users, is likely to be a function of the comprehensiveness and inclusiveness of the databases—the more comprehensive or inclusive (or both) the more powerful the information will be at every level and for every potential user and use.

Broad-based Benefits

The intent of many database and HDO efforts today is to give regions a way to monitor and improve the value of their health care services and the well-being of their residents. HDOs might achieve this by making available information on access to care, costs, appropriateness, effectiveness, and quality of health care services and providers. HDOs can also contribute to improvements in quality of care by making information available to institutions and groups of practitioners for their use in quality assurance and quality improvement (QA/QI) programs and for regional health planning.

Many HDOs (especially those developed with public funds and by legislative mandate) can be expected to be useful in addressing a wide range of policy questions and in this way they will contribute to the national debate related to health care reform. 8 Regardless of the path of reform efforts, the questions noted below are of special importance, as implied by the brief scenarios that opened Chapter 1. For example:

Access. Are people in a given region receiving appropriate care in a timely manner? Are services equitably available and affordable by all groups in that population? Do access barriers relating to social and cultural factors appear to persist? Does the use of particular types of providers or facilities differ by patient or consumer characteristics?

Costs. Can the rate of increase in aggregate health expenditures be moderated? Can accurate estimates be made of the costs of care in given geographic areas? Can health care delivery and administration be made more efficient? Can administrative costs be reduced? Can cost shifting within the public sector (e.g., between states and the federal government, or from the private to the public sector) be minimized?

Quality of care. Can the provision of health services be organized so as to increase the likelihood of health outcomes that are desired by individual patients? Can information from these databases address three main quality problems: use of inappropriate and unnecessary services, underuse of appropriate and needed services, and poor technical and interpersonal performance? Can clinical and other information in HDO files contribute to more, and better, practice guidelines? Can credible information about more effective and appropriate health care services be made available to clinicians and institutions in a more timely, and less threatening, fashion? Can useful information about the quality and outcomes of care of different kinds of providers be assembled and made available in convenient and prompt ways to consumers and organizers of provider networks and plans (e.g., insurance companies)? 9

Delivery of health services. What services are appropriate and effective for what health care problems? How does the provision of those services vary across geographic areas, population groups, types of providers, settings of care, and time? Can innovative approaches to health care delivery be designed so as to promote the goals of health reform?

Disease incidence and public health. What are the major causes of death, illness, and disability for different groups in the population? How are these patterns changing over time?

Health planning. How might the acquisition, location, operation, and financing of facilities, capital equipment, health personnel, and other resources be made more rational, more affordable, and more responsive to clear community and regional needs?

Differential Benefits as a Function of Users and Uses

Answers to the questions above benefit almost all members of a given population; in that sense, the gains are broad-based. Other benefits of HDOs and their activities will depend on a specific user and use, which are explored more fully in the next section. One aspect of differential benefits should be underscored, however. In today's complicated U.S. health sector, what may profit one party may well work to the detriment of other parties. For example, information that encourages insurers or others to contract only with certain providers in a community, on grounds of either quality or cost, is doubtless of benefit to those insurers and providers, and it does give insurers the opportunity to direct patients toward high-quality providers. Such practices may, however, threaten the financial stability, livelihood, or professional standing of other, noncontracted providers—this is certainly not a benefit for them and may undermine the systems for the delivery of care to other, less-favored patient groups.

The next section briefly identifies users—groups that have a stake in the use of health-related data—and is followed by a discussion of potential uses of HDO databases. Some potential applications of data raise concerns that the report returns to in Chapters 3 and 4. In developing positions and recommendations about the actions that HDOs should take with respect to data (this chapter), public disclosure of health-related information on providers (Chapter 3), and with respect to privacy and confidentiality of person-identified data (Chapter 4), the committee tried to balance the broad-based benefits (and the narrower benefits sought by certain groups) against the possible harms that might be done to individuals or to broader health and social policy goals.

Users of Information in HDOs

As noted, many stakeholders in the health care system will share the general uses and the derivative benefits described above for HDOs. The major users include:

Health care provider organizations and practitioners. Provider organizations include physicians in solo practice and large multispecialty groups; managed care groups such as HMOs, IPAs, and PPOs; free-standing surgery centers and other ambulatory care facilities; institutions such as hospitals and nursing homes; and enterprises such as pharmacies, clinical laboratories, and home health agencies.

Patients, families, and community residents in general. The information in HDO databases may also be valuable for active patients and their families; more generally, it will be useful for residents who, although not patients at a given time, seek information about health care.

Academic and research organizations. Academic and research organizations take many forms: academic health or medical centers affiliated with the nation's public and private universities; schools of medicine, dentistry, nursing, and allied health professions; and schools of public health. Private research institutes also fall into this group. Most of the entities in this category have major patient care and educational responsibilities, but they also carry out much of the health research in this country on issues involving access, effectiveness, utilization, costs, quality, and acceptability of care.

Payers and purchasers. This category of users includes health insurance firms and companies with self-insured health plans that pay for some or all of the health care of their beneficiaries or employees. It also includes managed care companies, third-party payers (TPPs), and third-party administrators (TPAs), who will look to HDOs for assistance in managing standard insurance tasks. Insurers and self-insured employers also administer a variety of retrospective and prospective utilization management and case management programs.

Employers and business or purchaser coalitions. Typically, business coalitions comprise major employers in a given area, many of whom have self-insured health plans; some coalitions may have provider members, but many do not. These groups have been a driving force in developing data networks in several regions of the country. (For example, as the CHMIS models were evolving during the period of this study, they were motivated in large measure by the concerns and interests of business or purchaser coalitions.)

Health agencies. At the federal level, at least three PHS agencies might find HDO databases of considerable value in their daily operations (beyond the clear contributions that such databases would make to outcomes and effectiveness research): the Food and Drug Administration, for postmarketing surveillance and monitoring responsibilities (USDHHS, 1991); the Centers for Disease Control and Prevention, for public health and prevention activities; and the Health Resources and Services Administration, for its Maternal and Child Health block grants and health work force training programs. Other DHHS agencies (such as HCFA, the Administration for Children and Families, and the Administration on Aging) might be added to this list of potential users for similar federal oversight tasks. At the state, county, and municipal level, analogous health departments are likely to be users of HDO information for corresponding purposes; they are also likely to be central to HDO development and operations.

Other potential database users. Other users may well view HDO information as valuable. These include community and consumer organizations; charitable groups and volunteer groups concerned with various diseases; social service agencies; law enforcement agencies at the federal, state, and local level; attorneys; and commercial entities such as direct marketing firms, financial and credit institutions, and bill collection agencies, to list only a few. To the extent that these users seek person-identifiable information, however, the committee takes an extremely negative view toward providing access to HDO files.

Uses of Databases

Without a clear understanding of potential users and their reasons for wanting access to data, HDOs cannot frame or implement sensible policies about a range of operational activities. To provide a background for reaching conclusions about HDOs and for developing recommendations to address the major issues implied by the vignettes that opened Chapter 1, the committee explored the uses of HDO information. Those described below are seen as the most likely in the short term, but as the direction of health reform becomes clearer, new uses and users may arise (e.g., related to health purchasing alliances), and some described below (e.g., insurer roles) may become obsolete. In some cases, uses of HDO data are illustrated by reference to databases held by organizations or public agencies that approach the HDO concept (see Table 2-4).

TABLE 2-4

Examples of Databases in Current Use.

Assessing Access to Care and Use of Services

Assessing access or lack of access to care is critical in evaluating the performance of health care delivery systems and in planning those systems rationally. Understanding the economic, geographic, and transportation barriers to health services, as well as variations in the use of and access to those services, is essential in evaluating the effects of ongoing or changing health care delivery systems. Several recent studies have examined the relationship between insurance, socioeconomic status, and race, on the one hand, and access to and use of health services, on the other (Braveman et al., 1989; Burstin et al., 1992; Patrick et al., 1992; Adler et al., 1993); racial and gender differences in disease incidence and survival have also been examined (Ayanian and Epstein, 1991; Hannan et al., 1991b; Ayanian, 1993; Becker et al., 1993; Whittle et al., 1993). At the level of regions of the country, unmet health needs may be especially significant for minorities or other groups such as pregnant women or poor children; users of HDO information may need to pay special attention to such groups.

Among the better-known work on patterns of utilization is that related to the phenomenon of geographic or small-area variations in the use of medical services, particularly invasive procedures. 10 Much of the landmark research in this area has relied on the administrative and other databases that HCFA maintains for the Medicare program (Table 2-4); these files have been extremely useful for research purposes for more than a decade. Other studies have employed data from state data organizations (e.g., Wennberg and Gittelsohn, 1982). For purposes of tracking use of services, conducting many different kinds of health services and health policy research, and otherwise administering a complex population-based health system, many experts regard the databases maintained by the individual provinces of Canada as models for uses of HDO information (see Table 2-4 for a description of the Manitoba files).

Some uses of HDO data to explore patterns of utilization could raise concerns, however. For example, information that permits third-party payers to devise insurance packages attractive to (or affordable by) only certain groups in the population is clearly of competitive benefit to the companies and the populations they target, but such practices may operate to the disadvantage of the excluded groups. To the extent the latter overlap with the vulnerable populations noted above, many would regard this use of HDO data as undesirable.

Assessing Costs and Identifying Opportunities for Savings

Curbing health care expenditures includes placing global limits on spending and linking fees to changes in the volume of services. For such efforts to be effective and equitable, however, those directing them will have to understand better the geographic variations in services and the reasons for these variations (Welch et al., 1993). Equally significant will be documenting the true economic costs of delivering health care as a means of understanding patterns of health expenditures and, secondarily, the efficiency of different plans and systems of care. To the extent that HDOs acquire reliable and valid information on services rendered and on charges and payments for those services (however questionable the actual relationship between billed charges and true costs), they will be in a position to clarify cost and expenditure issues.

Evaluating Quality and Outcomes of Care

Information about quality of care is important to everyone, whether for choosing a source of care, designing a health plan, building a malpractice case, or trying to improve care, and this committee gave quality assurance and improvement issues special attention in its deliberations. Physicians, institutions, and others who deliver direct patient care and insurers who establish their own provider networks will need to carry out QA/QI activities. It can be argued that quality issues will have even greater visibility if certain approaches to health care reform gain prominence (those premised on managed competition) because of the heavy reliance that will be placed on the availability of credible quality-of-care information to consumers, purchasers, and regulators. As noted above with respect to access, the work that HDOs might do or support on quality of care must take disadvantaged, at-risk, and vulnerable populations more directly into account (Lohr et al., 1993).

If information available from HDOs is reliable and amenable to diagnosis-specific analyses and if it can be aggregated by physician, institution, and the like, then it may prove more useful for these purposes than current regulatory approaches to quality assurance. Of special interest to insurers (and policymakers in implementing health reform) is the potential for HDOs to use aggregate data to provide clinical practice benchmarks or norms. Such norms allow insurers (or others) to compare the practices and outcomes of a given provider with those of similar providers.

Hospital-specific Mortality Rate Studies

Hospital-specific mortality rate studies have been an early focus of quality-of-care studies using large databases. Much of this work began with HCFA's release of such information in the mid-1980s, and a steady stream of reports (produced annually until 1993 by HCFA) from numerous teams of investigators has appeared since that time. 11 Statewide databases have also been used for research projects on mortality rates following open heart surgery. One example comes from the New York State Cardiac Surgery Reporting System (Hannan et al., 1989b, 1990, 1991a; Zinman, 1991; see also Chapter 3 of this report) and another from the Pennsylvania Health Care Cost Containment Council, whose report listed hospital charges and risk-adjusted mortality rates for coronary artery bypass graft (CABG) surgery for 35 Pennsylvania hospitals and 170 cardiac surgeons (PHCCCC, 1992).

A painstaking evaluation of the impact of the Diagnosis-related Group Prospective Payment System (DRG-based PPS) in Medicare also relied on Medicare files for critical data on patient outcomes (Kahn et al., 1992; Keeler et al., 1992a). Because the thrust of this work relates to quality of care, this report returns to such studies in Chapter 3.

Effectiveness and Outcomes Research

Yet another critical research area involves the effectiveness and outcomes of health care, which some call the clinical evaluative sciences. Understanding effectiveness involves evaluation of the utility and appropriateness of health care in everyday settings with so-called average patients and usual providers (Brook and Lohr, 1985). William Roper, M.D., the former HCFA administrator, coined the phrase "what works in the practice of medicine" to characterize the questions that researchers in medical effectiveness might address (Roper et al., 1988). At about the same time, the National Center for Health Services Research (now AHCPR) began an ambitious research program whose grantees are known as Patient Outcomes Research Teams (PORTs) (AHCPR, 1990a, 1990b; Raskin and Maklan, 1991). A dozen or more PORTs are under way at any one time in this country, and all of those concerned with health problems of the elderly rely heavily on Medicare files.

Although these studies often involve rigorous design and statistical methods, by intent they are usually not randomized trials that require massive primary data collection—hence the attractiveness of information on health care that already has been collected and stored in data files. Information in these databases can, nonetheless, contribute to classic randomized, controlled trials—for example, by providing indications of the epidemiology of disease or treatment patterns or, in some circumstances, serving as a means of designing a sampling frame for the study. 12 As a case in point: two articles that appeared as this report was being prepared examined the utility of surgery for prostatic cancer (Fleming et al., 1993; Lu-Yao et al., 1993). One paper relied on Medicare's claims system to estimate the risk of radical prostatectomy; the other examined time trends and geographic variations in prostate cancer diagnosis. Both can be traced to earlier analyses on variations in the use of transurethral prostatectomy for benign prostatic hypertrophy, which had been based on data contained in large-scale databases (see the citations in footnote 10). The committee thus placed great emphasis on the need to expand such uses of health databases—including those expected to be assembled by HDOs—to address the myriad health services research questions that now confront this nation. A recent report from AHCPR provides a useful compilation of automated data sources and a literature review for ambulatory care effectiveness research as well as a description of automated ambulatory record systems (USDHHS, 1993a).

Quality Assurance and Quality Improvement Programs

Providers will find information in HDO databases of particular value for QA/QI programs. In a health care environment emphasizing competition—one that may heavily regulate prices and other economic factors and disallow preexisting condition clauses, biased risk selection by insurers, and similar cost-shifting tactics—competition on the basis of quality of care may become far more prominent (AMPRA, 1993; IOM, 1993c; Palmer and Adams, 1993; Tillmann and Sullivan, 1993). Provider groups have a clear incentive to implement meaningful QA/QI efforts as a means of doing as well as possible in comparative analyses. Some internal efforts may involve recruiting high-quality staff or dismissing poorly performing staff; other elements involve improving performance across the board. Thus, these databases may offer provider groups help for strategic planning, marketing, and competing in local health markets; these benefits presuppose that providers choose to act on the information that they can glean directly from the database or that they are furnished as part of an external quality-review program.

In California, for example, all nonfederal acute-care hospitals submit discharge abstracts to the state's Office of Statewide Health Planning and Development. Among the data elements available for analysis are age, sex, presence of chronic conditions, dates of admission, surgery, and procedures in addition to a primary operation; information dates back to at least 1983. Using data from this file, Luft and Romano (1993) reported on two sets of CABG-related analyses: (1) describing general patterns of CABG use and risk-adjusted outcomes over a seven-year period and (2) identifying hospitals with significantly or consistently higher or lower death rates after CABG than would be expected. These kinds of analyses are commonly done with hospital discharge abstract databases; often, however, they are subject to considerable criticism, especially because of the inadequacy of information to permit adjustment for patient risk factors, such as ejection fraction or previous CABG. Given the constraints of the database that Luft and Romano used, their study was considered exemplary because of their sophisticated approach to measuring outcome and performing statistical analyses (Chassin, 1993b).
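As a rough illustration of what "higher or lower death rates than would be expected" can mean in practice, the following minimal sketch compares observed deaths with expected deaths by indirect standardization. It is not the method of Luft and Romano (1993); the records, the two risk strata, and the function names are hypothetical and exist only to show the arithmetic.

```python
from collections import defaultdict

# Hypothetical discharge abstracts: (hospital, risk_stratum, died).
# Real analyses would use many more records and richer risk factors.
discharges = [
    ("A", "low", False), ("A", "low", False), ("A", "high", True), ("A", "high", False),
    ("B", "low", False), ("B", "low", True), ("B", "high", True), ("B", "high", True),
]


def stratum_death_rates(records):
    """Return the pooled death rate for each risk stratum across all hospitals."""
    deaths = defaultdict(int)
    cases = defaultdict(int)
    for _, stratum, died in records:
        cases[stratum] += 1
        deaths[stratum] += int(died)
    return {stratum: deaths[stratum] / cases[stratum] for stratum in cases}


def observed_expected(records):
    """Return {hospital: (observed deaths, expected deaths)}.

    Expected deaths are the sum, over a hospital's patients, of the pooled
    death rate for each patient's risk stratum.
    """
    rates = stratum_death_rates(records)
    observed = defaultdict(int)
    expected = defaultdict(float)
    for hospital, stratum, died in records:
        observed[hospital] += int(died)
        expected[hospital] += rates[stratum]
    return {hospital: (observed[hospital], expected[hospital]) for hospital in expected}


if __name__ == "__main__":
    for hospital, (obs, exp) in sorted(observed_expected(discharges).items()):
        print(hospital, obs, exp, obs / exp)
```

A ratio of observed to expected deaths above 1 suggests more deaths than the case mix would predict, and a ratio below 1 suggests fewer. With so few patient characteristics available for adjustment, such ratios are open to exactly the criticism noted above.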

Another example of quality-of-care applications of databases is the quantitative and qualitative work done for the IOM report on the end-stage renal disease (ESRD) program (IOM, 1991b). These analyses drew on information in one or more of three major ESRD data systems: the ESRD Program Management and Medical Information System, administered by HCFA; the United States Renal Data System, administered by the National Institute of Diabetes and Digestive and Kidney Diseases; and the United Network for Organ Sharing data system, a part of the Southeast Organ Procurement Foundation.

In short, for well more than a decade researchers have employed large databases, particularly those of the Medicare program, for studies that today have significant bearing on our understanding of the quality of health services in terms of both processes and outcomes of care. Apart from their intrinsic worth and findings, such studies have generated important hypotheses, the exploration of which promises to yield further, and considerable, social benefit.

Planning and Monitoring Patient Care

Health care practitioners will be able to use the information in HDO databases in many patient care responsibilities. Examples of such applications include checking patients' allergies to medication, obtaining patient histories at the time of patient-practitioner encounters, planning the management of complex cases, and fostering better communication among all providers rendering care to an individual patient and between clinicians and ancillary personnel.

Descriptive information derived from such databases may enable primary care physicians (and their support staffs) to conduct outreach and health promotion activities; such tasks might involve identifying individuals who should receive periodic screening tests and providing up-to-date immunization records for school enrollment. Chronically ill patients and their families could use HDO data files to help maintain family health records (e.g., logs of visits to a specialist or admissions to a hospital) or to create a running sum of out-of-pocket expenses for office-based care, medications, or hospitalizations.

HDOs with prescription databases might enable physicians, pharmacists, and others to track prescribed medications and to report adverse drug reactions more readily. Their databases might also be used to identify medication abuse by patients who obtain a pharmaceutical agent from multiple providers or to detect medications prescribed by different providers whose interactions could cause adverse reactions or reduce their effectiveness.
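A minimal sketch of the multiple-provider check described above might look as follows. The record layout, the identifiers, and the threshold of three distinct prescribers are all assumptions made for illustration; they are not drawn from any actual HDO system, and a flag of this kind would at most prompt review by a clinician or pharmacist.

```python
from collections import defaultdict

# Hypothetical prescription records: (patient_id, drug, prescriber_id).
prescriptions = [
    ("p1", "oxycodone", "dr_a"),
    ("p1", "oxycodone", "dr_b"),
    ("p1", "oxycodone", "dr_c"),
    ("p2", "oxycodone", "dr_a"),
    ("p2", "amoxicillin", "dr_b"),
    ("p3", "diazepam", "dr_a"),
    ("p3", "diazepam", "dr_a"),
]


def multi_prescriber_flags(records, min_prescribers=3):
    """Return sorted (patient, drug) pairs obtained from at least
    `min_prescribers` distinct prescribers (an assumed review threshold)."""
    prescribers = defaultdict(set)
    for patient, drug, prescriber in records:
        prescribers[(patient, drug)].add(prescriber)
    return sorted(
        key for key, who in prescribers.items() if len(who) >= min_prescribers
    )


if __name__ == "__main__":
    print(multi_prescriber_flags(prescriptions))
```

Repeat prescriptions from the same prescriber are counted once, so only the number of distinct sources matters here.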

Even for those who are not patients or relatives of patients at any given time, HDO databases may be of value. Descriptive information on primary care physicians, specialists, other caregivers, and health plans (e.g., location, special aspects of a practice or facility, or usual charges) may be of great help when individuals plan to seek care for new problems or in new locales. Health education materials that could be developed as an adjunct to HDO activity, such as guidelines for preventive care, nutrition, or available community resources, can be a valuable resource for residents as well.

Many large employers have become deeply involved in managed care, case management of high-risk or high-cost patients, and workplace health promotion efforts. They are likely to use HDO information in all these kinds of programs for their employees, employees' dependents, and (possibly) retirees. Government is a significant employer at the federal, state, and local levels, and it has as much interest in good health and good decision making among civil servants as the private sector has for its work force. Thus, the uses that corporate (or small) employers might have for HDO information apply equally to the public sector.

Less obvious reasons for seeking access to health databases can be imagined. For example, adopted children may wish to obtain information on certain health or genetic characteristics of their birth (natural) parents, as a means of making health-related decisions of their own; such requests might be brought through third parties as a way of protecting the privacy of the parents or the adopted child, or both. Although this example may seem remote today, uncommon uses of these databases—and the considerable ethical, psychological, political, and practical ramifications they may have—ought to be contemplated in advance as policies about access, privacy, and disclosure are set in motion.

Enhancing Administrative Efficiency

One major goal of health care reform is to make health insurance claims processing and financial transactions more efficient. All health insurers have databases derived from several sources. These databases have traditionally served as mechanisms for eligibility verification, provider reporting (for tax purposes), and claims adjudication. When their data flow in an electronic transaction (i.e., EDI) system, the efficiency related to billing, reimbursement, claims tracking, remittance reconciliation, and similar business matters can be very high; the Workgroup for Electronic Data Interchange, in its report to the Secretary of DHHS (WEDI, 1992), estimated benefits, savings, and strategies for implementing EDI in the coming years.

Analogously, when HDOs build their repositories through such electronic systems, they may be able to support nearly instantaneous verification of insurance plan eligibility and covered benefits, 13 facilitate claims submission, and eliminate time-consuming, costly paperwork. The more closely these databases approximate the medical record, the more exact reimbursement strategies will become and the less time will need to be spent on record requests and appeals.

Operating Managed Care Programs

Providers or others may find HDO databases helpful in identifying likely high-cost patients who would benefit from case management and in streamlining precertification tasks. Case management and precertification customarily call for case-by-case decision making—whether an individual's care for severe mental illness will be reimbursed if given in the inpatient setting or whether services for a high-risk pregnancy will be organized through a case manager.

TPPs and TPAs have begun to apply medical logic programs to augment their precertification programs with what might be called intelligent adjudication—that is, decision making that takes into account historical medical information. Given appropriate and consistent use of standardized coding rules, electronic (as opposed to telephone- or paper-based) precertification systems can simplify and speed decision making for patients, physicians, TPPs, and TPAs. HDO data may be useful for profiling services received by patients who are subject to precertification programs and for portraying prevailing practices in a locale, thereby contributing to the construction of such logic programs.

In these ways, HDO data could be brought to bear to improve patient care and to minimize the frustrating, inequitable, or idiosyncratic features of present-day utilization review and case management. Looking well into the future, some experts hypothesize that information in large-scale databases will significantly change utilization management processes and, indeed, even obviate the need for utilization management as it is conducted today. This might happen as better epidemiologic data and the methods of artificial intelligence make it possible to create case-management protocols based on complex logic trees that take account of far more patient, clinical, and other variables than is possible today.

Strategic Planning and Selective Contracting

HDO data can facilitate a range of long-term planning, business, and financial management tasks that insurers, employers, and providers face. Such information can also be applied in the selective contracting activities that are becoming increasingly common.

Strategic Planning

With respect to long-range strategies, TPPs and TPAs might be able to improve their underwriting and benefits design through analysis of HDO data; for instance, such analyses might enable them to set premiums more accurately or to establish benefit packages. However, the committee hopes that TPPs and TPAs would not use HDO databases for selective underwriting that further fragments the risk pool. More broadly, payers and purchasers might be able to determine the risks they face with respect to future demand for health care more accurately from such databases than they could in the past. If HDO databases are reasonably inclusive, TPPs and TPAs might then be able to understand better how such demand might vary by geographic area or population group and how it might change over time.

Payers are likely to use HDO information in strategic planning for more than just the health insurance portion of their business. For instance, some health insurers may be part of conglomerates that offer life, disability, workers' compensation, and other forms of insurance. In theory, HDOs might provide information on individuals, or groups in a geographic area, that would be of considerable interest to those managing other activities of an insurance company. Such data might be helpful in devising nonhealth insurance packages that are attractive (or not attractive, as the case may be) to certain individuals or populations in those locales.

Providers may seek to use HDO databases for many reasons: to project market share when considering mergers with other facilities, to select sites for satellite clinics, to establish ambulatory surgery centers, to acquire group practices, and in other ways to plan future activities with financial implications. Some groups may wish to acquire competitive intelligence in order to set charges for their services, so that they can consider whether to lower or raise their charges for some or all purchasers or patients. Although such activities might well raise antitrust issues, addressing such questions was beyond the expertise of the committee or the scope of the study.

Selective Contracting

To develop provider networks—systems of practitioners (including physicians, dentists, optometrists, and psychologists), ancillary facilities (e.g., clinical laboratories or short-term substance abuse programs), hospitals and nursing homes, and agencies that deliver home health or home-based hospice care—that can compete effectively in the health sector today, insurers need more specific, accurate, and detailed data that will permit them to contract selectively with such providers. Selective contracting in this context means not only identifying practitioners and providers that can deliver high-quality care within some acceptable cost norms, but also recognizing those that cannot; in either case, it implies that providers will perform in accordance with responsible practice guidelines and protocols. In addition, providers might use measures of normative behavior to determine the standards of quality each provider or plan might expect of its practitioners.

Thus, HDO information might be applied in retrospective profiling of provider-related information as a means of identifying providers that might be brought into (or kept out of) selective contracting arrangements; it can also be employed to monitor performance over time. One of the chief aims of HDOs emerging today is, in fact, to aid payer or other groups in this process by enabling them to develop (or acquire) analyses of practice patterns, outcomes, costs, and similar variables that will permit them to make decisions about the providers their systems will include or exclude.

Other Business-related Uses

Nearly three-quarters of employers with 1,000 or more employees manage self-insured health plans (Foster Higgins, 1991, in IOM, 1993e). These firms can, for all practical purposes, be regarded as conducting (or being responsible for) the same tasks as insurers, as discussed above. 14 Health databases or HDOs based chiefly on the sponsorship of employers will thus, in theory, offer considerable advantages in benefits planning, selective contracting, monitoring provider network performance, and similar activities relating to management of employment-based health insurance coverage. For example, employers may want to create case-management plans that are increasingly directive and oriented toward exclusive provider organization (EPO) arrangements, like those that are common today for high-technology therapies such as transplantation and cardiac surgery.

Employers might also wish to use the information in some databases for personnel actions—promotions, relocations, dismissals, and the like. 15 As will be noted later in this report, serious concerns must be addressed about misuse of data in general and with respect to possible violation of the Americans with Disabilities Act (ADA), for instance. Nevertheless, the connections between workplace wellness and personnel actions are clear.

Tracking Injury and Illness, Preventive Care, and Health Behaviors

Those studying and having responsibility for public health efforts can be expected to use HDO databases for a broad set of applications. These include analyses of the incidence of injury and disease and studies of the prevalence of trauma-related health problems and chronic illness. Today, disease and injury registries provide information on traumatic events, episodes of illness, and the processes and outcomes of care that exemplify what might be done with HDO data in the future. For instance, cancer registry data from two states, Illinois and Washington, have been used to address a range of questions (Hand et al., 1991; Lasovich et al., 1991), including underuse of services in the hospitals studied. One study focused analyses on the percentage of early-stage breast cancer patients who do not receive indicated axillary lymph node dissection or assays for hormone receptors; in another, the study question was the percentage of women receiving breast-conserving surgery who did not receive indicated radiation therapy. In commenting on these studies, Chassin (1991) notes that they suggest problems "in the extent to which physicians fail to communicate options and outcomes data objectively" (p. 3473) and advocates routine feedback of these kinds of data to hospitals. With respect to injury, the American College of Surgeons National Trauma Registry is another example of a database that provides information on patterns of injury and their outcomes (see Table 2-4); for those concerned with emergency medical services, such sources of epidemiologic and clinical information are critical (IOM, 1993d).

Other public health applications of HDO databases relate to preventive care and health behaviors. For some industries, for instance, epidemiologic information from large databases may enable analysts to identify potential safety or health-related problems in workplace environments and to suggest corrective steps. Immunization tracking systems, currently under development regionally and nationally, might be incorporated into HDO databases to simplify monitoring and recording of children's immunization status both in aggregate and individually. HDOs might also maintain information about blood type, organ donors, and tissue matching in their databases, as a means of fostering improved blood banking and organ procurement and transplant services.

Promoting Regional and Community Health Planning, Education, and Outreach

Health Planning and Education

When HDO databases are statewide, or sponsored by state health departments, the potential uses by states and all subordinate levels of government for health planning, health care delivery, public health, and administrative responsibilities become quite extensive; they can involve the health departments and social services agencies of states, counties, and municipalities in many overlapping efforts. Planning and educational activities that could employ HDO data might be focused on improving access to, reducing costs of, and enhancing quality of care; on organizing provider systems of care; or on investigating epidemiologic patterns of injury or illness. For example, community-specific studies conducted using HDO data might examine the kinds of cases treated by local hospital emergency departments, whether use differs by hospital or patient characteristics, and whether patient outcomes differ accordingly. Such information might enable public agencies to target public funds or other resources in new ways to meet previously undetected problems or needs.

Integrating data on vital statistics, epidemiologic surveillance, and local and regional public health programs with those in the personal-health-care files of HDOs raises the possibility of more effective public health activities for monitoring health, attaining public health objectives at a population level, and targeting efforts for hard-to-reach individuals. For example, researchers in Boston have developed and operationalized a distributed health record system for a homeless population seen at many sites by many different providers (Chueh and Barnett, 1994).

Community Outreach

In addition to whatever public-sector agencies might do to monitor the public health of communities, community and consumer organizations may wish also to carry out population-based studies as a means of learning where significant health problems exist and of making elected officials and others more accountable for solving those problems. Another significant way that information held by HDOs may contribute to the work of community, voluntary, and consumer groups is in their public education and outreach programs. Here the data may suggest emerging problems that warrant increased attention (or waning problems that need reduced effort); data may also indicate where (in geographic areas or population subgroups) education initiatives might best be targeted. For example, recognition that bicycle accidents are a major source of children's head injuries could lead to community education programs in schools and neighborhood associations. Public-sector agencies, academic centers, or consumer groups might pursue such public health efforts by analyzing HDO data and developing community-specific informational materials (e.g., public information brochures on sources of care for special problems).

Charitable groups and voluntary organizations concerned with particular diseases and conditions have many roles: providing information to and support for patients with particular illnesses and for their families; sponsoring research; and lobbying for more policy attention, social acceptance, and research support for the problem. Because they are likely to be private organizations that secure their funds through donations from individuals and corporations, most must engage in aggressive fund-raising campaigns. Information from health data banks might enable them to increase their efficiency in amassing epidemiologic information and perhaps in targeting fund-raising efforts.

Other Uses for HDO Databases

The IOM committee identified a great many other potential users and uses of HDO databases, including agencies engaged in law enforcement at the federal, state, and local levels; law firms and attorneys; and various commercial entities. The more plausible are briefly described here.

Law enforcement officials can be expected to find many uses for the information held in HDO files. They may wish to trace individuals (for instance, to locate parents not paying child support). They may also need to investigate alleged illegal acts; in the health context, this might extend to abuse of illegal substances or cases of possible child abuse. Conceivably, law enforcement agencies might want genetic information to assist them in identification of a suspect. Finally, such agencies may be expected to monitor providers and patients for possible fraud.

Arguably, attorneys and law firms might identify many uses for HDO data, including malpractice litigation. Plaintiffs' lawyers, for instance, might try to access information from HDOs concerning previous quality-of-care deficiencies of a physician or hospital; defendants' counsel might seek to demonstrate, through analysis of HDO data, that the provider acted well within community standards. One important application occurs in cases where the past or current health condition of the patient is relevant to the case or is at issue in the case. Product safety litigation may also call forth requests for data from the network, especially when a medical device is in question. Finally, attorneys representing health plans, insurers, medical groups, hospitals, and other providers in their business (e.g., financial) concerns may find information contained in the databases of use in advising their clients about risk management, taxes, financing, and similar matters.

A wide array of other kinds of companies, organizations, and services might well have an interest in the information available through HDOs. Among them are direct marketing firms, financial and credit institutions, and bill collection agencies. Such entities (especially the last named) might wish to have person-identified information, but in general many applications of the information might not be directed at patients but rather at providers or at groupings such as zip codes. Financial and credit institutions might be interested in health plan and hospital data to determine market share or estimate solvency for a given group practice or facility. In general, this committee takes an extremely negative view toward giving these groups access to HDO files, particularly any data that might conceivably identify individual persons, and thus these uses are not explored further here.

Comment

The committee emphasizes that its roster of users includes examples of current as well as potential HDO database users; 16 it does not believe that HDOs necessarily ought to satisfy all such claimants. It does acknowledge, however, that the mere existence of a database creates new demands for access and new users and uses. Consequently, those who establish health databases and HDOs may be creating something for which the end uses cannot always be anticipated.

Because this study took place at a time of change in both health care infrastructure and information systems, the committee tried to anticipate the probable sources of the tension that will exist between those who create databases and wish to protect the information and those who might argue for access to those databases on grounds of anticipated benefits. Historically, the creation of large databases, such as those to administer the Social Security program and the National Crime Information Network, has been followed by modifications in the databases themselves and in the policies and legislation that regulate access to them—which results more often than not in relaxing prohibitions or barriers to access. Realism dictates that large databases such as those maintained by HDOs will be dynamic. In the committee's view, policies regarding access to these databases should, therefore, be based on firm principles but flexible enough to accommodate unavoidable changes and unanticipated uses.

The benefits of electronic patient records should not be overlooked, however. These benefits include the availability of much more powerful databases, elimination of the need for repeated requests to record subjects for the same information, and assurance that information is available when needed. Despite the privacy concerns described, it should be possible to improve privacy protection and safeguard the confidentiality of health information in HDOs through a variety of methods described in later chapters.

Moreover, information must be acted on by individuals in a position to change their own, and others', behaviors and performance. Most experts agree that getting information to people and organizations is just the first, and perhaps not the most important, step in the change process. Although this committee (in Chapter 3) places great store on information dissemination efforts by HDOs, HDOs will not be well placed to follow up on the actions taken (or not taken) by recipients of that information.

Many of the challenges faced by the health care sector are essentially exogenous—for instance, the changing demographics of the U.S. population, problems of international competition in the manufacturing and information-services sectors, and increasing disintegration of social and familial structures. No amount of radical change in health care, let alone tinkering, will demonstrably affect those problems, and HDOs similarly cannot influence them.

Further, despite the promise that HDOs hold for addressing certain health policy issues, this committee emphasizes that information derived from the files of HDOs and similar entities will not be the solution to all the ills of the health care system. Information may be incomplete or untimely, lack critical variables such as health status, or otherwise be imperfect. In addition, such data may be observational, meaning that they lend themselves more to description than to causal or inferential analysis, and more to retrospective commentary than to prediction. In the terminology used earlier, HDOs and their constituent databases may be neither acceptably comprehensive nor inclusive.

Commentary on a related information activity is instructive. In 1992 the IOM, in conjunction with the Commission on Behavioral and Social Sciences and Education (CBASSE), reviewed the plans of the National Center for Health Statistics for a new National Health Care Survey. The survey is described as having the following objective: "to produce annual data on the use of health care and the outcomes of care for the major sectors of the health care delivery system. These data will describe the patient populations, medical care provided, financing, and provider characteristics" (IOM/CBASSE, 1992, p. 6). The IOM/CBASSE report commented at some length on the ability of existing data sources (e.g., current NCHS surveys) to provide these kinds of information and noted (p. 38):

[They] are rapidly becoming outdated and less comprehensive than is desirable. Often they do not cover the universe of providers and sites of health care [or] patients or potential users of health care. They lack sufficient information on exactly what services are provided and what the outcomes of those services are . . . are inexact with respect to financial data . . . are not timely; and . . . are inaccurate, incomplete and unreliable.

These faults may well affect the data repositories and networks considered by this IOM committee; they are discussed in greater detail below.

Ensuring the Quality of Data

The above discussion has outlined the many potential users, uses, and benefits of HDOs. Ultimately, however, the real rewards of developing and operating HDOs will depend heavily on the quality of the data that they acquire and maintain. The committee considers this subject of sufficient importance that it elected to comment on it directly.

The absolute prerequisites to successful implementation of any type of database or HDO with the expansive goals implied by the foregoing discussion are reliable and valid data. Developers must ensure that the data in their systems are of high enough quality that the descriptive compilations, the effectiveness research, and the comparative analyses envisioned can be done in a credible, defensible manner. (McNeil et al., 1992, describe limitations of current data systems for profiling quality of care, especially at the individual provider level.) Mistakes, qualifications and caveats, retractions, and similar problems must be minimized, and precision about what data are actually being sought must be maximized. All this must be done from the outset so that the long-term integrity and believability of the database and work based on its information will not be undermined irretrievably.

The committee did not wish to prescribe methods that HDOs might employ for ensuring data quality, judging that approaches might differ by type of database and HDO. It did, however, consider that success in meeting this responsibility will call for attention on several fronts.

First, the committee held the view that information becomes more useful when it is used. Although the characteristic of comprehensiveness is clearly of primary importance in considering the value of a database, HDOs need to avoid the trap of collecting everything that it is possible to collect, regardless of its reliability and completeness, and thereby end up with data elements that will be used only rarely and, worse, be of questionable value when they are used.

Part of the problem is that analysts will have little experience with such data elements and may make incorrect assumptions about their reliability or about how to interpret values correctly. Another part of the problem is that some data, although currently collected routinely because an entry must be made in a box on a form, are not used for anything by anyone. Such data will likely have a very low level of accuracy. A commonly cited example relates to information on hospital diagnoses in the Medicare program; diagnoses were often doubtful before the advent of the DRG-based PPS (see Gardner, 1990). When diagnostic data began to figure in decisions about reimbursement, studies of quality of care, choices in clinical care, or analyses about productivity, the situation changed. After 1983 hospitals came to be paid on the basis of DRGs (which obviously are diagnosis based), and diagnostic information improved markedly, although some problems persist. Similar problems of suspicious (missing, wrong, or even fraudulent) information on insurance claims forms for outpatient care exist to this day; the underlying problem is that payment mechanisms do not depend so heavily on outpatient diagnostic data—that is, the information is not used in the same way as inpatient data—so little incentive exists to record diagnoses accurately. 17 The least that can happen in these instances is that those data elements consume computer memory; the worst is that the data will be used in ways that contaminate an entire study or cause unwarranted harm to individuals, groups, or practitioners.

Second, data must be accurate and analyzable. Sometimes these points are couched in terms of reliability and validity of data. 18 More generally, the accuracy and completeness of data elements that will be used extensively must be guaranteed if they are to be useful. Among the problems one must guard against are the following: missing data; out-of-range values for quantitative data (e.g., age of patient; charges; even laboratory values in the most advanced databases of the future); unrealistic changes in parameters over time (e.g., the doubling of a patient's weight between office visits); clearly erroneous information (e.g., wrong sex); and miscoded information on diagnostic tests, actual diagnoses, surgical procedures, medications, and the like. Analysts must also be cautious about their interpretation of patient care events—for example, not misconstruing the reasons for or timing of a particular diagnostic procedure when interpreting events in the course of treatment of a life-threatening emergency.
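The kinds of checks listed above can be illustrated with a short sketch. It is illustrative only: the record layout (`Visit`), the field names, the permitted sex codes, the age range, and the weight-change threshold are all assumptions made for this example and are not drawn from any HDO specification discussed in this report.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Visit:
    """Hypothetical visit record; every field name here is an assumption."""
    patient_id: str
    visit_date: str              # ISO date string, e.g. "1993-04-01"
    age: Optional[int]
    sex: Optional[str]
    weight_kg: Optional[float]


# Assumed thresholds, chosen only for illustration.
VALID_SEX = {"F", "M"}
AGE_RANGE = (0, 120)
MAX_WEIGHT_RATIO = 2.0           # e.g. a doubling of weight between visits


def check_visit(visit: Visit) -> List[str]:
    """Return the problems found in a single record (empty list if none)."""
    problems = []
    # Missing data.
    for field in ("age", "sex", "weight_kg"):
        if getattr(visit, field) is None:
            problems.append(f"missing {field}")
    # Out-of-range quantitative values.
    if visit.age is not None and not (AGE_RANGE[0] <= visit.age <= AGE_RANGE[1]):
        problems.append("age out of range")
    if visit.weight_kg is not None and visit.weight_kg <= 0:
        problems.append("weight out of range")
    # Clearly erroneous coded values.
    if visit.sex is not None and visit.sex not in VALID_SEX:
        problems.append("invalid sex code")
    return problems


def check_weight_changes(visits: List[Visit]) -> List[Tuple[str, str]]:
    """Flag visits where a patient's weight changed by a factor of
    MAX_WEIGHT_RATIO or more since that patient's previous usable visit."""
    flagged = []
    last_weight = {}
    for v in sorted(visits, key=lambda v: (v.patient_id, v.visit_date)):
        if v.weight_kg is None or v.weight_kg <= 0:
            continue  # nothing usable to compare
        prev = last_weight.get(v.patient_id)
        if prev is not None:
            ratio = max(v.weight_kg, prev) / min(v.weight_kg, prev)
            if ratio >= MAX_WEIGHT_RATIO:
                flagged.append((v.patient_id, v.visit_date))
        last_weight[v.patient_id] = v.weight_kg
    return flagged
```

Checks of this kind can flag only implausible values. They cannot detect a plausible but miscoded diagnosis or procedure, which generally requires comparison against the clinical record.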

Third, the committee also believes that structural aspects of health databases should be emphasized as conducive to high-quality data and information. Databases should be built around a core of uniformly reported (or translatable) data that is relevant and can be shown to be accurate and valid for the HDO's intended analyses (in keeping with the comments just above). In addition, HDOs should have an easily implemented capacity to supplement core data elements. The committee and other experts agree on the significant tension that exists between the desire for comprehensive databases and the consequently broad uses to which HDO data might be put and the wisdom of a certain parsimony in the actual gathering of person-identifiable information.

Although the committee realizes that the federal government may have to take the lead in standards development and improved coding systems, the committee urges HDOs to foster, encourage, and work toward national standards for coding and definitions for (at least) core data elements. 19 Government leadership is indispensable in matters of coding and data uniformity, but widespread input from the private sector is desirable. The reason is that the costs of momentous or frequent changes (in terms of money, loss of comparability of data, potential incompatibility of clinical and payment coding, and incentives for fragmentation and upcoding of services) can be significant; consultation between the public and private sectors can help avert excessive or unnecessary costs of these types.

Fourth, the committee takes the position that the basic structure and content of these databases ought to be carefully designed from the beginning, but they must have sufficient capacity for expansion and change as health care reform, effectiveness and outcomes research, and other dynamic aspects of the health care sector evolve in coming years. This requirement implies that due attention will be paid to the quality of new categories of data that may become available for HDOs in the future.

RECOMMENDATION 2.1 ACCURACY AND COMPLETENESS

To address these issues, the committee recommends that health database organizations take responsibility for assuring data quality on an ongoing basis and, in particular, take affirmative steps to ensure: (1) the completeness and accuracy of the data in the databases for which they are responsible and (2) the validity of data for analytic purposes for which they are used.

Part 2 of this recommendation applies to analyses that HDOs conduct. They cannot, of course, police the validity of data when used by others for purposes over which the HDOs have no a priori control.

Until HDOs can demonstrate the quality of their data, the committee cautions that their proponents must guard against promising too much in the early years, particularly in the area of improving quality of care and conducting research on the appropriateness and effectiveness of health services. The committee returns to this point in Chapter 4 in a discussion of data protection and data integrity.

As many investigators have pointed out, the absence of sufficient clinical information in most databases today (and likely for tomorrow) is a critical limitation (Roos et al., 1989; Hannan et al., 1992; Chassin, 1993b; Krakauer and Jacoby, 1993). Efforts to acquire such information through manual abstraction of relevant information in hospital records, which is the basis of various patient classification programs (e.g., MedisGroups or HCFA's proposed Uniform Clinical Data Set), are costly and time-consuming. Some means of obtaining such information more directly from patient records will be needed.

Clinical data should be obtained, whenever practical, to validate analyses. The committee does not regard the clinical data found in medical records, whether computerized or not, as always sufficiently comprehensive, accurate, or legible to characterize them as a "gold standard," but they are a valuable, and sometimes indispensable, touchstone against which to judge the less rich administrative data on which many types of health policy and health services research are and must be based.

The validity of elements in a database must be matched with the kinds of inferences that can be drawn. The committee believes that the best method of enhancing the comprehensiveness of HDO databases and the accuracy and completeness of data elements is to move toward CPRs in which the desired variables themselves, rather than high-level abstraction and proxy coding systems, could be accessed. This committee does not wish to convey the impression that the transition to CPR systems is anything but an extraordinarily difficult task. Although the progress made in establishing a CPR Institute is laudable, much remains to be done for that organization to realize even the main objectives set forth for it in the IOM report on CPRs and CPR systems. In addition, planning efforts by the Computer Science and Telecommunications Board (a unit of the Commission on Physical Sciences, Mathematics, and Applications of the National Research Council) on the national information infrastructure and its role in health care (and health care reform) make clear that both the health care and the computer and information sciences communities have a considerable way to go even in agreeing on details about the directions that policies and technical advances should take in addressing major issues in this critical area.

In its April 1993 report to the Secretary of DHHS, the Work Group on Computerization of Patient Records supported the development of national standards for documenting and sharing patient information. It also called on the American National Standards Institute Healthcare Information Standards Planning Panel to coordinate the development, adoption, and use of national information standards for patient data definitions, codes and terminology, intersystem communication, and uniform patient, provider, and payer identifiers.

RECOMMENDATION 2.2 COMPUTER-BASED PATIENT RECORD

Accordingly, the committee recommends that health database organizations support and contribute to regional and national efforts to create computer-based patient records.

The committee acknowledges the importance of computer-based patient records with uniform standards for connectivity, terminology, and data sharing if the creation and maintenance of pooled health databases is to be efficient and their information accurate and complete. The committee urges HDOs to anticipate the development of CPRs and to contribute to the development and adoption of these standards. HDOs should take a proactive stance, by joining efforts by the CPR Institute and other organizations working to facilitate implementation of CPRs, helping in standards-setting efforts, and otherwise becoming full participants in the multidisciplinary effort that is now under way.

Summary

Much of the thrust of this report concerns how to maximize the benefits that this committee believes can be realized from the construction and operation of inclusive and comprehensive health databases. In examining these questions, the committee has focused on what it calls health database organizations. HDOs are emerging entities of many different characteristics in states and other geographic regions of the country; the committee made two key assumptions about them: (1) HDOs have access to and possibly control considerable amounts of person-identifiable health data outside the care settings in which those data were originally generated and (2) the chief mission of HDOs is public release of data and results of studies about health care providers or other health-related topics.

The broad-based value of HDOs and their databases might be said to be the provision of reliable and valid information in a reasonably timely manner to address all the major questions in health care delivery—access, costs, quality, financing and organization, health resources and personnel, and research—facing the nation today and in the coming years. The chapter also details the narrower benefits that might accrue to a variety of potential users, including patients and their families, health care providers, purchasers and payers, employers, and many other possible clients in the public and private sectors.

In assembling the data that will go into products for all such users and uses, the committee had sobering concerns about the quality of those data. Thus, it recommends that HDOs take responsibility for assuring data quality on an ongoing basis, and in particular take affirmative steps to ensure: (1) the completeness and accuracy of the data in the databases for which they are responsible and (2) the validity of data for analytic purposes for which they are used [by HDOs] (Recommendation 2.1). The committee also recommends that HDOs support and contribute to the regional and national efforts to create CPRs and CPR systems (Recommendation 2.2).

Initially, HDOs will attempt to provide data for particular users and uses to answer particular kinds of questions. Nevertheless, advances in the creation and operation of computer-based databases, whether centralized or far-flung, can be expected in the coming years. The committee believes that thoughtful appreciation of their potential and anticipation of their potential limitations will hasten that progress. The development of HDOs—their structure, governance, and policies on disclosure as well as on protection of data—must be designed for the achievement of these long-term goals.

The next chapter takes up the major responsibilities of HDOs in carrying out a critical mission: furnishing information to the public on costs, quality, and other features of health care providers in a given region or community. The committee adopted two strong assumptions as it began to consider this topic. The first is that considerable benefits will accrue to interested consumers and to the public at large from having access to accurate and timely information on these aspects of the health care delivery system with which they deal; this has been the thrust of the present chapter. The other assumption is that HDOs supported by public funds ought to have a stated mission of making such information available, and this will be a core element of several committee recommendations. The committee also assumes, however, that harms can arise from some uses of the information in such databases. For this reason, in the next chapter the committee considers administrative and other protections that it believes HDOs should put in place.

Footnotes

The Clinton administration's proposed Health Security Act (HSA, 1993) gives appreciable attention to information systems and related matters. It calls for the establishment of a National Health Board to oversee the creation of an electronic data network consisting of regional centers that collect, compile, and transmit information (Sec. 5103). The board will, among other duties, provide technical assistance on (1) the promotion of community-based health information systems and (2) the promotion of patient care information systems that collect data at the point of care or as a by-product of the delivery of care (Sec. 5106).

The types of information collected would include: enrollment and disenrollment in health plans; clinical encounters and other items and services provided by health care providers; administrative and financial transactions and activities of participating states, regional alliances, corporate alliances, health plans, health care providers, employers, and individuals; number and demographic characteristics of eligible individuals residing in each alliance area; payment of benefits; utilization management; quality management; grievances, and fraud or misrepresentation in claims or benefits (Sec. 5101).

The HSA further specifies the use of (1) uniform paper forms containing standard data elements, definitions, and instructions for completion; (2) requirements for use of uniform health data sets with common definitions to standardize the collection and transmission of data in electronic form; (3) uniform presentation requirements for data in electronic form; and (4) electronic data interchange requirements for the exchange of data among automated health information systems (Sec. 5002). It also calls for a national health security card that will permit access to information about health coverage although it will contain only a minimum amount of information (Sec. 5105) (Health Security Act. Title V. Quality and Consumer Protection. Part 1. Health Information Systems).

According to the IOM (1991a, p. 11): "A primary patient record is used by health care professionals while providing patient care services to review patient data or document their own observations, actions, or instructions. A secondary patient record is derived from the primary record and contains selected data elements to aid nonclinical users (i.e., persons not involved in direct patient care) in supporting, evaluating, or advancing patient care." At present, most medical records are maintained on paper, not in computers, and the U.S. General Accounting Office provides the following startling figures on the equivalent volume of paper: "We estimate that the 34 million annual U.S. hospital admissions and 1.2 billion physician visits could generate the equivalent of 10 billion pages of medical records" (GAO, 1993a, p. 2).

One major hurdle to the development of CPRs involves standards for vocabulary, structure and content, messaging, and security, according to GAO reports (1991, 1993a); without standards for uniform electronic recording and transmission of medical data, effective automated medical record systems will be delayed. This committee did not examine these technical issues, although they pertain as well to large-scale regional HDOs; arguably, the government and the private sector will need to move more forcefully on development of such standards—perhaps moving beyond near-total reliance on voluntary efforts—if CPRs, CPR systems, and regional health databases and networks are to succeed.

The discussion of comprehensiveness and inclusiveness of databases is couched in terms of what might be regarded as the traditional domain of medical care, including mental health care. Clearly, more advanced databases could include information on dental care and care provided by health professionals that practice independently, such as nurse-practitioners and nurse-midwives, acupuncturists, or alternative healers of various sorts. Even more far-reaching databases might contain information on sociomedical services provided through, for instance, day care and home care for adults or children.

This is illustrative only because Medicare files also include younger but disabled beneficiaries and persons with end-stage renal disease.

The congressional Physician Payment Review Commission (PPRC) has been in the forefront of advocates for a national data system (PPRC, 1992, 1993). In its 1992 annual report, PPRC described an "all-patient database" [emphasis in the original], conceptualized as a "network of local or regional data processing centers . . . to streamline the transfer of administrative information for payment and service-use tracking purposes" (p. 269). The report goes on to posit "parallel organizing entities . . . to coordinate the use of these data [and] the data processing centers and the organizing entities would make up an all-patient data network" (p. 269). The commissioners also envisioned the network evolving into a "means to link and assimilate more detailed clinical information." Although the general thrust of the PPRC idea is consonant with the long-range views of this IOM committee, the specific understanding of what a database or network is differs. In defining an all-patient database, the commissioners appear to have in mind what this committee terms inclusiveness; what the PPRC report lays out as "core data elements" in that database approaches what the IOM report calls comprehensiveness.

Benton International is a consulting firm for the financial services industry, with expertise in credit card and ATM (automatic teller machine) transaction processing. All information about CHMIS in the BI specifications is in the public domain, and neither the John A. Hartford Foundation nor BI retains proprietary claims on this information.

In a recent report, the Institute of Medicine (IOM, 1993c) outlined the critical elements of health care reform that it believed sound reform proposals ought to address. The five main topics were access, containing costs, ensuring quality, financing care, and enhancing the infrastructure of health care. Individual IOM reports have dealt with specific topics related to certain aspects of health reform, such as access (IOM, 1993a), employment-based insurance (IOM, 1993e), quality of care (IOM, 1990), clinical practice guidelines (IOM, 1992a), and the information infrastructure (IOM, 1991a).

At the time this study was conducted, the Hartford Foundation sponsored a separate study from researchers at the Harvard School of Public Health to examine questions related to the establishment of an "Institute for Health Care Assessment," which would have as a major goal the advancement of quality measurement and improvement. Services that such an institute might provide to HDOs might include project formulation, technical assistance (in quality measurement, data collection and management, and analysis), report design and profiling (e.g., of morbidity, patient satisfaction, provider adherence to guidelines, and variation in use of costly technologies), and project evaluation. A clearinghouse effort and dissemination might also be contemplated. For further information, see McNeil et al. (1992).

The literature in this area is extensive. Well-known articles—beginning about a decade ago on small-area-variations analysis—include McPherson et al., 1981; Roos and Roos, 1981; Roos et al., 1982; Wennberg and Gittelsohn, 1982; Wennberg et al., 1982, 1984; Eddy, 1984; Roos, 1984; Wennberg, 1984; Health Affairs, 1984; Chassin et al., 1986a, 1987; Merrick et al., 1986; Winslow et al., 1988a, 1988b; Wennberg, 1990; Paul et al., 1993. For a recent review of this literature and a new interpretation of this body of empirical work that suggests that physician enthusiasm for particular services explains much of the geographic variation in utilization, see Chassin, 1993a.

The first major release of hospital-specific mortality rates dates to publications from HCFA (e.g., HCFA, 1991). Other illustrative research efforts include those reported by Chassin et al. (1989) on all Medicare hospitalizations, by Dubois et al. (1987a, 1987b) on admissions to institutions belonging to a single hospital chain, and by Luft and Romano (1993) on risk-adjusted death rates from coronary artery bypass graft operations for hospitals in California.

Technology assessment overlaps with these research efforts in so far as it extends beyond reviews of the published literature or operations of expert panels to actual collection and analysis of data. Often, however, technology assessment is directed more at emerging technologies—for example, new drugs, devices, or (less often) procedures—than at established ones. To the extent this is true, databases of the sort described in this report, particularly those derived from financial transactions in health care, will not contain much relevant or useable data on those newer technologies. They will, however, contain valuable information for the comparison of these new technologies with existing approaches (which always include watchful waiting). When the focus of technology assessment is on established technologies, databases are useful not only for such evaluation activities, but also for setting priorities for assessment (IOM, 1992b).

The term "benefits," as used in this report, has two distinct meanings, depending on the context. One reflects the general notion of positive advantages, gains, and useful aids in the conduct of some activity. The other is the narrower insurance-related concept of a benefit package or contract, in which a set of services (typically characterized as "medically necessary") is specified as covered (and in which other services may be specifically identified as not covered). This committee assumes that major health care reform will likely bring about a standardized class of covered benefits, at least in a "basic" package mandated nationally; this would reduce the need for verification of covered benefits by providers and insurers. To the extent that reform initiatives permit differential types of supplemental insurance plans to be offered, however, such verification may still be needed and the desirability of doing that instantaneously remains high.

The technicalities, which are not trivial matters, relating to Employee Retirement Income Security Act (ERISA) preemptions with respect to self-insured employer plans were not directly addressed by this IOM committee. Many observers believe, however, that meaningful health care reform will require modification, if not outright repeal, of ERISA insofar as health insurance is concerned (IOM, 1993e, 1993c). Federal preemptive legislation would override existing state statutes and regulations and effectively take some aspects of insurance regulation out of the hands of state insurance commissioners; if that occurs, distinctions between self-insured corporations and insurers subject to state insurance regulation will decrease if not disappear altogether. The ramifications of ERISA extend beyond insurance regulation to rules for protecting the privacy and confidentiality of personal health data, including those held by HDOs; informational privacy issues are addressed in Chapter 4.

For purposes of making hiring decisions about individuals, potential employers may wish to obtain information on such persons. This committee will (in Chapter 4) take a very strong position against person-identifiable information being made available for this purpose and thus does not examine that possible use of health databases further.

In The Computer-Based Patient Record: An Essential Technology for Health Care, the IOM examined in some depth the array of users of computer-based patient records (CPRs) and CPR systems, indicating that an "exhaustive list . . . would essentially parallel a list of the individuals and organizations associated directly or indirectly with the provision of health care. Patient record users provide, manage, review, or reimburse patient care services; conduct clinical or health services research; educate health care professionals or patients; develop or regulate health care technologies; accredit health care professionals or provider institutions; and make health care policy decisions" (IOM, 1991a, p. 31). It is difficult to improve on that enumeration in the present context, even though the nature of the databases themselves (CPRs versus networks based on, e.g., insurance billing transactions or surveys) is quite different and the emphasis (e.g., patient care delivery versus health plan management) differently placed.

One example of the problem of diagnostic coding for insurance claim purposes was provided during a study site visit. A member of an internal medicine group noted that he used essentially six outpatient (office visit) diagnoses because "they work" and because he would otherwise be questioned or second-guessed too much by insurers if he recorded more, or more detailed, diagnostic codes. Because reimbursement is keyed to length and complexity of a visit, rather than to diagnosis, he had a clear conscience about this practice.

Reliability in this context relates to the need for data to be reasonably accurate and complete (that is, essentially free of missing values, systematic bias in what data are captured or recorded and how those data are coded, and random errors). Validity concerns relate to the issue of whether analyses done on a given database are appropriate for the questions being asked and whether those analyses will provide defensible answers that are internally consistent and externally generalizable. According to Palmer and Adams (1993), measures of quality can be reliable if the rate of random error is low, although they may still contain systematic error (meaning that some attribute is being captured but that it may not be the one intended); for quality measures to be valid, both random and systematic error must be low. These considerations of random and systematic error mean that the level of reliability of a measure (or the underlying data) places a ceiling on the level of validity that can be attained; unreliable measures or information can never be valid.

AHCPR has explored the feasibility of linking administrative databases for effectiveness research and urged the development of uniform messages and vocabulary standards (USDHHS, 1991). See also Aronow and Coltin (1993).