Evaluating Public-Participation Exercises: A Research Agenda


Science, Technology, & Human Values, Vol. 29 No. 4, Autumn 2004, 512-556. DOI: 10.1177/0162243903259197. © 2004 Sage Publications.

Gene Rowe, Institute of Food Research
Lynn J. Frewer, Wageningen University

The concept of public participation is one of growing interest in the UK and elsewhere, with a commensurate growth in mechanisms to enable this. The merits of participation, however, are difficult to ascertain, as there are relatively few cases in which the effectiveness of participation exercises has been studied in a structured (as opposed to highly subjective) manner. This seems to stem largely from uncertainty in the research community as to how to conduct evaluations. In this article, one agenda for conducting evaluation research that might lead to the systematic acquisition of knowledge is presented. This agenda identifies the importance of defining effectiveness and of operationalizing one's definition (i.e., developing appropriate measurement instruments and processes). The article includes analysis of the nature of past evaluations, discussion of potential difficulties in the enactment of the proposed agenda, and discussion of some potential solutions.

Keywords: public participation; evaluation; effectiveness; research agenda

Introduction: The Concept of Public Participation

Public participation may be defined at a general level as the practice of consulting and involving members of the public in the agenda-setting, decision-making, and policy-forming activities of organizations or institutions responsible for policy development. Currently, in the United Kingdom and elsewhere, the issue of public participation is one of growing interest to academics, practitioners, regulators, and governments (e.g., Leach and Wingfield 1999).



In the United Kingdom, for example, a number of significant recent reports from the government have called for increased public participation at national and local levels, in realms as diverse as health care, the environment, transportation, and local government (for details, see Roberts, Bryan, Heginbotham, and McCallum 1999; Owens 2000; Martin and Boaz 2000; Bickerstaff and Walker 2001). In practice, there is a move away from an elitist model in which expert advice acts as the authoritative source for regulation to one in which citizens have a voice in framing government decisions (Frewer and Salter forthcoming).

Among the key questions that need to be answered regarding public participation are why it has caught the attention of policy institutions at the present time, and why public participation is perceived by institutions as potentially facilitating governance and institutional practices. In addition, and perhaps most important, is the question of how we can be sure that "participation" results in any improvement over previous ways of doing things, or indeed, whether it has any effective or useful consequences at all. The first two questions have been discussed elsewhere (e.g., Frewer and Salter forthcoming). This article addresses the latter question, which concerns the issue of "evaluation."

The article begins by briefly discussing what is meant by public participation and why it is thought to be an improvement on ways of making decisions, setting agendas, and devising policy that have traditionally been left in the hands of the elected or the expert without further reference to the views of the public. The multifaceted nature of participation, and the variety of ways (mechanisms) in which it is operationalized, will be discussed. The article then focuses on the crucial issue of how one determines that a particular participation mechanism or exercise has been successful. In addressing this issue, the dearth of high-quality empirical evaluations will be highlighted, along with the lack of any comprehensive framework for conducting such evaluations. An agenda is then set out that outlines key issues for conducting research on the issue of evaluation. The need for such an agenda has been identified elsewhere: Webler (1999) specifically stated that "what is needed is a concise research agenda for the field [of public participation]" (p. 65), while earlier, Sewell and Phillips (1979) noted how "appraisal [of participation exercises/mechanisms] has been hampered by the deficiency of frameworks for analysis" (p. 337). In describing the agenda, the main difficulties in conducting evaluations will be identified (both conceptual and practical), as will ways in which these might be resolved. It is hoped that this article will increase understanding in the research and practitioner communities of the need for evaluation and its importance, and provide a framework for conducting such evaluations.


The Rise of Public Participation: Rationales and Mechanisms

The history of the rise of public participation in democratic societies is not the subject of this article and will be dealt with only briefly. The concept of "participation" has been associated by various authors with the movements of pluralism and direct democracy, which have risen over the last century or so at the expense of the managerial model of public administration (e.g., Reich 1985; Laird 1993; Dryzek 1997). In part, this rise is attributable to declining public confidence in the processes that develop policy decisions and to reduced trust in those to whom the processes have traditionally been conferred through election or recognition of expertise. The important point that arises is that governments and their various agencies have, whether as a result of legislation or inclination, increasingly sought public views on policy issues in a more direct and specific manner than dictated by the traditional model of governance through which decision makers are periodically elected to set policy (often doing so with the help of chosen experts) without further public input.

It would be too simplistic to attribute the growth of interest in participation entirely to greater respect by institutional actors for public views on policy issues. There is undoubtedly a certain amount of pragmatism involved in endorsing public participation, with institutions recognizing that a nonconsulted public is often an angry one and that involving the public may be one step toward mollifying it. Certainly, some policy formulators may be more concerned with increasing public confidence in the policy process than truly seeking the views of the public. Participation conducted for such tokenistic reasons alone, however, with little intention of acting on the information gathered from the public, has been much condemned (e.g., Nelkin and Pollak 1979; Fitzpatrick and White 1997) and may prove counterproductive should the public appreciate this underlying rationale.

But what, precisely, is meant by participation? "Public participation" is a complex concept, the scope and definition of which is open to debate. That is, the public may be involved (in policy formation, etc.) in a number of different ways, or at a number of levels (e.g., Arnstein 1969; Nelkin and Pollak 1979; Wiedemann and Femers 1993). In some cases, the public may participate by being the passive recipients of information from the regulators or governing bodies concerned; in other cases, public input may be sought, as in the solicitation of public opinion through questionnaires or focus groups; and in still other cases, there may be active participation of public representatives in the decision-making process itself, such as through public representation on an advisory committee.


Arnstein (1969), for example, considered that true participation involves a high level of empowerment of the public and a direct input into the decision process, and decried approaches that appear to be participative yet yield no real power (e.g., the typical public meeting). For Arnstein, mechanisms such as the survey are a "step towards true participation," without being truly empowering or participative. For others, however, participation is a less constrained concept, the key distinction being between "communication," in which the public has no input per se (it simply receives sponsor information and might only have a voice in clarifying meaning), and "participation," in which it does (e.g., Rowe and Frewer 2000). From the latter definition, mechanisms such as surveys and focus groups may be considered participative, in addition to deliberative mechanisms that have guaranteed public influence. This wider definition is one that appears to be at least implicitly adopted by many researchers in the area who use the term participation in describing mechanisms such as surveys (e.g., Carr and Halvorsen 2001).

In this article, participation is defined according to this latter perspective as entailing initiatives in which sponsors acquire some form of public input. After all, whether public information is elicited is an a priori characteristic of different mechanisms, whereas whether that information is used (to empower the public) depends as much on sponsor motives as on intrinsic mechanism characteristics and may be determined only some period after the event. An empowering definition may therefore create research problems, since one might not necessarily be sure that the initiative being studied, as an example of participation, would subsequently prove to have been correctly classified.

The mechanisms that exist to enact participation are diverse, ranging from the traditional (e.g., public meeting) to the novel (e.g., consensus conference) and from mechanisms that seek responses from participants acting alone (e.g., surveys) to those involving deliberation between participants interacting in groups (e.g., focus groups). Certainly, the number of mechanisms has multiplied over recent years. What is less certain, however, is their quality and effectiveness. That is, do these various mechanisms, and the individual exercises in which they are used, achieve what they set out to achieve? Do they accomplish one or more of the various aims ascribed to the concept of public participation? Answering these questions involves the evaluation of the exercises or mechanisms, including systematic comparisons between different exercises. The importance of this, and its difficulty, is discussed next.


The Issue of Evaluation

Evaluation of participation exercises is important for all parties involved. These include the sponsors of the exercise, the organizers that run it, the participants that take part, and the uninvolved-yet-potentially-affected public. In this sense, the evaluation of public-participation exercises is no different from the evaluation of any social program. Evaluation is important for financial reasons (e.g., to ensure the proper use of public or institutional money), practical reasons (e.g., to learn from past mistakes to allow exercises to be run better in future), ethical/moral reasons (e.g., to establish fair representation and ensure that those involved are not deceived as to the impact of their contribution), and research/theoretical reasons (e.g., to increase our understanding of human behavior). As such, few would deny that evaluation should be done wherever and whenever possible.

Unfortunately, evaluation is also difficult. Rosener (1981) listed four problems inherent in conducting evaluations: first, that the participation concept is complex and value laden; second, that there are no widely held criteria for judging the success and failure of an exercise; third, that there are no agreed-upon evaluation methods; and fourth, that there are few reliable measurement tools. In writing this, Rosener was effectively considering the difficulty of conducting an ideal evaluation. However, as Joss (1995) notes, evaluation is "a term embracing many different kinds of judgments, from informal assessment that relies on intuition or opinion, to well-defined and systematic research that makes use of social science research methods" (p. 89).

Arguments for informal evaluation are weak. Joss suggests that those arguing for the informal evaluation of consensus conferences (one particular, and not unusual, participation mechanism) argue that these have shown themselves to be "effective" merely by continuing to take place and attracting a wide audience. Practically, it is argued, they do not produce the "hard data" necessary for analysis, and there may be resistance from those, like organizers, with a vested interest in success. The continued existence of a mechanism, however, may stem from inertia and a lack of protest, rather than any positive attribute, while a large audience can usually be gathered for any event (with sufficient publicity), from a circus to an execution, and this has no particular implication for appropriateness. Practical difficulties are a better argument for informal evaluation, but there are ways these may be overcome to a greater or lesser extent; that is, sufficient data can be acquired, and resistance to evaluation by sponsors and organizers may be countered.

It is our view that the conduct of rigorous evaluations using social science methodologies should be an important part of public-participation exercises. In the following sections, an agenda is set out for conducting evaluations in a structured manner, with the aim of enabling the absolute and the relative evaluation of any particular exercise, that is, to determine whether it is effective and to allow comparison with the effectiveness of other exercises of the same or different mechanism type.


The ultimate aim of such research would be to establish which mechanism works best in which situation; in a practical sense, such knowledge is crucial for sponsors, organizers, and participants. In a research sense, the significant question concerns why certain mechanisms work better than others in certain situations. Data generated from evaluations may be used as a basis for inductively generating theories about the contingent utility of mechanisms or be used to test deductively derived theories. The aim of the article is to indicate an agenda whereby such data may be generated; it is not to develop such a theory itself.

An Agenda for Evaluating Public-Participation Exercises

The agenda is described in terms of a number of sequential steps. In describing each step, theoretical and practical difficulties in following it are also discussed.

Step 1: Define Effectiveness

The first step is to define what is meant by the term effectiveness (or success, quality, or whatever synonym one wishes to use). Unless there is a clear definition of what it means for a participation exercise to be effective, there will be no theoretical benchmark against which performance may be assessed. The difficulty lies in the fact that effectiveness in this domain is not an obvious, unidimensional, and objective quality (such as speed or distance) that can be easily identified, described, and then measured. Indeed, there are clearly many aspects to the concept of "participation exercise effectiveness," and these are open to contention. As an example, consider the activity of a group that is meeting to produce a solution to a particular problem (after all, many participation mechanisms tend to be group based). How would one judge the effectiveness of the group? Among the possible standards (hence definitions) are the speed at which the group came to its solution, the number of ideas generated, the quality of the ideas generated, and the extent to which the final solution represented group consensus. Of course, "speed of decision making" could be interpreted positively or negatively: it could reflect an efficient process, or it could indicate that the group engaged in insufficient deliberation to reach a solution or was not provided with sufficient resources to appreciate the complexity of the problem.


On the other hand, "number of ideas" as a criterion of success might reflect the complexity of the problem but not the efficacy of the proposed solutions. Assessing the "quality of ideas" generated might involve value judgments being applied to those ideas, while focusing on the development of "group consensus" might, arguably, detract from the diversity of opinions that may have value in their own right, or at least should be made public as part of a transparent process. In addition, an exercise that succeeded at one of these aspects might fail at another—for example, speed to solution and number of ideas generated may be antagonistic goals. Furthermore, not all participation exercises include group behavior or aim to acquire some objectively good solution to implement; hence evaluations need to take multiple dimensions into account. The main components of a definition of effectiveness are discussed next.

A universal definition? The generalizability of evaluation criteria.

Given the variety of forms of participation mechanisms, and their seemingly diverse aims, the question arises as to whether it is possible or sensible to talk about a definition of public-participation effectiveness in any general sense. Certainly, given the variety of perspectives and interpretations of the participation concept, it is unlikely that all researchers would agree on a single universal definition of what does and does not constitute participation. From a democratic perspective, for example, an effective participation exercise might be one that is somehow "fair," and a number of related criteria might be stipulated. From a decision-making perspective, effective participation might be indicated by an output that is in some sense "better," and alternative criteria related to decision quality might be stipulated. Likewise, an economic framework might be concerned with cost or resource characteristics. Each definition might be deemed "universal," in the sense that it is intended to cover the full range of participation examples and not a limited subset of these. Alternatively, it may be argued that a variety of "local" definitions are more sensible, in which participation mechanisms are divided into subgroups that vary according to whether they seek certain types of outcomes (such as attaining consensus, educating participants, or producing the best decision) or involve certain types of process (such as group-based or individual-based activities). Definitions of effectiveness can then be generated for each subgroup. It might even be argued that every exercise is, in fact, unique and ought to be evaluated according only to its own very specific aims—such as "to get organization X to commit to specific activity Y."

A universal definition, encompassing all types of participation exercises and mechanisms, may theoretically be used to develop measures that will enable the effectiveness of any participation exercise to be ascertained and compared with any other.


More limited (local) definitions will result in measures that will allow comparison of exercises only to others belonging to the particular subgroup covered by that definition. For example, if a universal definition of effectiveness stated that an exercise must "be perceived as being fair to all parties involved," one could develop a measure of fairness and apply this to any participation exercise to compare its effectiveness with any other. However, if one developed a definition that stated that effectiveness involved "achieving a better decision," then the measure developed on this basis could be used only to assess the effectiveness of the class of exercises in which actual decisions were made (as opposed to others in which simple opinions were sought). Of course, one could develop a universal definition that covered process and outcome variety by being deliberately vague, as in defining effectiveness in terms of "achieving whatever aims (consensus, better decisions, education) are intended," but the more fuzzy the definition, the more difficult it will be to develop adequate measures and hence to determine whether that type of effectiveness is achieved.

Another perspective comes from Syme and Sadler (1994), who provided six criteria for establishing a definition of success (see pp. 533-34). For example, their first criterion is that "the criteria for showing that objectives have been met need to be agreed on between public and planner" (p. 533). Their other criteria set out a procedure for doing this ("where," "when," etc.). This perspective implies—and provides a mechanism for producing—specific evaluation criteria and hence specific definitions of effectiveness for each participation exercise.

There is no correct answer to the universal-versus-local question. However, we suggest that specific aims of individual participation exercises may always be phrased in terms of more general classes of aims (though whether the classes are broad enough to be described as universal, or less broad and hence local, is open to debate) that will allow comparative analysis. This is not to say that researchers should accept a single universal definition, or a single set of local definitions that are independent and mutually exclusive—different interpretations of the participation concept may militate against this (indeed, debate about the relative merits of different definitions is liable to enrich the participation debate)—but simply that a more general phrasing of what is meant by effectiveness is necessary if we are to acquire findings that are comparable.

The evaluation perspective: Effective according to whom?

Another complex issue relating to defining participation-exercise effectiveness is the fact that there are various constituencies involved in the process, from the sponsors to the participants and the various publics (or stakeholder groups) that they are meant to represent.


Hence, what might appear effective to some might not appear so to others. For example, participants might be satisfied with a deliberative conference process and judge it effective on that basis, while the sponsors might be dissatisfied with the resulting recommendations and, on that basis, judge it ineffective. This fact both complicates the production of a definition (which needs to somehow take the various perspectives into account) and implies the need for an unambiguous, a priori statement of what is meant by effectiveness and how it might be ascertained, in order to reduce contention and dispute about the merits of the exercise later. One way to get around this problem is to take an objective perspective, in which the contentment or acceptance of the specific parties involved (whoever they might be) is an important aspect or criterion of effectiveness. Often, those defining effectiveness implicitly adopt this approach.

Outcome versus process effectiveness.

A further complication that impacts on how one might usefully define effectiveness is the practical difficulty of identifying an end point to participation exercises (a point at which one can say that an exercise has ceased, and no further actions will derive from it). That is, institutional and societal responses to a particular exercise may be manifest months or even years after an exercise has finished. Given that the reason for defining effectiveness is to enable its measurement, however, a definition that focuses on qualities or quantities that are difficult to measure is of limited use.

In defining effectiveness, one dichotomy that exists is between the outcome of an exercise and the process associated with the exercise (e.g., Chess and Purcell 1999; Rowe and Frewer 2000). In many ways, the assessment of outcomes is preferable because these will correspond more directly to the desired aims of the exercise. However, outcomes may be difficult to ascertain in a timely manner, and they may to some extent also be due to other variables, such as the occurrence of simultaneous events or externally mediated pressures influencing policy processes (e.g., Chess and Purcell 1999). As such, evaluation of exercise processes must often serve as a surrogate for the outcomes of the exercise. That is, if the exercise process is good (it is conducted well according to one's definition), then it would seem more likely that the outcomes will be good than if the process is bad (in which case good outcomes, if attained, are arguably due to other factors). For example, it would seem more likely that decision makers will ignore the recommendation of an exercise (a "bad" outcome) if they perceive it to have been poorly run (e.g., with unrepresentative participants) than if they perceive it to have been well run (e.g., with representative participants).


Definitions in the literature.

In spite of these complexities, numerous definitions of effective public participation have been produced, varying in terms of their content and the ways in which they have been derived. Some definitions have been developed on the basis of theory (e.g., Esogbue and Ahipo 1982; Fiorino 1990; Laird 1993; Poisner 1996); others, through summarizing the opinions of authors or researchers and their findings (e.g., Arnstein 1969; Rosener 1975; Wiedemann and Femers 1993; Rowe and Frewer 2000); and still more, through conducting surveys or interviews of those involved in participation exercises to ascertain their views on what constitutes an effective exercise (e.g., Moore 1996; Shindler and Neburka 1997; Carnes, Schweitzer, Peelle, Wolfe, and Munro 1998; Lauber and Knuth 1999; Tuler and Webler 1999). Frequently, researchers and authors may simply discuss or forward some key aspect of effective participation, rather than a complete definition, or else may present some rule of thumb or checklist for effective participation in which the definition is implicit (e.g., Milbraith 1981; Desvousges and Smith 1988).

Though the production of such definitions is important, the validity and utility of the various existing definitions (both comprehensive and partial) are uncertain, as their relative merits have generally gone unchallenged (though Sewell and Phillips, 1979, critically compared four different models for evaluation that comprised different definitions). Sometimes, such definitions are stated and left for others to consider and apply; on other occasions, they are used in the same article as the basis for evaluating participation exercises or mechanisms. Often, evaluation is informal, based on the researchers' subjective and theoretical assessment of a particular type of participation mechanism or comparison of several different mechanisms (e.g., Heberlein 1976; Checkoway 1981; Fiorino 1990; Rowe and Frewer 2000). Of greater immediate use, however, are cases where definitions are used in actual empirical evaluations, since this requires the operationalization of the definition, which is the second step in our research agenda (as discussed in the next section). By operationalization, we mean the development of instruments or processes that enable the measurement of successful attainment of the effectiveness criteria.

Two previous articles have attempted to summarize evaluation studies: Lynn and Busenberg (1995) analyzed fourteen evaluations of citizen advisory committees, and Chess and Purcell (1999) analyzed evaluations of a number of public meetings, workshops, and citizen advisory committees. Furthermore, Sewell and Phillips (1979) summarized twenty-two "case study" evaluations that were presented at a Canadian conference, though they did not give details of the authors (hence it is not possible to find further information on them), and they did note that specific evaluation criteria were seldom identified in these cases.


In covering evaluations of all participation types, the analysis in this article is more comprehensive than that of either Lynn and Busenberg (1995) or Chess and Purcell (1999). This article also differs from the previous ones in terms of the way in which the evaluations are detailed and analyzed (focusing on the nature of the definitions of effectiveness and the ways in which these have been measured) and the way in which studies were selected for inclusion (and exclusion) in our analysis (which results in some of the studies in the previous analyses not being included here).

Evaluation articles were identified by searching the Web of Science reference databases (including both science and social-science databases) using the search terms citizen or public in combination with participation or involvement, along with the term evaluation or assessment. These databases cover only the major journals dating from 1981. Therefore, references cited in the articles identified by the database search were also considered, and copies of articles that apparently addressed the evaluation issue were obtained. Similarly, the references in these articles were checked for further relevant articles, and so on (a schematic sketch of this search-and-screening procedure is given after the list of omitted article types below).

Results of the searches are summarized in Table 1, which provides a description of empirical studies that have attempted some form of evaluation of public-participation exercises. The intent was to focus on studies in which a definition of effectiveness is made explicit and stated a priori, in order to distinguish evaluations from exploratory or descriptive case studies in which criteria were either unstated or generated after the exercise had been completed. We emphasize the term exploratory, since case studies can also be used to test theories, conduct evaluations, and so on (Yin 1994). It is as important in the conduct of case studies for these purposes, however, as in other methods, that the key concepts being studied are defined and operationalized (see, e.g., Yin 1994, p. 32). It is particularly important that evaluations state effectiveness criteria a priori, not only from a research perspective but also from a practical perspective, to prevent dispute with those who disagree with the evaluation result and subsequently take issue with the nonagreed criteria. A small number of interesting exploratory case studies are, however, included in the table for comparison. In spite of this structured approach, it would be surprising if all relevant articles have been found. Other omissions, however, are deliberate. In compiling Table 1, there were four types of articles that were intentionally omitted; these are listed following the table.

Table 1. An Analysis of Public-Participation Evaluation Studies

For each study, the following are given: Definition Type (Process/Outcome; Universal/Local/Specific), Evaluation Criteria, Measurement Instruments, Examples Evaluated, and Comments.

Rowe, Marsh, and Frewer (2004)
Definition type: Process and Outcome; Universal.
Evaluation criteria: Nine criteria: Task Definition, Representativeness, Early Involvement, Independence, Cost-Effectiveness, Influence, Transparency, Structured Decision Making, Resource Accessibility.
Measurement instruments: Participant questionnaires (one question per evaluation criterion); telephone interviews with participants; case survey-like instrument (multiple measures per the nine criteria).
Examples evaluated: One "deliberative conference" (addressing sponsor's policy for assessing radiation doses in food).
Comments: The interviews were intended to elicit other factors not considered in the evaluation framework. The authors claim that the instruments were validated previously.

Bickerstaff and Walker (2001)
Definition type: Process and Outcome; Local (transport planning).
Evaluation criteria: Process criteria were Inclusivity (4 measures); Transparency (4); Interaction (2); Continuity (3). Outcomes were evidence that participation had impacted on the shape of the plan and on specific areas of the plan.
Measurement instruments: Survey of English Highway Authorities: open and closed questions (rating scales, attitude statements). Content analysis of local transport plan documents, with researchers rating several measures associated with each criterion.
Examples evaluated: Survey of 71 percent of English Highway Authorities. Content analysis of 66 percent (58) of Authorities (transport issues).
Comments: Article is a thorough survey of participation in local transport planning in the United Kingdom. Evaluation, per se, was not the goal, although the process of the study essentially allows some evaluation (of authority plans, rather than specific participation exercises or initiatives per se).

Carr and Halvorsen (2001)
Definition type: Outcome; Local (discussion of land management/forestry, but criteria could be Universal).
Evaluation criteria: Three criteria: Representativeness; Identification of common good; Incorporation of values/beliefs into discussion.
Measurement instruments: Representativeness measured by comparing attendance to demographic data. Other criteria assessed by responses to survey questions and analysis of exercise transcripts.
Examples evaluated: Three different exercises: a mail survey; conversations with community groups; and community dinners (forestry management).
Comments: Two of the criteria (not representativeness) were measured differently according to the exercise being evaluated, so the validity of these comparisons is uncertain. These criteria were only loosely defined/measured.

Einsiedel, Jelsoe, and Breck (2001)
Definition type: None stated a priori.
Evaluation criteria: Substantive impact on public debate and political decisions; Procedural impact (impact of the mechanism, e.g., is it more widely adopted); Social impact.
Measurement instruments: No instruments described: comparative analysis "focused on the final reports elaborated by the lay panels . . . the written evaluations of these projects, and our own experiences."
Examples evaluated: Three consensus conferences (Denmark, Canada, and Australia). Topic of each was food biotechnology.
Comments: Not a structured evaluation, more an informal comparison of three exercises.

Halvorsen (2001)
Definition type: Process; Universal.
Evaluation criteria: Four: Comfort; Convenience; Satisfaction; Deliberation.
Measurement instruments: Scale-based survey instrument (5 point) completed by participants. Multiple questions per criterion (7 for satisfaction; 6 for comfort and convenience combined; 5 for deliberation). Reliability of scale tested using Cronbach's alpha.
Examples evaluated: Two techniques: focused conversations (with existing community groups) and community dinners.
Comments: The study also reported demographic characteristics of participants, noting the importance of "representativeness," while treating comfort and convenience as one criterion.

Beierle and Konisky (2000)
Definition type: Outcome (Process variables measured and related to Outcome goals); Universal (implied).
Evaluation criteria: Three "social goals": incorporating public views into decision making; resolving conflict among competing interests; restoring trust in public agencies.
Measurement instruments: Case survey method used: evaluator scores performance on standard questions (high, medium, low), based on published case studies and other information.
Examples evaluated: Twenty-nine exercises, mainly of citizen advisory groups (cases chosen only when data quality was rated moderate or high, not low). Topic addressed by these was environmental planning.
Comments: Identified a number of "context" and "process" attributes, which were assessed and then correlated with ratings of the performance of the exercises on the three social goals to determine what makes for successful participation.

Barnes (1999)
Definition type: None stated a priori.
Evaluation criteria: A posteriori consideration of findings related to criteria such as "fairness" and "competence."
Measurement instruments: Observation of the exercises according to a structured schedule. Telephone interviews with participants. Some questionnaires (questions on, e.g., reasons for taking part).
Examples evaluated: Two citizens juries (which addressed health service changes and needs of an ageing population).
Comments: The "evaluation" is largely descriptive of the exercises' structures and processes. Some criteria for effectiveness derived after the evaluation and from comments of participants.

Guston (1999)
Definition type: Outcome; Local (consensus conferences).
Evaluation criteria: Several implied effectiveness criteria: Actual impact; Impact on general thinking; Impact on training (learning) of knowledgeable personnel; Interaction with lay knowledge (impact on lay learning).
Measurement instruments: Semistructured telephone interviews (with most participants), plus minor use of internet survey.
Examples evaluated: One consensus conference (USA) (telecommunication issues).
Comments: No clear measures were stated, so establishing effectiveness was difficult (evidence was scattered and qualitative). Some evidence of "lay learning," but not of other impacts.

Joss (1998)
Definition type: None stated a priori.
Evaluation criteria: None stated, though the implication is that "impact" (a sign of effectiveness) would be manifest in perceptions of conferences by MPs and public.
Measurement instruments: Questionnaire to members of Danish Parliament (seven fixed-response and two open questions) and interview of five members. Questionnaire to two samples of the public, before and after one conference (ten fixed, two open).
Examples evaluated: Thirteen consensus conferences (held in Denmark between 1987 and 1995) (technology assessment).
Comments: Questions detailed in the article, plus full details of response rates. Impact largely assessed through "awareness" of conferences.

McIver (1998)
Definition type: None stated a priori.
Evaluation criteria: None stated explicitly, though a variety of questions with an evaluative theme (e.g., impact on health authority decision making) generated in report.
Measurement instruments: Face-to-face and telephone interviews; observation; semistructured questionnaires for participants. Lack of details on questions asked, number of respondents, and so on.
Examples evaluated: Six citizens juries (various health issues).
Comments: Emphasis on a broad evaluation of the citizen jury mechanism. Evaluation mostly descriptive. Useful insights, but no detail on definitions or measurement instruments that would allow comparisons with other exercises.

Coglianese (1997)
Definition type: Outcome; Local (negotiated rulemaking).
Evaluation criteria: Two criteria: decrease time to develop regulations; reduce or eliminate subsequent judicial challenges.
Measurement instruments: Calculated number of days for completion of rules (for negotiated rulemaking and traditionally derived rules). Collected data on litigation of negotiated and traditional rules.
Examples evaluated: Sixty-seven examples of negotiated rulemaking, though focused on analysis of twelve cases completed for the U.S. EPA. Compared results to various samples of traditionally completed rules.
Comments: In comparing the effectiveness of the negotiated and traditional rulemaking procedures, considered potential selection bias and other factors that might impact on effectiveness (in terms of the chosen two criteria).

Joss (1995)
Definition type: None stated a priori.
Evaluation criteria: Not explicitly stated, but aspects considered include: efficiency (aspects to do with whether the exercise was "run well"), effectiveness (outcomes, such as impact on public debate, influence on policy making), perceived success.
Measurement instruments: Five participant questionnaires (given at different times in the process), and one questionnaire to experts, steering committee, and project manager; observation; interviews; group discussions with key parties (preexercise and postexercise).
Examples evaluated: One consensus conference (UK) (plant biotechnology).
Comments: A thorough, time-consuming evaluation, using many information sources, though no clearly stated a priori definition of effectiveness. In addition, the evaluation considered questions such as how the conference compared to other models of participation.

Mayer, de Vries, and Geurts (1995)
Definition type: None stated a priori.
Evaluation criteria: None stated, though analysis of questionnaire responses included representativeness of the panel, whether participants' values/opinions changed, and whether they learned anything (implying these are significant features of effective exercises).
Measurement instruments: Variety of methods mentioned (observation, questionnaires, media analysis, content analysis), though the article discusses only a "quasi-experiment," using a questionnaire filled out by the lay panel and control groups during the process.
Examples evaluated: One consensus conference (the Netherlands) (human genetics research).
Comments: The article reports only a minor part of the evaluation. Searches of the literature do not reveal any other articles on this evaluation. Design of the quasi-experiment provides an interesting model for using this method to conduct evaluations, but the absence of clear criteria for effectiveness limits this evaluation.

Petts (1995)
Definition type: Outcome and Process; Unclear if Local or Universal.
Evaluation criteria: Five criteria: representativeness; effectiveness of method process; compatibility with participants' objectives; knowledge achieved; impact on decision process.
Measurement instruments: A structured interview survey of a sample of participants, plus postal questionnaire of relevant others. Also "continuous monitoring" of meetings, papers produced, and so on.
Examples evaluated: Three "community advisory fora" ("like citizens' panels"), which met on a number of occasions. Issue was waste management.
Comments: Details of instruments not given. A sixth criterion, cost-effectiveness, was mentioned but not assessed.

Renn, Webler, and Wiedemann (1995)
Definition type: Process; Universal.
Evaluation criteria: Fairness and Competence are described as two "metacriteria," with seven "subcriteria" detailed.
Measurement instruments: The various criteria were assessed by answering standard questions (indicators). Exercises (mechanisms) were scored by the evaluator as plus, minus, or neutral, in response to each. Evaluation by practitioners, alluding to real examples. The three editors also evaluated the mechanisms.
Examples evaluated: Eight mechanisms: citizen advisory committees, planning cells, citizens juries, mediation, compensation, citizen initiatives, study groups, and negotiated rulemaking. Mechanisms used to address various topics.
Comments: The evaluation framework is described in a chapter by Webler. Most of the rest of the chapters in this edited book involve other authors using Webler's framework (often only loosely) to evaluate specific types of participation mechanism generically. Certain authors introduce other criteria into their (largely hypothetical) evaluations.

Kathlene and Martin (1991)
Definition type: Outcome; Uncertain if Universal or Local (citizen panel).
Evaluation criteria: None stated explicitly, but representativeness a concern. Participation rate/cost-effectiveness noted, and impact on policy formation. Public and policy maker opinions implied to be important, but not stated in what way prior to analysis.
Measurement instruments: Representativeness assessed by similarity of panel-public demographics and opinions; cost-effectiveness by cost per hour of participant time; and impact on policy formation by congruence of panel views with policies adopted by sponsors. Citizens' views obtained by questionnaire, and policy makers' by interview.
Examples evaluated: One citizen panel comprising 147 individuals (topic: transportation planning). This panel's opinions were gained through mail surveys and interviews (by phone and in-home), so unlike most "citizen panels" (mainly surveyed by mail).
Comments: Some comparison of impact of panel versus public hearings on the issue, but not formally addressed in the analysis. Many interesting results, but lack of an a priori framework means that the evaluation was somewhat unstructured, and replication of the evaluation is difficult.

Blahna and Yonts-Shepard (1989)
Definition type: Process and Outcome; Local (resource planning, implied).
Evaluation criteria: Five criteria: Obtain input early in planning; Involve public throughout planning process; Obtain representative input; Use personal and interactive methods; Use input in development and evaluation of alternatives.
Measurement instruments: A questionnaire sent to forest planners. Open-ended interviews with staff members (sixty total). Planning documents analyzed for evidence regarding fulfillment of the criteria.
Examples evaluated: Thirteen resource-planning (forestry) initiatives, involving various types of participation (in each case, and within each initiative). However, only six of these were analyzed using all the measures.
Comments: The initiatives were evaluated, rather than each specific exercise. Each particular initiative could involve several different exercises (public meeting, workshop, etc.). Generally, measures only loosely defined, so no clear-cut assessments of success or failure of the initiatives on the different criteria (general commentary instead).

Houghton (1988)
Definition type: Outcome; Local (citizen advisory boards), but may be Universally relevant.
Evaluation criteria: One criterion (outcome) measured in three ways (participant perceptions, sponsor perceptions, actual outcomes).
Measurement instruments: Questionnaire: 1-5 rating of satisfaction (board member); 1-5 rating of implementation of board ideas (by sponsors). Actual outcomes judged on a 3-point scale based on interviews and documentary evidence. Correlation analysis conducted.
Examples evaluated: Nine citizen advisory boards (various issues, within one U.S. city).
Comments: The study was an attempt to determine the extent to which effectiveness was a consequence of board independence from administrators (assessed by a variety of items on a questionnaire).

Lynn (1987)
Definition type: None stated a priori.
Evaluation criteria: Outcome: changes in facility design and procedures, and incorporation of advice into local ordinances.
Measurement instruments: Interviews and document analysis.
Examples evaluated: One citizen advisory committee, one task force (hazardous waste sites).
Comments: Details of data collection limited (e.g., number of interviews).

Crosby, Kelly, and Schaefer (1986)
Definition type: Process and Outcome; Universal.
Evaluation criteria: Six criteria: Representativeness; Effective decision making; Process fairness; Cost-effectiveness; Process flexibility; High likelihood that recommendations are followed.
Measurement instruments: A number of features of an exercise that would enable the exercise to succeed against the evaluation criteria were identified, then assessed by evaluators.
Examples evaluated: Five Citizen Panels (essentially, citizen juries) (water quality).
Comments: No clear performance indicators or instruments detailed; evaluation seemed mostly subjective. The authors developed the mechanism evaluated, so possible vested interest in the subjective evaluation.

Plumlee, Starling, and Kramer (1985)
Definition type: None stated a priori.
Evaluation criteria: None stated, though interviews revealed four "factors" related to exercise failure: conflicting expectations (participants and sponsors); technical nature of issue; interjurisdictional strife; delays in accomplishments.
Measurement instruments: Interviews of participants, organizers, and sponsors (thirty-two total).
Examples evaluated: Two citizen advisory committees (water-quality planning).
Comments: Conclusions that the planning programs had failed to a degree in both cases. Generally, this may be seen as an exploratory study identifying evaluation criteria. Details on interviews are not comprehensive.

Berry, Portney, Bablitch, and Mahoney (1984)
Definition type: Outcome; Universal (implied).
Evaluation criteria: Three criteria: Subjective assessment of previous evaluator; Representativeness of participants; Responsiveness of agency to policy demands of participants.
Measurement instruments: Rated by authors: the composite judgment (prior subjective evaluator assessment) and representativeness on 3-point scales, and responsiveness on a 4-point scale.
Examples evaluated: Forty-five exercises of various types (e.g., public hearings), covering various topics (e.g., transport, housing, the environment).
Comments: A meta-analysis of past evaluations, in which the authors distill their own definition of effectiveness from explicit and implicit criteria used in those evaluations. Most of these evaluations were done by the agencies administering the programs (i.e., not published).

Gundry and Heberlein (1984)
Definition type: Outcome; Local ("public meetings").
Evaluation criteria: Three criteria: Participants, their opinions, and the variance in opinions should be representative.
Measurement instruments: Questionnaires sent to participants plus randomly selected individuals from the sample population. Checked for similarity/difference in demographics/opinions of the two groups on the issue.
Examples evaluated: "Three public meetings": actually, one public meeting, one set of fifty public meetings, and two workshops (issues: road salt, deer hunting, and resource management).
Comments: Criteria phrased as hypotheses, with the definition of effectiveness being implicit (if effective, then the three hypotheses stated would be rejected).

Stewart, Dennis, and Ely (1984)
Definition type: None stated a priori.
Evaluation criteria: Stated goal: "to improve the responsiveness of the planning process to citizens' values," implying that effective participation in planning involves this.
Measurement instruments: A case study: observation of task force meetings, interview of sponsors and participants, review of public documents and consultant report.
Examples evaluated: One citizens' task force (air-quality planning).
Comments: Task force met over two years, involving seventeen representatives of key stakeholder groups (not public).

Cole and Caputo (1983)
Definition type: Outcome; Uncertain if Universal or Local (focus on public hearings).
Evaluation criteria: Essentially one criterion: "impact."
Measurement instruments: A mailed questionnaire, sent to chief executive officers (of General Revenue Sharing [GRS] programs). Three "measures": expenditure outcomes, public interest/involvement in local affairs, and net fiscal effect.
Examples evaluated: Questionnaires sent to every U.S. city over 50,000, every year since inception of the GRS program (approx. 50 percent response). Over 2,000 public hearings noted (various topics).
Comments: The definition of success is largely implicit, and discussed in terms of various impacts (outcomes). Longitudinal and cross-sectional analysis done of the hearings in each city.

MacNair, Caldwell, and Pollane (1983)
Definition type: Process and Outcome; Universal.
Evaluation criteria: Effectiveness equated to "level of community partnership." Six criteria: frequency of meetings; allocated resources; access to higher authority; involvement in decision-making process; intended role of citizens; selection of independent membership.
Measurement instruments: The six measures were rated from 0 to 3, and then combined into a single score representing community partnership. The authors made the ratings on the basis of information provided by agency staff in response to a mailed questionnaire.
Examples evaluated: 394 "citizen participation units" advising on a wide range of topics (e.g., planning). The "units" are not described and are liable to represent a variety of mechanisms.
Comments: Effectiveness assumed to relate to level of community partnership (if high, then effective). Evaluation not the main focus of the article; rather, an attempt to link the power of public agencies (rated by thirty-six "experts" on 10-point scales) with level of community partnership. Found that when "power" was low, effectiveness (community partnership) was high.

Twight and Carroll (1983)
Definition type: Outcome; Uncertain if Universal or Local (focus on workshops).
Evaluation criteria: Essentially two criteria (implicit), though the authors discuss them as one: "consensus," and whether the participation process in general was perceived as being open to public "influence" (described by the authors as "openness").
Measurement instruments: A questionnaire. Ratings of 1-5 of the importance of thirteen statements, then selection of one of six statements to describe the extent of openness to influence of the process.
Examples evaluated: Thirteen "small group workshops" (forestry planning). Data also collected from others associated with the exercise (from other participative mechanisms) for comparison purposes, and from Forestry Service personnel.
Comments: To establish consensus, views of workshop participants were compared with those of nonparticipants, and these groups were also asked to rate where they thought the Forestry Service stood on the issues. Little evidence that the workshops improved "consensus," while most participants thought influence was low.

Hannah and Lewis (1982)
Definition type: Outcome; Uncertain if Universal or Local (focus on citizen advisory committees).
Evaluation criteria: Essentially one criterion (implicit): "power over internal decision making" (within an advisory committee or participation exercise). Implied that high internal power equals high effectiveness.
Measurement instruments: Used Bales Interaction Analysis, a process used by trained observers to "classify interactions." Here, checked the percentage of suggestions/opinions that were member initiated (vs. staff initiated).
Examples evaluated: Nine local citizen advisory committees (various contexts, e.g., planning).
Comments: The study not only assessed the effectiveness of the nine exercises according to the given criterion but measured a number of "explanatory variables," via questionnaires, interviews, observation, and analysis of written information.

Rosener (1982)
Definition type: Outcome; Local (public hearings).
Evaluation criteria: Essentially one criterion: influence.
Measurement instruments: Change in voting outcomes (permits issued by sponsors) independent of staff recommendations (measured by percentage of permit denials associated with participant opposition).
Examples evaluated: 1,816 public hearings held by the California Coastal Commission (context of protection of coastal resources).
Comments: Considered effectiveness as a consequence of staff recommendations and the presence of citizen opposition. In most hearings, the citizens did not participate (e.g., oppose a permit), but when they did, there was evidence of influence.

Godschalk and Stiftel (1981)
Definition type: Process and Outcome; Universal.
Evaluation criteria: Seven criteria: Accessibility; Involvement; Public Awareness; Staff Awareness; Effect on Staff and Plan; Effect on Publics and Plan Support; Cost.
Measurement instruments: Field observation; interviews with planning staff; mail surveys of public participants; reports of staff time devoted to participation. The survey involved 5-point scales and open questions.
Examples evaluated: One planning initiative (water pollution control) involving seventeen participation methods (e.g., workshops, public hearings, advisory committees). Some comparison of effectiveness of the different mechanisms.
Comments: 29 percent of all participants (over 1,600) were sent questionnaires, with a 66 percent return. Most of the article focuses on results from the survey. Evaluation criteria developed from a process/impacts model grounded in exchange theory.


1. Technical reports. In opposition to Chess and Purcell (1999), who included a number of such publications in their analysis, we do not believe that these are as valid as journal articles (or, to a lesser extent, chapters in edited books), as they may not have been through a peer review process. Practically, they may also be difficult for researchers to obtain, and their absence from any comprehensive database means that any selection would be, to a greater or lesser extent, arbitrary and dependent on previous citations in the literature.

2. Articles lacking substantial details of evaluations, so that the empirical basis of conclusions is unclear (e.g., Gariepy 1991). The dividing line between an empirical-based and an opinion-based evaluation is not always obvious. For example, "observation" can be the basis of both types of evaluation. When there is evidence that observation has followed some structured data collection process relating to specific participation examples (e.g., following an observation schedule, perhaps with checks for interjudge reliability), then the evaluation is considered empirical; but when there is no such evidence of structure, then it is considered "opinion" and excluded (e.g., articles by Elder 1982; Lenaghan, New, and Mitchell 1996). The excluded cases include those evaluations in which a structured framework (e.g., a set of evaluation criteria) is posited and then hypothetical general examples are considered (e.g., Heberlein 1976; Susskind and Ozawa 1983; Rowe and Frewer 2000).

3. Non-English-language articles. These were omitted for practical reasons, although it is also true that English is the predominant language for academic research and most significant research is liable to be found in English-language journals. Nevertheless, a number of important omissions may derive from this policy: for example, Joss (1995) suggested "only a few evaluation studies in connection with consensus conferences have been published, none of them in English" (p. 89). (It should be noted, however, that the evaluations of Danish and Dutch consensus conferences described by Joss appear largely descriptive in nature rather than evaluative in the sense taken in this article.)

4. Evaluations of wider initiatives involving some public participation component rather than a specific mechanism or exercise per se (e.g., Ouellet, Durand, and Forget 1994; Moore 1996; Lauber and Knuth 1999), or of some specific tool (e.g., a decision-support approach) that might itself form part of a wider participation mechanism (e.g., Horlick-Jones et al. 2001). In such cases, the contribution of the participation component to the success of the initiative, or of the tool to a participation mechanism, may be unclear, and the criteria for success of initiative or tool and participation exercise or mechanism may differ.
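As a rough, schematic illustration of the search procedure described before Table 1 (the keyword combinations and the citation "snowballing" step), the following sketch expresses the logic in code. It is only an illustration under stated assumptions: the fetch_references and looks_relevant callables are hypothetical placeholders rather than part of any real database API, and the screening judgment itself (whether an article reports an empirical evaluation with a priori criteria) remains a human decision.

```python
from itertools import product

# Keyword combinations used in the database search described in the text:
# (citizen OR public) AND (participation OR involvement) AND (evaluation OR assessment)
ACTORS = ["citizen", "public"]
ACTIVITIES = ["participation", "involvement"]
APPRAISALS = ["evaluation", "assessment"]


def search_queries():
    """Generate every keyword combination for the database search."""
    return [" AND ".join(terms) for terms in product(ACTORS, ACTIVITIES, APPRAISALS)]


def snowball(seed_articles, fetch_references, looks_relevant, max_rounds=3):
    """Follow citations outward from the initially retrieved articles.

    fetch_references(article) -> iterable of cited articles, and
    looks_relevant(article) -> bool, stand in for the manual steps described
    in the text (obtaining each cited article and screening it).
    """
    found = set(seed_articles)
    frontier = set(seed_articles)
    for _ in range(max_rounds):
        cited = {ref for art in frontier for ref in fetch_references(art)}
        new = {ref for ref in cited if ref not in found and looks_relevant(ref)}
        if not new:
            break
        found |= new
        frontier = new
    return found


if __name__ == "__main__":
    # Print the eight boolean query combinations used for the database search.
    for query in search_queries():
        print(query)
```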

In Table 1, the identified evaluation articles are described in a number of ways. In the first column, the references are shown (listed chronologically); in the second, the form of the definition of effectiveness is detailed; in the third, the evaluation criteria used are given; in the fourth, the nature of the instruments or methods used to establish effectiveness is described; in the fifth, the mechanisms or exercises that were evaluated are shown; and in the sixth column, general comments on the nature of the evaluation are given. This section focuses on the findings concerning the definitions of effectiveness (i.e., the information in columns 2 and 3).

The second column in Table 1 describes the type of definitions used in the empirical evaluations. The definitions are described in terms of two components: whether the definition considers outcome or process criteria (as described earlier) and whether the definition of effectiveness is intended to be applicable to all public-participation exercises (universal), to a subgroup of exercises or mechanisms (local), or only to the specific exercise being evaluated (specific). In a number of cases, as noted, what is meant by effectiveness is not always clearly stated beforehand but may be implicit in the description of the components being considered in the evaluation. It should also be noted that definitions are not described according to the issue of "effective to whom," because in almost all cases, the evaluation criteria used in the definition are implied to be normative and objective, that is, to either take into account all perspectives or, at least, not derive from any one party's point of view (participants, public, sponsor, organizer) save, perhaps, the evaluators themselves. This is not to say, however, that the way in which effectiveness is then measured does not include some bias (e.g., by being based solely on the opinions of specific participants or on the insights of the organizers).

The specific evaluation criteria are detailed in column 3. Considering Table 1, it is notable that approximately half of the studies stipulated outcome criteria alone and half stipulated both outcome and process criteria (usually with more of the former than the latter); only two (Renn, Webler, and Wiedemann 1995; Halvorsen 2001 [note 2]) defined effectiveness solely according to process criteria. Of the outcome criteria stipulated, a number of common themes recur. The criterion of "representativeness" (i.e., that participants are representative of the wider affected population), for example, has been stipulated in one form or another in many of the evaluations. Otherwise, outcome criteria have generally related to the exercise having some impact on the sponsors, such as on their decision making or attitudes, or else on the knowledge of the public. The process criteria tend to consider how components of the exercise lead to effective and fair involvement of participants, in terms of enabling appropriate and efficient two-way communication. In this respect, most of the process criteria have been related to group-interaction processes (and this is certainly the case with Renn, Webler, and Wiedemann 1995). The justifications for the different definitions and criteria are not the focus of the present article, though a critical analysis of these would be a useful addition to the literature.

Generally, there is also a roughly equal split between studies in which the definition of effectiveness appears to apply to public participation as a whole (universal definition) and those in which it applies to some reduced subset of participation exercises or contexts (local definition). In the case of local definitions, most claim to comment on the effectiveness of a particular class of mechanisms (e.g.,
consensus conferences, public meetings, citizen advisory boards), but there are a number that address a particular context (transport planning, land management). In most of these cases, a reading of the particular studies gives the impression that the limits of generalizability of results placed by the authors stem from a wish not to be too ambitious in interpreting results, rather than any belief that the effectiveness criteria are applicable only to the mechanism or context in question. Indeed, few of the studies explicitly stated the generalizability of their evaluation criteria, and a certain amount of interpretation was necessary in categorizing the definition types. Such a lack of clarity is problematic and should be remedied in future evaluation studies. In summary, in spite of the complexity involved, production of a definition of effectiveness is a necessary first step in the evaluation of participation exercises. Researchers should not be too concerned in the first instance as to whether the definition is “right,” or unambiguous, or uncontroversial. It will not be. But the definition forms the basis for future research that may supply empirical evidence that will enable consideration of the sense of the definition itself. For example, if a measure was developed on the basis of a particular definition of effectiveness that subsequently led to the conclusion that certain exercises were ineffective, in the face of other evidence that they were somehow effective, this would suggest that either the definition, measure, or both were somehow inadequate and required changing. As such, it might be hoped that the definition(s) will evolve over time and that our insights into what it means for a participation exercise to be effective will increase. A comparative analysis of the merits of the different evaluation criteria used (per Sewell and Phillips 1979) would be useful, though this is beyond the scope of the present article.
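
As a purely illustrative aid, the sketch below (in Python; the criteria, labels, and field names are invented for illustration and form no part of any published scheme) shows one way a definition of effectiveness could be recorded explicitly, with each criterion tagged as process or outcome and the definition as a whole tagged with its intended scope. Making these tags explicit would remove much of the interpretive work that, as noted above, was needed to categorize the published studies.

    # Minimal sketch only: an explicit record of a definition of effectiveness.
    # All names and example criteria are illustrative assumptions.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Criterion:
        name: str          # e.g., "representativeness"
        kind: str          # "process" or "outcome"
        description: str   # what would count as meeting the criterion

    @dataclass
    class EffectivenessDefinition:
        label: str                         # short name for the definition
        scope: str                         # "universal", "local", or "specific"
        criteria: List[Criterion] = field(default_factory=list)

        def kinds(self) -> set:
            """Return the criterion types used (helps check the process/outcome balance)."""
            return {c.kind for c in self.criteria}

    # Example: a hypothetical local definition for deliberative exercises.
    definition = EffectivenessDefinition(
        label="deliberative-exercise effectiveness (illustrative)",
        scope="local",
        criteria=[
            Criterion("representativeness", "outcome",
                      "participants reflect the wider affected population"),
            Criterion("impact on sponsor", "outcome",
                      "exercise output demonstrably informs the sponsor's decision"),
            Criterion("fair two-way dialogue", "process",
                      "participants can question sponsors and each other"),
        ],
    )

    print(definition.scope, sorted(definition.kinds()))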

Step 2: Operationalize the Definition

To be truly useful in a research sense, it is necessary that one's definition of effectiveness be operationalized. By this, we mean it is necessary to develop one or more processes or instruments to measure whether, and to what extent, a particular public-participation exercise has successfully attained the required, defined state. The definition may be operationalized in a number of ways, via processes such as participant interviews and evaluator observation, or via specific instruments such as participant questionnaires, and may be qualitative or quantitative in nature. The essence of a suitable procedure or instrument is that it is detailed and structured (to allow it to be reused) and that it should be tested for its appropriateness and accuracy. Certain criteria from the development of psychometric instruments (instruments for measuring psychological concepts or properties) and, indeed, from the
evaluation of social programs, may be apt here, namely, those of reliability and validity, but other criteria, such as usability, may also be of practical importance in determining process or instrument quality. It should be emphasized that reliability and validity are concepts of relevance to all social-science methods (Yin 1994), that is, to qualitative methods (such as case studies) as well as quantitative methods, although they may be easier to document and ascertain for the latter.

Establishing the quality of measures: validity, reliability, and usability. Validity essentially concerns the extent to which an instrument or process effectively and appropriately measures the target concept (here, "public-participation effectiveness," howsoever this is defined). There are a variety of different ways in which an instrument (such as a questionnaire) or process (such as an interview or case study) can be said to be valid, and these come with associated labels, such as predictive, concurrent, construct, content, external, and face validity. These concepts are explained in detail in most standard social-science textbooks (Elmes, Kantowitz, and Roediger 1985; Kidder and Judd 1986; Oppenheim 1992). One way of assessing validity is to compare the results from the new instrument (questionnaire, case study, etc.) with those attained using other, already established instruments that are intended to measure a similar concept (known as concurrent validity), and another is by assessing whether the derived results allow sensible prediction of some future outcomes that may be subsequently tested (predictive validity). For example, consider a measure of whether participants perceive a participation exercise to be fair. One might establish its concurrent validity by comparing its results with those from some existing measure, such as a participant-satisfaction questionnaire, as one might argue that participants who felt the process unfair would not be satisfied. One might establish predictive validity by assessing the correspondence of results with some future outcome, like propensity to write negative letters to the press, as participants who rated the exercise unfair should have a greater propensity to write such letters; if the reverse were true, one could question the validity of the fairness measure.

The difficulty in establishing concurrent and predictive validity (and other types that we will not detail here, such as construct validity) is that these rely on the presence of existing measures of participation-exercise effectiveness, as well as on existing theories about when effectiveness might be achieved, how, under what circumstances, and so on. Certainly, there are copious suggestions in the literature about participation-exercise effectiveness, but few, if any, formalized and detailed theories. As such, it would appear that the current research environment might militate against the use of stringent
validation concepts in many cases. Face validity is the least stringent type of validity and simply concerns whether results from using the measure seem sensible. There is no formal mechanism for determining whether an instrument has face validity—such a conclusion relies on subjective assessment alone. Similarly, though direct objective measures of a concept are more likely to be valid than judgmental ones (e.g., from participants), in many circumstances, judgmental assessments are the only evaluation options available (Rossi, Freeman, and Lipsey 1999) and are better than nothing. At the very least, researchers should discuss reasons for believing that their measures of effectiveness are valid (and ideally, evidence for the quality of instrument should form the basis of a section in any study write-up). The reliability of a measure refers to its ability to yield consistent (opposed to inconsistent) results. It is an important concept because reliability is a necessary but not necessarily sufficient condition for validity to be achieved. That is, if a test is not reliable, it cannot be valid, but even if it is reliable, it may not be valid (it might, for example, be reliably measuring a different concept to that intended). Reliability is often taken to entail two separate aspects: external and internal reliability (Bryman and Cramer 1997). External reliability refers to the degree of consistency over time (Bryman and Cramer 1997). It is important for a test to yield similar results when administered on separate occasions (e.g., a scale that gave different weights of an object on different occasions would be of little use). When a test is administered on two or more separate occasions, then test-retest reliability is being examined. As an example, consider a participant completing a questionnaire about the fairness of a public meeting or other participation exercise: should the questionnaire be given a second time, the participant should give roughly similar responses (if they do not, this could indicate a problem with the questionnaire—such as the question wordings being too vague or difficult to understand). The concept is of relevance to qualitative methods too. Consider an interview process: if participants give a different indication of fairness of the exercise when interviewed a second time, one would have to query the reliability of the process (the questions asked, the way the interview was conducted, etc.). Of course, there is the difficulty that events occurring between the two tests might lead to discrepancies of scores (e.g., the respondents might have learned something during that time or changed their opinion). Internal reliability is particularly important with regard to multiple-item scales in questionnaires (Bryman and Cramer 1997), in which a number of items are purported to measure the same concept. A number of procedures for estimating internal reliability or consistency exist. Split-half reliability involves dividing the items into two groups and then measuring the

relationship between the scores on the items in the two halves (i.e., calculating correlation coefficients). Another statistical method involves calculating the average of all possible split-half reliability coefficients (known as Cronbach’s alpha). Interrater and intrarater reliability are similar pertinent concepts, though perhaps more relevant to assessing qualitative as opposed to quantitative processes. These refer to the consistency in rating some aspect of an exercise either by two or more raters (inter), or by a single rater (intra), where the “rater” would normally be the evaluator. If two raters gave different scores or evaluations of some exercise or component of it (e.g., the fairness of a participation exercise), or if a single rater gave different scores on separate occasions, then rater reliability would be suspect. The source of unreliability may be the lack of coherent or accurate specification of an observation or interview schedule. For example, if a citizens jury was being rated on the criterion of “quality of process” by two evaluators, one might rate it high because it generated many ideas, and another might rate it low because the jury process was slow with considerable contention. Respecifying the observation schedule to include a more precise definition of success on the named criterion (e.g., “a high quality process is one that generates a high number of ideas”) would be liable to result in similar ratings by the evaluators, hence increasing the reliability of the overall process. Establishing the reliability of processes and instruments for evaluating participation exercises appears to be less of a research problem than establishing validity. That is, the current existence of theories or instruments in the public participation domain is not required. However, one difficulty worth noting is the fact that the statistical tests used to confirm reliability (as well as validity) in quantitative methods—tests such as factor analysis, correlation, and Cronbach’s alpha—require the existence of large amounts of data that might not be readily available, especially when a participation exercise uses few participants (e.g., consensus conference, focus groups). Restrictions in the collection of adequate data may also arise from the unwillingness of sponsors or organizers to allow the distribution of instruments on more than one occasion. Hence, full cooperation of the sponsors or organizers needs to be attained if possible. Considering the reliability and the validity of a process or instrument intended to be a measure of universal participation-exercise effectiveness (as opposed to one intended to measure effectiveness of specific exercise types or specific exercises) brings additional difficulties. Perhaps the main one is the narrow range of exercises that are typically assessed. To develop a universal effectiveness measure and to confirm its reliability and validity, it is necessary to test the process or instrument in all situations in which it might be

used, otherwise it might have limits that go undetected. For example, consider a set of scales that was developed and tested on children, and hence, measured up to only one hundred pounds: it might be reliable, and valid, but one would have difficulty in using the instrument to weigh heavier adults. In a real participation example, Halvorsen (2001) set out to develop a questionnaire instrument and assess its reliability or validity. In doing so, she deliberately chose two exercises to assess that were expected to score well on the evaluation criteria of “comfort” and “convenience.” Though the resultant work is a great improvement on that of many evaluations, there is still doubt as to whether the scale would be equally reliable, or valid, when applied to exercises that were not comfortable and convenient (e.g., the questionnaire might not be sensitive or accurate enough to identify these or to differentiate them from “good” exercises)—and this should, ideally, be established in the future. Unfortunately, because of the sensitive nature of conducting evaluations of expensive exercises (in which the sponsors and organizers have a heavy financial commitment), it is likely that the exercises typically evaluated in the literature have a bias toward those that are effective rather than those that are ineffective. Indeed, from personal experience, we have found a variety of those representing sponsors and organizers to be suspicious or even hostile to the idea of having their exercise evaluated, with certain sponsors even attempting to sway the outcomes of evaluations. It is perhaps unsurprising that those happiest to have their exercises evaluated are those who are the most confident in their success. A practical solution in terms of future research is to ensure agreement with the project funder for access to previously identified exercises, or the conduct of the exercises, by the researchers themselves. Without testing the instrument’s performance in measuring the effectiveness of a poor exercise, one could not be certain that it would lead to an accurate assessment of such exercises. Rossi, Freeman, and Lipsey (1999) suggest that “in addition to being reliable and . . . valid, a good outcome measure is one that is feasible to employ, given the constraints of time and budget” (p. 252). We refer to this criterion as usability. Usability of a process or instrument may be the easiest criterion to assess in establishing the quality of an evaluation process or instrument since the responses of participation exercise sponsors, and the participants themselves, directly indicate this (i.e., if an instrument is so complex that sponsors refuse to allow evaluators to employ it, then this indicates its failure on the usability criterion). Usability is liable to inversely correspond to the amount of effort required on the part of those involved in the participation process (i.e., participants, organizers, and sponsors). As such, usability may well increase as the reliability and validity of a measure decrease. There is no easy

way to balance the requirements of usability and validity or reliability: researchers need simply to ensure the former and attempt to maximize the latter. Regardless of how this is achieved, we suggest that it is important in all evaluations that specific reference is made to how the reliability and validity of measures have been ascertained, citing all appropriate evidence. Hopefully, following this process, researchers may selectively adopt instruments or processes of demonstrated quality, which will improve comparability of research findings. Processes and instruments in the literature. In Table 1, the processes and instruments used in the published evaluation studies to ascertain or measure effectiveness are described (in column 4). It is notable that very few studies have actually measured effectiveness using an objective approach. When this has occurred, the studies have generally concentrated on establishing the true representativeness of a sample with respect to the wider population (e.g., Kathlene and Martin 1991; Carr and Halvorsen 2001). Other outcomes measured objectively have included whether litigation arose after a process (Coglianese 1997), as well as costs per participant time (Kathlene and Martin 1991). Most of the detailed studies have instead attempted to measure effectiveness by ascertaining the opinions of participants through interviews, questionnaires, or surveys (e.g., Petts 1995; Guston 1999) or by evaluating aspects of the process according to the judgment of the authors or evaluators, via observation, content analysis, media analysis, or other semistructured processes (e.g., Crosby, Kelly, and Schaefer 1986; Sinclair 1977; Rosener 1981; Beierle and Konisky 2000). Several studies have attempted to evaluate exercises using a variety of instruments or processes (e.g., Kathlene and Martin 1991; Joss 1995; Rowe, Marsh, and Frewer 2004), which may help in the triangulation of findings (i.e., if different methods are used, and these give similar findings, then this may increase confidence in those findings and the validity of the measures). What is particularly noteworthy, however, as indicated in column 4 and in column 6 (general comments), is the lack of detail of the processes and instruments employed. In very few cases are questionnaire items, or interview questions, or content analysis details given; exceptions include Beierle and Konisky (2000), Halvorsen (2001), and Rowe, Marsh, and Frewer (2004). This absence of details on the instruments makes study replication difficult, as this relies on contacting the authors and assuming that they still possess details of the processes or instruments involved—assuming these existed in the first place (reading the studies in Table 1 often gives the impression that interviews were informal and unstructured and that content analyses may never have been precisely specified). It may be that the absence of such

details in publications derives from a lack of appreciation as to their importance by the authors or from a resistance by editors to publishing extensive methodological details. In any case, a reading of the evaluation papers additionally reveals a remarkable lack of concern for issues such as the reliability and validity of the measures. The few that note such details include Halvorsen (2001), who conducted reliability checks (using Cronbach's alpha) of the multiple items used to measure her separate evaluation criteria; Lauber and Knuth (1999), who directly addressed the issues of the reliability and validity of the items used to operationalize their evaluation criterion (fairness) [note 3]; and Rowe, Marsh, and Frewer (2004), who described their instruments as "validated" (though for details, the article refers readers to an unpublished report).

In conclusion, analysis of the published evaluation studies raises concerns about the validity of the measures developed to operationalize evaluation criteria, and it indicates difficulties with respect to their replicability and usability in future research. Advances in science rely on the presence of standard validated procedures and instruments for measurement, which can be taken up and used by all researchers in a particular discipline, allowing comparability of findings. Few, if any, such rulers presently exist in the public-participation domain.
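
To make the preceding measurement concepts concrete, the following sketch (in Python, using the numpy library and wholly invented questionnaire data) illustrates the kind of checks discussed above: Cronbach's alpha for the internal reliability of a multi-item fairness scale, a test-retest correlation across two administrations, and a concurrent-validity correlation against an existing satisfaction measure. It is a minimal illustration of the calculations, not a substitute for the validation work argued for above.

    # Illustrative sketch only: reliability and validity checks of the kind
    # discussed above, applied to invented questionnaire data. Requires numpy.
    import numpy as np

    def cronbach_alpha(items: np.ndarray) -> float:
        """Internal consistency of a multi-item scale.
        `items` has shape (n_respondents, n_items)."""
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)      # variance of each item
        total_var = items.sum(axis=1).var(ddof=1)  # variance of summed scale scores
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    rng = np.random.default_rng(0)

    # Hypothetical 5-point ratings: 40 participants answering 4 "fairness" items.
    latent = rng.normal(3, 1, size=(40, 1))
    fairness_items = np.clip(np.rint(latent + rng.normal(0, 0.5, size=(40, 4))), 1, 5)

    print("Cronbach's alpha:", round(cronbach_alpha(fairness_items), 2))

    # Test-retest reliability: correlate scale scores from two administrations.
    fairness_t1 = fairness_items.sum(axis=1)
    fairness_t2 = fairness_t1 + rng.normal(0, 1, size=40)  # simulated second occasion
    print("Test-retest r:", round(np.corrcoef(fairness_t1, fairness_t2)[0, 1], 2))

    # Concurrent validity: correlate the new fairness scale with an existing
    # participant-satisfaction measure (here, simulated), as suggested in the text.
    satisfaction = 0.6 * fairness_t1 + rng.normal(0, 2, size=40)
    print("Fairness-satisfaction r:", round(np.corrcoef(fairness_t1, satisfaction)[0, 1], 2))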

Step 3: Conduct the Evaluation and Interpret Results

Having defined what is meant by effectiveness and developed processes or instruments to enable measurement of the extent to which such effectiveness is achieved, the third step is to actually conduct the evaluation. Evaluating a particular exercise (or exercises) and drawing a conclusion about its absolute or relative effectiveness is generally sufficient for the purposes of a practical evaluation. However, academically, our interests should be in what the specific results tell us about participation-mechanism effectiveness more generally and should help in the development of the theory of "what works best when." That is, it is important to understand the exercise, and its effectiveness, in terms of a wider context involving all possible exercises and mechanisms in all conceivable situations. To interpret the results of an evaluation, a variety of issues need to be considered.

To what extent is the exercise typical of a general participation mechanism? The number of named participation mechanisms that exist is large and seemingly growing. However, these mechanisms are not generally well defined, and this may cause confusion. For example, is there any meaningful difference between a "deliberative conference," a "consultation seminar,"
and a “two-day workshop,” or between a “citizen jury” and “planning cell,” or between a “citizen advisory committee” and a “task force”? And to what extent does one’s evaluated exercise match its supposed type? Is it typical of the ideal form, or is it an atypical and unusual exemplar? And if atypical, should less weight be given to the results of the evaluation when generalizing effectiveness to the idealized mechanism? To make progress in establishing a theory of “what works best when,” it is necessary to define participation mechanisms (the what)4, so that we know what each entails and so that we can correctly classify any particular exercise and correctly attribute the attained evidence of effectiveness. Such definitions may be fuzzy, in the sense of describing a range of values (e.g., “number of participants involved”) or activities (e.g., “balanced information is supplied to participants,” by whatever means). The important thing is that the variables that are used within the definitions are significant ones that may (in theory or based on evidence) impact on effectiveness. It is likely, for example, that whether participants are provided with information (e.g., in a consensus conference) or whether they are not (e.g., in a survey) will impact on the effectiveness of a particular exercise in a particular situation, and hence all mechanisms should have, as part of their definitions, details on whether information is supplied. Development of a typology of mechanisms might proceed through identifying the key mechanism variables that might impact on effectiveness, and using these to describe a limited set of general mechanism types. Although a number of attempts have been made to classify mechanism types (e.g., Rosener 1975; Nelkin and Pollak 1979; Wiedemann and Femers 1993; Maloff, Bilan, and Thurston 2000), there is no accepted and widely used typology at present. The need for this is clear from consideration of Table 1: the descriptive labels given to the exercises that have been evaluated are many, making comparability of results difficult. A typology of mechanisms, with clear labels and definitions of specific mechanisms and their higher order classes, would ease the interpretation of results, and the development of such a typology should be a research priority. To what extent is the context of the exercise typical of a general type of context? Participation mechanisms do not take place in a vacuum but within particular contexts. Broadly, by context, we mean the environment in which an exercise takes place, including the political/cultural/economic climate (e.g., political background behind the commissioning of an exercise), as well as the nature of the issue being considered (e.g., level of controversy surrounding it). The effectiveness of a participation exercise will depend to some degree on the appropriate choice of mechanism to match context (or vice versa, to

Rowe, Frewer / Evaluating Public Participation 549

the extent that contextual variables may be controlled). In terms of establishing "what works best when," defining when equates to defining the context. In the same way that there exist a large number of participation mechanisms, there also exist a large number of contexts, that is, a large number of significant contextual variables that will interact with mechanism type to impact on participation effectiveness. Consider a sponsor that wishes to run an exercise to involve stakeholders in discussing options for siting a waste facility. Which of the multitude of context features are of sufficient significance to potentially influence the success of the different participation mechanisms? The topic (e.g., "waste")? The physical activity being discussed (e.g., "siting")? The nature of participants (e.g., "stakeholders")? The level of public concern (e.g., perhaps "high" here)? The difficulty of defining the context of any one exercise is so great that empirical evaluations rarely consider this issue beyond a superficial identification of the broad topic (this is reflected in the brevity of contextual description given in Table 1). As with participation mechanisms, what is required is a typology of context, identifying the key contextual variables and hence allowing the particular context of a study to be appropriately designated, enabling comparison of results with those from other studies. Although various authors have at times made suggestions about what some of the most important contextual variables affecting participation effectiveness might be (e.g., Mazmanian 1976; Berry et al. 1984; Renn et al. 1993; Aronoff and Gunter 1994; Coglianese 1997), there is again no accepted and widely used typology. Establishing such a typology, based on theory or empirical evidence, should be another research priority.

Is effectiveness mainly due to the application of the exercise or to a match/mismatch of mechanism and context? Even though an exercise of a particular mechanism type might prove effective in a scenario of a particular context type, according to one's measures, this in no way guarantees that the mechanism-context match is good, or vice versa. Effectiveness may also be a consequence of the quality of application of the exercise. That is, an exercise of a particular mechanism type may be appropriate for a particular context, but poor conduct might lead to a poor outcome. Consider, for example, an exercise using a facilitator to help group discussion: the presence of a facilitator might generally be beneficial in the context in question (a good mechanism-context match), but a poorly trained or incompetent facilitator might undermine the exercise. In this case, it would be wrong to conclude that the mechanism (involving facilitation) was inappropriate in this context and that some contrary mechanism (in which facilitation does not occur) was appropriate.

One way to overcome variability in the results of evaluations that arises as a consequence of differences in the quality of applications is to conduct a number of evaluations using the same mechanism type in the same context and then to average the scores on effectiveness measures. This would reduce the impact of extreme performances (both good and bad). Naturally, this requires the evaluation of a large number of exercises (the individual exercise being the unit of analysis), but with the huge number of exercises taking place every day (in the United Kingdom, for example, many, if not most, local councils have various ongoing participation activities at this very moment, while retrospective analysis of past, well-documented exercises may also be possible), this should be possible to a degree. Certainly, producing typologies in which the plethora of contexts and exercises are reduced to a smaller number of classes will much reduce the number of comparisons that need to be made. It will also be easier to do this for some mechanisms (e.g., focus groups) than for others (e.g., consensus conferences). For expensive and complex mechanisms, it might be necessary to evaluate simple examples in artificial situations and then generalize to the full-scale mechanism type.

Evaluations in the literature. Column 5 in Table 1 describes the types of exercises evaluated. The information describes the exercises using the particular mechanism labels ascribed by the authors and occasionally notes brief details of the contexts in which the exercises took place. The table reveals that an apparently great variety of mechanisms have been evaluated, the exercises having been variously described as consensus conferences, deliberative conferences, citizen advisory committees, citizen advisory boards, focus groups, task forces, community groups, negotiated rulemaking task forces, community advisory forums, citizen initiatives, citizen juries, planning cells, citizen panels, public meetings, workshops, public hearings, and others. Generally, it is true that evaluation articles give fairly full descriptions of the specific nature of exercises (who, how, where, when), but they also give almost no consideration to the general nature of the exercise, that is, to the classification of the exercise mechanism and context. For example, studies may state or imply that the evaluation criteria relate to a particular class of mechanisms (e.g., consensus conferences) without defining the limits of that class of mechanism (inclusive or exclusive) so that we can tell what exercises are sufficiently similar (in the authors' eyes) for the results to have generalizable relevance. (With regard to contexts, this pattern is even worse, with context labels being more vague or even nonexistent.) This is particularly problematic in cases where authors conduct an evaluation using a local definition of effectiveness, for example, in the case of public meetings (e.g., Gundry and Heberlein 1984). The question arises (and is not answered): what is a public meeting? Is it the same as a public hearing? Is it sufficiently different from a community advisory forum that the findings have no implications? And so on.

Without typologies of mechanisms and contexts, and an attempt by researchers to adequately define the exercise(s) they are evaluating against these, little progress will be made in establishing a theory of "what works best when." (It is largely for this reason that Table 1 does not contain a column revealing the results of the evaluation studies.)
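
The following sketch (in Python, with invented mechanism classes, context classes, and effectiveness scores) illustrates the pooling strategy described above: once exercises are tagged against typologies of mechanism and context, scores can be averaged within each mechanism-context cell, dampening the influence of individual well-run or badly-run applications and exposing the cells where further evaluations are most needed.

    # Illustrative sketch only: pooling effectiveness scores by mechanism class
    # and context class. All class labels and scores are invented.
    from collections import defaultdict
    from statistics import mean

    # Each record: (mechanism class, context class, effectiveness score on a 0-1 scale)
    evaluations = [
        ("facilitated small group", "low-controversy local planning", 0.72),
        ("facilitated small group", "low-controversy local planning", 0.55),
        ("facilitated small group", "high-controversy siting dispute", 0.40),
        ("public meeting",          "low-controversy local planning", 0.45),
        ("public meeting",          "high-controversy siting dispute", 0.30),
        ("public meeting",          "high-controversy siting dispute", 0.38),
    ]

    cells = defaultdict(list)
    for mechanism, context, score in evaluations:
        cells[(mechanism, context)].append(score)

    # Average within each mechanism-context cell to dampen the influence of
    # individual well-run or badly-run applications.
    for (mechanism, context), scores in sorted(cells.items()):
        print(f"{mechanism} | {context}: mean={mean(scores):.2f} (n={len(scores)})")

On such an approach, the cell means rather than single exercises become the evidence base for "what works best when," and thinly populated cells indicate where additional evaluations would be most valuable.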

Discussion

In this article, an attempt has been made to describe an ideal agenda for conducting research into the effectiveness of public-participation exercises. Such an agenda would seem necessary to help guide future research. Indeed, it may be argued that the absence of such a framework has been in part responsible for the generally disorganized and sporadic nature of past research, which has resulted in no great corpus of findings or significant theory. The first aim of research should be to establish a common definition of effectiveness (or at least, one definition for each theoretical perspective on what participation should entail to be effective) and to develop valid and reliable and usable ways to measure this. Once it is possible to establish exercise effectiveness, research may then turn to considering which potential explanatory variables are responsible for the degree of effectiveness discovered. The ultimate aim of research, following the presented agenda, is to identify which participation-mechanism class to use in which particular situation class to increase the chance of effective participation. Development of such a theory of "what works best when" would not only be of academic interest but also of practical importance. Furthermore, we have not started to address how systematic evaluation of the impact of public participation on policy development might be addressed. Such an evaluative process requires new theoretical perspectives and methodological innovations, which cannot be addressed in the current discussion.

Undoubtedly, establishing "what works best when" using the suggested agenda will be a major task. It would be naïve to downplay the research difficulties involved. These stem from an absence of precise and coherent definitions of the important concepts (public participation, effectiveness, the different mechanisms, etc.); a lack of adequate instruments and processes for measuring aspects related to the conduct and outcome of participation exercises; a high number of potentially confounding variables and a commensurate lack of ability to exert experimental control during evaluative studies; the tendency for exercises to result in quantitatively poor data that might hinder appropriate analysis; and the need to conduct multiple evaluations of each
mechanism (or rather, mechanism class) in each situation (or situation class) over a range of applications that vary in the quality of applications. It is also important to note that while the proposed agenda is rooted in (and driven by) a quantitative, experimental philosophy, and is largely intended to direct quantitative research, we see no reason why the agenda should not be fundamentally applicable to all research in this area, regardless of the nature of research method involved. That is, there is no reason why structured qualitative methods, such as case studies (see Yin 1994), might not be used to enact evaluation research using this agenda. Furthermore, exploratory and descriptive qualitative research may serve a valuable function in identifying variables of contexts and mechanisms that might impact on effectiveness—information that might inform the construction of definitions and productions of typologies, driving subsequent quantitative research. Nevertheless, without stating an agenda, and acknowledging potential research difficulties, it is unlikely that much progress will be made in understanding how and why public-participation initiatives work. The current article represents a start in this direction.

Notes

1. These authors described three actual case studies of mediated negotiation previously held, and informally assessed them in their article using a number of evaluation criteria. They also "evaluated" other hypothetical mechanisms (e.g., referenda) using the same criteria.
2. The article by Halvorsen (2001) also discussed the representativeness of its sample, without specifying this as an evaluation criterion of interest at the outset.
3. This article evaluated a wider initiative, so is not included in the table.
4. In defining effectiveness, we have already addressed the meaning of best.

References Arnstein, S. R. 1969. A ladder of citizen participation. Journal American Institute of Planners 35:215-24. Aronoff, M., and V. Gunter. 1994. A pound of cure: Facilitating participatory processes in technological hazard disputes. Society and Natural Resources 7:235-52. Barnes, M. 1999. Building a deliberative democracy: An evaluation of two citizens’juries. London: IPPR. Beierle, T. C., and D. M. Konisky. 2000. Values, conflict, and trust in participatory environmental planning. Journal of Policy Analysis and Management 19 (4): 587-602. Berry, J. M., K. E. Portney, M. B. Bablich, and R. Mahoney. 1984. Public involvement in administration: The structural determinants of effective citizen participation. Journal of Voluntary Action Research 13:7-23.

Rowe, Frewer / Evaluating Public Participation 553 Bickerstaff, K., and G. Walker. 2001. Participatory local governance and transport planning. Environment and Planning A 33:431-51. Blahna, D. J., and S. Yonts-Shepard. 1989. Public involvement in resource planning: Toward bridging the gap between policy and implementation. Society and Natural Resources 2:209-27. Bryman, A., and D. Cramer. 1997. Quantitative data analysis. London: Routledge. Carnes, S. A., M. Schweitzer, E. B. Peelle, A. K. Wolfe, and J. F. Munro. 1998. Measuring the success of public participation on environmental restoration and waste management activities in the US Department of Energy. Technology in Society 20 (4): 385-406. Carr, D. S., and K. Halvorsen. 2001. An evaluation of three democratic, community-based approaches to citizen participation: Surveys, conversations with community groups, and community dinners. Society and Natural Resources 14 (2): 107-26. Checkoway, B. 1981. The politics of public hearings. The Journal of Applied Behavioral Science 17 (4): 566-82. Chess, C., and K. Purcell. 1999. Public participation and the environment: Do we know what works? Environmental Science and Technology 33 (16): 2685-92. Coglianese, C. 1997. Assessing consensus: The promise and performance of negotiated rulemaking. Duke Law Journal 46 (6): 1255-1349. Cole, R. L., and D. A. Caputo. 1983. The public hearing as an effective citizen participation mechanism: A case study of the General Revenue Sharing Program. American Political Science Review 78 (2): 404-16. Crosby, N., J. M. Kelly, and P. Schaefer. 1986. Citizens panels: A new approach to citizen participation. Public Administration Review 46:170-78. Desvousges, W. H., and V. K. Smith. 1988. Focus groups and risk communication: The “science” of listening to data. Risk Analysis 8 (4): 479-84. Dryzek, J. S. 1997. The politics of the earth: Environmental discourses. Oxford: Oxford University Press. Einsiedel, E. F., E. Jelsoe, and T. Breck. 2001. Publics at the technology table: The consensus conference in Denmark, Canada, and Australia. Public Understanding of Science 10 (1): 83-98. Elder, P. S. 1982. Project approval, environmental assessment and public participation. The Environmentalist 2 (1): 55-71. Elmes, D. G., B. H. Kantowitz, and H. L. Roediger III. 1985. Research methods in psychology. 2nd ed. St. Paul, MN: West. Esogbue, A. O., and Z. M. Ahipo. 1982. A fuzzy sets model for measuring the effectiveness of public participation in water resources planning. Water Resources Bulletin 18 (3): 451-56. Fiorino, D. J. 1990. Citizen participation and environmental risk: A survey of institutional mechanisms. Science, Technology, & Human Values 15 (2): 226-43. Fitzpatrick, R., and D. White. 1997. Public participation in the evaluation of health care. Health and Social Care in the Community 5 (1): 3-8. Frewer, L. J., and Salter, B. Forthcoming. Public attitudes, scientific advice and the politics of regulatory policy: The case of BSE. Science and Public Policy. Gariepy, M. 1991. Toward a dual-influence system: Assessing the effects of public participation in environmental impact assessment for hydro-Quebec projects. Environmental Impact Assessment Review 11:353-74. Godschalk, D. R., and B. Stiftel. 1981. Making waves: Public participation in state water planning. Journal of Applied Behavioral Science 17 (4): 597-614.

554 Science, Technology, & Human Values Gundry, K. G., and T. A. Heberlein. 1984. Do public meetings represent the public? American Planning Association Journal 50 (Spring): 175-82. Guston, D. H. 1999. Evaluating the first US consensus conference: The impact of the citizens’ panel on telecommunications and the future of democracy. Science, Technology, & Human Values 24 (4): 451-82. Halvorsen, K. E. 2001. Assessing public participation techniques for comfort, convenience, satisfaction and deliberation. Environmental Management 28 (2): 179-86. Hannah, S. B., and H. S. Lewis. 1982. Internal citizen control of locally initiated citizen advisory committees: A case study. Journal of Voluntary Action Research 11 (4): 39-52. Heberlein, T. A. 1976. Some observations on alternative mechanisms for public involvement: The hearing, public opinion poll, the workshop and the quasi-experiment. Natural Resources Journal 16 (1): 197-212. Horlick-Jones, T., J. Rosenhead, I. Georgiou, J. Ravetz, and R. Lofstedt. 2001. Decision support for organisational risk management by problem structuring. Health, Risk & Society 3 (2): 141-65. Houghton, D. G. 1988. Citizen advisory boards: Autonomy and effectiveness. American Review of Public Administration 18 (3): 283-96. Joss, S. 1995. Evaluating consensus conferences: Necessity or luxury? In Public participation in science: The role of consensus conferences in Europe, edited by S. Joss and J. Durant, 89108. London: The Science Museum. . 1998. Danish consensus conferences as a model of participatory technology assessment: An impact study of consensus conferences on Danish Parliament and Danish public debate. Science and Public Policy 25 (1): 2-22. Kathlene, L., and J. A. Martin. 1991. Enhancing citizen participation: Panel designs, perspectives, and planning. Journal of Policy Analysis and Management 10: 46-63. Kidder, L., and C. M. Judd. 1986. Research methods in social relations. 5th ed. New York: Holt, Rinehart and Winston. Laird, F. N. 1993. Participatory analysis, democracy, and technological decision making. Science, Technology, & Human Values 18 (3): 341-61. Lauber, T. B., and B. A. Knuth. 1999. Measuring fairness in citizen participation: A case study of moose management. Society and Natural Resources 11 (1): 19-37. Leach, S., and M. Wingfield. 1999. Public participation and the democratic renewal agenda: Prioritisation or marginalisation? Local Government Studies 25:46-59. Lenaghan, J., W. New, and E. Mitchell. 1996. Setting priorities: Is there a role for citizens’juries? British Medical Journal 312:1591-93. Lynn, F. M. 1987. Citizen involvement in hazardous waste sites: Two North Carolina success stories. Environmental Impact Assessment Review 7:347-61. Lynn, F. M., and G. J. Busenberg. 1995. Citizen advisory committees and environmental-policy: What we know, what’s left to discover. Risk Analysis 15 (2): 147-62. MacNair, R. H., R. Caldwell, and L. Pollane. 1983. Citizen participants in public bureaucracies: Foul-weather friends. Administration & Society 14 (4): 507-23. Maloff, B., D. Bilan, and W. Thurston. 2000. Enhancing public input into decision making: Development of the Calgary Regional Health Authority public participation framework. Family and Community Health 23 (1): 66-78. Martin, S., and A. Boaz. 2000. Public participation and citizen-centred local government: Lessons from the best value and better government for older people pilot programmes. Public Money and Management 20 (2): 47-53.

Rowe, Frewer / Evaluating Public Participation 555 Mayer, I., J. de Vries, and J. Geurts, 1995. An evaluation of the effects of participation in a consensus conference. In Public participation in science: The role of consensus conferences in Europe, edited by S. Joss and J. Durant, 109-24. London: The Science Museum. Mazmanian, D. A. 1976. Participatory democracy in a federal agency. In Water politics and public involvement, edited by J. C. Pierce and H. R. Doerksen, 201-23. Ann Arbor, MI: Ann Arbor Science. McIver, S. 1998. Healthy debate? An independent evaluation of citizens’juries in health settings. London: King’s Fund. Milbraith, L. W. 1981. Citizen surveys as citizen participation mechanisms. Journal of Applied Behavioral Science 17 (4): 478-96. Moore, S. A. 1996. Defining “successful” environmental dispute resolution: Case studies from public land planning in the United States and Australia. Environmental Impact Assessment Review 16:151-69. Nelkin, D., and M. Pollak. 1979. Public participation in technological decisions: Reality or grand illusion? Technology Review 81:55-64. Oppenheim, A. N. 1992. Questionnaire design, interviewing and attitude measurement. London: Pinter. Ouellet, F., D. Durand, and G. Forget. 1994. Preliminary-results of an evaluation of 3 healthy cities initiatives in the Montreal area. Health Promotion International 9 (3): 153-59. Owens, S. 2000. Engaging the public: Information and deliberation in environmental policy. Environment and Planning A 32:1141-48. Petts, J. 1995. Waste management strategy development: A case study of community involvement and consensus-building in Hampshire. Journal of Environmental Planning and Management 38 (4): 519-36. Pierce, J. C., and H. R. Doerksen. 1976. Citizen advisory committees: The impact of recruitment on representation and responsiveness. In Water politics and public involvement, edited by J. C. Pierce and H. R. Doerksen, 249-66. Ann Arbor, MI: Ann Arbor Science. Plumlee, J. P., J. D. Starling, and K. W. Kramer. 1985. Citizen participation in water quality planning. Administration and Society 16 (4): 455-73. Poisner, J. 1996. A civic republican perspective on the National Environmental Policy Act’s process for citizen participation. Environmental Law 26:53-94. Reich, R. B. 1985. Public administration and public deliberation: An interpretive essay. Yale Law Journal 94 (7): 1617-41. Renn, O., T. Webler, and P. Wiedemann. 1995. Fairness and competence in citizen participation: Evaluating models for environmental discourse, Dordrecht, the Netherlands: Kluwer Academic. Renn, O., T. Webler, H. Rakel, P. Dienel, and B. Johnson. 1993. Public participation in decision making: A three-step procedure. Policy Sciences 26 (3): 189-214. Roberts, T., S. Bryan, C. Heginbotham, and A. McCallum. 1999. Public involvement in health care priority setting: An economic perspective. Health Expectations 2:235-44. Rosener, J. 1975. A cafeteria of techniques and critiques. Public Management 57 (December): 16-19. . 1978. Citizen participation: Can we measure its effectiveness? Public Administration Review 38 (September/October): 457-63. . 1981. User-oriented evaluation: A new way to view citizen participation. Journal of Applied Behavioral Science 17 (4): 583-96.

556 Science, Technology, & Human Values . 1982. Making bureaucrats responsive: A study of the impact of citizen participation and staff recommendations on regulatory decision making. Public Administration Review 42 (July): 339-45. Rossi, P. H., H. E. Freeman, and M. W. Lipsey. 1999. Evaluation: A systematic approach. 6th ed. London: Sage. Rowe, G., and L. J. Frewer. 2000. Public participation methods: A framework for evaluation. Science, Technology, & Human Values 25 (1): 3-29. Rowe, G., R. Marsh, and L. J. Frewer. Evaluation of a deliberative conference using validated criteria. 2004. Science, Technology, & Human Values 29(1): 88-121. Sewell, W. R. D., and S. D. Phillips. 1979. Models for the evaluation of public participation programmes. Natural Resources Journal 19:337-58. Shindler, B., and J. Neburka. 1997. Public participation in forest planning: 8 attributes of success. Journal of Forestry 95:17-19. Sinclair, M. 1977. The public hearing as a participatory device: Evaluation of the IJC experience. In Public participation in planning, edited by W. R. D. Sewell and J. T. Coppock, 105-22. New York: John Wiley. Stewart, T. R., R. L. Dennis, and D. W. Ely. 1984. Citizen participation and judgment in policy analysis: A case study of urban air quality policy. Policy Sciences 17:67-87. Susskind, L., and C. Ozawa. 1983. Mediated negotiation in the public sector. American Behavioral Scientist 27:255-79. Syme, G. J., and B. S. Sadler. 1994. Evaluation of public involvement in water resources planning: A researcher-practitioner dialogue. Evaluation Review 18 (5): 523-42. Tuler, S., and T. Webler. 1999. Voices from the forest: What participants expect of a public participation process. Society and Natural Resources 12 (5): 437-53. Twight, B. W., and M. S. Carroll. 1983. Workshops in public involvement: Do they help find common ground? Journal of Forestry 81 (November): 732-35. Webler, T. 1999. The craft and theory of public participation: A dialectical process. Journal of Risk Research 2 (1): 55-71. Wiedemann, P. M., and S. Femers. 1993. Public-participation in waste management decisionmaking—Analysis and management of conflicts. Journal of Hazardous Materials 33 (3): 355-68. Yin, R. K. 1994. Case study research: Design and methods. 2nd ed. London: Sage.

Gene Rowe is currently a senior scientist in the Consumer Science Group at the Institute of Food Research, Norwich (United Kingdom). His Ph.D., gained from the Bristol Business School at the University of the West of England, concerned the use of nominal groups to improve human judgment and decision making. Apart from a continuing interest in judgment and decision making, his research activities and publications have also spanned topics from expert systems and forecasting to risk perception and public participation. Much of his recent work has focused on the issue of evaluating the effectiveness of public-participation exercises. Lynn J. Frewer is a research professor in Food Safety and Consumer Behavior at Wageningen University in the Netherlands. She has a background in psychology (MSc and Ph.D. from UCL, London, and the University of Leeds, both in the United Kingdom). She has research interests in risk analysis, as well as societal aspects of food safety and emerging technology.
