Improbable outcomes: Infrequent or extraordinary?

July 5, 2017 | Author: Anine Riege | Category: Cognition, Probability, Humans, Judgment, Female, Male, Young Adult, Aged, Middle Aged, Adult

Cognition 127 (2013) 119–139

Contents lists available at SciVerse ScienceDirect

Cognition journal homepage: www.elsevier.com/locate/COGNIT

Improbable outcomes: Infrequent or extraordinary?

Karl Halvor Teigen a,⇑, Marie Juanchich b, Anine H. Riege a

a Department of Psychology, University of Oslo, Norway
b Department Leadership, HRM and Organisation, Kingston University London, UK

Article info

Article history: Received 30 May 2012; Revised 1 December 2012; Accepted 4 December 2012; Available online 30 January 2013

Keywords: Verbal probabilities; Improbable; Unlikely; Frequentistic probabilities; Probability judgments

Abstract

Research on verbal probabilities has shown that unlikely or improbable events are believed to correspond to numerical probability values between 10% and 30%. However, building on a pragmatic approach of verbal probabilities and a new methodology, the present paper shows that unlikely outcomes are most often associated with outcomes that have a 0% frequency of occurrence. Five studies provide evidence that when people complete or evaluate statements describing “improbable” outcomes, based on outcome distributions or expected ranges, they favor extraordinary outcomes that have not occurred in the original sample. For quantitative outcomes that can be ordered on a unipolar dimension, an improbable outcome is typically perceived as having a higher outcome value than those observed. Thus when battery life for a sample of laptop batteries is shown to range from 2.5 to 4.5 h, 5 or 6 h are considered better examples of “improbable” duration times than those that actually occur in 10% of the cases. Similarly, an improbable exam grade is one that has not yet been observed, rather than one that has been obtained by a small percentage of students. And when climate experts claim that a 100 cm increase in sea level by the year 2100 is “improbable”, participants believe that the same experts’ maximum estimates will be much lower. We conclude that judgments of what is improbable suggest outcomes beyond the expected range, rather than simply low frequency outcomes. These results are compatible with a causal (propensity) interpretation rather than a statistical (frequency) interpretation of probabilities.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

In The Cyberiad, science fiction writer Lem (1985) relates the adventures of the inimitable inventors Trurl and Klapaucius. On one occasion, Trurl constructed a probability amplifier in his basement, with the result that formerly improbable creatures, primarily dragons (which are less unlikely than goblins and elves), sprang into being. The successful inventor was almost devoured by the first dragon that materialized, but fortunately Klapaucius was nearby and lowered the probability, and the monster vanished.

⇑ Corresponding author. Address: Department of Psychology, University of Oslo, P.B. 1094, Blindern, NO-0317 Oslo, Norway. Tel.: +47 22 84 51. E-mail address: [email protected] (K.H. Teigen).
0010-0277/$ - see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.cognition.2012.12.005

This story is interesting on several accounts. One is that degrees of probability are equated with degrees of existence rather than frequencies of occurrence. Second, improbable creatures are not just rare; they are those that normally do not exist. In the present paper we want to show that lay conceptions of probability, and notably of improbability, may have something in common with the notions of Trurl and Klapaucius.

1.1. Improbability according to the translation approach

There exists a large body of research where people have been asked, in various fields, to convert verbal expressions of uncertainty and probability into corresponding numerical probabilities (for reviews, see Clark, 1990; Theil, 2002). One recurrent finding is that verbal probabilities are fuzzy


K.H. Teigen et al. / Cognition 127 (2013) 119–139

concepts, which can be better described in terms of a membership function over the [0, 1] probability scale, than by specific values on this dimension (Budescu & Wallsten, 1995). It has nevertheless been common to report means or medians as representative interpretations of the probabilities associated with terms and phrases such as improbable, even chance, probable, and very likely (e.g., Juanchich, Teigen, & Villejoubert, 2010). Group data show some consistency when such expressions are translated into probability values (percentages) between 0% and 100%. For improbable and unlikely, the terms in focus of the present study, most average values that have been reported lie in the range between 10% and 30%, depending on theme and type of study. For instance, in six risk studies reviewed by Theil (2002), unlikely was rated as corresponding to 14%, 18%, 18.5%, 19%, 24%, and 31%, respectively. Improbable and unlikely are typically given very similar ratings in studies where both these terms have been used, perhaps with slightly higher scores for unlikely. Reagan, Mosteller, and Youtz (1989) found both to be translated with 15%; Kong, Barnett, Mosteller, and Youtz (1999): 13% and 14%; Lichtenstein and Newman (1967): 12% and 18%; Budescu and Wallsten (1985): 17% and 20%; Reyna (1981): 19.9% and 20.1%; Clarke, Ruffin, Hill, and Beamen (1992): 21% and 24%; Mazur and Hickam (1991): 30.9% and 30.9%. In non-English Germanic languages, unlikely and improbable have to be rendered with the same term [Dutch: onwaarschijnlijk, German: unwahrscheinlich, Norwegian: usannsynlig], which have also been shown in various studies to correspond, on a group level, to probabilities in the 10–30% range (Brun & Teigen, 1988; Doupnik & Richter, 2003; Smits & Hoorens, 2005). 
To facilitate communication about risks and other uncertain events, scholars in several fields from medicine to military intelligence have suggested standard lists of probability phrases together with proposed numerical equivalences. Some of these have been informed by empirical studies like those mentioned above, whereas others seem to be mainly based on the authors’ linguistic intuitions, coupled with a penchant for symmetry and orderliness. On these lists, improbable can be found as corresponding to percentages between 15% (Renooij & Witteman, 1999) and 30% (Weiss, 2007, Fig. 1). Reports from the IPCC (Intergovernmental Panel on Climate Change, 2007) inform their readers that unlikely in statements about climate change means probabilities below 33%, while probabilities lower than 10% are called very unlikely. JP 2-0, the joint report for US military intelligence, suggests unlikely to be used for confidence estimates between 10% and 40% (Pace, 2007). Despite all vagueness and variation documented by this extensive research literature, scholars seem to agree in one, prominent respect: Trurl was wrong. Improbable creatures do exist. They are to be found singly, in duplicates or even triplicates in samples of ten, and in hundreds in populations exceeding 1000. Among the 7 days of the week, one will be an unlikely day, and in the course of a year, one or two of the months will be unlikely. There is no need for a probability amplifier to coax an unlikely outcome into being; they swarm around us by the dozen.
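The frequentist arithmetic behind this paragraph can be made explicit. The following is a minimal sketch, assuming that "improbable" corresponds to a fixed occurrence probability somewhere between 10% and 30% (the range reported by the translation studies above) and that cases are independent. The function name `expected_count` and the chosen sample sizes are illustrative assumptions, not part of the original studies.

```python
def expected_count(n_cases: int, probability: float) -> float:
    """Expected number of occurrences among n_cases independent cases."""
    return n_cases * probability


# Assumed bounds: the 10-30% range that translation studies report
# for "improbable" and "unlikely".
LOW, HIGH = 0.10, 0.30

# Sample sizes echo the examples in the text (samples of ten,
# populations of 1000, days of the week, months of the year).
for label, n in [
    ("sample of 10", 10),
    ("population of 1000", 1000),
    ("days in a week", 7),
    ("months in a year", 12),
]:
    low_count = expected_count(n, LOW)
    high_count = expected_count(n, HIGH)
    print(f"{label}: between {low_count:.1f} and {high_count:.1f} "
          f"'improbable' cases expected")
```

Under a strictly frequentistic reading, then, improbable outcomes are anything but nonexistent: even a week is expected to contain roughly one of them.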

1.2. Improbability as out-of-range outcomes

Yet there is also evidence suggesting that people think of improbable outcomes as values located outside the normal range. In a study of boundary values for ranges of imperfectly known quantities, Teigen, Halberg, and Fostervold (2007) compared the meanings of inclusive boundary phrases, which include the quantity described in the range (at least, at most, minimum, maximum), and exclusive phrases, which exclude the quantity described in the range (more than, less than, above, below). In Experiment 4, participants read statements where a boundary phrase qualified a numerical value, for instance, “the shoes cost more than [at least] NOK 500”, and were asked whether this statement implied that the shoes can or cannot cost NOK 500, and further whether they would consider NOK 500 to be an improbable price. About 2/3 of the participants reading statements with exclusive terms said that the boundary values were improbable. Most of these (72.9%) also said the price could not be so low. In contrast, only 35–40% of those reading inclusive terms said that the boundary values were improbable. Such values were claimed by most participants (77.1%) to be values that could occur. New analyses of these data show significant (p < .05) positive correlations between endorsements of “improbable” and “cannot” statements within each condition (φ = .33 and .16 for exclusive and inclusive terms, respectively). The correlations indicate that construing an event as unlikely is related to believing that it cannot happen, especially in the exclusive boundary condition. In Experiment 5 of the same study, participants in four conditions were asked to imagine a product that would cost “minimum 1200”, “maximum 1200”, “more than 1200”, or “less than 1200”. On this basis, they were asked to provide three instances of probable values and three examples of improbable values.
New analyses¹ of these data show that 78.9% of participants in the lower-limit conditions mentioned prices below the lower limit as examples of improbable values, and 61.3% of participants in the upper-limit conditions suggested prices above the upper limit as improbable instances. Many of these instances were far beyond the limits. For example, for a product that cost “less than 1200”, most of the participants provided values outside of the range, such as 2000, as examples of improbable prices. These experiments were intended to investigate the meaning of range boundaries rather than the meaning of improbable outcomes. Thus the numerical probabilities of the improbable outcomes were not established. Yet, indirect evidence suggests that unlikely was associated with outcome values taken from outside the range of values that could possibly be expected, corresponding to occurrence frequencies much lower than warranted by the 10–30% probability estimates obtained by translation studies.

¹ We thank Anne-Marie Halberg for conducting these analyses.

1.3. A new approach

In a more recent set of experiments, Teigen, Juanchich, and Filkuková (2013) approached the meaning of verbal probability terms from a new angle. Instead of being asked to place probability terms on a numeric probability scale, participants were shown a complete distribution of outcomes and asked to enter an appropriate outcome value in statements containing various probability words. Thus, the traditional How Likely (HL) procedure was replaced with a Which Outcome (WO) approach. For instance, after being presented with a sample of computer batteries, with battery life ranging from 1.5 to 3.5 h, participants were requested to complete statements such as “It is possible [certain; quite probable; quite improbable] that a battery will last ... hours”, with a value they considered natural in the given context. So rather than deciding “how likely” an improbable outcome is, they estimated “how large” the outcome would be. The battery vignette was accompanied by a unimodal distribution of outcomes, with the bottom value occurring in 10% and the top value occurring in 5% of the cases. About one third of the participants selected the top value (3.5 h) as quite improbable, but an even greater proportion (46%) suggested still higher values (up to 13 h), which had never been mentioned in the vignette text. Very few chose the bottom value, whereas some participants proposed still lower values. Altogether, more than half of the participants chose “improbable” values outside the range of the vignette distributions (Teigen et al., 2013, Table 3).

1.4. Variants of probability

When people are asked to describe verbal probabilities with numbers, improbable is equated to probabilities in the 10–30% range. And yet they often seem to think of improbable outcomes as quite extraordinary events. The present set of studies was designed to investigate this apparent paradox.
Evidently, the probability equivalents elicited by the “translation” (How Likely) approach must be based on other considerations than occurrence frequencies, perhaps reflecting different notions of probability. It has been pointed out by several authors that probability is a “polysemous” concept with several related meanings. Hertwig and Gigerenzer (1999) draw a distinction between mathematical probabilities, which can be assumed to follow the rules of probability theory, and nonmathematical meanings, where these rules do not apply. These authors found that participants who were asked to solve a conjunction problem tended to prefer various nonmathematical translations of the probabilities in question, suggesting “possibility”, “conceivability”, and “plausibility” as the most often preferred synonyms (Hertwig & Gigerenzer, 1999, Experiment 2). From this perspective, an improbable outcome could be defined as a denial of these features, namely as incredible, not plausible, and hard to believe. If not outright impossible, such an event should be expected to occur very rarely, or not at all. Within the domain of mathematical probabilities, one basic distinction can be drawn between probabilities reflecting “external” (aleatory) vs. “internal” (epistemic) uncertainty (Hacking, 1975; Lagnado & Sloman, 2007). In the first case, probabilities are assumed to reflect unsettled events in the real world, like the outcomes of a lottery. In the second, they are assumed to reflect an individual’s degree


of knowledge, or state of ignorance (Fox & Ülkümen, 2011). In their fourfold taxonomy of variants of uncertainty, Kahneman and Tversky (1982) further distinguished between two “external” variants, according to whether probabilities are assessed in a distributive mode, based on frequencies, or in a singular mode, “in which probabilities are assessed according to the propensities of the particular case in hand” (p. 152). There has been a long-standing debate among probability theorists over whether probabilities should primarily be given a frequentistic interpretation (von Mises, 1961/1928), or whether they can also be legitimately discussed from a causal, “propensity” perspective (Gillies, 2000; Popper, 1959, 1990). It is a moot issue whether people in their daily life are frequentists (Cosmides & Tooby, 1996; Gigerenzer & Hoffrage, 1995), or rather adhere to a singular, causal conception (Keren & Teigen, 2001). Perhaps they switch between both modes (Jones, Jones, & Frisch, 1995; Reeves & Lockhart, 1993). From a causal, single-event perspective, participants may describe an improbable event “mathematically” as 10% probable to indicate that the occurrence of this event is “against all odds”, and is thus not going to happen, failing to realize that this probability (given a frequentistic interpretation) suggests an outcome that indeed is going to happen in 10 out of 100 cases. Thus from a “nonmathematical” perspective, improbable outcome values border on the impossible or are at least implausible. When people are called upon to adopt a mathematical mindset, by being asked to convert improbable into numerical probabilities, they may think of low probability values in terms of propensities that are too weak to manifest themselves as real occurrences. In both cases, we predict that “improbable” outcomes will be described as extraordinary and too extreme to occur with a frequency of 10–20%.

1.5. The present experiments

Experiment 1 was designed to investigate the extremeness of improbable values and how often they are predicted to occur. Participants in one condition were asked to produce improbable monthly temperatures, followed by a question about whether these values can be expected to occur occasionally or not at all. In another condition they were asked about impossible temperatures, which were expected to be still more extreme, yet showing a substantial degree of overlap with the improbable ones. In the following experiments we allowed participants to compare low-frequency and zero-frequency outcomes more directly, by presenting outcome distributions which include outcomes that have occurred infrequently (5–15%) and outcomes that have never occurred (yet). In Experiment 2 participants were asked to rate how correct, and how natural, statements describing such outcomes as unlikely are. In Experiments 3a and 3b they were instead asked to choose outcome values that are appropriate to complete statements describing unlikely events. Experiment 3a also included statements about improbable, doubtful, not certain, and a chance, to assess the similarities and potential differences between unlikely/improbable and other common phrases denoting low-probability events.


Experiment 3b further investigated how many unlikely events participants believed would be observed in new, larger samples. Both these experiments employed outcome distributions arranged from small to large on unipolar ratio scales. Experiment 4 used the same methodology (statement completion) with outcomes that can be arranged on bipolar (reversible) ordinal and nominal scales. Participants in Experiment 5 evaluated climate expert predictions of a rise in sea level. In Experiment 5a they rated how correct it would be to call the stated changes probable or improbable. In Experiments 5b and 5c participants were given a range of expected outcomes and asked to identify an improbable outcome, and vice versa (given an improbable outcome, what is the expected range). In addition, they were asked to state the numerical probability corresponding to the improbable outcome. In this way we tested whether people think that probability values of 10–20% can be used to describe outcomes beyond the expected range.

2. Experiment 1

In this experiment we asked participants to suggest “improbable” temperatures for each month next year. We suspected that their suggestions would contain values that were quite deviant from normal averages. However, if “improbable” events correspond to occurrence frequencies of 10–30%, one should expect people to suggest values that they believe will occur occasionally, although rather seldom. Studies of frequency terms like rarely, seldom, and often, agree that seldom corresponds to frequencies within this interval. Seldom has been assigned frequencies of 16% (Lichtenstein & Newman, 1967), 17% (Clarke et al., 1992), and 19% (Hamm, 1991).
Numerical conversions of its German equivalent, “selten”, have been equated to 13% (Fischer & Jungermann, 1996) and 18.5% (Bocklisch, Bocklisch, & Krems, 2012).² If, however, “improbable” is given a non-frequentistic interpretation, participants may include not only values that are expected to occur “seldom”, but even those that are not expected to occur at all. Some such values may be regarded as impossible. The present experiment was accordingly designed to compare people’s perceptions of improbable and impossible values. Although we do not believe that these concepts are used synonymously, we expected some overlap between the distribution of suggested improbable and suggested impossible outcomes.

² These studies show that “rare” and “rarely” correspond to somewhat lower frequencies (around 10%). In German and Norwegian rarely and seldom cannot be distinguished, but must be given the same translation (German: selten; Norwegian: sjelden). Fischer and Jungermann (1996) translate “selten” with “rarely”, whereas Bocklisch et al. (2012) use “infrequently” as their English translation.

2.1. Method

2.1.1. Participants and design

Participants were 137 students attending a lecture in introductory psychology at the University of Oslo (104 women and 32 men, two participants did not report sex; median age 20 years). They were randomly assigned to one of

two conditions, by receiving a questionnaire titled What is the meaning of “improbable” (n = 74), or What is the meaning of “impossible” (n = 63).

2.1.2. Questionnaires

The questionnaires were introduced as part of a research project about words and expressions used in everyday life as well as by experts, for instance in climate research. To this end, participants were asked to suggest average monthly temperatures they would consider improbable [impossible] to occur in Oslo for each of 12 months next year. As background information, they received a list of the normal averages per month, ranging from −4.3 °C in January to 16.4 °C in July. (For a full translation of the questionnaires, see Appendix A.) After suggesting 12 improbable or impossible temperatures, one for each month, they were asked whether they thought that any of these would actually occur next year, or over the next 10 years, and if yes, how many. On the next page of the questionnaire, they were asked to rate, on 5-point Likert scales, their agreement with the following statements (1: Strongly disagree, 5: Strongly agree):

• The temperatures I wrote are improbable [impossible] because they, in my opinion, will not occur in the future.
• The temperatures I wrote are improbable [impossible] because they, in my opinion, will seldom occur in the future.
• The temperatures I wrote are improbable [impossible] because they are so far away from the normal for these months.
• The temperatures I wrote are improbable [impossible] because they are incompatible with the current climatic conditions in Oslo.

2.2. Results

The suggested improbable or impossible monthly averages were re-coded in terms of their deviations from normal averages. Such deviations can be both positive and negative for any given month; for instance a July average of 26 °C or of 6 °C could be regarded as equally deviant by being, respectively, 10° higher or 10° lower than the normal average of 16 °C.
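The re-coding step can be illustrated with the July example just given. This is a minimal sketch, not the authors' analysis script; the function names `signed_deviation` and `unsigned_deviation` are hypothetical, and the July normal is rounded to 16 °C as in the example.

```python
def signed_deviation(suggested: float, normal: float) -> float:
    """Positive when the suggested temperature is warmer than normal."""
    return suggested - normal


def unsigned_deviation(suggested: float, normal: float) -> float:
    """Absolute distance from normal, ignoring direction."""
    return abs(signed_deviation(suggested, normal))


JULY_NORMAL = 16.0  # rounded normal used in the example above

for suggested in (26.0, 6.0):
    signed = signed_deviation(suggested, JULY_NORMAL)
    unsigned = unsigned_deviation(suggested, JULY_NORMAL)
    print(f"July average {suggested:.0f} °C: signed {signed:+.0f}°, "
          f"unsigned {unsigned:.0f}°")
```

Keeping both codings matters for the analysis: the signed values show the direction of the deviation (warmer or colder than normal), whereas the unsigned values measure only how extreme a suggestion is.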
However, a count of signed deviations revealed a tendency to suggest temperatures defying the season: improbable and impossible winter temperatures were typically warmer, whereas improbable and impossible summer temperatures were typically colder than the normal averages, as shown in Fig. 1. This indicates that improbable (as well as impossible) values are conceived as values that are not only more extreme, but also diametrically opposed to the normal state of affairs.

Fig. 1. Percentages of participants who suggested positive deviations (improbable and impossible temperatures that are warmer than monthly normal).

The distributions of unsigned (absolute) deviations were positively skewed, particularly for impossible values, which ranged from 3–4 to several hundred degrees above/below normal. The median unsigned deviation was 15.7° centigrade above/below normal for improbable temperatures and 24.9° for impossible temperatures. A median split for the whole sample showed that 63.5% of participants in the improbable group suggested deviations of less than 19°, whereas 64.5% of participants in the impossible group chose values that departed more than 19° from the average; χ²(1, N = 136) = 10.604, p < .001. Thus impossible temperatures are, as predicted, more extreme than improbable ones, but there is a considerable overlap. Even improbable temperatures were placed very far from the normal ones.

If improbable values were selected as meaning a 10–20% occurrence frequency, one should expect one or two of them to occur next year, given that the 12 months can be regarded as independent events. Over the next 10 years, such values should occur about ten times more often (as there would be 120 months where an “unlikely” result could happen). But 81.9% of the respondents did not expect any improbable temperature for the year after, and 60.3% did not expect any improbable result for the next 10 years, either. The expectations of impossible values were, as predicted, even lower (93.7% did not expect any “impossible” value next year, and 84.1% did not think they would appear during the 10-year period; both these figures are significantly different from those for improbable values, with Fisher exact test, p = .041 and p = .002).

When asked, on the next page of the questionnaire, why their suggested temperatures were improbable, participants felt that “they will not occur in the future” was a better explanation than “they will seldom occur in the future”; t(72) = 2.085, p = .041 (for all mean ratings, see Table 1). They also agreed that the values were improbable because they were so extreme (far above/below normal) and incompatible with current climatic conditions. These reasons were endorsed even more strongly for impossible values.

Table 1. Mean agreement (1–5) with reasons for why suggested temperatures are improbable or impossible.

| Explanations | Improbable | Impossible | t(diff) | p |
| Will not occur in the future | 4.00 | 4.34 | 1.999 | .048 |
| Will seldom occur in the future | 3.53 | 2.90 | 2.364 | .02 |
| Far above/below normal | 4.19 | 4.55 | 2.034 | .044 |
| Incompatible with climatic conditions | 4.01 | 4.30 | 1.527 | ns |

These results indicate that people do not reserve the term “improbable” for outcomes that have not occurred yet, but are expected to appear from time to time in the future. Most of our participants felt that the values they suggested would not occur even in the next 10-year period and that will not occur was a fairly good explanation for their choice. Such values border on the impossible even if “impossible” outcomes were, on average, even more extreme.

3. Experiment 2

When asked to produce “improbable” mean temperatures, participants in Experiment 1 suggested values that were very far from the normal ones. Most of them did not think such temperatures would ever occur, and considered “will not occur in the future” a good reason for regarding them as improbable. However, participants were not informed about the actual frequency of such deviant values. Experiment 2 was set up to study how well improbable describes different values in a frequency distribution of outcomes, including those that have actually occurred infrequently (5–10%) and those that have not occurred. In this experiment, participants were asked to rate several statements instead of just suggesting or picking one of them. We also investigated the effects of two different instructions, by asking people in one condition to judge the statements according to how correct they were, and in another whether they sounded natural in the given context. It is conceivable that the first instruction would prompt participants to respond in agreement with a frequentistic criterion (e.g., judging that it is correct to say that outcomes with a 10% probability are improbable), whereas the question about “natural” usage could evoke a more pragmatic approach (e.g., finding it more natural to say that outcomes that have not occurred are improbable).

3.1. Method

3.1.1. Participants and design

Participants were recruited to an experiment on the web (with Qualtrics) through Amazon Mechanical Turk to take part in a larger study on probability and risk judgments. We report here only data from a subset of 106 participants who were asked to judge statements about unlikely outcomes (participants in two other conditions made similar judgments about possible and certain outcomes). There were 51 women and 54 men (one did not report gender); median age was 32 years (age range 18–64). Nearly half the sample had a college degree of

4 years or more. Participants completed the questionnaire after a task focusing on the effect of severity on risk estimates adapted from Harris, Corner, and Hahn (2009, Study 1). The severity condition or the responses given in that task did not affect participants’ further responses, and will not be further discussed here. The experiment featured a 2 × 4 × 4 mixed design, with type of instruction (with focus on correctness vs. naturalness) as a between-subjects factor, and ratings of four outcome values (lowest, middle, highest, and out of range) in four vignettes (Battery, Weight loss, Jeans, and Mail) as within-subjects factors.

3.1.2. Procedure and questionnaires

Before providing their judgments on predictions, participants read a short description of the task, which was in Condition 1 to assess how correct the four predictions were, and in Condition 2 to assess how natural they were, on separate five-point rating scales. A prediction was defined as correct “when it describes accurately the chances of the event occurring” (1: Not at all correct; 5: Completely correct); for natural predictions, the definition was “a prediction sounds natural if it appears likely to be said in the described context, or if it is easy to imagine someone saying it in the given situation” (1: Not at all natural; 5: Very natural). Participants were presented with four vignettes adapted from Teigen and Filkuková (2013) and Teigen et al. (2013), showing the complete distributions of outcomes for battery life in a sample of laptop batteries, weight loss for a sample of dieters taking part in a weight reduction program, amount of shrinkage in a sample of jeans, and the number of days in the mail for a sample of letters sent from Norway to various addresses in the US. The distributions were unimodal, with five outcome values, of which the middlemost was the most frequent, and the lowest and highest value occurring in about 10% of the cases.
For each of these vignettes, participants read four unlikely predictions, describing the lowest, the middle (peak), the maximum value of the distribution, and a value beyond the range (+1 unit above the range maximum). For example, in the Battery vignette, where the distribution ranged from 1.5 to 3.5 h in increments of 0.5 h, participants judged statements saying “It is unlikely that a battery of this brand will last 1.5 h [2 h, 3.5 h, and 4 h]”. Overall each participant made 16 judgments (four per vignette); vignettes and statements were presented in a randomized order. A complete questionnaire is presented in Appendix B.
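The within-subjects part of this design can be sketched as a simple enumeration. This is only an illustrative reconstruction: the function `build_trials` is hypothetical, and the blocked randomization it uses (vignettes shuffled first, then the four statements within each vignette) is an assumption, since the text states only that vignettes and statements were presented in a randomized order.

```python
import random

VIGNETTES = ["Battery", "Weight loss", "Jeans", "Mail"]
OUTCOMES = ["lowest", "middle", "highest", "out of range"]


def build_trials(seed=None):
    """Return one participant's (vignette, outcome) judgments in order.

    Assumption: vignette order is shuffled, and statement order is
    shuffled separately within each vignette.
    """
    rng = random.Random(seed)
    vignettes = VIGNETTES[:]
    rng.shuffle(vignettes)
    trials = []
    for vignette in vignettes:
        outcomes = OUTCOMES[:]
        rng.shuffle(outcomes)
        trials.extend((vignette, outcome) for outcome in outcomes)
    return trials


trials = build_trials(seed=1)
print(len(trials))  # four vignettes x four outcome values
for vignette, outcome in trials[:4]:
    print(vignette, "-", outcome)
```

Each participant thus contributes one rating per cell of the 4 × 4 within-subjects layout, while the instruction factor (correctness vs. naturalness) varies only between participants.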

3.2. Results

Ratings of statements in all vignettes followed the same general pattern, summarized in Fig. 2. An overall 2 × 4 × 4 mixed ANOVA was conducted, where vignette and outcome values were placed as within-subjects independent variables and instruction (pragmatic vs. semantic) as a between-subjects independent variable. The tests of within-subjects effects (with Greenhouse–Geisser correction) showed a significant effect of outcome, F(2.70, 280.50) = 3.65, p = .016, ηp² = .034, and a significant interaction between vignette and outcome, F(7.35, 280.50) = 11.89, p < .001, ηp² = .103. This interaction is due to a deviant response pattern for the letter vignette, where some participants deemed very short delivery times to be as unlikely as extremely long ones. Vignette also had a significant main effect, F(2.03, 211.18) = 144.31, p < .001, ηp² = .281, whereas the test of between-subjects effects revealed no significant differences between instructions (F(1, 104) = 1.41, p = .237, ηp² = .013).

Fig. 2. How correct, and how natural, are different outcome values in “unlikely” statements; mean ratings (1: Not at all correct; 5: Completely correct [natural]) and standard errors, based on four vignettes.

As shown in Fig. 2, statements claiming that the middle (and most frequent) values were unlikely were not considered to be correct (nor natural). However, statements describing the lowest value as unlikely did not fare much better, even if these values occurred in only 5–10% of the cases. Statements characterizing the highest value as unlikely were given much higher ratings, while statements describing out-of-range values, beyond the highest value, were given the highest ratings of all. The pattern depicted in Fig. 2, with out of range (above maximum) ratings higher than ratings of high extremes, which in turn are higher than ratings of low extremes, appears to be quite robust by manifesting itself in all four scenarios, under two different instructions. It is interesting that even people who have been asked to “describe accurately the chances of the event occurring” (semantic condition) denied that p = 10% is unlikely when speaking of a small outcome magnitude, but accepted the same probability as unlikely when speaking about a high outcome value. For high outcome values with zero frequency (judging from the distribution), unlikely is even more appropriate.

3.3. Discussion

In Experiment 2b by Teigen et al. (2013), participants were allowed to select only one instance of an unlikely (improbable) value. Thus from their choices of a top value, or one that had never occurred, we cannot conclude that these values are the only unlikely ones. Perhaps some of the other, actually occurring values could be described as “unlikely” as well. From a fuzzy concept (membership function) approach, one might think that the term improbable covers a span of probabilities, perhaps ranging from 0% to 40% (as suggested by range studies by Wallsten, Budescu, Rapoport, Zwick, and Forsyth (1986) and Villejoubert, Almond, and Alison (2009)). Probabilities high in this span could also be described with other phrases, such as a chance, a possibility, uncertain, and a low probability, whereas probabilities closer to zero have fewer alternatives. Thus unlikely might have been used for describing near-zero probabilities for pragmatic rather than semantic reasons. Even if unlikely means a probability of 0–40%, it could be most often used to characterize probabilities near zero simply because of a lack of other appropriate phrases for probabilities in this bracket.

However, the ratings in the present experiment indicate that low values are not considered unlikely, even if they rarely occur. Moreover, unlikely is a more appropriate descriptor of outcomes that have not been observed than of observed but rare outcomes. These usages are considered not merely natural, but also more correct. So the selection of extreme unlikely values cannot be explained by a lack of other descriptors; they are simply the most illustrative unlikely values. The prototypical unlikely beast looks more like a dragon than an endangered arctic fox.

4. Experiment 3a

Participants in Experiment 1 and in the previous studies by Teigen et al. (in preparation) were shown distributions covering outcomes with occurrence frequencies of 5–10% and upwards. But values outside of the distribution range were never explicitly described. Participants might have thought that such values, although not mentioned, occasionally occur. In Experiment 3 such values were presented as part of the distribution, but with zero frequency. With this design, participants were given a real choice of which outcome value should be described as improbable: those with an occurrence frequency of 10% or those that have never occurred.
The experiment was also designed to compare improbable and unlikely with other negative verbal phrases (doubtful and not certain), to test whether the preference for outcomes beyond the range was limited to unlikely or was associated with negative directionality expressions in general. As a control, we also included one positive, low-probability phrase (a chance).

4.1. Method

4.1.1. Participants

Overall, 122 participants were recruited via Amazon Mechanical Turk to complete a web-based questionnaire. Of these, 26 participants who responded too fast or too slowly (in less than 2.5 min or in more than 10 min) were excluded. The remaining sample of 96 participants was 51.1% female (median age = 32 years, age range 19–66). A majority of participants had a job (71.3%) and had “some college” or a college degree (54.2%). Two participants did not share their socio-demographics. Participants were randomly assigned to five different conditions with 17–23 participants in each.

4.1.2. Material and procedure

All participants were presented with four vignettes describing outcome distributions for computer battery life, weight loss, shrinkage of jeans (as in Experiment 1), and the amount of glomps to be found in a distribution of shmulps. The last, nonsense vignette was adapted from Juanchich, Sirota, and Butler (2012), and was included as a “neutral”, utility-free task (it is difficult to say whether large or small amounts of glomps are better). Each vignette was illustrated by a unimodal bell-shaped bar chart with 7 bars, the two extreme ones having a zero frequency. In contrast to the vignettes used previously (Juanchich, Teigen, & Gourdon, 2013; Teigen et al., 2013), the zero-frequency options were explicitly included in the graph. Order of vignettes was randomized between subjects. For each vignette, participants were asked to complete, with a fitting outcome value, a sentence containing one of five phrases, depending on condition: it is unlikely, it is improbable, it is doubtful, it is not certain, or there is a chance; for instance: “It is unlikely that a laptop battery will last ...... hours before being recharged”. The complete questionnaire is presented in Appendix C.

A pre-test conducted on a sample of 153 Mechanical Turk workers focused on the probability associated with the different probability terms in a between-subjects design. Results of the pre-test showed that unlikely, improbable, doubtful, and a chance convey on average probabilities below 50%, with mean ratings ranging from 36.8% to 45.1%. Only not certain was slightly above 50%, with an average of 54.1%.

4.2. Results

Outcome values chosen to fit the uncertainty statements were coded as below minimum, minimum, intermediate (one of the middle three outcome values), maximum, and above maximum. The first and last of these had 0% frequency of occurrence, whereas minimum and maximum had a 5–10% occurrence frequency each, as illustrated by the figures in Appendix C. Outcome choices turned out to be highly similar for all four vignettes (including the nonsense shmulp vignette).
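The coding scheme for completed outcome values can be illustrated with a minimal sketch. The function name `code_outcome` and the exact comparison rules are assumptions made for illustration; the text only names the five categories. The example uses observed battery durations running from 1.5 to 3.5 h, so that any value outside that interval has a zero frequency in the displayed distribution.

```python
def code_outcome(value, observed_min, observed_max):
    """Classify a completed outcome value relative to the observed range.

    Hypothetical helper: the category names follow the text, the
    comparison rules are an illustrative assumption.
    """
    if value < observed_min:
        return "below minimum"      # zero-frequency value under the range
    if value == observed_min:
        return "minimum"            # lowest value that actually occurred
    if value < observed_max:
        return "intermediate"       # one of the middle values
    if value == observed_max:
        return "maximum"            # highest value that actually occurred
    return "above maximum"          # zero-frequency value over the range


# Illustrative battery durations in hours (observed range 1.5 to 3.5 h).
for hours in (1.0, 1.5, 2.5, 3.5, 5.0):
    print(hours, "->", code_outcome(hours, 1.5, 3.5))
```

Under this sketch, a completion such as 5 h is coded as above maximum even though it lies beyond the bars shown in the chart.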
Average choice frequencies for coded outcome values are presented in Table 2. Table 2 shows that typical unlikely and improbable outcomes are consistently selected from the high end of the distribution, and are most often higher than the maximum observed values, confirming the ratings reported in Experiment 2. Interestingly, doubtful and not certain follow the same pattern. Doubtful has previously been found to correspond to probabilities similar to those of improbable (Brun & Teigen, 1988; Wallsten et al., 1986), whereas not certain is believed to correspond to a higher probability, usually in the 40–50% range (Brun & Teigen, 1988; Reyna, 1981). The probabilistic meanings found in the pre-test sample of Mechanical Turk participants were similar. However, all these phrases have in common a negative directionality (Teigen, 1988; Teigen & Brun, 1995, 1999), that is, they suggest that the target outcome may not occur. In contrast, a chance is a positive phrase, despite suggesting a rare event, in the sense that it directs the readers’ or the listeners’ attention to the event’s possible occurrence. As a result, statements describing a chance are typically completed with actually occurring outcomes, especially the maximum observed value. This pattern of


Table 2. Mean percentages of chosen outcome values in five uncertainty conditions for all vignettes, Experiments 3a and 3b.

Condition        n     Below min (0%)   Minimum (5–10%)   Intermediate (20–40%)   Maximum (5–10%)   Above max (0%)
Experiment 3a
  Unlikely       19    5.3              –                 12.0                    21.3              61.4
  Improbable     23    6.5              1.1               19.5                    13.0              59.8
  Doubtful       19    7.9              2.7               6.7                     22.7              60.1
  Not certain    17    4.4              –                 29.4                    13.2              51.4
  A chance       18    –                4.2               38.9                    50.0              7.0
Experiment 3b
  Unlikely       104   1.6              0.3               6.4                     32.7              58.9

choice replicates previous results for there is a chance, it is possible (Teigen et al., 2013, Experiment 2a), and for what can happen (Teigen & Filkuková, 2013), where people have been found to focus on the highest obtainable value. These results point to a negative directionality effect that is not limited to the specific terms unlikely or improbable, nor to low-probability expressions. Indeed, it appears that several negative directionality expressions, such as not certain and doubtful, are also associated with non-occurring outcome values from beyond the top of the range.

5. Experiment 3b

Participants who selected out-of-range values as “unlikely” might have assumed that even if they were not observed in the present, limited sample, such values might still occur in the population at large. From a statistical point of view, it can be argued that the 10% most extreme values in a normal distribution will only occur in a subset of all small samples drawn from the population; thus it is perfectly reasonable to imagine that values that do not show up in a particular sample may still occur with a 10% frequency or more in the total population. Experiment 2 actually included two rather small samples (n = 10) and two samples presented as percentage distributions of indeterminate size. In Experiment 3a, all samples were presented as percentage distributions, perhaps suggesting a sample size of 100; however, the actual size of the samples was never specified.

Experiment 3b was performed as a replication of the unlikely condition of Experiment 3a, with two added features: one question about the number of “unlikely” values the participants expected to find in a new sample of 100 or 1000 items, and another question asking them to explain why they regarded the “unlikely” value as unlikely. Samples of this size, especially the largest one, should more accurately reflect the population characteristics and reveal the expected prevalence of “unlikely” values.

5.1. Method

5.1.1. Participants

Overall, 104 participants were recruited via Amazon Mechanical Turk to complete a web-based questionnaire. Two participants did not report their socio-demographic characteristics. The sample was 32% female (median age = 26 years, age range 18–65). A majority of participants (76%) had a job, and 69% had some college or a college degree.

5.1.2. Materials and procedure

Participants read the vignettes Computer battery, Diet, and Jeans, as described in the previous experiment, presented in this order. All vignettes described a sample of 100 events by means of a unimodal, bell-shaped bar chart, depicting bars where the first and the last had a frequency of zero, similar to those illustrated in Appendix C, except that the highest occurring value had a frequency of 5% in all vignettes. For each vignette, participants completed the following three tasks, presented on separate pages along with the frequency distributions. First, participants completed an Outcome Completion Task featuring the probability term unlikely (e.g., “it is unlikely that a Comfor battery will last ...... hours”). Then, participants in Condition 1 judged the number of times the outcome they selected would occur in a sample of 100 new tests, and in Condition 2 the same judgment was made for a future sample of 1000 tests. Subsequently, participants judged to what extent they agreed with four assertions describing reasons why the outcome they selected was unlikely. Agreement was rated on seven-point scales from 1: Strongly disagree to 7: Strongly agree.

1. A Comfor battery duration of [Outcome selected] is unlikely because, as far as I know, it has not occurred in the past.
2. A Comfor battery duration of [Outcome selected] is unlikely because, in my opinion, it will not occur in the future.
3. A Comfor battery duration of [Outcome selected] is unlikely because, in my opinion, it will seldom occur in the future.
4. A Comfor battery duration of [Outcome selected] is unlikely because it is so far away from the normal.

Finally, participants responded to socio-demographic questions.

5.2. Results

5.2.1. Outcome selection

Around one third chose the highest occurring outcome, which according to the graph occurred in 5% of the cases,


but most participants selected an outcome from beyond the range provided. For example, in the battery vignette, 57.7% of participants chose battery durations beyond the maximum of 3.5 h, and 30.8% chose the top outcome (3.5 h). Mean percentages for all three vignettes are shown in the bottom row of Table 2.

5.2.2. Frequencies of selected outcomes

Participants reported the expected number of selected outcomes in 100 or in 1000 new trials. Those who had selected actually occurring top values believed, on average, that such outcomes would occur 6.44 times in a new sample of 100, and 38.33 times in a new sample of 1000. These values are rather close to the 5% occurrence frequencies in the original samples. If anything, the relative occurrence frequency of unlikely instances appears to decrease rather than increase with the size of the sample, in line with previous studies of ratio bias and numerosity effects (cf. Reyna & Brainerd, 2008). However, in this experiment we were particularly concerned with the majority who had selected values beyond the top of the distribution. They believed, in 97.6% of 86 cases, that the “unlikely” value would not occur in a new sample of 100. In Condition 2, participants believed in 87.5% of 95 cases that such values would not occur even in a new sample of 1000 observed values. These results indicate that very few participants think that a non-occurring event may show up in a later sample, not even in a much larger one. These results confirm earlier demonstrations of the “law of small numbers” (Tversky & Kahneman, 1971), i.e., people’s belief that samples, regardless of size, have characteristics similar to those of the populations from which they are drawn. Thus we find no evidence for the speculation that participants pick out-of-range values from small samples in the belief that they will occur more often in the future.
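The sampling argument that motivated this experiment, and the relative frequencies implied by the participants' estimates, can be made concrete with a short sketch. It assumes independent draws with a fixed population rate, which is an idealization and not something the questionnaire states; the function names are illustrative only.

```python
def prob_absent(rate, sample_size):
    """Probability that an outcome with the given population rate never
    appears in sample_size independent draws (assumed binomial model)."""
    return (1.0 - rate) ** sample_size


def relative_frequency(expected_count, sample_size):
    """Convert an expected count in a sample into a relative frequency."""
    return expected_count / sample_size


# An outcome with a 10% population rate is missing from a sample of 10
# in roughly a third of all samples, but almost never from 100 or 1000.
for n in (10, 100, 1000):
    print(n, prob_absent(0.10, n))

# Mean estimates reported for participants who chose the observed top value.
print(relative_frequency(6.44, 100))    # new sample of 100
print(relative_frequency(38.33, 1000))  # new sample of 1000
```

Under these assumptions, absence from a sample of 10 says little about the population, whereas absence from a sample of 100 or 1000 is strong evidence of a rate well below 10%. The two relative frequencies (about 6.4% and 3.8%) show the decrease with sample size noted in the text.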

5.2.3. Reasons for selected outcome

Participants who had selected out-of-range values strongly agreed that this outcome was unlikely because it had not occurred (overall mean agreement = 6.27). They also tended to agree that this outcome was unlikely because it will not occur (M = 4.76) or will seldom occur (M = 5.08) in the future, and because this outcome is far from normal outcomes (M = 5.09). The first rating was in all vignettes significantly higher than the other three (ts > 4.50), whereas the other three were not significantly different, except in the Diet vignette, where not occur was rated somewhat lower than seldom and far from normal. Those who had selected outcome values within range obviously had to disagree with the first statement (M = 2.26) and also with the second (M = 2.69), but agreed with seldom (M = 5.64) and far from normal (M = 5.39). Thus “seldom occur” was in this study a better equivalent for “unlikely” than was the case in Experiment 1, perhaps because the present study displayed a complete distribution, showing a considerable variability of values. Against this background, seldom occur might seem to be a more cautious phrase than in the temperature context, where less variation in future averages might be expected.


6. Experiment 4

Experiment 3 demonstrated that when participants are shown a distribution of quantitative outcomes, for instance, the duration in hours of a set of laptop batteries, and are asked to complete unlikely-statements, such as “It is improbable that a battery lasts for ...... hours”, they will often provide top values with a zero frequency of occurrence. They also rated statements about such values as completely correct and very natural (Experiment 2). The vignettes in these experiments concerned outcome distributions on unipolar ratio scales extending from “low” to “high” outcome values (batteries lasting from 1.5 to 3.5 h). However, such scales may induce respondents to imagine outcomes beyond the range that is presented. Experiment 4 was designed to replicate the findings based on quantitative outcomes and to extend this research to a larger variety of outcome distributions (i.e., ordinal and categorical outcomes). As in Experiment 3, the distributions used here include at least one infrequent value (occurring in 10% of the cases, or less) and potential outcomes that have never occurred. We surmised that non-occurring values would be selected more often than infrequent values, even for values on ordinal and nominal scales.

6.1. Method

6.1.1. Participants

Students attending two psychology classes (first and second year) at the University of Tromsø, Norway, served as participants; N = 69 (58% female; median age 21 years). They were randomly given one of two versions of the questionnaire described below.

6.1.2. Questionnaires

All questionnaires contained four vignettes describing distributions of 4–7 different outcomes, including at least one low-probability outcome (5–10% occurrence), and at least one zero-frequency outcome. Participants were subsequently asked to rate the appropriateness of statements about “improbable” outcomes, and to fill in the missing value in “improbable” statements. For instance, after being shown a distribution of exam grades, the question would be: “It is improbable that a student receives ...... on this exam”. Outcomes ranged from categorical to ordinal and ratio scale values.

6.1.2.1. Computer batteries. This vignette was identical to the battery vignette used in Experiments 2 and 3, except that all battery durations were increased by 1 h, to make more room for values below the minimum. Participants were asked to evaluate four improbable statements according to how correct or natural they seemed to be (rating scales from 1: fits very poorly, to 5: fits very well). In Condition 1 these values were (a) 2 h (below minimum), (b) 2.5 h (minimum value), (c) 4.5 h (maximum value), and (d) 5 h (above maximum). In Condition 2 the below minimum and above maximum values were even more extreme, at 1 and 6 h, respectively. However, the differences in ratings


between the two conditions were not significant, so the results were pooled.

6.1.2.2. Snowy owls. The text of this vignette was in Condition 1 as follows: “Studies of snowy owls in Northern European countries reveal large seasonal variations. One study found that 30% of all eggs are hatched in the spring, 60% in the summer, 10% in the autumn and none in the winter”. In Condition 2 the text and the probabilities were the same, except that winter was placed first and autumn last in the list of seasons. Complete the sentence below with a season that fits the context: “It is improbable that a snowy owl is hatched in ......”

6.1.2.3. Hotel ratings. This vignette described hotel ratings from a sample of 200 guests, accompanied by a bar graph showing the number of guests who had rated the hotel as excellent (22.5%), very good (37.5%), good (30%), below average (10%), or poor (0%). Participants were asked to complete the following statements (Condition 1):
(a) “It is improbable that a guest will evaluate this hotel as ......”
(b) “It is not certain that a guest will evaluate this hotel as ......”
In Condition 2 the order of statements was reversed. There appeared to be no difference between conditions, so the results were pooled.

6.1.2.4. Improbable grades. The text was as follows: “One hundred students stood for an exam in psychology with varying results. The figure below shows how many students obtained which grade in this course.” This vignette was accompanied by a bar graph showing in Condition 1 that A was obtained by five students, B: 20, C: 40, D: 25, E: 10, and F: 0. In Condition 2 the distribution was changed so that no student received an A, whereas five students received an F. Participants were then asked to complete these two statements:
• It is probable that a student gets ...... on this exam.
• It is improbable that a student gets ...... on this exam.
For a full translation of the vignettes, see Appendix D.

6.2. Results

6.2.1. Computer batteries

Of the four statements to be rated in Vignette 1, (b) (minimum values) and (c) (maximum values) both correspond to probabilities around 10%, but they were not considered equally good candidates for being unlikely, as statement (b) was given a mean appropriateness score of 2.38 against 3.28 for statement (c); t(68) = 4.75, p < .001. These results are similar to the ratings of low vs. high values reported in Fig. 2. Statements (a) (below minimum) and (d) (above maximum) contain out-of-range values, which have never occurred. Statements below minimum (not tested in Experiment 2) turned out to be rated even less appropriate than statements about minimum values, M = 1.97 vs. 2.38; t(68) = 2.60, p = .012, whereas statements above maximum were, as in Experiment 2, more appropriate than statements about maximum values, M = 4.10 vs. 3.28; t(68) = 4.62, p < .001. Most participants (70%) gave top appropriateness ratings (a score of 5) to statement (d), which described duration times that were not attained by any of the batteries in the tested sample. In contrast, only 13% considered that the improbable statement about 4.5 h “fits very well”, despite the fact that this duration time was attained by only 10% of the batteries in the sample.

6.2.2. Snowy owls

Fifty-eight participants (84.1%) completed the statement about an improbable season for hatching with “winter”, whereas 10 (14.5%) wrote “autumn” (one wrote “autumn/winter”). Thus it seems more appealing to use the empty category as an example of an “improbable” season than a season with 10% hatchings. Winter was chosen equally often regardless of its position in the list of seasons.

6.2.3. Hotel ratings

Altogether, 58 of 67 participants (86.6%) completed the statement about improbable ratings with “poor”, an evaluation that had not occurred in the sample of 200 guest evaluations. Seven participants (10.4%) suggested “below average” (in addition, one participant suggested that both these ratings were improbable). Not certain did not suggest one specific rating, the most frequent ratings being “below average” (37.3%), which occurred in 10% of the cases, and “excellent” (26.9%), which occurred 22.5% of the time. In other words, not certain could qualify either a top or a bottom rating.

6.2.4. Improbable grades

All participants (100%) completed the statement about a probable grade with C, this being the most frequent and hence most likely grade. The statement about an improbable grade was completed by all participants with either an A or an F. Interestingly, most participants suggested the grade that had never been given. In Condition 1, 26 of 31 participants (83.9%) suggested F as improbable, while the remaining 5 (16.1%) suggested A. In Condition 2, this pattern of preference was reversed; here 30 of 38 (78.9%) chose A, whereas 8 (21.1%) suggested F. The difference between the two conditions is highly significant, χ²(1, N = 69) = 26.95, p < .001.

6.3. Discussion

In all vignettes, a majority of 80–90% felt that improbable (unlikely)³ best characterized outcome values that had never occurred. In three of the distributions, these were at the same time extreme values. In Vignette 1, where outcomes described values on a unipolar ratio scale, only the top values and, especially, values beyond the top were considered improbable. In Vignette 4, grades at both ends of the scale could be improbable if no student had received them. Vignette 2 (snowy owls) shows that outcomes with zero frequency are preferred even for categories that do not form an ordinal or a ratio scale. However, one cannot conclude from these results that improbable means zero probability, as all distributions can be regarded as (fairly large) samples rather than exhaustive descriptions of the total population. So, for instance, even if no snowy owl hatchings have yet been observed during winter, the possibility remains that a winter specimen will be found in the future.

³ The Norwegian term is “usannsynlig”, which can be translated both as improbable and unlikely. There is only one Norwegian term with this meaning.

Outcomes that occur in 5–10% of the cases are evidently less qualified to be called improbable, even if most “translation” studies have found that improbable is believed to correspond to probability values of this magnitude or higher. As a control, we asked a sample of 43 students (who had not been exposed to the battery vignette) to suggest a probability corresponding to quite improbable in the statement “it is quite improbable that the battery in a laptop computer will last 3 h before needing to be recharged”. With the exception of one participant, they all suggested probabilities from 10% and upwards, with 20% as the median probability.

Not certain, which in Experiment 3a was chosen to describe above maximum values, was in the hotel vignette used to describe less improbable outcomes. Two factors may account for this difference. (1) In the present experiment, the negation implied by not certain makes it an awkward phrase for describing negative events, because it seems to suggest that an attempted poor outcome is not achieved.
“It is not certain Paul will fail” may sound as if Paul is trying to fail, or as if we are hoping for his failure (unless the context has established Paul’s potential failure as a target issue, which is now being questioned). As a consequence, many participants preferred to place not certain at the positive rather than at the negative end of the distribution (it is not certain that the hotel will be rated as excellent). (2) The same participants evaluated both phrases (unlikely and not certain) in a “joint presentation” format (Hsee, 1996). This format facilitates comparisons and promotes a focus on how these phrases differ, rather than how similar they are. Because different questions suggest a need for different answers, participants may have felt encouraged to complete the two sentences with different values (Schwarz, 1999).
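The chi-square statistic reported for the grades vignette (Section 6.2.4) can be reproduced from the four cell counts given there. The sketch below assumes an ordinary Pearson chi-square for a 2 × 2 table without continuity correction; the text does not say which variant was used, so this is an assumption, and the function name is illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]],
    computed without continuity correction (an assumption here)."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: Condition 1 and Condition 2; columns: chose F, chose A.
# Counts are those reported in Section 6.2.4.
chi2 = chi_square_2x2(26, 5, 8, 30)
print(round(chi2, 2))
```

With these counts the uncorrected statistic rounds to the reported value of 26.95, which suggests that no continuity correction was applied.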

7. Experiment 5

Experiment 5 was designed to investigate the meaning of improbable in yet another domain, namely predictions of climate change issues. Scenarios of future events are typically fraught with considerable uncertainty, giving rise to range rather than point forecasts, or to a whole family of different scenarios depending on which assumptions turn out to hold true (cf. IPCC, 2007). Range forecasts are often intended to be held with 90–95% confidence or


more, but frequency distributions including the prevalence of specific outcomes are rarely available. Prior research has shown that people are quite insensitive to variations in the confidence level associated with such ranges (Teigen & Jørgensen, 2005); they tend to think of intervals of the same width regardless of instructions, even in the absence of any specified level of confidence. However, from a normative standpoint, probabilities associated with outcome values outside the expected range should be very low.

Altogether three experiments were conducted where participants were asked to evaluate probable and improbable predictions of a rise in sea level by the year 2100, relative to an expected range of outcomes. Participants were undergraduate Norwegian students (mostly psychology students) who were randomly allocated to Experiments 5a, 5b, or 5c. In Experiment 5a, they were asked to judge how reasonable it would be to describe outcomes inside and outside of the expected range as probable or improbable, using a procedure similar to that in Experiment 2. In Experiment 5b they were given an expected range and asked to specify an “improbable” outcome, whereas in Experiment 5c they were given an “improbable” outcome and asked to specify the expected range.

7.1. Experiment 5a

7.1.1. Method

7.1.1.1. Participants. Participants were 70 undergraduate students at the University of Oslo; 77.5% were female, and the median age was 21.5 years. They were randomly assigned to one probable and one improbable condition.

7.1.1.2. Material and procedure. All participants received a vignette (inspired by Harris and Corner (2011)) describing forecasts of the rise in sea level caused by global warming and the melting of ice in Greenland and at the North and the South Pole. The vignette stated that “climate experts expect a 50–90 cm rise in sea level around Norway by the year 2100”,⁴ but did not provide any probability information. On this basis participants were asked to evaluate five statements according to how appropriate they appeared to be (on five-point scales from 1: not correct, to 5: completely correct). Participants in Condition 1 evaluated the following probability statements: “It is probable that the sea level around Norway rises [40; 50; 70; 90; 100] cm”. Participants in Condition 2 evaluated a parallel set of improbability statements: “It is improbable that the sea level around Norway rises [40; 50; 70; 90; 100] cm”. Participants in both groups were finally asked to rate their concerns about future climate changes (for themselves and for future generations), their interest in climate issues, and their level of knowledge about such issues, on four 7-point rating scales. These results will not be discussed further.

⁴ In a recent report, the “worst case” scenario for 2100 implies a maximum increase in sea level of 82–121 cm for four major cities along the Norwegian coast (Simpson et al., 2012; Table 7.1). However, due to a simultaneous vertical land motion, the net effect may be a decrease or no increase at all. There are no probability estimates in the report. The authors admit that they “consider the occurrence of high-end changes to be very unlikely, however, no formal assessment of their probability can be made” (p. 12).

7.1.2. Results

Ratings of correctness of probable and improbable statements, made by participants in Condition 1 (probable statements) and Condition 2 (improbable statements), are displayed in Fig. 3. Fig. 3 shows that sea-level rises of 40, 50, or 70 cm should be called probable rather than improbable, despite the fact that a 40 cm rise is below the expected range. A 90 cm rise could equally well be called probable as improbable (by being on the upper border of the expected range), whereas improbable was judged most appropriate to characterize the above maximal outcome (100 cm). Separate repeated-measures ANOVAs show significant main effects of rise in sea level both for probable, F(4, 156) = 28.665, p < .001, η²p = .42, and for improbable, F(4, 116) = 6.832, p < .001, η²p = .19. Pairwise comparisons of probable reveal that a 100 cm rise is less probable than all lower amounts (p < .001), whereas pairwise comparisons of improbable show 100 cm to be significantly more improbable than all lower amounts (p < .05), except 90 cm.

7.1.3. Discussion

The ratings of improbable statements replicate those reported in Experiment 2 (Fig. 2) and for the Battery vignette in Experiment 4. The results of the present experiment show in addition that low values are considered more probable than high ones. This indicates an “at least” reading of probable values, paralleling a previous finding in numerical statements containing will (Teigen & Filkuková, 2013) and certain (Teigen et al., 2013). For instance, when shown a distribution of battery durations, similar to the one used in Experiment 2, many participants said that a battery will last, or is certain to last, 1.5 h, which was the smallest value in the distribution of duration times. Such responses are plausible for increasing amounts, where higher values entail lower ones.
For example, when sea level increases by 90 cm, it is also correct to say that it has increased by 50 cm—and then by 40 additional cm, while the reverse does not hold. (For a linguistic discussion of ‘‘exact’’ vs. ‘‘at least’’ readings of numerals, see Levinson, 2000.)
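The effect sizes reported for the two ANOVAs in Section 7.1.2 can be checked against the F ratios and their degrees of freedom. The following is a minimal sketch, assuming the standard identity that expresses partial eta squared for a single effect as F × df_effect / (F × df_effect + df_error); the function name is illustrative and not part of the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    # Follows from F = (SS_effect / df_effect) / (SS_error / df_error)
    # and partial eta squared = SS_effect / (SS_effect + SS_error).
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios and degrees of freedom as reported for Experiment 5a
probable = partial_eta_squared(28.665, 4, 156)
improbable = partial_eta_squared(6.832, 4, 116)

print(round(probable, 2), round(improbable, 2))  # 0.42 0.19
```

Both values agree with the reported effect sizes of .42 and .19 after rounding to two decimals.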

7.2. Experiment 5b

Participants in the previous experiments were asked to identify appropriate usages of the term improbable, but did not estimate the numerical probabilities associated with this term. In Experiment 5b, participants were asked to do both. From a frequentistic standpoint, it may seem obvious that an outcome that occurs in 10% of the cases should have an estimated probability of 10%, and that an outcome that never occurs should be given a probability of 0% (or close to 0%). In such a case, it might seem superfluous to ask what numerical probabilities are involved. However, not everybody shares this view. If probabilities are not derived from frequencies, it could make sense to associate ‘‘improbable’’ with probability estimates that differ from their frequency. The question to be explored in Experiment 5b is whether improbable events that lie outside the expected range of outcomes are associated with very low probabilities (close to 0%), in agreement with the frequentistic conception of probabilities, or whether people are able to entertain a notion of improbable events as extreme, and yet corresponding to numerical probabilities at the 10–30% level, without finding these judgments mutually inconsistent. For the sake of comparison, participants were asked to evaluate both improbable and probable outcomes. Numeric probability estimates were in this experiment obtained with two response formats: as written estimates and by the use of rating scales. The use of both formats was motivated by a recent study (Riege & Teigen, in press) showing that written probability estimates are more consistent (more additive) than estimates obtained on rating scales.

7.2.1. Method

7.2.1.1. Participants

Participants were 65 undergraduate students at the University of Oslo; 76.9% were female and 21.5% male (one did not report gender or age). The median age was 23 years. They were randomly assigned to one of the two format conditions (writing vs. rating scale).

7.2.1.2. Material and procedure

Participants in both conditions received the same vignette as in Experiment 5a, but this time they were told that climate experts expected a 50–90 cm rise in sea level around Norway by the year 2100. On this basis, they were asked to complete the following statements:

- It is probable that the sea level around Norway will rise with . . . cm.
- It is improbable that the sea level around Norway will rise with . . . cm.

Fig. 3. Mean ratings (1–5) of appropriateness of verbal probability statements when sea level is expected to increase with 50–90 cm.

Both of these magnitude estimates were accompanied by a numerical probability estimate, given as an answer to the question: How probable do you think it is that the sea will rise with the amount you have estimated to be ‘‘probable’’ [‘‘improbable’’]? Give a number between 0% and 100%. (For a full translation of the vignette, see Appendix E.)


Half of the participants were asked to write their estimates as numbers between 0% and 100% (Condition 1), the other half gave their estimates as ratings on a 21-point scale where probabilities between 0% and 100% were listed in multiples of five (Condition 2).

7.2.2. Results

Most participants suggested probable values lying within the 50–90 cm range. Eight participants believed it was probable that the sea will rise by 50 cm or less. Overall, probable outcomes were estimated as having a median numerical probability of 70%. A large majority of participants (80%) completed the improbable statement with values equal to or higher than the maximal sea rise expected (90 cm), whereas only 20% suggested lower values. The three most frequently mentioned values were 90 cm (26.2%), 100 cm (15.4%), and 200 cm (18.5%). This shows, in yet another context, that unlikely values are high rather than low, and often located far above the expected range. Despite their extremeness, the improbable outcomes were not associated with a zero (or close to zero) probability, but were compatible with estimates from previous ‘‘translation’’ studies, with a median estimated probability of 20%. (Medians were considered more representative than means, due to positively skewed distributions.) Extreme sea rise values were not judged to be less probable than less extreme ones. The correlation between extremeness (sea rise in cm for participants suggesting values within or above the expected range, log transformed to avoid a disproportionate effect of outliers) and probability estimates for improbable outcomes was close to zero, r(n = 50) = .04, rather than negative, as would be expected from a consistency point of view. There was a tendency for scale estimates to be higher than written estimates, but both sets of estimates lie within the 10–30% bracket known from previous How Likely-studies. Thus even if participants in this study placed the unlikely outcome well outside the expected range of values, this did not seem to reduce their probability estimates.

7.3. Experiment 5c

In Experiment 5b participants first read the degree to which the sea will rise, expressed as a range (50–90 cm), and then, based on this range, completed a sentence describing an ‘‘improbable’’ sea rise. Experiment 5c followed a complementary procedure: participants first read about an amount that was explicitly characterized as ‘‘improbable’’, and then produced a range based on this forecast. As in Experiment 5b, participants also produced a numerical probability estimate corresponding to the improbable value.

7.3.1. Method

7.3.1.1. Participants

Participants were 63 undergraduate students at the University of Oslo; 76.2% were female, and the median age was 22 years. None of them had taken part in any of the previous experiments. They were randomly assigned to one of the two response conditions.


7.3.1.2. Material and procedure

Participants in both conditions received a variant of the same sea level vignette as above. This time they were told about a climate expert who claimed that a 100 cm rise in sea level was improbable (Norwegian: ‘‘usannsynlig’’), and asked (a) to estimate the numeric probability he might have in mind. They were then asked (b) to suggest an expected range of rise in sea level, by completing the following sentence, attributed to the same expert: ‘‘We expect the sea level to rise between . . . and . . . cm’’. Finally they were asked (c) to give their own probability estimate of a 100 cm rise. Probability estimates were again given either as written numbers (Condition 1) or as checkmarks on a 0–100% rating scale (Condition 2), as in Experiment 5b. (For a full translation of the questionnaire, see Appendix E.)

7.3.2. Results

The participants thought that the climate expert had a median probability of 20% in mind when saying that a sea rise of 100 cm was improbable. There was in this group no difference between written and rated estimates. Based on the prediction that a sea rise of 100 cm was unlikely, participants believed that the sea would rise between 20 cm and 56 cm (median estimates for minimum and maximum sea level). This range lies well below the 100 cm value claimed to be improbable by the expert. Most participants (77.8%) suggested expected maxima below 100 cm; 20.6% suggested an expected maximum equal to 100 cm, whereas only a single participant suggested that the sea could rise by more than 100 cm. These results indicate that participants were aware that forecasters use the term improbable for values above the expected range. The final question, ‘‘How probable do you think it is that the sea level will rise with 100 cm?’’, was generally answered with higher probability estimates than the first question, Mdn = 50%. Reflecting ‘‘your’’ (rather than the expert’s) opinion, this estimate cannot be regarded as another numeric translation of improbable. It could indicate that the participants did not agree with the expert’s opinion and believed he was too cautious (Juanchich et al., 2012). However, the high frequency of 50% answers may in this context simply signify ‘‘I don’t know’’, rather than an estimate of the probability of the improbable event occurring (cf. Bruine de Bruin, Fischhoff, Millstein, & Halpern-Felsher, 2000; Fischhoff & Bruine de Bruin, 1999).

7.4. Discussion

Most participants considered an ‘‘improbable’’ rise in sea level to be higher than the range of expected outcomes (Experiment 5b), and conversely, that the range of expected outcomes was below an ‘‘improbable’’ rise (Experiment 5c). Yet, participants in both experiments suggested that this improbable, out-of-range value corresponded to a numeric probability of around 20%.
This value is similar to findings from previous translation studies. For instance, Harris and Corner (2011), using a vignette where a rise in sea level of 3 feet was said to be ‘‘unlikely, perhaps very unlikely’’, received numeric probability interpretations of this message of around 20% in their low severity conditions. In contrast with Experiments 2–4, the present experiments did not include frequency information. Such information would not be appropriate, since a 100 cm rise in sea level in the year 2100 is a singular event for which no occurrence frequencies are available. Yet a well-calibrated judge should think of outcomes that are estimated to be 20% probable as events that occur from time to time, without being considered extraordinary. Our respondents, who think that a 20% probability describes outcomes far beyond the highest expected value, are using numbers in a way that seems to defy a statistical interpretation.

8. General discussion

A large body of evidence from studies following a How Likely-approach has established that events labelled improbable (or unlikely) suggest probabilities in the 10–30% range. We find, in contrast, using a Which Outcome-methodology, that improbable outcomes are believed to have a frequency of occurrence close to zero. Both findings appear to be replicable and robust; they can even co-occur within the same context (as shown in Experiments 5b and 5c). In which respects do these two approaches differ, how can the contrasting findings be explained, and how can they be incorporated in a theory of lay probabilistic thinking?

8.1. P(improbable) = 20% or 0%?

Past studies of verbal probabilities have typically presented participants with a single, specified or unspecified ‘‘improbable’’ outcome and asked which probability values (on the 0–1 probability scale) are implied by this linguistic descriptor. This task can be construed as a binary decision, where the basic question is whether one believes that this particular outcome will occur or not. We assume that in such binary situations, a value of 50% (p = .5) will be regarded as reflecting pure chance or complete ignorance (a fifty–fifty chance, cf.
Fischhoff & Bruine de Bruin, 1999), whereas values above 50% indicate that the target outcome will probably happen, and values below 50% indicate that it will probably not happen. By giving an ‘‘improbable’’ outcome a value of 10–20%, which is below 50% with an ample margin, the speaker makes known that this outcome is not believed to be among those that are going to happen.

The present studies challenge this practice in tasks that differ from traditional translation studies in three important respects. First, participants were in Experiments 2–4 given occurrence frequencies for all outcomes. For a statistically minded individual, this should be the ideal way of representing probabilities. However, for a lay propensity theorist, it is less helpful. If improbable describes outcomes with a ‘‘very low propensity’’, too weak to emerge under normal circumstances, outcomes that actually occur in 10 or 20 out of 100 cases may not qualify as improbable enough. Thus the 10–20% values suggested by judges who have singular events in mind may not automatically imply a 10–20% rate of occurrence.

Second, participants in our studies were given a range of multiple outcomes rather than simply one outcome at a time. This may prevent them from using the 50% value as the ‘‘ignorance prior’’, dividing the outcome space between likely and unlikely events. Based on a 50% criterion, most outcomes in the present experiments, even the most likely ones, would fall in the ‘‘unlikely’’ category.

Third, most outcome values used in the present studies could be arranged from low to high. This led to a second finding: People selectively prefer the high end to the low end of the distribution. An improbable outcome is not simply any low-probability or zero-frequency outcome; this phrase seems to be reserved for top results and high achievements (not necessarily good ones), at least for outcomes that can be ordered along a cumulative (unipolar) magnitude scale. For reversible (bipolar) scales, like plus and minus degrees centigrade, and exam grades ranging from A to F (or from F to A), improbable outcomes might be found at both ends (for similar results with the term possible, see Teigen et al., 2013).

The focus on high rather than low extreme values is not an exclusive property of negative expressions such as improbable. When people are asked to describe a possible outcome, or one that has a chance of occurring, they also select values at the high end rather than at the low end of a distribution (Teigen et al., 2013), but this time those that fall within the expected range of outcomes. Together these findings point towards a general preference for top vs. bottom scores in a distribution.

8.2. Causal or frequentistic probabilities?

Although the differences between the present findings and the results of previous translation studies can be partly attributed to differences in methodological approach, they cannot be reduced to a purely methodological issue, as explained above. Our results indicate that many participants agree with Trurl: Improbable creatures are those that have never been observed (yet). When improbable outcomes have been associated with a 10–20% probability, it may not be because such outcomes are believed to occur with this frequency, but because they are perceived to have only a weak propensity of occurring.

Findings by Slovic, Monahan, and MacGregor (2000) can be interpreted as supporting the difference between a statistical (frequency) and a causal (propensity) interpretation of probability. They interviewed experienced forensic psychologists and psychiatrists about a hospitalized mental patient, Mr. Jones, who allegedly had a 20% chance of committing a new act of violence. Given this ‘‘low’’ risk, 79% were willing to discharge him. Those who were given the same information in a statistical format, and told that ‘20 out of every 100 patients similar to Mr. Jones are estimated to commit an act of violence’, felt that the situation was more serious, and 41% refused to discharge him. According to Slovic et al., frequencies were associated with a greater risk perception because they triggered more frightening images than abstract percentage estimates. We propose in contrast that a ‘‘20% chance’’ is perceived from a singular point of view as a rather weak propensity, which may never manifest itself as a new act of violence, whereas the frequency format (20 out of 100) suggests that such acts are not uncommon. Equating a propensity estimate with statistical frequencies seems to have a similar effect as turning up Trurl’s machine; suddenly a considerable number of improbable creatures, believed to be non-existent because of their low propensity charges, pops up. And then they appear less improbable than they did before.

8.3. Linguistic vs. mathematical improbabilities

From a linguistic point of view, improbable and unlikely can be regarded as negations, focusing on the nonoccurrence rather than the occurrence of the target event (Sanford & Moxey, 2003; Teigen, 1988; Teigen & Brun, 1995). Research has shown that people find negations more difficult to process than corresponding affirmations (Just & Carpenter, 1971), and may accordingly rely more strongly on an intuitive ‘‘gist’’ representation (Reyna & Brainerd, 1991) than on an understanding of the exact quantities involved. Moreover, it has been claimed that people are less sensitive to gradations of negative formulations than to corresponding positive formulations of the same state of affairs. For instance, people clearly distinguish between the healthiness of a product containing 5% vs. 25% fat, but judge a 95% and a 75% fat free product to be more nearly equal in healthiness (Sanford, Fay, Stewart, & Moxey, 2002), presumably because it is easier to discriminate between degrees of presence (in this case: how much fat it contains) than between degrees of absence (fat free percentages).

We found in Experiment 3a that participants selected above-maximum values for several different negative expressions, including ‘‘not certain’’, which is usually taken to signify numerical values close to 50%. It appears that the denials implied in all these phrases suggested outcome values that cannot be realized. From a nonmathematical perspective, it stands to reason that negations of anything that is described as plausible, conceivable, believable, or certain should not be expected to happen on a regular basis, as in one out of five or one out of ten cases. Incredible events may not be downright impossible, but are at least exceptional and unpredictable.

It has been argued by Horn (1989, pp. 235–236) that uncertain, unlikely (improbable), and impossible form a negative epistemic scale, in the sense that what is impossible must also be improbable, but not vice versa. Scalar implicatures suggest that improbable outcomes can be perceived as not impossible (otherwise one would have said so), but such implicatures are not entailments, and can be cancelled by statements like ‘‘X is improbable, if not impossible’’. The results presented in the next section are compatible with this view: Improbable events may be impossible, but could also be taken to describe less extreme events.


8.4. Dragons revisited

We opened this paper by suggesting that people may use improbable in a ‘‘Trurlian’’ sense, describing phenomena that do not exist under normal circumstances. However, while the present studies demonstrate that improbable outcomes are perceived to be beyond the ordinary, they did not explicitly address the issue of nonexistence. To examine people’s willingness to describe non-existent entities as improbable, a final, small-scale study was performed, where 58 students, attending classes in psychology at two Norwegian universities, were asked to complete three sentences starting with ‘‘It is improbable that . . .’’. The questionnaire comprised three vignettes, each with four options, as shown in Appendix F. Two of the options could be considered likely or ‘‘normal’’, as for instance the observation of footprints by a reindeer or a moose in the mountains around Tromsø. A third option was assumed to be extraordinary or rare (footprints from a bear, an animal that has not been observed in this region), whereas the fourth option described a mythical entity (footprints of a dragon).

A majority of the participants chose to complete the ‘‘improbable’’ statements with mythical creatures: The ghost (65%), the monster (62%), and the dragon (66%), in the three vignettes, respectively. These were consistently explained to be improbable because ‘‘they don’t exist’’, ‘‘they are mythical’’, and so on (86% of all explanations in this category). Less than one third preferred to complete the sentences with low-probability, but not impossible options (the burglar, the submarine, and the bear), explaining that these would be unexpected or rare, but not completely inconceivable. Thus the concept of improbability seems able to encompass non-existent as well as real, but extraordinary options.

8.5. Implications for risk communication

The present findings have implications for risk communication. If people believe that ‘‘unlikely’’ or ‘‘improbable’’ events are those that are not going to happen, one should avoid using these terms for actually occurring, low frequency events, and reserve them for risks that are theoretically possible, but for most practical purposes can be left out of account. Infrequent, but non-negligible events should perhaps rather be described by terms having a positive directionality, e.g., a ‘‘low probability’’. So far the IPCC has recommended exclusively the use of negative directionality expressions to convey degrees of certainty below 50%. A new analysis by Smithson, Budescu, Broomell, and Por (2012) of the probabilities associated with the predictions presented in the IPCC report shows that negative directionality expressions give rise to a larger range of interpretations than positive ones. This is in line with the ambiguity of negative probability expressions demonstrated in the present study.

Studies showing the role of verbal probabilities in polite language have demonstrated that people sometimes use uncertainty expressions to hedge their predictions or to soften harsh news (Bonnefon & Villejoubert, 2006; Juanchich et al., 2012). For instance, a doctor may say to a patient ‘‘it is possible that you have cancer’’ when he means it is quite probable, and ‘‘it is probable’’ when he is actually quite sure. Analogously, negative directionality expressions such as unlikely may sometimes be used as a polite way of saying ‘‘this can’t happen’’. Thus I might say to a UFO proponent that visitors from outer space are quite unlikely, when I mean they are impossible, but am unwilling to push my point. Similarly, Trurl’s sci-fi probability amplifier is also quite unlikely (if not impossible) from a material point of view, but perhaps it is more likely as a psychological model for how people think about probabilities.

| Month | Normal monthly averages (°C) | Improbable [impossible] monthly averages |
|---|---|---|
| January | −4.3 | . . . . . . |
| February | −4.0 | . . . . . . |
| March | −0.2 | . . . . . . |
| April | 4.5 | . . . . . . |
| May | 10.8 | . . . . . . |
| June | 15.2 | . . . . . . |
| July | 16.4 | . . . . . . |
| August | 15.2 | . . . . . . |
| September | 10.8 | . . . . . . |
| October | 6.3 | . . . . . . |
| November | 0.7 | . . . . . . |
| December | −3.1 | . . . . . . |

8.6. Conclusion

It has often been observed that people care more about outcome magnitudes than about outcome probabilities. Lottery players are more focused on prizes than chances, and accident magnitudes create more concern than accident rates, to a point that has been described as ‘‘probability neglect’’ (Sunstein, 2002). Rather than inquiring about odds and occurrence frequencies, people would like to know, dichotomously, whether something is going to happen or not happen, whether a product is healthy or unhealthy, or a procedure safe or unsafe. Even among acknowledged risks, we might draw a distinction between the unacceptable and the acceptable ones, those that are a cause of concern vs. those that could or should be ignored. Small probabilities represent a challenge in this respect. They have, from one (frequentistic) point of view, real consequences, at least occasionally, but are from another (propensity) point of view chiefly of academic interest. Wavering between these interpretations, people may produce a bimodal pattern of response, exaggerating or neglecting the risk (McClelland, Schulze, & Coursey, 1993). Outcomes that are called improbable clearly belong to the ‘‘negligible’’ category. Such ‘‘black swans’’ are too extreme to be predicted or taken into account, although we may have underestimated their role in shaping the course of events and even in creating history (Taleb, 2007).

Appendix A. Questionnaire used in Experiment 1

A.1. What is the meaning of ‘‘improbable’’ [‘‘impossible’’]?

This is a part of a research project about words and phrases that are commonly used in daily life as well as in communicating expert opinions to the public, for instance in the field of climate research. In this case we would like to know how you, in everyday language, use the term ‘‘improbable’’ [‘‘impossible’’]. Your task is simply to suggest average temperatures for each of the 12 months next year that will, in your opinion, be improbable [impossible] in Oslo (monthly averages = mean daily temperatures for all days in the month). For comparison purposes we include information about average temperatures (in °C) for each month.

Do you think that any of these months will actually obtain average temperatures as cold or as warm as those you have written? Yes/No. If yes, how many?

Do you think that any of the months in the next 10-year period will actually obtain average temperatures as cold or as warm as those you have written? Yes/No. If yes, how many?

Next page: How well do you agree or disagree with the following statements:

- The temperatures I wrote are improbable [impossible] because they, in my opinion, will not occur in the future. Strongly disagree 1 2 3 4 5 Strongly agree.
- The temperatures I wrote are improbable [impossible] because they, in my opinion, will seldom occur in the future. Strongly disagree 1 2 3 4 5 Strongly agree.
- The temperatures I wrote are improbable [impossible] because they are so far away from the normal for these months. Strongly disagree 1 2 3 4 5 Strongly agree.
- The temperatures I wrote are improbable [impossible] because they are incompatible with the current climatic conditions in Oslo. Strongly disagree 1 2 3 4 5 Strongly agree.

Appendix B. Questionnaire used in Experiment 2

You will now read four short vignettes describing past performances of different objects (e.g., how long computer batteries are lasting). In each vignette, you will read four predictions.

Condition 1: Your task will be to judge to what extent these predictions are correct. A prediction is correct when it describes accurately the chances of the event occurring.

Condition 2: Your task will be to judge to what extent these predictions sound natural in the given context. A prediction sounds natural if it appears likely to be said in the described context or if it is easy to imagine someone saying it in the given situation.

B.1. Computer batteries

A sample of computers of the brand ‘‘Comfor’’ were tested to check how long the batteries last before they need to be recharged. All computers were used by students for lecture notes and similar purposes. The figure below shows how many batteries lasted how many hours (duration is rounded to the nearest half hour).

[Figure: bar chart; x-axis: battery duration, 1.5 h to 3.5 h in half-hour steps; y-axis ticks: 0 to 50.]

Based on these results, to what extent each of the four statements below is correct [sounds natural] to describe a Comfor battery duration?

- It is unlikely that the battery will last for 1.5 h.
- It is unlikely that the battery will last for 2.5 h.
- It is unlikely that the battery will last for 3.5 h.
- It is unlikely that the battery will last for 4 h.

Responses were provided on a 6-point scale ranging from 0: not at all correct [natural] to 5: Very correct [natural].

B.2. Weight reduction

A weight reduction product ‘‘Taremare’’ based on seaweed shows the following results for ten men and women adhering to the diet over a period of 3 months.

|  | Weight before (lb) | Weight after (lb) | Weight loss (lb) |
|---|---|---|---|
| #1 | 201 | 181 | 20 |
| #2 | 194 | 176 | 18 |
| #3 | 181 | 170 | 11 |
| #4 | 225 | 212 | 13 |
| #5 | 161 | 146 | 15 |
| #6 | 165 | 154 | 11 |
| #7 | 212 | 198 | 13 |
| #8 | 187 | 179 | 9 |
| #9 | 176 | 154 | 22 |
| #10 | 172 | 165 | 7 |

Based on these results, please rate to what extent each of the four statements below is correct [sounds natural] to describe the effect of the Taremare program:

- It is unlikely that one will lose 7 lb.
- It is unlikely that one will lose 13 lb.
- It is unlikely that one will lose 22 lb.
- It is unlikely that one will lose 23 lb.

B.3. Jeans

A sample of jeans of the brand ‘‘Kenvelo’’ were machine washed in the regular way and tested for shrinkage. The figure below shows how much the jeans shrunk in length (shrinkage is rounded to the nearest half cm).

[Figure: bar chart; x-axis: shrinkage, 0.4 in to 1.2 in in steps of 0.2 in; y-axis ticks: 0 to 40.]

Based on these results, please rate to what extent each of the four statements below is correct [sounds natural] to describe the shrinkage of Kenvelo Jeans after washing:

- It is unlikely that Kenvelo jeans will shrink 0.4 in.
- It is unlikely that Kenvelo jeans will shrink 0.8 in.
- It is unlikely that Kenvelo jeans will shrink 1.2 in.
- It is unlikely that Kenvelo jeans will shrink 2 in.

B.4. Mail

The postal services investigate how long it takes to send a letter from Norway to various addresses in the US. Ten letters are mailed on a regular Monday at 3 pm.

- 2 letters arrive on Wednesday.
- 3 letters arrive on Thursday.
- 4 letters arrive on Friday.
- 1 letter arrives next Monday.

Based on these results, to what extent each of the four statements below is correct [sounds natural] to describe the time it takes to send a letter from Norway to the US?

- It is unlikely that a letter will take 3 days.
- It is unlikely that a letter will take 5 days.
- It is unlikely that a letter will take 7 days.
- It is unlikely that a letter will take 8 days.
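The logic behind the four probe values in each Appendix B vignette can be made explicit by tabulating how often each probed value occurs in the sample. The following is a minimal sketch in Python, using the weight-loss column of vignette B.2 exactly as listed there; the variable names and the position labels are illustrative assumptions, not the authors' own coding scheme.

```python
from collections import Counter

# Weight loss (lb) for the ten participants in vignette B.2, as listed in the table
weight_loss = [20, 18, 11, 13, 15, 11, 13, 9, 22, 7]

# Values probed by the "It is unlikely that one will lose ..." statements
probes = [7, 13, 22, 23]

counts = Counter(weight_loss)
n = len(weight_loss)

for value in probes:
    share = counts[value] / n  # observed relative frequency in the sample
    if value > max(weight_loss):
        position = "above the observed maximum"
    elif value < min(weight_loss):
        position = "below the observed minimum"
    else:
        position = "within the observed range"
    print(f"{value} lb: {share:.0%} of the sample, {position}")
```

In this sample, 7 lb and 22 lb each occur in one of ten cases, whereas 23 lb lies just above the observed range and has a frequency of zero.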

Appendix C. Materials for Experiment 3

This questionnaire includes five short vignettes followed by a statement about the information contained in the vignette. Your objective is to complete the blank in each statement with the value that sounds best given the context.

C.1. Computer battery

A sample of computers of the brand “Comfor” were tested to check how long the batteries last before they need to be recharged. All computers were used by students for lecture notes and similar purposes. The figure below shows how many batteries lasted how many hours (duration is rounded to the nearest half hour).

(Figure: number of batteries by battery life in hours.)

Complete the sentence below with a number that seems appropriate in this context.

It is [unlikely] [improbable] [doubtful] [a chance] that the battery in a Comfor computer will last for ...... hours.

C.2. Jeans shrinkage

A sample of jeans of the brand “Kenvelo” was machine washed in the regular way and tested for shrinkage. The figure below shows how much the jeans shrunk in length (shrinkages are rounded to the nearest 0.2 in.).

(Figure: number of jeans by shrinkage in inches.)

Complete the sentence below with a number that seems appropriate in this context.

It is [verbal probability] that Kenvelo jeans will shrink ...... in. after washing.

C.3. Diet

A weight reduction product “Taremare” based on seaweed shows the following weight loss (in pounds) for a sample of men and women adhering to the diet over a period of 3 months.

(Figure: number of persons by weight loss in pounds.)

Complete the sentence below with a number that seems appropriate in this context.

By adhering to the Taremare program it is [verbal probability] that a person will lose ...... lb.

C.4. Glomps

Please read the following vignette; do not worry if you do not understand the meaning of some words.

Shmulps have different numbers of glomps. The graph below shows the number of glomps of a sample of Shmulps.

(Figure: number of Shmulps by number of glomps.)

Complete the sentence below with a number that seems appropriate in this context.

It is [verbal probability] that a Shmulp will have ...... glomps.

Appendix D. Vignettes used in Experiment 4, Condition A [Condition B in brackets]

D.1. Computer batteries

A sample of computers of the brand “Comfor” was tested to check how long the batteries last before they need to be recharged. All computers were used by students for lecture notes and similar purposes. The figure below shows how many batteries lasted for how many hours (duration is rounded to the nearest half hour).

(Figure: bar chart of the number of batteries, axis from 0 to 40, by battery life of 2.5, 3, 3.5, 4, and 4.5 h.)

Based on these results, what is appropriate to say? Rate the following statements (rating scales from 1: fits poorly, to 5: fits very well) according to how correct or natural they seem to be.

It is improbable that the battery in a Comfor laptop will last for 1 h [2 h].
It is improbable that the battery in a Comfor laptop will last for 2.5 h.
It is improbable that the battery in a Comfor laptop will last for 4.5 h.
It is improbable that the battery in a Comfor laptop will last for 6 h [5 h].

D.2. Snowy owls

A study of snowy owls in Northern European countries reveals large seasonal variations. One study found that 30% of all eggs are hatched in the spring, 60% in the summer, 10% in the autumn and none in the winter. [Condition B starts with winter and ends with autumn].

Complete the sentence below:

“It is improbable (unlikely) that a snowy owl is hatched in ......”

D.3. Hotel ratings

Hotel Charlotte in Kingston has received these ratings from a sample of 200 guests:

(Figure: bar chart of the number of guests, axis from 0 to 80, by rating: Excellent, Very good, Good, Below average, Poor.)

Complete these sentences with appropriate evaluations:

It is improbable that a guest will evaluate this hotel as ......
It is not certain that a guest will evaluate this hotel as ......

[Condition B received these statements in the reverse order].

D.4. Probable grades

One hundred students stood for an exam in psychology with varying results. The figure below shows how many students obtained which grade in this course.

(Figure: bar chart of the number of students, axis from 0 to 50, by grade: A, B, C, D, E, F.)

Based on these results, what is appropriate to say? Complete the sentences below with a grade that seems natural in the context.

[The figure in Condition B showed the same distribution, except that no student received an A and five students received F].

It is likely (probable) that a student gets ...... on this exam.
It is unlikely (improbable) that a student gets ...... on this exam.

Appendix E. Vignettes used in Experiment 5

E.1. Experiment 5a

By the year 2100 experts estimate the sea level around Norway to have risen with 50–90 cm. Based on this, what do you think is natural (correct) to say? Evaluate the following five statements according to how correct they appear to be, by circling one number on each scale (1: Not correct, 5: Completely correct).

It is probable [improbable] that the sea level will rise with 40 cm; 50 cm; 70 cm; 90 cm; 100 cm.

E.2. Experiment 5b

By the year 2100 experts estimate the sea level around Norway to have risen with 50–90 cm. Based on this, what do you think is natural (correct) to say? Fill in numbers that are appropriate in the following sentences:

(1a) It is probable that the sea level around Norway will rise with ...... cm
(1b) How probable do you think it is that the sea will rise with the amount you have estimated to be “probable”? Give a number between 0% and 100%
(2a) It is improbable that the sea level around Norway will rise with ...... cm
(2b) How probable do you think it is that the sea will rise with the amount you have estimated to be “improbable”? Give a number between 0% and 100%

E.3. Experiment 5c

Imagine an expert describing the situation by the year 2100 in this way:

“It is improbable that the sea level around Norway will rise with 100 cm”.

(a) Which probability do you think he has in mind when he says “it is improbable”? Give a number between 0% and 100%
(b) If this expert should estimate how much the sea level around Norway is expected to rise, what would be appropriate to say? We expect the sea level to rise between ...... and ...... cm.
(c) What is your probability for a sea rise of 100 cm? Give a number between 0% and 100%


Appendix F. Questionnaire used in final study

F.1. What is the meaning of “improbable”?

This is a part of a research project about words and phrases that are commonly used in daily life, as well as in communicating expert opinions in various domains. This time we would like to know how you, in everyday language, use the term “improbable”.

1. Imagine that you wake up in the middle of the night by the sound of a slamming door. You think it could be: (a) a burglar; (b) the wind; (c) a ghost; (d) a neighbor coming late home.
Complete the sentence below with one of these alternatives in a way that seems natural:
It is improbable that it is ......
Give a brief explanation of your answer: ......

2. John is touring Scotland. Looking out over Loch Ness, he sees at a distance a large object moving in the water. He thinks it could be: (a) the Loch Ness monster; (b) a shoal of fish; (c) a log adrift; (d) a submarine.
Complete this sentence:
It is improbable that it is ......
Give a brief explanation of your answer: ......

3. Levi is a French tourist hiking in the mountains around Tromsø. In a marsh he discovers some large footprints, and imagines it could be: (a) a moose; (b) a bear; (c) a reindeer; (d) a dragon.
Complete this sentence:
It is improbable that it is ......
Give a brief explanation of your answer: ......

