A Bayesian Recognitional Decision Model


Shane T. Mueller
Applied Research Associates, Inc.

ABSTRACT: The recognition-primed decision (RPD) model (Klein, 1993) is an account of expert decision making that focuses on how experts recognize situations as being similar to past experienced events and thus rely on memory and experience to make decisions. A number of computational models exist that attempt to account for similar aspects of expert decision making. In this paper, I briefly review these extant models and propose the Bayesian recognitional decision model (BRDM), a Bayesian implementation of the RPD model based primarily on models of episodic recognition memory (Mueller & Shiffrin, 2006; Shiffrin & Steyvers, 1997). The proposed model accounts for three important factors used by experts to make decisions: evidence about a current situation, the prior base rate of events in the environment, and the reliability of the information reporter. The Bayesian framework integrates these three aspects of information in an optimal way and provides a principled framework for understanding recognitional decision processes.

Introduction and Background

The recognition-primed decision (RPD) model (Klein, 1989, 1993, 1998) is a conceptual model that arose from naturalistic observation of expert human decision makers. Its basis in naturalistic settings and applied research helps to identify the types of decisions and situations that appear most relevant to expert decision makers in their work. These situations often involve “time pressure, uncertainty, ill-defined goals, high personal stakes, and other complexities” (Lipshitz, Klein, Orasanu, & Salas, 2001, p. 332).

The main insight of the RPD model is that the skill of expert decision making lies in making sense of the current situation, identifying past situations that were similar, and using the workable actions that were taken in the past to guide current choices. In contrast to the processes hypothesized by more classic theories of decision making, naturalistic decision making is not about balancing trade-offs between choices, estimating probabilities, or assessing the expected utility of features or options.

The RPD model describes a fairly complex set of processes used by experts to make decisions. Despite the fact that the model incorporates concepts such as pattern matching, recognition memory, and situation-action decision rules (all of which constitute psychological theories that have been embedded or implemented in computational and mathematical models), it is fundamentally a conceptual model
of decision making, without a computational or mathematical implementation. Yet there are aspects of the RPD theory that are similar to a number of computational models of decision making, and so these other models may be able to provide reasonable mechanistic and formal models of the RPD theory. For example, at a high level, random walk and accumulator models arising out of the stochastic process tradition (Busemeyer & Townsend, 1993; Ratcliff, 1978; Usher & McClelland, 2001) offer explicit computational mechanisms for dealing with time-pressured decision making. Although typical descriptions of stochastic models such as decision field theory (Busemeyer & Townsend, 1993) focus on the use of a decision threshold to drive a response, these models typically allow decisions to be made at arbitrary deadlines using the current balance of information. This is a fundamental aspect of such models that separates them from all-or-nothing decision processes (cf. Meyer, Yantis, Osman, & Smith, 1985, for discussion). These models accumulate evidence for different options in microsteps and come to a decision when evidence for an option passes some boundary. To the extent that RPD was born out of situations requiring immediate, time-sensitive action, these models may offer insights into a similar domain and provide natural accounts of satisficing (making a decision that is good enough), which is central to the RPD model. Furthermore, some of these models have been closely connected to models of memory retrieval, which is also central to the RPD model. For example, Ratcliff’s (1978) seminal paper introducing the diffusion model was entitled “A Theory of Memory Retrieval” (and see also Shiffrin, Ratcliff, & Clark, 1990). However, these stochastic models typically require multiple well-defined choice alternatives to be available at the outset of the process. In contrast, the RPD model needs to retrieve only a single option to begin evaluating it, and this option is often good enough that it does not need to be compared with alternatives.

In the stochastic modeling tradition, many of the processes described by the RPD model have been given names such as “option generation” (Johnson & Raab, 2003; Raab & Johnson, 2007). This approach treats option generation as a preprocessing step along the way to decision making, which is reasonable for theories based on empirical paradigms in which research participants are given carefully crafted, clearly defined options to choose among in laboratory settings. However, even when option generation is considered, the focus remains on choice among options rather than evaluation of individual options. As an example of this focus, although typical random walk models of decision making (e.g., Ratcliff, 1978) require the comparison of two and only two options, some researchers (e.g., Usher & McClelland, 2001) have developed alternative architectures to get around this limitation. However, they did so by extending the comparison process to three, four, or more options, rather than reducing it to evaluating the goodness of a single option. In contrast, in many natural situations, the world does not present decision makers with a set of clear alternatives from which to choose.

Shapiro and Kacelnik (2007) provided an interesting example of single-option choices among starlings. Choice behavior for these birds appears to be optimized
for sequential rather than simultaneous encounters with feeding opportunities, because rarely does the world present a starling with two alternative feeding opportunities it must choose between. Instead, the starling must decide whether taking advantage of the current opportunity is worthwhile in comparison with other unknown opportunities. The starling’s decision problem is similar to that faced in the so-called secretary problem (e.g., Seale & Rapoport, 1997), in which a person must make a series of sequential decisions about whether to take a course of action (to hire a secretary). In both cases, as in the RPD model, a rich background knowledge of what constitutes a good option eliminates the need for (and is preferable to) comparison between options. For the RPD model, option generation is the tail that wags the dog of classical decision making. The skills that make experts good at their decisions consist of their ability to generate good options, perhaps so much so that there is often no need to generate multiple options and evaluate them against one another. Consequently, these stochastic models of decision making remain rooted in a tradition of comparison and choice among options and may not be well suited for developing a computational model of naturalistic expert decisions.

One domain of computational models that is perhaps more relevant to RPD involves episodic and semantic memory. The RPD model assumes that decision making is rooted in past experiences, and so the process is essentially a question of identifying memories for previous events or situations that are similar to the current situation. A number of computational models have explored such decisions in the context of episodic memory, memory for lists of words or pictures, categorization, priming, lexical decision, and other related phenomena. Two prominent models are MINERVA 2 (Hintzman, 1984) and retrieving effectively from memory (REM; Shiffrin & Steyvers, 1997). The core assumptions of these models are that memories for events take on feature-based representations and are stored as individual traces. MINERVA 2 stores past memory entirely as episodes and uses similarity-matching rules to determine what is retrieved or recalled. The REM model allows for long-term memory prototypes as well as individual episodic traces to exist. This notion is somewhat more consistent with the originally articulated RPD model, which assumes that past situations eventually merge into prototypes that are used to mediate decisions. REM also frames the memory retrieval process as a Bayesian decision problem, which allows evidence, uncertainty, and prior probabilities to be incorporated in principled (and even optimal) ways. This Bayesian framing was at least partly inspired by Anderson’s (1990) development of the Adaptive Control of Thought – Rational (ACT-R) computational architecture, another model that embeds theories of memory retrieval with close connections to decision making. Although ACT-R uses mechanisms for decision making and choice that are fundamentally different from those of the REM and RPD models, it has been used to develop instance-based decision models (e.g., Gonzalez, Lerch, & Lebiere, 2003) that are similar in spirit to the RPD model.


Following Hintzman’s (1988) extension of MINERVA 2 to a decision realm, Dougherty and colleagues (e.g., Dougherty, 2001; Dougherty, Gettys, & Ogden, 1999) developed the MINERVA-DM decision-making model rooted in episodic memory theory. MINERVA-DM resembles aspects of the RPD model in that past experiences are used as a basis for decision making. However, the goal of that model was to account for human errors in frequency, likelihood, and probability estimation. This motivation places the model outside the domain of RPD, as it focuses on tasks that are not relevant to much of expert decision making. Yet the essence of the model is to explore connections between decision making and memory, which is a critical insight of the RPD model. Recent updates of this modeling approach have focused on hypothesis generation (Thomas, Dougherty, Sprenger, & Harbison, 2008), which provides insight into a wide range of diagnostic classification tasks in which experts engage.

These previous models derive from academic efforts to understand decision-making processes. Given the RPD model’s basis in naturalistic and applied settings, it is perhaps not surprising that a number of agent-based modeling and simulation efforts have adapted or integrated RPD processes to provide more realistic predictions or simulations of human-like agents. For example, Warwick, McIlwaine, Hutton, and McDermott (2001) have described a MINERVA 2-based computational RPD model that resembles Hintzman’s (1988) and Dougherty’s (2001) models but focuses more on naturalistic decisions, with the goal of providing realistic predictions of expert decisions in simulated work environments. Similarly, Yen and colleagues (Fan, Sun, Sun, McNeese, & Yen, 2006; Fan, Sun, McNeese, & Yen, 2005; Fan, Sun, Sun, et al., 2005; Ji et al., 2007) have reported on developments of a multiagent modeling system called R-CAST. R-CAST is not a model of human cognitive processes per se; it is, rather, an artificially intelligent multiagent system in which agents have RPD decision-making capabilities. Its agents are intended to model group behavior and to serve as reasonable teammates and decision aids. Similar agent-based simulation approaches to adapting RPD processes include Norling, Sonenberg, and Rönnquist’s (2000) and Norling’s (2004) attempts to integrate RPD decision processes into belief-desire-intention (BDI) agents; Sokolowski’s (2002, 2003, 2007) composite agent approach to implementing RPD models; Regal, Reed, and Largent’s (2007) use of an RPD decision module to evaluate command approach strategies in simulated war-gaming; and Raza and Sastry’s (2007, 2008) introduction of the RPD-Soar agent, which allows naturalistic decision making within a computational architecture.

My goal in this paper is to present a new computational model of recognitional decision making, implemented as a Bayesian decision process. Bayesian theory is a way to perform inference about different hypotheses based on both evidence and prior knowledge. A number of Bayesian models of psychological processes exist, and they offer principled approaches to understanding how evidence is used in various decision processes. These include decisions about episodic memory (REM: Shiffrin & Steyvers, 1997; BCDMEM: Dennis & Humphreys, 2001), semantic
knowledge (REM-II: Mueller & Shiffrin, 2006), perceptual judgment (ROUSE: Huber, Shiffrin, Lyle, & Ruys, 2001), and word recognition (the Bayesian reader: Norris, 2006). Although there are many ways to develop models using Bayesian principles, these models share the property that they hypothesize prior probabilities of different events and use observed evidence to perform inference about the true likelihood of those events (called the posterior likelihood). They also allow principled use of information. For example, in comparison with a somewhat arbitrary similarity-matching process employed by MINERVA 2, REM equates similarity with the probability that one memory trace generated another. Such models are not optimal in the sense that they are always correct; rather, they allow for optimal use and combination of information, under specified limitations (such as memory errors or biases). These models suggest methods for implementing a Bayesian computational model of recognitional decision making. Along with allowing prior expectations to be combined with observed evidence, the Bayesian framework also allows one to specify directly a decision maker’s model of how the evidence was generated. Consequently, the trust in the reporter or perceived reliability of a sensor can be incorporated as well. These concepts form the basis of the Bayesian recognitional decision model I describe next.
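To make the shared machinery explicit, the updating rule that all of these models rely on can be stated in odds form. This formulation is standard Bayesian inference rather than anything specific to the models above, and Equations 2 and 3 below are an instance of it:

  \frac{P(H \mid D)}{P(\neg H \mid D)} = \frac{P(H)}{P(\neg H)} \times \frac{P(D \mid H)}{P(D \mid \neg H)},

where H is a hypothesis (e.g., that a particular event class occurred), \neg H is its complement, and D is the observed evidence (e.g., a sensor report): the posterior odds are the prior odds multiplied by the likelihood ratio of the evidence.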

A Bayesian Recognitional Decision Model

The Bayesian recognitional decision model (BRDM) adopts a Bayesian framework for both theoretical and practical reasons. (The RPD model described by Klein, 1993, 1998, is fairly extensive and covers processes such as pattern matching, action selection, analogy, and mental simulation. The aspect of the model covered in this paper is focused on the process of recognizing relevant past events, and I use the term recognitional decision to highlight the fact that this model does not incorporate many of the more complex and interesting aspects of behavior described by the RPD model.)

Many computational decision models offer a number of free parameters with which to explain data. These can represent processes such as evidence sampling, decision thresholds, and intrinsic noise. Often, different sources of noise can masquerade as one another, and so the true source of effects can be difficult to determine (cf. Mueller & Weidemann, 2008). The Bayesian approach makes a default assumption that evidence is used optimally and, thus, the interesting processes lie elsewhere. This is reasonable because there are many ways to be suboptimal but only one way to be optimal; as such, it is perhaps the most parsimonious default assumption. In addition, many of the optimality assumptions can be relaxed if the circumstances warrant. Thus, although naturalistic decision making research was in part a reaction against classic models of optimal decision making (cf. Lipshitz et al., 2001), the BRDM does not violate these basic insights simply by virtue of its Bayesian nature. Rather, it may indeed help focus theory on identifying the information and knowledge used by experts to make decisions.


Empirical Context of Model Development

This research is based partially on interviews with U.S. Army and U.S. Air Force officers responsible for chemical/biological defense. Their primary role is to evaluate sensor and human reports regarding the presence of chemical or biological threats and attacks. Because the incidence of actual events is very rare (many officers will not witness a true positive event during their entire careers), this might be considered an ultravigilance task. I will not provide a detailed account of their decision-making process but will summarize the three types of information these officers used to determine the validity of a threat: background information, the cues in the reports, and the reliability of the source of the reports.

The interviews suggested that once a situation is identified, a fairly scripted sequence of events is executed for addressing that situation. This script is often part of doctrine and of the local techniques, tactics, and procedures, and indeed it is often preplanned. These scripts are well practiced and exercised, and although they are unlikely to fit a situation exactly, their execution is routinized once a situation is identified. The interviews also suggested that a suspected attack is typically treated as a true chemical/biological incident until a convincing level of evidence can be established to eliminate the possibility. Next, I will describe the mechanisms by which the BRDM identifies a situation based on background information, environmental cues, and the reliability of the reporter.

Model of the Environment

For the BRDM to operate, one must first make assumptions about how events occur in the environment, are reported to the decision maker, and are represented by the decision maker. First, assume that a number of classifiable events might occur. In the chemical/biological weapons protection domain, these event classes might correspond to things such as conventional missile attacks, false alarms from environmental dust, or specific nerve agents delivered through ballistic missiles. These event classes have typical sets of features associated with them. The header row of Table 1 illustrates some of the potential features that signal different types of chemical attacks. For the example simulations in the next section, assume that each event class has a binary feature prototype (as in the first row of Table 1), in which each feature is either present or absent.

TABLE 1. Example Feature-Based Representations of Events and Reports About Events

Possible Features               Chemical Odor   Radiation   Mortar Explosion   Garlic Smell   Coughing
Binary event prototype          1               0           0                  0              1
Probabilistic event prototype   .9              .01         .2                 .1             .95
Report about event              1               0           0                  1              1
Feature base rate               .8              .3          .2                 .1             .6

Note. The features and numbers in the table do not reflect the actual rate of feature occurrence in real-world settings.


In fact, even this is probably not accurate and not necessary—environmental events can be defined probabilistically (e.g., the second row of Table 1). This distinction amounts to defining an event or object (e.g., a chair) by its most typical features (four legs, a seat, armrests, and a back) versus defining a chair by the probability of each feature occurring in chairs you have experienced (four legs 75% of the time, a seat 94% of the time, armrests 52% of the time, and a back 85% of the time). For any particular event, assume that these features are reported to a decision maker as either present or absent, as shown in the third row of Table 1. These events may have different a priori base rates of occurring in the environment (an example is depicted in the fourth row of Table 1). So, for the examples in Table 1, although the features reported for a particular event (a chemical odor and coughing) may match the probabilistic prototype for a particular event fairly well, those features are also likely to occur “naturally” even when that event has not occurred. Often, naturally occurring classes follow long-tail distributions, such that a few classes are very frequent whereas many others happen rarely (see Zipf, 1949). Assume that this is accurate for this domain in order to assess the effect that differences in prior probability can have on recognitional decision making.

In any specific situation, the state of the environment will be reported to the BRDM by a simulated sensor (either a human reporter or a networked device). The properties by which a sensor reports the state of the environment are captured in a probabilistic model, which the decision model later uses as a generative model to account for the observed data. In the simulation to be discussed, the sensor reports environmental features with a fixed level of reliability, r. When the sensor does not report accurately, the feature value it reports depends upon the base rate of that feature in the environment (shown in the fourth row of Table 1). More formally, if B(p) is the result of a Bernoulli trial such that B(p) = 1 with probability p and 0 otherwise, and b_r ∈ {0, 1} is one such result with p = r, then the generating model for a sensor report S can be written as

  S_i = b_r\,B(G_i) + (1 - b_r)\,B(F_i) \qquad (1)

Here, r is the reliability of the sensor, G_i is the ground truth for feature i, F_i is the background frequency of feature i in the environment, and S_i is the resulting value of feature i reported to the decision model. So, for the event prototype and feature base rate shown in Table 1, one might observe a report like the one shown in the third row of Table 1. There, the report matched the event prototype on all but the fourth feature, an error of the kind produced by sampling the feature base rate.
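As a concrete illustration of this generative model, the following minimal Python sketch implements Equation 1 using the binary prototype and feature base rates from Table 1. The function and variable names are mine (the article provides no code), and the random seed is arbitrary.

import random

# Illustrative values from Table 1 (not real-world rates).
EVENT_PROTOTYPE = [1, 0, 0, 0, 1]          # binary ground-truth prototype G
FEATURE_BASE_RATE = [.8, .3, .2, .1, .6]   # background frequency F of each feature

def bernoulli(p):
    """B(p): return 1 with probability p, otherwise 0."""
    return 1 if random.random() < p else 0

def sensor_report(ground_truth, base_rate, reliability):
    """Generate a report S following Equation 1: each feature is reported
    truthfully with probability r (the sensor's reliability); otherwise the
    reported value is sampled from that feature's environmental base rate."""
    report = []
    for g, f in zip(ground_truth, base_rate):
        b_r = bernoulli(reliability)   # did the sensor report truthfully on this feature?
        report.append(b_r * bernoulli(g) + (1 - b_r) * bernoulli(f))
    return report

random.seed(1)
print(sensor_report(EVENT_PROTOTYPE, FEATURE_BASE_RATE, reliability=0.9))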

Model of the Decision Maker

Learning

For the BRDM to operate, it needs to learn about the environment in which it is operating. In real situations, this experience comes from a number of sources: from textbook resources, from anecdotes and wisdom passed down by other officers, and from experienced training exercises and actual events.

Figure 1. Features for five hypothesized states of the world (top panel) and the corresponding representations learned by the model through experience with those situations (bottom panel). Each row depicts an event class, and each cell depicts the relative importance of each feature, with darker cells indicating higher probabilities.

To simulate this learning process, one exposes the model to a large number of exemplars of each of the relevant events. The relative probabilities of event classes are specified according to Zipf’s law (see a more detailed description in Demonstration 1, as well as in Zipf, 1949), and reports are generated with noise according to the probabilistic reporting model described earlier. The BRDM learns two primary things about these events: (a) a typical featural prototype for each event class and (b) a base rate for each event in the environment. Although individual reports provide an incomplete and unreliable view of the world, the model learns prototypes by accumulating an average representation of the observed exemplars. That is, if three exemplars of an event class had features [1 1 1 0 0] and two had features [1 1 0 0 0], its composite representation would be [1 1 .6 0 0]. Thus, an event class prototype becomes the average feature distribution of events having that label.
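The learning step just described can be sketched in a few lines: average the reported feature vectors for each labeled event class and count how often each class occurs. This is only an illustration under the assumptions above (the helper names are mine), not the author's implementation.

from collections import Counter, defaultdict

def learn_prototypes(labeled_reports):
    """labeled_reports: iterable of (event_label, feature_vector) pairs.
    Returns (prototypes, base_rates): the average feature vector for each
    class, and the relative frequency of each class among observed events."""
    sums = defaultdict(list)
    counts = Counter()
    for label, features in labeled_reports:
        counts[label] += 1
        if not sums[label]:
            sums[label] = list(features)
        else:
            sums[label] = [s + f for s, f in zip(sums[label], features)]
    prototypes = {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}
    total = sum(counts.values())
    base_rates = {lab: n / total for lab, n in counts.items()}
    return prototypes, base_rates

# The example from the text: three exemplars [1 1 1 0 0] and two [1 1 0 0 0].
reports = [("event A", [1, 1, 1, 0, 0])] * 3 + [("event A", [1, 1, 0, 0, 0])] * 2
protos, rates = learn_prototypes(reports)
print(protos["event A"])   # [1.0, 1.0, 0.6, 0.0, 0.0]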

The environmental model generates information in a noisy fashion (sometimes erroneously sampling from the base rate of a feature instead of the ground truth), and so the representations for low-frequency states become contaminated by features from high-frequency states (see Figure 1). In Figure 1, each row represents an event class, and each cell shows the relative strength of a feature for that event, with darker cells indicating greater importance. For five states with 20 features, the true environmental prototypes are shown in the top panel. After 10,000 reports, sampled according to the power law distribution and generative model described earlier, the model’s event prototypes resemble the original environmental prototypes, but the high-frequency events (toward the bottom) dominate the base rate, and so features from those high-frequency events bleed into the low-frequency events because of the error-generating process.

The model also learns the base rate of different threats in the environment. In specific contexts and situations (domestic vs. deployed; wartime vs. peacetime; and depending upon the capabilities of the enemy), the officer will have a fairly clear picture of which threats are most likely, attained via coordination with his or her intelligence resources and other officers in the region. In practice, much of the officer’s work involves learning how likely these different threats are (rather than simply watching for evidence of attacks). In the model, this knowledge is captured by accumulating an empirical event counter that provides a simple frequency distribution of experienced events. Clearly, if this base rate is inferred from other sources, behavior may exhibit biases such as those associated with the representativeness heuristic (Tversky & Kahneman, 1974).

Model of the Recognitional Decision Process

Once a reasonable representation for a set of events has been learned, the model is able to classify new events based on its past experience. It receives a report and classifies the event by identifying which of its memory prototypes was most likely to have generated the experienced event. To do so, the model weighs the evidence inherent in the report along with the background probability of such a threat and information regarding whether or not the reporter is reliable. The features of the message are compared in parallel with the relevant prototypes in long-term memory, and for each, a likelihood is computed that integrates the reliability of the sensor, the current information, and the background knowledge about possible threats. This is done via a mental model of the reporting process, which corresponds to the model’s beliefs about the generating process for that report. Assume that the basic form of this mental model is accurate, although the actual parameters (e.g., base rate, reliability, feature strengths) might not be.

The basic decision process is one of determining whether or not the current evidence favors a particular event class. This is done by computing a posterior likelihood ratio, comparing the likelihood of a particular class (given the evidence) with the likelihood of nothing happening (given the evidence). This is called the posterior decision likelihood ratio, and its formula is

  \lambda_{ik} = \frac{\hat{r}\,\hat{G}_{ik} + (1 - \hat{r})\,\hat{F}_i}{\hat{F}_i} \qquad (2)

  \lambda_k = \frac{\hat{q}(k)}{1 - \hat{q}(k)} \prod_i \lambda_{ik} \qquad (3)
in which λ_k denotes the decision likelihood ratio for event class k; q̂(k) is the decision maker’s estimate of the base rate of event class k; Ĝ_ik is the estimate of G_ik, the probability of feature i for event class k; F̂_i is an estimate of F_i, the probability of feature i across all experiences; and r̂ is the model’s estimate of the reliability with which the sensor reported the data. Note that all of the values used in this computation are estimates of the values used to generate the data: Ĝ and F̂ are the model’s long-term memory for the probability of feature occurrence in given event classes; q̂ is the estimate for the base rate of event classes; and r̂ is the estimate for the reliability of the sensor.

This likelihood comparison process allows decisions about events to be made in isolation: There is no need to choose between specific events. An event class is judged as satisfactory on its own merits, rather than in comparison with an alternative. An event that produces a satisfactory result might lead to action, even if there are other events that are better fits. This aspect of the decision process corresponds closely to the basic decision mechanisms hypothesized by the RPD model.

If the report was more likely to have come from a specific event class than it was to have arisen by chance, the likelihood ratio (λ_k in Equation 3) is greater than 1, and the model will consider this option. For a given report, this process could generate a number of outcomes. If one prototype has a value of λ greater than 1.0 and the rest are less than 1.0, there is clear evidence for that event class. However, this situation might not always happen. In some situations, no event may have a value of λ greater than 1.0, and so the event with the highest value of λ might be selected as a working hypothesis, and more evidence will be gathered to corroborate that information. In other situations, multiple event prototypes might have λ values greater than 1.0. In these cases, the situation is less clear, but one might either choose the most descriptive option or gather more information that distinguishes between the attractive options. The particular strategies for disambiguation and corroboration probably differ across domains, depending on the costs associated with gathering information, the costs of errors, and the benefits of correct identification.

Example Calculation

Table 2 shows an example calculation of the likelihood ratios (λ) for two messages (A and B), compared with two event prototypes (Events 1 and 2). In this example, assume that the prior probabilities of these events are both equal to .5 and that the BRDM estimates this accurately. This has the effect of multiplying the likelihood product by (.5)/(1 − .5), which is equal to 1.0, and so has no impact. In this example, Message A is generated by a more reliable sensor (r = .9) than the one generating Message B (r = .2). Assume that the model accurately estimates this reliability (r̂ = r). Thus, for Message A, matches are given more positive evidence, and mismatches are given more negative evidence. Message A shares two feature values with Event 1 and four with Event 2; B shares three feature values with Event 1 and three with Event 2.


TABLE 2. Sample Calculation of Likelihood Ratios for One Reliable Message (A = [11000], r = .9) and One Unreliable Message (B = [10101], r = .2)

Event Class 1 (prototype [1 0 0 1 1])

Prototype     p(A|Event Class 1)        λ(A, Event Class 1)    p(B|Event Class 1)        λ(B, Event Class 1)
1             .9*1 + .1*.8 = .98        .98/.8 = 1.225         .2*1 + .8*.8 = .84        .84/.8 = 1.05
0             .9*0 + .1*.3 = .03        .03/.3 = 0.1           .2*1 + .8*(1-.3) = .76    .76/(1-.3) = 1.09
0             .9*1 + .1*(1-.2) = .98    .98/(1-.2) = 1.225     .2*0 + .8*.2 = .16        .16/.2 = 0.8
1             .9*0 + .1*(1-.1) = .09    .09/(1-.1) = 0.1       .2*0 + .8*(1-.1) = .72    .72/(1-.1) = 0.8
1             .9*0 + .1*(1-.6) = .04    .04/(1-.6) = 0.1       .2*1 + .8*.6 = .68        .68/.6 = 1.13
Product (λ)                             0.0015                                           0.827

Event Class 2 (prototype [1 1 1 0 0])

Prototype     p(A|Event Class 2)        λ(A, Event Class 2)    p(B|Event Class 2)        λ(B, Event Class 2)
1             .9*1 + .1*.8 = .98        .98/.8 = 1.225         .2*1 + .8*.8 = .84        .84/.8 = 1.05
1             .9*1 + .1*.3 = .93        .93/.3 = 3.1           .2*0 + .8*(1-.3) = .56    .56/(1-.3) = 0.8
1             .9*0 + .1*(1-.2) = .08    .08/(1-.2) = 0.1       .2*1 + .8*.2 = .36        .36/.2 = 1.8
0             .9*1 + .1*(1-.1) = .99    .99/(1-.1) = 1.1       .2*1 + .8*(1-.1) = .92    .92/(1-.1) = 1.02
0             .9*1 + .1*(1-.6) = .94    .94/(1-.6) = 2.35      .2*0 + .8*.6 = .48        .48/.6 = 0.8
Product (λ)                             0.982                                            1.24

Note. Rows correspond to the five features, and the Prototype column gives each event class’s feature value. The base rate of features in the environment is (.8, .3, .2, .1, .6).

The values computed under p(A|Event Class 1) show the probability that each particular feature of A was generated by Event 1. The next column shows the ratio between that probability and the probability of the feature arising from any other event (reflected by the feature base rate). When the product of all these values is found, one obtains the posterior odds (λ) that a specific event class generated the observed message.

Both messages provide fairly strong evidence against Event 1: Message A gives odds of several hundred to one against (λ = 0.0015), and Message B gives odds that are slightly biased against (0.83). However, the odds that Event 2 generated A are close to even (0.98), and the odds that Event 2 generated Message B are about 5:4 (1.24) in favor. Thus, Message A provides equivocal evidence, but Message B favors Event 2. This example illustrates the basics of how information is used by the model to identify the state of the world. Once an event class is identified, it will activate prototypical action scripts, which will be altered and executed as appropriate; these mechanisms are outside the domain of the decision model described here.

The example also demonstrates a curious and counterintuitive result. Sensor A is more reliable than Sensor B, and it matches Event 2 on more features than Sensor B does, but it ends up providing equivocal evidence about whether Event 2 generated the message. In contrast, the unreliable Sensor B provides greater evidence in favor of Event 2. This happens because errors count against A more than B: given A’s high assumed reliability, its one mismatching feature (Feature 3) is one that a reliable sensor would almost certainly have reported had Event 2 actually occurred. In contrast, Sensor B’s mismatches carry little weight, because the model expects an unreliable sensor’s reports to reflect chance feature occurrence much of the time. A simulated example of this effect is shown in Demonstration 2. But first, Demonstration 1 will show the effects that the environmental base rate can have on the probability of accurately classifying events.
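The arithmetic in Table 2 can be checked with a short sketch that computes the per-feature likelihood ratios of Equation 2 and their product (Equation 3 with even prior odds, as assumed in the example). The function is mine and only an illustration; the prototypes, messages, reliabilities, and base rates are the Table 2 values.

def likelihood_ratio(report, prototype, base_rate, r_hat):
    """Per-feature likelihood ratios (Equation 2) and their product.
    For a reported 1, p = r_hat*G + (1 - r_hat)*F and the ratio is p/F;
    for a reported 0, the complementary probabilities are used, as in Table 2."""
    lam = 1.0
    for s, g, f in zip(report, prototype, base_rate):
        if s == 1:
            p, chance = r_hat * g + (1 - r_hat) * f, f
        else:
            p, chance = r_hat * (1 - g) + (1 - r_hat) * (1 - f), 1 - f
        lam *= p / chance
    return lam

base_rate = [.8, .3, .2, .1, .6]
event1, event2 = [1, 0, 0, 1, 1], [1, 1, 1, 0, 0]
msg_a, msg_b = [1, 1, 0, 0, 0], [1, 0, 1, 0, 1]

print(likelihood_ratio(msg_a, event1, base_rate, r_hat=0.9))  # ~0.0015
print(likelihood_ratio(msg_b, event1, base_rate, r_hat=0.2))  # ~0.83
print(likelihood_ratio(msg_a, event2, base_rate, r_hat=0.9))  # ~0.98
print(likelihood_ratio(msg_b, event2, base_rate, r_hat=0.2))  # ~1.24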


Demonstration 1: Base Rate Manipulations

To further demonstrate the basic operation of the model, I next report results of Monte Carlo simulations of the model’s operation. In these demonstrations, the simulated environment contained five states of interest, each defined by a sparse sampling across 20 binary features. The top panel of Figure 1 shows a typical initial configuration, with each prototype having five or fewer “on” features. These five world states occurred with a power law distribution, such that State 1 had the highest frequency and State 5 had the lowest frequency. In order, the states happened with probability .39, .22, .16, .13, and .11, such that state i has frequency proportional to 1/i^γ (here with γ = .8). This distribution mimics commonly found, naturally occurring power-law frequency distributions, as described by Zipf (1949). The system learned concepts from these environmental prototypes via the sensor generation process described earlier. Because the base rate across features is composed mostly of high-frequency items, and the error generation process produces features from this base rate, concepts tend to grow similar to the base rate because of errors. The bottom panel of Figure 1 shows the learned representations based on the world states after 10,000 sampled events. Notice that “phantom” features appear in lower frequency event classes because of these errors.

Frequent events have representations that are similar to the base rate, and so the evidence evaluation process should have trouble distinguishing these particular events from things that happen all the time. In contrast, rarer events tend to be less similar to the average, and so they should be easier to identify. But the model also incorporates its estimates of the environmental base rate of these events. If the base rate is used optimally, the high-frequency events (which should be harder to identify) will benefit from being frequent, because an optimal model will be willing to classify the event with less direct evidence. In contrast, lower frequency event classes will provide stronger evidence for their existence but will be hurt by being less frequent, as an optimal model will need more evidence to classify the event.

For any given decision context, it is unclear exactly how prior probabilities are used by an expert or novice decision maker. Consequently, I will bracket performance by demonstrating two extremes: one model in which prior odds are not used at all, and a second in which prior odds are used optimally. If one disregards the cases in which base rate information may be used antipathetically (which could be an interpretation of the gambler’s fallacy), the behavior of most experts should fall somewhere between these two bounds. To examine this, I simulated 50 sets of event classes as described previously, then encoded 250 exemplars of each of the five states and computed the likelihood according to the model that each exemplar was generated from each of the five world states.

Figure 2 shows the three measures of decision quality when the prior base rate of event classes is ignored. The left panel shows the mean log-likelihood of the evidence strength. Each line shows the mean value of a message generated from each event type, compared with each of the five event types.


Figure 2. Measures of decision quality, unweighted by base rate information (average over 250 exemplars of each event class for 50 randomly sampled messages per class).

Spikes in the likelihood occur for each “correct” event-message match, and the mean value of these matches is typically around 1.0, the “natural” decision bound for this task. Here, high-frequency events produce the weakest evidence on average, whereas rarer events produce stronger evidence. The average likelihood for correct decisions is a bit misleading, so the center panel shows the proportion of events that surpassed this “natural” even-odds threshold. Thus, if this threshold were used as a decision threshold in this environment, the decision maker would take action in response to real events about 8 times out of 10. However, false alarms are also substantial, with roughly 2 out of every 10 events being falsely classified. The right panel shows how often the highest likelihood state was also the correct one. Here, a considerable proportion of messages would be improperly classified.

The base rate effect in Figure 2 reverses when knowledge of base rates is incorporated into the decision process. Figure 3 shows the same three performance measures as those in Figure 2 when base rate information is taken into account for the same simulated data. Here, rare items produce lower likelihoods and are less likely to be chosen.

The results of this simulation illustrate a number of fundamental behaviors. First, they show that in principle, the BRDM can produce highly accurate event classification. Incorrect options rarely provide positive evidence for a specific event class, but messages with enough noise sometimes do not provide enough evidence for a specific event. Furthermore, the base rate of events in the environment produces opposite effects, depending on how strongly prior evidence is weighted. In principle, this suggests a testable hypothesis for the BRDM: Manipulating how strongly prior evidence is considered should reverse frequency-dependent accuracy effects.
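A self-contained sketch of this kind of simulation is shown below. It is not the author's simulation code: for brevity it gives the classifier the true generating prototypes and base rates rather than representations learned from 10,000 noisy reports, it clamps feature base rates away from 0 and 1 so the likelihood ratios stay defined, and the seed, sparsity, and trial counts are arbitrary choices. It only illustrates how turning the prior-odds term on or off changes classification of frequent versus rare event classes.

import random

random.seed(2009)
N_FEATURES, N_CLASSES, GAMMA = 20, 5, 0.8

# Environment: sparse binary prototypes and Zipf-like class probabilities.
prototypes = [[1 if random.random() < 0.2 else 0 for _ in range(N_FEATURES)]
              for _ in range(N_CLASSES)]
weights = [1.0 / (k + 1) ** GAMMA for k in range(N_CLASSES)]
class_prob = [w / sum(weights) for w in weights]   # ~.39, .22, .16, .13, .11

# Background frequency of each feature, clamped away from 0/1 so that the
# likelihood ratios below remain defined (a learned estimate would be noisy
# but nonzero in practice).
feature_base = [sum(class_prob[k] * prototypes[k][i] for k in range(N_CLASSES))
                for i in range(N_FEATURES)]
feature_base = [min(max(f, 0.05), 0.95) for f in feature_base]

def bernoulli(p):
    return 1 if random.random() < p else 0

def report(true_class, r):
    """Equation 1: report the truth with probability r, else sample the base rate."""
    return [bernoulli(prototypes[true_class][i]) if bernoulli(r)
            else bernoulli(feature_base[i]) for i in range(N_FEATURES)]

def decision_lambda(msg, k, r_hat, use_prior):
    """Equations 2 and 3: posterior decision likelihood ratio for class k."""
    ratio = class_prob[k] / (1 - class_prob[k]) if use_prior else 1.0
    for s, g, f in zip(msg, prototypes[k], feature_base):
        p = r_hat * g + (1 - r_hat) * f if s else r_hat * (1 - g) + (1 - r_hat) * (1 - f)
        ratio *= p / (f if s else 1 - f)
    return ratio

# Classify simulated events with and without the prior odds term.
for use_prior in (False, True):
    trials, accuracy = 250, []
    for k in range(N_CLASSES):
        hits = 0
        for _ in range(trials):
            msg = report(k, r=0.9)
            scores = [decision_lambda(msg, j, r_hat=0.9, use_prior=use_prior)
                      for j in range(N_CLASSES)]
            hits += scores.index(max(scores)) == k
        accuracy.append(round(hits / trials, 2))
    print("prior odds used:", use_prior, "accuracy by class:", accuracy)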


Figure 3. Measures of decision quality, weighted by base rate information (average over 250 exemplars of each event class for 50 randomly sampled messages per class).

Demonstration 2: Information Trust

The interviews with domain experts indicated that their confidence and trust in the source of a report was a strong factor in determining how the evidence presented should be considered and what actions should be taken. The BRDM offers an account of how the reliability of an information source might be incorporated into the decision process, via a mental model of the reporting process. In the BRDM, a state is classified by computing a posterior likelihood value of the observed data given a specified mental model of the generation process. As discussed earlier, a sensor reports accurate information with probability r, and otherwise information is reported in proportion to each feature’s natural occurrence. To assess the likelihood of evidence given the hypothesized model, the decision maker incorporates an estimate of this reliability (r̂) into his or her mental model of data generation. This assessment may be inaccurate, falling lower or higher than the sensor’s true reliability.

For a report from an unknown sensor, or one with known poor reliability, r̂ might be relatively low (e.g., r̂ = .2). This corresponds to the boy who cried “wolf.” If the sensor’s reliability truly is low (r = .2), the model will optimally account for evidence produced by the sensor. If the sensor’s reliability is higher, underestimating the reliability will produce distrust of the data, and the report will be less likely to lead to an action. Similarly, for a known reporter with good reliability (e.g., a verbal report from a trusted member of the team), estimated reliability might be relatively high (e.g., r̂ = .8). Under typical conditions, this reporter’s “true” reliability might also be similarly high (r = .8), and so the evidence will be accounted for optimally by the decision maker. However, if the sensor is temporarily reporting inaccurately, or the human reporter is tired, confused, or misled, the reliability
of a report might in fact be lower. Here, the reliability would be overestimated, and the evidence would be given more strength than is warranted. A simulated example of these effects is shown in Figures 4 and 5.

One interesting result of this model is a somewhat counterintuitive prediction. One might think that overestimating the reliability of a report will make action more likely, yet this will happen only if the sensor is reporting environmental cues accurately. A higher value of r̂ will give stronger weight to all the cues, including those that mismatch. Consequently, for a relatively erroneous report, using a lower estimate for r̂ can be more likely to lead to action.

The way in which the BRDM handles reliability can be illustrated by considering the following example. Suppose you want to know whether to buy what looks like a rare antique clock offered for sale at a local junk shop. To learn its true value, you can either have it assessed by an appraiser or try to find information on the Internet. You might find the same information from both sources (e.g., that it is a rare clock and is worth 10 times the offered price), but the assessment of a trusted expert would lead you to purchase the clock, whereas the advice you get online may make you hesitant. This describes the basic reliability effect. Now, suppose the appraiser’s assessment says, “This is a beautiful antique clock. Its inner workings are intact and preserved. It is made of a rare South American hardwood. But it is an example of a period reproduction, and it has little value.”

Figure 4. Mean log-likelihood of correct identification for sensors with high- and low-reliability r and for which the reliability is overestimated and underestimated. Unreliable sensors tend to produce reports with lower log likelihood values, whereas underweighting or overweighting the reliability of a sensor impacts the mean likelihood of the reports as well.


Figure 5. Probability of correct identification for sensors with high- and low-reliability r and for which the reliability is overestimated and underestimated. Unreliable sensors tend to produce reports with lower accuracy than do reliable sensors, but overweighting the reliability of those sensors has counterintuitive effects: Highly reliable sensors, when overweighted, produced lower accuracy in some conditions.

When you read this, you begin to wonder whether the appraiser is trying to convince you that the relic is worthless in order to acquire it behind your back. Now, if you place high trust in the appraiser, you might not purchase the clock. However, if you mistrust the appraiser, this negative evidence might be ignored and you might purchase the clock anyway. This is the essence of how misestimating the reliability of a source can have counterintuitive effects.
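The counterintuitive reliability effect can also be illustrated numerically. The sketch below repeats the illustrative likelihood-ratio helper from the Example Calculation sketch and applies it to Message A and Event 2 from Table 2 under several assumed reliabilities (the particular r̂ values are my own choices): when the report is trusted highly, its single mismatching feature is weighted heavily and the evidence stays below the even-odds threshold, whereas a lower assumed reliability lets the same report favor Event 2.

def likelihood_ratio(report, prototype, base_rate, r_hat):
    """Equations 2 and 3 with even prior odds (same illustrative helper as before)."""
    lam = 1.0
    for s, g, f in zip(report, prototype, base_rate):
        p = r_hat * g + (1 - r_hat) * f if s else r_hat * (1 - g) + (1 - r_hat) * (1 - f)
        lam *= p / (f if s else 1 - f)
    return lam

base_rate = [.8, .3, .2, .1, .6]
event2 = [1, 1, 1, 0, 0]
msg_a = [1, 1, 0, 0, 0]        # matches Event 2 on four of five features

for r_hat in (0.9, 0.7, 0.5, 0.2):
    print(r_hat, round(likelihood_ratio(msg_a, event2, base_rate, r_hat), 2))
# Assuming high reliability (0.9) weights the one mismatch heavily, so the
# evidence stays just below even odds; lower assumed reliabilities let the
# same report rise above the even-odds threshold.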

Summary and Future Extensions

Although the model described here was designed to account for some of the decision-making processes of an expert monitoring sensor reports about chemical and biological weapons activity, it has wider applications in other contexts. Most obviously, it has natural implications for understanding the human factors involved in many sensor monitoring and information fusion tasks. Furthermore, it provides a new implementation for the recognition component of the RPD model and, in particular, brings into the picture notions of (a) the decision maker’s understanding of the base rate of events; (b) the decision maker’s mental model of a sensor (and its reliability); (c) the evaluation of an option on its own merits, rather than requiring comparison between alternatives; and (d) the use of knowledge prototypes to make decisions rather than the use of just exemplar-based or case-based reasoning (cf. Gonzalez et al., 2003; Hintzman, 1988).


Although the BRDM combines evidence optimally, the demonstrations I reported showed how errors can stem from a number of sources. For example, the report used by the decision maker is a noisy representation of the environment; this constitutes a major source of error in the model. Next, the BRDM uses learned representations for the events it knows about and learned base rate distributions across both features and events, and these representations can be distorted for a number of reasons. Also, the BRDM uses a mental model of the evidence-reporting process, which could theoretically be misspecified in many ways; in the present example, it was misspecified by having an inaccurate assessment of the reliability of the sensor. Together, these offer a number of sources of error in the model and, perhaps, for decision makers as well.

The prior modeling work on REM-II (e.g., Mueller & Shiffrin, 2006) suggests some new directions for the BRDM. That model was specifically developed to account for how information from an event is incorporated into knowledge structures to be used on later events. Furthermore, it was successfully deployed to learn meaningful contextual representations based on processing naturalistic text corpora (Mueller & Shiffrin, 2007). Thus, it is in principle possible to allow a model to learn the semantic context of a decision maker, in order to better predict his or her choices and to provide a better simulated teammate. In addition, the Bayesian framework enables alternative representations to be used and so can extend beyond simple feature-based representations of knowledge.

The BRDM provides a formulation for the recognitional pattern-matching aspect of the RPD model. However, the original descriptions of the RPD model (e.g., Klein, 1993) go far beyond what the current generation of computational implementations capture. In particular, the RPD model suggests that mental simulation is important for identifying whether a possible course of action is tenable. An important aspect of mental simulation is the need to have a mental model of the situation, so that potential results of actions can be assessed and either accepted or rejected. Small strides toward that goal have been made in the BRDM by incorporating explicit mental models of how information is reported. I anticipate that developing these notions further, and allowing mental models to be a first-class aspect of knowledge, will increase understanding of how experts engage in mental simulation and how mental simulation supports decision making.

Acknowledgments

Part of this research was presented at the Workshop on Developing and Understanding Computational Models of Macrocognition, Havre de Grace, Maryland, June 3–4, 2008, organized by Laurel Allender and Walter Warwick. This work was partly funded by the Defense Threat Reduction Agency. The author thanks Anna Grome and Beth Crandall for conducting interviews and developing qualitative models of chemical officers, upon which some of the present model is based, and Gary Klein and Rich Shiffrin for developing many of the ideas on which this model was based.


References

Anderson, J. R. (1990). The adaptive character of thought. Hillsdale, NJ: Erlbaum.
Busemeyer, J. R., & Townsend, J. T. (1993). Decision field theory. Psychological Review, 100, 432–459.
Dennis, S., & Humphreys, M. S. (2001). A context noise model of episodic word recognition. Psychological Review, 108, 452–478.
Dougherty, M. R. (2001). Integration of the ecological and error models of overconfidence using a multiple-trace memory model. Journal of Experimental Psychology: General, 130, 579–599.
Dougherty, M. R., Gettys, C. F., & Ogden, E. E. (1999). MINERVA-DM: A memory process model for judgments of likelihood. Psychological Review, 106, 180–209.
Fan, X., Sun, B., Sun, S., McNeese, M., & Yen, J. (2006). RPD-enabled agents teaming with humans for multi-context decision making. In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multi Agent Systems (pp. 34–41). New York: Association for Computing Machinery.
Fan, X., Sun, S., McNeese, M., & Yen, J. (2005). Extending the recognition-primed decision model to support human-agent collaboration. In Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multi Agent Systems (pp. 945–952). New York: Association for Computing Machinery.
Fan, X., Sun, S., Sun, B., Airy, G., McNeese, M., Yen, J., et al. (2005). Collaborative RPD-enabled agents assisting the three-block challenge in C2CUT. In Proceedings of the 2005 Conference on Behavior Representation in Modeling and Simulation (pp. 113–123). Orlando, FL: Simulation Interoperability Standards Organization.
Gonzalez, C., Lerch, F. J., & Lebiere, C. (2003). Instance-based learning in real-time dynamic decision making. Cognitive Science, 27, 591–635.
Hintzman, D. L. (1984). MINERVA 2: A simulation model of human memory. Behavior Research Methods, Instruments, and Computers, 16, 96–101.
Hintzman, D. L. (1988). Judgments of frequency and recognition memory in a multiple-trace memory model. Psychological Review, 95, 528–551.
Huber, D. E., Shiffrin, R. M., Lyle, K., & Ruys, K. (2001). Perception and preference in short-term word priming. Psychological Review, 108, 149–182.
Ji, Y., Massanari, R. M., Ager, J., Yen, J., Miller, R. E., & Ying, H. (2007). A fuzzy logic-based computational recognition-primed decision model. Information Sciences, 177, 4338–4353.
Johnson, J. G., & Raab, M. (2003). Take the first: Option-generation and resulting choices. Organizational Behavior and Human Decision Processes, 91, 215–229.
Klein, G. A. (1989). Recognition-primed decisions. In W. B. Rouse (Ed.), Advances in man-machine system research (Vol. 5, pp. 47–92). Greenwich, CT: JAI Press.
Klein, G. A. (1993). A recognition-primed decision (RPD) model of rapid decision making. In G. A. Klein, J. Orasanu, R. Calderwood, & C. E. Zsambok (Eds.), Decision making in action: Models and methods. Norwood, NJ: Ablex.
Klein, G. A. (1998). Sources of power: How people make decisions. Cambridge, MA: MIT Press.
Lipshitz, R., Klein, G., Orasanu, J., & Salas, E. (2001). Taking stock of naturalistic decision making. Journal of Behavioral Decision Making, 14, 331–352.
Meyer, D. E., Yantis, S., Osman, A. M., & Smith, J. E. K. (1985). Temporal properties of rapid information processing: Tests of discrete versus continuous models. Cognitive Psychology, 17, 445–518.
Mueller, S. T., & Shiffrin, R. M. (2006). REM II: A model of the developmental co-evolution of episodic memory and semantic knowledge. In Proceedings of the International Conference on Development and Learning. Bloomington: Indiana University Department of Psychological and Brain Sciences.


Mueller, S. T., & Shiffrin, R. M. (2007). Incorporating connotation of meaning into models of semantic representation: An application to text corpus analysis. In D. S. McNamara & J. G. Trafton (Eds.), Proceedings of the 29th Annual Cognitive Science Society (pp. 64–70). Austin, TX: Cognitive Science Society.
Mueller, S. T., & Weidemann, C. T. (2008). Decision noise: An explanation for observed violations of signal detection theory. Psychonomic Bulletin and Review, 15, 465–494.
Norling, E. (2004). Folk psychology for human modelling: Extending the BDI paradigm. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems: Vol. 1 (pp. 202–209). Washington, DC: IEEE Computer Society.
Norling, E., Sonenberg, L., & Rönnquist, R. (2000). Enhancing multi-agent based simulation with human-like decision making strategies. In Proceedings of the Second International Workshop on Multi-Agent-Based Simulation (pp. 214–228). New York: Springer.
Norris, D. (2006). The Bayesian reader: Explaining word recognition as an optimal Bayesian decision process. Psychological Review, 113, 327–357.
Raab, M., & Johnson, J. G. (2007). Expertise-based differences in search and option generation strategies. Journal of Experimental Psychology: Applied, 13, 158–170.
Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85, 59–108.
Raza, M., & Sastry, V. V. S. (2007). Command agents with human-like decision making strategies. In Proceedings of the International Conference on Tools with Artificial Intelligence (pp. 71–74). Los Alamitos, CA: IEEE Computer Society.
Raza, M., & Sastry, V. V. S. (2008). Variability in behavior of command agents with human-like decision making strategies. In Proceedings of the Tenth International Conference on Computer Modeling and Simulation (pp. 562–567). Los Alamitos, CA: IEEE Computer Society.
Regal, R., Reed, R., & Largent, M. (2007, June). A quantitative model-driven comparison of command approaches in an adversarial process model. Paper presented at the 12th International Command and Control Research and Technology Symposium, Newport, RI.
Seale, D. A., & Rapoport, A. (1997). Sequential decision making with relative ranks: An experimental investigation of the “secretary problem.” Organizational Behavior and Human Decision Processes, 69, 221–236.
Shapiro, M. S., & Kacelnik, A. (2007). Simultaneous and sequential choice as a function of reward delay and magnitude: Normative, descriptive and process-based models tested in the European starling (Sturnus vulgaris). Journal of Experimental Psychology: Animal Behavior Processes, 34, 75–93.
Shiffrin, R. M., Ratcliff, R., & Clark, S. E. (1990). List-strength effects: II. Theoretical mechanisms. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 179–195.
Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM—Retrieving effectively from memory. Psychonomic Bulletin and Review, 4, 145–166.
Sokolowski, J. A. (2002). Can a composite agent be used to implement a recognition-primed decision model? In Proceedings of the Eleventh Conference on Computer-Generated Forces and Behavioral Representation (pp. 473–478).
Sokolowski, J. A. (2003). Enhanced decision modeling using multiagent system simulation. Simulation, 79, 232–242.
Sokolowski, J. A. (2007). Representing knowledge and experience in RPDAgent. In Proceedings of the 12th International Command and Control Technology Symposium (ICCRTS) (pp. 419–422). Washington, DC: Command and Control Research Program.
Thomas, R. P., Dougherty, M. R., Sprenger, A. M., & Harbison, J. I. (2008). Diagnostic hypothesis generation and human judgment. Psychological Review, 115, 155–185.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1130.


Usher, M., & McClelland, J. L. (2001). The time course of perceptual choice: The leaky, competing accumulator model. Psychological Review, 108, 550–592.
Warwick, W., McIlwaine, S., Hutton, R. J. B., & McDermott, P. (2001). Developing computational models of recognition-primed decision making. In Proceedings of the Tenth Conference on Computer Generated Forces (pp. 323–331). Orlando, FL: Simulation Interoperability Standards Organization.
Zipf, G. K. (1949). Human behavior and the principle of least effort. Cambridge, MA: Addison-Wesley.

Shane T. Mueller, PhD, is a senior research scientist at the Klein Associates Division of Applied Research Associates, Inc. His research interests include mathematical, computational, and statistical models of human behavior, including knowledge formation, episodic memory, and decision making.
