Precision, Probability, Paradox, and Prudence

June 23, 2017 | Autor: Jeremy Shipley | Categoría: Probability Theory, Formal Epistemology, Philsophy of Probability
Share Embed


Descripción

Precision, Probability, and Paradox Jeremy Shipley [email protected] October 30, 2015

Abstract I argue that parametrization paradoxes (e.g., Bertrand’s Paradox) are problematic for proponents of precise doxastic probabilities (i.e., credences) uniquely fixed by a principle of indifference. In particular, Roger White’s argument that Bertrand’s Paradox extends to a general paradox about evidential symmetry fails. White’s arguments against decision making with imprecise probabilities, based on a puzzle derived from the phenomena of dilation, are not persuasive. Neither, I argue more briefly, is Adam Elga’s argument. Indeed, modelling decisions with imprecise probabilities permits an appealing account of the virtue of prudence. Having responded to White and Elga, I compare my point of view with that of James Joyce. Keywords: subjective probability, imprecise probability, Bertrand Paradox, prudence, decision theory

1

1

Introduction

Informally the Principle of Indifference [PoI]1 says: If you have absolutely no idea how things are going to “shake out,” just count up the ways things could go and divide the probability equally among them. In the case of a continuous random variable this may mean selecting a uniform prior probability density function as a “reference prior.” The PoI is subject to paradoxes of parameterization. That is, the idea to “count up the ways things could go” depends on a way of dividing up possible outcomes: i.e., dependent on a parameterization of the problem. It so happens that for different parameterizations we may arrive at different probabilities by application of the PoI, both in the discrete and continuous case. Our present concern is whether the paradoxes of parameterization are uniquely problematic for the PoI and for specific probabilist approaches to epistemology based on the PoI, or whether the paradoxes indicate a deep issue with the notion of epistemic symmetry that is central to any epistemological approach that assumes propositions may be compared and ranked with respect to epistemic support. The notion that belief comes in degrees that ought to measured by probability functions that are to be arrived at by enumerating possible outcomes has been traced to the emergence of the theory of mathematical probability from its application to games of chance. Dice (and suchlike) are characterized by chances measured by probability functions, and credences are thought to be analogous. In dice games the possible outcomes are, for all practical purposes, well defined. It may be assumed that physical facts feature in explaining the observation of relatively stable, long run frequencies. For example if the dice is evenly weighted the long run frequency for each side rolling will be equal. Despite the reasonableness of this assumption, there is a debate whether long run frequencies require grounding in an in1

aka, Principle of Insufficient Reason

1

deterministic property of physical propensity, whether frequencies may be explained by a notion of physical probability consistent with determinism, or whether the frequencies themselves just are ungrounded physical probabilities.2 Just as Hume held that the observation of constant conjunction will not provide evidence for (or even an idea of) an underlying causal connection, some hold that observed frequencies ought not lead us, if we are to be principled empiricists, to suppose a metaphysical ground for frequency. That dispute need not detain us, though a perspective according to which deterministic chances are grounded in physical symmetries is in effect assumed later in this paper. Regardless of the metaphysics of chance, on all accounts the features of the world characteristic of games of chance are measured by probability functions taking precise values in the real unit interval. The normative theory that belief should be apportioned in degrees conforming to the mathematical structure of a probability function arises by application of a mathematical form familiar from chance games to a philosophical remodeling of doxastic structure resulting from a variety of related social and scientific developments in the 16th and 17th centuries.3 An analogy between evidential and physical symmetry may, if apt, justify applying the PoI as a rule for fixing credences under conditions of uncertainty. If justified, the PoI, combined with Bayesian updating, would yield strict evidentialist constraints on what our credences ought to be. However, many epistemologists and probability theorists have found the PoI to be subject to paradoxes of parameterization, which I will detail shortly. Consequently, some have simply dispensed with the PoI altogether, leaving the selection of prior credence constrained only by probabilistic coherence, allowing for considerable variation in personal credences. To fix vocabulary, “evidentialism” may be contrasted with “personalism” in the theory of ra2

See, for example, (Lyon, 2010) for a discussion of deterministic vs indeterministic chance. 3 See Ian Hacking’s book The Emergence of Probability (Hacking, 1984).

2

tional credences. The strict evidentialist, as I propose to use the term, holds that a unique credence is rationally determined by each body of evidence. The personalist permits variation. The strict evidentialist holds that there is only one rational credence to have given a body of evidence. The strict personalist holds that any probabilistically coherent credence is rationally permissible. Since constraints on choice of priors may be imposed which exceed coherence but fall short of uniquely determining a single probability function, strict evidentialism and personalism allow for intermediate views. Classically, both credal evidentialists and credal personalists accept that credences are measured by probability functions. Others have endorsed models of uncertain inference that do not demand fixation of precise, real unit interval values. I am especially interested in an approach that has come to be called “imprecise probability” employing classes of probability functions to model rational inference. One way to avoid the paradoxes of parameterization is to use imprecise probabilities to measure credences. Imprecise probabilities are represented at by classes of probability functions (called representor classes). On this approach, one may include probability functions derived from distinct parameterizations into the representor class. Proponents of imprecise probability deny what Isaac Levi calls “credal uniqueness” (Levi, 1980). Those cited by Levi who have explored the rejection of credal uniqueness include (Keynes, 1921), (Koopman, 1940), (Good, 1960), (Smith, 1961), (Schick, 1958), (Kyburg, 1974, 1961), and (Dempster, 1967). Shafer’s approach to imprecise probabilities (Shafer, 1976), which builds on Dempster’s, is detailed in (Kyburg and Teng, 2001). Peter Walley’s Imprecise Dirichlet Model (Walley, 1996) provides a theory of induction with imprecise probabilities based on classes of probability functions derived from Dirichlet distributions. One hope motivating the rejection of credal uniqueness is to provide a theory that restores the normative dependence of doxastic states on evidence without running into the paradoxes of parameterization besetting 3

classical credal evidentialism. Furthermore, proponents of imprecise credences have provided motivation independent of the paradoxes. James Joyce maintains that credence should reflect evidence. He finds this rationality constraint to be satisfied by imprecise credence whenever there is a complete lack of evidence; imprecise credences are, according to Joyce, a desideratum of epistemic normatively (Joyce, 2005). Scott Sturgeon takes imprecise credences to be a datum, arguing that attributions of “thick confidence” are indispensable in our explanations of human behavior, and argues that an explanatorily, phenomenologically, and metaphysically adequate theory of belief will include thick confidences (Sturgeon, 2008, 2010). The possibility of denying credal uniqueness renders the classical dichotomoy between precise credal evidentialism and precise credal personalism a false dichotomy. Nevertheless, since the choice of representor class allows for varying constraints, just as there are precise credal evidentialists and precise credal personalists there may also be imprecise credal evidentialists and imprecise credal personalists. For example, restricting to a representor class derived from Dirichlet distributions is already a constraint, but the IDM requires a further choice of variance that is not clearly determined by one’s evidence.4 Roger White has recently argued that the PoI should not be abandoned (White, 2010). Among his several arguments, my present concern is one purporting to show that the apparent paradox vexing the PoI is in fact a general paradox for the notion of evidential symmetry. I think that White’s effort to show that the paradox originates from judgments of evidential symmetry fails and that our intuitions about evidential symmetry should be held separate from our intuitions about physical symmetry. Distinct experimental events that are known to be physically possible may be considered symmetric with respect to the physical laws governing an experimen4 The question of evidentialism vs. personalism in the context of imprecise credences, however, raises a number of subtle issues that are outside the scope of our present concerns.

4

tal apparatus when the laws and design of the apparatus no more favor one outcome than the other. For instance, heads and tails are physically symmetric possible outcomes of a fair coin toss, but landing on edge is strongly disfavored by the physical laws, because it is an unstable state, so that is a non-symmetric outcome. Notably, insofar as the PoI is motivated by intuitions related to our assigning heads and tails equal probability, landing on edge is regarded by all as having negligible probability. We would not assign all three possibilies equal probability.5 Quite plausibly, physically symmetric events (or positively measured intervals of events in the continuous case) should be assigned equal positive probability, but it is, I believe, a ludic fallacy to take evidential symmetry to imply the same. The term “ludic fallacy” for illicitly transferring features of games to other uncertain epistemic scenarios was coined by Nassim Taleb (Taleb, 2011). It is also a theme of (Hacking, 1984) that many of the philosophical quandaries surrounding the concept of probability arise from mixing intuitions due to what he calls the “duality” of that concept. I would put the point this way: There are many distinct sorts of phenomena that we wish to measure using the mathematical structure that was finally axiomatized by Kolmogorov; there are many philosophical subtleties in articulating the proper relation between distinct probability measures in light of various relations between the distinct phenomena; so we must be cautious about naive assumptions guided by uncritically deploying easy analogies between the distinct phenomena, and we should remain open to the possibility that structures other than Kolmogorov probability may be better suited to a given domain. In particular, I think we should be open to the possibility that we are lead astray by insisting on measuring credence by a single, uniquely determined probability function. I argue, thus, in the first part of the paper for treating evidential and physical symmetry differentl: 5

And these three possibilities do not exhaust the a priori epistemic possibilities, which may include events that are in fact physically impossible, such as that the coin hovers in the air spinning indefinitely.

5

viz., by using imprecise probabilities for the former and precise probabilities for the latter. However, some philosophers have argued that imprecise probabilities are unsuitable for decision making. In particular, Roger White and Adam Elga have pointed to some peculiar features of decisions with imprecise probabilites that must be addressed (White, 2010; Elga, 2010). James Joyce has proposed that the objections raised by White and Elga can be answered (Joyce, 2010). I will address White’s objection at length in the second part of the paper. A complete defense of my view is beyond the scope of this paper; so much of what Elga and Joyce have said will not be addressed. I will sketch how decisions with imprecise probabilities can in fact model some potentially desirable features of decision and action. In particular, my response to White will show that in fact decision making with imprecise probabilities points toward a nice account of prudence.

Part I

Symmetry and Paradox 2

Symmetry and Indifference

We should now be a bit more formal than we were in the introduction. Here is White’s presentation of the principle of indifference. First he defines evidential symmetry: Propositions p and q are said to be evidentially symmetric (abbreviated: p ≈S q) for a subject S if S’s evidence no more supports one than the other. Evidential symmetry is invoked in giving two versions of the PoI. The notation CS (p) stands for S’s credence (degree of belief) in p: i.e., for a measure 6

of S’s degree of belief: PoI.1: If p ≈S q then rationality requires that S’s credences for each proposition be equal: i.e., CS (p) = CS (q). PoI.2: If {pi }i=1...n is a partition of S’s knowledge of a chance outcome such that ∀i, j ≤ n, pi ≈S pj then rationality requires that ∀i ≤ n, CS (pi ) = 1/n. A partition of one’s knowledge is a known to be exhaustive partition of possible outcomes. When I speak of the PoI, I mean PoI.2.6 As James Joyce has emphasized, the value taken by a credence function cannot be taken to fully characterize a doxastic state with respect to a proposition or propositions (Joyce, 2005, 2010). In the simplest case, two agents may each assign equal probability to the outcomes heads and tails for a given coin, while one agent is much easier to convince that it is biased. That is, one agent may be unswayed by a relatively short run of consecutive heads while the other very quickly suspects bias. Initially, each agent may have the same precise credence of 1/2 but they will have different credal commitments. Joyce discusses many more subtle examples of differences in total credal state that are not captured by narrow focus on credal value. This leads to some subtle complications concerning the use of equality formulas like CS (p) = CS (q). We shall take these “equalities” not to imply that the doxastic states, credal commitments, etc. and all, are exactly the same but rather that they presently stand in an equivalence relation characterized by their current value. White says that PoI.2 is an “obvious corollary” of PoI.1. This isn’t exactly clear. PoI.1, which recalls Carnap’s way of stating the principle in “On Inductive Logic” (Carnap, 1945), does not stipulate what sort of values 6

In addition to (Hacking, 1984), Colin Howson and Peter Urbach provide a helpful discussion of the PoI (Howson and Urbach, 2005), providing a nice, mathematically detailed overview of the history of the PoI in work of Bernoulli and Bayes that was directly concerned with Hume’s problem of induction.

7

CS (p) and CS (q) must have, just that whatever sort of value they take they should be equal when p and q are evidentially symmetric. There has been, of course, a longstanding assumption that credences are measured by probabilities, but proponents of credences measured by imprecise probability, what White calls “mushy” credence, could accept PoI.1 as stated. Imprecise probabilities are captured by sets of probability functions, typically called representor classes, which can have very subtle and flexible properties. For present purposes, it suffices to characterize credences modeled by representor classes by one of their less subtle properties: viz., that they give interval rather than precise numerical values to propositions. That is, credences modeled by imprecise probabilities yield formula of the form CS (p) = [r, s] with [r, s] ⊆ [0, 1].7 Accordingly, if S is completely ignorant about both p and q, then a proponent of imprecise credences will have CS (p) = CS (q) = [0, 1]; i.e., they will be equal in that they are both indifferently distributed across the unit interval. Note that both PoI.1 and PoI.2 invoke the notion of evidential symmetry, so a paradox invoking judgments of evidential symmetry without invoking credal uniqueness would undermine both principles. Indeed, a paradox for evidential symmetry would have much furhter reaching implications than a paradox that narrowly targets PoI.2. Many approaches to rational inference get along just fine without PoI.2, but the notion of evidential symmetry is indispensable for the project of comparing and ordering propositions according to evidential support. A narrow paradox for PoI.2 can be accommodated as a reductio ad absurdum because we can get on without PoI.2. It is not clear that we can get on without the notion of evidential symmetry. So a paradox for that notion would be much more serious. Indeed, evidential symmetry is so fundamental an epistemic concept that one may be warranted to carry on applying it despite the apparent paradox, confident that a direct resolution must be possible even if it is beyond one’s present 7

Open boundaries are permissible too.

8

ken. In the face of a paradox derived from judgments of evidential symmetry it is not the notion of such symmetry that we would abandon but rather the particular judgments of evidential symmetry that lead to particular instances of the paradoxical pattern. So White’s defense of PoI.2 takes the following form: show that the paradoxes of parameterization derive from particular judgments of evidential symmetry, rather from the assumption of credal uniqueness, and accept that evidential symmetry must not be transparent and that judgments of evidential symmetry must be defeasible. My response will be to dissolve the apparent absurdities deriving from evidential symmetries, leaving credal uniqueness holding the bag for the parameterization paradoxes.

3

Parameterization and Paradox

In PoI.2 reference was made to a partition of possible outcomes. The parameterization paradoxes are, so to speak, the inscrutability of that referent ; i.e., the paradox derives from the problem of choosing among multiple competing partitions. There are numerous versions that capture the mathematical idea. It is common to use the term “Bertrand’s paradox” to indicate a family of paradoxes relating to parameterization. I depart from this usage and reserve “Bertrand’s paradox” for the one member of that family articulated by Bertrand (Bertrand, 1889). My reason for this departure is that I am in fact willing to provisionally concede that Bertrand’s original paradox has a solution, but I remain convinced that the generalized paradoxes of parameterization do not. I will return to Bertrand’s paradox in a later section of the paper. In this section I will present one parameterization paradox. White’s “mystery square” seems to me adequate to illustrate the general pattern. The problem for PoI.2 comes in specifying among competing partitions. Suppose a subject S is informed that a square with side length l ∈ [0, 2] is hidden from S’s view. It might seem natural, and would be required 9

by PoI.2, that CS (l ∈ [0, 1]) = CS (l ∈ [1, 2]) = 1/2. That seems natural, but a moment’s reflection reveals another natural seeming approach. The hidden square has an area a ∈ [0, 4] so it seems PoI.2 further requires CS (a ∈ [0, 1]) = CS (a ∈ [1, 2]) = CS (a ∈ [2, 3]) = CS (a ∈ [3, 4]) = 1/4. There are two parameters, length and area, that could be used to characterize the salient class of figures. The difficulty is that the different parameters correspond to different applications of PoI.2 yielding inconsistent credences: l ∈ [0, 1] iff a ∈ [0, 1] but CS (l ∈ [0, 1] 6= CS (a ∈ [0, 1]. Applying PoI.2 with both parameterizations yields inconsistent results. The choice of parameterization is, however, rationally arbitrary; hence, PoI.2 cannot be used to uniquely determine rational credences. The problem is actually quite general. The only thing critical about the selection of the domains [0, 2] and [0, 4] and the function f(x) = x2 is to tell a story in which competing parameters seem equally plausible. However, any invertible function will do just fine to illustrate the point, given the formal constraints of the situation. In my view, it’s not so much a “paradox” as a demonstration that PoI.2 yields a unique dictate for one’s credence only relative to a parameterization. The further point made by providing a story in which competing parameters seem equally plausible is that even an epistemic conservative, who takes how things seem to provide basic epistemic reasons, regarding intuitions about which parameters are plausible faces relativism.

4

An Evidential Symmetry Paradox?

White argues that the apparent arbitrariness of parameter choice generates absurd conclusions from judgments of evidential symmetry without invoking credal uniqueness and PoI.2. Since evidential symmetry is a fundamental notion, if White is correct we should assume that there must be a solution to the paradoxes of parameterization according to which out judgments of 10

evidential symmetry are defeasible, or else we must abandon the core epistemic project of comparing propositions with respect to strength of evidential support. White proceeds from the very plausible claim that evidential symmetry has the following properties: •

TRANSITIVITY :

If p ≈S q and q ≈S r then p ≈S r.



EQUIVALENCE :

If p and q are logically equivalent then p ≈S q.

Now he argues thusly, continuing the mystery square example: (i) l ∈ [0, 1] ≈S l ∈ [1, 2]. (ii) a ∈ [0, 1] ≈S a ∈ [1, 2] ≈S a ∈ [2, 3] ≈S a ∈ [3, 4]. (iii) l ∈ [0, 1] ≈S a ∈ [0, 1], by EQUIVALENCE. (iv) l ∈ [1, 2] ≈S a ∈ [1, 2] ∨ a ∈ [2, 3] ∨ a ∈ [3, 4], by EQUIVALENCE. (v) a ∈ [0, 1] ≈S a ∈ [1, 2] ∨ a ∈ [2, 3] ∨ a ∈ [3, 4], by TRANSITIVITY: (ii), (i), (iv). (vi) a ∈ [1, 2] ≈S a ∈ [1, 2] ∨ a ∈ [2, 3] ∨ a ∈ [3, 4], by TRANSITIVITY: (ii), (v). Allegedly, (vi) is absurd. Accordingly, one needn’t go as far as applying PoI.2 to assign precise credences to get a seemingly paradoxical result. The existence of multiple plausible partitions in cases like mystery square implies that for some (exclusive) disjunctions an agent S may have no more reason to believe the entire disjunction than he or she has for believing each disjunct individually. That seems paradoxical, since there are more ways, all of which are epistemically possible for S, the disjunction could come out true than there are that any one disjunct will pan out. Driving the point home, White invokes the following principle: 11



SYMMETRY PRESERVATION:

If p ≈S q and r is inconsistent with both p

and q then p ∨ r ≈S q ∨ r. This gives (vii) a ∈ [0, 1] ∨ a ∈ [1, 2] ≈S a ∈ [0, 1] ∨ a ∈ [1, 2] ∨ a ∈ [2, 3] ∨ a ∈ [3, 4], by SYMMETRY PRESERVATION: (vi). That is quite clearly unacceptable, since we may be certain of the right-hand side, which just disjoins a partition of all known possibilities. This argument is taken to be a reductio ad absurdum of the conjunction of (i) and (ii). Given that (i) and (ii) are equally plausible and both seem true, the conclusion would be that one has reasons of which one is unaware that undermine at least one of the evidential symmetry claims. White havers a bit whether reasons can be reasons for him if he’s not aware of them, but seems to suggest that that one may have evidence or reasons of which one is unaware. He refers to Williamson’s well known arguments that one’s evidence is not always transparent to one, to motivate the point (Williamson, 2000). White concludes that the problem with the PoI is not the assignment of precise values required by PoI.2 but rather the non-transparency of evidential symmetry. Hence, he concludes, the PoI may operate as a regulative ideal of rationality even if the judgments of evidential symmetry required for its application must be accepted as defeasible.

5

0+0=0

I am unmoved by the proposed evidential symmetry paradox. A distinction may be drawn between commitment to precise credences and commitment to an ordering on credences. That is, one may acknowledge that a given proposition p must be less likely than a given proposition q without committing myself to precise, real unit interval values for the given propositions. When the commitment to an ordering of probability assignments is 12

in virtue of logical form alone, it is acceptable to maintain that the propositions p and q may be evidentially symmetric even though one rationally must acknowledge that p will always have greater than or equal to probability than q. This is precisely the situation with the seemingly paradoxical a ∈ [1, 2] ≈S (a ∈ [1, 2] ∨ a ∈ [2, 3] ∨ a ∈ [3, 4]). We recognize that the right-hand side of the symmetry relation has a logical structure that implies probability greater than or equal to that of the left-hand side. Recognizing that a certain logical structure entails a certain ordering is consistent with maintaining evidential symmetry. The point is further motivated by considering hypothetical propositions p and q, for each of which one has absolutely no evidence whatsoever. Now, do we have more evidence for the disjunction (p∨q) than for each disjunct? I have absolutely no inclination to say that we do, but then White’s paradoxical seeming result is immediate: p ≈S (p ∨ q). To be sure, there are rational constraints such that for an agent S it ought to be the case that CS (p) ≤ CS (p ∨ q), but as long as the ordering relation is not strict I maintain that this rational constraint is consistent with p ≈S (p ∨ q). I have no evidence that there is a teapot in geosynchronous orbit above my head. I also have no evidence that there is a violin in geosynchronous orbit above my head. Furthermore, I have no evidence that there is either a teapot or a violin in geosynchronous orbit above my head. The feeling of paradox arises from the tension between recognizing a rational constraint on the ordering of credences when there is nevertheless evidential symmetry; the feeling can be assuaged by reflecting on the outcome of disjunctively adding propositions for which one has no evidence whatsoever; adding no evidence to no evidence gives no evidence. It may be acknowledged that adding no evidence to no evidence still gives no evidence but objected that in the mystery square example we do not have no evidence for the pertinent propositions. After all, isn’t information that l ∈ [0, 2] some evidence that l ∈ [0, 1]? The answer is that the 13

information merely precludes that 2 < l. If information precluding some but not all possibilities in the complement of a proposition is evidence for that proposition it is only indirectly and indeterminately so. The apparent evidential symmetry paradox illustrates the indeterminacy of magnitude of evidential support imparted by merely preclusionary information: i.e., precisely because the introduction of preclusionary information preserves the evidential symmetry a ∈ [1, 2] ≈S (a ∈ [1, 2] ∨ a ∈ [2, 3] ∨ a ∈ [3, 4]) from the state of total ignorance we should conclude that preclusionary information is evidentially indeterminate and not that there are hidden evidential asymmetries. What about

SYMMETRY PRESERVATION?

Even if I’m right that (vi) a ∈

[1, 2] ≈S (a ∈ [1, 2] ∨ a ∈ [2, 3] ∨ a ∈ [3, 4]) is not as problematic as it may seem at first glance, isn’t (vii) a ∈ [0, 1]∨a ∈ [1, 2] ≈S a ∈ [0, 1]∨a ∈ [1, 2] ∨ a ∈ [2, 3]∨a ∈ [3, 4] doom for the conjunction of (i) l ∈ [0, 1] ≈S l ∈ [1, 2] and (ii) a ∈ [0, 1] ≈S a ∈ [1, 2] ≈S a ∈ [2, 3] ≈S a ∈ [3, 4]? I don’t think so. Once we are willing to accept that (vi) is not absurd we should be inclined to lay the blame for (vii) with

SYMMETRY PRESERVATION

itself, rather than with

otherwise unproblematic premises. That is, everyone will agree that (vii) is absurd, obviously. However, anyone who agrees that in general p ≈S p∨q is not absurd will blame SYMMETRY PRESERVATION itself. Indeed, since White can only get an obviously absurd conclusion from (i) and (ii) by invoking SYMMETRY PRESERVATION

we may question whether that principle is truly

sound. In light of (vii), I propose that only with the following revision can SYMMETRY PRESERVATION



be accepted as obviously sound:

SYMMETRY PRESERVATION :

If p ≈S q, r is inconsistent with both p and

q, and CS (∼ p ≡ (q ∨ r)) 6= 1 then p ∨ r ≈S q ∨ r. This ammendment blocks the derivation of (vii) from (vi) because in mystery square CS (a ∈ [0, 4]) = 1 implies that CS (∼ (a ∈ [1, 2]) ≡ (a ∈ [0, 1]∨a ∈ [2, 3] ∨ a ∈ [3, 4]) = 1. An argument for rejecting the conjunction (i) and (ii) 14

rather than amending

SYMMETRY PRESERVATION

by this proposal would

have to show that (i) and (ii) are independently problematic. Hence, White’s appeal to SYMMETRY PRESERVATION can’t really help his case.

6

MaxEnt and physical symmetry

I have argued that the alleged paradox implied by evidential symmetry is not as severe as it may seem, and that therefore the problems arising from multiple parameterizations are specific to PoI.2, in particular to the assumption of credal uniqueness. Moreover, I have argued that (vi) is not absurd and that the absurdity of (vii) should only lead us to amend METRY PRESERVATION .

SYM -

I am not, however, entirely unsympathetic to those

who may have lingering discomfort with (vi), and who may find attractive White’s suggestion that we must conclude that evidential symmetry is not transparent and that there may be hidden evidential asymmetries that falsify either (i) or (ii). Yet, I am skeptical of such hidden evidential asymmetries. Evidence is something that, importantly, is either had or not had by one. If there are symmetries or asymmetries that re both hidden and relevant to assignment of probabilities, I want to argue that these ought to be understood as involving physical states or laws that are outside of our evidence. It will shore up the case for getting comfortable with (vi) to give reasons to be skeptical that non-transparent evidential asymmetries may falsify either (i) or (ii). Such reasons may be provided by reflecting on the distinction between physical and epistemic symmetry. To that end, I wish to consider two related topics: (1) the solution of one classical paradox of parameterization, Bertrand’s paradox, by appeal transformation invariance and (2) the maximum entropy approach to statistical mechanics. Each topic suggests the possibility of non-transparent evidential asymmetry, but in each case the genuine symmetries underlying the solutions are physical rather than evidential. Consideration of these examples 15

will clarify differences between physical symmetry between nomological possibilities on the one hand and epistemic symmetry between epistemic possibilities, on the other. MaxEnt may be an appropriate refinement of the PoI, but only on the presumption of an exhaustive partition of nomological possibilities that are appropriately treated as symmetric with respect to the underlying mechanics of the physical system. The intuitions about symmetry that may motivate discomfort with formulas like (vi) are in fact empirically motivated by our ordinary experience with physical symmetry rather than by a priori insights into the correct treatment of evidential symmetries. Bertrand’s paradox, a classical case of a parameterization paradox, is in fact amenable to solution, though that solution is contingent on an assumptions of physical symmetry between nomological possibilities in light of an assumed underlying mechanics. Joseph Bertrand posed the following puzzle. Suppose an equilateral triangle is inscribed in a circle. What is the probability that a random chord of the circle is longer than a side of the triangle? We may apply PoI.2 by defining uniform distributions over the possible chords in three different ways:

CIRCUMFERENCE :

Without loss of generality select p1 to

be a vertex of the inscribed triange (every point on the circumference is the vertex of a similar inscribed equilateral triangle). Select p2 from a uniform distribution on the circumference. This gives a probability of 1/3 that the chord passing through p1 and p2 has length greater than the length of a side of the inscribed triangle.8

RADIUS :

Without loss of generality select r to be a radius

perpendicular to an edge of the inscribed trianlge (every ra8

Image files licensed under the Creative Commons Attribution-Share Alike 2.0 Generic license, based on original work by Robert Dodier, available through Wikimedia Commons.

16

dius is perpendicular to a similar inscribed equilateral triangle). Choose p from a uniform distribution on r. This gives a probability of 1/2 that the chord perpendicular to r passing through p has length greater than the length of a side of the inscribed triangle.

INTERIOR :

Choose a point p from a uniform distribution

on the interior of the circle. There is a unique chord with p as midpoint. The circle inscribed in the tringle has radius 1/2 that of the circumscribed tringle, hence area 1/4. This method gives a probability of 1/4 that the length of the chord defined by p is greater than the length of a side of the inscribed triangle.

Let chords be generated at random by each method from a given circle c. Only RADIUS generates a distribution of chords such that the probability derived from the distribution is invariant for translation or rescaling. That is, we can use a fixed c to generate, by RADIUS, a sample of lines then apply that same sample to any other circle in the plane and get the same probablity. To apply

CIRCUMFERENCE

and

INTERIOR

we need to define a new

distribution of chords for each circle. E.T. Jaynes championed the transformation invariance response to Bertrand’s paradox, citing Poincaré and Borel as early proponents of this solution (Jaynes, 1973). Although it may be debated, let us grant the unique correctness of this solution. What, exactly, has been shown? What has been shown is that despite our initial intuition a close examination of the distributions of chords generated by each method gives us reason to conclude that only

RADIUS

gives a genuinely random

distribution of chords, in the sense of not depending on the position of the circle used to define the distribution. In other words, only RADIUS respects the symmetry of the physical geometry. The challenge presented by the 17

problem is to find a random distribution of chords, and it turns out that that a solution is uniquely determined under the plausible assumption that the distribution should be transformation invariance. The invariance argument proves that

RADIUS

uniquely generates a dis-

tribution that treats each possible chord symmetrically. Because of the geometrical character of the problem we have considerable geometrical knowledge that we can bring to bear. This a priori knowledge somewhat complicates the philosophical question of the nature of the possibilities to be treated symmetrically because it confounds physical and epistemic symmetry. Jaynes seems to be aware of this in his commentary on Bertrand’s paradox. He considers whether the transformation invariance argument entitles us merely to a measure of our subjective, epistemic situation or whether it gives us information about the actual physical frequencies. Despite noting that the manner of derivation has only “subjective meaning” Jaynes acknowledges that “there is one ’objective fact’ which has been proved by the above derivation: Any rain of straws which does not produce a frequency distribution agreeing with [RADIUS] will necessarily produce different distributions on different circles” (Jaynes, 1973). Jaynes’ focus is on the relationship of the “subjective meaning” of probability (i.e., rational credences) to the frequency interpretation. Within the theory of rational credneces we may distinguish between credences derived from statistical models applying extensive knowledge of physical possibilies and laws and those that are entirely epistemic and based on severely limited physical knowledge. To illustrate the distinction I am drawing consider the distinction between knowing that an urn contains black and white balls differing only in color (not, e.g., shape or mass) and knowing just that there is an urn with some balls in it. The urn is well shook and a ball is blindly drawn. In the first case we have some physical knowledge of the situation that limits the epistemically possibile outcomes of a drawing from the urn to the known physical possibilities. If we have further knowl18

edge that there are equal numbers of black and white balls we have further knowledge that the possibility of drawing a black ball is physically symmetric to the possibility of drawing a white ball. To consider a more purely epistemic case, imagine knowing just that there is an urn with some balls in it. They may have different color, mass, volume, etc. The epistemic possibilities are practically boundless in the purely epistemic case. In his analysis of Bertrand’s parameterization paradox the notion of “random chord” is intuitively understood by Jaynes through a physical analogy with a “rain of straws”, and from this analogy he concludes that “distributions predicted by the method of transformation groups turn out to have a frequency correspondence after all.” Jaynes adds to this the proviso that “strictly speaking, this result holds only in the limiting case of ’zero skill,’ but as a moment’s thought will show, the skill required to produce any appreciable deviation from [RADIUS] is so great that in practice it would be difficult to achieve even with a machine” (Jaynes, 1973). The image is not entirely unproblematic. How is the rain of straws produced? If I take a handful of straws and toss them up in the air they will all land within some bounded region. Even without skillful manipulation my straw tossing may favor some distributions over others, if the space over which we are to consider transformations is larger than that bounded region. There may be unskilled biases involved in the straw tosser’s natural predilections. Still, we may grant Jaynes a pinch of salt on this point. The relevance of “skill” invoked here introduces an anthropomorphic aspect to the purported physical significance of the probability distribution arrived at by RADIUS, but in the limit the idea is that the physical situation itself, the constitution of space and the causal order, do not favor any possibility over any other; the only way a possibility could be favored would be by a skilled intervention. So, the objective significance of the distribution arrived at by

RADIUS

rests on

physical symmetries between possible positions of chords grounded in the spatial invariance of causal laws. 19

The transformation invariance approach was of critical importance to Jaynes because he wished, ultimately, to justify the MaxEnt principle as a regulative principle of rationality in order to justify an evidentialist, credal interpretation of probabilities in statistical mechanics. I will say more about the connection between these ideas in a bit, after first introducing the MaxEnt principle. The MaxEnt principle instructs us to select the prior probability density function consistent with our background knowledge that maximizes Shannon entropy, which is supposed to measure the uninformativity P of the prior. The Shannon entropy is given by − ni=1 pi (x) · ln(pi (x))dx. When this value is relatively high p(x) may be considered relatively uninformative. In common circumstances this is equivalent to the result implied by the PoI, and MaxEnt is considered a generalization and refinement of the PoI. MaxEnt is motivated by a striking formal analogy between the Shannon formula expressing the uncertainty (i.e., “entropy”) in a continuous probability distribution and the Gibbs formula expressing the uniformity (i.e., “entropy”) in a dynamic system in macrostate S. The Gibbs formula arises in statistical mechanics in the following way. A system is characterised by observations of macrostates such as temperature. An “ensemble” is a function ρ(q, p, t) such that at each fixed time τ, ρ(q, p, τ) is a probability density function defined on the space of all possible microstates associated with a given configuration, so that if R ⊆ Γ is a region of phase space, associated for R instance with an observable macrostate, pτ (R) = R ρ(q, p, τ)dΓ is the probability that a randomly selected microstate is in R at τ. The Gibbsian maximum entropy principle states that for a system in equilibrium the Gibbs R entropy defined by −kB ρ · ln(ρ)dΓ is maximal. Apart from the Boltzmann constant kB , which occurs only to relate the physcial magnitudes defining the microstates (i.e., the position and momentum of point particles in the standard idealization of classical mechanics) of the system to observable macrostates (e.g., temperature and pressure of a gas), the Shannon entropy 20

and Gibbs entropy are the exact same expression.9 Well, they are nearly the exact same expression. The Shannon entropy is given as a sum of discrete probabilities, not an integral over a continuous distribution. The obvious proposal is to define the Shannon entropy of a continuous distribution by R p(x) · ln(p(x))dx, but this measure is not invariant over change of parameters. Appeal to transformation invariance with respect to physically relevant magnitudes plays a crucial role in Jaynes’ technical response to this objection (Jaynes, 1957b). Consequently, the applicability of MaxEnt depends on knowledge of crucial physical factors determining the physically symmetric physical possibilities. Jaynes’ proposed that statistical mechanics could be derived from canons of rationality, using MaxEnt, rather than by derivation from the general laws of mechanics. In his important 1957 essay “Information Theory and Statistical Mechanics” Jaynes writes10 : The mere fact that the same mathematical expression −

P

pi log(pi

occurs both in statistical mechanics and in information theory does not in itself establish any connection between these fields. This can be done only by finding new viewpoints from which thermodynamic entropy and information-theory entropy appear as the same concept (Jaynes, 1957a) (p. 621). Jaynes’ approach is to conceptually unify Shannon entropy and Gibbs entropy by founding a theory of rational credence on the MaxEnt principle, then deriving statistical mechanics and the laws of thermodynamics from the laws of thought rather than of motion. Accordingly, entropy becomes, for Jaynes, a strictly information theoretical concept attached to subjective degrees of ignorance. He describes the interpretation of probability he employs as “subjective” because probabilities are relativized to subject’s infor9 I have drawn my understanding from Roman Frigg’s recent work on these topics (Frigg, 2008a,b; Frigg and Werndl, 2011). “Entropy: a guide for the Perplexed” is especially relevant. 10 it is the sequel to this article that gives the continuous form of MaxEnt

21

mation states, but in contemporary parlance it is an objective theory because a unique probability is logically determined for any well-posed problem. It is, I think, important to note that Jaynes did not think that his theory was universally applicable because he didn’t think every problem to which theorists wish to apply probability is well-posed. For instance, in the conclusion of his essay on Bertrand’s paradox he mentions in passing that the transformation invariance approach does not solve every parameterization paradox, and that the von Mises water/wine problem may simply be ill-posed (Jaynes, 1973). In my view, one of the virtues of Jaynes’ writing is his emphasis on the well-posed/ill-posed distinction, which has not been clearly recognized by every one of his followers or critics. However, despite this virtue I think that it is precisely the limited applicability conditions that makes MaxEnt ill-suited as a foundation for a generally applicable objective theory of personalist probability. In general, we may wish to make rational decisions even when we lack the sort of knowledge of physical symmetries typical of canonical applications of MaxEnt. That is, we may wish to deal rationally with uncertainty and risk even when dealing with problems that are ill-posed by Jaynes’ standards. For example, a stated advantage of Peter Walley’s Imprecise Dirichlet Model is that parameters can learned and introduced as data comes in (Walley, 1996). The knowledge of parameters that well-posedness requires presents a relative limitation of MaxEnt. To further illustrate this point, note that Jaynes’ indicates two distinct possible responses the MaxEnt modeler may have to data. First, we might update the MaxEnt prior using Bayesian conditionalization. Second, we might respond to new information by changing the constraints used in determining the MaxEnt prior in the first place. In particular, if the MaxEnt prior provides a sharply peaked prior distribution that makes experimental data highly unlikely, then Jaynes treats the original constraints as incomplete or falsified. So rather than simply update on the new data we must rethink the constraints used to compute the MaxEnt prior in the first place. 22

Consider now the case where the theory makes definite predictions and they are not borne out by experiment. This situation cannot be explained away by concluding that the initial experiment was not sufficient to lead to the correct prediction; if that were the case the theory would not have given a sharp distribution at all. The most reasonable conclusion in this case is that the enumeration of the different possible states (i.e., the part of the theory which involves our knowledge of the laws of physics) was not correctly given. Thus, experimental proof that a definite prediction is incorrect gives evidence of the existence of new laws of physics. The failures of classical statistical mechanics, and their resolution by quantum theory, provide several examples of this phenomenon (Jaynes, 1957a) (p. 627). Indeed, Jaynes presupposes exhaustive knowledge of physically possible outcomes in justifying the MaxEnt method in the first place: The only place where subjective statistical mechanics makes contact with the laws of physics is in enumeration of the different possible, mutually exclusive states in which the system might be. Unless a new advance in knowledge affects this enumeration, it cannot alter the equations which we use for inference (Jaynes, 1957a) (p. 627). Empirical knowledge of the relevant physical parameters to be used in defining the phase space over which the MaxEnt distribution is invariant is requisite for a problem to be well-posed for solution by MaxEnt. Furthermore, the sense in which MaxEnt minimizes the amount of information in the distribution in fact imposes substantial assumptions of probabilistic independence between events. Now, because its application in statistical mechanics has been experimentally verified we may take ourselves to have empirical evidence for this assumption in that case. This is hardly, however, an expla23

nation of why, for instance, initiating a system in a low entropy macro-state is probabilisitcally independent of its end macrostate. Given the assumption of independence one recovers the second law of thermodynamics simply because the MaxEnt distribution assigns much larger measure to the regions of phase space corresponding to high entropy as opposed to low entropy states, but why assume a priori that low entropy initial states are not systematically correlated with low entropy end states? In some applications the independence assumptions built into MaxEnt are highly questionable.11

7

Physical vs epistemic symmetry

MaxEnt provides a sophisticated version of the principle of indifference. Still, its most evidently correct applications are in instances where we possess significant knowledge of physical symmetry between physically possible events. This sort of knowledge is a necessary condition for the use of invariance arguments like the one for using

RADIUS

to define prior proba-

bilities in the solution to Bertrand’s paradox. However, rational symmetry between epistemic possibilities clearly cannot be reduced to physical symmetry between physical possibilities. To see this point, consider (again) the following cases: CASE

1 One knows that there are equal numbers of black and white

balls of equal shape and mass in a well shook urn. In this case the physical possibility that a black ball is drawn can be reasonably assumed to be symmetric to the possibility that a white ball is drawn. The position that it is rationally obligatory to assign precise probabilities in 11

In particular, I hold initial skepticism about application of MaxEnt in quantitative finance. The default assumption of no dependencies as well as the choice of maximally dispersed probability distributions may lead to fundamental underestimation of risk. However, I must admit that my skepticism is only initial and very broadly theoretical, since I have only very limited knowledge of the fine details of actual implementations of MaxEnt to problems like commodities pricing and credit network estimation.

24

this case may be resisted by some Bayesian personalists, but it is clearly a much more modest evidentialist constraint than the general PoI. Next, consider the following slightly altered case. CASE

2 One knows only that there are black and white balls in an urn

but nothing about the ratio. In this case one can at least enumerate the physical possibilities, and thereby may restrict the class of epistemic possibilities, but does not know that the physically possible outcomes (viz., that a white ball or that a black ball is drawn) are any more than epistemically symetric. In this case it requires a much stronger variant of the principle of indifference to assign equal probabilities to the event that a black ball is drawn and the event that a white ball is drawn. All that one knows in the second case is that the two outcomes are the only physical possibilities. In this case physical symmetry and epistemic symmetry come apart. It is precisely because one knows nothing about whether there is physical symmetry that epistemic symmetry holds. Here, I think that epistemologists have strong grounds to caution against illicitly transfering empirical intuitions derived from experience of cases, like CASE 1, in which judgments of epistemic symmetry are based on knowledge of physical symmetry to cases, like CASE 2, in which we are ignorant whether there is physical symmetry between what we know to be physically possible. Finally, consider the following: •

CASE

3 One knows that there are variously colored balls in an urn but

nothing about what colors or in what ratios. Here, I think that epistemologists have very strong grounds for cautioning against illicit transference of empirical intuitions. In this case, we don’t even know what the physical possibilities are, and it strikes me as clearly imprudent to assign a precise positive probability to the strictly epistemically possible (but perhaps physically impossible) outcome that a black ball 25

is drawn. It will not do, either, to use a uniform distribution over the color spectrum. One will get inconsistent results by parameterizing by wavelength and frequency intervals. Again, I mention the Imprecise Dirichlet Model as more appropriate to multinomial inference without prior knowledge of parameters than MaxEnt, which in its original conception should be applied to problems that are well-posed in precisely the sense that we know which parameters to require invariance over. Returning now to White’s argument, I contend that the result (vi) a ∈ [1, 2] ≈S (a ∈ [1, 2] ∨ a ∈ [2, 3] ∨ a ∈ [3, 4]) only appears to be paradoxical because we are illicitly bringing to bear intuitions more appropriate to the consideration of physical symmetry between physical possibilities than to the consideration of purely epistemic symmetry between merely epistemic possibilities. In the former case, an event (or interval of events) that is counted as a physically possible outcome plausibly ought to be assigned some positive magnitude by any appropriate probability function. It is not, however, clear at all that strictly epistemic possibilities, which (absent constraint by physical information) may be practically boundless, ought similarly to be assigned a definite positive magnitude. Indeed, absent distinctly positive evidence in favor of a merely epistemic possibility we should treat the magnitude of possibility as indefinite.

8

Conclusion 1

PoI.2 is not a corollary of PoI.1. Furthermore, there is no general paradox for evidential symmetry related to Bertrand’s Paradox. PoI.1 is not subject to the paradoxes of parameterization and gratifies with respect to the intuitive importance of evidential symmetry, but PoI.1 combined with credal uniqueness yields PoI.2 and that is subject to the paradoxes. Taking seriously the proposal that evidential symmetry may be non-transparent in the paradoxical cases, I argued, rather, that when there are non-transparent (or 26

at least non-obvious) symmetries that allow for assignment of precise probabilities they are physical symmetries revealed by uncovering tacit physical or a priori knowledge. It is not plausible to suppose that such tacit knowledge exists in all cases of purely epistemic symmetry. For example, in a case like CASE 3 of the preceding section no such knowledge can be said to exist. The mathematical structure of probability abstracted from understandings of chance has been insightfully applied to the problem of reasoning under uncertain conditions, but an insistence on a strict analogy between rational credence and physical probability, whether deterministic or not, is an impediment to progress in understanding and reforming rationality through application of more subtle models such as imprecise probability.

9

The coin trick

I just discussed White’s response to a direct argument against PoI.2. He claimed that the Bertrand paradox is a problem for evidential symmetry in general and in need of a general solution, so that simply dropping PoI.2 won’t be enough. I think his generalized version of the paradox is unpersuasive, leaving evidential symmetry untouched, allowing us to see multiple parameterization as a problem for PoI.2 in particular, and motivating the move away from credal uniqueness. In this section I will present White’s coin puzzle, which aims to make specific problems for imprecise credences. Although White is not the first to identify or be worried by the dilation phenomena at the core of the coin puzzle, his insightful and initially persuasive discussion of the issue has gained some prominence in epistemological circles, so I shall continue to focus on his arguments. I will proceed in the storytelling tradition of conceptual-analysis epistemology, though it should be clear to the formally inclined how to begin axiomatizing the decision procedures of our protagonists. Suppose our HERO is mushy of credence and is presented with the following situation: 27

H1 HERO has no clue whether p,
H2 knows that AGONIST knows whether p,
H3 knows that AGONIST has put the truth about p on the heads side of a coin by affixing a tab marked either ℘ or ¬℘ (depending on whether or not p) to the heads side, and that AGONIST has affixed the opposite tab to the tails side,
H4 and, like the rest of us, HERO has a nice robust credence of 1/2 for heads on a toss.

A minor detail: AGONIST is clever enough to completely conceal the underlying face of the coin when affixing the tabs. To be clear: the upshot of (H3) is that HERO knows that p is true if and only if it says ℘ on the tab covering heads and that ¬p is true if and only if it says ¬℘ on the tab covering heads. Furthermore, we can assume that HERO realizes easy deductive consequences, such as that p is true if and only if it says ¬℘ on the tab covering the tails side.

AGONIST flips. ℘ flops. HERO updates. Let h be the proposition that heads lands up on this toss. Since HERO can't see the underlying face there's apparently no new information about h (that heads is up), so HERO's new rational credence CH,new(h) should remain the same as HERO's old rational credence CH(h) = 1/2. Furthermore, there's really no new evidence whether p. HERO had no clue whether p before and still doesn't. HERO would have new evidence whether p, based on (H1), (H2), and (H3), if HERO knew whether heads was up, but that remains as unknown as it was pre-flop. CH(p) should therefore also not change, so it seemingly rationally ought to be the case that CH,new(p) = CH(p) = [0, 1]. Trouble is brewing. Let ℘.up be the proposition that the symbol ℘ lands up for this toss. HERO reasons that, since ℘.up, h if and only if p is true.

Hence, CH,new(h ↔ p) = 1. Here's the trouble. From what we've just said, it rationally ought to be the case that:

H5 CH,new(h) = 1/2.
H6 CH,new(p) = [0, 1].

However:

H7 CH,new(h ↔ p) = 1.

From (H7), White quite plausibly concludes that it rationally ought to be the case that:

H8 CH,new(p) = CH,new(h).

If we're to keep (H8), which seems unimpeachable, then either (H5) or (H6) must change. Shall HERO "dilate" for h upon updating on the flop and set CH,new(h) = [0, 1]? Or shall HERO "constrict" for p and have CH,new(p) = 1/2?

Call what AGONIST has done "coin-tricking" the proposition p. All knowable propositions are coin-trickable. It's helpful to reflect on the outcome of coin-tricking propositions for which HERO has precise credences. There's no question about how to proceed if the coin-tricked proposition is precisely credenced. Try coin-tricking a proposition that HERO knows: i.e., for which HERO's credence is 1. The outcome of coin-tricking a known proposition is that HERO can know after the toss, and without looking under the affixed symbol, whether h. Also, if HERO's rational credence for a coin-tricked p is precisely 3/4 (or, etc.), then after the toss HERO's credence for h ought rationally be either 3/4 (if ℘.up) or 1/4 (if ¬℘.up), depending on the outcome of the toss. It will be helpful to keep in mind the dynamics of belief when the coin-tricked proposition has a precise credence. In the uncontroversial cases, we give up the analogue of (H5). The result of coin-tricking a proposition with precise credence is that after the flop you gain information about h from the information you have about p and the coin-tricking.
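A quick arithmetic check of this uncontroversial case may be useful. The following is a minimal sketch in Python (my own illustration, with function names of my choosing); it assumes, as the story stipulates, a fair coin independent of p, and computes the post-flop credence for h from a precise prior for p by ordinary conditioning.

# Minimal sketch of the uncontroversial, precisely credenced coin-trick (my illustration).
# Assumptions: the coin is fair and independent of p, and the ℘ tab covers heads exactly
# when p is true, so ℘.up holds just in case p and h agree.

def post_flop_credence_for_heads(prior_p, symbol_up=True):
    """Return P(h | ℘.up) if symbol_up, else P(h | ¬℘.up), for a precise prior for p."""
    if symbol_up:
        p_evidence = prior_p * 0.5 + (1 - prior_p) * 0.5   # P(℘.up) = 1/2
        p_heads_and_evidence = prior_p * 0.5               # h and ℘.up together require p
    else:
        p_evidence = (1 - prior_p) * 0.5 + prior_p * 0.5   # P(¬℘.up) = 1/2
        p_heads_and_evidence = (1 - prior_p) * 0.5         # h and ¬℘.up together require ¬p
    return p_heads_and_evidence / p_evidence

print(post_flop_credence_for_heads(0.75, symbol_up=True))   # 0.75, as in the text
print(post_flop_credence_for_heads(0.75, symbol_up=False))  # 0.25
print(post_flop_credence_for_heads(1.0, symbol_up=True))    # 1.0: a known p settles h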

The controversy is whether by coin-tricking you can gain uncertainty, in the sense of probabilistic dilation, about h from uncertainty about p by learning that h is equivalent to p. To put the concern poignantly: Is uncertainty transmissible?

Suppose that HERO constricts for p. Then CH(p) = [0, 1] and CH,new(p) = 1/2. The flop could have been either ℘ or ¬℘. Had it been ¬℘, HERO would have known that (p ↔ ¬h) and by similar reasoning would constrict for p. No matter what flopped, HERO would have constricted. Before the flop, if HERO is a constrictor, HERO can know that he will shortly be constricting so that CH,new(p) = 1/2. And if he's going to constrict either way, why not get it over with? Why shouldn't CH(p) be precise if HERO knows that CH,new(p) is going to be precise in just a moment? A VILLAIN (a bookie, whom we shall refrain from culturally stereotyping) lurks. White cites the following theorem of probability calculus:

IRRELEVANCE: for any probability function P, if P(h|e) = P(h|¬e) then P(h|e) = P(h).

White correctly notes that, for a constricting HERO, if the approach of modeling imprecise credence by classes of functions (representor classes) is taken, then the functions in HERO's representor will be in violation of IRRELEVANCE. So, regardless whether one ever actually is coin-tricked, if one thinks the correct reaction hypothetically is to constrict, then one is committed to having a representor containing probabilistically incoherent functions. The way to keep (H5) and (H8) and not have HERO's credence be represented by a class of functions that violate IRRELEVANCE is for HERO to abandon imprecise credences. Constriction seemingly leads to global precisification. So, if HERO is to stay mushy of credence, then dilation it must be.

In fact, dilation for credences is the result one gets if one understands imprecise credence using representors. Formally, the matter is just as uncontroversial as the coin-tricking of a proposition with precise credence. This is because the representor for the credence is a class of precise probability functions and they behave upon coin-tricking just as they should; the probability that h updates to match the probability that p. That means the representor for HERO's credence that h will come to be identical with the credence for p. If HERO is a dilator, then just as he can gain information about h when a precisely credenced p is coin-tricked, he can gain ignorance (in a manner of speaking) about h when an imprecisely credenced p is coin-tricked.
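To see how the representor formalism delivers dilation, here is a companion sketch (again my own construction, not a reconstruction of White's or Joyce's formal apparatus). It approximates HERO's maximally imprecise credence for p by a finite grid of precise priors, conditions each member on ℘.up, and reads off the post-flop spread for h.

# Sketch of dilation via a representor, under the same assumptions as the previous sketch.
# The maximally imprecise credence for p is approximated by a finite grid of priors in [0, 1].

def conditional_on_symbol_up(prior_p):
    """For one member of the representor, return (P(p | ℘.up), P(h | ℘.up))."""
    p_symbol_up = 0.5                    # fair, independent coin, so always 1/2
    p_and_symbol = prior_p * 0.5         # p and ℘.up together require heads as well
    h_and_symbol = prior_p * 0.5         # h and ℘.up together require p as well
    return p_and_symbol / p_symbol_up, h_and_symbol / p_symbol_up

grid = [i / 100 for i in range(101)]     # stand-in for the full representor
posteriors = [conditional_on_symbol_up(q) for q in grid]
p_values = [p for p, _ in posteriors]
h_values = [h for _, h in posteriors]

print(min(p_values), max(p_values))      # 0.0 1.0 -> the credence for p stays [0, 1]
print(min(h_values), max(h_values))      # 0.0 1.0 -> the credence for h dilates to [0, 1]
# Each member now assigns h the probability it assigned p, so the representor for h comes
# to match the representor for p, as described above. Note also that a member could only
# return 1/2 for p after either flop if its prior for p were already 1/2 (IRRELEVANCE),
# which is why constriction pushes toward global precisification.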

White presses the counter-intuitiveness of dilation. If his case is persuasive, that would leave constriction (and hence global precisification) as the option for HERO. The dictate of reason would be: Get rid of representors and hopefully find a suitable version of the PoI to fix precise prior probability values (or assign them arbitrarily). Against dilation, White argues the following points.

KNOWN CHANCE: Don't change a credence based on a known chance because of the coin-trick. You know the chance of heads is 1/2. Don't get all mushy because of the coin-trick.

REFLECTION: The proponent of imprecise credence with dilation needs to explain why HERO should not dilate before the flop. HERO knows he's going to dilate no matter what flops. Why not dilate now? White argues that you rationally ought to adopt right now any doxastic state that you know you will soon be in by only doing what one rationally ought.

MUSHY BETTING: A bettor with imprecise credence can be either liberal or conservative. A liberal bettor will bet on any probability in his or her representor. So, a dilating liberal bettor with an absolutely imprecise credence for coin-tricked p will take a bet at any odds that heads has landed up. That seems clearly bad. A conservative bettor with absolutely imprecise credence refuses any bet at any odds. White has us imagine a sequence of coin-tricked propositions {pi}i=1...n such that CH(pi) = [0, 1]. A conservative bettor, White observes, rules out taking post-flop bets on heads at even what seem to be wildly favorable odds relative to the pre-flop credence of 1/2 for h.

INDUCTION: Imagine you're coin-tricked on propositions {pi}i=1...n, but you can look under the affixed symbol each time. Heads will be up half the time when one looks, so one will get an inductive case that heads is up half the time when one has a mushy credence for heads.

It has been argued that both the liberal and conservative strategies face immediate practical incoherence concerning packages of bets that are guaranteed losers/winners. The liberal will be committed to accepting some bad books and the conservative will be committed to refusing some good books. Considering only synchronic books, however, one may simply stipulate that each is sophisticated enough to reject/accept guaranteed losing/winning books. It may be more subtle to deal with diachronic bad/good books, but I think that for my purposes this issue can be tabled.12 I want to respond to White's arguments, and I think that it will be sufficient to illustrate some virtues of betting strategies with imprecise credences even without providing a full defense against the challenge posed to the conservative strategy by the possible rejection of diachronic good books. In what follows, however, it may be assumed that the liberal and conservative are each sophisticated enough to at least reject synchronic bad books and accept synchronic good books.

12 James Joyce has given a refined decision theory for imprecise probabilities that is neither liberal nor conservative, though it is closer in spirit to the liberal approach because it regards in-interval probabilities as, in a sense, those which are permissible to use. Ultimately I would defend a more conservative stance for reasons that will be alluded to in the present essay, but which would require a full defense in light of both what Joyce has said and the pointed criticisms of Elga (Elga, 2010). I focus here on responding to White.
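Since much of what follows turns on the liberal/conservative contrast, it may help to have it in operational form. The sketch below is one common way of cashing out the informal description above (my own simplifications: a single bet, a representor summarized by the endpoints of its interval, stakes in arbitrary units); it is not a formalization that White or Joyce gives.

# Sketch of liberal vs. conservative betting rules for an interval-valued credence [low, high]
# in the proposition bet on. The bet pays `win` if the proposition is true and loses `stake`
# if it is false; expected value is linear in the probability, so the endpoints suffice.

def expected_value(prob, win, stake):
    return prob * win - (1 - prob) * stake

def liberal_accepts(low, high, win, stake):
    # Accept if at least one member of the representor gives the bet non-negative expected value.
    return max(expected_value(low, win, stake), expected_value(high, win, stake)) >= 0

def conservative_accepts(low, high, win, stake):
    # Accept only if every member of the representor gives the bet non-negative expected value.
    return min(expected_value(low, win, stake), expected_value(high, win, stake)) >= 0

# A coin-tricked p with maximally imprecise credence: the post-flop credence for h is [0, 1].
print(liberal_accepts(0.0, 1.0, win=100, stake=1))       # True: takes the bet at any odds
print(conservative_accepts(0.0, 1.0, win=100, stake=1))  # False: refuses the bet at any odds

# With partial information narrowing the credence for h to [3/4, 1], the conservative bettor
# still accepts some bets: risking 2 to win 1 is favorable on every member (0.75*1 - 0.25*2 > 0).
print(conservative_accepts(0.75, 1.0, win=1, stake=2))   # True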


No one wants to get in late on a money pump, and White argues that the conservative bettor of imprecise credence in a coin-tricked proposition will miss out. The argument is not exactly analogous to the bad book style arguments that show probabilistic coherence is a constraint on rationality by showing that probabilistic incoherence guarantees that one is committed to accepting a bad book (i.e., a guaranteed losing bet). White does not maintain that the conservative bettor of imprecise credence will, of necessity, be exploitable to lose money; rather, he describes a scenario in which the imprecisely credenced, dilating, conservative bettor seems to miss a chance at terrific winnings. None of the points above demonstrate the strict practical incoherence of the "mushy" gambler (to employ White's value-laden term); rather, they create the impression that he or she will be inadequately opportunistic. My aim will be to counter this impression.

This bears emphasizing. It may well be the case that several strategies are practically coherent. So epistemologists and decision theorists hoping to argue for a strategy cannot be expected to do so only by eliminating strictly incoherent strategies. White does not argue that the strategy I shall defend is strictly incoherent, and I will not argue that the strategy he defends is strictly incoherent. Instead, I hope to illustrate some practical advantages of conservative betting with imprecise credences that are overlooked by White's focus on extreme cases. I agree that one does not want to miss out on a money pump, but will argue that one ought not be too eager to become a money pump either. We might say, following Aristotle's suggestion to understand virtue as a mean between excess and defect, that between opportunism and caution the virtue of prudence is the mean (Aristotle, trans. Ross, 1925, bk. II, ch. 6). Incidentally, I am in agreement with Diogenes Laertius that prudence is indispensable in the pursuit of pleasure (Laertius, 1997a,b), but disagree with his view that it is better for a good decision to go badly than for a bad decision to go well by chance (Laertius, 1997a), unless he simply means by this what Mill (1871) made clear: viz., that making decisions in accordance with virtue is instrumentally good over the long run. I believe White misidentifies a prudent decision gone badly as a bad decision when he criticizes mushy betting. It is my conviction that imprecise probabilities are extremely useful tools in the formal representation of prudence. Let us not, in this age of bank failures and oil spills, with too much haste esteem opportunism unbalanced by proper caution. The very choice of the term "mushy" to describe the use of imprecise probabilities in practical deliberation, a term connoting an indecisive thumb-twiddler, is polemical and value-laden, reflecting a dominant ideology of decisiveness emerging from an economic system that permits the systematic socialization of risk and externalization of costs.

10 Variations on the trick

Before proceeding, I'll say a little more to relate the present argument to other recent commentaries on the topic. In a recent paper, James Joyce has provided a detailed defense of decision making with credences modeled by imprecise probability representor classes (Joyce, 2010). I am in agreement with much of what is said in that article to rebut the force of White's objections to decision making with imprecise credences, and in many ways the comparison of strategies in this section provides a complementary intersubjective component to Joyce's critique. Joyce identifies flaws in White's understanding of imprecise probabilities that stem from a failure to recognize that differences between two complete credal states may be consistent with identical interval values. I am giving a different rebuttal of White's criticisms, and also want to begin to make a positive case for the virtues of a conservative approach to decision making with imprecise probabilities. While complementary on many points, I do differ with Joyce in wanting to defend a conservative strategy. However, a full defense of my position is beyond the scope of the present paper. So I will not respond to all criticisms that have been made of the conservative betting strategy, aiming only to parry White's criticisms by illustrating advantages of the conservative betting strategy.

Recall that White does not identify a strict inconsistency in decision making with imprecise credence. Rather, he argues that decision making agents so modeled are "mushy", a choice of terminology which I have already noted is objectionably value-laden, and this is meant to suggest a kind of practical incoherence. My goal, so that the character of the argument is not misunderstood, is to illustrate practical advantages of the conservative betting strategy with imprecise credences, not to identify it as uniquely coherent. I will respond to each of White's four arguments directly, but it will be helpful first to consider variations on the coin trick puzzle to show how well imprecise probabilities can model features of decision making with partial information. In these illustrations, we will introduce characters with partial knowledge and let their credences reflect their partial knowledge in the ways recommended by epistemologists such as Joyce and Sturgeon.

An urn has only black balls and/or white balls in it.

AGONIST draws a ball from the urn. Let b be the proposition that the ball is black. AGONIST will coin-trick b using tabs marked β and ¬β. Again, let h be the proposition that heads flops up. Three characters are as follows.

E1 EAGER is a precisely credenced applier of the PoI,
E2 is completely ignorant about the composition of the urn (other than knowing that it contains only black and/or white balls),
P1 PRUDENCE is an imprecisely credenced, dilating, conservative bettor,
P2 knows that the urn has only black and/or white balls and that 3/4 or more are black,
D1 DANGER is an imprecisely credenced, dilating, liberal bettor,
D2 knows that the urn contains only black and/or white balls and has 7/8 or more black balls,
*3 each agent knows that AGONIST knows whether b,
*4 each agent knows that AGONIST has put the truth about b on the heads side of a coin,
*5 and each has a nice robust credence of 1/2 for heads on a toss.

So, after the flop (suppose β lands up) each has the following credences: CE,new(h) = 1/2, CP,new(h) = [3/4, 1], CD,new(h) = [7/8, 1]. Note the virtue of PRUDENCE, who won't miss out on taking advantage of the less informed EAGER but also won't get taken advantage of by the more informed DANGER. The conservative betting strategy with imprecise credences puts one in a position to take advantage of the less informed, while not risking being taken advantage of by the more informed.13
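Here is a sketch of how those post-flop intervals fall out of each character's background knowledge (my own illustration; it assumes, as above, a fair coin independent of the draw, and that β is the symbol showing). Each member of a character's representor assigns some probability to b, and conditioning on β.up hands that probability to h.

# Sketch: post-flop credences for h when β is showing, computed member-by-member from each
# character's representor for b (the proposition that the drawn ball is black).

def heads_given_beta_up(prob_black):
    """P(h | β.up) for one precise credence function assigning prob_black to b,
    with a fair coin independent of the draw."""
    p_beta_up = prob_black * 0.5 + (1 - prob_black) * 0.5   # always 1/2
    return (prob_black * 0.5) / p_beta_up                   # = prob_black

# Representors for b, read off each character's knowledge of the urn (finite grids stand in
# for the full intervals):
eager    = [0.5]                                  # the PoI fixes a precise 1/2
prudence = [i / 100 for i in range(75, 101)]      # stand-in for [3/4, 1]
danger   = [i / 1000 for i in range(875, 1001)]   # stand-in for [7/8, 1]

for name, representor in [("EAGER", eager), ("PRUDENCE", prudence), ("DANGER", danger)]:
    values = [heads_given_beta_up(q) for q in representor]
    print(name, min(values), max(values))
# EAGER 0.5 0.5    PRUDENCE 0.75 1.0    DANGER 0.875 1.0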

DANGER, like PRUDENCE, will be able to take from EAGER. However, DANGER risks being taken from by someone yet more informed. One of White's arguments against conservative mushy betting was that that strategy seemed to make one miss out on potential earnings in some cases. My response is in part to illustrate the virtues of conservative mushy betting with PRUDENCE. This is not a direct response to White's case, however. PRUDENCE has CP,new(h) = [3/4, 1], so there are some bets, the ones PRUDENCE knows to be advantageous, that aren't ruled out. Now let HERO have the same information about b as EAGER does: i.e., that there are black and white balls in the urn but not the proportion. If HERO follows PRUDENCE's strategy, then we'll have CH,new(h) = [0, 1] and CE,new(h) = 1/2.

One more character:

N1 NAIF is an imprecisely credenced, dilating, liberal bettor,
N2 like HERO and EAGER, NAIF is completely ignorant about the composition of the urn, other than knowing that it contains only black and/or white balls,
N3 knows that AGONIST knows whether b,
N4 knows that AGONIST has put the truth about b on the heads side of a coin: i.e., exactly what was specified above for the other characters,
N5 and has a nice robust credence of 1/2 for heads on a toss.

13 Of course, we know that moral reasons may trump prudential reasons, so that "taking advantage of" a less informed person (such as a child or a trusting customer) may sometimes not be what one ought to do, but also note that not all advantageous opportunities come in the form of bets with winners and losers.

EAGER's advantage over NAIF is that EAGER will accept bets that favor heads when β is showing. Why? Since CE,new(h) = 1/2, EAGER's fair odds are 1:1 and any odds better are odds that EAGER will take to be advantageous. NAIF will accept bets at any odds, since NAIF is an absolutely imprecisely credenced, dilating, liberal bettor. So EAGER can get subjectively favorable odds from NAIF, while HERO will miss out if following the strategy of PRUDENCE. This seems like an advantage for EAGER's strategy over the strategy of PRUDENCE, but the cost is exposure to DANGER.

One can vary how much the different characters know about the coin-tricked propositions. I think that the results of doing that are favorable for the strategy of the dilating conservative making decisions with partial knowledge. White shows that, by employing the strategy of PRUDENCE, HERO may miss out on opportunities. Let's take a closer look at the missed opportunity. White's scenario is this. HERO has to stand by and watch while EAGER takes from NAIF again and again. Worse, HERO can never get in on it.

HERO's inability to ever be able to get in on it pumps the intuition that for all of the advantages of prudence (as characterized by PRUDENCE) there's something deeply wrong. The source of the never-getting-in seems to be dilation. On every flop, HERO dilates and can't bet any longer. It seems to me that White is correct here. Eventually, HERO should figure out that NAIF is naive and not dangerous. Defenders of imprecise credences should say why, but for now I conclude that prudence is a good competitive betting strategy with a possible exception in the extreme. This is shown even in scenarios where choice of parameters is not an issue; i.e., where quite a bit of partial information is given that would allow specification of a statistical model. I take it to be clear that having even less information ought not lead a prudent person to take more risk.

11 Inspection

Gambling scenarios are useful elicitations because they are vividly familiar and allow for information or partial information to be specified using coins, dice, cards, etc. But they may have a distorting effect on elicitation of purely epistemic attitudes. Gambling is dangerous for the ignorant. One might think that the scenario described above imports information and worry from a non-epistemic context. A better elicitation, one might insist, would be to specify that the person offering the bet is in no better situation epistemically than you are. Such knowledge, however, amounts to stipulating that the betting opponent is naive, and I agree that this should change one's strategy. However, I would draw a distinction between taking a bet because you know the other bettor is taking an irrational risk and taking the bet because you know the odds are advantageous, so there is an epistemic aspect to uncertainty aversion in betting contexts. In any case, it may be best to take gambling out of the picture entirely to show the virtue of dilation in a purely epistemic task.

Suppose INSPECTOR's task is to discover what proportion of some collection {pi}i=1...n of coin-tricked propositions are true. Specifically, suppose that INSPECTOR knows that an urn contains only black and/or white balls (but not the proportion) and that AGONIST has drawn n times with replacement and recorded the order of outcomes.14 For each i, let bi be the proposition that the ith draw is black, and suppose that AGONIST has coin-tricked the bi. Imagine the n coins already flipped and arranged in a row in the order in which they were flipped. INSPECTOR can see whether β or ¬β has landed up on the ith coin in the row, but not whether heads or tails is up. What should INSPECTOR's credence be for any of the propositions in the class of propositions expressing the various possible proportions of heads? What's the proportion of heads among all the tosses combined? What's the proportion of tosses that landed heads up among only the tosses that landed β up? It is clarifying to note that if the urn has only black balls in it, then every toss that landed β up is a toss that landed heads up.

For each i, define the following:

• bi =df the ith ball is black.
• hi =df the ith coin is heads.
• β.upi =df the ith toss shows β up.

We may characterize INSPECTOR, who (again) sees the affixed tabs but not the faces of the coins, as follows:

I1 For each i, CI(bi) = [0, 1]
I2 For each i, CI(ch(hi) = 1/2) = 1 15

So, INSPECTOR has imprecise credences for each bi and believes that the coins are fair coins: i.e., that the chance on each toss was 1/2.

14 This is a very simple partial information case, and one could straightforwardly apply an imprecise binomial model, but it isn't essential to the example that it be known that the urn contain only black and white balls. See Walley (1996) for detailed examples with multinomial data but without a specified prior parameterization.

15 Please indulge a slight abuse of notation here. Ordinarily "ch" indicates an objective chance. Strictly speaking, after the coins have been tossed the objective chance of heads is either 0 or 1, and anyone who doesn't see the faces of the coin just doesn't know which. Strictly speaking, ch(hi) = 1/2 only holds before the coins have been tossed, and as we are imagining the scenario they've all been tossed. So, I do not intend for "ch" to indicate the objective chance at the time the credence is formed, but rather the last objective chance that INSPECTOR knew. Incidentally, this is why INSPECTOR does not violate the so-called "principal principle". Indeed, if objective chances were timeless, then there would be a counterexample to the principal principle ready to hand in the uncontroversial cases.

The implication of (I2) seems to be that it ought to be the case that:


I3 For each i, CI(hi) = 1/2

Here is the trouble.

I4 For each i, CI(β.upi) = 1 ∨ CI(β.upi) = 0

This is just because INSPECTOR can see all of the tabs. INSPECTOR reasons that:

I5 For each i, CI(bi ↔ hi) = 1 ∨ CI(bi ↔ ¬hi) = 1

Why? If CI(β.upi) = 1 then (bi ↔ hi). If CI(β.upi) = 0 then (bi ↔ ¬hi). By (I5), it is extremely plausible that INSPECTOR should satisfy:

I6 For each i, CI(bi) = CI(hi) ∨ CI(bi) = 1 − CI(hi)

If (I3), then from (I6) INSPECTOR ought to satisfy:

I7 For each i, CI(bi) = CI(hi)

Between (I1), (I3), and (I7) something needs to give. Can the inference to (I7) be blocked? (I4) is stipulated of INSPECTOR; it's just the fact that INSPECTOR for each i can see whether β or ¬β is up. There's no challenging the permissibility of such a stipulation. Furthermore, the reasoning from (I4) to (I7) is unimpeachable. So either (I1) or (I3) must go. Either dilation or constriction is required for INSPECTOR.

I will argue for dilation of INSPECTOR's credences for the hi. First, the inference from (I2) to (I3) is unwarranted. Second, the implications of the conjunction of (I7) and (I3) are unwelcome for INSPECTOR's rate of convergence on the proportion of true propositions in the collection {bi | β.upi}i=1...n of coin-tricked propositions that landed with β showing up. Third, a plausible account of why dilation occurs while (I2) remains true is available.

That, in general, inference from (I2) to (I3) is unwarranted is shown in the dynamics of the uncontroversial cases. If INSPECTOR believed, for example, that all of the balls in the urn were black, then in light of (I4) and (I7) it would be rationally incumbent that INSPECTOR have either CI(hi) = 1 or CI(hi) = 0, contradicting (I3). White may maintain that the inference in question is not warranted in general but is warranted in the coin-trick situation beginning with complete ignorance. INSPECTOR is effectively ignorant about the bi. How can ignorance do what information does in blocking the inference?

We can get clearer on how to answer the worry about uncertainty transmission by staying clear about what INSPECTOR is trying to learn. What is being inspected? It is not just the frequency of heads for an arbitrary class of tosses. INSPECTOR may also be interested to know what proportion of the collection {bi | β.upi}i=1...n are true. Indeed, if n is very large, large enough to prohibit checking under every tab, INSPECTOR may take just a sample from {bi | β.upi}i=1...n and use the results to make a probabilistic inference about the remaining propositions. In that task, INSPECTOR should not assume either that there is or that there is not a positive correlation between landing β up and landing heads up. Some notation will be helpful:

• Nβ.h =df the number that landed both β up and heads up.
• Nβ.t =df the number that landed both β up and tails up.
• N¬β.h =df the number that landed both ¬β up and heads up.
• N¬β.t =df the number that landed both ¬β up and tails up.
• Nβ =df the number that landed β up.
• N¬β =df the number that landed ¬β up.
• Nh =df the number that landed heads up.
• Nt =df the number that landed tails up.

INSPECTOR knows that Nβ.h + Nβ.t + N¬β.h + N¬β.t = n. Based on (I2) it is reasonable for INSPECTOR to have:

I2.1 CI([Nβ.h + N¬β.h]/n < 1/2) = 1/2 16

(I2.1) is equivalent to saying that for arbitrary n tosses INSPECTOR anticipates more than half to be heads equally as much as that less than half will be heads. If one assumes that there's no correlation between landing β up and landing heads up, then it makes sense to also have:

I2.2 CI(Nβ.h/Nh < 1/2) = 1/2

From (I2.2), INSPECTOR would be well on the way to (I3), but I believe (I2.2) is clearly illicit. It is to assume what was to be inspected. Furthermore, to make this assumption will impede the progress of INSPECTOR's task.

16 For simplicity, just assume n is odd.

This indicates the problem with constriction. Suppose constriction: i.e., accept (I3) and (I7) but not (I1). Then, for each i, CI(bi) = CI(hi). Following White and others, I have been expressing credences simply as ratios or intervals. This expression is a bit limited, however. The use of representor classes of probability functions to model credences allows for the modeling of some further very intuitive aspects of credence. One such aspect is robustness. A batting average of .400 is not very robust after the 5th at bat of the season; by 500 at bats it's something to get excited about. Robustness is a measure of resistance to change, and information about robustness is lost in simple numerical expressions. Taking the identity literally as indicating that the credences are identical in all formal respects, (I7) now requires that for all i, CI(bi) = CI(hi) not only in numerical value but also in robustness. Hence, CI(Nβ.h/Nh < 1/2) = 1/2 will be as robust against evidence as CI([Nβ.h + N¬β.h]/n < 1/2) = 1/2 if INSPECTOR constricts. For example, if the urn has 3/4 black balls, then it will take INSPECTOR as many observations to reach this conclusion as to reach the conclusion that the coins are biased; this is an especially troubling consequence if we suppose that INSPECTOR has prior knowledge about the coins, for instance that they are newly minted standard currency.

The foregoing lends to a satisfying account of the inheritance of uncertainty by dilation. INSPECTOR can be thought of as half-way to the information under inspection in virtue of (I4). If INSPECTOR constricts, that completes the information in a way that amounts to making a very strong assumption concerning propositions about which one is as ignorant as ever.
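To make the moral of this section concrete, here is a small simulation sketch (entirely my own construction; the urn proportion of 0.8 and the sample size are arbitrary choices). It generates coin-tricked draws and compares the overall frequency of heads with the frequencies associated with the β-up tosses: the first tracks the known chance of 1/2, while the latter track the hidden proportion of black balls, which is exactly what INSPECTOR is trying to learn and exactly what constriction would treat as settled at 1/2.

# Simulation sketch of the INSPECTOR scenario. The true proportion of black balls (theta)
# is hidden from INSPECTOR; 0.8 is an arbitrary value chosen for illustration.
import random

random.seed(0)
theta, n = 0.8, 100_000

n_heads = n_beta_up = n_beta_and_heads = 0
for _ in range(n):
    black = random.random() < theta                              # the ith draw
    heads = random.random() < 0.5                                # independent fair coin
    beta_up = (heads and black) or (not heads and not black)     # β covers the true side
    n_heads += heads
    n_beta_up += beta_up
    n_beta_and_heads += beta_up and heads

print(n_heads / n)                    # ~0.5: the analogue of (I2.1)
print(n_beta_and_heads / n_heads)     # Nβ.h / Nh, ~0.8: tracks theta, not 1/2
print(n_beta_and_heads / n_beta_up)   # heads among the β-up tosses, also ~theta
# Treating either of the last two ratios as 1/2 in advance, as (I2.2) in effect does, would
# assume away the very correlation the inspection is supposed to reveal.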

12 Conclusion 2

The account of the preceding section takes for granted a task for inspection. The task is to seek a correlation, and the problem with constriction is that it interferes with that task. Task selection is a separate matter from how to proceed with a given task. To select a task is to inspect a relation between parameters. In our everyday and our scientific investigations the ways that we select tasks, the sorts of correlations we take to be worth looking into, are a mixture of prior knowledge, instinctual insight, tradition, and maybe even lucky guesses. In the kinds of highly artificial scenarios that have occupied this paper matters are considerably simplified. Historically, task selection has been influenced by adoption of dogma, by embrace of a causal framework, and by imitation of great experiments. In everyday life, it can be influenced by a sense of narrative; one tells a story to oneself about oneself and one's fellows. Certainly, in contexts of gambling we'll be interested to inspect whether favorable odds can be obtained. There's no reason to be eager to start accepting bets when one is ignorant. Good advice for HERO is to inspect and be prudent. Given a sequence of opportunities to make bets on propositions, this will include inspecting the frequency of tosses revealed to be heads up among all that landed ℘ up. It should also involve inspection of one's fellow bettors.

In the INSPECTOR scenario the propositions in the sequence {hi} were known to be probabilistically dependent (they were random draws with replacement), hence a betting strategy could be updated as more information comes in. This is closer to real life scenarios, where we learn to act more advantageously as we go through life, and prudently seek to invest time and money in those areas of opportunity where we are most comfortable and informed. There is a world of difference between taking a risk based on an informed judgment of precise or nearly precise probability and taking a risk based on complete ignorance. White is able to put the most intuitive pressure on the acceptability of the dilating conservative strategy by imagining a sequence of propositions that are not known to be probabilistically dependent. It is indeed absurd to suggest that one would not risk a penny at astronomical odds because one has dilated. However, the thrill of the opportunity itself may be worth more than a penny. This is not White's point, and considering ever greater minimal stakes renders intuitions considerably less clear; I mean, even at those same astronomical odds it might be less appealing to bet your life savings than to bet a penny. Here now are my responses to White's arguments:

KNOWN CHANCE: To constrict is to assume what is under inspection. Really, the reasons KNOWN CHANCE is misleading are the same in both the controversial and the uncontroversial cases. The inference from (I2) to (I3) is fallacious.

REFLECTION: After the flop one finds out which class under inspection the ith flip belongs to. Information (partial), not ignorance, is the reason for dilation.

MUSHY BETTING: HERO's17 best strategy is to be an imprecise, dilating, prudent inspector. This means that HERO will eventually get in with EAGER on NAIF, after adequate inspection.

INDUCTION: White recommends induction on "the occasions in this scenario in which you've seen the coin land and your credence is C(h) = [0, 1]", and notes that you will find that it is about 1/2. That's true; it's the analogue of the inference from (I2) to (I2.1), which is a valid inference. To continue to (I2.2), however, one needs to do induction on the occasions in this scenario in which you've seen the coin not only land but also land ℘ up and your credence is C(h) = [0, 1]; in particular, consideration of iterated gambling with partial information shows that that's the relevant class for HERO to inspect.

17 I take this point to be relevant to van Fraasen's (2006) comments on dilation as well. I find his specific cases unpersuasive because I think most of us ordinarily do, to follow his example, have information relevant to whether weather will impact our performance on an exam. Furthermore, even if we did not have such information there may be practical reasons not to dilate (since we know confidence itself has an impact). But in the purely epistemic case where we update on some information, the relevance of which we are inspecting, I think I've shown that dilation is more natural than has been suggested.

White's reasons for abandoning imprecise credences are unpersuasive. In fact, close inspection of a variety of cases reveals a number of virtues for the prudent inspector.


References

Aristotle and D. (trans.) Ross. Nicomachean Ethics. Oxford University Press, 1925.
J. Bertrand. Calcul des probabilités. 1889.
R. Carnap. On inductive logic. Philosophy of Science, XII(2), 1945.
A. Dempster. Upper and lower probabilities induced by a multivalued mapping. Annals of Mathematical Statistics, 38:325–339, 1967.
A. Elga. Subjective probabilities should be sharp. Philosophers' Imprint, 10(05), 2010.
R. Frigg. A field guide to recent work on the foundations of statistical mechanics. 2008a. URL http://philsci-archive.pitt.edu/3964/.
R. Frigg. Probability in Boltzmannian statistical mechanics. Cambridge University Press, 2008b.
R. Frigg and C. Werndl. Entropy: A Guide for the Perplexed. Oxford University Press, 2011.
I. Good. Subjective probability as the measure of a nonmeasurable set. In Logic, Methodology and Philosophy of Science: Proceedings of the 1960 International Congress. Stanford University Press, 1960.
I. Hacking. The Emergence of Probability: A Philosophical Study of Early Ideas about Probability, Induction and Statistical Inference. Cambridge University Press, 1984.
C. Howson and P. Urbach. Scientific Reasoning: The Bayesian Approach. Open Court, 3rd edition, 2005.
E. Jaynes. Information theory and statistical mechanics. Physical Review, 108(2), 1957a.
E. Jaynes. Information theory and statistical mechanics II. Physical Review, 108(2), 1957b.
E. Jaynes. The well posed problem. Foundations of Physics, 3:477–493, 1973.
J. Joyce. How probabilities reflect evidence. Philosophical Perspectives, 2005.
J. Joyce. A defense of imprecise credences in inference and decision making. Philosophical Perspectives, (24), 2010.
J. Keynes. A Treatise on Probability. MacMillan, 1921.
B. Koopman. The bases of probability. Bulletin of the American Mathematical Society, 46:763–774, 1940.
H. Kyburg. Probability and the Logic of Rational Belief. Wesleyan University Press, 1961.
H. Kyburg. The Logical Foundations of Statistical Inference. Reidel, 1974.
H. Kyburg and C. Teng. Uncertain Inference. Cambridge University Press, 2001.
D. Laertius. Letter to Menoeceus. In Hellenistic Philosophy. Hackett, 2nd edition, 1997a.
D. Laertius. The principal doctrines. In Hellenistic Philosophy. Hackett, 2nd edition, 1997b.
I. Levi. The Enterprise of Knowledge. MIT Press, 1980.
A. Lyon. Deterministic probability: neither chance nor credence. Synthese, pages 413–432, 2010.
J. Mill. Utilitarianism. Longmans, Green, Reader, and Dyer, 1871.
F. Schick. Explication and Inductive Logic. PhD thesis, Columbia University, 1958.
G. Shafer. A Mathematical Theory of Evidence. Princeton University Press, 1976.
C. Smith. Consistency in statistical inference and decision. Journal of the Royal Statistical Society, B.23:1–25, 1961.
S. Sturgeon. Reason and the grain of belief. Nous, 42(1), 2008.
S. Sturgeon. Confidence and coarse grained attitudes. In Oxford Studies in Epistemology, volume 3. Oxford University Press, 2010.
N. Taleb. The Black Swan: The Impact of the Highly Improbable. Random House Trade Paperbacks, 2011.
B. van Fraasen. Vague expectation value loss. Philosophical Studies, (127):483–491, 2006.
P. Walley. Inferences from multinomial data: learning about a bag of marbles. Journal of the Royal Statistical Society, 1996.
R. White. Evidential symmetry and mushy credence. In Oxford Studies in Epistemology, volume 3. Oxford University Press, 2010.
T. Williamson. Knowledge and its Limits. Oxford University Press, 2000.
