Empiricism and/or Instrumentalism?

Erkenntnis (2014) 79:1019–1041. DOI: 10.1007/s10670-013-9567-8. Original Article.

Prasanta S. Bandyopadhyay • Mark Greenwood • Gordon Brittan Jr. • Ken A. Aho



Received: 4 October 2010 / Accepted: 9 October 2013 / Published online: 13 December 2013. © Springer Science+Business Media Dordrecht 2013.

Abstract Elliott Sober is both an empiricist and an instrumentalist. His empiricism rests on a principle called actualism, whereas his instrumentalism violates this. This violation generates a tension in his work. We argue that Sober is committed to a conflicting methodological imperative because of this tension. Our argument illuminates the contemporary debate between realism and empiricism which is increasingly focused on the application of scientific inference to testing scientific theories. Sober’s position illustrates how the principle of actualism drives a wedge between two conceptions of scientific inference and at the same time brings to the surface a deep conflict between empiricism and instrumentalism.

A version of the paper was presented at the American Philosophical Association meetings, Central Division.
P. S. Bandyopadhyay (corresponding author) and G. Brittan Jr., Department of History and Philosophy, Montana State University, Bozeman, MT, USA. e-mail: [email protected]; G. Brittan Jr., e-mail: [email protected]
M. Greenwood, Department of Mathematical Sciences, Montana State University, Bozeman, MT, USA. e-mail: [email protected]
K. A. Aho, Department of Biological Sciences, Idaho State University, Pocatello, ID, USA. e-mail: [email protected]


1 Introduction

Elliott Sober writes that empiricism is a thesis about the significance of observation (Sober 1993). According to him, any credible scientific hypothesis must be tied to what we actually observe, and this fact is the one to which we should attach epistemological significance. Sober is also an instrumentalist who takes the notion of (expected) predictive accuracy as the goal of science. In spelling out the notion of predictive accuracy, he makes reference to data which are merely possible, as yet unobserved. Sober's two approaches, empiricism and instrumentalism, thus conflict; one tells us to take possible data into account, the other proscribes doing so.

The paper is divided into four sections. In the first section, we introduce the so-called principle of actualism (PA) and explain its ramifications for scientific inference.1 In the second section, we broach Sober's empiricism and discuss how it incorporates the PA against the background of the realism/antirealism debate. In the third section, we discuss Sober's AIC-based notion of instrumentalism, which is inconsistent with the PA because it violates the PA. In the fourth, we consider two issues. We first discuss the conflicting methodological imperative that arises from his attempt to combine empiricism with instrumentalism, the two conflicting strands in his philosophy described above (see Godfrey-Smith 1999 for a different type of tension in Sober's work).2 His empiricism rests on the PA, whereas his instrumentalism violates it. Second, we consider, and then reject, a possible objection which contends that the AIC framework respects the PA, and that, as a result, there is no inconsistency in his methodology.

2 The Principle of Actualism in a Larger Context

According to the PA, one should make judgments about the correctness of a hypothesis based on the data we actually have, rather than on unobserved, possible data. In the words of Sober, we should form our judgments about hypotheses based on evidence we actually possess; possible but nonfactual evidence does not count (Sober 1993, p. 43). We call this the Principle of Actualism (PA hereafter). One could construe the thesis in terms of its epistemological or its decision-based features. In this section, we consider it epistemologically. We will explore its decision-based feature in Sect. 5 when we evaluate an objection to our account. Consider the epistemological consequences of the PA, stated in terms of its two conditions. We call the first epistemological condition the Principle of Actualism's Evidential Equivalence Condition for two experiments (PAT), and the second the Principle of Actualism's Evidential Significance Condition for a single experiment (PAS).

1 The PA, or what is often called the "conditionality principle," is a cornerstone of Bayesian inference. For an excellent treatment of Bayesian inference, see Howson and Urbach (2006). The paper does not try to defend or criticize the PA.

2 Godfrey-Smith has pointed out that while Sober's earlier view of simplicity is domain-specific, his current use of the AIC framework, which exploits the notion of simplicity, is "domain-independent" (Godfrey-Smith 1999, p. 181).


The letter "T" in "PAT" stands for "two experiments sharing equal evidential support"; the letter "S" in "PAS" stands for a "single" experiment having the evidential significance. Although the epistemological consequences of the PA are motivated by two fundamental principles in Bayesian statistics (the PAT is an analogue of the likelihood principle, and the PAS an analogue of the conditionality principle), one need not be a Bayesian to accept them (Birnbaum 1962; Casella and Berger 1990; Berger 1985; Pawitan 2001).3 At least Sober adopts this line of thought, that one need not be a Bayesian to subscribe to these principles. We will first sketch these two epistemological consequences of the PA and bring out their significance in the foundations of statistical inference, and also formulate a consequence of the PA in light of our discussion of them.4 Then, following Sober, we discuss the bearing of the PA on the realism/antirealism debate in the next section. This way of relating the PA to the foundations of statistical inference and the realism/antirealism debate is crucial for him because he is one of the few epistemologists who bring those two issues together under one principle. The purpose of this section is to address what he might have meant by the PA.

Consider first the two consequences of the PA.

The Principle of Actualism's Evidential Equivalence Condition for Two Experiments (PAT): Two experiments, E1 and E2, provide equal evidential support for a hypothesis, that is, the parameter in question, if and only if their likelihood functions are proportional to each other as functions of the hypothesis, and therefore, any inference about the hypothesis based on these experiments should be identical.

The Principle of Actualism's Evidential Significance Condition for a Single Experiment (PAS): If two experiments, E1 and E2, are available for the hypothesis (i.e., the parameter in question) and one of the two experiments is selected randomly, then the resulting inference about the hypothesis should depend only on the selected (actually performed) experiment and the evidence received from that experiment alone.

The notion of an experiment is key to understanding both consequences of the PA. Usually the notion of an experiment involves causing something to happen (perhaps making a measurement) and observing the result. Here, an experiment is taken to mean making an observation, usually under known or reproducible circumstances. Consider an example of the PAT to see how this principle works.

3 Evans (2013) has recently questioned whether the likelihood principle is at all a "must" for a Bayesian. He argues that it is completely irrelevant to Bayesianism.

4 Hereafter we drop the qualifier "epistemological" when we address the PA, unless otherwise stated. We return to this usage when we distinguish the epistemological features of the PA from its decision-based feature in Sect. 5.


Suppose a biologist asks a statistical consultant to evaluate evidence that the true proportion of birds in a particular population that are male is larger than ½. The biologist emails the statistician that 20 out of 30 birds obtained were male but provides no further information about how the birds were collected. If no additional data are available on the experiment, what could she infer about the hypothesis that the true proportion, θ, is equal to ½? We call this initial hypothesis, θ = ½, a constrained hypothesis because the parameter value is set at θ = ½. Its contrasting hypothesis could be that the true proportion is larger than ½, that is, θ > ½. This contrasting hypothesis will be called the unconstrained hypothesis. Both hypotheses are taken to be simple statistical hypotheses. We investigate this kind of inductive problem with the help of our background knowledge about the problem in question together with the available data, to see whether the hypotheses posited might be able to account for those data. In fact, the investigator realizes that two types of probability models, relying on two types of experiments, could be proposed to explain the data. In this scenario, the experiment E1 consists of randomly selecting 30 birds from the population and then determining 20 to be male and 10 to be female; E1 might have led to those data. The Binomial probability model for E1 would then be Bin(30, θ). Or the experimental set-up could be like E2, which consists of collecting birds until one gets 10 females, so that it could be represented by a Negative Binomial model, NBin(10, θ). (For information about both Binomial and Negative Binomial models, see Berger 1985.) According to the PAT, the two experiments, E1 and E2, modeled by two distinct probability models, provide identical evidential support for the hypothesis that θ = ½, because the likelihood function under both models is proportional to θ^20(1 − θ)^10. The likelihood function provides an answer to the question, "how likely are the data given the model?" Therefore, and most importantly, by the PAT the inference about the parameter θ should be identical for both experiments. The PAT is a version of the PA because the evidence at hand is actual evidence, which, according to the PA, is all that is relevant for making the inference about the hypothesis.

Consider the PA's Evidential Significance Condition for a single experiment (PAS). Suppose that in another setting an investigator would like to know whether the true proportion of males in a population of birds is ½. She is considering two experiments, E1 and E2, to see whether they could give her a verdict about the correctness of the hypothesis. As before, E1 consists of collecting a predetermined 30 birds and enumerating how many males the investigator observes. E2 consists of sampling birds until the investigator observes 10 female birds, which implies that E2 could in principle continue forever. Suppose the investigator randomly got to conduct E1 and found 20 males. The PAS states that the inference about the correctness of the hypothesis depends solely on the experiment actually selected and performed and the data received from it. The data of the actually performed experiment, and not the data the other experiment could have yielded, are evidentially relevant for examining the credibility of the hypothesis in question.

There is a further elaboration of the PAS in terms of the selected, actually performed experiment. It says that had the actual experiment been chosen nonrandomly from the very beginning and resulted in the same observation as the randomly selected experiment, then both actually performed experiments would have provided equal evidential support for the parameter θ.


This elaboration, at first blush, seems to bring the PAS closer to the PAT. However, one needs to be careful: the PAS refers to two instances of the same type of experiment, whereas the PAT refers to two types of experiments. If, for example, the experiment in question is Binomial, then in the case of the PAS both are instances of the same Binomial experiment. In contrast, in the case of the PAT, if one is a Binomial experiment then the other can't be a Binomial experiment, and conversely, if one is a Negative Binomial experiment then the other can't be a Negative Binomial experiment.

Sober accepts the PA and also seems to accept at least a combination of both of its consequences, as we shall see in Sect. 3. Ellery Eells (1993) and Ian Hacking (1965) also accept the PA in something like the two consequences already discussed. However, the connection between the two consequences of the PA and the PA itself needs to be made clearer. The latter has been stated so far without reference to any experiment, while the former have been stated with reference to experiments (see the quote from Sober about the PA at the beginning of Sect. 2). To connect the two consequences to the Principle of Actualism, as does Sober, we reformulate the PA as follows:

(PA): We should form our judgment about the correctness of hypotheses based on evidence we gather from any actually performed experiment/test; possible but nonfactual evidence from any yet to be performed experiment/test does not count.

Here, "test" could mean a host of things, ranging from a diagnostic test or a statistical test, to non-statistical tests like tests to see whether some chemical reaction takes place, to tests of high-level theories. However, there are disagreements regarding both consequences of the PA and the epistemological stance that accompanies it. Many statisticians who work within the frequentist interpretation of probability tend to disagree as to the value of these consequences of the PA in statistical inference. The focal point of the frequency interpretation is the idealized repetition of an experiment, in which the probability of an event is construed as its long-run relative frequency. The framework of classical statistics is one example based on this frequency interpretation. The Akaikean Information Criterion (AIC) framework also rests on this interpretation. We will return to it in Sect. 4. According to classical statistics, a proper classical treatment of the two models, Binomial and Negative Binomial, can lead to different answers when we are concerned with finding evidence that the proportion of male birds is greater than ½. As a result, the PAT, which treats both models as having equal evidential support, is rejected in classical statistics. In the same vein, classical statisticians argue for averaging over all the possible samples. This is especially important for the experiment E2, which, if repeated, could have yielded a different number of birds for the investigator than what she has in fact gathered. The mere possibility of different outcomes from E2 has led classical statisticians to question the legitimacy of the PAS. So they also reject the PAS. They contend that possible data need to be incorporated in performing statistical inference. Since the refrain of their contention captures a single theme applicable to both consequences of the PA, we will confine ourselves to their epistemological stance toward the PAT.


The principal motivation behind classical frequentist statistics is to produce a statistic which does not depend on the nature of the hypothesis (i.e., the parameter θ) or any prior knowledge about θ. Statisticians of that stripe have proposed instead a procedure p(x), some criterion function of (θ, p(x)), and a number R such that repeated use of p would yield an average long-run performance of R. The idea behind this is that if we use p repeatedly on a series of independent random samples where the hypothesis about θ is true, then it is easy to show that, with probability one, this procedure on repeated use will reject the hypothesis 100R % of the time. The term "R" captures the idea of a "significance level" within classical statistics. When data are gathered to check whether a hypothesis is correct, the investigator within this framework is typically looking for a significant result. The latter means that she has found something out of the ordinary relative to the hypothesis, which she is willing to reject based on this significant result. A statistically significant result is one that would have a very small probability of happening by chance.

Given this difference between classical statistics sympathizers and PAT sympathizers, we are able to explain why a classical treatment of the two models can lead to different answers with regard to the first consequence of the PA. Recall that when we computed the likelihood functions under the two models, Binomial and Negative Binomial, we computed the likelihood function under each model relative to the observed data, Y = 20. However, the significance calculations involve not just the observed Y = 20, but also, for example, the potentially observed "more extreme" Y ≥ 21, assessing the rarity of the observed result by incorporating the probability of unobserved results that contain similar or stronger evidence against the hypothesis. Since classical p values take into account those extreme or more extreme results as possible results, classical statistics violates the PA. So the significance level of Y = 20 for the hypothesis that the true proportion of male birds is greater than ½, i.e., against θ = ½, would be the probability of getting between 20 and 30 male birds out of the 30 sampled:

Pr(Y ≥ 20) = f(20 | θ = 1/2) + f(21 | θ = 1/2) + f(22 | θ = 1/2) + ··· + f(30 | θ = 1/2) = 0.049.

Likewise, for the Negative Binomial model, the significance level for Y = 20 when θ = ½ would be the probability of getting 20 or more male birds before obtaining 10 female birds:

Pr(Y ≥ 20) = f(20 | θ = 1/2) + f(21 | θ = 1/2) + f(22 | θ = 1/2) + f(23 | θ = 1/2) + ··· = 0.031.

While the motivating force for classical statistics involves averaging performance over all possible data even when it is known which data occurred, the motivating force for the PAT is the performance of p(x) for the actual data observed in a given experiment; the overall performance of the procedure p is considered to be of secondary importance.
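The two tail probabilities above are easy to reproduce. Here is a minimal Python sketch (our illustration, using scipy's standard Binomial and Negative Binomial distributions) that recovers the 0.049 and 0.031 reported in the text:

```python
from scipy.stats import binom, nbinom

# E1: 30 birds sampled, 20 males observed.
# p value = Pr(Y >= 20) under Bin(30, 1/2); sf(19) gives Pr(Y > 19).
p_e1 = binom.sf(19, 30, 0.5)

# E2: sample until 10 females are seen; Y counts the males drawn meanwhile.
# scipy's nbinom counts "failures" (males) before the 10th "success" (female).
p_e2 = nbinom.sf(19, 10, 0.5)

print(f"E1 (Binomial)          p = {p_e1:.3f}")   # 0.049
print(f"E2 (Negative Binomial) p = {p_e2:.3f}")   # 0.031
```

The same observed data thus receive different p values depending on which sampling scheme is assumed, which is precisely the classical violation of the PAT.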



In our example, the likelihood function for E1 is f₁(20 | θ) = C(30, 20) θ^20 (1 − θ)^10, and for E2 it is f₂(20 | θ) = C(29, 9) θ^20 (1 − θ)^10, where C(n, k) denotes the binomial coefficient "n choose k." Under both E1 and E2, there are two different likelihoods we can calculate: the likelihood of the data given that θ = ½, and another where θ is estimated via maximum likelihood estimation (MLE). Then, for E1 or E2, the ratio of these two likelihoods generates the likelihood ratio, which is a measure of the relative evidential support for one hypothesis over its rival. It is based only on the actual observations, so it does not violate the PA. The likelihood ratio is identical under E1 and E2, yielding a value of 5.5 in our example, which shows the PAT at work. The PAT says that two models based on two experiments provide the same evidential support for a hypothesis if and only if the likelihoods of these models are proportional.

Table 1 provides the four likelihoods and the resulting likelihood ratios for each probability model, along with the classical p values under the different models. The need for four likelihoods is as follows. We are considering two probability models, the Binomial model and the Negative Binomial model. Under each probability model, we are considering two hypotheses. For example, under the Binomial model we are considering two hypotheses: one is that the true proportion is θ > ½ and the second is that θ = ½. Under the Negative Binomial model we are considering the same two hypotheses. Column 2 gives the classical p value, which violates the PA, whereas columns 3 and 4 give likelihoods and likelihood ratios, which respect the PA. The likelihoods in column 3 differ between models, while still respecting the PA, because E1 and E2 lead to different likelihood functions, being based on the two distinct probability models discussed above. However, the likelihood ratio under E1 or E2 in column 4 is the same. According to the likelihood approach, the ratio is what matters in measuring evidence regarding the hypotheses.

One gets a conflicting recommendation regarding the amount of evidential strength against the hypothesis in question, θ = ½, depending on whether one disobeys or obeys the PA. The classical statistics framework using p values disobeys the PA. It rejects θ = ½ as the correct hypothesis under both the Binomial and the Negative Binomial models, since the p values are "small" in both cases. However, the evidence against the initial hypothesis, θ = ½, as represented by p values, is unequal under the two experiments: 0.049 under E1 and 0.031 under E2. In contrast, the likelihood ratios, which obey the PA, provide equal evidential support against θ = ½ under E1 or E2. The likelihood ratio for the hypothesis θ > ½ versus the hypothesis θ = ½ provides 5.5 times more evidential support for θ > ½ under either experiment. So we find a conflicting recommendation regarding the amount of evidence against the initial hypothesis under E1 or E2 depending on whether we reject or accept the PA.

So far, we have discussed the PA together with its two consequences and their application in statistical inference.


Table 1 Summary of results under E1 or E2 when PA is obeyed/disobeyed

Models                   Classical p value   Likelihood   Likelihood ratio
E1: Binomial             0.049               0.153        5.5 times
E1 if θ = ½              –                   0.028
E2: Negative Binomial    0.031               0.051        5.5 times
E2 if θ = ½              –                   0.009
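The entries in Table 1 can be checked directly. The following Python sketch (our illustration) computes the four likelihoods from the two likelihood functions given above and forms the two likelihood ratios:

```python
from math import comb

y, n, r = 20, 30, 10   # 20 males observed; E1 fixes n = 30 birds, E2 fixes r = 10 females
mle = y / n            # unconstrained maximum likelihood estimate, 2/3

def lik_e1(theta):
    # E1, Binomial: f1(20 | theta) = C(30, 20) * theta^20 * (1 - theta)^10
    return comb(n, y) * theta**y * (1 - theta)**(n - y)

def lik_e2(theta):
    # E2, Negative Binomial: f2(20 | theta) = C(29, 9) * theta^20 * (1 - theta)^10
    return comb(y + r - 1, r - 1) * theta**y * (1 - theta)**r

for name, lik in [("E1", lik_e1), ("E2", lik_e2)]:
    print(name, f"L(mle) = {lik(mle):.3f}",
          f"L(1/2) = {lik(0.5):.3f}",
          f"ratio = {lik(mle) / lik(0.5):.1f}")
# E1: L(mle) = 0.153, L(1/2) = 0.028, ratio = 5.5
# E2: L(mle) = 0.051, L(1/2) = 0.009, ratio = 5.5
```

The two likelihood functions differ only by the constant C(30, 20) versus C(29, 9), which cancels in each ratio; this proportionality is exactly what the PAT turns on.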

It was in this context that Sober first introduced the principle. We now turn to his discussion of the more "philosophical" aspects of the principle.

3 Sober's Empiricism and PA

Sober's empiricism can be made precise by relating it to the realist/antirealist debate, especially by way of his critique of Bas van Fraassen's "constructive empiricism." Sober makes abundantly clear in his formulation of the PA that the judgments one could form about a hypothesis could be judgments about its truth, its empirical adequacy, or its probability. For example, while a realist contends that the goal of science is to arrive at true theories, an antirealist/empiricist contends that the goal is to arrive at empirically adequate theories. A theory is empirically adequate just in case all of its observable consequences are true. X is observable if and only if there is a location such that, were a human at that location in appropriate conditions and circumstances for observing, the human would observe X.5 For an antirealist/empiricist, observation is "an unaided act of perception." Antirealists take observable things/beings to have a more secure epistemological status than unobservables, because observables provide us reason to believe claims about them to be true, a status unobservables lack. Sober insists that the PA, however, does not discriminate between unobservables and observables. This principle, he argues, does not imply that our knowledge about top quarks or the double helix is less secure than our knowledge about cars and houses merely because the former are unobservable whereas the latter are observable. As a result, he thinks that the epistemological significance of this principle is that, whether one is an empiricist or a realist, one could accept it without ado. He concludes that the principle is ontology-independent.

The position of antirealism outlined above is the well-known position of van Fraassen. The latter, who is both an empiricist and an antirealist, assigns a special weight to what we can observe, or what are usually called "observables."6

5 How to spell out "observable," and the terms employed to define it, have generated a great deal of discussion in recent philosophy of science. To appreciate the complex nature of modal arguments in this context, here are some references (by no means exhaustive) to this literature. Ladyman thinks that there is a tension in constructive empiricism in trying to incorporate both modality and observability (Ladyman 2000). In contrast, Hanna has defended constructive empiricism against Ladyman's objection (see Hanna 2004). For the question of observability, see also van Fraassen (1980, 1989). For a critique of van Fraassen's work, see Kitcher (1993, 2001).

6 See footnotes 5 and 7 for the question of observability.


The moons of Jupiter, dinosaurs, and distant stars are observable, but microbes are not. According to him, the former counterfactual possibility (i.e., being able to observe the moons of Jupiter and the like) is epistemologically relevant, whereas the latter (i.e., being unable to observe microbes and the like) is not. For example, had we been in front of the moons of Jupiter, we could have observed them without any aid to our vision. However, had we been in front of microbes, he contends, we could not have observed them without instruments. So what we can or cannot observe plays a crucial role in van Fraassen's antirealist empiricism.

Unlike the constructive empiricist, Sober argues that there should be an asymmetry between what things are seen and what are not (and not between what is seeable and what is not), because our knowledge about things unseen is both mediated and understood by our knowledge about things we see. As a result, he thinks that what we actually observe has a special status for us. The PA captures that epistemological status of actual data. However, he argues that constructive empiricism violates the PA. Given this construal of constructive empiricism, Sober assumes that it accepts the principle which we will call the Principle of Possible Evidence (or, in short, PPE).7

(PPE): If one's actual evidence for the hypothesis H2 is about as good as one's actual evidence for another hypothesis H1, and it is possible to have better evidence for H1 but not for H2, then we should form the judgment about the correctness of H1 solely based on possible, nonfactual evidence.8

He uses the following example to make his point that the PPE is false. Consider a doctor making a diagnosis about whether a patient has diabetes (H2) or smallpox (H1). Two laboratory tests are actually carried out, and their results contain both "positive" and "negative" outcomes. As is usually the case, both tests have error probabilities associated with them. Assume that both tests are equally reliable: if a positive outcome on the first test strongly supports the hypothesis that the patient suffers from diabetes (H2), then a positive outcome on the second test strongly supports the hypothesis that the patient suffers from smallpox (H1), and conversely. The same assumption holds with respect to negative outcomes. Suppose, counterfactually, that there is an infallible third test that detects correctly whether the patient suffers from diabetes (H2). It is crucial for Sober's argument to assume that there is no such infallible test for smallpox.

7 One might worry whether anyone, including van Fraassen, would ever disagree with Sober regarding the implications of the following example. In an email communication, Joseph Hanna wrote to one of the authors of the paper that a theory is empirically adequate, according to van Fraassen, if and only if it saves all the actual observables, past, present, and future. So empirical adequacy depends on all sorts of "observables" no one has ever, in fact, observed. But these observables are actual events that occur in the history of the world. On the other hand, the actual status of a theory depends only on actual observations. So Hanna concludes that there is nothing in van Fraassen's empiricism that violates the PA, nor anything that respects the PPE (see Hanna 2004 for his stance toward constructive empiricism). We sympathize with Hanna, and with those who might share his concern that Sober errs in attributing the PPE to van Fraassen. However, we are not interested in whether Sober is right in thinking that van Fraassen's position violates the PA. We are only interested in the kind of argument Sober employs to defend the principle.

8 One worry against Sober's work here is that "the notion of evidence" being possible "is slippery here." To do full justice to this worry we refer the reader to footnote 9, as the ensuing discussion will help in understanding both this point and our stance toward Sober's view.


Such a possible test for diabetes would apparently have evidential significance for someone like van Fraassen; it tells us what would be the case if certain observations were made. But our supposition is that there is no such test, as it has not been carried out.9 The mere possibility of a third test, albeit an infallible one, has no import if it is not carried out. Therefore, the counterfactual that there exists a possible infallible third test for diabetes is, for Sober, in this sense empty, because only actual tests count. But this is just his short-hand way of rejecting the PPE and thus linking the principle of actualism with empiricism properly construed.

For our purposes it does not matter whether van Fraassen himself is committed, as Sober thinks, to the significance of (merely) possible evidence or, in fact, whether this particular counterexample to "constructive empiricism" is very telling. The general point is that "observable" ordinarily means something like "capable of being observed." If the emphasis is placed on observability, then it would seem to follow that possible as well as actual observations should be taken into epistemic account. But Sober thinks that this is deeply counterintuitive. There are cases, such as the one he imagines, in which there is a possible test procedure that does not in fact exist. That it does not exist entails that it cannot supply evidence. Evidence has to do with actual observations. In fact, his rejection of constructive empiricism rests on the PA, because the outcomes of the first two tests in his example are outcomes of actually performed tests, whereas the third test is yet to be performed. So the latter's outcome turns out to be epistemologically irrelevant for evaluating its impact on the competing hypotheses.

9 This continues the previous footnote regarding why one might think that the notion of "possible" in the notion of evidence being possible is a "slippery slope." The worry is that if we imagine that there is not actually any test for X, then there is a sense in which evidence from the test is impossible. But such a non-actual test is still possible (in a broader sense), and so the evidence from it is possible. There is scope for ambiguity concerning the example of the test for X being possible in one sense and impossible in others. The reason is the question of what the variable "X" refers to. Does "X" refer to the (infallible) possible test for diabetes which has not been carried out, or to the test for smallpox which does not exist in Sober's example, yet is still possible in a broader sense? However, a charitable interpretation of the worry might be able to reveal its true spirit. It is likely that the imaginary critic who presents this worry means by "the test for X" a possible test for smallpox which does not exist at this point, but is still possible in some sense. We agree with the critic that although Sober's example clearly assumes that there is no such (infallible) test possible for smallpox, such a test for smallpox is undeniably "possible" in some broader sense. The relevant question, however, is "does such a test for X being possible pose a threat to Sober's criticism of constructive empiricism?" We have two comments here. First, if we allow such a test for X to be possible, then this will force us to revise the PPE into the Revised Principle of Possible Evidence (RPPE). According to the RPPE, if one's actual evidence for the hypothesis H2 is about as good as one's actual evidence for another hypothesis H1, and although it is possible to have evidence for both H1 and H2, it is only possible to have better evidence for H1 and not for H2, then we should form the judgment about the correctness of H1 solely based on the better possible, nonfactual evidence. However, the RPPE will still be regarded by Sober as dubious, as it violates the PA. In short, this revision complicates the PPE without disputing his rejection of the principle. Our second comment has to do with Sober's own example (see Sober 1993). Since we would like to follow him in this regard, we continue to stick to his usage by saying that there is a possible infallible test for diabetes, but there is no such "possible" test for smallpox; consequently, the PPE is the principle to be investigated and not any of its variants. The point of this, along with footnote 7, is to discuss how Sober has construed the PPE and its relationship to the PA without trying to evaluate whether his argument against constructive empiricism is tenable.


Sober's counterexample contains some features of both consequences of the PA, the Evidential Significance Condition for a Single Experiment (PAS) and the Evidential Equivalence Condition for Two Experiments (PAT), discussed in Sect. 2. As in both consequences, the two relevant diagnostic tests, which share equal evidential strength, are actually performed tests, and what data we could have obtained from the third test is irrelevant to his counterexample. Like the PAS, his rejection exploits the idea of an actually performed test. In addition, like the PAT, both tests are evidentially equivalent.

4 Sober's AIC-Based Notion of Instrumentalism and PA

Sober discusses and motivates his brand of instrumentalism in connection with curve fitting. Malcolm Forster and Sober wrote an influential paper together on the curve fitting problem (Forster and Sober 1994).10 This problem arises when an investigator is confronted with optimizing two conflicting desiderata, (i) simplicity and (ii) goodness-of-fit, together.11 In this section, we will discuss their work on the curve fitting problem and then explain how their Akaikean Information Criterion (AIC) framework helps Sober develop his version of instrumentalism.

In general, with regard to the curve fitting problem, their goal is to measure the closeness of a family of curves to the truth. They use the Akaikean Information Criterion (AIC; Akaike 1973) to achieve this goal. Based on AIC, they propose an important measure of a curve's closeness to the truth in terms of the curve's predictive accuracy. AIC assumes that a true probability distribution exists that generates independent data points, Y1, Y2, Y3, …, YN. However, in most cases we have no way of knowing the true curve. We can only estimate parameters of the curve based on the data. Forster and Sober explain how an agent can predict future data from past data. First, an agent uses the available data (Y) to obtain the best-fitting estimates of the parameters of the true distribution, e.g., to find the maximum likelihood estimator (MLE) within the distribution family. This best-fitting member of the family is then used to approximate what future data will look like. The question is how well the curve in the family that best fits the data will perform in predicting future data. A family might fare well in this two-stage process (i.e., (i) finding optimal parameter estimates and (ii) predicting future data) on one occasion, but fail to do so on another. The estimated predictive accuracy of a family depends on how well it would do on average, were the process repeated again and again.
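The repetition at issue can be made concrete with a small simulation, sketched below in Python. The setting is our Binomial bird example rather than curve fitting proper, and the "true" proportion of 0.6 is a hypothetical choice of ours: fit the family on one sample, score the fitted member on a fresh sample, and average over many such repetitions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta_true, reps = 30, 0.6, 100_000   # hypothetical true proportion 0.6

# Stage (i): fit the family to one sample via the MLE y/n.
y_fit = rng.binomial(n, theta_true, reps)
theta_hat = np.clip(y_fit, 1, n - 1) / n   # clip boundary MLEs to avoid log(0)

# Stage (ii): score the fitted member on a fresh, merely possible sample.
# (The binomial coefficient term is omitted; it is the same for every
# candidate value of theta and so cancels in comparisons.)
y_new = rng.binomial(n, theta_true, reps)
loglik = y_new * np.log(theta_hat) + (n - y_new) * np.log(1 - theta_hat)

# Averaging over repetitions estimates the family's predictive accuracy.
print(f"mean predictive log-likelihood: {loglik.mean():.3f}")
```

On any single repetition the fitted member may predict well or badly; the family's predictive accuracy is the average over all such repetitions, which is why merely possible data enter its very definition.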

10 For a Bayesian approach to the curve-fitting problem, see Bandyopadhyay et al. (1996), Bandyopadhyay and Boik (1999), Bandyopadhyay and Brittan (2001), and Bandyopadhyay (2007).

11 See Miller (1987) for a critique of Sober's account of simplicity. Miller argues that one needs to incorporate causal simplicity in theory choice. In this case, as in many others, the formal simplicity which Sober has advanced must be distinguished from causal simplicity. As Richard Miller (1987, p. 247) reminds us, "by adding a variety of novel propositions, without any corresponding simplification, evolutionary theory reduces the formal simplicity of science. [But] An enormous gain in causal simplicity results." According to Miller, the regularities that we observe in the variation of species, for example, have causal explanations only when the evolutionary explanations are added.


One goal of AIC is to minimize the average (expected) distance between the true curve and its estimate; choosing the family of curves which minimizes the distance between the true distribution and the estimated distribution is equivalent to choosing the family of curves which maximizes predictive accuracy. AIC was designed to be an estimator of the expected Kullback–Leibler (KL; Kullback and Leibler 1951) distance between a fitted MLE candidate model and the model generating the data, which we consider the truth. The expected KL distance provides a measure of the distance, or divergence, from a model to the true distribution based on all possible data. Specifically, AIC can be considered an approximately unbiased estimator of the expected KL divergence which uses a maximum likelihood estimator (MLE) to create estimates for the parameters of the model. By an "unbiased estimator," we mean that in the long run the average or expected value of the estimator will equal the population parameter, with the long run being relative to repeated random samples obtained in the same fashion as our original sample.

Our two sampling methods lead to the probability models used in E1 and E2. If the researcher were to repeat the experiment, the number of male birds would likely vary, possibly providing 21 male birds out of the total of 30 in E1, or 21 male birds out of 31 in E2. These realizations of the sampling process would provide similar but not identical evidence against the hypothesis as in our original sample under both models. Another possible realization of each sampling process could provide 19 out of 30 for E1 and 19 out of 29 for E2, but each case represents just one possible realization from all the possible realizations which AIC averages across. Using AIC, the investigator attempts to select the model with minimum expected KL distance across these repeated, equally possible, random samples, based on a single observed random sample. Models with smaller values of AIC are better than models with larger values of AIC. Akaike proved that maximizing the predictive accuracy of a hypothesis is asymptotically equivalent to choosing the hypothesis that minimizes the quantity

AIC = −2 log(L̂) + k · s.

Here "k" represents the number of parameters and "s" is a constant to be multiplied by the number of parameters; usually "s" equals 2. The likelihood, represented by L̂, is simply the probability of a particular outcome given the maximum likelihood estimates of the model parameters.

Five comments are in order. First, the K-L distance is a crucial concept in the AIC framework. Akaike suggests that we search for the model that minimizes the average KL information over all possible realizations of Y, which, up to a constant that does not depend on the candidate model, reduces to

E_Y(KL(g, f̂_i)) = E_Y[ −∫ ln(f_i(z | θ̂)) g(z) dz ],

where g(z) is the unknown true distribution and f_i(z | θ̂) is an approximating distribution (more details are available in Boik 2004). This model will then have the minimum expected KL distance, and hence the maximum estimated predictive accuracy, across repeated realizations of the random sample from the population. This is an advantage of the AIC approach in that we attempt to select a model that has this property on average (Burnham and Anderson 1998).
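To see exactly where merely possible data enter, the expectation E_Y can be written out for the Binomial example: it is a weighted sum over every sample y the experiment could have produced, with the MLE refit for each such y. The following Python sketch (our illustration; the "true" θ = 0.6 is hypothetical, and boundary MLEs are clipped to keep the divergence finite) enumerates that sum:

```python
import numpy as np
from scipy.stats import binom

n, theta_true = 30, 0.6            # hypothetical true data-generating model
z = np.arange(n + 1)               # every possible value of a future observation
g = binom.pmf(z, n, theta_true)    # true distribution g(z)

def kl_to_truth(theta_fit):
    """KL divergence between the truth g and the fitted Bin(n, theta_fit)."""
    f = binom.pmf(z, n, theta_fit)
    return float(np.sum(g * (np.log(g) - np.log(f))))

# Expected KL for the unconstrained family: a sum over all possible samples y,
# each weighted by its probability under the truth, with the MLE refit each time.
expected_kl = sum(binom.pmf(y, n, theta_true) * kl_to_truth(np.clip(y, 1, n - 1) / n)
                  for y in range(n + 1))
print(f"E_Y[KL(g, f_thetahat)] = {expected_kl:.4f}")

# The constrained family (theta fixed at 1/2) involves no refitting, but its
# expected KL is likewise a property of the family, not of any one actual sample.
print(f"KL(g, f_1/2)           = {kl_to_truth(0.5):.4f}")
```

Every term in the first sum corresponds to a sample the investigator might have drawn but did not; this is the sense in which the quantity AIC estimates is defined over unobserved, possible data.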


Our second comment follows from the first. Any AIC-based framework violates the PA, because AIC rests on unobserved, possible data that could be generated by repeated samplings. The average K-L distance is an average across all possible realizations of the data, and it is this reliance on merely possible realizations that results in a violation of the PA. Although we have paraphrased Sober's stance toward the PA while discussing the AIC framework, it is appropriate to quote him at this point. He diagnoses correctly that the AIC framework violates the PA when he writes:

    A family might do well in this two stage [the first stage is to use the available data to get the best-fitting member of the family and the second is to use this best-fitting member of the family to predict what new data will look like] prediction task on one occasion, but not so well on another. Intuitively speaking, the predictive accuracy of a family is how well it would perform on average, were this two-step processes repeated again and again. (Sober 1996)

Consider the last sentence in the quote. Sober wrote how well a family "would perform" on average, not how well it did perform. His use of the subjunctive conditional, under any standard account, entails possible cases for its analysis.

Third, it is important to understand what the expectation is taken with respect to, in order to discern a potential violation of the PA. Taking an expectation does not by itself violate the PA. For example, we could calculate a mean using an expectation based on a completely enumerated population, without relying on unobserved data. Suppose that we know that in a population of 1,000 birds there are 600 males. Then the expected value for a single randomly selected bird is 0.6. This demonstrates a use of expected value that does not violate the PA, since we have observed the entire population from which each possible random sample can be taken. The expectation used in the derivation of the expected KL distance, however, averages over all possible samples, even though we know we will only observe a single random sample, thus violating the PA. To be precise, the AIC-based framework violates the PA because of its goal, which is to find the expected K-L distance.

Fourth, consider Sober's instrumentalism as a consequence of his AIC-based framework of scientific inference. To understand Sober's new form of instrumentalism, his view can be contrasted with the traditional conception of instrumentalism. Traditionally, instrumentalism stands for the position that theories are just instruments for making predictions; they are neither true nor false about the world. For him, however, instrumentalism stands for two things: (i) theories have truth-values, and in fact, to be more specific, the key theories of science are all false, and (ii) (estimated) predictive accuracy is the primary consideration for science. He thinks that the AIC provides a mathematical framework for capturing these themes of his new version of instrumentalism. According to him, "real" scientists, contra realists, are not always interested in the truth; rather, they are interested in maximizing estimated predictive accuracy. His view about models being false has been echoed by many working scientists; in the words of George Box, "all models are wrong" (Box 1979). Sober's paradigmatic example in this regard is the scientist's use of the null hypothesis, which represents the hypothesis of "no difference."


If the purpose of science, he argues, is to find the truth, then scientists should not have employed the null hypothesis, which is known to be false from the very moment of its construction.12 However, scientists use it routinely. AIC provides a mathematical background for instrumentalism about the use of null hypotheses.13

AIC can be used to compare the hypothesis where θ = ½ with the situation where θ is estimated via MLE to be 20/30 ≈ 0.67 for either E1 or E2. In both experiments, AIC suggests that estimating θ (the hypothesis θ > ½) yields a smaller AIC value, and hence greater estimated predictive accuracy, than assuming the hypothesis θ = ½. Column 2 of Table 2 shows the AIC values under model E1 (Binomial) with θ > ½ versus with θ = ½, and under E2 (Negative Binomial) with θ > ½ versus with θ = ½. They are 5.75 under E1 with θ > ½ versus 7.15 under E1 with θ = ½, and, in the same column of Table 2, 7.95 under E2 with θ > ½ versus 9.35 under E2 with θ = ½. In both E1 and E2, the difference in AIC values is 1.4, which is the "ΔAIC" for the constrained model under E1 or E2. The ΔAIC is the difference between each calculated AIC and the minimum AIC in the set of models. For example, the ΔAIC for the model under E1 with θ > ½ is 5.75 − 5.75 = 0, where the first number stands for its AIC value and the second for the minimum AIC value. The interesting feature of the application of the ΔAIC, however, is the evidential support in the pairwise comparison between E1 (Binomial) with θ > ½ versus θ = ½, and E2 (Negative Binomial) with θ > ½ versus θ = ½. In each pairwise comparison, the ΔAIC is 1.4. This identical difference suggests the same evidential support between the two models under each experimental setup, which satisfies the PA's evidential equivalence condition for two experiments (PAT) and seems to suggest that AIC satisfies the PAT.

In this regard, the likelihood and AIC frameworks seem to share a similar feature regarding the evidential support between two competing models. If the likelihood ratio provides the same result in either experiment, then the likelihoods are proportional and, according to the PAT, both experiments provide equal evidential support for the parameter in question. In the same vein, when ΔAIC comparisons are made between candidates with proportional likelihoods and the same difference in the number of estimated parameters between the constrained and unconstrained models, the AIC provides the same evidential support regardless of the experiment that is assumed to have been conducted (see the "Appendix" for a proof of this result). Here, both unconstrained models have one estimated parameter (θ) while the constrained models have none, since θ is assumed to be ½, and the difference in the number of estimated parameters between the constrained and unconstrained models is the same in both experiments. Since the ΔAIC provides the same comparisons here regardless of whether E1 or E2 was conducted, it appears to satisfy the PA in the same way that the likelihood ratio satisfies the PA.

12 Whether Sober's view is correct here goes beyond the objective of this paper.

13 For his recent position on the AIC, see Sober (2008).


Table 2 AIC values under E1 or E2

Models                   AIC values   ΔAIC within E1 and E2
E1: Binomial             5.75         0
E1 if θ = ½              7.15         1.4
E2: Negative Binomial    7.95         0
E2 if θ = ½              9.35         1.4
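Table 2's entries follow directly from the AIC formula and the likelihoods computed earlier; the Python sketch below (our illustration) reproduces them, with k = 1 for the unconstrained models (θ estimated) and k = 0 for the constrained ones (θ fixed at ½):

```python
from math import comb, log

y, n, r = 20, 30, 10
candidates = {
    "E1: Binomial (theta = MLE)":      (comb(n, y) * (2/3)**y * (1/3)**(n - y), 1),
    "E1 if theta = 1/2":               (comb(n, y) * 0.5**n, 0),
    "E2: Neg. Binomial (theta = MLE)": (comb(y + r - 1, r - 1) * (2/3)**y * (1/3)**r, 1),
    "E2 if theta = 1/2":               (comb(y + r - 1, r - 1) * 0.5**(y + r), 0),
}
for name, (lik, k) in candidates.items():
    aic = -2 * log(lik) + 2 * k   # AIC = -2 log(L-hat) + k*s, with s = 2
    print(f"{name}: AIC = {aic:.2f}")
# E1: 5.75 vs 7.15 (delta AIC = 1.4); E2: 7.95 vs 9.35 (delta AIC = 1.4)
```

Within either experiment the binomial coefficient is the same for the constrained and unconstrained candidates, so it cancels in each ΔAIC; that is the "mathematical maneuver" discussed next.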

The apparent similarity between the likelihood ratio and the AIC framework in seeming to obey the PA stems from a mathematical maneuver within the AIC framework. When we compare AIC values within E1 or E2, the constant cancels out, leading to equivalent evidential support for θ in either experiment (see the "Appendix" for this proof). However, it is in the goal of the AIC, and not in some mathematical maneuver, that we should find the justification for the use of the AIC framework. The goal of the AIC is to estimate the expected K-L distance, which invokes all possible realizations of the data. Table 2 captures this in a unique way, although it is not apparent at first sight. In Table 2, the AIC values in column 2 are calculated relative to a true model; each calculated AIC value marks a relative distance from that true model, showing an estimated expected KL distance for each model. Other than providing distances to the true model, the magnitudes of the AIC values signify very little. We have already discussed that the estimated predictive accuracy rests on the estimate of the expected KL distance, and that the use of the expected KL distance violates the PA by invoking all possible realizations of the data.

The last column in Table 2 shows the ΔAIC values compared either under E1 or under E2. These ΔAIC values are not relative distance measures from the truth. They are differences in estimated distance to the truth between the two models within E1 or within E2. In column 3, the ΔAIC values at θ = ½, whether under E1 or E2, provide the same evidential support for the parameter, giving the impression that AIC satisfies the PA. However, ΔAIC is not theoretically different from AIC. Consequently, the computation of ΔAIC values has to begin with the same assumptions as were used to get the AIC values in the first place; so the goal of AIC, to estimate the expected K-L distance, is again preeminent. So whether we calculate AIC or ΔAICs, the goal of the AIC framework always stands, explicitly or implicitly, behind our calculations. In its goal as an estimator of the expected KL distance, the AIC framework violates the PA. Reviewing the formula for the AIC, one can see that its calculation also involves the number of estimated parameters along with the likelihood; the AIC is more than just the likelihood, the extra term correcting the bias in using the likelihood alone to estimate the expected KL distance.

One can also appreciate how the AIC framework violates the PA by considering the notion of expected predictive accuracy that lies at the heart of Sober's AIC-based instrumentalism. As an instrumentalist, he contends that the notion of expected predictive accuracy is the goal of science. The target of expected predictive accuracy rests on all possible realizations of the data, which, in turn, leads to his brand of instrumentalism.


Thus, for him, instrumentalism follows from the AIC framework, in which unobserved possible data play a crucial role.

Fifth, and finally, noting that AIC violates the PA need not be counted as a criticism of the use of AIC. AIC-based frameworks have been successfully and widely used in building models from data, with the AIC providing a means of comparing models in terms of the support for the different models while also avoiding various issues associated with the p value. However, it is important to acknowledge the implications of the theoretical justification for the method one uses. The violation of the PA is one such implication of AIC.

5 A Conflicting Methodological Imperative and a Possible Objection

In this section, we will discuss two points. First, we summarize our argument, which we brand as Sober's conflicting methodological imperative. Second, we evaluate an objection to our argument.

Consider the first point. Sober's two accounts, empiricism and instrumentalism, pull in opposite directions. His account of empiricism exploits the PA, whereas his AIC-based account of instrumentalism violates it. Further evidence of this tension emerges if one combines Sober's instrumentalism with his empiricism. Recall the earlier example involving two experiments, E1 and E2, in which an agent evaluates whether the proportion of males and females is the same in a population of birds. According to the example, E1 consists of sampling 30 birds, whereas E2 consists of sampling birds until one gets 10 females. In the example, we obtained 20 male birds as our data. We argued that an agent could have performed both E1 and E2 successively, and found 20 males out of 30 birds in both cases; however, we don't know which experiment she conducted. If the agent accepts the PA, then she is bound to say that E1 and E2 are on the same epistemological footing regarding the true proportion, whether she decides to conduct E1 or E2. The rationale for this is that 20 males out of 30 birds are actual, observed data. However, she could also reject the PA on the grounds that she could have gotten outcomes in E2 different from what she actually obtained. Therefore, she might conclude that E1 and E2 are not necessarily on the same epistemological footing regarding the probability of the hypothesis being true. From a broader philosophical perspective, as with the agent in our example, this joint acceptance and rejection of the PA generates a tension in Sober's philosophy. This tension is what we call "a conflicting methodological imperative" in his philosophy. Depending on which choice about the PA the agent makes, she is led to incompatible stances toward it; we explained in Sect. 4 why this tension does not surface in the selection of hypotheses in the two scenarios, and why a reader could consequently be misled into thinking that AIC in fact relies on the PA instead of violating it. This ends the summary of our argument for Sober's conflicting methodological imperative.

Consider now a possible objection to our argument. We argued that the AIC framework violates the PA. The objector, however, contends that this is a mistake, and that the AIC actually respects it. According to this objection, the PA needs to be taken, in the AIC framework, as a principle of hypothesis choice, a decision rule.


This is what we call the decision-based understanding of the PA, to be distinguished from the epistemic understanding broached in Sect. 2 as a criterion of evidence. The objector continues: "if there is a decision rule which suggests decisions that depend only on the available evidence, and if someone tries to justify the rule with an argument which refers to unobserved entities, this does not change the fact that the decisions suggested by the rule depend only on the available evidence."14 In our bird example, 20 male birds out of 30 birds are actual, observed data; we should make our hypotheses with respect to them, even though the spelling out and grounding of the AIC requires that we consider all possible realizations of data when we estimate the expected K-L distance. Thus, the objector has offered a decision-based contention that the AIC framework, after all, respects the PA, not in the estimation of the K-L distance (an epistemic consideration), but in the hypothesis we eventually should choose (a practical consideration).

Our response is that if this sort of response were taken seriously, it would have consequences reaching far beyond the plausibility of the AIC framework. Recall our earlier argument that classical statistics, when it involves the logic of significance testing, disobeys the PA. In our bird example, when we computed the likelihood functions under the two models, Binomial and Negative Binomial, we computed the likelihood functions under each model relative to the observed data, Y = 20. Significance testing involves our actual, observed data, a sample of 20 male birds out of 30, although the potentially observed "more extreme" results Y ≥ 21, Y ≥ 22, and so on enter the significance-testing computations. If the decision rule in this example suggests that a decision regarding the null hypothesis (i.e., θ = ½) ought to be made only on the available data, then it follows, according to this response, that the logic of significance testing in fact obeys the PA, although the logic of significance testing spells out and grounds the rule by reference to unobserved data. Otherwise stated, if the PA is recast as a "decision rule" in the way indicated, then the ongoing epistemic debate, between those who accept the PA and those who question it, concerning what to count as "evidence" should be regarded as pointless. From our point of view, the response goes forward simply by changing the subject, and in the process treating merely possible data instrumentally, as posits for the sake of clarification and explanation. One could, of course, claim that the considerable discussion among statisticians concerning the status of the PA has been beside the point, but this involves throwing out a great deal of bathwater just to save one's (controversial) baby.

However, a more charitable construal of the objection is possible, although it won't, in the final analysis, salvage the objection. The PA taken as a decision rule could be given (i) a narrow interpretation or (ii) a wide interpretation. According to the narrow interpretation, the decision regarding which hypothesis is to be chosen should be based on an investigator's stopping rule (e.g., to stop after sampling 30 birds, males and females together, or to stop whenever one obtains 10 females). The wide interpretation implies that the decision regarding which hypothesis is to be chosen should be based on available evidence. The objection makes use of the wide interpretation (ii), although, as we have just claimed, the only way to make sense of the contemporary debate about the status of the PA in the foundations of statistics is to construe it narrowly (i).

14 This objection was raised by a critic after reading an earlier version of the paper.
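To make the contrast concrete, here is a minimal Python sketch of the bird example (our illustration, assuming SciPy is available; none of the code comes from the authors under discussion). The likelihood comparison conditions only on the actual observation Y = 20, whereas the significance test's p-value, by construction, also sums the probabilities of the unobserved outcomes Y = 21, ..., 30:

```python
from scipy.stats import binom

n, y = 30, 20      # actual data: 20 males observed among 30 birds
theta0 = 0.5       # constrained (null) hypothesis: theta = 1/2

# Likelihood-based evidence uses only the actually observed y = 20.
lik_null = binom.pmf(y, n, theta0)    # L(theta = 0.5 | y = 20)
lik_mle = binom.pmf(y, n, y / n)      # L(theta-hat = 20/30 | y = 20)
print("likelihood ratio:", lik_mle / lik_null)   # roughly 5.5

# The significance test also invokes samples that were never observed:
# the "more extreme" outcomes y = 21, 22, ..., 30 under the null.
p_value = sum(binom.pmf(k, n, theta0) for k in range(y, n + 1))
print("one-sided p-value:", p_value)             # roughly 0.049
```

Only the p-value depends on merely possible data, which is precisely why, on our account, the logic of significance testing violates the PA.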


The objection makes use of the wide interpretation (ii), although, as we have just claimed, the only way to make sense of the contemporary debate about the status of the PA in the foundations of statistics is to construe it narrowly (i). Both interpretations assume that the PA consists of a decision rule together with a theoretical justification for that rule. On the narrow interpretation, the rationale for trusting the decision rule is rooted in the PA itself, whereas the wide interpretation takes the justification for the decision rule to fall back on the expected K-L distance, which invokes both actual and possible data.

Consider our bird example to see how the two interpretations work. On both the narrow and the wide interpretation, the decision rule functions in the same way: one should base one's judgment about theories/models only on the actual/observed data, namely the 20 male birds out of 30. There is no disagreement between them about how the PA as a decision rule applies to the bird example. Their justifications for the use of the rule, however, come apart. On the narrow interpretation, the justification for the decision rule is that we have seen 20 male birds out of 30, and this is why the rule should rely only on actual/observed data. This justification is rooted in the epistemological feature of the PA and involves no reference to unobserved evidence. In contrast, the justification on the wide interpretation involves reference to all possible data, because of the integration over the sample space. In fact, Akaike's (1973) theorem makes reference to unobserved entities, because the definition of unbiasedness contains an integral which extends over the space of all possible samples; the sketch at the end of this section illustrates this averaging over possible samples. In our bird example, if the investigator were to repeat the experiment, as we noted in Sect. 3, the number of male birds would be likely to vary: perhaps 21 male birds out of 30 in the Binomial experiment, or 21 male birds out of 31 in the Negative Binomial experiment. These are some of the other possible realizations of the data.

So, if the debate over the PA is to be worthwhile, acceptance or rejection of the PA really boils down to which justificatory rationale each statistical school (classical statisticians and AIC followers on the one hand, Bayesians and likelihood followers on the other) invokes. Classical statisticians and AIC followers subscribe to the wide interpretation of the epistemological status of the PA, Bayesians and likelihood followers to the narrow one. In light of this construal of the PA, it becomes evident that whether the AIC framework violates the PA depends on which interpretation, wide or narrow, turns out to be crucial for a proper understanding of the PA. The rationale for rejecting the wide interpretation is that it disregards the contemporary debate over the status of the PA in the foundations of statistics.

The upshot of this section is to reinstate our objection that Forster and Sober's AIC framework violates the PA in its goal. Recall that its goal is to find the expected K-L distance, which rests on all possible realizations of the sample data. The use of the AIC is grounded in that goal, which provides the theoretical justification for its use in the first place. Use of any statistical inferential methodology requires a theoretical justification for why one should trust it; otherwise any successes or failures of the methodology would be anecdotal, not inferential. Since the AIC methodology draws its justification from an estimate of the K-L distance over all possible realizations of the data, the core of the application of the AIC methodology lies in its theoretical foundation. And it is

precisely at this point that the AIC framework violates the PA. Our point, to put it as simply as possible, is that the AIC theorist wants to have it both ways: possible data (in the spelling out and grounding of Akaike's theorem) and actual data (in the choice between rival hypotheses). In the process, "empiricism" has been emptied of its traditional epistemic content; indeed, it would seem to have very little content left.

There is a broader implication of questioning the objector's wide interpretation of the PA. The objector's way of construing the PA as a decision rule makes the rule vacuous. If the objection stemming from the wide interpretation is merely that the PA is a decision rule, then (most) statistical methods would be taken to satisfy the PA, because all of them use actual data to make decisions. However, as we have already argued, it is the justification for the decision rule that matters most, and it is at this level that the question whether the PA is violated arises. In previous sections, we discussed the epistemological aspects of the debates over the PA, focusing on the justification for the methods, which figures prominently in the assessment of the narrow and wide interpretations. This is why we think the epistemological grounding of the PA is the most significant aspect in appreciating the debate over the status of the PA in scientific inference, as well as the prospects for an empiricism-instrumentalism coupling.
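The point about Akaike's theorem can also be made computationally. The following Monte Carlo sketch is our own illustration, not anything found in Sober or Akaike; in particular, the "true" generating value θ = 0.6 is a pure assumption made for the sake of the example. It approximates the expected AIC of the constrained and unconstrained models by averaging over many possible realizations of the Binomial experiment, exactly the kind of averaging over unobserved samples that grounds the AIC and that the PA proscribes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta_true = 30, 0.6   # hypothetical generating value, assumed for illustration

def neg2loglik(y, theta):
    # -2 log-likelihood of Binomial(n, theta), dropping the binomial
    # coefficient, which cancels in AIC differences
    return -2.0 * (y * np.log(theta) + (n - y) * np.log(1.0 - theta))

# Average over many *possible* data sets: the quantity that justifies the
# AIC is an expectation over the sample space, not a function of the
# single sample actually observed.
ys = rng.binomial(n, theta_true, size=100_000)
ys = np.clip(ys, 1, n - 1)  # guard against log(0) in rare y = 0 or y = n draws
aic_constrained = neg2loglik(ys, 0.5) + 2 * 0        # theta fixed: dim = 0
aic_unconstrained = neg2loglik(ys, ys / n) + 2 * 1   # theta estimated: dim = 1

print("mean AIC, constrained  :", aic_constrained.mean())
print("mean AIC, unconstrained:", aic_unconstrained.mean())
```

Nothing in the decision rule itself changes from case to case; what the simulation exhibits is the justificatory layer, where reference to unobserved samples is ineliminable.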

6 Summing Up

We argued that Sober's two accounts, (i) the AIC-based notion of instrumentalism, which violates the principle of actualism, and (ii) his empiricism, which respects it, have led to a tension in his impressive philosophical work. However, the purpose of this paper is not just to report a tension in Sober's work. Its larger significance lies in making explicit the bearing of our varying intuitions about the principle of actualism on our understanding of scientific inference, and on whether empiricism and instrumentalism can be conjoined consistently. These intuitions are so fundamentally rooted in us that whether one is a Bayesian/likelihoodist15 or a non-Bayesian/Akaikean turns on whether one rejects or accepts the PA. Sober realizes how fundamental, yet divergent, our intuitions about the PA are. Torn between them, he wants the best of both views on scientific inference, Bayesian and non-Bayesian. We have touched on how two Bayesian principles, the likelihood principle and the conditionality principle, have motivated two consequences of the PA. In the case of Sober, what is significant is that these conflicting intuitions about the PA have generated two divergent but inter-related stances toward scientific inference and the empiricism/instrumentalism coupling.

The first lesson we would like to take home is that perhaps one should adopt a definite stance toward scientific inference; one cannot have a little bit of Bayesianism coupled with a little bit of non-Bayesianism in one's epistemology of inference.

15 Sober defends a likelihood approach to both evidence and scientific inference. Although there are fundamental differences between a likelihoodist and a Bayesian, for our present purpose those differences are unimportant.

Otherwise, like Sober, we are likely to end up with a conflicting methodological imperative. The second lesson is that his conflicting stances toward the PA result in a conceptual difficulty in combining empiricism with instrumentalism. Whether this conceptual difficulty for the empiricism/instrumentalism coupling is more general remains a question for further study.16

16 Van Fraassen's constructive empiricism, which combines empiricism with instrumentalism, also leads to a contradiction (see Bandyopadhyay 1997).

Acknowledgments We would like to thank Prajit Basu, John G. Bennett, Robert Boik, Abhijit Dasgupta, Michael Evans, Roberto Festa, Dan Flory, Malcolm Forster, Jayanta Ghosh, Dan Goodman, Jason Grossman, Joseph Hanna, Bijoy Mukherjee, Megan Raby, Tasneem Sattar, Mark Taper, Susan Vineberg, and C. Andy Tsao for discussion and comments regarding the content of the paper. We are thankful to three anonymous referees of this journal and a dozen other referees from different journals for their helpful feedback. We owe special thanks to Elliott Sober for serving as the official commentator on our paper at the APA divisional meetings, and to John G. Bennett for several correspondences regarding the issues raised in the paper. The research for the paper was funded by our university's NASA Astrobiology Center (Grant No. 4w1781), with which some of the authors are affiliated.

Appendix A: Proof that the rAICs are equivalent if the evidential equivalence condition for two experiments (PAT) holds

Recall the PAT: two experiments, E1 and E2, provide equal evidential support for a model (that is, for the parameter in question) if and only if their likelihood functions are proportional to each other as functions of the models; any inference about the model based on these experiments should therefore be identical. If we assume that under two different experimental designs, E1 and E2, the likelihoods have the same proportionality under two different models regarding the same parameter, then it follows from the PAT that both models provide equal evidential support for the parameter. First, define $L_1(\hat{\theta})$ and $L_1(\theta)$ as the likelihoods under E1 for the model using the MLE of $\theta$ and for the model with $\theta = 0.5$, respectively. Define $L_2(\hat{\theta})$ and $L_2(\theta)$ similarly. We also define the likelihood ratio for each experiment as the constant $c_1$ or $c_2$ in $L_1(\hat{\theta}) = c_1 L_1(\theta)$ and $L_2(\hat{\theta}) = c_2 L_2(\theta)$. These constants are also the likelihood ratios reported in Table 1 above, which weigh the evidence between the two models in E1 and E2. If $c_1 = c_2 = c$, then we find equal evidential support using the likelihood ratio, satisfying the PAT.

Now we show that if the PAT holds, then the rAIC also provides equal evidence between the hypotheses. First, we assume that the AIC is reduced for the unconstrained model. Also, we use $\dim(\theta)$ to indicate the number of parameters that are estimated. Then consider the definition of the rAIC for, say, E1:

$$
\begin{aligned}
\mathrm{rAIC} = \mathrm{AIC}_{\theta} - \mathrm{AIC}_{\hat{\theta}}
&= -2\log\big(L_1(\theta)\big) + 2\dim(\theta) + 2\log\big(L_1(\hat{\theta})\big) - 2\dim(\hat{\theta}) \\
&= -2\log\Big(L_1(\theta)\big/L_1(\hat{\theta})\Big) + 2\dim(\theta) - 2\dim(\hat{\theta}).
\end{aligned}
$$

But $L_1(\theta)/L_1(\hat{\theta}) = 1/c = L_2(\theta)/L_2(\hat{\theta})$ by the PAT. Thus the rAIC will be the same under E1 or E2 if the PAT holds and the difference between the dimensions of $\theta$ and $\hat{\theta}$ is the same across the experiments. Note that this is a stronger condition than just assuming that the PAT holds, since it also involves the number of estimated parameters in the constrained and unconstrained models, in addition to the proportionality of the likelihoods.

In our example, we can see specifically how the rAIC delivers equivalent evidence between the hypotheses regardless of which experiment, E1 or E2, is performed. For the Binomial model, E1, the likelihood is $f_1(20 \mid \theta) = \binom{30}{20}\theta^{20}(1-\theta)^{10}$. With $\theta = 0.5$ under the constrained hypothesis, the likelihood is $f_1(20 \mid \theta) = \binom{30}{20}\,0.5^{20}(0.5)^{10}$. The AIC is calculated as $-2\log(L(\theta)) + 2\dim(\theta)$, which, for the constrained model, is

$$
\mathrm{AIC}_{\theta} = -2\log\!\left(\binom{30}{20}0.5^{20}(0.5)^{10}\right) + 2\cdot 0
= -2\log\binom{30}{20} - 2\log\!\left(0.5^{20}(0.5)^{10}\right).
$$

For the unconstrained model, with $\hat{\theta} = 20/30 \approx 0.67$, we find

$$
\mathrm{AIC}_{\hat{\theta}} = -2\log\!\left(\binom{30}{20}\hat{\theta}^{20}(1-\hat{\theta})^{10}\right) + 2\cdot 1
= -2\log\binom{30}{20} - 2\log\!\left(0.67^{20}(0.33)^{10}\right) + 2,
$$

and the rAIC to be

$$
\mathrm{rAIC} = \mathrm{AIC}_{\theta} - \mathrm{AIC}_{\hat{\theta}}
= -2\log\!\left(0.5^{20}(0.5)^{10}\right) + 2\log\!\left(0.67^{20}(0.33)^{10}\right) - 2,
$$

showing that the binomial coefficient, the only difference between the likelihoods in E1 and E2, cancels when we consider the rAIC in this situation.

For the Negative Binomial model, E2, the likelihood is $f_2(20 \mid \theta) = \binom{29}{9}\theta^{20}(1-\theta)^{10}$. With $\theta = 0.5$ under the constrained hypothesis, the likelihood is $f_2(20 \mid \theta) = \binom{29}{9}\,0.5^{20}(0.5)^{10}$, so

$$
\mathrm{AIC}_{\theta} = -2\log\binom{29}{9} - 2\log\!\left(0.5^{20}(0.5)^{10}\right) + 2\cdot 0.
$$

For the unconstrained model, we find

$$
\mathrm{AIC}_{\hat{\theta}} = -2\log\binom{29}{9} - 2\log\!\left(0.67^{20}(0.33)^{10}\right) + 2\cdot 1,
$$

and the rAIC to be

$$
\mathrm{rAIC} = \mathrm{AIC}_{\theta} - \mathrm{AIC}_{\hat{\theta}}
= -2\log\!\left(0.5^{20}(0.5)^{10}\right) + 2\log\!\left(0.67^{20}(0.33)^{10}\right) - 2,
$$

exactly as in E1: since the constant cancels within either experiment, the same rAIC is found. The constant also cancels in the likelihood ratio when the likelihood ratios in either E1 or E2 are considered.
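The cancellation is easy to check numerically. The sketch below is our own verification of the algebra above (Python standard library only; the exact MLE 20/30 is used rather than the rounded 0.67):

```python
from math import comb, log

y, n = 20, 30                   # 20 males observed among 30 birds
theta0, theta_hat = 0.5, y / n  # constrained value and the MLE

def raic(const):
    """rAIC = AIC(constrained) - AIC(unconstrained) for a likelihood of the
    form const * theta**20 * (1 - theta)**10; only `const` differs by design."""
    def aic(theta, dim):
        return -2 * log(const * theta**y * (1 - theta)**(n - y)) + 2 * dim
    return aic(theta0, 0) - aic(theta_hat, 1)

# E1: Binomial (stop after 30 birds); E2: Negative Binomial (stop at 10 females)
print("rAIC, Binomial         :", raic(comb(30, 20)))  # about 1.40
print("rAIC, Negative Binomial:", raic(comb(29, 9)))   # identical: constants cancel
```

Both calls return the same value, because the design-specific constants $\binom{30}{20}$ and $\binom{29}{9}$ appear in both AICs within a given experiment and so drop out of the difference, just as in the derivation above.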

References

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csaki (Eds.), Second international symposium on information theory (pp. 267–281). Budapest: Akademiai Kiado.
Bandyopadhyay, P. S. (1997). On an inconsistency in constructive empiricism. Philosophy of Science, 64, 511–514.
Bandyopadhyay, P. S. (2007). Why Bayesianism? A primer on a probabilistic philosophy of science. In S. K. Upadhyay, U. Singh, & D. K. Dey (Eds.), Bayesian statistics and its applications (pp. 42–62). New Delhi: Anamaya Publishers.
Bandyopadhyay, P. S., & Boik, R. (1999). The curve-fitting problem: A Bayesian rejoinder. Philosophy of Science, 66, S390–S402.
Bandyopadhyay, P. S., Boik, R., & Basu, P. (1996). The curve-fitting problem: A Bayesian approach. Philosophy of Science, 63, S265–S272.
Bandyopadhyay, P. S., & Brittan, G., Jr. (2001). Logical consequence and beyond: A close look at model selection in statistics. In J. Woods & B. Hepburn (Eds.), Logical consequence (pp. 1–17). Oxford: Hermes Science Publishing Company.
Berger, J. (1985). Statistical decision theory and Bayesian analysis. New York: Springer.
Birnbaum, A. (1962). On the foundations of statistical inference. Journal of the American Statistical Association, 57, 269–306.
Boik, R. (2004). Commentary. In M. Taper & S. Lele (Eds.), The nature of scientific evidence (pp. 167–180). Chicago: The University of Chicago Press.
Box, G. (1979). Robustness in scientific model building. In R. L. Launer & G. N. Wilkinson (Eds.), Robustness in statistics (pp. 201–236). New York: Academic Press.
Boyd, R. (1985). Observations, explanatory power, and simplicity. In P. Achinstein & O. Hannaway (Eds.), Observation, experiment, and hypothesis in modern physical science. Cambridge, MA: MIT Press.
Burnham, K., & Anderson, D. (1998). Model selection and inference. New York: Springer.
Casella, G., & Berger, R. (1990). Statistical inference. California: Wadsworth & Brooks.
Eells, E. (1993). Probability, inference, and decision. In J. Fetzer (Ed.), Foundations of philosophy of science (pp. 192–208). New York: Paragon House.
Evans, M. (2013). What does the proof of Birnbaum's theorem prove? Electronic Journal of Statistics, 1–9.
Forster, M., & Sober, E. (1994). How to tell when simpler, more unified, or less ad hoc theories will provide more accurate predictions. British Journal for the Philosophy of Science, 45, 1–35.
Forster, M., & Sober, E. (2004a). Why likelihood? In M. Taper & S. Lele (Eds.), The nature of scientific evidence. Chicago: University of Chicago Press.
Forster, M., & Sober, E. (2004b). Rejoinder to why likelihood? In M. Taper & S. Lele (Eds.), The nature of scientific evidence. Chicago: University of Chicago Press.
Godfrey-Smith, P. (1999). Procrustes probably. Philosophical Studies, 95, 175–180.
Good, I. J. (1983). The white shoe is a red herring. First published in British Journal for the Philosophy of Science, 17 (1967), 332. Reprinted in Good thinking. Minneapolis: University of Minnesota Press.
Hacking, I. (1965). Logic of statistical inference. Cambridge: Cambridge University Press.
Hanna, J. (2004). Contra Ladyman: What really is right with constructive empiricism. British Journal for the Philosophy of Science, 55, 767–777.


Howson, C., & Urbach, P. (2006). Scientific reasoning: The Bayesian approach (3rd ed.). La Salle, IL: Open Court.
Jefferys, W., & Berger, J. (1992). Ockham's razor and Bayesian analysis. American Scientist, 80, 64–72.
Kitcher, P. (1993). The advancement of science. New York: Oxford University Press.
Kitcher, P. (2001). Science, truth, and democracy. Oxford: Oxford University Press.
Kullback, S., & Leibler, R. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22, 79–86.
Ladyman, J. (2000). What's really wrong with constructive empiricism? Van Fraassen and the metaphysics of modality. British Journal for the Philosophy of Science, 51, 837–856.
Miller, R. (1987). Explanation, confirmation, and reality in the natural and social sciences. Princeton: Princeton University Press.
Pawitan, Y. (2001). In all likelihood: Statistical modelling and inference using likelihood. Oxford: Oxford University Press.
Robert, C. (1994). The Bayesian choice. New York: Springer.
Royall, R. (1997). Statistical evidence. New York: Chapman & Hall.
Sober, E. (1988). Reconstructing the past: Parsimony, evolution, and inference. Cambridge: MIT Press.
Sober, E. (1993). Epistemology for empiricists. In P. French, T. Uehling, & H. Wettstein (Eds.), Midwest studies in philosophy, XVIII.
Sober, E. (1995). From a biological point of view. Cambridge: Cambridge University Press.
Sober, E. (1996). Parsimony and predictive equivalence. Erkenntnis, 44, 167–197.
Sober, E. (1998). Instrumentalism revisited. Critica, 31, 3–38.
Sober, E. (2001). Instrumentalism, parsimony, and the Akaike framework. Philosophy of Science (Proceedings), Fall.
Sober, E. (2008). Evidence and evolution: The logic behind the science. Cambridge: Cambridge University Press.
Swinburne, R. (2001). Epistemic justification. Oxford: Clarendon Press.
Van Fraassen, B. (1980). The scientific image. Oxford: Oxford University Press.
Van Fraassen, B. (1989). Laws and symmetry. Oxford: Oxford University Press.
