An empirical comparison of neural network and logistic regression models

Marketing Letters 6:4 (1995): 251-263 © 1995 Kluwer Academic Publishers, Manufactured in the Netherlands

An Empirical Comparison of Neural Network and Logistic Regression Models

AKHIL KUMAR
College of Business, University of Colorado, Boulder, CO 80309-0419

VITHALA R. RAO
S.C. Johnson Graduate School of Management, Cornell University, Ithaca, NY 14853-4201

HARSH SONI
S.C. Johnson Graduate School of Management, Cornell University, Ithaca, NY 14853-4201

Key words: neural networks, logistic regression, back-propagation, empirical comparison, sigmoid function, C-Index

Abstract. The purpose of this paper is to critically compare a neural network technique with the established statistical technique of logistic regression for modeling decisions in several marketing situations. In our study, these two modeling techniques were compared using data collected on the decisions by supermarket buyers whether or not to add a new product to their shelves. Our analysis shows that although neural networks offer a possible alternative approach, they have both strengths and weaknesses that must be clearly understood.

1. Introduction

Several standard statistical approaches exist in the literature for building models from data: for example, linear regression, discriminant analysis, and logistic regression. These models are usually estimated either by least squares or maximum likelihood methods. The general goal is to find a mathematical relationship (such as linear or polynomial) between the independent variables and the final decision. However, very often the researcher initially does not understand the nature of the relationship. Neural networks offer an approach for detecting patterns in relationships, especially when the relationships are complex and cannot be easily expressed in a mathematical form.

In recent years, there has been a surge in applications of neural networks in the area of finance (see Schwartz, 1992) to model stock market behavior. Tam and Kiang (1992) describe an application of neural networks to predict bank failures. Among other recent business applications, neural networks have been used to classify tourists into market segments (Mazanec, 1992) and to describe information processing of persuasive communications (Briesch and Iacobucci, 1993). Interest among researchers in applying neural networks to marketing problems has been growing recently. Some evidence for this may be found in Robins (1993) and Bessen (1993). Sharda (1994) provides a bibliography of applications of neural networks and suggests their promise for extended use in practice.


Given this increasing interest, it is important to evaluate how well a neural network technique performs in comparison with extant statistical techniques. This paper empirically compares and analyzes a neural network model against a logistic regression model. We model the decision-making behavior of a supermarket chain that receives information about hundreds of new products every week and must decide whether or not to carry a given product. This decision is clearly important to both manufacturers and supermarkets, owing to the high costs of product development at the manufacturer level and to competition for limited shelf space at the retailer level. The motivation for applying neural network methods to this problem was the belief that subtle patterns exist in past decisions made by a supermarket chain that are not easily tractable because the interactions between the underlying variables are complex. In this paper, we describe how a neural network model can be built with a large amount of real data and offer some evidence on its predictive validity relative to logistic regression. We also highlight the pros and cons of this approach.

2. Neural network methods

A neural network consists of several layers of nodes: an input layer, an output layer, and zero or more hidden layers. The input layer consists of one node for each independent variable, while the output layer consists of one or more nodes that correspond to the final decision. (For a dichotomous decision, a single output whose value ranges from 0 to 1 would suffice.) The hidden layers lie in between, and each consists of several nodes that receive inputs from nodes in layers below them and feed their outputs to nodes in layers above. Figure 1 shows an illustration of a neural network with six inputs, three intermediate nodes, and one final output. The number associated with each link is called a weight (or a coefficient). These networks are called feed-forward networks because nodes can feed their outputs in only one (forward) direction. Although other network configurations are possible, we restrict ourselves to such networks.

As shown in Figure 1, in general, a node can have several inputs and outputs. Also, zero or more hidden layers may be present. The external inputs received by the input layer nodes are fed directly into the nodes in the first hidden layer. The outputs of the nodes in the first hidden layer become inputs for the nodes in the second hidden layer. In general, the outputs of nodes in layer i serve as inputs for nodes in layer i + 1. Given such a network and a set of external input values, it is possible to compute the final output in an iterative manner by working upward from the input layer, one layer at a time. To determine the output of a node (say, node i), first compute the net input into node i, net_i, and then convert net_i by applying a transformation function. The net_i received by node i is equal to the weighted sum of all inputs fed into it by all nodes j whose output is connected to i, that is, net_i = Σ_j w_ji o_j, where o_j is the output of node j and w_ji is the weight on link ji. In the next step, the output of node i is computed by applying a sigmoid transformation function: o_i = 1/[1 + exp(-net_i)]. Thus, in a feed-forward neural network, each hidden layer node and output node produces a value between 0 and 1 because of the sigmoid transformation function. Other kinds of transformation functions are also possible; however, the sigmoid is by far the most common (Rumelhart and McClelland, 1986; Wasserman, 1989).
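As a concrete illustration of this forward pass, the following sketch computes the output of a small feed-forward network of the kind just described. The network shape and weight values are hypothetical, chosen only to match the shape of Figure 1; the paper's own network is described in section 3.

```python
import numpy as np

def sigmoid(net):
    """Sigmoid transformation: maps any net input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-net))

def forward(x, w_hidden, w_output):
    """One forward pass through a network with a single hidden layer.

    x        : vector of external inputs (one per input node)
    w_hidden : weight matrix; w_hidden[j, i] is the weight on the
               link from input node i to hidden node j
    w_output : weight vector from the hidden nodes to the single output node
    """
    # net input and output for each hidden node: net_j = sum_i w_ji * x_i
    hidden_out = sigmoid(w_hidden @ x)
    # net input and output for the output node
    return sigmoid(w_output @ hidden_out)

# Hypothetical example: six inputs, three hidden nodes, one output,
# matching the shape of the network in Figure 1.
rng = np.random.default_rng(0)
w_hidden = rng.uniform(-0.5, 0.5, size=(3, 6))   # small random initial weights
w_output = rng.uniform(-0.5, 0.5, size=3)
x = np.array([1.0, 0.0, 0.5, 0.2, 0.9, 0.3])
print(forward(x, w_hidden, w_output))             # a value between 0 and 1
```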


[Figure 1 is not reproduced in this text version. It depicts a feed-forward network with six input nodes (x1 through x6), a hidden layer of three nodes, and a single output node; each link is labeled with its weight w.]

Figure 1. An illustration of a neural network.

The discussion so far assumed that the network design (that is, the connectivity) and the weights were known. Now we describe how the network is designed and weights are assigned to links. The design is performed partly by intuition and partly by trial and error. If an initial design does not produce good results on testing, then it is modified. For each candidate design, the best weights are determined by running a training algorithm. Usually, about one-half to two-thirds of the data set is used for training the network, and the rest is reserved for testing. Once the network has been trained on the training data set, it is validated on the testing data set to evaluate the success of the training.

The most common method for determining the weights is called the back-propagation algorithm. This method starts with small, random, initial weights and computes (as described above) the outputs for each training sample. The output produced (from these weights) is compared against the correct output, and the difference is fed back (or "back propagated") for adjusting the weights. The weights of the links going into a node are thus adjusted in proportion to the difference in the net input that would produce this desired change, using a learning rate parameter as a proportionality constant. This exercise is repeated for every observation in the training set, and the weights are adjusted after each observation is processed. One epoch consists of making one pass through all the samples in the training set. After running a few hundred epochs in this manner, it is often possible to make the system converge and to obtain a reasonably good set of weights. If the system does not converge, then the various weight parameters are modified. If varying the parameters is also unsuccessful, then the network design is modified by changing some links, adding or deleting nodes, or introducing an additional layer.
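The following sketch illustrates one back-propagation update of the kind just described, for a single-hidden-layer sigmoid network. It is a simplified illustration under our own assumptions (squared-error loss, online per-observation updates, no momentum), not a reproduction of the Nevada Prop implementation used later in this paper.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def backprop_update(x, target, w_hidden, w_output, learning_rate=0.1):
    """One online back-propagation step for a single training observation.

    Forward pass, then propagate the output error back and adjust each
    weight in proportion to its contribution, scaled by the learning rate.
    """
    # Forward pass
    h = sigmoid(w_hidden @ x)          # hidden layer outputs
    o = sigmoid(w_output @ h)          # network output

    # Error signal at the output node; the sigmoid derivative is o * (1 - o)
    delta_out = (target - o) * o * (1.0 - o)

    # Error signals at the hidden nodes, back-propagated from the output
    delta_hidden = delta_out * w_output * h * (1.0 - h)

    # Weight adjustments
    w_output += learning_rate * delta_out * h
    w_hidden += learning_rate * np.outer(delta_hidden, x)
    return w_hidden, w_output

def train(X, y, w_hidden, w_output, epochs=500, learning_rate=0.1):
    """One epoch = one pass through all samples in the training set."""
    for _ in range(epochs):
        for x, target in zip(X, y):
            w_hidden, w_output = backprop_update(x, target, w_hidden,
                                                 w_output, learning_rate)
    return w_hidden, w_output
```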


3. Building a neural network for the supermarket decision

The back-propagation technique was used to develop a neural network (NN) model for the supermarket decision on new product acceptance. Data were gathered on a large number of variables related to the decision of a supermarket chain whether or not to carry new products that were offered to it. The fifteen variables¹ identified as salient for this decision in past research (Rao and McLaughlin, 1989) were grouped into five groups² as follows: financial (gross margin, profit, and opportunity cost), competition related (number of competing firms and competing brands), marketing strategy related (product uniqueness and vendor effort), terms of trade (off invoice, slotting allowance, bill back, free cases, low price, and medium price), and growth and synergy (expected category growth and synergy dummy). The total number of observations used was 1,048.

A hierarchical three-layer neural network with an input layer, an output layer, and one hidden layer was considered (similar to the one shown in Figure 1). In our design, there are five nodes in the hidden layer, one for each group of variables. The input nodes for all the variables in a given category are connected to the corresponding node for that category in the hidden layer. For example, there are three variables in the financial group: gross margin, profit per shelf volume, and opportunity cost. The nodes for these three variables in the input layer are connected to the financial group node in the hidden layer. The hidden layer nodes are in turn all connected to a single output node. Since the final decision is binary (0 or 1), the output of this node is the predicted value of the decision. If the output is 0.5 or more, then the decision is assumed to be an acceptance, while if it is less than 0.5, then the decision is a rejection.

To train the network, we used the University of Nevada's Nevada Prop package, which implements the back-propagation algorithm. This package, which was developed at Carnegie Mellon University and the University of Nevada, is available in the public domain as free software and runs on UNIX, DOS, and Macintosh systems. It allows the user to specify the neural network design (in terms of the number of input, hidden, and output nodes and their connections) and the values of various parameters, such as the range of the initial random weights, the learning rate, and so on. The package also accepts the training data and produces periodic reports giving its classification accuracy and the root mean square error (RMS) and C-Index values for the classification.

The C-Index is computed by comparing all possible pairs of one reject and one accept decision and counting the number of pairs for which the predicted value of the reject decision is smaller than the predicted value of the accept decision. Such pairs are called concordant pairs, as distinct from discordant pairs, where the opposite is true. If the two predicted values are equal, then the pair is called a tied pair. The C-Index is computed as

C-Index = (Number of concordant pairs + 0.5 × Number of tied pairs) / (Total number of pairs).

If all pairs are concordant, the C-Index will be 1, indicating a perfect fit. The C-Index will be zero when all pairs are discordant, indicating the worst possible fit. Intermediate values between zero and 1 indicate varying levels of fit.
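A minimal sketch of this calculation, using hypothetical arrays of predicted values (our own illustration, not code from the Nevada Prop package):

```python
def c_index(pred_rejects, pred_accepts):
    """C-Index over all (reject, accept) pairs of predicted values.

    A pair is concordant when the reject prediction is smaller than the
    accept prediction; tied pairs count half. Returns a value in [0, 1].
    """
    concordant = tied = 0
    for r in pred_rejects:
        for a in pred_accepts:
            if r < a:
                concordant += 1
            elif r == a:
                tied += 1
    total = len(pred_rejects) * len(pred_accepts)
    return (concordant + 0.5 * tied) / total

# Hypothetical predicted values for three rejects and two accepts:
# 5 of the 6 pairs are concordant, so the C-Index is 5/6.
print(c_index([0.1, 0.4, 0.6], [0.7, 0.5]))   # 0.8333...
```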


4. Empirical comparisons

There were 1,048 observations or records in our data set, out of which the decision was reject in 770 cases and accept in 278 cases. In this section we compare the results of running the back-propagation algorithm for the neural network described above with the results from the logistic regression approach (described in Rao and McLaughlin, 1989), for the sample as a whole and for each of the four product groups. The logistic regression procedure is summarized briefly below.

4.1. Logistic regression model (LR)

In the logistic regression approach, the relationship between the final decision and the independent variables is expressed as P_j = 1/[1 + exp(-α - β'x_j)], where P_j is the probability of acceptance of the jth item, x_j is the (p × 1) vector of descriptors measured for the jth item, β is a (p × 1) vector of parameters, and α is an intercept term. Notice that, in logistic regression also, the vector multiplication simply involves multiplying each variable by a coefficient or weight, β, and computing the sum of all these products, along with a constant term, α. This sum is then transformed by applying a logistic function, which is identical to the sigmoid function. The LOGISTIC maximum likelihood procedure available in the SAS package was used to solve this model and determine the weights corresponding to the fifteen variables listed before.
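The paper's estimation was done with the LOGISTIC procedure in SAS. As a rough modern sketch of the same maximum-likelihood fit (our own assumption of tooling, with simulated data and hypothetical names), one could use the statsmodels library:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: X is an (n x p) matrix of the fifteen descriptors,
# y is the 0/1 accept decision. Here we simulate a small example.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 15))
y = (rng.random(200) < 1 / (1 + np.exp(-(X[:, 0] - X[:, 1])))).astype(int)

# Add the intercept term alpha, then fit P_j = 1/[1 + exp(-alpha - beta'x_j)]
# by maximum likelihood.
model = sm.Logit(y, sm.add_constant(X))
result = model.fit(disp=False)
print(result.params[:4])                       # alpha and the first few betas
p_hat = result.predict(sm.add_constant(X))     # predicted acceptance probabilities
```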

4.2. Classification tests for neural network model versus logistic regression

The results for the first set of tests, based on the whole data set, are given in the top panel of Table 1. In this table, we compare the performance of the neural network and logistic regression models for the cutoff of 0.5 on three criteria: (1) percentage of accepts accurately classified, (2) percentage of rejects accurately classified, and (3) overall percentage of accurate classifications. This table clearly shows that the neural network outperforms the logistic regression model. In particular, note that in terms of overall classification accuracy (last two columns of the table), the neural net is at least as good, if not much superior, and the improvement ranges from 1.14 percent (category 4) to 17.22 percent (category 2).

Table 1 also compares the performance of the neural network and the logistic regression models on RMS error and C-Index values when the cutoff of 0.5 was used in the classification. Again, we find that the neural network performs better than the logistic regression model (with the exception of All, where the neural network is better on the C-Index but worse on the RMS criterion). These indices are consistent with the criterion of percent correctly classified. For instance, the RMS values for the logistic model in product categories 1 and 2 are considerably higher than the RMS values for the neural network, and this is reflected in the large difference in the overall classification performance of the two models. On the other hand, in product categories 3 and 4, where the differences in the RMS values are smaller, the gap in classification performance is either small or does not exist.


Table 1. Classification performance of logistic regression and neural network models.

                 Percent Correct       Percent Correct       Percent Correct
Product          Among Accepts         Among Rejects         Among All Cases
Category          LR       NN           LR       NN           LR       NN

<0.5 = reject; >=0.5 = accept
1                71.23    82.19        91.21    93.96        85.49    90.59
2                37.50    90.00        90.09    94.59        76.16    93.38
3                93.33    93.33        94.44    94.44        94.20    94.20
4                80.00    88.00        93.65   100.00        89.77    90.91
All              41.40    50.36        93.10    93.38        79.40    81.97

0-0.375 = reject; 0.625-1 = accept
1                58.90    80.82        84.07    89.56        76.86    87.06
2                27.50    62.50        83.78    90.99        68.87    83.44
3                80.00    93.33        94.44    94.44        91.30    94.20
4                80.00    68.00        88.89    95.24        86.37    87.50
All              25.90    29.50        84.55    84.38        68.99    69.98

0-0.25 = reject; 0.75-1 = accept
1                39.73    68.49        82.97    86.81        70.59    81.57
2                27.50    57.50        76.58    85.59        63.57    78.15
3                80.00    93.33        94.44    94.44        91.30    94.20
4                52.00    64.00        84.13    88.89        75.00    81.82
All              12.59    12.95        68.44    64.16        53.63    50.57

RMS Error and C-Index Values for the 0.5 Cutoff

                 RMS Error             C-Index
Category          LR       NN           LR       NN
1                0.325    0.287        0.904    0.914
2                0.377    0.244        0.851    0.973
3                0.157    0.147        0.925    0.988
4                0.292    0.263        0.930    0.899
All              0.356    0.369        0.823    0.838

Finally, observe that the classification performance by product category is considerably better than the classification for all the categories together, for both the neural network and the logistic regression models. This suggests that there are certain features unique to each category that cannot be easily captured in an analysis of the data set as a whole.

4.2.1. Sensitivity with respect to cutoff. In the comparisons described above, the threshold or cutoff for accepting or rejecting a product was set at 0.5; that is, any prediction less than 0.5 was treated as a reject, while a prediction of 0.5 or higher was treated as an accept. Ideally, the predicted value should be as close to 0 as possible for a reject and in the neighborhood of 1 for an accept. An output in the neighborhood of 0.5 lies in a gray area: it is hard to treat a value of 0.49 as a reject (or 0.51 as an accept) with much confidence. Therefore, two further comparisons were made in which the classification criteria were made more stringent by reducing the range of legitimate accept and reject values, thereby increasing the gray region for classification. Table 1 also shows the results of classification tests in which the size of the accept and reject ranges was reduced to 0.375 and 0.25; that is, an output less than 0.375 (0.25) was treated as a reject and one greater than 0.625 (0.75) as an accept. The results show that in most cases, the relative performance of the neural network improves even further when the classification criterion becomes more stringent. For instance, for category 4, the gap between the overall performance of the two models grows to 6.82 percent when the range is 0.25, versus 1.14 percent when the range is 0.5.
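A minimal sketch of this three-way classification rule, with the cutoff pair as a parameter (our own illustrative helper, not part of the original analysis):

```python
def classify(prediction, reject_below=0.375, accept_above=0.625):
    """Three-way rule: reject, accept, or gray (too uncertain to call).

    With reject_below = accept_above = 0.5 this reduces to the simple
    0.5 cutoff; tighter ranges (e.g., 0.25/0.75) enlarge the gray region.
    """
    if prediction < reject_below:
        return "reject"
    if prediction >= accept_above:
        return "accept"
    return "gray"

print(classify(0.49, 0.5, 0.5))   # 'reject' under the plain 0.5 cutoff
print(classify(0.49))             # 'gray' under the 0.375/0.625 rule
```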

4.3. Validation tests

The observations in each food category were divided into two sets: a training set consisting of about two-thirds of the data set and a testing set consisting of the remaining one-third of the observations. The observations in each set were chosen randomly; however, the proportions of accept and reject cases were maintained the same as in the respective food category taken as a whole. Identical training sets were used to build the logistic regression model and also to train the neural network model. These results are shown in Table 2 for the three cases where the sizes of the accept and reject ranges are 0.5, 0.375, and 0.25.
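A proportion-preserving (stratified) split of this kind can be sketched as follows; the exact procedure used in the paper is not specified, so this is our own illustration:

```python
import numpy as np

def stratified_split(y, train_fraction=2/3, seed=0):
    """Randomly split indices into train/test while keeping the
    accept/reject proportions the same in both sets."""
    rng = np.random.default_rng(seed)
    train_idx = []
    for label in np.unique(y):
        members = np.flatnonzero(y == label)
        rng.shuffle(members)
        n_train = round(train_fraction * len(members))
        train_idx.extend(members[:n_train])
    train = np.array(sorted(train_idx))
    test = np.setdiff1d(np.arange(len(y)), train)
    return train, test

# Hypothetical example: a category-2-sized data set of 151 cases yields
# roughly 101 training and 50 testing observations, as in section 5.
y = np.array([0] * 112 + [1] * 39)
train, test = stratified_split(y)
print(len(train), len(test))      # about 2/3 vs 1/3
```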

Table 2. Validation tests of logistic regression and neural network models.

                 Percent Correct       Percent Correct       Percent Correct
                 Among Accepts         Among Rejects         Among All Cases
Category          LR       NN           LR       NN           LR       NN

<0.5 = reject; >=0.5 = accept
1                75.00    75.00        91.80    96.72        87.06    90.59
2                23.02    84.62        97.30    89.19        78.00    88.00
All              62.37    52.69        81.71    89.49        76.57    79.71

0-0.375 = reject; 0.625-1 = accept
1                66.67    70.83        90.16    96.72        83.53    89.41
2                15.38    69.23        97.30    89.19        76.00    84.00
All              47.31    32.26        67.70    76.65        62.28    64.86

0-0.25 = reject; 0.75-1 = accept
1                37.50    70.83        90.16    86.89        75.20    82.35
2                15.38    69.23        97.30    89.19        76.00    84.00
All              24.73    21.25        55.25    59.92        47.14    44.00


The logistic regression model could be fitted only for product categories 1, 2, and All because the data sets for the other categories were small; the miscellaneous category was not analyzed because it included several product classes. Notice that the relative performance of the neural network improves dramatically in all three categories as the threshold becomes smaller. As the size of the accept and reject regions becomes smaller, the neural network does better for all three analyses. This suggests that the logistic model is making more predictions in the "gray" region. By discarding the predictions in this region, where the confidence in the prediction is low, the overall confidence in the predictions improves, and the neural network seems to perform considerably better in making such high-confidence predictions.

4.4. Nonlinear interactions in the LR model

It can be argued that the neural network performs better than the LR model because the former has a hidden layer. This hidden layer allows complex interactions between variables that lie in different groups or sets of elements. Even with one hidden layer, the relationship between the output node and the various input nodes becomes very complex, because the sigmoid transformation is applied twice and leads to a complicated mathematical form that does not lend itself to a closed-form solution. However, one reasonable way of accommodating nonlinear interactions between variables is to consider cross-product terms across all pairs of sets of elements. Therefore, we now compare the performance of the LR model when all such nonlinear terms are included with that of the neural network model. Note that this approach is not parsimonious because it introduces (n² - Σᵢ nᵢ²)/2 more parameters for the second-order terms, where nᵢ is the number of variables in the ith node of the hidden layer of the network and n₁ + … + n_K = n. On the other hand, the neural network adds only K more parameters, one for each of the K hidden layer nodes. We did not consider second-order terms for variables that lie within a group, because such interactions are not present even in the neural network. A sketch of how such cross-group product terms can be generated follows Table 3.

For an LR model using all the 1,048 observations with first-order interactions, the C-Index was 0.895, as compared to 0.838 for the NN model (see Table 3). But the overall classification accuracy dropped to 78.6 percent (51.8 percent for accepts and 88.3 percent for rejects). Hence, cross-product terms do not improve the classification performance of the LR model significantly.³ We encountered convergence problems for the LR model with interactions by product category, owing to the number of additional parameters relative to the number of observations.

Table 3. Classification performance after including first-order interactions.

                                               Percent Correct Predictions Among
Method                                 C-Index   Accepts    Rejects    All Cases
LR with linear terms                    0.823     41.40      93.10      79.40
NN                                      0.838     50.36      93.38      81.97
LR including first-order interactions   0.895     51.80      88.30      78.60
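The cross-group product terms described above can be generated as in the following sketch. The grouping of the fifteen variable indices is hypothetical (any assignment following the five groups of section 3 would do), and within-group pairs are skipped, as in the paper.

```python
import itertools
import numpy as np

# Hypothetical assignment of the fifteen variables (column indices of X)
# to the five groups of section 3.
groups = {
    "financial": [0, 1, 2],
    "competition": [3, 4],
    "strategy": [5, 6],
    "terms_of_trade": [7, 8, 9, 10, 11, 12],
    "growth_synergy": [13, 14],
}

def cross_group_terms(X, groups):
    """Append the product x_i * x_j for every pair of variables that lie
    in DIFFERENT groups; within-group pairs are omitted."""
    cols = []
    for g1, g2 in itertools.combinations(groups.values(), 2):
        for i in g1:
            for j in g2:
                cols.append(X[:, i] * X[:, j])
    return np.column_stack([X] + cols)

X = np.random.default_rng(2).normal(size=(1048, 15))
X_aug = cross_group_terms(X, groups)
# (15^2 - (3^2 + 2^2 + 2^2 + 6^2 + 2^2)) / 2 = (225 - 57) / 2 = 84 new terms
print(X_aug.shape)   # (1048, 99)
```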


Clearly, the LR model is less parsimonious than the neural network model when nonlinear interactions are also considered.⁴ Moreover, even though there is a small improvement in its performance, the results produced by the LR model are still inferior in comparison with the neural network.

4.5. Comparison of weights

Let the weight on the link connecting input node i to node j in the hidden layer be denoted by θ_ij, and the weight on the link connecting node j to the output node be denoted by θ_j in our hierarchical network. Let L denote the number of hidden nodes in the network developed above. Recall that the network computes a weighted sum of the individual input nodes and applies the sigmoid transformation function to the sum to obtain the P-value of the hidden node. The weights are maximum when the P-values of the various nodes are equal to 0.5. We calculate the maximum weight of the ith input variable for the neural network, comparable to that of logistic regression, using the formula Σ_j θ_ij θ_j, summing over the L hidden nodes. The corresponding maximum impact of the ith input variable in the logistic regression is calculated as β_i, where β_i is the coefficient for the ith predictor variable.

Table 4 shows the relative weights associated with the seven variables in the LR model that are significant, along with the MAXIMUM weights associated with the corresponding variables in the NN model. All but one variable have the same sign in both models, but the magnitudes of the weights are quite different. It should be noted clearly that these weights are not entirely comparable, for several reasons. First, as mentioned above, the relative weights in the LR model are fixed, while in the NN model they are variable (the numbers in Table 4 representing only the maximum values). Second, the LR model is a one-stage model, while the NN model is a more complex, two-stage model, as explained in section 2, and is solved heuristically by means of the back-propagation method. Third, since each stage in the NN model is nonlinear, the weights do not lend themselves easily to intuitive analysis (or physical explanation), and this is widely acknowledged by researchers in this area. Bearing these caveats in mind, our comparison is more in the nature of a "reality check" than a complete explanation of the weights.
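A sketch of this calculation, following the formula as stated above, with hypothetical weight arrays (our own illustration; the paper's actual weight values are not reproduced here):

```python
import numpy as np

def max_implied_weights(theta_hidden, theta_out):
    """Maximum implied weight of input variable i: the sum over hidden
    nodes j of theta_ij * theta_j, evaluated where the node P-values
    equal 0.5 (the point at which the sigmoids are steepest)."""
    # theta_hidden[i, j] = weight from input i to hidden node j
    # theta_out[j]       = weight from hidden node j to the output node
    return theta_hidden @ theta_out

# Hypothetical network with four inputs and L = 2 hidden nodes
theta_hidden = np.array([[0.8, 0.0],
                         [1.2, 0.0],
                         [0.0, -0.5],
                         [0.0, 0.9]])
theta_out = np.array([1.5, 2.0])
print(max_implied_weights(theta_hidden, theta_out))   # one value per input
```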

Table 4. Maximum weights associated with the significant variables in the LR and NN models (Category 2).

Variable               LR       NN
Gross margin           0.38    -0.04
Product uniqueness     1.18     1.61
Vendor effort         -0.56    -0.39
Bill back              3.49     1.63
Free cases             1.68     0.82
Low price             -1.45    -1.23
Medium price          -1.36    -4.56


5. Discussion

Although the implied weights of the two models appear to be different, the logistic regression and neural network approaches are similar in some ways. The logistic regression model computes a weighted sum of the input variables and then transforms it by the logistic function. Since the logistic and sigmoid functions are identical, the logistic regression method amounts to designing a neural network with only one output node and no hidden layer. Thus, it is possible to mimic a neural network with one hidden layer by creating indices and then using them as inputs to a logistic regression analysis.

The advantage of the logistic regression method is that there is a systematic procedure for solving this model, which is implemented in the SAS package. In the case of the neural network, the procedure for finding the best network design is somewhat arbitrary. Prior substantive knowledge of the decision problem will be useful in selecting an appropriate design. Determination of the weights corresponding to the selected network is also somewhat arbitrary. The final weights depend on various parameters, in particular the initial weights, learning rates, and maximum step size at each iteration. Unfortunately, there is no known systematic procedure for setting these parameter values, and our own experience in trying to train the neural network confirms this view. We experimented with several different network designs, in each case trying various values of initial weights and learning rates, and eventually selected the hierarchical design described in this paper.

We have tested this proposition on two different data sets. The first data set was on students' success at the end of the first year in an MBA program at a large university; success was defined as being in the top one-third of the class. Using eight predictors, such as undergraduate major, GMAT scores, age, gender, and so on, we found the classification accuracy of the neural network model to be somewhat lower than that of the logistic regression model (72.4 percent versus 74.4 percent). The second data set was on direct marketing solicitations for an insurance product; the outcome variable was whether or not the individual responded to the solicitation. Interestingly, the proportion of positive outcomes in this data set was as low as 1.37 percent. In this case, the neural network accurately predicted 38 percent of the positive outcomes, while logistic regression accurately predicted none of them.

In each case, once the neural network has been successfully trained, the results it produces are often superior to those obtained from the logistic regression model for these data. This is because of the presence of the hidden layer, which serves the useful purpose of extracting features that relate to various aspects of the input data. As described earlier, the fifteen input variables were grouped into five different categories, each relating to one aspect of the supermarket's decision. One way of analyzing the supermarket's decision process is to view it as a two-step process: first, evaluate the new product on each of the five categories of variables and, next, combine the individual category evaluations to produce the final decision. On the other hand, the logistic regression approach combines all fifteen variables together by a linear function and then transforms the result into a range of 0 to 1. This means that if a product has very large values on the variables in the financial group, this could compensate for lower evaluations on the other variable groups. In the neural network model, however, such compensation is less likely, because the evaluation of each variable group is capped between 0 and 1: each node in the hidden layer generates an output between 0 and 1.
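The two-step view described above can be sketched as follows. This is an illustrative construction under our own assumptions (uniform group weights, hypothetical slices), not the paper's estimated model; it shows why a group index capped at 1 limits cross-group compensation.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def two_step_score(x, group_slices, group_weights, combine_weights):
    """Step 1: score the product on each variable group (a 0-1 index).
    Step 2: combine the group indices into the final decision value.

    Because every group index is capped at 1, a very large value on the
    financial variables cannot fully compensate for weak evaluations on
    the other groups, unlike a single linear combination of all fifteen
    variables."""
    indices = [sigmoid(w @ x[s]) for s, w in zip(group_slices, group_weights)]
    return sigmoid(combine_weights @ np.array(indices))

# Hypothetical weights for the five groups of section 3
group_slices = [slice(0, 3), slice(3, 5), slice(5, 7), slice(7, 13), slice(13, 15)]
group_weights = [np.ones(3), np.ones(2), np.ones(2), np.ones(6), np.ones(2)]
combine_weights = np.array([1.0, 0.5, 0.8, 1.2, 0.4])
x = np.random.default_rng(3).normal(size=15)
print(two_step_score(x, group_slices, group_weights, combine_weights))
```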


Hence, the pattern recognition abilities of the neural network are superior because it extracts features associated with variable groups and then makes a decision. Consequently, it makes fewer errors and also classifies fewer items in the gray area. This is confirmed by an examination of the RMS and C-Index values, in addition to the higher accuracy percentages produced by the neural network model. As shown in Table 1, the RMS values for the neural network were almost uniformly lower than for the logistic regression model (except for the All category). But for the All category in Table 1, the overall classification accuracy of the neural network is better than that of the logistic regression model by about 2.5 percent. This means that, in many cases, the neural network makes predictions that are very close to the actual value (that is, 0 or 1), while the logistic regression model makes predictions that are further away from the actual value.

To verify this, an additional test was done with the data from category 2. Recall that there were fifty observations in the testing set of product category 2 that were used to test and validate the respective logistic regression and neural network models built from the remaining 101 observations. Among these fifty observations, there were thirty-seven rejects and thirteen accepts. The neural network predicted that twenty-three of the thirty-seven rejects would lie in the range 0 to 0.01, while the logistic model predicted that none of them would lie in this range. In the case of accept predictions, the logistic regression classified only three of the thirteen cases correctly, while the neural network predicted eleven cases correctly, three of them with values in the range 0.99 to 1.0. This illustrates that the neural network produces values closer to 1 or 0, leading to better overall performance.

A final point about neural networks, also observed elsewhere (Freeman and Skapura, 1991), is that they are better at interpolation than at extrapolation. This is in part because no assumption is made about the nature of the relationship between variables. Therefore, the nature of the training data is very important. We found that in some cases we were able to train the network with a small amount of data to perform better than in other cases where even larger amounts of data were available. Hence, it is especially important that the training data cover extreme examples. Regression-based approaches can generally perform better at extrapolation, perhaps because they try to discover an underlying mathematical relationship.

6. Conclusions

This paper has compared neural networks with logistic regression. These results should be useful in developing a repertoire of studies to enable a meta-analysis of neural network applications. Our results are generally in agreement with the five cases summarized by Sharda (1994). We have also attempted to address some of the concerns raised by Chatfield (1993), who calls for more systematic evidence of a fair comparison between statistical techniques and neural networks.

The pros and cons of the two approaches are summarized in Table 5, where each technique is rated on several dimensions. The table shows that neither method dominates on all the attributes. The neural network approach is parsimonious, produces better classification, handles complex underlying relationships better, and is stronger at interpolation.


Table 5. Neural network and logistic regression approaches compared by attribute.

Attribute                               Neural Network    Logistic Regression
Parsimony                               Good              Fair
Classification accuracy                 Good              Fair
Solution methodology                    Fair              Good
Interpretability                        Poor              Good
Intuitive appeal                        Poor              Good
Complex interactions                    Good              Poor
Statistical testing                     Fair              Good
Interpolating                           Good              Fair
Extrapolating                           Fair              Good
Interpretation of importance weights    Fair              Good

On the other hand, the logistic regression technique has a superior solution methodology (closed-form versus heuristic) and better interpretability. Further, the estimated weights from the logistic regression can be easily interpreted, while additional calculations are needed for this purpose in the neural network; even then, the weights are hard to interpret for the network model. One may argue that logistic regression has better extrapolation capabilities than neural networks because of its ability to fit a statistical relationship, rather than finding patterns in existing data as a neural network does. Subgroup differences can be statistically tested using logistic regression, and this possibility is limited with neural networks. In some sense, therefore, logistic regression has a greater intuitive appeal. Nevertheless, our experience indicates that neural networks add a useful approach to the repertoire of tools available to marketing researchers, and one does not have to be an expert in artificial intelligence to use this technique.

Logistic regression is only one of the statistical methods for predicting categorical variables. Other techniques include CHAID (Kass, 1980), CART, and discriminant analysis (Breiman et al., 1984). It will be worth evaluating the predictive accuracy of these techniques relative to that of the neural network model. Finally, in this paper, we assumed that the costs of making a mistake with an accept or reject decision were equal. In the absence of accurate estimates of these costs, it was decided that they would be treated as equal. If the actual costs are different, this would affect both the neural network and logistic regression models. Future research is necessary to study this issue and determine whether differential costs have an impact on the relative performance of the two approaches. The effect of unequal prior probabilities of acceptance or rejection on classification accuracy also needs further examination.

Acknowledgments

We thank the anonymous reviewers and Donald R. Lehmann, co-editor of Marketing Letters, for their comments, and Wim Vandenhoeck and James Banks for their research assistance.


Notes

1. In addition, each data record also contains dummy (0-1) variables corresponding to four product categories to which the new product belongs: (1) frozen foods (225 cases), (2) canned foods (151 cases), (3) household supplies (69 cases), and (4) candy and gum (88 cases). If a product does not belong to any of these groups, then it is classified in the "others" category (485 cases).
2. We believe that these five groups of variables are logical from a substantive viewpoint. A factor analysis of the fifteen variables that retained five factors showed a substantial overlap between these groups and the variables that loaded high on the factors. The alpha coefficients for two of our groups were 0.83 and 0.78, while they were very small for the other groups.
3. One may consider using factor scores and interactions among them to approximate a neural net. Factor analysis of the whole data set yielded seven factors with eigenvalues greater than unity. The LR model with these seven factor scores accurately predicted 77.4 percent of the cases with a C-index of 0.79. The LR model with linear and first-order interaction terms of the factor scores accurately predicted 79.8 percent of the cases with a C-index of 0.83. This approach comes quite close to the NN model for our data.
4. We also estimated a fully connected neural network model with as many hidden nodes as input variables. This network captures several more interactions among the input variables. But the predictive validity of this network was only marginally higher than that of the hierarchical network reported earlier.

References

Bessen, Jim. (1993). "Riding the Market Information Wave." Harvard Business Review (Sept.-Oct.), 150-161.
Breiman, L., J. Friedman, R. Olshen, and C. Stone. (1984). Classification and Regression Trees. Belmont, CA: Wadsworth.
Briesch, Richard, and Dawn Iacobucci. (1993). "Using Neural Networks to Compare Theoretical Models: An Application to Persuasive Communications." Working paper, Northwestern University, Department of Marketing.
Chatfield, Chris. (1993). "Neural Networks: Forecasting Breakthrough or Passing Fad?" International Journal of Forecasting 9, 1-3.
Freeman, J., and D. Skapura. (1991). Neural Networks: Algorithms, Applications and Programming Techniques. Reading, MA: Addison Wesley.
Kass, G.V. (1980). "An Exploratory Technique for Investigating Large Quantities of Categorical Data." Applied Statistics 29(2), 119-127.
Mazanec, Josef. (1992). "Classifying Tourists into Market Segments: A Neural Network Approach." Journal of Travel and Tourism 1(1), 39-59.
Rao, Vithala R., and Edward McLaughlin. (1989). "Modeling the Decision to Add New Products by Channel Intermediaries." Journal of Marketing 59, 80-88.
Robins, Gary. (1993). "Neural Networks." Stores (January), 39-42.
Rumelhart, David E., and James E. McClelland. (1986). Parallel Distributed Processing (vol. 1). Cambridge, MA: MIT Press.
Schwartz, E. (1992). "Where Neural Networks Are Already at Work (Special Report)." Business Week, November 2, 136-137.
Sharda, R. (1994). "Neural Networks for the MS/OR Analyst: An Application Bibliography." Interfaces 24 (March-April), 116-122.
Tam, Kar Yan, and Melody Y. Kiang. (1992). "Managerial Applications of Neural Networks: The Case of Bank Failure Predictions." Management Science 38(7), 926-947.
Wasserman, Philip D. (1989). Neural Computing: Theory and Practice. New York: Van Nostrand Reinhold.
