A General Consumer Preference Model for Experience Products: Application to Internet Recommendation Services


Jaihak Chung* and Vithala R. Rao**

Revised, September 2, 2010

For Presentation at the 2010 NEMC

*Jaihak Chung ([email protected]) is Associate Professor of Marketing at the School of Business Administration, Sogang University, Seoul, Korea, and Vithala R. Rao ([email protected]) is Deane W. Malott Professor of Management and Marketing and Quantitative Methods at the S.C. Johnson Graduate School of Management, Cornell University, Ithaca, NY 14853-6201. We thank Srinagesh Gavirneni for his suggestions for improving this paper.


Abstract

We present a general consumer preference model for experience products that overcomes the limitations of consumer choice models, especially when significant information on nonquantifiable attributes is missing. For this purpose, we decompose the deterministic component of a product's utility into two parts: an observed component, accounted for by observed attributes, and an unobserved component (or residual), due to non-observed attributes. We estimate the unobserved component by relating it to the corresponding residuals of virtual experts (who represent groups of homogeneous persons) who experienced the product earlier and evaluated it. Our methodology involves identifying such virtual experts and determining the relative importance to be given to them in estimating the target person's residual. Using Bayesian estimation methods and MCMC simulation inference, we applied our approach to two types of consumer preference data: (1) online consumer ratings (stated preferences) for Internet recommendation services and (2) offline consumer viewership (revealed preferences) for movies. We empirically show that our new approach outperforms several alternative collaborative filtering and attribute-based preference models in both in-sample and out-of-sample fits. Our model is applicable to both Internet recommendation services and consumer choice studies and enables firms to take full advantage of all the different types of information in retailers' consumer databases.

Keywords: Consumer Model for Stated/Revealed Preference Data, Recommendation Systems, Virtual Experts, Experience Products, Bayesian Estimation, MCMC Methods

From the perspective of a quantitative modeler studying consumer choice behavior, the attributes of any product can be broadly classified into two categories: those that are quantifiable and those that are not. The first set of attributes can be used as predictors in typical choice models. However, existing choice models do not take into account how the second category of attributes affects product preferences. Consequently, these models are limited in predicting consumers' preferences, especially for experience products such as entertainment services, online content, and games. The larger the contribution of the second category of attributes to product utilities in a choice model, the lower the model's predictive power. Interestingly, although the provision of recommendation services is important in marketing, most prediction models for recommendation services have been developed by researchers in engineering, information systems, and information science (Koren 2009; Bell, Koren, and Volinsky 2008; Ariely, Lynch, and Aparicio 2003; Breese, Heckerman, and Kadie 1998) and typically use collaborative filtering methods. We can partially attribute this situation to the prevailing practice in marketing research of using attribute-based random utility choice models to predict consumer preferences. Collaborative filtering methods developed in computer science and intelligent-agent research do not use attribute information on products and are not limited by the presence of non-quantifiable attributes, because they rely on other users' revealed preference data (ratings) rather than product attributes for prediction. We adopt the spirit of collaborative filtering and suggest a general method for consumer preference models that overcomes the limitation of incorporating the effect of non-quantifiable attributes.
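For contrast with the attribute-based approach, a memory-based collaborative filtering predictor of the kind referenced above can be sketched in a few lines. The `pearson` and `predict` helper names and the dictionary-of-ratings data layout are illustrative assumptions, not a specific published algorithm:

```python
import math

def pearson(u, v):
    """Pearson correlation between two users over their co-rated items."""
    common = set(u) & set(v)
    if len(common) < 2:
        return 0.0
    mu_u = sum(u[j] for j in common) / len(common)
    mu_v = sum(v[j] for j in common) / len(common)
    num = sum((u[j] - mu_u) * (v[j] - mu_v) for j in common)
    den = math.sqrt(sum((u[j] - mu_u) ** 2 for j in common) *
                    sum((v[j] - mu_v) ** 2 for j in common))
    return num / den if den else 0.0

def predict(target, others, item):
    """Target's mean rating plus a similarity-weighted average of other
    raters' deviations from their own means for the item."""
    num = den = 0.0
    for v in others:
        if item in v:
            w = pearson(target, v)
            mu_v = sum(v.values()) / len(v)
            num += w * (v[item] - mu_v)
            den += abs(w)
    mu_t = sum(target.values()) / len(target)
    return mu_t + (num / den if den else 0.0)
```

Note that no product attribute appears anywhere in the computation, which is exactly why such methods are unaffected by non-quantifiable attributes.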
For this purpose, we decompose a product's utility in a standard preference model into two parts: one part accounted for by the set of product attributes available for choice models (usually because they are quantifiable), and another part not accounted for, due to the non-quantifiable attributes. We develop a general preference model that captures the latent residuals of a standard consumer preference model for a target consumer by relating them to the corresponding residuals of a number of virtual experts, each representative of a group of consumers who have experienced the product earlier and evaluated it. We develop a Bayesian method to estimate the latent residuals of virtual experts and to identify such virtual experts by clustering. Our model allocates the relative importance of virtual experts' residuals according to two pieces of information: (1) experts' preference similarities (how similar a target person is to the virtual expert); and (2) experts' precision levels (the inverse of the variance of residuals within a cluster). We describe our model with reference to recommendation services, which provide a better environment to highlight our modeling approach. We suggest this modeling approach as a general method to improve any preference model that can be applied to consumer preference behaviors (both stated and revealed), particularly when it is not easy to quantify some product attributes (as for entertainment services and online content) or when there are too many product attributes (as for automobiles).

To test the predictive accuracy of our approach, we applied our model to two types of consumer preference data: (1) online consumer ratings (stated preferences) for Internet recommendation services; and (2) offline consumer viewership (revealed preferences) for movies. We undertake an extensive comparison of our model's predictive power to that of several major collaborative filtering and attribute-based preference models. Furthermore, our graphical analysis of latent residuals provides model diagnostics. Our empirical analysis showed that the predictions from our approach are superior to those of the previous efforts. The hit rate of our model for prediction is 48%, compared to 42% for the best collaborative algorithm and
40% for the best attribute-based preference model for stated preference data. The improvement is larger for the revealed preference data (86% versus 72%).

The rest of this study is organized in four more sections. In the second section, we provide a brief review of the relevant literature on recommendation systems and modeling efforts with data on recommendations. In the third section, we develop our virtual-expert preference model and describe how to estimate it with choice or preference data using Bayesian estimation methods. In the fourth section, we describe the results from two applications of our model, one to recommendation systems data and the other to actual choice data. We conclude with a section discussing the advantages and limitations of our model and identifying some directions for future research.

LITERATURE REVIEW

The extant methodological approaches to the prediction of consumer preferences for experience products in recommendation services and standard choice studies can be classified into two categories according to the types of information used for preference prediction: (1) collaborative filtering (CF) models based on consumer preference similarity, developed mainly in computer science; and (2) attribute-based preference models or choice models based on product attribute similarity, applied almost routinely in marketing research. We review the major methods which use the common data available for recommendation and consumer choice studies, as shown in Table 1.

Insert Table 1 here.

The attribute-based preference models describe the utility of a product as the sum of the effects of product attributes and estimate these effects by using consumers' historical preference data. The statistical models are then used to estimate the total values of products that the consumer has not used or experienced and to make recommendations to the consumer. Urban, Sultan, and Qualls (2000) applied a standard logit form of consumer utility, using only information on product attributes, in their recommendation system. Ansari, Essegaier, and Kohli (AEK) (2000) developed an attribute-based preference model by accounting for product heterogeneity with the interaction effect of observed consumer characteristics (age and gender) and product-specific parameters and estimated it with Bayesian linear regression. More recently, Ying, Feinberg, and Wedel (2006) extended the AEK model by considering the effect of missing responses via a two-stage ordinal probit model. In this model, the first stage describes the response choice of the movies for rating, and the second stage describes the ratings themselves in terms of movie characteristics and individuals' characteristics to account for heterogeneity.1

The Netflix Prize competition of the 2000s (see www.netflixprize.com) accelerated the burgeoning variety of collaborative filtering (CF) modeling approaches to recommendation services (Koren 2009), initiated by the Tapestry system (Goldberg, Nichols, Oki, and Terry 1992) in the early 1990s. CF models can be divided into two subcategories according to how they incorporate consumer preference similarity (Sarwar et al. 2001): memory-based and model-based. Sometimes called the original approach, memory-based CF models utilize the stated preference data (ratings) of other consumers (reference consumers) as the predictors for target consumers' preferences. The most popular one is the Neighborhood model (Koren 2009, Bell and Koren 2007, Breese, Heckerman, and Kadie 1998, Shardanand and Maes 1995). The model-based CF approach employs a variety of general models, such as matrix factorization (Bell, Koren, and Volinsky 2008, Koren 2009), applied to recommendation data. These models differ from memory-based CF models in that they use model parameters to capture preference similarities

1 Whereas these two studies dealt with stated preference data, Bodapati (2008) proposed a model for purchase choice data (revealed preference data) consisting of two stages of the purchase process (awareness and satisfaction) to account for missing responses in what is called unary data. He used the additional information of firm-initiated data on responses to recommendations.


via a learning model instead of by using reference consumers' data directly (Koren 2009; Takács, Pilászy, and Németh 2008). Both the CF and attribute-based approaches to the prediction of consumer preferences have advantages and limitations. CF models use the holistic ratings of others, and not product attributes, as input for prediction. Therefore, they can be used for the recommendation of any product, including experience products, but they do not provide any theoretical insights on consumer choice. For their part, the standard attribute-based preference models that use only observed product attributes and consumer characteristics are limited in recommending experience products. As mentioned earlier, we develop a general consumer preference model which can be used to improve the predictive power of any consumer preference model by incorporating consumer preference similarity, the main source of information for recommendations in collaborative filtering methods.
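To make the model-based CF idea concrete, here is a minimal rank-k matrix-factorization sketch trained by stochastic gradient descent. It is a generic illustration only, not the SVD++ or any specific model evaluated later in this paper; the function names, data layout, and hyperparameters are illustrative assumptions:

```python
import random

def factorize(ratings, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Fit a rank-k factorization of a sparse rating dict {(user, item): r}
    by stochastic gradient descent on squared error with L2 shrinkage."""
    rng = random.Random(seed)
    users = {u for u, _ in ratings}
    items = {i for _, i in ratings}
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(k)] for u in users}
    Q = {i: [rng.uniform(-0.1, 0.1) for _ in range(k)] for i in items}
    for _ in range(steps):
        for (u, i), r in ratings.items():
            err = r - sum(a * b for a, b in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating is the inner product of the two latent vectors."""
    return sum(a * b for a, b in zip(P[u], Q[i]))
```

Each user and each item is summarized by a k-dimensional latent vector, which is the "low-dimensional characteristics vector" interpretation of MF models discussed later in the model comparison.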

DEVELOPMENT OF THE VIRTUAL EXPERT MODEL

Our general preference model uses the types of data commonly available in recommendation and choice studies2: (1) consumers' stated or revealed preference data, (2) product attributes, and (3) consumer characteristics. For purposes of clarity, we refer to those individuals whose preferences are to be predicted as the target group. We assume that there exist multiple virtual experts whose preferences will be employed in predicting the preferences of the target group; we call these virtual experts the reference group. Virtual experts represent consumer groups, each group consisting of people with homogeneous preferences. We defer the discussion of how these experts are identified to a later section of the study.

We now describe our model to predict target consumers' preferences using the preferences of virtual experts. We let Yij represent the stated preference data (such as a rating) or the revealed preference data of a target consumer i for a product j, as collected in most recommendation systems or in most consumer choice studies (surveys, retailers' scanner systems). This measure is binary in the case of revealed preference data (e.g., buy or do not buy) or a scale with multiple points (usually ordinal) in the case of stated preference data. In general, we let R denote the number of points on this scale and r represent a specific response. We model these data by postulating the existence of (R−1) threshold values for the latent random utilities (Uij) as below:

Yij = r   if Ci,r−1 < Uij ≤ Ci,r,   r = 1, 2, …, R.

2 We may note that some previous studies in the recommendation literature used additional data, such as the date of rating (Koren 2009), firm-initiated response data (Bodapati 2008), or movie magazines' ratings (Ansari, Essegaier, and Kohli 2000).
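The threshold rule above is straightforward to operationalize. A minimal sketch, assuming the interior cutoffs are given in increasing order (the function name and tuple layout are illustrative):

```python
import bisect

def rating_from_utility(u, cutoffs):
    """Return r such that C_{r-1} < u <= C_r, given interior cutoffs
    (C_1, ..., C_{R-1}); C_0 = -inf and C_R = +inf are implicit."""
    # bisect_left counts the cutoffs strictly below u, so the rule
    # "Y = r iff C_{r-1} < u <= C_r" maps to that count plus one.
    return bisect.bisect_left(cutoffs, u) + 1
```

With three cutoffs this partitions the real line into R = 4 ordered response categories.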

We decompose the latent utility of product j for consumer i, Uij, into two parts for the effects of observable and unobservable product attributes, Xj and X*j, respectively, as:

Uij = Wij + ηij    (1)

Wij is the observed component that can be captured by using observable product attributes, Xj, and ηij is the unobserved component that cannot be captured by a standard attribute-based utility model. We also refer to the unobserved component as the latent residual of the standard attribute-based model. Figure 1 gives a graphical description of this model.

Insert Figure 1 here.

Modeling the observed component (Wij): We employ the standard preference model based on product attributes to model the observed component of the utility, Wij, by using observed product attributes Xj, as

Wij = βi0 + β0j + Xj′βi    (2)
where Xj is a vector of observed product attributes and βi is the corresponding vector of individual preference parameters. The intercepts βi0 and β0j capture the main effects of consumer i and product j, respectively, as suggested by Ansari, Essegaier, and Kohli (AEK) (2000). The last term captures the interaction effect of consumer preference and the observed characteristics of product j. Furthermore, we model the individual-specific vector of coefficients, βi, as a linear function of individual characteristics, Zi, in a hierarchical structure:

βi = Ψ′Zi + ξβi,   ξβi ~ N(0, Σβ),   ∀ i,    (3)

where Zi is a vector of individual characteristics including an intercept and ξβi is the error term that accounts for unobserved heterogeneity across people. Ψ is a matrix of the corresponding parameters and represents the effects of observed individual characteristics on individual preferences. Because the model can have only one estimated intercept mean, the mean of β0j is set to zero.
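As a concrete reading of equations (2) and (3), consider the following minimal sketch. The function names, the list-of-columns layout for Ψ, and the diagonal covariance simplification are illustrative assumptions, not the estimation procedure used in the paper:

```python
import random

def observed_utility(beta_i0, beta_0j, x_j, beta_i):
    """W_ij = beta_i0 + beta_0j + x_j' beta_i, as in equation (2)."""
    return beta_i0 + beta_0j + sum(x * b for x, b in zip(x_j, beta_i))

def draw_beta_i(psi_cols, z_i, sd):
    """One draw of beta_i = Psi' z_i + xi with xi ~ N(0, diag(sd^2)),
    a diagonal-covariance simplification of equation (3)."""
    mean = [sum(p * z for p, z in zip(col, z_i)) for col in psi_cols]
    return [m + random.gauss(0.0, s) for m, s in zip(mean, sd)]
```

In the full model these individual-level draws are one step within the Gibbs sampler rather than a stand-alone simulation.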

Modeling the unobserved component (ηij): Assuming that there exists a similar virtual expert g for a consumer i, we can conceptually show how the unobserved component (or residual) of a product utility for consumer i is related to that for the virtual expert g. We employ the econometrician's viewpoint by assuming that the unobserved component, ηij, is due to the lack of information on some product attributes that are unobserved, denoted by X*j, and the corresponding preference parameters, β*i, for the unobserved attributes:

ηij = Uij − Wij = X*j′β*i.    (4)

We can relate β*i to the unobserved preference parameter, β*g, for the g-th virtual expert as:

β*i = αig β*g + ε*i,

where αig indicates the extent of preference similarity between expert g and customer i, and the error term ε*i can be interpreted as the degree of preference heterogeneity between customer i and expert g. Then, the relationship between ηij and ηgj is:

ηij = X*j′β*i = X*j′β*g αig + X*j′ε*i = ηgj αig + εij,

where ηgj = X*j′β*g and εij = X*j′ε*i. This equation shows clearly why we can utilize the residuals of other consumers for capturing the residual for consumer i. We can generalize this to accommodate a set of multiple virtual experts who vary in terms of similarity with consumer i and estimate the unobserved component (latent residual) of product j for target consumer i as shown below.

ηij = Σ_g ηgj αig + εij,   with Σ_g αig = 1 and 0 ≤ αig ≤ 1 for g = 1, …, G,    (5)

where εij is an error term which is normally distributed with zero mean and variance of 1. The weights αig are mixing coefficients for each of the virtual experts, and we describe how we determine them below. Note that it is also possible to utilize only a few of the most similar experts for a parsimonious model. This modeling structure is somewhat similar to that of a single-layer mixture-of-experts model with multiple expert networks (Jordan and Jacobs 1994).3
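A minimal numerical reading of the residual blending in equation (5), with softmax-style weights over per-expert descriptors in the spirit of the mixing coefficients introduced below; the function names and the (intercept, similarity, precision) descriptor layout are illustrative assumptions:

```python
import math

def mixing_weights(gamma, descriptors):
    """Logistic (softmax) weights over per-expert descriptor vectors,
    e.g. (1, similarity, precision); the weights sum to one."""
    scores = [sum(g * w for g, w in zip(gamma, w_g)) for w_g in descriptors]
    m = max(scores)                      # subtract max to avoid overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def blended_residual(expert_residuals, weights):
    """Deterministic part of equation (5): sum_g alpha_ig * eta_gj."""
    return sum(a * e for a, e in zip(weights, expert_residuals))
```

Experts who are more similar to the target consumer and more precise receive larger weights and therefore contribute more to the estimated residual.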

3 The Mixtures-of-Experts model is a hierarchical mixture model with a tree-structured architecture that makes predictions by combining multiple predicted values of relatively simple models, called 'experts', by a set of local mixing weights called the 'gating functions', which can depend on the input (Jordan and Jacobs 1994, Peng, Jacobs, and Tanner 1996). This model has been applied to a variety of prediction problems in artificial neural network studies (Jiang and Tanner 1996). The experts in their models are independent sub-models, whereas our experts are the estimates of latent residuals of standard preference models.


Modeling Mixing Coefficients for Multiple Experts: Given that virtual experts differ in their usefulness in predicting a target consumer's preferences, we allocate weights, called the mixing coefficients, αi′ = (αi1, αi2, …, αiG), to the multiple experts as a logistic function of two descriptors: (1) the preference similarity (membership probability) between a target consumer i and an expert g, ϖ1gi; and (2) the expert's precision (the standardized inverse variance of consumers' estimated latent residuals in the corresponding cluster), ϖ2gj, as given below:

αig = exp(γ′ϖgi) / Σ_g′ exp(γ′ϖg′i)   for all i and g,    (6)

where γ′ = (γ0, γ1, γ2) is the parameter vector for an intercept and the importance of expert g's similarity and precision level, and ϖgi = (1, ϖ1gi, ϖ2gj). This approach is similar to Mixtures-of-Experts models (see Jordan and Jacobs 1994).

Missing Responses: Most recommendation systems use databases consisting of online users' ratings for a small number of products, which are sometimes chosen by users on the basis of their consumption experience or by recommendation systems through a non-random selection process (Ying, Feinberg, and Wedel 2006, Bodapati 2008). We accommodate this problem of responses that are not missing completely at random (Little and Rubin 1987) in our model by inserting a stage, called the response choice stage, as suggested by Ying, Feinberg, and Wedel (2006). Consumers' response behaviors can be regarded as another type of choice behavior (i.e., an individual choosing to respond to a preference question) and can be denoted by DSij with the corresponding utility Uijs as below:

DSij = 1 if Uijs ≥ 0, and DSij = 0 otherwise.


Let PSij denote the probability of individual i's response choice of product j. With Xj denoting a set of an intercept and covariates consisting of product attributes, and the disturbance term εijs following a normal distribution with zero mean, the corresponding latent utility of response choice, Uijs, can be modeled as below:

Uijs = Xj′βis + εijs.

If consumer i's response choices are related to her preference behaviors, her preference behavior probability, PRijr, depends on the fact that she rated product j, as below (Ying, Feinberg, and Wedel 2006):

E(Uij | DSij = 1) = E(Wij + Vij + εij | βis′Xj + εijs > 0) = Wij + Vij + ρi · ϕ(βis′Xj) / Φ(βis′Xj),    (7)

where Φ(.) and ϕ(.) are the corresponding cdf and pdf, respectively.
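The selectivity correction in equation (7) uses the inverse Mills ratio ϕ(·)/Φ(·). A minimal sketch of the corrected conditional mean, building the standard normal pdf and cdf from math.erf rather than an external library (the function names are illustrative assumptions):

```python
import math

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def corrected_mean(w_ij, v_ij, rho_i, s_index):
    """E(U | responded) = W + V + rho * pdf(s)/cdf(s), as in equation (7);
    s_index plays the role of the selection index beta_s' x_j."""
    return w_ij + v_ij + rho_i * normal_pdf(s_index) / normal_cdf(s_index)
```

When ρi = 0 the correction vanishes, i.e., the missingness is ignorable; a nonzero ρi shifts the conditional mean of observed ratings.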

The inclusion of this response choice stage is optional and depends on the manner in which the data are collected. We use the general form in our first empirical application (online recommendation data) by including the effects of such nonignorable missing responses (Little and Rubin 1987), but not in the second empirical study (survey data), which contains no missing responses. Finally, Appendix A summarizes the virtual expert model along with diffuse priors, and Appendix B describes the Gibbs sampler for its estimation.


Identifying Virtual Experts and Estimating Residuals: In order to identify multiple virtual experts, we apply the standard Bayesian clustering method to the preference data of a group of consumers called the reference group. We expect the virtual experts to show more reliable preference patterns than the individual consumers in the corresponding groups. We define a number of dummy variables to represent the Yhj data as reference consumer h's revealed or stated preference behavior (buying or rating) for product j; h = 1, 2, …, H and j = 1, 2, …, J. The entry for Yhj will range from 1 to R. If the available data are on purchase behavior (buy or not buy), then R = 2. Furthermore, any type of preference behavior can be converted to a vector DYhj of dummy variables: DYhj = (DYhj1, DYhj2, …, DYhjR), where DYhjr = 1 if Yhj = r and DYhjr = 0 otherwise. The probability of individual h's preference behavior, DYhj, for product j being r, PRhjr, can be modeled as a mixture distribution that is a product of the probability of h's belonging to a certain class g, PGhg, and the probability of class g's preference behavior being r for product j (the conditional preference probability), PRgjr, given by

PRhjr = P(Ch,r−1 < Uhj ≤ Chr) = Σ_g P(h ∈ cluster g) · P(Ch,r−1 − μgj < εgj ≤ Chr − μgj | h ∈ cluster g)
      = Σ_{g=1}^{G} PGhg ∏_{r′=1}^{R} (PRgjr′)^{DYhjr′} = Σ_{g=1}^{G} PGhg · PRgjr,    (8)

where Ch0 = −∞, Ch1 = 0, and ChR = ∞. The individual-specific cutoff points Chr that account for scale-usage heterogeneity enable us to cluster people based on preference similarity.

In Appendix C we describe the specification of prior distributions for the model parameters as diffuse conjugate families in a Bayesian framework and the details of the Gibbs sampler used for the identification of virtual experts. We estimate the posterior distributions of model parameters using the reference consumers' rating data by simulation-based inference through the Gibbs sampler (Gelfand and Smith 1990) for posterior density estimation with a fixed number of classes.4 The number of classes is determined by choosing the model with the largest marginal likelihood. The marginal likelihood, m(y | MG), for a mixture model MG with G clusters is calculated with the bridge sampling estimator (Meng and Wong 1996, Frühwirth-Schnatter 2004), which is robust for the estimation of marginal likelihoods for mixture models (details are described in Appendix D). The membership of each individual in the reference group is determined by choosing the cluster with the largest posterior probability P(h ∈ g | {yhj}, j = 1, …, J):

P(h ∈ g | {yhj}) = P(h ∈ g) P({yhj} | h ∈ g) / Σ_{g′} [P(h ∈ g′) P({yhj} | h ∈ g′)].    (9)
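Equation (9) is a direct Bayes-rule normalization. A minimal sketch, assuming the per-cluster data likelihoods have already been computed (the function name and list layout are illustrative):

```python
def membership_posterior(priors, likelihoods):
    """Posterior P(h in g | data) from cluster priors P(h in g) and
    per-cluster data likelihoods P(data | h in g), as in equation (9)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]
```

The individual is then assigned to the cluster with the largest posterior probability.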

We estimate the latent residuals of a standard attribute-based preference model for products in the validation sample with reference consumers' preference data in each cluster by using MCMC draws from the posterior distribution of latent residuals, as shown in Appendix E. We create a virtual expert to represent each cluster and assign group-specific statistics, such as the mean and variance of residuals for each product in the validation set, as the opinion and precision level of the corresponding virtual expert for the same product.

To enable the analyses, our data structure is as follows. We classify consumers into two groups: target consumers, whose preferences we need to predict, and reference consumers, whose preference data we use for the prediction of target consumers' preferences. Movies are also classified into two groups: an estimation set and a validation set. Then, the preference data consist of the following four sub-datasets, as shown in Figure 2.

Insert Figure 2 here.

To recapitulate, our model estimation procedure consists of the following four steps. The first step is clustering consumers, indexed by h, in the reference group with the estimation dataset (Datasets I & II) for the identification of virtual experts, indexed by g, as described in Appendix C. The second step is estimating each expert's opinions (latent residuals) for products in the validation set (Dataset II), as described in Appendix D. The third step is estimating the virtual expert model with virtual experts' latent residuals as predictors in addition to product attributes, with the estimation dataset (Dataset III), as described in Appendices A and B. The last step is predicting target consumers' preference behaviors in the validation data (Dataset IV) for the validation test.

4 Another optional method to identify clusters among the reference group is to use their latent residuals instead of their whole rating data. This method assumes that consumers' preference parameters for unobserved product attributes are homogeneous but those for observed attributes are not. We tested this optional approach by using a standard Gaussian mixture distribution. Its predictive performance was slightly worse in application 1 but almost identical in application 2. Therefore, we believe that this optional method can be used depending on the nature of the data. The empirical results of this optional method are available from the authors.

EMPIRICAL APPLICATIONS

We now report on two comprehensive applications of our approach and compare the results with those of appropriate benchmark models. In the first application, we apply our model to recommendation data (stated preferences) obtained from EachMovie, which has become the best-known database for recommendation studies, and compare its performance to that of three collaborative filtering models and three attribute-based preference models. The second application deals with viewing choices (revealed preferences) of movies. In both applications, the virtual expert model shows considerable improvement for both in- and out-of-sample predictions.


Application 1: Online Preference Data for Movie Recommendations

To test the predictive performance of our model for recommendation services, we apply the model to the EachMovie database, the well-known benchmark dataset for recommendation studies. The Compaq Systems Research Center provided this database by offering free Web-based recommendation services to people for 18 months up to September 1997. The database consists of 2,811,983 numeric ratings for 1,628 different movies (films and videos) entered by 72,916 users. This database contains: (1) consumers' stated preference data (movie ratings on 6-point ordered rating scales), (2) movie attributes (genres: action, animation, art and foreign, classic, comedy, drama, family, horror, romance, thriller), and (3) individual characteristics of consumers (age and gender). We randomly formed two data sets of movies: 200 movies for model estimation and 100 movies for model prediction. For the development of virtual experts, we further sampled 2,335 people who rated more than five movies for model estimation and at least one movie for model validation. We selected 989 people who rated at least two movies for model estimation and at least one movie for model validation. As described in Figure 2, we use the data of reference consumers and 200 movies (Dataset I) in identifying virtual experts and the data of reference consumers and 100 movies (Dataset II) in estimating virtual experts' opinions (residuals). Then, we use the data of target consumers and 200 movies (Dataset III) for assigning virtual experts to each individual. The last dataset, of 989 people and 100 movies (Dataset IV), is used for validation. We now describe the various aspects of these analyses. All our analyses utilize MCMC methods, as described in Appendices A-F. Table 2 shows some basic statistics for our sample on various descriptors in the data.

Insert Table 2 here.


The identification of virtual experts. We applied the finite mixture model-based clustering5 to Dataset I, consisting of the ratings of the reference group of 2,335 people on 200 movies, varying the number of clusters up to 16. More details on how to implement the MCMC simulation for this model are provided in Appendix C. The log marginal likelihood values for models with 2 to 16 clusters ranged from −44,313 (for the 2-cluster model) to −28,756 (for the 16-cluster model); the value increases monotonically up to −22,554 at 13 clusters and then decreases. Accordingly, we chose thirteen clusters6 to identify the virtual experts. Figure 3 gives the distribution of individual memberships, ranging from 80 to 340, across the 13 clusters.

Insert Figure 3 here.

Figure 4 shows the mean preference ratings by genre for the virtual experts. It reveals that the experts have very divergent movie preferences across genres.

Insert Figure 4 here.

The estimation of latent residuals of virtual experts. We estimated the latent residuals7 of the reference consumers in each of the 13 clusters as described in Appendix C. We discuss more details on the characteristics of the latent residuals in the model comparison section.
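The selection of the number of clusters reduces to an argmax over the bridge-sampling log marginal likelihoods. A trivial sketch using the values reported above (the dictionary layout is an illustrative assumption):

```python
def choose_num_clusters(log_marglik):
    """Return the cluster count G with the largest log marginal likelihood."""
    return max(log_marglik, key=log_marglik.get)
```

In practice each entry of the dictionary is itself the output of a full MCMC run at that cluster count.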

Estimating the virtual expert model for prediction. We estimated a preference model for the 989 target consumers’ preferences with ratings on 200 movies by target consumers (Dataset III) and information on product attributes (genre) and individual characteristics (gender and age). We 5 Two Markov chains were generated, in which 15,000 draws were implemented for the burn-in period and 6,000 draws (3,000 draws from each of the two chains) were additionally generated for the estimation of posterior distributions. (See Appendix A for details.) 6 In this choice of 13clusters, we made sure that that there was adequate coverage of numbers of products rated by people in the clusters. 7 Two Markov chains were generated for latent residuals, in which 3,000 draws were implemented for the burn-in period with convergence tests and 3,000 draws from each of the two chains were additionally generated for the estimation of posterior distributions. 3000 draws were enough for MCMC convergence. The chains converged fast since the MCMC draws are from a univariate posterior distribution (truncated normal distribution) as shown in Appendix F.

16

incorporated the effects of nonignorable missing observations. We used MCMC simulation8 for the estimation of joint posterior distributions of model parameters [U, C, β, Ψ, Σβ, α, γ, Σα, ρ]. Refer to Appendix B for details on implementing MCMC simulation for this model. We report only the grand means of the estimated parameters averaged over individuals and iterations. We calculated standard deviations of the parameters reported in the following tables by taking the square roots of the average of the corresponding diagonal elements of covariance matrix draws, obtained from the MCMC simulation. We computed standard deviations of posterior means for individual preference parameters as measures of preference heterogeneity among individuals. We show in Tables 3-1, 3-2, and 4 the means, standard deviations, and heterogeneity estimates of model parameters for response choice behaviors and for preference behaviors of the target group. Insert Table 3-1, 3-2, and 4 here. In brief, Table 3-1 shows that the users in the reference group are more likely to rate action, animation, and romance movies than thriller, family, and drama movies. The first row of Table 4 clearly shows that expert opinion makes a significant contribution in explaining the missing parts of the utility of movies for target consumers, supporting the role of the virtual experts’ opinions in our model. With regard to the grand mean values for the model parameters in the first-level equation, the coefficients of all genres except for drama, family, and thriller are significant at the 0.01 level of Bayesian P-values. Heterogeneities for the model parameters in Table 4 are much larger than the corresponding standard deviations in Table 3-2 for the grand mean parameters. Note that target consumers’ preferences are more heterogeneous for classic and art/foreign movies. Table 5 shows the relationship between the attribute (genre) coefficients and two demographic variables (age and gender) in equation (3). 
The older the consumers, the more they prefer classic, thriller, and drama movies. As one might conjecture, the results indicate that female consumers seem to prefer romance, art/foreign, and animation movies, while male consumers seem to prefer action and horror movies.

Insert Table 5 here.

Table 6 shows the relationship between the mixing coefficients (or weights) for each expert and the two descriptor variables (preference similarity and expert precision) in equation (6). As could be expected, the coefficients for the expert precision variable are positive in most cases, indicating that experts who are more precise in their opinions are more informative.

Insert Table 6 here.

The cutoff points shown in Table 7 reveal considerable variability in the thresholds for the higher preference ratings. The estimated correlation between response choices and preference behaviors is nonzero, which indicates that it is important to consider the effect of missing observations in the model.

Insert Table 7 here.

8 Two Markov chains were generated, in which 20,000 draws of each chain were implemented for the burn-in period with convergence tests, and 6,000 draws (3,000 from each of the two chains) were used for inference.
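As a concrete illustration of how such posterior summaries can be computed from retained MCMC draws, the sketch below works on simulated draws; all array names and simulated numbers are our own illustrative assumptions, not the authors' code or estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for 6,000 retained MCMC draws of P genre coefficients:
# draws of the grand means and of the diagonal of the covariance matrix.
n_draws, n_consumers, P = 6000, 989, 4
mean_draws = rng.normal(0.5, 0.1, size=(n_draws, P))
cov_diag_draws = rng.gamma(2.0, 0.05, size=(n_draws, P))

# Grand means: average the draws over iterations.
grand_means = mean_draws.mean(axis=0)

# Reported standard deviations: square roots of the averaged
# diagonal elements of the covariance-matrix draws.
reported_sd = np.sqrt(cov_diag_draws.mean(axis=0))

# Heterogeneity: standard deviation of individual posterior means
# (simulated here for 989 target consumers).
indiv_posterior_means = rng.normal(grand_means, reported_sd, size=(n_consumers, P))
heterogeneity = indiv_posterior_means.std(axis=0)

print(grand_means.round(2), reported_sd.round(2), heterogeneity.round(2))
```

In the paper these summaries are computed from the actual posterior draws of [U, C, β, Ψ, Σβ, α, γ, Σα, ρ]; the simulation above only mimics their shape.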

Predictive validity test and model comparison. We test the predictive power of our model against that of three major CF models and three attribute-based preference models, using the data for the 989 people and 100 movies (Dataset IV in Figure 2). First, we discuss the reasons for choosing the specific models for comparison and present a summary table showing the aspects they do and do not consider. In addition to the details of the predictive power of each model, we provide a graphical diagnostic analysis to show visually how well our model captured the unobserved components of product utilities compared to the other preference models.
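The two accuracy criteria used in this comparison, RMSE and hit rate, can be computed as in the sketch below. The exact hit-rate definition is not spelled out here, so we assume a "hit" means the predicted rating, rounded and clipped to the ordinal scale, equals the observed rating.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between observed and predicted ratings."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def hit_rate(actual, predicted, r_min=1, r_max=5):
    """Share of predictions that, rounded and clipped to the rating scale, match.

    Note: Python's round() uses banker's rounding on exact halves.
    """
    hits = sum(1 for a, p in zip(actual, predicted)
               if min(max(round(p), r_min), r_max) == a)
    return hits / len(actual)

actual = [5, 3, 4, 1, 2]
predicted = [4.8, 2.2, 4.4, 1.3, 2.0]
print(rmse(actual, predicted))       # ≈ 0.431
print(hit_rate(actual, predicted))   # 0.8 (4 of 5 rounded predictions match)
```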


We include the baseline predictor model, which considers the independent effects of consumers and products only (Model 1) (Takács et al. 2008; Bell, Koren, and Volinsky 2008; Paterek 2007). The Neighborhood model and the MF (Matrix Factorization) model are the most popular approaches among memory-based CF models and model-based CF models, respectively (Koren 2009; Gjoka and Soldo 2008; Bell, Koren, and Volinsky 2008). We select two major CF models that are among the best candidates used by the Netflix competition winning team, BellKor (Bell, Koren, and Volinsky 2008; Koren 2009)9: the Neighborhood model (Model 2) and the SVD++ (Model 3). The SVD++ is a variant of SVD (Singular Value Decomposition) models, which outperformed other variants of MF models (Bell, Koren, and Volinsky 2008; Koren 2009). MF models, including the SVD++, decompose the rating matrix into a product of two lower-dimensional matrices P (product-specific) and Q (consumer-specific) via a singular value decomposition to obtain a rank-k approximation of the original matrix. From our modeling perspective, we can interpret this approach as estimating a low-dimensional characteristics vector for each user and a low-dimensional product attribute vector for each movie.

We consider three attribute-based preference models (Models 4, 5, and 6). Model 4 is a standard consumer choice model in an ordinal probit framework without consumer heterogeneity. Model 5 is the same consumer choice model but with consumer heterogeneity. Model 6 is the extension of Model 5 that includes the interaction between observed consumer characteristics and genre (Ansari, Essegaier, and Kohli 2000) and the effects of response choice

9 No study has compared the Netflix competition winning team's full model, because the final solution of the winning model is an ensemble of prediction values from hundreds of independent models (Koren 2009), and information on how to estimate each model was not fully provided for others to replicate it. However, most of their models are variants of the underlying major methods, and the winning team acknowledged that most of their model's performance is achievable by Neighborhood and Matrix Factorization (MF) methods (Gjoka and Soldo 2008; Takács, Pilászy, and Németh 2008; Bell, Koren, and Volinsky 2008).


behaviors (Ying, Feinberg, and Wedel 2006). Furthermore, we test two versions of our virtual expert model: a model with only the most similar expert (Model 7) and a model with all 13 experts (Model 8). We summarize in Table 8 the mathematical descriptions of the product utilities for these models to show, in a comparable and consistent manner, what information is used and which effects are considered in what way by each model. In particular, we show how the unobserved component is handled, either implicitly or explicitly, in these models as compared to our model.

Insert Table 8 here.

The estimated product utilities for Models 1, 2, and 3, E(ŷij | Model k), are different, as summarized in Table 8. Please refer to Appendix F for more details on the specifications of Models 1, 2, and 3 and the estimation methods. Table 9 shows the predictive performance of the models using the criteria of hit rates and RMSE values.

Insert Table 9 here.

Model 1 (the baseline predictor model) is the worst performer among the CF models. Its performance (without using the additional information on the date of rating) is quite similar to that of Model 4 (the standard preference model) in terms of both in- and out-of-sample fits. Model 3 (SVD++) shows the best performance among the CF models and the attribute-based models. Model 2 (the Neighborhood model) is less accurate than Model 3, which is consistent with previous studies (Bell, Koren, and Volinsky 2008). Given that the available information on product attributes is limited to movie genre only, it is not surprising that the collaborative filtering models, which do not rely on attribute information, showed better prediction than the attribute-based models. Given that Model 3 and Model 6 both consider the effects of response choice behavior, the better performance of Model 3 (without using the additional information on the date of rating) against


Model 6 implies that the MF approach was better than Model 6 at capturing unobserved components. The virtual expert model with a single expert (Model 7) significantly outperforms all other models except SVD++ (Model 3) in both in- and out-of-sample fits. Model 3 (with the additional information on the date of rating) performs slightly better than Model 7. Model 3 (without the additional information) performs slightly worse than Model 7 in terms of RMSE for the estimation sample, but Model 7 showed slightly better performance in the holdout sample. However, our final model with multiple experts (Model 8) outperforms all the other models, including SVD++.

The virtual expert model outperforms the collaborative filtering models (Models 1 to 3) mainly because our model can utilize not just other consumers' preference data but also product attributes and consumer characteristics, within a more appropriate modeling framework (please see Appendix F for more details). The virtual expert model outperforms the choice models (Models 4, 5, and 6), which implies that it improves these attribute-based models by logically capturing the unobserved component of a product utility as a function of other consumers' latent residuals. Finally, the results show that the virtual expert model (Model 7 or Model 8) has some advantages compared to Model 6. In our view, Model 6 can only partially capture the unobserved component because the contribution of the interaction effect in Model 6 depends on how much of consumer preference can be explained by age and gender10. In addition, it does not utilize consumer characteristics to capture the individual-specific preference parameters for the observed product attributes.

10 This can be seen as follows. Assume that the individual preference parameter vector β*i = ψi0 + ψ1' Zi, where the consumer characteristics vector Zi is related to the preference parameters through ψ1 and ψi0 is the part of the preference not explained by Zi. Then the unobserved component of a product utility, ηij, can be written as

ηij = X*j' β*i = X*j' (ψi0 + ψ1' Zi) = X*j' ψi0 + X*j' ψ1' Zi.

This equation shows that Model 6 can capture the part of the unobserved component ηij explained by the consumer characteristics Zi (the term X*j' ψ1' Zi), but not X*j' ψi0.
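To make the rank-k decomposition used by the MF models concrete, the sketch below computes a plain truncated SVD of a toy rating matrix. This is generic linear algebra only; it is not the regularized, implicit-feedback SVD++ estimated in the comparison, and real rating matrices are sparse with missing entries.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy consumer-by-movie rating matrix (dense here for simplicity).
R = rng.integers(1, 6, size=(8, 6)).astype(float)

k = 2                                 # number of latent factors
U, s, Vt = np.linalg.svd(R, full_matrices=False)
Q = U[:, :k] * s[:k]                  # consumer-specific factor matrix
P = Vt[:k, :]                         # product-specific factor matrix
R_hat = Q @ P                         # best rank-k approximation of R

print(np.linalg.matrix_rank(R_hat))   # at most k
```

Each row of Q plays the role of a low-dimensional consumer characteristics vector and each column of P that of a low-dimensional product attribute vector, mirroring the interpretation given in the text.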


In order to provide some managerial guidelines on which model outperforms the others under what conditions, we conduct two additional analyses: (1) a graphical analysis of latent residuals for the effect of consumer heterogeneity and (2) an analysis of the effect of the number of ratings by each target consumer on the predictive power of the major models.

Graphical Analyses for the Effect of Consumer Heterogeneity: We visually compare the distributions of the latent residuals obtained from each model to provide an understanding of how well the preference models capture the unobserved component. As an illustration, we compare the latent residual distributions of three preference models for three groups of target consumers11, formed according to their preference heterogeneity: (1) a high homogeneity group; (2) a medium homogeneity group; and (3) a low homogeneity group. The residual distributions of the three preference models (Models 5, 6, and 8) are shown in Figure 5. The three graphs (Figures 5-1, 5-2, and 5-3) show that the more heterogeneous the target consumers are, the more likely the distribution of the unobserved components of standard preference models (e.g., Model 5) is to be multimodal.

Insert Figures 5, 5-1, 5-2, 5-3, and 5-4 here.

For the high homogeneity group, the residual distributions of the three models in Figure 5-1 are similar, with low standard deviations (less than 0.4). This implies that the preference models performed well in predicting for consumers who are homogeneous in their preferences. However, the gaps among model performances in terms of RMSE become larger for relatively heterogeneous consumers, as shown in Figures 5-2 and 5-3. This implies that Model 5 becomes less accurate for predicting heterogeneous consumers and that Models 6 and 8, which capture the unobserved components, become more accurate than a standard preference model. In addition, they show that Model 6 is not enough to capture the unobserved component compared to our

11 For this purpose, we first compute the MAD (mean absolute deviation) between each target consumer's ratings and the mean ratings across all target consumers. Then we sample three groups: a high homogeneity group with the lowest MADs (20% of target consumers), a medium homogeneity group with middle MAD values (about the 40th to 60th percentiles of target consumers), and a low homogeneity group with high MAD values (20% of target consumers).


model, especially when consumers are heterogeneous in their preferences. The latent residual distributions of our model are tighter and unimodal compared to those of the other preference models; this implies that our model not only captured the unobserved components well but also dealt with the multimodality of the latent residuals, whereas those of Model 6 are more dispersed and multimodal. Similar results can be found in the comparison of the latent residuals of the three models obtained from all consumers, as shown in Figure 5-4. We can conclude that it is helpful to employ a finite mixture of the different distributions obtained from the 13 clusters for capturing the unobserved component, since these distributions are multimodal. This additional diagnostic analysis shows another way of utilizing one of our by-products, the latent residuals: our approach allows researchers to diagnose the drawbacks of their models by visually inspecting the residual distributions12.
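The MAD-based grouping described in footnote 11 can be sketched as follows; the ratings are simulated, and the 20% / 40th-60th percentile / 20% cuts follow the footnote.

```python
import numpy as np

rng = np.random.default_rng(2)

n_consumers, n_movies = 100, 20
ratings = rng.integers(1, 6, size=(n_consumers, n_movies)).astype(float)

# MAD between each consumer's ratings and the mean ratings
# across all target consumers.
movie_means = ratings.mean(axis=0)
mad = np.abs(ratings - movie_means).mean(axis=1)

order = np.argsort(mad)
n = n_consumers
high_homogeneity = order[: n // 5]                      # lowest 20% of MADs
medium_homogeneity = order[int(0.4 * n): int(0.6 * n)]  # middle of the distribution
low_homogeneity = order[-(n // 5):]                     # highest 20% of MADs

print(len(high_homogeneity), len(medium_homogeneity), len(low_homogeneity))
```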

Analyses for the Effect of the Amount of Preference Data: Finally, we grouped target consumers into five subgroups according to their number of ratings (sometimes called user support) in the estimation dataset (Dataset III). We tested the effect of the number of ratings by each target consumer on the predictive power of the major models in each area: the Neighborhood model (Model 2), the Matrix Factorization model (Model 3)13, the attribute-based preference model (Model 6), and our hybrid model with multiple experts (Model 8); the results are shown in Figure 6.

Insert Figure 6 here.

In the first group (fewer than 6 ratings per person), the MF model was the best and the Neighborhood model was the worst in terms of RMSE. Since the preference data are very limited in this group, the performance of our model deteriorated, though not as badly as that of the Neighborhood model, owing to the inclusion of product attributes and multiple experts. The performance of all models improved with an increasing number of preference ratings provided by consumers. Interestingly, the MF model performed well when user support is between 11 and 20 (the average group), but its performance did not improve significantly as consumers provided more ratings, while Models 6 and 8 performed better with a larger number of preference ratings. This comparison shows that the MF model is still useful for early users with a limited number of preference ratings. Our model can make a larger contribution as the recommendation system accumulates preference ratings from users. This result implies that if a firm employs an ensemble strategy of combining multiple models, the inclusion of both the MF model and our model can be very useful.

12 We appreciate the anonymous reviewers for suggesting the use of multiple experts for our model.
13 CF models without the additional information on the date of rating were used for a fair comparison with the attribute-based models.

Application 2: Off-line Purchase Data

Although we have shown the superiority of our model over other major algorithms, it is important to demonstrate that our modeling approach is a general method for improving any preference model for experience products or products with too many product attributes. To accomplish this objective, we now apply our model to revealed preference data collected through personal interviews on viewing behavior (revealed preferences) for 30 movies (15 foreign and 15 Korean) from a convenience sample of 312 individuals in the Seoul metropolitan area. The movie list is in Table 1 of the Web Appendix. In addition to viewing behavior (whether the respondent saw the movie), we collected data on movie attributes (genre and national origin) and individual characteristics (age and gender). We classified movie genre into seven categories (drama, romance, action, animation, comedy, horror, and fantasy), following those of the Korean Film Council (www.kofic.or.kr), and national origin into three categories (Western, Asian except Korean, and Korean). Because this dataset had no missing data, we did not consider response choice behavior. We used a nested model with R = 2 (a probit model).


Estimation. We obtained four datasets, as described in Figure 2, by randomly dividing the sample into two groups of 200 and 112 people and assigning them to the reference consumer group and the target consumer group, respectively. We randomly selected 10 foreign and 10 domestic movies for model estimation and 5 foreign and 5 domestic movies for prediction. We identified three virtual experts (clusters)14 from Dataset I by clustering the reference consumers into 2 to 5 clusters and selecting the solution with the highest marginal likelihood15. The membership probabilities were 27%, 33%, and 41%, respectively, for the three clusters in this solution; these are numbered 1 through 3. We used the data of the reference consumers and the 10 movies (Dataset II) in estimating the virtual experts' opinions (residuals). We estimated the virtual expert model with Dataset III to predict the 112 target consumers' choice behavior regarding the 30 movies, using information on product attributes, individual characteristics, and the virtual experts' opinions and confidences. The dataset for model estimation16 contained 4,000 observations (20 responses from each of the 200 individuals). The parameter estimates, such as the grand means and standard deviations of the estimated parameters averaged over individuals, are in Web Appendix Table 2.

For the predictive validity test, we compared the performances of our two models to those of three different models applied to Dataset IV, as shown in Table 10. Model 1 is a standard choice model with product attributes only, and Model 2 is another standard choice model with product attributes and consumer characteristics. In addition to product attributes and consumer characteristics, Model 3 is a virtual expert model that uses the most similar expert's opinions only, whereas Model 4 is another virtual expert model that uses multiple experts.

Insert Table 10 here.

14 We used two Markov chains, in which 6,000 draws were implemented for the burn-in period and 4,000 draws from the two chains were additionally generated for the estimation of posterior distributions.
15 The marginal likelihood is calculated with the bridge sampling estimator (Meng and Wong 1996, Frühwirth-Schnatter 2004), as described in Appendix E. The values of the marginal likelihood are -376.8, -316.36, -343.6, and -358.1, respectively, for 2, 3, 4, and 5 clusters.
16 We used MCMC simulation for the estimation of the joint posterior distributions of the model parameters with two Markov chains; 5,000 draws of each chain were implemented for the burn-in period with convergence tests, and 3,000 draws from each chain were used for inference.


Model 2 outperformed Model 1 in both in- and out-of-sample fits, implying that respondents' characteristics are important variables for explaining consumer heterogeneity and for predicting their choice behaviors. The virtual expert models (Models 3 and 4) outperformed the standard choice models without exception in terms of in- and out-of-sample fits. This means that the virtual expert models successfully used the additional information extracted from the Bayesian residuals. The best model is Model 4, which uses multiple experts' opinions with mixing coefficients adjusted for each expert with the help of the two informative variables. Compared to Model 2, the out-of-sample fit of that model improved by 15.3% with the inclusion of the virtual experts' opinions, and by 19.4% with the inclusion of the virtual experts' confidence information.

CONCLUSIONS AND FUTURE RESEARCH

We developed a general methodology for preference models for experience products or products with too many product attributes that takes into account the effect of unobservable product attributes. We applied our virtual expert model to two situations: stated preference (ratings) data and revealed preference (choice) data. The virtual expert model delivers superior performance compared to the major CF and attribute-based models by taking advantage of these two main approaches while eliminating their limitations. In both applications, we show considerable improvements in predictive power with the virtual expert model.

This study makes four methodological contributions to the literature on choice modeling.

• First, it develops a general methodology that can be used to improve the predictive power of any preference model. This is true regardless of whether the data are stated preference data obtained from Internet recommendation systems or revealed preference data from real purchases, especially for experience products with non-quantifiable (or unobservable) product attributes.

• Second, our model links two different methodological streams, collaborative filtering and consumer choice models, both conceptually and methodologically. If this model is used for recommendation in a real situation, it can generate a new type of word-of-mouth (WOM) effect without direct communication among consumers. Our model can be interpreted as one that embeds automated WOM information (the primary fuel of the predictive power of collaborative filtering) into the random utility model in a complementary manner.

• Third, our model enables firms to fully use both revealed and stated preference data simultaneously with no modification. In fact, firms like Amazon.com and CDnow.com now routinely collect consumers' stated preference data on their first visits and augment them with data on actual purchases over time. Further, our model is flexible enough to use the different types of information (e.g., product attributes, consumer characteristics, and consumer preference similarity) available in e-retailers' consumer databases.

• Finally, from a methodological perspective, our method can be incorporated both into preference models and into any model generating residuals, such as CF models, including the MF model. For instance, the inclusion of the virtual expert model in the MF model can improve the performance of recommendation systems.

However, one apparent limitation in applying our model, the heavy computational load required to provide recommendations for new users, will disappear as computing becomes faster. Our modeling approach is general and can be employed to improve the predictive performance of any consumer preference model, and the computational concern is not serious for choice studies. In addition, the applicability of our approach is limited by the availability of reference consumers. In fact, reserving people for reference groups can be a drawback of our approach in studies with a small number of observations.

We conclude with a few recommended directions for further research.

• First, it is possible to improve the predictive accuracy of the virtual expert model, especially for new users, by using both the typical preference data and other types of behavioral data obtained from product-specific consumption or Web searches for shopping.

• Second, to enhance the scalability of the model for recommendation services, it is desirable to simplify the model estimation procedures or to extend the model by incorporating adaptive learning (Chung, Rust, and Wedel 2009).

• Third, if a recommendation system provides a communication channel among users or location-based services, it would be helpful to combine our approach with preference interdependence models such as Yang and Allenby (2003), using additional information on the relationships among users, because such a model can account for the effects of communication among consumers on their preferences.

• Fourth, one may test the performance of our model for other products with quantifiable but numerous attributes, such as automobiles.

Appendix A: Summary of the Virtual Expert Model

Observed preference data
Yij = r if Ci,r-1 < Uij ≤ Cir, r = 1, 2, ..., R, ∀ i and j.

Latent utility for preference behaviors (Form A)
Uij = Wij + Vij + εij, where εij ~ N(0, 1) ∀ i and j.

Observed component
Wij = βi0 + β0j + Xj' βi for all i and j,
where Xj = (Xj,genre1, Xj,genre2, ..., Xj,genreP)' and (βi0, β0j, βi) ~ MVN((μβi0, μβ0j, μβi), Σβ),
μβi = Ψ' Zi,
where Zi = (Zi1, Zi2, ..., ZiD)', ZiD = 1 (consumer characteristics),
Ψ = (ψ1, ψ2, ..., ψP), ψp = (ψp1, ψp2, ..., ψpD)',
P is the number of product attributes and D is the number of individual characteristics including an intercept. Set μβ0j = 0 for model identification.
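The threshold rule above, which maps a latent utility to an observed ordinal rating, can be sketched in a few lines; the numeric cutoffs below are illustrative, not estimated values.

```python
import bisect

def rating_from_utility(u, cutoffs):
    """Map latent utility u to the ordinal rating r with C_{r-1} < u <= C_r.

    cutoffs holds the interior thresholds C_1 < ... < C_{R-1};
    C_0 = -inf and C_R = +inf are implicit.
    """
    return bisect.bisect_left(cutoffs, u) + 1

cutoffs = [-1.5, -0.5, 0.5, 1.5]           # R = 5 rating categories
print(rating_from_utility(-2.0, cutoffs))  # -> 1
print(rating_from_utility(0.0, cutoffs))   # -> 3
print(rating_from_utility(2.0, cutoffs))   # -> 5
```

Using bisect_left reproduces the strict lower bound and weak upper bound of the rule: a utility exactly equal to Cr maps to rating r.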

Unobserved (non-quantifiable attribute) component
Vij = ηj' αi, where Σg αig = 1 and 0 ≤ αig ≤ 1 for g = 1, ..., G,
where αi = (αi1, ..., αig, ..., αiG)' denotes the individual-specific mixing coefficients and ηj = (η1j, ..., ηgj, ..., ηGj)' denotes the estimates of the latent residuals of product j for expert g obtained from the standard attribute-based model.

αig = exp(γ' ϖgi) / Σg' exp(γ' ϖg'i) for g = 1, 2, ..., G,

where ϖgi = (1, ϖ1gi, ϖ2gi)' denotes an intercept, expert g's precision level (the inverse variance of its residuals), and the preference similarity (membership probability), and γ = (γ0, γ1, γ2)' denotes the corresponding parameters for ϖgi; the corresponding error term is ξj ~ N(0, Σα) for all i and j.
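The multinomial-logit form of the mixing coefficients can be sketched as follows; γ and the per-expert feature vectors are made-up numbers for illustration.

```python
import math

def mixing_weights(gamma, features):
    """alpha_ig = exp(gamma' w_gi) / sum over g' of exp(gamma' w_g'i).

    gamma: parameter vector (gamma_0, gamma_1, gamma_2)
    features: one (1, precision, similarity) vector per expert g
    """
    scores = [sum(g * f for g, f in zip(gamma, w)) for w in features]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

gamma = (0.1, 0.8, 1.2)
features = [(1.0, 2.0, 0.6), (1.0, 0.5, 0.3), (1.0, 1.0, 0.1)]
alpha = mixing_weights(gamma, features)
print(alpha)   # nonnegative weights summing to 1
```

By construction the weights are nonnegative and sum to one, satisfying the constraints on αig, with more precise and more similar experts receiving larger weights when γ1 and γ2 are positive.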


The Effects of Response Selection Behaviors
If it is necessary to include the effects of non-ignorable missing observations due to consumers' response selection behaviors, the utility model is reformulated and the response probabilities are conditioned on response selection behaviors as follows:

(Form B) E(Uij) = Wij + Vij + ρi φ(Wijs) / Φ(Wijs),
where tanh⁻¹(ρi) ~ N(μρ, σρ),
Wijs = Xj' βis (the utility of target consumer i's responding to product j for preference evaluation),
βis = Ψs' Zi + ξis, ξis ~ N(0, Σβs) ∀ i and j,
where Ψs = (ψ1s, ψ2s, ..., ψPs), ψps = (ψp1s, ψp2s, ..., ψpDs)', and ξis = (ξi1s, ξi2s, ..., ξiPs)'.

The corresponding priors for the response choice model are the same as those for the preference choice model.

If the model considers the effects of missing observations, the response choice and the preference choice models are estimated simultaneously via the following bivariate normal distribution:

(εij, εijs) ~ bivariate normal with means 0, variances 1, and correlation ρi.
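Numerically, the selection correction in Form B adds ρi times the inverse Mills ratio φ(Wijs)/Φ(Wijs) to the utility; a stdlib-only sketch with made-up inputs:

```python
import math

def corrected_expected_utility(w, v, rho, w_s):
    """Form B: E(U_ij) = W_ij + V_ij + rho_i * phi(W_ij^s) / Phi(W_ij^s)."""
    phi = math.exp(-0.5 * w_s * w_s) / math.sqrt(2.0 * math.pi)   # std. normal pdf
    Phi = 0.5 * (1.0 + math.erf(w_s / math.sqrt(2.0)))            # std. normal cdf
    return w + v + rho * (phi / Phi)

# With W^s = 0, the inverse Mills ratio is phi(0)/Phi(0) = 2/sqrt(2*pi) ≈ 0.798.
u = corrected_expected_utility(w=1.0, v=0.5, rho=0.2, w_s=0.0)
print(round(u, 4))   # -> 1.6596
```

The correction shrinks toward zero as the response-selection utility Wijs grows, i.e., as the rating becomes more likely to be observed.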

Note: Appendices B-F are not included in this document. Interested readers can contact the authors for a copy.


References

Albert, James and Siddhartha Chib (1995), "Bayesian Residual Analysis for Binary Response Regression Models," Biometrika, 82, 747-759.

Ansari, Asim, Skander Essegaier, and Rajeev Kohli (2000), "Internet Recommender Systems," Journal of Marketing Research, 37 (August), 363-375.

Ariely, Dan, John G. Lynch, Jr., and Manuel Aparicio (2004), "Learning by Collaborative and Individual-Based Recommendation Agents," Journal of Consumer Psychology, 14 (1-2), 81-94.

Bell, R. and Y. Koren (2007), "Improved Neighborhood-based Collaborative Filtering," in Proceedings of KDD Cup and Workshop, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press.

Bell, R., Y. Koren, and C. Volinsky (2008), "The BellKor 2008 Solution to the Netflix Prize," http://Netflix.com.

Breese, John S., David Heckerman, and Carl Kadie (1998), "Empirical Analysis of Predictive Algorithms for Collaborative Filtering," in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence.

Bodapati, Anand V. (2008), "Recommendation Systems with Purchase Data," Journal of Marketing Research, 45, 1 (August), 77-93.

Chung, Tuck S., Roland T. Rust, and Michel Wedel (2009), "My Mobile Music: An Adaptive Personalization System for Digital Audio Players," Marketing Science, 28, 1 (January-February), 52-68.

Chib, Siddhartha (1995), "Marginal Likelihood from the Gibbs Output," Journal of the American Statistical Association, 90, 432 (December), 1313-1321.

Chien, Yung-Hsin and Edward I. George (1999), "A Bayesian Model for Collaborative Filtering," working paper, Department of MSIS, University of Texas, Austin.

Cowles, Mary Kathryn (1996), "Accelerating Monte Carlo Markov Chain Convergence for Cumulative-link Generalized Linear Models," Statistics and Computing, 6, 101-111.

Smith, Darlene B. (1990), "The Economics of Information: An Empirical Approach to Nelson's Search-Experience Framework," Journal of Public Policy and Marketing, 9, 111-128.

Frühwirth-Schnatter, Sylvia (2001), "Fully Bayesian Analysis of Switching Gaussian State Space Models," Annals of the Institute of Statistical Mathematics, 53, 31-49.

Frühwirth-Schnatter, Sylvia (2004), "Estimating Marginal Likelihoods for Mixture and Markov Switching Models Using Bridge Sampling Techniques," The Econometrics Journal, 7 (1), 143-167.

Gelfand, A.E. and A.F.M. Smith (1990), "Sampling Based Approaches to Calculating Marginal Densities," Journal of the American Statistical Association, 85, 972-985.

Gjoka, Minas and Fabio Soldo (2008), "Exploring Collaborative Filters: Neighborhood-based Approach," working paper.

Goldberg, David, David Nichols, Brian M. Oki, and Douglas Terry (1992), "Using Collaborative Filtering to Weave an Information Tapestry," Communications of the ACM, 35 (12), 61-70.

Haubl, Gerald and Valerie Trifts (2000), "Consumer Decision Making in Interactive Online Shopping Environments: The Effects of Interactive Decision Aids," Marketing Science, 19 (1), 4-21.

Herlocker, Jonathan L., Joseph A. Konstan, Al Borchers, and John Riedl (1999), "An Algorithmic Framework for Performing Collaborative Filtering," in Proceedings of ACM SIGIR (Special Interest Group on Information Retrieval).

Jacobs, R. A., F. Peng, and M. A. Tanner (1997), "A Bayesian Approach to Model Selection in Hierarchical Mixtures-of-Experts Architectures," Neural Networks, 10, 231-241.

Jiang, W. and M. A. Tanner (1999), "Hierarchical Mixtures-of-Experts for Exponential Family Regression Models: Approximation and Maximum Likelihood Estimation," Annals of Statistics, 27, 987-1011.

Jordan, M. I. and R. A. Jacobs (1994), "Hierarchical Mixtures of Experts and the EM Algorithm," Neural Computation, 6 (2), 181-214.

Koren, Yehuda (2008), "Factorization Meets the Neighborhood: A Multifaceted Collaborative Filtering Model," in Proceedings of the 14th ACM International Conference on Knowledge Discovery and Data Mining (KDD'08), ACM Press.

Koren, Yehuda (2009), "The BellKor Solution to the Netflix Grand Prize," http://www.netflixprize.com.

Piotte, M. and M. Chabbert (2009), "The Pragmatic Theory Solution to the Netflix Grand Prize."

Newton, Michael A. and Adrian E. Raftery (1994), "Approximate Bayesian Inference with the Weighted Likelihood Bootstrap," Journal of the Royal Statistical Society, Series B (Methodological), 56 (1), 3-48.

Paterek, A. (2007), "Improving Regularized Singular Value Decomposition for Collaborative Filtering," in Proceedings of KDD Cup and Workshop.

Little, R. J. A. and D. B. Rubin (1987), Statistical Analysis with Missing Data, New York: John Wiley & Sons.

Resnick, Paul and Hal Varian (1997), "Recommender Systems," Communications of the ACM, (March), 56-58.

Resnick, Paul N., N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl (1994), "GroupLens: An Open Architecture for Collaborative Filtering of Netnews," in Proceedings of the ACM Conference on Computer Supported Cooperative Work, Chapel Hill, NC, 175-186.

Salakhutdinov, R., A. Mnih, and G. Hinton (2007), "Restricted Boltzmann Machines for Collaborative Filtering," in Proceedings of ICML'07, the 24th International Conference on Machine Learning, 791-798.

Sarwar, Badrul M., Joseph A. Konstan, Al Borchers, John Herlocker, Brad Miller, and John Riedl (1998), "Using Filtering Agents to Improve Prediction Quality in the GroupLens Research Collaborative Filtering System," in Proceedings of the 1998 ACM Conference on Computer Supported Cooperative Work.

Sarwar, Badrul M., G. Karypis, J. A. Konstan, and J. Riedl (2001), "Item-based Collaborative Filtering Recommendation Algorithms," in Proceedings of WWW'01, the 10th International Conference on the World Wide Web, 285-295, ACM Press.

Schafer, Ben J., Joseph Konstan, and John Riedl (1999), "Recommender Systems in E-Commerce," in Proceedings of the ACM Conference on Electronic Commerce (EC-99), (November).

Shardanand, Upendra and Pattie Maes (1995), "Social Information Filtering: Algorithms for Automating Word-of-Mouth," in Proceedings of the Conference on Human Factors in Computing Systems (CHI), New York: ACM Press, 210-217.

Takács, Gábor, István Pilászy, Bottyán Németh, and Domonkos Tikk (2009), "Scalable Collaborative Filtering Approaches for Large Recommender Systems," Journal of Machine Learning Research, 10, 623-656.

Takács, Gábor, István Pilászy, and Bottyán Németh (2008), "Matrix Factorization and Neighbor Based Algorithms for the Netflix Prize Problem," in Proceedings of the 2nd ACM Conference on Recommender Systems, New York: ACM, 267-274.

Töscher, A., M. Jahrer, and R. Bell (2009), "The BigChaos Solution to the Netflix Grand Prize," http://www.netflixprize.com.

Ungar, Lyle H. and Dean P. Foster (1998), "A Formal Statistical Approach to Collaborative Filtering," Conference on Automated Learning and Discovery (CONALD).

Urban, Glen L., Fareena Sultan, and William Qualls (2000), "Placing Trust at the Center of Your Internet Strategy," Sloan Management Review, (Fall).

Peng, F., R. A. Jacobs, and M. A. Tanner (1996), "Bayesian Inference in Mixtures-of-Experts and Hierarchical Mixtures-of-Experts Models with an Application to Speech Recognition," Journal of the American Statistical Association, 91, 953-960.

Wind, Jerry and Vijay Mahajan (1997), “Issues and Opportunities in New Product Development: An Introduction to the Special Issue,” Journal of Marketing Research, 34, (February), 112. Yang, Sha and Greg M. Allenby (2003), "Modeling Interdependent Consumer Preferences," Journal of Marketing Research, 40, 282-294. Ying, Yuanping, Fred Feinberg, and Michel Wedel (2006), “Leveraging Missing Ratings to Improve Online Recommendation Systems,” Journal of Marketing Research, 43 (August), 355-365.


Table 1: A Classification of Major Models for Recommendation Services

Attribute-based models (focus on product similarity)
  - Consumer preference data (ratings): Linear regression model (Ansari, Essegaier, and Kohli 2000); Ordinal regression model (Ying, Feinberg, and Wedel 2006)
  - Choice data (purchases): Logit model (Urban, Sultan, and Qualls 2000; Bodapati 2008)

Collaborative filtering-based models (focus on consumer similarity), consumer preference data (ratings)
  - Memory-based CF: Neighborhood models (Koren 2009; Bell and Koren 2007; Breese, Heckerman, and Kadie 1998; Shardanand and Maes 1995; Resnick et al. 1994); Clustering (Chien and George 1999; Ungar and Foster 1997; Breese, Heckerman, and Kadie 1998)
  - Model-based CF: Matrix factorization (MF) models (Koren 2009; Koren 2008; Bell, Koren, and Volinsky 2008; Takács et al. 2008; Paterek 2007; Töscher, Jahrer, and Bell 2009); Linear regression (Töscher, Jahrer, and Bell 2009; Bell, Koren, and Volinsky 2008); Neural networks and restricted Boltzmann machines (Breese, Heckerman, and Kadie 1998; Salakhutdinov, Mnih, and Hinton 2007)
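The memory-based CF entries in this classification all share the same neighborhood logic: score an unseen product for a target consumer by a similarity-weighted average of other consumers' mean-centered ratings. A minimal sketch of this idea, in the spirit of Resnick et al. (1994), using Pearson similarity and entirely hypothetical ratings data:

```python
import math

def pearson(u, v):
    """Pearson correlation over the items both users have rated."""
    common = set(u) & set(v)
    if len(common) < 2:
        return 0.0
    mu_u = sum(u[i] for i in common) / len(common)
    mu_v = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mu_u) * (v[i] - mu_v) for i in common)
    den = math.sqrt(sum((u[i] - mu_u) ** 2 for i in common) *
                    sum((v[i] - mu_v) ** 2 for i in common))
    return num / den if den else 0.0

def predict(target, others, item):
    """Target's mean plus the similarity-weighted average of neighbors'
    mean-centered ratings of the item."""
    mu_t = sum(target.values()) / len(target)
    num = den = 0.0
    for other in others:
        if item not in other:
            continue
        w = pearson(target, other)
        mu_o = sum(other.values()) / len(other)
        num += w * (other[item] - mu_o)
        den += abs(w)
    return mu_t + num / den if den else mu_t

# Hypothetical ratings: user -> {movie: score on the 1-6 scale used in the paper}
target = {"A": 5, "B": 3, "C": 4}
others = [{"A": 5, "B": 3, "C": 4, "D": 6},
          {"A": 2, "B": 5, "C": 3, "D": 1}]
print(round(predict(target, others, "D"), 2))
```

Note how a strongly negatively correlated neighbor pulls the prediction in the opposite direction of that neighbor's own rating, which is exactly the consumer-similarity focus the table attributes to CF models.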


Table 2: Description of Data

A. Sample Description
Gender: 83% male. Age: mean of 32.2 for males and 30.5 for females.

B. Description of Movies by Genre (Product Attributes)
Genre         %    Preference Mean (S.D.)
Action        13   3.31 (0.21)
Animation      5   3.75 (0.22)
Art/Foreign   12   4.26 (0.35)
Classic        2   4.36 (0.22)
Comedy        15   3.33 (0.24)
Drama         16   4.02 (0.26)
Family         6   3.44 (0.23)
Horror         5   3.31 (0.41)
Romance       12   3.92 (0.26)
Thriller      14   3.81 (0.23)

Preference score:   1    2    3    4    5    6
Percent (%):       13    6   14   23   26   18


Table 3-1: Estimated Model Parameters for Response Choice Behaviors

Variables                  Mean      Standard Deviations (a)   Heterogeneity (b)
Intercept (β_i0)          -2.288     1.871     0.661
Genre 1: Action            2.631*    0.662     0.413
Genre 2: Animation         2.062*    0.577     0.462
Genre 3: Art/Foreign       0.274     0.151     0.141
Genre 4: Classic          -0.691     0.463     0.523
Genre 5: Comedy            1.712*    0.581     0.361
Genre 6: Drama            -0.215     0.283     0.126
Genre 7: Family           -0.393     0.188     0.249
Genre 8: Horror            0.736*    0.337     0.599
Genre 9: Romance           1.265*    0.321     0.202
Genre 10: Thriller        -0.753     0.430     0.414

a. Standard deviations are calculated by averaging the square roots of the corresponding covariance matrix draws obtained from MCMC.
b. We use a standard deviation of posteriors for individual preference parameters across people as a measure of preference heterogeneity.
* significant at the 0.01 level of Bayesian P-value.

Table 3-2: Estimated Cut-off Points for Response Choice Behaviors

Cut-off point C    Mean      Standard Deviation    Heterogeneity (b)
1                  0.231     0.147     0.174
2                  0.649     0.464     0.198
3                  1.342*    0.321     0.132
4                  2.311*    0.412     0.202
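The cut-off points above map a latent utility into the ordered response categories via Y = r iff C_{r-1} < U ≤ C_r. A minimal sketch of how such cut-offs translate a utility mean into category probabilities under an ordinal probit, using the posterior means from Table 3-2 purely as illustrative inputs and assuming a standard normal error scale:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ordinal_probit_probs(mu, cutoffs):
    """P(Y = r) = Phi(C_r - mu) - Phi(C_{r-1} - mu), with C_0 = -inf, C_R = +inf."""
    bounds = [-math.inf] + list(cutoffs) + [math.inf]
    return [norm_cdf(b - mu) - norm_cdf(a - mu)
            for a, b in zip(bounds, bounds[1:])]

# Posterior-mean cut-offs from Table 3-2 (illustrative; unit error variance assumed)
cutoffs = [0.231, 0.649, 1.342, 2.311]
probs = ordinal_probit_probs(mu=1.0, cutoffs=cutoffs)
print([round(p, 3) for p in probs])
```

Because the cut-offs are shared across categories, shifting the utility mean mu moves probability mass monotonically toward higher response categories.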


Table 4: Estimated Model Parameters for Preference Behaviors: Attribute and Expert Opinion Coefficients for Level 1

Variables              Mean       Standard Deviations (a)   Heterogeneity (b)
Intercept (β_i0)      -3.899*     1.265     1.761
Intercept (β_0j)          -          -      0.775
Genre: Action          0.760*     0.268     0.642
Genre: Animation       1.122*     0.401     0.368
Genre: Art/Foreign    -0.731      0.489     0.576
Genre: Classic        -1.438*     0.333     0.436
Genre: Comedy          0.325*     0.103     0.252
Genre: Drama           0.211      0.172     0.117
Genre: Family         -0.271      0.238     0.148
Genre: Horror         -0.215      0.124     0.296
Genre: Romance         0.343*     0.211     0.301
Genre: Thriller       -0.583      0.422     0.319

Expert opinion 1       0.059      0.095     0.106
Expert opinion 2       0.062      0.067     0.079
Expert opinion 3       0.059      0.095     0.036
Expert opinion 4       0.074*     0.034     0.045
Expert opinion 5       0.062      0.059     0.052
Expert opinion 6       0.059      0.026     0.050
Expert opinion 7       0.064      0.094     0.075
Expert opinion 8       0.080*     0.038     0.055
Expert opinion 9       0.082      0.081     0.079
Expert opinion 10      0.092*     0.041     0.057
Expert opinion 11      0.081      0.059     0.015
Expert opinion 12      0.116*     0.049     0.044
Expert opinion 13      0.131*     0.051     0.047

a. Standard deviations are calculated by averaging the square roots of the corresponding covariance matrix draws obtained from MCMC.
b. We use a standard deviation of posteriors for individual preference parameters across people as a measure of preference heterogeneity.
* significant at the 0.01 level of Bayesian P-value.


Table 5: Estimated Relationships between Genre Coefficients and Demographics for Level 2

Genre coefficient   Intercept           Age (a)             Gender (b)
Action               0.725 (0.072)     -0.147 (0.263)      -0.252 (0.197)
Animation            0.745 (0.042)     -0.445* (0.211)      0.125 (0.678)
Art/Foreign         -0.418* (0.102)     0.265 (0.166)       0.131 (0.113)
Classic             -0.699* (0.104)     1.546* (0.246)      0.025 (0.021)
Comedy               0.238* (0.101)    -0.238* (0.111)      0.053 (0.045)
Drama               -0.133 (0.0242)     0.237 (0.141)       0.036 (0.023)
Family              -0.422 (0.332)     -0.346 (0.322)       0.521 (0.421)
Horror              -0.212 (0.122)      0.162 (0.143)      -0.324* (0.149)
Romance             -0.242 (0.232)      0.067 (0.056)       0.342 (0.223)
Thriller            -0.316* (0.122)     0.533* (0.132)     -0.541 (0.424)

a. Age variable is standardized.
b. Gender variable is coded as 1 = female and 0 = male.
c. The value in each parenthesis is the corresponding standard deviation.
* significant at the 0.01 level of Bayesian P-value.

Table 6: Estimated Relationship between Mixing Coefficients and Expert Informativeness Variables

Expert opinion   Expert Similarity (Membership Prob.)   Expert Precision (Inverse Variance)   Intercept
Expert 1          0.144 (0.278)     0.375* (0.114)    -1.522 (0.668)
Expert 2          0.214* (0.424)    0.361 (0.484)     -0.993* (0.165)
Expert 3         -0.110 (0.223)     0.498* (0.254)    -2.110* (0.148)
Expert 4          0.686* (0.202)    0.355* (0.125)    -1.00 (0.129)
Expert 5          0.623* (0.298)    0.421 (0.332)     -0.417 (0.269)
Expert 6         -0.136 (0.150)     0.398* (0.106)    -0.110 (0.350)
Expert 7          0.647* (0.230)    0.325* (0.104)    -1.923 (0.243)
Expert 8          0.590* (0.257)    0.332 (0.186)     -1.997* (0.774)
Expert 9          0.575 (0.450)     0.250* (0.101)    -0.430* (0.737)
Expert 10         0.687* (0.279)   -0.135 (0.132)     -0.443 (0.375)
Expert 11         0.695* (0.242)    0.251* (0.111)    -0.466* (0.021)
Expert 12         0.645 (0.483)    -0.114 (0.100)      0.110 (0.153)
Expert 13         0.783* (0.202)    0.151 (0.102)     -1.776 (0.650)

* Significant at the 0.01 level of Bayesian P-value.

Table 7: Estimated Cut-off Points and Correlation (Preference Behaviors)

Cut-off point C    Mean      Standard Deviation    Heterogeneity (b)
1                  0.397*    0.163     0.077
2                  0.772     0.337     0.164
3                  1.545     0.987     0.333
4                  2.524*    1.188     0.669
ρ (correlation)    0.091     0.055     0.061

* Significant at the 0.01 level of Bayesian P-value.


Table 8: Model Structure Comparison

Common utility structure: Y_ij = r if C_{i,r-1} < U_ij ≤ C_ir, r = 1, 2, ..., R, with U_ij = β_i0 + β_0j + X'_j β_i + X*'_j β*_i. Each model below is described by: A. Intercept; B. Effect of observed attributes; C. Unobserved component; D. Scale heterogeneity; E. Response choice effect.

Model 1: Baseline model (ordinal probit)
  A. Intercept: β_i0 + β_0j (consumer effect; product effect)
  B, C, D: not included
  E. Response choice effect: cut-off points C_i

Model 2: Neighborhood model; linear regression (Koren 2009; Bell, Koren, and Volinsky 2008)
  E(y_ij) = b_0 + β_i0(t_ij) + β_0j(t_ij)κ_i + |R(i)|^(-1/2) Σ_{j'∈R(i)} (y_ij' − (b_0 + b_i0 + b_j0)) β¹_jj' + |N(i)|^(-1/2) Σ_{j'∈N(i)} β²_jj'
  A. Intercept: b_0 + β_i0(t_ij) + β_0j(t_ij) (consumer effect; product effect)
  C. Unobserved component: β¹_jj' is the effect of rating product j' on the rating of product j by consumer i, related to the values of the ratings; β²_jj' is the corresponding effect related only to which movies were rated previously
  D. Scale heterogeneity: κ(t_ij) = κ_i + κ_i,t_ij (individual and day-specific scale effects)
  E. Response choice effect: not included

Model 3: Matrix factorization, SVD++ (Koren 2009; Bell, Koren, and Volinsky 2008)
  E(y_ij) = b_0 + β_i0(t_ij) + β_0j(t_ij) + q'_j ( p_i(t_ij) + |N(i)|^(-1/2) Σ_{j'∈N(i)} β³_j' )
  A. Intercept: b_0 + β_i0(t_ij) + β_0j(t_ij) (consumer effect; product effect)
  B. Effect of observed attributes: q_j and p_i(t_ij) are parameter vectors for the singular value decomposition, q_j = (q_j1, ..., q_jK)' and p_i(t_ij) = (p_i1(t_ij), ..., p_iK(t_ij))', with p_ik(t_ij) = p_ik + τ_ik (sign(t − t̄_i)) |t − t̄_i|
  C. Unobserved component: β³_j' is the effect of product j' having been rated by consumers
  D. Scale heterogeneity: κ(t_ij) = κ_i + κ_i,t_ij
  E. Response choice effect: not included

Model 4: Preference model without heterogeneity (ordinal probit)
  E(U_ij) = β_0 + X'_j β_1
  A. Intercept: β_0; B. Effect of observed attributes: X'_j β_1; E. Response choice effect: cut-off points C

Model 5: Preference model with heterogeneity (ordinal probit)
  E(U_ij) = β_i0 + X'_j β_i, with β_i = Ψ'Z_i + ξ_βi (Z: consumer characteristics)
  A. Intercept: β_i0 (consumer effect); B. Effect of observed attributes: X'_j β_i; E. Response choice effect: cut-off points C_i

Model 6: (ordinal probit; Ansari et al. 2000 and Ying et al. 2006) (1)
  E(U_ij) = β_i0 + β_0j + X'_j β_i^x + z'_i β_j^z + ρ_i φ(β_i^s X_j)/Φ(β_i^s X_j)
  A. Intercept: β_i0 + β_0j (consumer effect; product effect; product heterogeneity)
  B. Effect of observed attributes: X'_j β_i^x and z'_i β_j^z (Z: consumer characteristics, i.e., gender and age)
  C. Unobserved component: the selectivity term ρ_i φ(β_i^s X_j)/Φ(β_i^s X_j), which accounts for missing ratings
  E. Response choice effect: cut-off points C_i

Models 7 and 8: Virtual Expert Model (ordinal probit)
  E(U_ij) = β_i0 + β_0j + X'_j β_i + η'_j α_i + ρ_i φ(β_i^s X_j)/Φ(β_i^s X_j),
  with β_i = Ψ'Z_i + ξ_βi; η'_j α_i = Σ_g η_gj α_ig, where Σ_g α_ig = 1 and 1 ≥ α_ig ≥ 0 for all i, j; η'_j = (η_1j, ..., η_gj, ..., η_Gj); and α_ig = exp(γ'ϖ_gi) / Σ_g' exp(γ'ϖ_g'i)
  A. Intercept: β_i0 + β_0j (consumer effect; product effect)
  B. Effect of observed attributes: X'_j β_i
  C. Unobserved component: use of experts' residuals, η'_j α_i
  E. Response choice effect: cut-off points C_i

(1) The grand mean and individual deviations of β_i^x and β_j^z are combined into one parameter for consumer heterogeneity (and similarly for product heterogeneity) for the sake of comparability.
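The SVD++ row (Model 3) predicts a rating as a baseline term plus the dot product of item factors with user factors augmented by implicit feedback from the set N(i) of rated items. A minimal sketch of that prediction rule (time-varying biases omitted, all factor values hypothetical):

```python
import math

def svdpp_predict(b0, b_u, b_i, q_i, p_u, y_factors, N_u):
    """SVD++-style rule:
    b0 + b_u + b_i + q_i . (p_u + |N(u)|^(-1/2) * sum_{j in N(u)} y_j)."""
    k = len(q_i)
    implicit = [0.0] * k
    for j in N_u:                      # accumulate implicit-feedback factors
        for f in range(k):
            implicit[f] += y_factors[j][f]
    scale = 1.0 / math.sqrt(len(N_u)) if N_u else 0.0
    return b0 + b_u + b_i + sum(q_i[f] * (p_u[f] + scale * implicit[f])
                                for f in range(k))

# Hypothetical two-factor example: two previously rated movies feed the
# implicit part of the user's factor vector.
y_factors = {"m1": [0.1, -0.2], "m2": [0.3, 0.1]}
pred = svdpp_predict(b0=3.6, b_u=0.2, b_i=-0.1,
                     q_i=[0.5, 1.0], p_u=[0.4, -0.3],
                     y_factors=y_factors, N_u=["m1", "m2"])
print(round(pred, 3))
```

The |N(u)|^(-1/2) normalization keeps the implicit-feedback contribution comparable across consumers with very different numbers of rated movies.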

Table 9: Model Comparison

Hit Rate (RMSE)*
Model                                                            In-sample                  Holdout sample
Collaborative filtering models
  1 Baseline model                                               0.29 (1.89)                0.26 (2.11)
  2 Neighborhood model                                           0.30/0.37* (1.82/1.60*)    0.27/0.33* (2.02/1.80*)
  3 Matrix factorization (SVD++)                                 0.41/0.45* (1.43/1.28*)    0.38/0.40* (1.59/1.40*)
Attribute-based preference models
  4 Preference model without consumer heterogeneity              0.35 (1.72)                0.31 (1.84)
  5 Preference model with consumer heterogeneity                 0.45 (1.28)                0.40 (1.42)
  6 Preference model with missing data and product heterogeneity 0.48 (1.21)                0.44 (1.34)
Hybrid modeling approach
  7 Virtual Expert Model with the most similar expert            0.45/0.50* (1.20/1.07*)    0.40/0.45* (1.39/1.31*)
  8 Virtual Expert Model with multiple experts                   0.52 (1.10)                0.48 (1.21)

* The two entries represent the performance measures without and with the temporal effect.
Description of data used for prediction (feature columns in the original layout):
I: Missing Data: √ = considered; blank = not considered.
II: Product Attributes: √ = present; blank = absent.
III: Scale Heterogeneity: √ = present; blank = absent.
IV: Use of Others' Ratings: √ = present; blank = absent.
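The two performance measures in Table 9 can be computed from holdout predictions as follows. This is a generic sketch on hypothetical data; in particular, the hit-rate definition assumed here (rounded prediction equals the observed score) is one common convention and is not spelled out in the table itself:

```python
import math

def hit_rate(actual, predicted):
    """Share of holdout ratings whose rounded prediction equals the observed score."""
    hits = sum(1 for a, p in zip(actual, predicted) if round(p) == a)
    return hits / len(actual)

def rmse(actual, predicted):
    """Root mean squared error of the predictions."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical holdout ratings on the paper's 1-6 scale
actual = [4, 5, 2, 6, 3]
predicted = [4.2, 4.6, 2.9, 5.7, 3.3]
print(hit_rate(actual, predicted), round(rmse(actual, predicted), 3))
```

The two measures need not agree: a model can have a lower RMSE yet a lower hit rate when its errors cluster just past the rounding boundary, which is why Table 9 reports both.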


Table 10: Model Performance

                                                                      In-sample fit                              Out-of-sample fit
Model                                                                 Marginal log likelihood   Hit Rate         Hit Rate
Standard choice models
  1 Choice model with product attributes only                         -966.32                   74%              68%
  2 Choice model with product attributes and consumer characteristics -936.32                   77%              72%
Virtual Expert Model
  3 Choice model with the most similar expert                         -821.09                   86%              82%
  4 Choice model with multiple experts                                -755.91                   89%              86%

Figure 1: The Structure of the Virtual Expert Model

[Diagram.] The utility of product j for target consumer i, U_ij, is decomposed into an observed component W_ij = X'_j β_i (the utility part explained by observed attribute information) and an unobserved component η_ij = X*'_j β*_i. In parallel, the utilities of product j for the G virtual experts, U_1j, ..., U_Gj, are decomposed into observed components W_gj = X'_j β_g and unobserved components η'_j = (η_1j, ..., η_gj, ..., η_Gj). The target consumer's unobserved component is related to the experts' residuals through η_ij = η'_j α_i + ε_ij (the utility part explained by virtual expert opinions).

Predicted utility of product j for target consumer i: U_ij = W_ij + η_ij, where W_ij = X'_j β_i with β_i = Ψ'Z_i + ξ_i (X: observed product attributes; Z: consumer characteristics), and η_ij = η'_j α_i + ε_ij with mixing coefficients α_ig = exp(γ'ϖ_gi) / Σ_g' exp(γ'ϖ_g'i), s.t. Σ_g α_ig = 1 (η'_j: experts' opinions (residuals); ϖ_gi: experts' informativeness).
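The mixing coefficients in Figure 1 form a softmax over each expert's informativeness, and the target consumer's unobserved component is then approximated by the weighted sum of the experts' residuals. A minimal numerical sketch, with hypothetical values for γ, the informativeness variables, and the residuals:

```python
import math

def mixing_weights(gamma, informativeness):
    """Softmax weights: alpha_g = exp(gamma . w_g) / sum_g' exp(gamma . w_g')."""
    scores = [sum(g * w for g, w in zip(gamma, w_g)) for w_g in informativeness]
    m = max(scores)                      # subtract the max to stabilize exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predicted_residual(alpha, expert_residuals):
    """eta_ij approximated by the alpha-weighted sum of the experts' residuals."""
    return sum(a * e for a, e in zip(alpha, expert_residuals))

# Hypothetical: 3 experts, informativeness = (similarity, precision, constant),
# echoing the variables reported in Table 6.
gamma = [0.6, 0.4, -1.0]
informativeness = [[0.8, 0.5, 1.0], [0.3, 0.9, 1.0], [0.1, 0.2, 1.0]]
alpha = mixing_weights(gamma, informativeness)
eta = predicted_residual(alpha, expert_residuals=[0.5, -0.2, 0.1])
print([round(a, 3) for a in alpha], round(eta, 3))
```

The softmax construction enforces the constraints stated in the figure (weights nonnegative and summing to one), so more informative experts simply receive a larger share of the weight rather than changing the scale of the residual.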

Figure 2: Estimation and Validation Data

[Diagram.] The consumers-by-movies ratings matrix is split into four blocks. For the reference group of consumers (h): Dataset I (estimation set, used for clustering*) and Dataset II (validation set, used to estimate experts' residuals). For the target group of consumers (i): Dataset III (estimation dataset for the virtual expert model) and Dataset IV (validation dataset for the recommendation model, i.e., prediction).

* The reference group's data, Datasets I and II, are used for the identification of virtual experts.

Figure 3: The Membership Distribution of the Reference Group
[Bar chart: number of members (0 to 400) in each of the 13 clusters.]

Figure 4: The Genre Preferences of Virtual Experts
[Chart: expert preference on movie genre; mean rating (0 to 6) by movie genre for each virtual expert.]

Figure 5: The Distributions of the Unobserved Components (Latent Residuals)

[Four density plots comparing Model 5, Model 6, and the Virtual Expert Model (Model 8): Figure 5-1, low heterogeneity group; Figure 5-2, medium heterogeneity group; Figure 5-3, high heterogeneity group; Figure 5-4, total group. The value in each parenthesis is RMSE; the Virtual Expert Model attains the lowest RMSE in each group (e.g., high heterogeneity group: Model 8 = 1.411, Model 6 = 1.772, Model 5 = 2.318; total group: Model 8 = 1.212, Model 6 = 1.424, Model 5 = 1.821).]

Figure 6: The Effect of the Number of Ratings on Model Performance

[Line chart: RMSE (from 1.0 to 1.9) against the number of ratings per person (1 to 40) for Models 2, 3, 6, and 8; RMSE declines for every model as the number of ratings per person grows.]

The Web Appendix is not included in this version. Interested readers may contact the authors to obtain it.
