Managing contextual preferences

July 1, 2017 | Author: Panos Vassiliadis | Category: Information Systems, Personalization, Level Of Detail (LOD), Data Structure, Indexation

Managing Contextual Preferences

Kostas Stefanidis∗, Evaggelia Pitoura and Panos Vassiliadis
Department of Computer Science, University of Ioannina, GR-45110 Ioannina, Greece.

Abstract

To handle the overwhelming amount of information currently available, personalization systems allow users to specify, through preferences, which pieces of data interest them. Most often, users have different preferences depending on context. In this paper, we introduce a model for expressing such contextual preferences. Context is modeled using a set of hierarchical attributes, thus allowing context specification at various levels of detail. We formulate the context resolution problem as the problem of (a) identifying the preferences that qualify to encompass the context of a query and (b) selecting the most appropriate among them. We also propose algorithms for context resolution based on data structures that index preferences by exploiting the hierarchical nature of the context attributes. Finally, we evaluate our approach from two perspectives: usability and performance. Usability evaluates the overheads imposed on users for specifying context-dependent preferences, as well as their satisfaction with the quality of the results. Our performance results focus on context resolution using the proposed indexes.

Key words: Preferences, Personalization, Context.

1. Introduction

Personalized information delivery aims at addressing the explosion of the amount of data currently available to an increasingly wider spectrum of users. Instead of overwhelming users with all available data, personalization systems provide users with only the data that is of interest to them. Preferences have been used as a means to address this challenge.

To this end, a variety of preference models have been proposed, most of which follow either a qualitative or a quantitative approach. With the qualitative approach (such as the work in [1, 2]), preferences between two pieces of data are specified directly, typically using binary preference relations. For instance, using a qualitative model, users may explicitly state that they prefer visiting archaeological sites to science museums. With the quantitative approach (e.g., [3, 4, 5]), users employ scoring functions that associate a numerical score with specific pieces of data to indicate their interest in them. For instance, one's preference for archaeological sites may be expressed by assigning high scores to such places.

However, users most often have different preferences under different circumstances. For instance, the current weather conditions may influence the place one wants to visit: when it rains, a museum may be preferred over an open-air archaeological site. Context is a general term used to capture any information that can be used to characterize the situations of an entity [6, 7]. Common types of context include the computing context (e.g., network connectivity, nearby resources), the user

∗ Corresponding author. Tel.: +302651098858. E-mail addresses: [email protected] (Kostas Stefanidis), [email protected] (Evaggelia Pitoura), [email protected] (Panos Vassiliadis).

Preprint submitted to Information Systems, August 17, 2010

context (e.g., profile, location), the physical context (e.g., noise levels, temperature) and time [8, 9].

In this paper, we propose enhancing preferences with context-related information. We use context to refer to any attribute that is not part of the database schema. Context is modeled using a set of multidimensional context parameters. A specific context state, or situation, corresponds to an assignment of values to context parameters. By allowing context parameters to take values from hierarchical domains, different levels of abstraction for the captured context data are introduced. For instance, the context parameter location may take values from a city, country or continent domain. Users employ context descriptors to express their preferences on specific database instances for a variety of context states expressed at varying levels of detail. Each query is also associated with one or more context states. The context state of a query may, for example, be the current state at the time of its submission. Furthermore, a query may be explicitly enhanced with context descriptors to allow exploratory queries about hypothetical context states.

A central problem is identifying those preferences that are applicable to the context states that are most relevant to the states of a query. We call this problem context resolution. Context resolution is divided into two subproblems: (a) the identification of all candidate context states that cover the query states and (b) the selection of the most appropriate states among these candidates. The first subproblem is resolved through the notion of the cover partial order between states, which relates context states expressed at different levels of abstraction. For instance, the notion of coverage allows relating a context state in which location is expressed at the level of a country and a context state in which location is expressed at the level of a continent. To resolve the

second subproblem, we consider appropriate distance metrics that capture the similarity between context states.

We introduce algorithms for context resolution that build upon two data structures, namely the preference graph and the profile tree, that index preferences based on their associated context states. The preference graph exploits the cover partial order of context states to organize them in some form of a lattice. A top-down traversal of the graph supports an incremental specialization of a given context state, whereas a bottom-up traversal supports an incremental relaxation. The profile tree offers a space-efficient representation of context states by taking advantage of the co-occurrence of context values in preferences. It supports exact matches of context states very efficiently through a single root-to-leaf traversal.

Our focus is on managing context for preferences, i.e., expressing, storing and indexing contextual preferences. In general, preferences may be collected in various ways. They may be provided explicitly by the users or constructed automatically, for instance, based on the past behavior of the same or similar users. Such methods for the automatic construction of preferences have been the focus of much current research (e.g., [10]). A practical way to create profiles, which we have used in our experiments, is to assemble a number of default profiles and then ask the users to update them appropriately.

We have evaluated our approach along two perspectives: usability and performance. Our usability experiments consider the overhead imposed on the users for specifying context-dependent preferences versus the quality of the personalization thus achieved. We used two databases of different sizes. The sizes of the databases have two important implications for usability. First, they affect the number of preferences. Second, and most importantly, they require different methods for evaluating the quality of results.
Our performance experiments focus on our context resolution algorithms that employ the proposed data structures to index preferences, improving response time and storage overheads. In a nutshell, in this paper, we

• propose a model for annotating preferences with contextual information; our multi-dimensional model of context allows expressing contextual preferences at various levels of detail,
• formulate the problem of context resolution as the problem of selecting appropriate preferences for personalizing a query based on context,
• present data structures and algorithms for implementing context resolution and
• evaluate our approach in terms of both usability and performance.

The rest of this paper is structured as follows. In Section 2, we present our context and preference model, while in Section 3, we formulate the context resolution problem. In Section 4, we introduce data structures used to index contextual preferences and algorithms for context resolution. Sections 5 and 6 present our usability and performance evaluation results, respectively. Section 7 describes related work and, finally, Section 8 concludes the paper with a summary of our contributions.

2. Contextual Preferences

In this section, we present first our model of context, then how to specify context states and, finally, contextual preferences, i.e., preferences annotated with context information. In the rest of this paper, we use the following two databases as our running examples.

Movie Database. The movie database (MD) maintains information about movies. Its schema consists of a single database relation with schema: Movies (mid, title, year, director, genre, language, duration).

Points of Interest Database. The points of interest database (PID) maintains information about interesting places to visit. Its schema consists of a single database relation with schema: Points of Interest (pid, name, type, location, open-air, hours of operation, admission cost).

2.1. Context Model

We model context using a finite set of special-purpose attributes. For a given application X, we define its context environment CE_X as a set of these attributes.

Definition 1 (Context environment). Let X be an application. Its context environment, CE_X, is a set of n attributes, CE_X = {C_1, C_2, ..., C_n}, n ≥ 1, where each attribute C_i, 1 ≤ i ≤ n, is called a context parameter.

For example, for the movie database, we consider three context parameters as relevant, namely, accompanying people, mood and time period; that is, the context environment CE_MD = {accompanying people, mood, time period}. Preferences about movies depend on the values of these context parameters. For instance, a high preference score may be associated with movies of the genre cartoons for users accompanied by their children, and a low preference score for those accompanied by their friends. For the points of interest database, we consider the context environment CE_PID = {user location, weather, accompanying people}.

For instance, a point of interest of type zoo may be a more preferable place to visit than a brewery when accompanied by family, and an open-air place like the Acropolis a better place to visit than a non open-air museum when the weather is good.

To allow flexibility in defining context specifications, we model context parameters as attributes that can take values at different granularities. In particular, each context parameter has multiple levels organized in a hierarchy schema. Let C be a context parameter with m levels, m > 1. We denote its hierarchy schema as L = (L_1, ..., L_m). L_1 is called the lowest or most detailed level of the hierarchy schema and L_m the top or most general one. We define a total order among the levels of each hierarchy schema L such that L_1 ≺ ... ≺ L_m and use the notation L_i ⪯ L_j between two levels to mean L_i ≺ L_j or L_i = L_j. Fig. 1 depicts the hierarchy schemas of the context parameters of our running examples. For instance, the hierarchy schema

Let cod(C) be the single parameter context descriptor C ∈ {v_1, ..., v_l}; we shall use the notation Context(cod(C)) = {v_1, ..., v_l}. Context states are specified using multi-parameter context descriptors that combine single parameter ones.

of the context parameter user location has four levels: city (L_1), country (L_2), continent (L_3) and the top level ALL (L_4). Each level L_j, 1 ≤ j ≤ m, is associated with a domain of values, denoted dom_{L_j}(C). For any two levels L_j, L_k, j ≠ k, dom_{L_j}(C) ∩ dom_{L_k}(C) = {}. We require, for all parameters, that their top level has a single value All, i.e., dom_{L_m}(C) = {All}. We define the domain, dom(C), of C as: dom(C) = ∪_{j=1}^{m} dom_{L_j}(C).

A concept hierarchy is an instance of a hierarchy schema. Similar to [11], a concept hierarchy of a context parameter C with m levels is represented by a tree with m levels, with nodes at each level j, 1 ≤ j ≤ m, representing values in dom_{L_j}(C). The root node (i.e., level m) represents the value All. Fig. 1 depicts the concept hierarchies of the context parameters of our running examples. For example, for the context parameter user location, Greece is a value of level country. Such concept hierarchies may be constructed using, for example, WordNet [13] or other ontologies. The relationship between the values at the different levels of a concept hierarchy is achieved through the use of a family of ancestor functions anc^{L_k}_{L_j} [12], 1 ≤ j < k ≤ m. The functions

Definition 4 (Multi-parameter context descriptor). Let CE_X = {C_1, C_2, ..., C_n} be a context environment. A multi-parameter context descriptor cod is an expression of the form cod = cod(C_{i_1}) ∧ ... ∧ cod(C_{i_k}), 1 ≤ k ≤ n, where i_j ∈ {1, 2, ..., n}, cod(C_{i_j}) is a single parameter context descriptor for C_{i_j} and there is at most one single parameter context descriptor for each C_{i_j}.

Given a set of context parameters, a multi-parameter context descriptor specifies a set of context states. These states are computed by taking the Cartesian product of the contexts of all the single parameter context descriptors that appear in the descriptor. If a multi-parameter context descriptor does not contain descriptors for all context parameters, we assume that the values of the absent context parameters are indifferent. In particular, if a context parameter C_i is missing from a multi-parameter context descriptor, we assume the implicit condition C_i ∈ {All} to be part of the descriptor.

anc^{L_{j+1}}_{L_j}, 1 ≤ j < m, assign each value of the domain of L_j to a value of the domain of L_{j+1}. An edge from a node at level L_j to a node at level L_{j+1} in the concept hierarchy represents that the latter is the ancestor of the former. Given three levels L_j, L_k and L_l, 1 ≤ j < k < l ≤ m, the function anc^{L_l}_{L_j} is equal

Definition 5 (Context of a multi-parameter context descriptor). Let CE_X = {C_1, C_2, ..., C_n} be a context environment and cod = cod(C_{i_1}) ∧ ... ∧ cod(C_{i_k}), 1 ≤ k ≤ n, be a multi-parameter context descriptor. The set of context states of cod, denoted Context(cod), is S_1 × S_2 × ... × S_n, where, for 1 ≤ i ≤ n, S_i = Context(cod(C_i)), if cod(C_i) appears in cod, and S_i = {All}, otherwise.

to the composition anc^{L_k}_{L_j} ◦ anc^{L_l}_{L_k}. Finally, desc^{L_j}_{L_l}(v), 1 ≤ l < j ≤ m, gives the level-l descendants of v ∈ dom_{L_j}(C), that is, desc^{L_j}_{L_l}(v) = {x ∈ dom_{L_l}(C) | anc^{L_j}_{L_l}(x) = v}. For example, for the concept hierarchies in Fig. 1, anc^{L_2}_{L_1}(Athens) = Greece, whereas desc^{L_2}_{L_1}(weekend) = {Sa, Su}. A context state corresponds to an assignment of values to context parameters.
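The ancestor and descendant functions can be illustrated with a small sketch. The following Python fragment (all names are our own, not part of the paper's formalism) encodes the user location concept hierarchy of Fig. 1 through one-step parent links and derives anc and desc from them:

```python
# One-step parent links for the user_location hierarchy:
# city -> country -> continent -> All.
PARENT = {
    "Athens": "Greece", "Thessaloniki": "Greece",
    "Greece": "Europe",
    "Europe": "All",
}

def anc(value, steps=1):
    """Ancestor of `value` `steps` levels up (composition of one-step anc functions)."""
    for _ in range(steps):
        value = PARENT[value]
    return value

def desc(value):
    """All most-detailed (level-1) descendants of `value`."""
    children = [v for v, p in PARENT.items() if p == value]
    if not children:          # value is already at the lowest level
        return {value}
    out = set()
    for c in children:
        out |= desc(c)
    return out

print(anc("Athens"))            # Greece
print(anc("Athens", 2))         # Europe
print(sorted(desc("Greece")))   # ['Athens', 'Thessaloniki']
```

Composing anc twice climbs from city to continent, mirroring the composition property of the ancestor functions above.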

For the movie example, consider the multi-parameter context descriptor (accompanying people ∈ {friends, family} ∧ time period ∈ {summer holidays}). This descriptor defines the following two context states: (friends, All, summer holidays) and (family, All, summer holidays).
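The expansion of a multi-parameter context descriptor into its set of context states (the Cartesian product of Definition 5, with {All} for absent parameters) can be sketched as follows; the dictionary-based encoding is our own illustrative choice:

```python
from itertools import product

# Context parameters of the movie database, in tuple order.
CONTEXT_PARAMS = ["accompanying_people", "mood", "time_period"]

def context_states(descriptor):
    """Expand a descriptor (parameter name -> set of values) into context states.

    Parameters absent from the descriptor default to {'All'} (Definition 5)."""
    domains = [descriptor.get(p, {"All"}) for p in CONTEXT_PARAMS]
    return set(product(*domains))

cod = {"accompanying_people": {"friends", "family"},
       "time_period": {"summer holidays"}}
print(sorted(context_states(cod)))
# [('family', 'All', 'summer holidays'), ('friends', 'All', 'summer holidays')]
```

This reproduces the two states of the example above, with mood implicitly set to All.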

Definition 2 (Context State). A context state cs of a context environment CE_X = {C_1, C_2, ..., C_n} is an n-tuple of the form (c_1, c_2, ..., c_n), where c_i ∈ dom(C_i), 1 ≤ i ≤ n.

2.3. Contextual Preference Model

We annotate preferences with context descriptors that specify the context states under which a preference holds. Regarding preference specification, there are, in general, two different approaches: a quantitative and a qualitative one. In the quantitative approach (e.g., [3]), preferences are expressed indirectly by using scoring functions that associate a numeric score, or degree of interest, with each item. In the qualitative approach (e.g., [1, 2]), preferences between two items are specified directly, typically using binary preference relations. Context descriptors can be used with both a quantitative and a qualitative approach. Here, we use a simple quantitative preference model to demonstrate the basic issues underlying contextualization. In particular, we assume that preferences for specific tuples of a database are expressed by providing a numeric score, which is a real number between 0 and 1. This score expresses a degree of interest, where value 1 indicates extreme interest and value 0 indicates no interest. Interest is expressed for specific values of attributes of a database relation, for instance, for the various attributes (e.g., genre, language) of our movie database relation. This is similar to the general quantitative framework of [3]. Formally, a contextual preference is defined as follows:

For instance, (friends, good, holidays) and (friends, All, summer holidays) are context states for our movie example. The set of all possible context states, called world CW, is the Cartesian product of the domains of the context parameters: CW = dom(C_1) × dom(C_2) × ... × dom(C_n).

2.2. Context Descriptors

Context states can be specified through context descriptors. Specifically, a single parameter context descriptor specifies values of one context parameter.

Definition 3 (Single parameter context descriptor). A single parameter context descriptor cod(C) of a context parameter C is an expression of the form cod(C) = C ∈ {v_1, ..., v_l}, where v_k ∈ dom(C), 1 ≤ k ≤ l.

For example, for the context parameter time period, a single parameter context descriptor can be time period ∈ {Christmas} or time period ∈ {Christmas, Easter, summer holidays}.

Figure 1: Hierarchy schema and concept hierarchy of accompanying people, weather, time period, location and mood.

Definition 6 (Contextual Preference). Given a database schema R(A_1, A_2, ..., A_d), a contextual preference p on R is a triple (cod, Pred, score), where

a given query. Our focus is on the context part. First, we define contextual queries. Then, given a contextual query, for a contextual preference to be selected, its context must be the same as, or more general than, the context of the query. This is formalized by the cover relation between context states, which relates context states expressed at different hierarchy levels. Among such qualified candidate preferences, we select the ones whose context is the most similar to the context of the query, based on two proposed distance metrics between context states. Once the appropriate preferences are selected, the query can be extended to take the selected preferences into account, as in the case of non-contextual preferences (e.g., [14, 15]).
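As a rough illustration of how selected preferences could be applied, the sketch below scores database tuples with a contextual preference of the (cod, Pred, score) form of Definition 6; the representation (dictionaries for tuples, a Python predicate for Pred) is our own and not prescribed by the paper:

```python
# Two tuples from the movie database instance of Fig. 2.
movies = [
    {"mid": "t1", "title": "Casablanca", "genre": "Drama"},
    {"mid": "t2", "title": "Psycho", "genre": "Horror"},
]

# A contextual preference: when alone, in a bad mood, on a weekend or
# during a holiday, horror movies get interest score 0.8.
preference = {
    "cod": {"accompanying_people": {"alone"}, "mood": {"bad"},
            "time_period": {"weekend", "holidays"}},
    "pred": lambda t: t["genre"] == "Horror",   # Pred: genre = horror
    "score": 0.8,
}

def score(tuple_, pref):
    """Interest score of a tuple under one preference (0 if Pred fails)."""
    return pref["score"] if pref["pred"](tuple_) else 0.0

print([score(m, preference) for m in movies])   # [0.0, 0.8]
```

In a full system, only the preferences whose context qualifies through context resolution would be applied in this way.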

1. cod is a multi-parameter context descriptor,
2. Pred is a predicate of the form A_{i_1} θ_{i_1} a_{i_1} ∧ A_{i_2} θ_{i_2} a_{i_2} ∧ ... ∧ A_{i_k} θ_{i_k} a_{i_k} that specifies conditions θ_{i_j} on the values a_{i_j} ∈ dom(A_{i_j}) of attributes A_{i_j}, 1 ≤ i_j ≤ d, of R and
3. score is a real number between 0 and 1.

The meaning of such a contextual preference is that, in the set of context states specified by cod, the database tuples that satisfy the predicate Pred are assigned the indicated interest score. In this paper, we assume that θ ∈ {=, ≠, <, >, ≤, ≥} for the numerical database attributes and θ ∈ {=, ≠} for the remaining ones. As an example, take the instance of our movie database example shown in Fig. 2. The contextual preference ((accompanying people ∈ {alone} ∧ mood ∈ {bad} ∧ time period ∈ {weekend, holidays}), genre = horror, 0.8) expresses the fact that when in a bad mood, alone, on a weekend or during a holiday, horror movies are preferred with interest score 0.8. Note that preferences that hold irrespectively of the values of the context parameters, i.e., context-free preferences, may be expressed using an empty context descriptor, whose context is the context state (All, All, ..., All). By using multi-parameter context descriptors, one can express preferences that depend on the values of more than one context parameter. Furthermore, hierarchies allow the specification of preferences at various levels of detail. For instance, one can specify preferences at the country level, the city level or both. Finally, we define the profile P as follows:

3.1. Contextual Queries

Contextual queries are queries annotated with information regarding context.

Definition 8 (Contextual Query). A contextual query Q is a query enhanced with a multi-parameter context descriptor, denoted cod_Q.

The context descriptor may be postulated by the application or be explicitly provided by the users as part of their queries. Typically, in the first case, the context implicitly associated with a contextual query corresponds to the current context, that is, the context surrounding the user at the time of the submission of the query. To capture the current context, context-aware applications use various devices, such as temperature sensors or GPS-enabled devices for location. Methods for capturing context are beyond the scope of this paper. Besides this implicit context, we also envision queries that are explicitly augmented with multi-parameter context descriptors by the users issuing them. For example, such descriptors may correspond to exploratory queries of the form: what is a good film to watch with my family this Christmas, or what are the interesting points not to be missed when I visit Athens with my friends next summer. The context associated with a query may correspond to a single context state, where each context parameter takes a specific value from its most detailed domain. However, in some cases, it may only be possible to specify the query context using rough values, for example, when the context values are provided by sensor devices with limited accuracy. In such cases, a context parameter may take a single value from a higher level of the hierarchy, or even more than one value.

Definition 7 (Profile). Given an application X, a profile P is the set of all contextual preferences that hold for X.

An example profile for the movie database is shown in Fig. 3. The context Context(P) of a profile P is the union of the contexts of all context descriptors that appear in P, that is, Context(P) = ∪_i Context(cod_i), for each (cod_i, Pred_i, score_i) ∈ P.

3. Contextual Preference Selection

In this section, we consider the problem of selecting appropriate contextual preferences from a profile so as to personalize

mid  title             year  director   genre   language  duration
t1   Casablanca        1942  Curtiz     Drama   English   102
t2   Psycho            1960  Hitchcock  Horror  English   109
t3   Schindler's List  1993  Spielberg  Drama   English   195

Figure 2: Database instance.

3.2. The Cover Relation

Let us first consider a simple example related to the movie database. Assume a contextual query Q enhanced with the context descriptor cod_Q = (accompanying people ∈ {friends} ∧ mood ∈ {good} ∧ time period ∈ {summer holidays}). If a preference with exactly the same context descriptor exists in the profile, preference selection is straightforward, i.e., this preference is selected. Assume now that this is not the case. For example, take a profile P that consists of three preferences: p_1 = ((accompanying people ∈ {friends} ∧ mood ∈ {good} ∧ time period ∈ {holidays}), Pred_1, score_1), p_2 = ((accompanying people ∈ {friends} ∧ mood ∈ {good} ∧ time period ∈ {All}), Pred_2, score_2) and p_3 = ((accompanying people ∈ {friends} ∧ mood ∈ {good} ∧ time period ∈ {working days}), Pred_3, score_3). Intuitively, in the absence of an exact match, we would like to use those preferences in P whose context descriptor is more general than the query descriptor, in the sense that its context "covers" that of the query.

Going back to our example, although the context states of both p_1 and p_2 cover those of the query Q, p_1 is more closely related to the descriptor of the query and it is the one that should be used. Next, we formalize this notion of the most specific state, or tight cover.

Definition 10 (Tight cover). Let P be a profile and cs_1 a context state. We say that a context state cs_2 ∈ Context(P) is a tight cover for cs_1 in P, if and only if: (i) cs_2 covers cs_1 and (ii) ¬∃ cs_3 ∈ Context(P), cs_3 ≠ cs_2, such that cs_2 covers cs_3 and cs_3 covers cs_1.

In general, there may be more than one tight cover for a query context state. For example, consider again the previous query context descriptor cod_Q and assume now that P includes a fourth preference, p_4 = ((accompanying people ∈ {friends} ∧ mood ∈ {All} ∧ time period ∈ {summer holidays}), Pred_4, score_4). The context states of both p_1 and p_4 in P satisfy the first condition of Def. 10, but neither of them covers the other. We can now provide a formal definition for context resolution, that is, of the process of selecting appropriate preferences from a profile based on context.

Definition 9 (Covering context state). A context state cs_1 = (c_11, c_12, ..., c_1n) ∈ CW covers a context state cs_2 = (c_21, c_22, ..., c_2n) ∈ CW, if and only if, ∀k, 1 ≤ k ≤ n, c_1k = c_2k or c_1k = anc^{L_j}_{L_i}(c_2k) for some levels L_i ≺ L_j.
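A minimal sketch of this cover test, assuming each hierarchy is given by one-step parent links (an illustrative encoding of our own, not the paper's data structures):

```python
# Toy fragments of the user_location, weather and accompanying_people hierarchies.
PARENT = {"Athens": "Greece", "Greece": "Europe", "Europe": "All",
          "cold": "bad", "bad": "All",
          "friends": "All", "alone": "All"}

def ancestors(v):
    """v together with all its ancestors up to All."""
    out = {v}
    while v in PARENT:
        v = PARENT[v]
        out.add(v)
    return out

def covers(cs1, cs2):
    """True iff each component of cs1 equals, or is an ancestor of, cs2's."""
    return all(c1 in ancestors(c2) for c1, c2 in zip(cs1, cs2))

print(covers(("Europe", "cold", "alone"), ("Athens", "cold", "alone")))  # True
print(covers(("Athens", "bad", "alone"), ("Athens", "cold", "alone")))   # True
print(covers(("Athens", "cold", "alone"), ("Europe", "cold", "alone")))  # False
```

The (All, All, All) state covers every state, matching the role of context-free preferences.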

Definition 11 (Context Resolution Set). Given a profile P and a contextual query Q, a set RS of context states, RS ⊆ Context(P), is called a context resolution set if (a) for each context state cs_q ∈ Context(cod_Q), there exists at least one context state cs_p in RS such that cs_p is a tight cover of cs_q in P and (b) cs_p belongs to RS only if there is a cs_q ∈ Context(cod_Q) for which cs_p is a tight cover in P.
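Tight covers (Definition 10) can then be computed by filtering the covering states, as in the following sketch; the profile re-creates the p_1, p_2, p_3 example of Section 3.2, and the one-step-parent encoding is again our own:

```python
# Toy fragment of the time_period hierarchy (plus trivial two-level hierarchies).
PARENT = {"summer holidays": "holidays", "holidays": "All",
          "working days": "All", "good": "All", "friends": "All"}

def ancestors(v):
    out = {v}
    while v in PARENT:
        v = PARENT[v]
        out.add(v)
    return out

def covers(cs1, cs2):
    return all(c1 in ancestors(c2) for c1, c2 in zip(cs1, cs2))

def tight_covers(query_state, profile_states):
    """Covering states with no strictly more specific covering state below them."""
    candidates = [cs for cs in profile_states if covers(cs, query_state)]
    return [cs for cs in candidates
            if not any(c != cs and covers(cs, c) and covers(c, query_state)
                       for c in candidates)]

profile = [("friends", "good", "holidays"),        # p1
           ("friends", "good", "All"),             # p2
           ("friends", "good", "working days")]    # p3

print(tight_covers(("friends", "good", "summer holidays"), profile))
# [('friends', 'good', 'holidays')]
```

As in the running example, p_3 does not qualify, and p_2 is rejected because the more specific p_1 also covers the query state.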

In the example above, the context states of p_1 and p_2 cover that of the query, whereas that of p_3 does not. It can be shown that the cover relation imposes a partial order among context states.

Theorem 1. The cover relation among context states is a partial order.

Proof. We must show that the cover relation is (i) reflexive (i.e., cs covers cs), (ii) antisymmetric (if cs_1 covers cs_2 and cs_2 covers cs_1, then cs_1 = cs_2) and (iii) transitive (if cs_1 covers cs_2 and cs_2 covers cs_3, then cs_1 covers cs_3). (i) Reflexivity is straightforward. (ii) Assume, for the purpose of contradiction, that the antisymmetric property does not hold. In this case, there is a certain parameter C_k for which c_1k is a proper ancestor of c_2k and c_2k is a proper ancestor of c_1k. But this cannot happen due to the total order of levels in a hierarchy. (iii) Assume that cs_1 covers cs_2 (1) and cs_2 covers cs_3 (2). From (1), ∀k, 1 ≤ k ≤ n, c_1k = c_2k or c_1k = anc^{L_j}_{L_i}(c_2k), L_i ≺ L_j (3). Respectively, from (2), ∀k, 1 ≤ k ≤ n, c_2k = c_3k or c_2k = anc^{L_j}_{L_i}(c_3k), L_i ≺ L_j (4). Therefore, from (3), (4), we get

After identifying such a set of context states, we use the contextual preferences associated with the corresponding descriptors for personalizing the query. Note that, for a specific query Q and profile P, there may be no context resolution set; in this case, query Q is executed as a non-contextual query. As shown above, for a query context state, there may be more than one tight cover. For a set of context states to qualify as a context resolution set, it must include at least one of them. Thus, there may be more than one context resolution set, depending on which of the tight covers it includes for each query context state. There are many alternatives in selecting which context resolution set to use to personalize a query. In the rest of this paper, we take the approach of selecting the smallest context resolution sets, that is, for each query context state, we choose exactly one of its tight covers. In the next section, we provide a systematic way to choose, for a given query context state, which of its tight covers to include

that, ∀k, 1 ≤ k ≤ n, c_1k = c_3k or c_1k = anc^{L_j}_{L_i}(c_3k), L_i ≺ L_j, that is, cs_1 covers cs_3. □

1 ≤ i ≤ n (3). From (1), (2) and (3), we get l_3i ⪰ l_2i ⪰ l_1i, ∀i, 1 ≤ i ≤ n (4). Since cs_2 ≠ cs_3, for at least one j, 1 ≤ j ≤ n, it holds that l_3j ≻ l_2j (5). Thus, from (4), (5) and Def. 14, it holds that dist_H(cs_1, cs_3) > dist_H(cs_1, cs_2). □

in a context resolution set by defining distances among context states. Note that although we use such distances to select just one tight cover, such distances can be used to select more than one tight cover per query context state, for example, to select the k (k > 1) most similar ones.

Property 1 states that, between two context states that cover cs_1, if one of them is a tight cover, then it is the one with the smallest hierarchy state distance from cs_1. The context state with the minimum hierarchy state distance is not necessarily unique. For instance, assume that we want to select the context state that is the most similar to cs_1 = (friends, good, summer holidays) between cs_2 = (All, warm, holidays) and cs_3 = (friends, All, All). For these context states, dist_H(cs_1, cs_2) = dist_H(cs_1, cs_3) = 3. To resolve such ties, we again choose the context states that are more specific, but now in terms of the values of the detailed (lowest) level of the hierarchy that they include. The motivation is that context values that have few detailed values as descendants are more specific than those that have many such values. For two values of two context states corresponding to the same context parameter, we measure the fraction of the intersection of their corresponding lowest level value sets over the union of these two sets and consider as a better match the "smallest" context state in terms of cardinality. Formally, this is expressed through the Jaccard distance.

3.3. Distances between Context States

To select the most appropriate among a number of tight covers, we introduce a distance metric between context states. The motivation is to choose the most specific among the candidate context states, that is, the context states defined at the most detailed hierarchy levels. We define first the levels of a context state as follows.

Definition 12 (Levels of a context state). Let cs = (c_1, c_2, ..., c_n) be a context state. The hierarchy levels that correspond to this state are levels(cs) = [L_{j_1}, L_{j_2}, ..., L_{j_n}] such that c_i ∈ dom_{L_{j_i}}(C_i), i = 1, ..., n.

The distance between two levels is defined as their path distance in their hierarchy schema.

Definition 13 (Level distance). Let C be a context parameter with m levels. The level distance, dist_L(L_i, L_j), between two levels L_i and L_j, 1 ≤ i, j ≤ m, is defined as: dist_L(L_i, L_j) = |j − i|.

Definition 15 (Jaccard distance). Let C be a context parameter with m levels. The Jaccard distance, dist_J(c_o, c_p), of two context values c_o and c_p, c_o ∈ dom_{L_i}(C) and c_p ∈ dom_{L_j}(C), 1 ≤ i, j ≤ m, is defined as:

We can now define a level-based distance between two context states.

Definition 14 (Hierarchy state distance). Let cs_1 = (c_11, c_12, ..., c_1n) and cs_2 = (c_21, c_22, ..., c_2n) be two context states with levels(cs_1) = [l_11, l_12, ..., l_1n] and levels(cs_2) = [l_21, l_22, ..., l_2n]. The hierarchy state distance, dist_H(cs_1, cs_2), is defined as: dist_H(cs_1, cs_2) = Σ_{i=1}^{n} dist_L(l_1i, l_2i).
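Definitions 13 and 14 translate directly into code; the sketch below operates on the level vectors of two context states (how the vectors are obtained from a hierarchy is left out, and the example levels follow the cs_1, cs_2, cs_3 states used later in the text):

```python
def dist_L(li, lj):
    """Level distance: path distance |j - i| between two levels (Definition 13)."""
    return abs(li - lj)

def dist_H(levels1, levels2):
    """Hierarchy state distance: sum of per-parameter level distances (Definition 14)."""
    return sum(dist_L(a, b) for a, b in zip(levels1, levels2))

# cs1 = (Athens, cold, alone) -> levels [1, 1, 1]
# cs2 = (Europe, cold, alone) -> levels [3, 1, 1]  (Europe is at the continent level)
# cs3 = (Athens, bad, alone)  -> levels [1, 2, 1]  (bad is at the characterization level)
print(dist_H([1, 1, 1], [3, 1, 1]))  # 2
print(dist_H([1, 1, 1], [1, 2, 1]))  # 1
```

The two printed values match the worked example in the text, where cs_3 would be preferred over cs_2.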

dist_J(c_o, c_p) = 1 − |desc^{L_i}_{L_1}(c_o) ∩ desc^{L_j}_{L_1}(c_p)| / |desc^{L_i}_{L_1}(c_o) ∪ desc^{L_j}_{L_1}(c_p)|.
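Computed over the level-1 descendant sets, the Jaccard distance can be sketched as below; the weekend and week sets are illustrative fragments of the time period hierarchy of Fig. 1:

```python
def dist_J(desc_o, desc_p):
    """Jaccard distance: 1 minus |intersection| / |union| of the two
    level-1 descendant sets (Definition 15)."""
    return 1 - len(desc_o & desc_p) / len(desc_o | desc_p)

weekend = {"Sa", "Su"}                             # desc of 'weekend'
week = {"M", "Tu", "W", "Th", "F", "Sa", "Su"}     # desc of a broader value
print(dist_J(weekend, weekend))  # 0.0
print(dist_J(weekend, week))     # 1 - 2/7, roughly 0.714
```

The broader value is farther from weekend, which is exactly the behavior Lemma 1 below establishes for ancestors at higher levels.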

It is easy to show that values at higher levels in the hierarchy have larger Jaccard distances than their descendants at lower levels, as the following lemma states:

For example, let cs_1 = (Athens, cold, alone) be a query context state and cs_2 = (Europe, cold, alone) and cs_3 = (Athens, bad, alone) be two context states in the profile. It holds that dist_H(cs_1, cs_2) = 2 and dist_H(cs_1, cs_3) = 1. Both cs_2 and cs_3 cover cs_1, but cs_2 and cs_3 do not cover each other. If we assume that both cs_2 and cs_3 are tight covers, then, using the hierarchy state distance, we would choose the preference associated with cs_3. We show next that the hierarchy state distance produces an ordering of context states that is compatible with the cover partial order, in the sense expressed by the following property.

Lemma 1. Let C be a context parameter with m levels and co, cp, cq be three values of C, such that co ∈ domLj(C), cp ∈ domLk(C) and cq ∈ domLl(C), Lj ≺ Lk ≺ Ll, 1 ≤ j, k, l ≤ m. If cq = anc^{Ll}_{Lk}(cp) and cp = anc^{Lk}_{Lj}(co), then distJ(co, cq) ≥ distJ(co, cp).

Proof. By definition,

distJ(co, cp) = 1 − |desc^{Lj}_{L1}(co) ∩ desc^{Lk}_{L1}(cp)| / |desc^{Lj}_{L1}(co) ∪ desc^{Lk}_{L1}(cp)|

and

distJ(co, cq) = 1 − |desc^{Lj}_{L1}(co) ∩ desc^{Ll}_{L1}(cq)| / |desc^{Lj}_{L1}(co) ∪ desc^{Ll}_{L1}(cq)|.

In both fractions, the numerator reduces to |desc^{Lj}_{L1}(co)| due to the transitivity property of the ancestor functions (i.e., all descendants of co at the detailed level are also descendants of cp and cq). The denominator of the first fraction is |desc^{Lk}_{L1}(cp)|, whereas the denominator of the second fraction is |desc^{Ll}_{L1}(cq)|, with desc^{Ll}_{L1}(cq) ⊇ desc^{Lk}_{L1}(cp), again due to the transitivity property of the ancestor function. Therefore, distJ(co, cq) ≥ distJ(co, cp). □

Property 1. Let cs1 = (c11, c12, . . . , c1n) be a context state. For any two different context states cs2 = (c21, c22, . . . , c2n) and cs3 = (c31, c32, . . . , c3n), cs2 ≠ cs3, such that cs2 covers cs1 and cs3 covers cs1, if cs3 covers cs2, then distH(cs1, cs3) > distH(cs1, cs2).
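Lemma 1 can be illustrated with a short sketch; the descendant sets below encode a toy, hypothetical location hierarchy (Athens below Greece below Europe) at the most detailed level:

```python
# Sketch of Definition 15: the Jaccard distance between context values,
# computed over their descendant sets at the most detailed level.
# The toy hierarchy below is hypothetical.
DESC = {
    "Athens": {"Athens"},
    "Greece": {"Athens", "Thessaloniki"},
    "Europe": {"Athens", "Thessaloniki", "Paris", "Rome"},
}

def dist_J(c1, c2):
    d1, d2 = DESC[c1], DESC[c2]
    return 1 - len(d1 & d2) / len(d1 | d2)

# The higher the ancestor, the larger its distance from a value (Lemma 1):
print(dist_J("Athens", "Greece"))  # → 0.5
print(dist_J("Athens", "Europe"))  # → 0.75
```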

Proof. Let levels(cs1) = [l11, l12, . . . , l1n], levels(cs2) = [l21, l22, . . . , l2n] and levels(cs3) = [l31, l32, . . . , l3n]. From Def. 9, since cs2 covers cs1 and the fact that the level of any ancestor of ci is larger than the level of ci, it holds that l2i ⪰ l1i, ∀i, 1 ≤ i ≤ n (1). Similarly, since cs3 covers cs1, it holds that l3i ⪰ l1i, ∀i, 1 ≤ i ≤ n (2) and, since cs3 covers cs2, it holds that l3i ⪰ l2i, ∀i, 1 ≤ i ≤ n (3). From (1), (2) and (3), distL(l1i, l3i) ≥ distL(l1i, l2i) for every i. Moreover, since cs2 ≠ cs3 and cs3 covers cs2, for at least one i, c3i is a proper ancestor of c2i, and thus l3i ≻ l2i, making the inequality strict for this i. Summing over all i, distH(cs1, cs3) > distH(cs1, cs2). □

p1 = ((accompanying people ∈ {friends}), genre = horror, 0.8)
p2 = ((accompanying people ∈ {friends}), director = Hitchcock, 0.7)
p3 = ((accompanying people ∈ {alone}), genre = drama, 0.9)
p4 = ((accompanying people ∈ {alone}), (genre = drama ∧ director = Spielberg), 0.5)

Figure 3: Example profile.

The Jaccard distance between two context states is defined as follows.

Definition 16 (Jaccard state distance). Let cs1 = (c11, c12, . . . , c1n) and cs2 = (c21, c22, . . . , c2n) be two context states. The Jaccard state distance, distJS(cs1, cs2), is defined as:

distJS(cs1, cs2) = Σ_{i=1}^{n} distJ(c1i, c2i).

For example, the Jaccard state distance of the context states cs1 = (friends, good, summer holidays) and cs2 = (All, warm, holidays) is equal to distJS(cs1, cs2) = 13/6. Now, returning to our previous example for cs1 = (friends, good, summer holidays) and the two candidate states cs2 = (All, warm, holidays) and cs3 = (friends, All, All) with the same hierarchy state distance, it holds that distJS(cs1, cs2) = 13/6 and distJS(cs1, cs3) = 72/55. Therefore, the state that is the most similar to cs1 is cs3. It is easy to prove a property similar to Property 1, that is:

Definition 17 (Predicate Subsumption). Given two predicates Pred1 and Pred2, Pred1 subsumes Pred2, if and only if, ∀ t ∈ r, Pred1[t] ⇒ Pred2[t]. In this case, we say that Pred1 is more specific than Pred2.

When a tuple t satisfies predicates such that one subsumes the other, to compute a score for t, we consider only the preferences with the most specific predicates, because users are assumed to use these to specialize or refine the more general ones. In all other cases, we use the preference with the highest score, treating preferences as indicators of positive interest.

Property 2. Let cs1 = (c11, c12, . . . , c1n) be a context state. For any two different context states cs2 = (c21, c22, . . . , c2n) and cs3 = (c31, c32, . . . , c3n), cs2 ≠ cs3, such that cs2 covers cs1 and cs3 covers cs1, if cs3 covers cs2, then distJS(cs1, cs3) ≥ distJS(cs1, cs2).

Definition 18 (Tuple Score). Let P be a profile, cs a context state and t ∈ r a tuple. Let P′ ⊆ P be the set of preferences pi = (codi, Predi, scorei) such that cs ∈ Context(codi), Predi[t] holds and there is no pj = (codj, Predj, scorej) ∈ P such that cs ∈ Context(codj), Predj[t] holds and Predj subsumes Predi. The score of t in cs is: score(t, cs) = max_{pi ∈ P′} scorei.

Proof. Let levels(cs1) = [l11, l12, . . . , l1n], levels(cs2) = [l21, l22, . . . , l2n] and levels(cs3) = [l31, l32, . . . , l3n]. From the proof of Property 1, we have that l3i ⪰ l2i ⪰ l1i, ∀i, 1 ≤ i ≤ n. From Lemma 1, ∀ c1i, c2i, c3i, 1 ≤ i ≤ n, we get that distJ(c1i, c3i) ≥ distJ(c1i, c2i) (1), because the distance becomes larger as the context values belong to higher hierarchy levels. From (1) and Def. 16, we get that distJS(cs1, cs3) ≥ distJS(cs1, cs2). □

For example, assume the movie relation of Fig. 2 and a profile with the preferences depicted in Fig. 3. In the context state (friends, All, All), tuple t2 satisfies the predicates of both preferences p1 and p2. Similarly, in context state (alone, All, All), tuple t3 satisfies the predicates of both preferences p3 and p4. In the first case, none of the two predicates subsumes the other and the score for t2 is the maximum of the two scores, namely 0.8. Under context state (alone, All, All), the predicate of p4 subsumes the predicate of p3 and so, t3 has score 0.5. The motivation is that the user has assigned score 0.9 to drama movies in general and score 0.5 to drama movies directed by Spielberg in particular. Tuple t3 belongs to the second category and thus, it is assigned the corresponding score. Definition 18 specifies how to compute the score of a tuple under a specific context state. However, the result of context resolution for a query Q may include more than one context state.
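The scoring of this example can be sketched as follows. Predicates are modeled as Python callables and the subsumption relation (p4 subsumes p3) is supplied explicitly, since deciding subsumption for arbitrary predicates is outside the scope of the sketch; tuples t2 and t3 are toy stand-ins for the relation of Fig. 2:

```python
# Sketch of Definition 18 over the example profile p1-p4: a tuple takes
# the maximum score over the applicable preferences whose predicates are
# not subsumed by another applicable predicate.
prefs = {
    "p1": ("friends", lambda t: t["genre"] == "horror", 0.8),
    "p2": ("friends", lambda t: t["director"] == "Hitchcock", 0.7),
    "p3": ("alone", lambda t: t["genre"] == "drama", 0.9),
    "p4": ("alone", lambda t: t["genre"] == "drama"
                          and t["director"] == "Spielberg", 0.5),
}
SUBSUMES = {("p4", "p3")}  # p4's predicate is more specific than p3's

def score(t, ctx):
    applicable = {pid for pid, (c, pred, _) in prefs.items()
                  if c == ctx and pred(t)}
    # Drop predicates subsumed by a more specific applicable predicate.
    kept = {pid for pid in applicable
            if not any((q, pid) in SUBSUMES for q in applicable)}
    return max(prefs[pid][2] for pid in kept)

t2 = {"genre": "horror", "director": "Hitchcock"}
t3 = {"genre": "drama", "director": "Spielberg"}
print(score(t2, "friends"))  # → 0.8
print(score(t3, "alone"))    # → 0.5
```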

There may still be ties. In this case, we just select any of the qualifying states. The hierarchy and the Jaccard state distances provide a means for choosing among tight covers when no semantic information is available. When such information exists, one intuitive extension of these distances is to use weighted versions. For example, a weighted hierarchy state distance can be defined as: distH(cs1, cs2) = Σ_{i=1}^{n} wi distL(l1i, l2i), where the weight wi associated with a context parameter Ci is an indication of its importance. For example, if user location is the determining factor for choosing a point of interest, one can assign weight 1 to this context parameter and 0 to the other two.

3.4. Preference application

The result of context resolution is the set of context states csj in the profile P that are the most similar to the query context. Next, the preferences (codi, Predi, scorei) ∈ P such that csj ∈ Context(codi) are selected and applied to produce a score for the results of the query. In general, more than one of the selected preferences may be applicable to a specific database tuple t in the result r. In other words, a tuple t may satisfy the predicate part of more than one of the selected preferences. We shall use the notation Pred[t] to denote that tuple t satisfies predicate Pred. Let us consider first the special case in which the selected predicates are related by subsumption.

Definition 19 (Aggregate Tuple Score). Let P be a profile, CS ⊆ Context(P) be a set of context states and t ∈ r a tuple. The score of t in CS is: score(t, CS) = max_{cs ∈ CS} score(t, cs).

It is straightforward (by Definition 19) that:

Property 3. Let cs be a context state and CS a set of context states. If cs ∈ CS, then for any t ∈ r, score(t, CS) ≥ score(t, cs).

This means that the score of a tuple computed using a set of context states is no less than the score of the tuple computed using any of the context states belonging to this set.
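As a minimal sketch of Definition 19 (toy, hypothetical per-state scores):

```python
# Sketch of Definition 19: the aggregate score of a tuple over a set of
# context states is the maximum of its per-state scores.
def aggregate_score(per_state_scores):
    return max(per_state_scores.values())

per_state = {("alone", "All", "All"): 0.5, ("All", "All", "All"): 0.3}
print(aggregate_score(per_state))  # → 0.5
```

Adding states to the set can only keep or raise this maximum, which is exactly the content of Property 3.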

Other ways of aggregating scores besides choosing the highest score are possible. Our main motivation is that we treat preferences as indicators of positive interest. By using the highest score, we may overrate a database tuple, resulting in some form of a false positive, but we never miss any highly preferred tuple in any of the matching context states; thus, high scores override lower ones. Of course, one can argue for other interpretations; our context resolution procedure and the related algorithms are still applicable.

For example, consider the preference graph shown in Fig. 4(b) and the query context state cs = (family, All, holidays). Search starts from the root node with context state cs0 = (All, All, All). Since this state covers the query context state, we visit its children. Both retrieved states, i.e., cs6 = (All, All, holidays) and cs5 = (family, All, All), cover the query state. The node of cs6 leads to the nodes with context states cs3 = (friends, All, holidays) and cs4 = (family, All, holidays); cs5 also leads to cs4. cs3 does not cover cs, while cs4 is an exact match of cs, and so, the score set Wcs4 is returned.
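This traversal can be reproduced with a short sketch. The ancestor table and the graph below are hypothetical reconstructions of the hierarchies and of the edges of Fig. 4(b); a state covers another if each of its values is an ancestor of (or equal to) the corresponding value:

```python
# Sketch of the top-down tight-cover search: descend while the node's
# state covers the query state; report nodes none of whose children
# cover it. Ancestor tables and edges are hypothetical reconstructions.
ANC = {  # value -> {itself and all of its ancestors}
    "friends": {"friends", "All"}, "family": {"family", "All"},
    "good": {"good", "All"},
    "summer holidays": {"summer holidays", "holidays", "All"},
    "holidays": {"holidays", "All"}, "All": {"All"},
}

def covers(general, specific):
    return all(g in ANC[s] for g, s in zip(general, specific))

GRAPH = {  # node -> children, with the states of Fig. 4(a)
    ("All", "All", "All"): [("All", "All", "holidays"), ("family", "All", "All")],
    ("All", "All", "holidays"): [("friends", "All", "holidays"),
                                 ("family", "All", "holidays")],
    ("family", "All", "All"): [("family", "All", "holidays")],
    ("friends", "All", "holidays"): [("friends", "good", "summer holidays")],
    ("family", "All", "holidays"): [("family", "good", "summer holidays")],
    ("friends", "good", "summer holidays"): [],
    ("family", "good", "summer holidays"): [],
}

def tight_covers(query):
    result, frontier, seen = set(), [("All", "All", "All")], set()
    while frontier:
        node = frontier.pop()
        if node in seen or not covers(node, query):
            continue
        seen.add(node)
        covering_children = [c for c in GRAPH[node] if covers(c, query)]
        if covering_children:
            frontier.extend(covering_children)
        else:
            result.add(node)  # no child covers the query: tight cover
    return result

print(tight_covers(("family", "All", "holidays")))
# → {('family', 'All', 'holidays')}
```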

4. Data Structures and Algorithms

Theorem 2 (Correctness). If the PG Resolution Algorithm is applied to all context states specified by the context descriptor of a query Q, a context resolution set is returned.

In this section, we present algorithms for the efficient computation of context resolution sets. First, we consider the problem of finding the tight covers of a single query context state. One way to achieve this is by sequentially scanning all context states of all preferences in P. To improve response time and reduce storage overheads, we consider indexing the preferences in P based on the context states in Context(P). To this end, we introduce two alternative data structures, namely the preference graph and the profile tree. We show how these structures can be used to find the tight covers of a single state and then to compute context resolution sets. Finally, we present a more efficient algorithm for computing context resolution sets that finds tight covers of more than one query state in a single pass.

Proof. The correctness of the algorithm is based on the following observation. A state csv of a node v is a tight cover of cs, if and only if, csv covers cs and (i) v is a leaf node or (ii) v is an internal node and none of its children covers cs. This holds because, in both cases, there is no other state that is covered by csv and covers cs, since if there were one, v would have an outgoing edge to the corresponding node. □

If there is more than one tight cover, we select the one with the minimum distance, i.e., the one that differs the least from the query state based on the hierarchy state distance and, in case of ties, the Jaccard state distance. If again there are ties, we select one of them at random.

Given a set of n context parameters C1, C2, . . . , Cn with h1, h2, . . . , hn hierarchy levels, respectively, the maximum path length between any two nodes in the graph, i.e., the maximum number of edges that connect them, is equal to h1 + h2 + . . . + hn − n.

The PG Resolution Algorithm follows a top-down strategy, traversing the graph starting from the nodes corresponding to the most general context states and moving towards the nodes corresponding to the most specific ones. We can also follow a bottom-up approach and traverse the graph starting from the nodes with the most specific context states up to the nodes with the most general ones. In this case, at each search path, search stops when the first node that covers the query state is met. In general, the bottom-up traversal visits more nodes than the top-down one, since the number of nodes at the lower levels of the graph tends to be much larger than that of the nodes higher up. Intuitively, the bottom-up traversal is expected to outperform the top-down one only for query states with specific context values, that is, for query states with context values that belong to the lower levels of their corresponding hierarchies.
To take advantage of this simple observation, we use the following heuristic for selecting the appropriate type of traversal. We associate with each query state cs a level score ls equal to the average value of the hierarchy levels of the context values in cs. Similarly, we compute a level score lp for the preference graph as follows. Let n be the number of nodes in the graph and lsi the level score of the context state of node ni; then, lp = (Σ_{i=1}^{n} lsi)/n. We expect that if a query state cs has level score greater (resp. smaller) than the level score of the graph, the matching state of cs will appear higher (resp. lower)

4.1. Preference Graph

We introduce a graph representation of preferences that exploits the cover relation between context states. We call score set of a context state cs the set Wcs = {(Predi, scorei) | (codi, Predi, scorei) ∈ P and cs ∈ Context(codi)}, that is, the set of predicates and the related interest scores of all preferences that include the state cs in their context descriptor.

Definition 20 (Preference Graph). The preference graph PGP = (VP, EP) of a profile P is a directed acyclic graph with a node vi ∈ VP for each context state csi ∈ Context(P) of the form vi = (csi, Wcsi), where Wcsi is the score set of csi, and an edge (vi, vj) ∈ EP, if csi is a tight cover of csj.

For example, for the movie database and the profile with the context states shown in Fig. 4(a), the preference graph depicted in Fig. 4(b) is constructed. Note that the preference graph is acyclic, because the cover relation over context states is a partial order (Theorem 1). Also, the graph has a single root, if we assume a default preference with context state (All, All, . . . , All).

Given a context state cs ∈ Context(codQ) and a profile P, the PG Resolution Algorithm (Algorithm 1) finds the states in Context(P) that are tight covers of cs through a top-down traversal of the preference graph PGP of P, starting from the nodes with no incoming edges. Search stops at a node when the node has no outgoing edges (i.e., it is a leaf) or its state does not cover the query state cs. Algorithm 1 returns results of the form (Wcsi, distance) that correspond to nodes vi whose context state csi is a tight cover of the query context state cs, where distance is the distance between cs and csi.

(a) Context States with Score Sets
cs1: ((friends, good, summer holidays), Wcs1)
cs2: ((family, good, summer holidays), Wcs2)
cs3: ((friends, All, holidays), Wcs3)
cs4: ((family, All, holidays), Wcs4)
cs5: ((family, All, All), Wcs5)
cs6: ((All, All, holidays), Wcs6)

Figure 4: (a) A set of context states and an instance of (b) a preference graph and (c) a profile tree.

Algorithm 1 PG Resolution Algorithm

Input: A preference graph PGP of a profile P, the searching context state cs.
Output: A ResultSet of the form (W, distance) characterizing a node whose context state tightly covers the searching context state.
Begin
  ResultSet = ∅; tmpVP = ∅;
  for all nodes vi ∈ VP do
    if vi has no incoming edges then
      tmpVP = tmpVP ∪ {vi};
    end if
  end for
  while tmpVP not empty do
    for all vi ∈ tmpVP do
      if csi covers cs then
        if vi has no outgoing edges then
          ResultSet = ResultSet ∪ {(Wcsi, dist(csi, cs))};
        else if ∀ vj s.t. (vi, vj) ∈ EP, csj does not cover cs then
          ResultSet = ResultSet ∪ {(Wcsi, dist(csi, cs))};
        else
          for all vq s.t. (vi, vq) ∈ EP and vq unmarked do
            tmpVP = tmpVP ∪ {vq}; mark vq;
          end for
        end if
      end if
      tmpVP = tmpVP − {vi};
    end for
  end while
End

in the graph and so, the top-down (resp. bottom-up) approach is selected.

4.2. Profile Tree

The profile tree exploits the repetition of context values in context states by using a prefix-based approach for storing the context states in Context(P) of the profile P. Each context state cs in Context(P) corresponds to a single root-to-leaf path in the tree.

Definition 21 (Profile Tree). Let CEX be a context environment with n context parameters {C1, C2, . . . , Cn}. The profile tree for a profile P has n + 1 levels. Each one of the first n levels corresponds to one of the context parameters and the last one is the level of the leaf nodes. Let context parameter Ci be mapped to level i of the tree.

(i) At level 1, there is a single root node.
(ii) Each non-leaf node at level k (1 ≤ k ≤ n) contains cells of the form [key, pointer], where key corresponds to a value of the parameter Ck that appeared in some state cs in Context(P). No two cells within the same node contain the same key value.
(iii) The pointer of each cell of the nodes at level k (1 ≤ k < n) points to the node at the next lower level (level k + 1) containing all the distinct values of the next context parameter (parameter Ck+1) that appeared in the same context state cs with key. The pointers of cells of the nodes at level n point to leaf nodes.
(iv) Each leaf node contains the score set Wcs of the state cs that corresponds to the path leading to it.

For example, for the movie database, the profile tree for the context states of Fig. 4(a) is depicted in Fig. 4(c). The size of the tree depends on which context parameter is assigned to each level. Let mi, 1 ≤ i ≤ n, be the cardinality of the domain of parameter Ci; then, the maximum number of cells is m1 × (1 + m2 × (1 + . . . (1 + mn))). The above number is as small as possible when m1 ≤ m2 ≤ . . . ≤ mn, thus, in general, it is better to place context parameters with domains of higher cardinalities lower in the profile tree.

We describe next how the profile tree is used for context resolution. Given a contextual query Q with a context descriptor codQ, for each query context state cs = (c1, c2, . . . , cn) ∈ Context(codQ), we search the profile tree for a state that matches it. If there is a state that exactly matches it, that is, a state (c1, c2, . . . , cn), then the associated preference is returned. Note that this state is easily located by a single depth-first traversal of the profile tree. Starting from the root of the tree (level 1), at each level i, we follow the pointer associated with key = ci. If such a state does not exist, we search for a state that best covers cs. If more than one such state exists, we select the one with the smallest distance using the hierarchy distance and, in case of ties, the Jaccard distance. If again there are ties, we select one of them at random.

Algorithm 2 PT Resolution Algorithm

Input: A node RP of the profile tree, the query context state cs = (c1, c2, . . . , cn), the current distance of each candidate path.
Output: A ResultSet of the form (W, distance) characterizing a candidate path whose context state is either the same as or covers cs.
Begin
  if RP is a non-leaf node then
    for all x ∈ RP such that x = ci or x = anc^{Lj}_{Li}(ci) do
      PT_Resolution(x → child, {ci+1, . . . , cn}, dist(x, ci) + distance);
    end for
  else if RP is a leaf node containing the score set Wcsr then
    ResultSet = ResultSet ∪ {(Wcsr, distance)};
  end if
End

Given a profile tree whose root node is RP, the PT Resolution Algorithm (Algorithm 2) descends the profile tree starting from the root node in a breadth-first manner. It maintains all paths whose context state is either the same as or covers the query context state. Each candidate path keeps track of its distance from the query. Initially, we invoke PT_Resolution(RP, {c1, c2, . . . , cn}, 0). At the end of the execution of this call, we sort all the results on the basis of their distances and select the one with the minimum distance, i.e., the one that differs the least from the searched path based on the two distances.

For example, for the profile tree of Fig. 4(c) and the query context state cs = (family, All, summer holidays), we start from the root node and follow the pointer of the cell containing the value family. At the next level, we follow the All pointer. We descend to the last level following the pointers of both holidays and All. Only the score set Wcs4 is returned, since this is associated with the state with the smallest hierarchy distance to cs.

Theorem 3 (Correctness). If the PT Resolution Algorithm is applied to all context states specified by the context descriptor of a query Q, a context resolution set is returned.

Proof. For each query context state, the algorithm returns the state that is the most similar to it, that is, the one with the smallest distance. By Property 1, for the hierarchy distance, and Property 2, for the Jaccard distance, it holds that the state with the smallest distance is one that tightly covers the query state. The set of context states that are returned specify a context descriptor. This descriptor covers the descriptor of the query, because each state is expressed by another similar one. □

The last step of the process described above can be easily replaced by a simple runtime check that keeps the current closest leaf if its distance is smaller than the one currently tested. In particular, to reduce the number of candidate paths maintained, we could prune those paths under construction for which there is at least one other sub-path that has a smaller distance to the one searched, independently of the rest of the values of the path. Assume a query context state cs = (c1, . . . , ci, ci+1, . . . , cn). Assume further that, at level i of the tree, we have two candidate sub-paths with values sp1 = (sp11, . . . , sp1i) and sp2 = (sp21, . . . , sp2i). If, for the distances between the states cs1 = (sp11, . . . , sp1i, ci+1, . . . , cn) and cs2 = (sp21, . . . , sp2i, All, . . . , All) and the query state cs, it holds that dist(cs1, cs) > dist(cs2, cs), then we can safely prune sub-path sp1. The reason is that, even if, for the rest of the values of sp1, we could find in the tree values equal to those of the query (best-case scenario), its distance from the query state would still be greater than that of sp2, even if we could not find anything better than All for the remaining values of sp2 (worst-case scenario). Furthermore, this variant still returns the tight covering descriptor, because for each state we select the tight covering state.

The profile tree is very efficient in the case we are looking only for exact matches of the query state cs = (c1, c2, . . . , cn). In this case, context resolution requires just a simple root-to-leaf traversal. At each level i, we search for the cell having as key the value ci and descend to the next level, following the corresponding pointer. Thus, we visit as many nodes as the height of the profile tree. For the general case of looking for tight covering states, assume that context parameter Ci has hi hierarchy levels and val(Ci) distinct values in the profile. Then, the number of cells that are visited for each query state is val(C1) + val(C2) × h1 + val(C3) × h2 × h1 + . . . + val(Cn) × hn−1 × . . . × h1.

Although the profile tree is efficient, we can exploit further the hierarchical nature of context parameters to speed up context resolution, by adding cross edges, named hierarchical pointers, among the context values that belong to a specific node. In particular, we link each value with its first ancestor that exists in the node, that is, the value that belongs to the first upper level of the corresponding hierarchy. For instance, for the profile tree of Fig. 4(c), we add cross edges from good to All at the level of mood and from holidays to All at the level of time period. The values within each node are sorted according to the hierarchy level to which they belong. Formally, in the resulting enhanced profile tree with n context parameters, each non-leaf node at level k maintains cells of the form [key, label, pointer, hpointer], where key and pointer are defined as in the profile tree and label denotes the level key belongs to. The hpointer field points to a value key′ that belongs to the same node with key, such that key′ = anc^{Lkey′}_{Lkey}(key) and there is no key″ such that key″ = anc^{Lkey″}_{Lkey}(key) and key′ = anc^{Lkey′}_{Lkey″}(key″). If there is no such key′ value, then hpointer points to null.

The EnhancedPT Resolution Algorithm (Algorithm 3) uses the enhanced profile tree to retrieve the most similar preferences to a contextual query Q. Here, instead of searching for the more general values of each context value, we just follow the hierarchical pointers of the tree to locate them.

4.3. Multi-State Context Resolution

Let codQ be the context descriptor of a query Q and QC = Context(codQ) be the set of context states derived from it. In the previous section, we have used the preference graph and the profile tree to check for each individual cs in QC. Here, we propose algorithms that test for all context states in QC at once.

Algorithm 3 EnhancedPT Resolution Algorithm

Input: A node RP of the enhanced profile tree, the query context state cs = (c1, c2, . . . , cn), the current distance of each candidate path.
Output: A ResultSet of the form (W, distance).
Begin
  if RP is a non-leaf node then
    Find the first x ∈ RP such that x = ci or x = anc^{Lj}_{Li}(ci);
    EnhancedPT_Resolution(x → child, {ci+1, . . . , cn}, dist(x, ci) + distance);
    y = x → hpointer;
    while y != NULL do
      EnhancedPT_Resolution(y → child, {ci+1, . . . , cn}, dist(y, ci) + distance);
      y = y → hpointer;
    end while
  else if RP is a leaf node containing the score set Wcsr then
    ResultSet = ResultSet ∪ {(Wcsr, distance)};
  end if
End

Algorithm 4 QG Resolution Algorithm

Input: The preference graph PGP of a profile P, the query graph GQ of a query Q.
Output: A ResultSet of the form (W, distance).
Begin
  ResultSet = ∅; l = 1;
  while VQ not empty do
    for all vi ∈ VQ with no incoming edges in EQ do
      Sl = Sl ∪ {vi};
    end for
    VQ = VQ − Sl;
    for all edges e = (vi, vj) with vi ∈ Sl do
      EQ = EQ − {e};
    end for
    l++;
  end while
  m = l − 1;
  for l = 1 to m do
    for all vi ∈ Sl do
      locate the nodes z in PGP whose states tightly cover the state of vi;
      ResultSet = ResultSet ∪ {(Wcsz, dist(csz, csvi))};
      for all edges e = (vj, z) with vj, z ∈ VP do
        EP = EP − {e};
      end for
    end for
  end for
End

When the preference graph is used, the context states in QC are represented by a graph GQ, named query graph, similar to the preference graph, i.e., there is a node v in GQ for each context state csv ∈ QC and an edge from a node v to a node u, if and only if, the state of v is a tight cover of the state of u. The QG Resolution Algorithm (Algorithm 4) performs first a topological sort of the nodes in the query graph GQ as follows. It starts by placing all nodes of GQ that have no incoming edges in a set, say S1. Then, it proceeds by deleting the nodes in S1 and their outgoing edges from GQ and placing the nodes that have no incoming edges in the query graph that results after these deletions in another set, say S2. This procedure is repeated, until there are no remaining nodes in the query graph. Assume that m sets Si, 1 ≤ i ≤ m, are thus produced. We process the query states using this order.

Our approach is based on the following observation. Let a node z with state csz be returned as an exact match of a node in Si, 1 ≤ i < m. None of the predecessors of z in the preference graph can be a tight cover of any node in Si+1. However, context states of nodes that have incoming edges from such predecessors (i.e., belong to a different sub-graph) may be tight covers of the states of nodes in Si+1. To take advantage of this, we start with the set S1 and search the preference graph to find nodes whose state tightly covers any context state in S1. Let z be any such node returned for S1. Then, we remove from the preference graph the incoming edges of z. For finding nodes that tightly cover the states in set S2, we begin the search process from all nodes with no incoming edges in the updated preference graph. This procedure is repeated for all sets up to Sm.
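The layered topological sort that produces the sets S1, . . . , Sm is a standard Kahn-style peeling of the nodes with no incoming edges, which can be sketched as (node names hypothetical):

```python
# Sketch of the layered topological sort used by the QG Resolution
# Algorithm: repeatedly peel off the nodes with no incoming edges.
# Assumes the input graph is a DAG, as the query graph is.
from collections import defaultdict

def layers(nodes, edges):
    indeg = {v: 0 for v in nodes}
    succ = defaultdict(list)
    for u, v in edges:
        indeg[v] += 1
        succ[u].append(v)
    result, remaining = [], set(nodes)
    while remaining:
        layer = [v for v in remaining if indeg[v] == 0]
        result.append(layer)
        for v in layer:
            remaining.discard(v)
            for w in succ[v]:
                indeg[w] -= 1
    return result

# Query graph of the example below: an edge from u to v (u tightly
# covers v), so u is placed in S1 and processed first.
print(layers(["u", "v"], [("u", "v")]))  # → [['u'], ['v']]
```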
For example, given a query with two context states, csv = (friends, good, summer holidays) and csu = (friends, All, holidays), the query graph consists of two nodes, v and u, and an edge from u to v. Given also the preference graph of Fig. 4, the search process begins from csu and returns the score set Wcs3 of the node containing the state (friends, All, holidays). After removing the incoming edges of this node, we search for csv in

the updated graph and return the score set Wcs1.

Theorem 4 (Correctness). Given a preference graph and a query graph representing a contextual query Q, the QG Resolution Algorithm identifies a context resolution set.

Proof. The correctness of the algorithm is based on the following observation. Assume two nodes in the query graph with query context states csv and csu such that csv tightly covers csu. Assume also a context state csz of a node z in the preference graph. If csz tightly covers csv, we can safely prune from the preference graph the incoming edges of z before searching for the tight covers of csu. That is, searching for the tight covers of csu in the updated preference graph does not miss any tight cover of csu, since the context states of the predecessors of z do not tightly cover csu. To locate a context resolution set for Q, the algorithm first searches in the preference graph for the set of query context states that are not covered by any other state in the query graph, then for the states that are not covered by any state in the query graph after removing the first ones, and so forth. By following this order for searching the query context states and removing the appropriate edges from the preference graph after all query context states of a set are examined, the algorithm identifies the correct context states. □

We follow a similar approach for the profile tree. In particular, the query context states are represented by a data structure similar to the profile tree, that we call query tree, so that there

is exactly one path in the tree for each cs ∈ QC. Again, there is one level in the query tree for each context parameter. The QT Resolution Algorithm (Algorithm 5) uses both the profile tree and the query tree. In a breadth-first manner, the algorithm searches for pairs of nodes that belong to the same level, i.e., for values that belong to the same context parameter. Each pair consists of a node of the query tree and a node of the profile tree. Initially, there is one pair of nodes, (RQ, RP, 0) (level i = 1). For each value of the query node RQ that is equal to a value of the profile node RP or belongs to a lower hierarchy level, we create a new pair of nodes (RQ → child, RP → child, distance). These nodes refer to the next level (i + 1). After having checked all values of all pairs at a specific level, we examine the pairs of nodes created for the immediately next level and so on. At level n + 1, we retrieve from the profile tree, for each query state, the score set of the path with the most similar state to the query one. Observe that the QT Resolution Algorithm tests for all query states within a single pass of the profile tree.

For example, for the above query context states, csv and csu, the root node of the query tree contains the value friends and points to a node with two cells that contain the values good and All. In turn, good and All point to nodes with the values summer holidays and holidays, respectively. Beginning from the pair of root nodes of this query tree and the profile tree of Fig. 4, both containing the value friends, we create a new pair of nodes; both nodes contain the values good and All. At the next step, we create two new pairs of nodes; the first pair has nodes containing the value summer holidays, while the second one has nodes containing the value holidays. Following the relative pointers of the profile tree, we retrieve the score sets Wcs1 and Wcs3.
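The lockstep descent of the query tree and the profile tree can be sketched as follows. Both trees are nested dicts with one root-to-leaf path per state, the ancestor table is hypothetical, and the distance bookkeeping of the full algorithm is omitted; the sketch only collects the candidate score sets:

```python
# Sketch of the QT Resolution idea: walk the query tree and the profile
# tree in lockstep, following profile values that are equal to or more
# general than the query value, and collect the score sets at the leaves.
ANC = {  # value -> {itself and all of its ancestors}; hypothetical
    "friends": {"friends", "All"}, "good": {"good", "All"},
    "summer holidays": {"summer holidays", "holidays", "All"},
    "holidays": {"holidays", "All"}, "All": {"All"},
}

def build(paths):
    # One root-to-leaf path per context state; leaves hold score sets.
    root = {}
    for state, w in paths:
        node = root
        for value in state[:-1]:
            node = node.setdefault(value, {})
        node[state[-1]] = w
    return root

profile = build([(("friends", "good", "summer holidays"), "Wcs1"),
                 (("friends", "All", "holidays"), "Wcs3")])
query = build([(("friends", "good", "summer holidays"), None),
               (("friends", "All", "holidays"), None)])

def resolve(q_node, p_node, depth, n, out):
    for qv, q_child in q_node.items():
        for pv, p_child in p_node.items():
            if pv == qv or pv in ANC[qv]:   # equal or more general
                if depth == n - 1:
                    out.add(p_child)        # p_child is a score set
                else:
                    resolve(q_child, p_child, depth + 1, n, out)

found = set()
resolve(query, profile, 0, 3, found)
print(sorted(found))  # → ['Wcs1', 'Wcs3']
```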

Algorithm 5 QT Resolution Algorithm
Input: The profile tree with root node RP and n context parameters, the query tree with root node RQ, the current distance of each candidate path.
Output: A ResultSet of the form (W, distance) characterizing paths whose context states are the same as or similar to the states of the query tree.
Variables: SN, SN′: sets of pairs of nodes, each pair related with a distance value.
Initially: SN = {(RQ, RP, 0)}, SN′ = ∅
Begin
  for level i = 1 to n do
    for each pair sn ∈ SN with sn = (q_node, p_node, distance) do
      ∀x ∈ q_node, ∀y ∈ p_node
        if x = y or y = anc_{Lx}^{Ly}(x) then
          if i < n then
            SN′ = SN′ ∪ {(x → child, y → child, distance + dist(x, y))};
          else if i = n then
            W = y → child;
            ResultSet = ResultSet ∪ {(W, distance)};
          end if
        end if
    end for
    SN = SN′; SN′ = ∅;
  end for
End
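As a concrete illustration, the single-pass traversal of Algorithm 5 can be sketched in Python over nested-dictionary trees. This is a simplified sketch under our own representation: the function name, the dict-based trees and the `ancestors` map are illustrative assumptions, not the paper's cell-based implementation, and the sketch returns all candidate (score set, distance) pairs rather than only the closest one.

```python
def qt_resolution(query_tree, profile_tree, ancestors, n_levels):
    """One-pass matching of all query context states against a profile tree.

    Both trees are nested dicts with one level per context parameter; the
    profile's leaf level maps the last context value to a score set.
    ancestors[x] maps each value more general than x to its hierarchy
    distance from x. Returns a list of (score_set, distance) candidates.
    """
    results = []
    # Each frontier entry pairs a query subtree with a profile subtree,
    # together with the distance accumulated so far.
    frontier = [(query_tree, profile_tree, 0)]
    for level in range(1, n_levels + 1):
        next_frontier = []
        for q_node, p_node, dist in frontier:
            for x, q_child in q_node.items():
                for y, p_child in p_node.items():
                    # y matches x if it is equal to x or more general than x
                    if x == y:
                        d = 0
                    elif y in ancestors.get(x, {}):
                        d = ancestors[x][y]  # hierarchy distance dist(x, y)
                    else:
                        continue
                    if level < n_levels:
                        next_frontier.append((q_child, p_child, dist + d))
                    else:
                        # p_child is the score set stored at the leaf
                        results.append((p_child, dist + d))
        frontier = next_frontier
    return results
```

On the running example (profile paths (friends, good, summer holidays) and (friends, All, holidays)), the query state (friends, good, summer holidays) yields Wcs1 at distance 0 and Wcs3 at the accumulated hierarchy distance, mirroring the behavior described above.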

Theorem 5 (Correctness). Given a profile tree and a query tree representing a contextual query Q, the QT Resolution Algorithm identifies a context resolution set.

Proof. The QT Resolution Algorithm identifies, in a single pass of the profile tree, the context states that are the most similar to the query context states represented by the query tree. The search process is done in rounds, where each round corresponds to a context parameter. At each round, the algorithm locates all context values appearing in the profile tree that are the same as or more general than the query context values for a context parameter and, at the same time, appear in context states that also contain values that are the same as or more general than the query context values for the already examined context parameters. From Properties 1 and 2, the identified states are the ones that tightly cover the query context states. □

5. Usability Evaluation

The goal of our usability study is to justify the use of contextual preferences. In particular, the objective is to show that, for a reasonable effort of specifying contextual preferences, users get more satisfying results than when there are no preferences or when preferences are non-contextual.

We used two databases of different sizes: (a) a points of interest database and (b) a movie database. The points of interest database consists of nearly 1 000 real points of interest in the two largest cities in Greece, namely Athens and Thessaloniki. The context parameters are accompanying people, time and location. For the movie database, our data comes from the Stanford Movie Database [16], with information about 12 000 films. The context parameters are accompanying people, mood and time period. The sizes of the datasets have two important implications for usability. First, they affect the size of the profile, i.e., the number of preferences. Second, and most importantly, they require different methods for evaluating the quality of results. While for the small dataset we can ask users to manually provide the best results, for the large dataset we need to use other metrics [17].

We conducted an empirical evaluation of our approach with 20 users; 10 different users were used for each of the two datasets. For all users, it was the first time that they used the system. For each database, to ease the specification of contextual and non-contextual preferences, we created a number of default profiles based on three human characteristics: (a) age (below 30, between 30 and 50, above 50), (b) sex (male or female) and (c) taste (broadly categorized as mainstream or out-of-the-beaten-track). We created both contextual and non-contextual profiles. Based on the above characteristics, one of the 12 available non-contextual profiles and one of the 12 available contextual profiles were assigned to each user. Users were allowed to modify the default profiles assigned to them by adding, deleting or updating preferences. We evaluated our approach along two lines: ease of profile specification and quality of results.

5.1. Profile Specification

With regards to profile specification, we count the number of modifications (insertions, deletions, updates) of preferences of the default profile that was originally assigned to the users, for both the non-contextual and contextual profiles. In addition, we report how long (in minutes) it takes users to specify/modify their profile (again, for both cases). Since for all users this was their first experience with the system, the reported time includes the time it took the user to get accustomed to the system. The results are reported in Table 1 for points of interest and Table 2 for movies. For the points of interest database, each default non-contextual profile has about 100 preferences, while each default contextual one has nearly 650; for the movie database, the sizes are 120 and 1 100, respectively. The general impression is that predefined profiles save time in specifying user preferences. Furthermore, having default profiles makes it easier for someone to understand the main idea behind the system, since the preferences in the profile act as examples. With regards to time, there was deviation in the time users spent on specifying profiles: some users were more meticulous than others, spending more time in adjusting the profiles assigned to them. As expected, the specification of contextual profiles is more time-consuming than the specification of non-contextual ones, since such profiles have a larger number of preferences. The size of the database also affects the complexity of profile specification, basically by increasing the number of preferences and, thus, the required modifications.

5.2. Quality of Results

We compare the results of contextual queries, i.e., queries enhanced with a context descriptor, when executing them: (i) without using any of the preferences, (ii) using the non-contextual preferences and (iii) using the contextual preferences. In the case of contextual preferences, we also consider using (iii-a) only exact matching context states, (iii-b) only one, the most similar, context state (top-1) and (iii-c) the three most similar context states (top-3). For similarity, we use the hierarchy distance function and, to resolve ties, the Jaccard distance.

Since the points of interest database has a small number of tuples, we asked the users to rank the results of each contextual query manually. Then, we compared the ranking specified by the users with what was recommended by the system. We ran this experiment for all cases (i), (ii), (iii-a), (iii-b), (iii-c) mentioned above. For each case, we consider the best 20 results, i.e., the 20 points of interest that were ranked highest. When there are ties in the ranking, we consider all results with the same score. We report the percentage of the returned results that belong to the results given by the user. As shown in Table 3, this percentage is generally high. Sometimes users do not conform even to their own preferences, as shown by the results for exact match queries. In this case, although the context state of the preferences used was an exact match of the context state in the query, some users still ranked their results differently than the system. In such cases, traceability helps a lot, since users can track back which preferences were used to attain the results and either modify the preferences or reconsider their ranking. Note that users that customized their profile by making more modifications (e.g., User 6) got more satisfactory results than those that spent less time during profile specification (e.g., User 3). Furthermore, for non exact match queries, using the most similar state (top-1) to answer a query provides only slightly better results than using the top-3 most similar states.

For the movie database, due to the large number of tuples, it was not possible for the users to manually rank all of them. Instead, users were asked to evaluate the quality of the 20 highest ranked movies in the result. For characterizing the quality of the results, users marked each of the 20 movies with 1 or 0, indicating whether they considered that the movie should belong to the best 20 ones or not, respectively. The number of 1s corresponds to the precision of the top-20 movies, or precision(20), i.e., how many of the 20 movies are relevant. Furthermore, users were asked to give a specific numerical interest score between 1 and 10 to each of the 20 movies. This score is in the range [1, 5] if the previous indicator is 0 and in the range [6, 10] if the indicator is 1. We report the number of movies that were rated high (interest score ≥ 7). Finally, users were asked to provide an overall score in the range [1, 10] to indicate their degree of satisfaction with the overall result set. Table 4 depicts the detailed per-user scores attained for the movie database, while Table 5 summarizes these quality measures. Our results again show that using contextual preferences improves quality considerably. Exact match queries provide the best results, while non exact match ones report only slightly worse results. That is, if there are no preferences with context equal to the query, other preferences with similar context can be used. When compared to the points of interest database, precision is lower. One reason for this is the following. In the movie example, the users were not aware of the whole dataset; they were just presented with the 20 movies in the result. Thus, they “left room” in their choices for better results that could be lying in the part of the dataset that was not presented to them.
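The per-user quality measures reduce to simple counts; here is a minimal sketch (the function names are ours, not the paper's):

```python
def precision_at_k(relevance_marks):
    """Fraction of the k returned items marked relevant (1) by the user,
    i.e., precision(k) for the 0/1 marks described above."""
    return sum(relevance_marks) / len(relevance_marks)

def num_high_scores(interest_scores, threshold=7):
    """Number of items whose 1-10 interest score meets the threshold."""
    return sum(1 for s in interest_scores if s >= threshold)
```

For example, 16 out of 20 movies marked relevant gives precision(20) = 0.8.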

6. Performance Evaluation of Preference Selection

In this section, we evaluate the performance of preference selection using the proposed data structures, namely the preference graph, the profile tree and their enhancements. We use both real and synthetic profiles. As real profiles, we use the ones specified by the users in our usability study for the movie and the points of interest databases. For the synthetic profiles, we consider three context parameters having domains with different cardinalities, namely, a small domain with 10 values, a medium domain with 100 values and a large domain with 1 000 values. Context parameters have hierarchies with four levels, where 75% of the context values are considered to be values at the most detailed level of the hierarchies and the remaining 25% are assigned to the other three levels. The preference graph and the profile tree take advantage of co-occurrences of context states and context values, respectively; thus, their size depends on the distribution of context values in the context states appearing in

Table 1: Points-of-Interest Example: Profile Specification

                         User1  User2  User3  User4  User5  User6  User7  User8  User9  User10
Non-Contextual Profile
  Num of updates           15     14      8     11     15     16     19     12     10     10
  Update time (mins)       13      8      5      6      6      9     16      7      9      8
Contextual Profile
  Num of updates           22     31     12     28     24     32     38     13     18     25
  Update time (mins)       30     45     20     30     30     40     45     15     20     25

Table 2: Movie Example: Profile Specification

                         User1  User2  User3  User4  User5  User6  User7  User8  User9  User10
Non-Contextual Profile
  Num of updates           37     14     29     10     22     31     29     15     17     12
  Update time (mins)       18      7     14      6      9     19     16      6      8      8
Contextual Profile
  Num of updates           67     49     52     91     37     72     69     41     46     37
  Update time (mins)       32     28     28     55     17     44     36     22     27     46

Table 3: Points-of-Interest: Quality of Results

                               User1  User2  User3  User4  User5  User6  User7  User8  User9  User10
No Preferences                  10%     0%     0%     0%     0%    10%     5%     0%     0%     5%
Non-Contextual Preferences      10%    10%     0%     5%     5%    10%    15%     5%     5%     5%
Contextual Preferences
  Exact Match                  100%    90%    90%    95%    90%   100%   100%    85%   100%   100%
  Non Exact Match, Top-1       100%    95%    90%    85%    90%   100%   100%    85%    90%   100%
  Non Exact Match, Top-3        95%    90%    85%    95%    95%    90%   100%    75%    85%    95%

Table 4: Movie Example: Quality of Results per User
(each cell gives: Num of 1s / Num of scores ≥ 7 / Overall Score; the last three columns use contextual preferences)

           No Preferences  Non-Contextual  Exact Match   Top-1 state   Top-3 states
User 1     5 / 2 / 3       4 / 0 / 2       15 / 14 / 8   15 / 10 / 6   13 / 9 / 6
User 2     4 / 0 / 3       6 / 1 / 3       17 / 14 / 9   17 / 14 / 9   15 / 13 / 8
User 3     7 / 2 / 3       8 / 5 / 4       15 / 15 / 8   13 / 12 / 7   12 / 11 / 7
User 4     7 / 3 / 2       12 / 4 / 6      13 / 12 / 8   11 / 9 / 7    11 / 10 / 7
User 5     4 / 3 / 4       12 / 6 / 6      17 / 13 / 7   13 / 8 / 6    13 / 9 / 6
User 6     6 / 2 / 3       12 / 7 / 6      14 / 14 / 8   13 / 12 / 8   12 / 8 / 6
User 7     4 / 2 / 3       7 / 4 / 4       17 / 15 / 8   14 / 13 / 8   14 / 11 / 7
User 8     7 / 4 / 3       6 / 3 / 3       17 / 15 / 8   17 / 12 / 8   17 / 11 / 8
User 9     3 / 3 / 2       8 / 5 / 4       18 / 17 / 9   17 / 15 / 8   17 / 14 / 8
User 10    6 / 1 / 1       5 / 3 / 2       17 / 14 / 9   14 / 11 / 7   14 / 10 / 7

Table 5: Movie Example: Overall Quality of Results

                              Precision(20)   Score ≥ 7   Average Overall Score
No Preferences                    26.5%          11%              2.7
Non-Contextual Preferences        40%            19%              4
Contextual Preferences
  Exact Match                     80%            71.5%            8.2
  Non Exact Match, Top-1          72%            58%              7.4
  Non Exact Match, Top-3          69%            53%              7

Table 6: Input Parameters for Synthetic Profiles

Parameter                                      Value
Number of contextual preferences               500 - 10 000
Number of context parameters                   3
Data distributions                             Uniform, zipf
Cardinality of context domains                 10, 100, 1 000
Hierarchy levels                               4
Percentage of values at the detailed level     75%
Percentage of values at the other levels       25%

Figure 5: Size using the real profiles. (Bars report the number of cells for the movie and the points of interest profiles under no index, the preference graph, and each parameter-to-level mapping of the profile tree.)

the profile. To create context states, we consider both a uniform distribution of context values and a zipf data distribution with a = 1. Table 6 summarizes the parameters used for creating the synthetic profiles. We report results regarding (a) the storage efficiency and (b) the complexity of preference selection using the proposed data structures. The results are averaged over 50 executions.
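The synthetic context states can be generated along the following lines; this is a sketch under our own naming, using rank-based zipf weights proportional to 1/rank^a (a = 0 degenerates to the uniform distribution).

```python
import random

def make_states(num_states, domain_sizes, a=1.0, seed=0):
    """Draw synthetic context states, one value per context parameter.

    domain_sizes gives the cardinality of each parameter's domain; value
    ranks are sampled with probability proportional to 1 / rank**a.
    """
    rng = random.Random(seed)  # fixed seed for repeatable profiles
    weights = [[1.0 / (rank ** a) for rank in range(1, size + 1)]
               for size in domain_sizes]
    states = []
    for _ in range(num_states):
        state = tuple(rng.choices(range(size), weights=w)[0]
                      for size, w in zip(domain_sizes, weights))
        states.append(state)
    return states
```

For instance, make_states(5000, [10, 100, 1000], a=1.0) produces states over the small, medium and large domains of Table 6.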

6.1. Storage

In this set of experiments, we evaluate the space (in number of cells) required to store context states using the profile tree and the preference graph as opposed to storing them sequentially (no index). For the profile tree, this depends on the mapping of context parameters to the levels of the tree. Thus, we created profile trees for all possible mappings of context parameters to levels of the trees.

Real data. For the movie database, let A stand for accompanying people, M for mood and T for time period. We denote by (A, M, T) the mapping in which A is assigned to the first level of the tree, M to the second and T to the third one. Similarly, we use (A, T, M), (M, A, T), (M, T, A), (T, A, M) and (T, M, A) for the remaining mappings of parameters to tree levels. Accordingly, for the points of interest database, let A stand for accompanying people, T for time and L for location. Then, (A, T, L), (A, L, T), (T, A, L), (T, L, A), (L, A, T) and (L, T, A) denote the different mappings of parameters to tree levels. As shown in Fig. 5, the preference graph requires 82% and 34% less space than storing preferences sequentially for the movie and the points of interest database respectively, since in the preference graph each distinct context state is stored only once. For the profile tree, the mappings that result in trees with the smallest sizes are, as expected, the ones that map the context parameters with large domains to levels lower in the tree, namely, mappings (A, M, T) and (M, A, T) for the movie database and mappings (A, T, L) and (T, A, L) for the points of interest database. All trees occupy less space than storing preferences sequentially. The smallest profile tree representation occupies about 86% and 60% less space than a sequential representation for the movie and the points of interest database, respectively.

Synthetic data. We created profile trees for all six different mappings of parameters to tree levels. Let S stand for the small, M for the medium and L for the large domain. As before, we use (S, M, L), (S, L, M), (M, S, L), (M, L, S), (L, S, M) and (L, M, S) to denote these different mappings. Mapping the parameters with large domains lower in the tree results in smaller trees (Fig. 6a, 6b). For the zipf distribution (Fig. 6b), the total number of cells is smaller than that for the uniform distribution (Fig. 6a), because “hot” values appear more frequently in preferences, i.e., more preferences have overlapping context states. For the preference graph, when context values are selected uniformly (Fig. 6a), the size of the graph is almost equal to the size of sequential storage, since most of the produced context states are distinct. For the zipf data distribution, the graph occupies 36% less space than storing context states sequentially. Finally, the profile tree requires nearly 25% and 37% less space than the preference graph for the uniform and the zipf data distributions, respectively. As a final note, observe that the efficacy of the mapping of parameters to levels depends not on the size of their domain per se, but on the distribution of their values in the actual profile. That is, even if a parameter has a large domain, if only a small number of its values appear in a given profile, this parameter should be mapped to the upper levels of the tree. To demonstrate this, we created such cases as follows. We use a profile with 5000 context states and context parameters with domains

Figure 6: Size using synthetic profiles with (a) uniform, (b) zipf with a=1 and (c) combined data distribution. (Axes: number of cells, in thousands, versus the number of context states in the profile for (a) and (b), and versus the zipf parameter a for (c); curves correspond to the profile tree mappings, the preference graph and no index.)

with 50 (S), 100 (M) and 200 (L) values, where the values of the parameters with domains of 50 and 100 values are selected using a uniform data distribution and the values of the parameter with 200 values using a zipf data distribution, with the parameter a varying from 0 (corresponding to the uniform distribution) to 3.5 (corresponding to a very high skew). As above, we use (S, M, L), (S, L, M) and (L, S, M) to denote different mappings of parameters to tree levels. As shown in Fig. 6c, to decide on an appropriate mapping, an analysis of the actual distribution of values in the profile should be made, so that the mapping is based on this distribution rather than on domain sizes alone.
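The effect of the parameter-to-level mapping can be illustrated with a crude cell count that charges one cell per distinct value-path prefix; this simplifies the profile tree's actual cell layout, and the names (tree_cells, best_order) are ours.

```python
from itertools import permutations

def tree_cells(states, order):
    """Approximate profile tree size for `states` when context parameters
    are mapped to levels in the given order: one cell per distinct prefix."""
    cells = 0
    for level in range(1, len(order) + 1):
        prefixes = {tuple(s[i] for i in order[:level]) for s in states}
        cells += len(prefixes)
    return cells

def best_order(states, n_params):
    """Pick the mapping with the fewest cells for this particular profile."""
    return min(permutations(range(n_params)),
               key=lambda order: tree_cells(states, order))
```

In line with the observation above, a parameter whose values barely vary in a given profile ends up at the top of the best mapping, regardless of its nominal domain size.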

6.2. Preference Selection

In this set of experiments, we study the complexity of preference selection (in terms of cell accesses). Since this complexity depends on whether there is an exact match or not, we study the two cases (i.e., exact and non exact match) separately. In the case of sequential scan, for exact matches, the profile is scanned until the matching state is found, while for non exact matches, the whole profile is scanned. With the profile tree, exact match queries are resolved by a simple root-to-leaf traversal, while non exact matches need to consider multiple candidate paths. For the profile tree and the synthetic profiles, we used the (S, M, L) mapping, while for the real profiles, we used the (A, M, T) and (A, T, L) mappings for the movie and points of interest databases, respectively.

We consider first matching a single query context state. The results for the real profiles are depicted in Fig. 7a, 7b, where s denotes sequential scan, pr the profile tree, enh the enhanced profile tree and g the preference graph, while the results for the synthetic profiles are shown in Fig. 8a, 8b. For non exact match queries, the enhanced tree reduces the number of cell accesses by about 35% with regard to the profile tree. When the preference graph is used, the number of cell accesses is nearly the same for exact and non exact matches. Although the preference graph requires more cell accesses than the profile tree, it in turn requires nearly 70% fewer accesses than sequential scan. In Fig. 8c, we count the cell accesses for exact matches when the preference graph is used and the top-down, bottom-up or heuristic approach is followed. The bottom-up approach needs more cell accesses than the other two approaches, since more nodes exist lower in the graph and so, more cell visits are required. The heuristic approach reduces the number of cell accesses by about 20% compared to the top-down approach.

Next, we compare the performance of searching for matching states for each state of the query one at a time versus matching all query states in a single pass using the QT Resolution and the QG Resolution algorithms. Fig. 9 depicts our results for exact (Fig. 9a) and non exact (Fig. 9b) match queries using the synthetic profile and query descriptors with 20 and 50 states. When the profile tree (p) is employed, for exact matches, searching for all states in one pass results in savings of around 55% on average, while for non exact matches, we save around 25% on average. The preference graph (g) results in similar numbers of cell accesses for exact and non exact matches. Comparing the two data structures, for non exact matches, the profile tree needs 40% fewer cell accesses than the preference graph, while for exact matches, the results are even better. The results for the real profiles are shown in Fig. 10. We ran this experiment for exact match queries (Fig. 10a) and for non exact ones (Fig. 10b), with query descriptors consisting of 20 context states. Using Algorithms 5 and 4, we save on average 35%.
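When the profile tree is represented as nested dictionaries (an illustrative representation, not the paper's cell-based one), the exact-match case amounts to a single chain of lookups, one per context parameter:

```python
def exact_match(profile_tree, state):
    """Root-to-leaf traversal for an exact-match query context state.
    Returns the stored score set, or None if no path matches exactly."""
    node = profile_tree
    for value in state:  # one dictionary lookup per context parameter
        if value not in node:
            return None
        node = node[value]
    return node  # the score set stored at the leaf
```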

7. Related Work

In this paper, we use context to confine database querying by selecting as results the best matching tuples based on user preferences that depend on context. We review first research on preferences, then on context and finally, on contextual preferences.

The research literature on preferences is extensive. In particular, in the context of database queries, there are two different approaches for expressing preferences: a quantitative and a qualitative one. In the quantitative approach (e.g., [3, 4, 18]), preferences are expressed indirectly by using scoring functions that associate a numeric score with every tuple of the query answer. In this work, we have adapted the general quantitative framework of [3], since it is easier for users to employ than the qualitative one. In the quantitative framework of [5, 14], user preferences are stored as degrees of interest in atomic query elements (such as individual selection or join conditions) instead of

Figure 7: Cell accesses to find related preferences to queries using sequential scan, the profile tree, the enhanced profile tree and the preference graph for the real profiles in (a) exact match and (b) non exact match.

Figure 8: Cell accesses to find related preferences to queries using sequential scan, the profile tree, the enhanced profile tree and the preference graph for the synthetic profiles in (a) exact match and (b) non exact match, and (c) for the top-down, bottom-up and the heuristic approach when the preference graph is used.

Figure 9: Cell accesses to find preferences related to queries using the query tree and the query graph for synthetic profiles for (a) exact match and (b) non exact match.

Figure 10: Cell accesses to find preferences related to queries using the query tree and the query graph for real profiles for (a) exact match and (b) non exact match.


interests in specific attribute values. Our approach can be generalized to this framework as well, by making the degree of interest for each atomic query element depend on context. In the qualitative approach (for example, [1, 2, 19]), the preferences between tuples in the answer to a query are specified directly, typically using binary preference relations. This framework can also be readily extended to include context.

There has been much work on developing a variety of context infrastructures and context-aware middleware and applications (such as the Context Toolkit [20] and the Dartmouth Solar System [21]). However, although there is much research on location-aware query processing in the area of spatio-temporal databases, integrating other forms of context in query processing is less explored. In the context-aware query processing framework of [22], there is no notion of preferences; instead, context parameters are treated as normal attributes of relations. Recently, context has been used in information filtering to define context-aware filters, i.e., filters that have attributes whose values change frequently [23]. Storing context data using data cubes, called context cubes, is proposed in [24] for developing context-aware applications that use archive sensor data. In this work, data cubes are used to store historical context data and to extract interesting knowledge from large collections of context data. The Context Relational Model (CR), introduced in [25], is an extended relational model that allows attributes to exist under some contexts or to have different values under different contexts. CR treats context as a first-class citizen at the level of data models, whereas in our approach, we use the traditional relational model to capture context as well as context-dependent preferences.
Context as a set of dimensions (e.g., context parameters) is also considered in [26], where the problem of representing context-dependent semistructured data is studied, while in [27], an overview of a Multidimensional Query Language is given that may be used to express context-driven queries. A context model is also deployed in [28] for enhancing web service discovery with contextual parameters. In [29], the current contextual state of a system is represented as a multidimensional subspace within or near other situation subspaces. Extending typical recommendation systems beyond the two dimensions of users and items to include further contextual information is studied in [30]. Contextual information is modeled using a number of parameters with hierarchical structure [31, 30]. Using context is shown to improve the prediction of customer behavior. There has been some recent work on contextual preferences. In [32], the authors consider ranking database results based on contextual qualitative preferences. Context parameters are part of the database schema, while in our approach, context parameters are considered to be outside the database. Furthermore, our context parameters have a hierarchical nature that we exploit in context resolution. A knowledge-based context-aware query preference model is proposed in [33], where context parameters are treated as normal attributes of relations. Contextual preferences, called situated preferences, are also discussed in [34]. In this approach, a context state is represented as a situation. Situations are uniquely linked through an N:M relationship with

preferences expressed using the qualitative approach. Again, the context model is not hierarchical. Finally, note that a preliminary abridged version of this paper appears in [35]. The preference graph, the computation of scores, multi-state resolution, various other enhancements and most of the experiments are new here. In other previous work [36], we have addressed the same problem of expressing contextual preferences. However, the model used there for defining preferences includes only a single context parameter. Interest scores of preferences involving more than one context parameter are computed by a simple weighted sum of the preferences expressed over single context parameters. Here, we extend context descriptors so that contextual preferences involve more than one context parameter and, in addition, associate context with queries. Context resolution is also new here. In [37], we focus on the efficient execution of contextual queries. In particular, we are interested in creating groups of similar preferences for which we pre-compute rankings of database tuples.

8. Summary

The focus of this paper is on annotating database preferences with contextual information. Context is modeled using a set of multidimensional context parameters. Context parameters take values from hierarchical domains, thus allowing different levels of abstraction for the captured context data. A context state corresponds to an assignment of values to each of the context parameters from its corresponding domain. Database preferences are augmented with context descriptors that specify the context states under which a preference holds. Similarly, each query is associated with a set of context states. The problem is to identify those preferences whose context states, as specified by their context descriptors, are the most similar to those of a given query. We call this problem context resolution. To realize context resolution, we propose two data structures, namely the preference graph and the profile tree, that allow for a compact representation of the context-dependent preferences.

To evaluate the usefulness of our model, we have performed two usability studies. Our studies showed that annotating preferences with context improves the quality of the retrieved results considerably. The burden of having to specify contextual preferences is reasonable and can be reduced by providing users with default preferences that they can edit. We have also performed a set of experiments to evaluate the performance of context resolution using both real and synthetic datasets. The proposed data structures were shown to improve both the storage and the processing overheads. In general, the profile tree is more space-efficient than the preference graph. It also clearly outperforms the preference graph in the case of exact matches. The main advantage of the preference graph is the possibility of an incremental refinement of a context state. In particular, at each step of the resolution algorithm, we get a state that is closer to the query one. This is not possible with the profile tree.

References

[1] J. Chomicki, Preference formulas in relational queries, ACM Trans. Database Syst. 28 (4) (2003) 427–466, ISSN 0362-5915.
[2] W. Kießling, Foundations of Preferences in Database Systems, in: VLDB, 311–322, 2002.
[3] R. Agrawal, E. L. Wimmers, A framework for expressing and combining preferences, SIGMOD Rec. 29 (2) (2000) 297–306, ISSN 0163-5808.
[4] V. Hristidis, N. Koudas, Y. Papakonstantinou, PREFER: A System for the Efficient Execution of Multi-parametric Ranked Queries, in: SIGMOD, 259–270, 2001.
[5] G. Koutrika, Y. Ioannidis, Personalized Queries under a Generalized Preference Model, in: ICDE, 841–852, 2005.
[6] A. K. Dey, Understanding and Using Context, Personal Ubiquitous Comput. 5 (1) (2001) 4–7, ISSN 1617-4909.
[7] M. Bazire, P. Brézillon, Understanding Context Before Using It, in: CONTEXT, 29–40, 2005.
[8] G. Chen, D. Kotz, A Survey of Context-Aware Mobile Computing Research, Tech. Rep. TR2000-381, Dartmouth College, Computer Science, URL ftp://ftp.cs.dartmouth.edu/TR/TR2000-381.ps.Z, 2000.
[9] C. Bolchini, C. Curino, E. Quintarelli, F. A. Schreiber, L. Tanca, A data-oriented survey of context models, SIGMOD Rec. 36 (4) (2007) 19–26.
[10] B. Mobasher, R. Cooley, J. Srivastava, Automatic personalization based on Web usage mining, Commun. ACM 43 (8) (2000) 142–151.
[11] M. Ester, J. Kohlhammer, H.-P. Kriegel, The DC-Tree: A Fully Dynamic Index Structure for Data Warehouses, in: ICDE, 379–388, 2000.
[12] P. Vassiliadis, S. Skiadopoulos, Modelling and Optimisation Issues for Multidimensional Databases, in: CAiSE, 482–497, 2000.
[13] G. A. Miller, WordNet: a lexical database for English, Commun. ACM 38 (11) (1995) 39–41, ISSN 0001-0782.
[14] G. Koutrika, Y. E. Ioannidis, Constrained Optimalities in Query Personalization, in: SIGMOD, 73–84, 2005.
[15] W. Kießling, G. Köstler, Preference SQL - Design, Implementation, Experiences, in: VLDB, 990–1001, 2002.
[16] Stanford Movie Database, URL http://kdd.ics.uci.edu/databases/movies/movies.html.
[17] C. Buckley, E. M. Voorhees, Retrieval evaluation with incomplete information, in: SIGIR, 25–32, 2004.
[18] C. Li, K. C.-C. Chang, I. F. Ilyas, S. Song, RankSQL: Query Algebra and Optimization for Relational Top-k Queries, in: SIGMOD, 131–142, 2005.
[19] P. Georgiadis, I. Kapantaidakis, V. Christophides, E. M. Nguer, N. Spyratos, Efficient Rewriting Algorithms for Preference Queries, in: ICDE, 1101–1110, 2008.
[20] D. Salber, A. K. Dey, G. D. Abowd, The Context Toolkit: Aiding the Development of Context-Enabled Applications, in: CHI, 434–441, 1999.
[21] G. Chen, M. Li, D. Kotz, Design and implementation of a large-scale context fusion network, in: MobiQuitous, 246–255, 2004.
[22] L. Feng, P. M. G. Apers, W. Jonker, Towards Context-Aware Data Management for Ambient Intelligence, in: DEXA, 422–431, 2004.
[23] J.-P. Dittrich, P. M. Fischer, D. Kossmann, AGILE: adaptive indexing for context-aware information filters, in: SIGMOD, 215–226, 2005.
[24] L. D. Harvel, L. Liu, G. D. Abowd, Y.-X. Lim, C. Scheibe, C. Chatham, Context Cube: Flexible and Effective Manipulation of Sensed Context Data, in: Pervasive, 51–68, 2004.
[25] Y. Roussos, Y. Stavrakas, V. Pavlaki, Towards a Context-Aware Relational Model, in: CRR, 5–8, 2005.
[26] Y. Stavrakas, M. Gergatsoulis, Multidimensional Semistructured Data: Representing Context-Dependent Information on the Web, in: CAiSE, 183–199, 2002.
[27] Y. Stavrakas, K. Pristouris, A. Efandis, T. K. Sellis, Implementing a Query Language for Context-Dependent Semistructured Data, in: ADBIS, 173–188, 2004.
[28] C. Doulkeridis, M. Vazirgiannis, Querying and Updating a Context-Aware Service Directory in Mobile Environments, Web Intelligence (2004) 562–565.
[29] A. Padovitz, S. W. Loke, A. Zaslavsky, Towards a Theory of Context Spaces, PerCom 00 (2004) 38.
[30] G. Adomavicius, R. Sankaranarayanan, S. Sen, A. Tuzhilin, Incorporating contextual information in recommender systems using a multidimensional approach, ACM Trans. Inf. Syst. 23 (1) (2005) 103–145.


[31] C. Palmisano, A. Tuzhilin, M. Gorgoglione, Using Context to Improve Predictive Modeling of Customers in Personalization Applications, IEEE Trans. Knowl. Data Eng. 20 (11) (2008) 1535–1549.
[32] R. Agrawal, R. Rantzau, E. Terzi, Context-sensitive ranking, in: SIGMOD, 383–394, 2006.
[33] A. H. van Bunningen, L. Feng, P. M. G. Apers, A Context-Aware Preference Model for Database Querying in an Ambient Intelligent Environment, in: DEXA, 33–43, 2006.
[34] S. Holland, W. Kießling, Situated Preferences and Preference Repositories for Personalized Database Applications, in: ER, 511–523, 2004.
[35] K. Stefanidis, E. Pitoura, P. Vassiliadis, Adding Context to Preferences, in: ICDE, 846–855, 2007.
[36] K. Stefanidis, E. Pitoura, P. Vassiliadis, A Context-Aware Preference Database System, International Journal of Pervasive Computing and Communications 3 (4) (2007) 439–460.
[37] K. Stefanidis, E. Pitoura, Fast contextual preference scoring of database tuples, in: EDBT, 344–355, 2008.
