Activity pattern similarity: a multidimensional sequence alignment method

July 8, 2017 | Autor: Theo Arentze | Categoría: Applied Mathematics, Sequence alignment, Activity Analysis, Euclidean Distance, Information Embedding

Share Embed

Laporkan tautan ini

Descripción

Transportation Research Part B 36 (2002) 385–403 www.elsevier.com/locate/trb

Activity pattern similarity: a multidimensional sequence alignment method Chang-Hyeon Joh a, Theo Arentze a, Frank Hofman b, Harry Timmermans

c,*

a

Urban Planning Group, Eindhoven University of Technology, P.O. Box 513, Mail Station 20, 5600 MB, Eindhoven, Netherlands Ministry of Transport, Public Works and Water Management, P.O. Box 1031, 3000 BA, Rotterdam, Netherlands c Urban Planning Group, Eindhoven University of Technology, Den Dolech 2, P.O. Box 513, Mail Station 20, 5600 MB, Eindhoven, Netherlands

b

Received 20 May 2000; received in revised form 9 January 2001; accepted 28 February 2001

Abstract The classiﬁcation of activity patterns is an important research topic in activity analysis. First, it constitutes the basis for analyzing activity patterns, for instance by correlating the derived classiﬁcation with spatial and/or socio-economic variables. Secondly, the underlying mechanisms can be used to assess the degree of correspondence between observed activity patterns and activity patterns predicted by some activity-based model of transport demand. Traditionally, conventional Euclidean distance measures have been used for the comparison of activity patterns. Consequently, the sequence information embedded in activity patterns has not been explicitly considered when comparing activity patterns. More recently, sequence alignment methods have been proposed. Although these methods have some advantages, they are uni-dimensional and hence cannot incorporate the interdependencies between attributes. This paper therefore proposes a multidimensional sequence alignment method to measure diﬀerences in both sequential and interdependency information embedded in activity patterns. Ó 2002 Elsevier Science Ltd. All rights reserved.

1. Introduction The measurement of similarity among activity patterns has been a main concern of activitytravel analysis. Similarities are typically measured on such crucial attributes as activity type, activity location, start and end time, duration, transport mode, etc. This line of research serves

*

Corresponding author. Tel.: +31-40-247-3315; fax: +31-40-247-5882. E-mail address: [email protected] (H. Timmermans).

0191-2615/02/$ - see front matter Ó 2002 Elsevier Science Ltd. All rights reserved. PII: S 0 1 9 1 - 2 6 1 5 ( 0 1 ) 0 0 0 0 9 - 1

386

C.-H. Joh et al. / Transportation Research Part B 36 (2002) 385–403

two main purposes. First, a fundamental assumption of activity-travel analysis is that particular characteristics of individuals and households and contextual characteristics are systematically related to their activity-travel patterns (Hanson, 1982; Pas, 1984; Golledge and Stimson, 1987; Hanson and Hanson, 1993; Strathman et al., 1994). To explore this assumed relationship, the similarity between observed activity-travel patterns has formed the basis for classifying individuals. Representative activity patterns were then typically correlated with individuals’ locational and/or sociodemographic characteristics. Examples include Hanson and Burnett (1981), Pas (1983), Koppelman and Pas (1985), and Cha et al. (1995). Another example is the work on spacetime prisms, which have been suggested to identify and explain interactions between individuals’ behavior and social-institutional settings (H€agerstrand, 1970; Lenntorp, 1978; Pred, 1981a,b; Thrift, 1983). Yet another example is the analysis of intrapersonal variability in activity-travel patterns across the days of the week (Hanson and Huﬀ, 1986, 1988; Pas and Koppelman, 1986; Pas, 1988). Secondly, similarity measures have been used to quantify the goodness-of-ﬁt in assessing how well observed activity-travel patterns are predicted by some activity-based model of travel demand. Examples include Kitamura et al. (1995) and Arentze et al. (1999, 2000). In general, the formal properties of the similarity measures used for these applications should capture the speciﬁc needs underlying the goals and objectives of the analysis. Many applications perhaps require some representation of nominal and interval information. For the sake of argument and to motivate the present study, it is critical to realize that in addition to nominal and interval information, sequential information and information about interdependencies in activitytravel patterns are also crucial in these applications. Sequential relationships between activities have been a primary concern in research on trip chains, activity sequencing, and sequential choice of activities and locations, acknowledging that consecutive activities likely aﬀect one another (Kitamura, 1984; Hatcher and Mahmassani, 1992; Arentze et al., 1993; Kitamura et al., 1996; Timmermans, 1996; McNally, 1997). Unfortunately, most similarity measures used in transportation science do not properly capture the sequential relationships embedded in activity-travel patterns, with one notable exception (Recker et al., 1985). For example, in their seminal and ground-breaking work, Koppelman and Pas (1985) developed a similarity measure, based on Gower’s index (Gower, 1971) to compare activity-travel patterns. This measure allows one to include both nominal and interval data in a comparison of activity-travel patterns, which is basically derived from the degree of similarity in activity pattern composition, such as the list of included activity types, timing of activities, etc. However, by comparing elements of corresponding positions between activity-travel patterns Gower’s index is not sensitive to any diﬀerences between activity-travel patterns of the same composition, that diﬀer only in the order or sequence of its elements. Koppelman and Pas (1985) realized this characteristic of their measure, and therefore suggested to consider a linear assignment programming technique to overcome the problem. Although this would be better, it still does not completely solve the problem of sequential information. The work on sequential analysis of activity-travel pattern recently gained new momentum, when Wilson (1998) introduced a Sequence Alignment Method (SAM) in time use and transportation research. The SAM is one of the various sequence comparison methods, originally introduced in disciplines such as molecular biology, chromatography and speech recognition. The interesting feature of sequence alignment is that it employs biological distance rather than geo-

C.-H. Joh et al. / Transportation Research Part B 36 (2002) 385–403

387

metric (Euclidean) distance as the basic concept of comparison. The smallest number of changes (mutations) required to equalize the sequential order of letters between two strings is assumed to be an indicator of the true (dis)similarity (Kruskal, 1983; States and Boguski, 1991). This very feature allows one to include sequential information in the comparison of activity-travel patterns. Unfortunately, however, the conventional SAM can handle only uni-dimensional strings. In principle, one could apply the sequence alignment method to every individual dimension and then calculate the simple sum of these uni-dimensional SAM’s, but such an approach will not capture any interdependencies between attributes that we know exist (G€arling et al., 1997). In this paper, we therefore propose a multidimensional sequence alignment method that allows one to include nominal and sequential information in the comparison of activity-travel patterns and that also takes interdependencies between the dimensions or facets underlying these patterns into account. The measure should provide a better basis for activity-travel analyses where sequential information and interdependency are of great importance. The method was developed as part of the ALBATROSS 1 project that seeks to develop a multiagent, rule-based model of transport demand. The remainder of the paper is organized into three sections. In Section 2, we deﬁne the problem of the multidimensional activity pattern comparison and introduce the proposed multidimensional extension of the sequence alignment method. An empirical analysis of actual activity-travel pattern data collected in The Netherlands, described in Section 3, will then provide evidence of interdependency between attributes and illustrate the importance of incorporating interdependency into the activity pattern comparison. The paper will be concluded with a discussion of the ﬁndings and avenues of future research.

2. Multidimensional sequence alignment 2.1. Problem deﬁnition Consider the problem of comparing two multidimensional activity patterns, called a source and target pattern, respectively, in terms of K qualitative attributes. These attributes may refer to activity type, location, transport mode, accompanying person, etc. Each activity pattern then consists of K attribute sequences, and each attribute sequence consists of a set of m, respectively n elements for source and target pattern. The activity patterns to be compared can also concern a day, week, month, year, or any other time horizon. The source and target patterns to be compared can then be represented by K m and K n matrices, respectively, as source pattern s ¼ s½s1 sk sK 0 with sk ¼ sk ½sk0 ski skm

1

ALBATROSS stands for A Learning-Based Activity and Transportation Research Oriented Simulation System.

388

C.-H. Joh et al. / Transportation Research Part B 36 (2002) 385–403

and target pattern g ¼ g½g 1 gk g K 0 with g k ¼ g k ½gk0 gkj gkn ; where sk and gk are the vectors of the kth attribute sequences of the source and target pattern, respectively; ski and gkj are the ith and jth element of the kth attribute sequence of the source and target pattern, respectively ði ¼ 0; . . . ; m; j ¼ 0; . . . ; n; k ¼ 1; . . . ; KÞ; ski 2 Ak , and gkj 2 Ak . Consequently, row vectors sk ¼ sk ½sk0 ski skm and gk ¼ g k ½gk0 gkj gkn represent the sequential order of elements of the kth attribute of the source and target pattern, respectively, while column vectors s i ¼ s i ½s1i ski sKi 0 and g j ¼ g j ½g1j gkj gKj 0 potentially represent the interdependency between the dimensions of the ith and jth activity of the source and target pattern, respectively. The problem of multidimensional activity-pattern comparison then is to ﬁnd the minimum amount of eﬀort required to change s ¼ s½s1 sK 0 into g ¼ g½g 1 gK 0 as a measure of similarity. 2.2. Conceptualization An activity schedule involves a series of sequential instances of activity type, location, transport mode, etc. As discussed in Section 1, the uni-dimensional sequence alignment methods take the minimum eﬀort required to equalize each attribute sequence as a measure of diﬀerence between the sequences. The equalization eﬀort is measured by weighing operations such as ‘deletion’, ‘insertion’, ‘substitution’ and ‘identity’ that respectively removes an element from the source sequence, inserts a target element into the source sequence, replaces a source element with a target element, and conﬁrms that source and target elements compared are identical. The fundamental concept underlying the sequence alignment method is to ﬁnd the set(s) of operations that involves the minimum sum of operation weights (minimal eﬀort or minimum alignment costs) to equalize the two attribute sequences. The set of operations and the resultant alignment costs can be derived from a so-called comparison table (Kruskal, 1983). To illustrate the operations and alignments, let us assume as an example that sequence s ¼ s½A C B has to be equalized to sequence g ¼ g½A B C. Fig. 1 shows as an example a comparison table that equalizes s with g. Each arrow in the table represents an

Fig. 1. An example of comparison table and its optimum trajectories. Note: The null elements initialize the comparison. The operation weights of deletion, insertion, substitution and identity are given 1, 1, 2 and 0, respectively.

C.-H. Joh et al. / Transportation Research Part B 36 (2002) 385–403

389

operation that equalizes the source sub-sequence si ¼ si ½s0 si with the target sub-sequence g j ¼ gj ½g0 gj , where i 6 m and j 6 n. A vertical arrow departing the cell ði 1; jÞ for ði; jÞ represents the deletion of the ith source element si from si . A horizontal arrow departing the cell ði; j 1Þ for ði; jÞ represents the insertion of the jth target element gj from gj into si . A diagonal arrow departing the cell ði 1; j 1Þ for ði; jÞ represents the identity if si of si and gj of g j are identical or the substitution of si with gj otherwise. In the left-hand side (LHS) table, for example, the vertical arrow from the cell ð0; 0Þ to ð1; 0Þ represents the ‘deletion’ of the 1st source element A to equalize s1 ¼ ½A with g0 ¼ ½ . The horizontal arrow from the cell ð0; 0Þ to ð0; 1Þ represents the ‘insertion’ of the 1st target element A to equalize s0 ¼ ½ with g1 ¼ ½A. The diagonal arrow from the cell ð0; 0Þ to ð1; 1Þ identiﬁes the equality (‘identity’) of the 1st source and target elements in equalizing s1 ¼ ½A with g 1 ¼ ½A. Another diagonal arrow from the cell ð1; 1Þ to ð2; 2Þ ‘substitutes’ the 2nd source element C by the 2nd target element B to equalize s2 ¼ ½A C with g2 ¼ ½A B. In particular, the substitution of a diagonal arrow can be decomposed into a deletion and insertion of two arrows such as ð1; 1Þ ! ð2; 1Þ ! ð2; 2Þ or ð1; 1Þ ! ð1; 2Þ ! ð2; 2Þ as denoted by dashed arrows. The operations, denoted by these arrows, involve equalization costs. The ﬁgure in each cell ði; jÞ of the LHS table represents the minimum cost required to reach that cell, i.e., to equalize si with g j . The last cell ðm; nÞ therefore represents the ﬁnal result as the minimum costs required to equalize the entire source and target sequences, s ¼ s½A C B and g ¼ g½A B C. For example, the source and target sub-sequences s1 and g 1 can be equalized at the cell ð1; 1Þ by one of three ways: insertion of target element A with a horizontal arrow from the cell ð1; 0Þ of s1 ¼ g 0 ¼ ½ to the cell ð1; 1Þ, deletion of source element A with a vertical arrow from ð0; 1Þ of s0 ¼ g 1 ¼ ½A to the cell ð1; 1Þ, or identity of A with a diagonal arrow from ð0; 0Þ of s0 ¼ g0 ¼ ½ to the cell ð1; 1Þ. The insertion, deletion and identity would result in the cumulative alignment costs of 2, 2 and 0 units in this case, and the comparison table chooses for the cell ð1; 1Þ the identity of diagonal arrow representing the smallest cost. Every possible ‘trajectory’ from the cell ð0; 0Þ to the cell ðm; nÞ then represents a sequence of operations that result in equalization of the entire sequences. The ﬁgure in the right-hand side (RHS) illustrates one of such minimum-cost trajectories. It consists of the following set of operations associated with their cumulative equalization costs: Moves in the table arrow arrow arrow arrow

( ( ( (

) ) ) )

(1st diagonal move) (horizontal move) (2nd diagonal move) (vertical move)

Operations meant

Resulting s

Cumulative cost

identity of A: A ¼ A insertion of B after A identity of C: C ¼ C deletion of B at the last

ACB ABCB ABCB ABC

0 1 1 2

As proven by Kruskal (1983), the minimum-cost trajectories can be found by back-tracking the alignments in the comparison table. In the LHS ﬁgure of the above example, the cumulative minimum-cost found at the cell ð3; 3Þ can be traced back to the cell ð2; 3Þ ! ð1; 2Þ ! ð1; 1Þ ! ð0; 0Þ, as shown in the RHS table. However, multiple minimum-cost trajectories can be present in a comparison table. For example, the above backtracking could have a diﬀerent trajectory if the cell ð3; 3Þ is traced back to the cell ð3; 2Þ instead of ð2; 3Þ. Note that the

390

C.-H. Joh et al. / Transportation Research Part B 36 (2002) 385–403

minimum-cost trajectory cannot be traced back from the cell ð3; 3Þ to ð2; 2Þ because the move from ð2; 2Þ to ð3; 3Þ by substituting B with C would result in the cumulative cost of 4 instead of 2. The minimum-cost trajectories allow one to capture the sequential relationships within activitytravel patterns. They however do not capture any interdependencies between attributes. Simply summing the uni-dimensional optimum alignment costs across dimensions would not solve this problem because activity type, location, transport mode, etc. would be aligned independently of other attributes’ elements. To solve this problem, we propose the set of deletions, insertions and substitutions that are applied to the elements belonging to the same activity to be treated as a single deletion, insertion and substitution, respectively. This integrated operation combines a set of elements that can be aligned simultaneously as if it were one element. We call such a set of elements a segment. If there were no interdependencies, the resulting alignment costs would be identical to the simple sum of uni-dimensional optimum alignment costs. Any costs-savings are indicative of interdependencies across the dimensions underlying the activity-travel patterns. Consider the comparison case shown in Fig. 2. Note that Pattern 1 is equalized to Pattern 2 by using deletion and insertion operations. The elements ‘B’ of attribute 1 and ‘3’ of attribute 2 are independently deleted and inserted, because they occupy diﬀerent positions. In contrast, elements ‘D’ and ‘4’ are deleted from and inserted into the same positions in both sequences, and hence, the alignments of these elements can be combined into a single segment. Assuming unit weights for both the deletion and insertion operations, this combined alignment requires six units (deletions and insertions of ‘B’, ‘3’ and ‘D and 4’), while the independent uni-dimensional alignments requires eight units (deletions and insertions of ‘B’, ‘3’, ‘D’ and ‘4’). If this reasoning is accepted, the problem then is how to determine the aggregate alignment costs. Unless the weights attached to attributes are all the same, the alignment costs depend on the method of aggregation that is used. In general, several methods for aggregating a set of values may be used, such as, for example, determining the average, median, maximum, etc. We argue that the maximum weight across attributes of the segment is the most appropriate method of aggregation. It reﬂects the notion that people’s decisions in activity scheduling may be dominated by the most important attribute. For example, let wd be the weight attached to the deletion of an element, and bk be the weight attached to the kth attribute. Then, the deletion of segment ‘D and 4’ in Fig. 2 costs maxðb1 ; b2 Þ wd because these elements are aligned together instead of ðb1 þ b2 Þ wd as in the case uni-dimensional alignments. This results in a cost reduction of ðb1 þ b2 Þ wd maxðb1 ; b2 Þ wd ¼ minðb1 ; b2 Þ wd . In general, the alignment of two multidimensional activity patterns will involve varying degrees of cost reductions. The stronger the interdependencies between attributes, the higher the cost reduction. The problem of multidimensional pattern comparison then is to ﬁnd the set of segments that minimizes alignment costs. Unfortunately, this is not an easy problem as the only known way to

Fig. 2. A comparison of two-dimensional patterns.

C.-H. Joh et al. / Transportation Research Part B 36 (2002) 385–403

391

guarantee the optimality of the multidimensional comparison is to try all possible alignments of each attribute, generate all possible combinations of the alignments across attributes and then ﬁnd the minimum costs among all possible alignment combinations. It will be evident that enumerating all alignments of even a single attribute may already require an enormous amount of computing time. We therefore suggest employing a heuristic approach to eﬀectively reduce the search space. We will develop such a method in the following section. 2.3. Speciﬁcation of the method The multidimensional alignment method proposed in this paper ﬁrst ﬁnds the uni-dimensional least-cost alignments, then integrates the alignments across attributes and, ﬁnally, computes the costs for the integrated alignments. More speciﬁcally, given a pair of K-dimensional activity patterns, the proposed method ﬁrst obtains K sets of operations used for the uni-dimensional optimum alignments, then identify the sets of elements to which the same operations are applied and next integrates K sets of operations applied to elements into a single set of operations applied to segments, and ﬁnally, computes the costs of the operations for segments. The rationale underlying this heuristic lies in the expectation that the combination of the unidimensional least-cost alignments likely generates a result that is nearer to the multidimensional optimum than any other approach within acceptable computing time. The method is detailed below. 2.3.1. Uni-dimensional operation set To formally specify the proposed segment-based alignment method, we ﬁrst introduce a set representation of the alignments of uni-dimensional sequences by using the ‘trajectory’ information from the comparison table. As discussed in Section 2.2, a trajectory is a set of operations used for equalizing or aligning two sequences. Alignments can therefore be summarized by recording the operations, the positions to which these operations are applied, and the attributes on which the operations are applied, translating each trajectory into a set of operations. Note that the identity operation is not recorded because it is cost-free. We shall call this a uni-dimensional operation set and deﬁne it as Ok;l ¼ fp j p ¼ dði; kÞ _ iðj; kÞ _ sði; j; kÞg

ð1Þ

with i 2 f1; . . . ; mg;

j 2 f1; . . . ; ng;

1 6 k 6 K;

where Ok;l is the lth uni-dimensional operation set of the kth attribute sequence, containing the deletions, insertions and substitutions; dði; kÞ is the deletion of the ith element of the kth source sequence; iðj; kÞ is the insertion of the jth element of the kth target sequence into the jth position of the kth source sequence; sði; j; kÞ is the substitution applied to both the ith and jth elements of the kth source and target sequences, respectively; m and n are the number of elements of the source and target sequences, respectively. For example, assuming that the RHS of Fig. 1 relates to the 2nd attribute of an arbitrary activity pattern, the alignment represented by the trajectory in that ﬁgure is summarized as

392

C.-H. Joh et al. / Transportation Research Part B 36 (2002) 385–403

Fig. 3. Two diﬀerent trajectories of the same operation set.

O2;l ¼ fið1; 2Þ; dð3; 2Þg: The set representation facilitates the identiﬁcation of segments. By comparing the K sets relating to K tables, we can easily identify the operations that can be linked into a segment. As this set representation is derived from the optimum trajectories, we shall call the proposed method the uni-dimensional Optimum Trajectories-based MultiDimensional Sequence Alignment Method (OTMDSAM). In general, there are multiple least-cost trajectories for each attribute. Many of these least-cost trajectories, however, often result in identical operation sets. An illustration of this many-to-one relationship between trajectories and operation sets is given in Fig. 3. The uni-dimensional operation sets corresponding to the trajectories are O2;1 ¼ fið1; 2Þ; ið3; 2Þ; dð2; 2Þ; dð3; 2Þ; ið5; 2Þg and O2;2 ¼ fið1; 2Þ; dð2; 2Þ; ið3; 2Þ; dð3; 2Þ; ið5; 2Þg: While these trajectories are diﬀerent in terms of the order of the underlined operations, the operation sets are identical. In other words, the order of operations included in a set is arbitrary, and it does not make a diﬀerence in which alignment step an element is aligned. The proposed method therefore develops an eﬃcient algorithm that ignores the order of operations, applied when tracing the optimum trajectories (see Appendix A). 2.3.2. Multidimensional operation set The proposed OT-MDSAM integrates K sets of operations applied to elements into a single set of operations applied to segments. Because there are multiple uni-dimensional operation sets for each attribute, multiple combinations of K uni-dimensional sets for operation integration exist to be tested. Let Tk be the number of uni-dimensional operation sets of the kth attribute sequence. The number of combinations of uni-dimensional operation sets, T, is given by T ¼

K Y

Tk :

ð2Þ

k¼1

For each combination, tðt ¼ 1; . . . ; T Þ, we have K operation sets, Ot1;l0 ; . . . ; Otk;l ; . . . ; OtK;l00 where Otk;l is the lth uni-dimensional operation set of the kth attribute sequence included in the tth combi-

C.-H. Joh et al. / Transportation Research Part B 36 (2002) 385–403

393

nation. We shall call such a combination of uni-dimensional operation sets a multidimensional operation set. Let Ot be the tth multidimensional operation set. Then Ot ¼ fp j p ¼ dði; kÞ _ iðj; kÞ _ sði; j; kÞg;

ð3Þ

where k identiﬁes the set of attributes to which the concerned operation was applied. Thus, the set k deﬁnes a segment and represents interdependency between the concerned attributes. For example, if the uni-dimensional operation sets of three attribute sequences (a three-dimensional case) included in the tth combination are Ot1;l0 ¼ fdð2; 1Þ; ið4; 1Þ; sð5; 6; 1Þg; Ot2;l ¼ fdð3; 2Þ; ið4; 2Þ; sð5; 5; 2Þg; Ot3;l00 ¼ fdð2; 3Þ; ið4; 3Þ; sð5; 5; 3Þg; then the tth multidimensional operation set is Ot ¼ fdð2; f1; 3gÞ; dð3; f2gÞ; ið4; f1; 2; 3gÞ; sð5; 5; f2; 3gÞ; sð5; 6; f1gÞg; implying that the 2nd elements of 1st and 3rd source attribute sequences are deleted, the 3rd element of 2nd source attribute sequence is deleted, the 4th element of 1st, 2nd and 3rd target attribute sequences are inserted, the 5th elements of source and target attribute sequences are substituted, and the 5th element of source attribute sequence and the 6th element of target attribute sequences are substituted. The multidimensional operation set, Ot , represents the overall trajectory in multidimensional space. 2.3.3. Multidimensional similarity The weighting value of each operation is determined as its own weighting value multiplied by the maximum attribute weight across the attributes to which the operation is applied. This set is represented by k. Hence, the alignment cost calculated from the tth multidimensional operation set, C t , is Ct ¼

X

ð4Þ

cp

p2Ot

with

8 max < wd bk max cp ¼ wi bk : ws bmax k

if p ¼ dði; kÞ; if p ¼ iðj; kÞ; if p ¼ sði; j; kÞ;

where wd ; wi and ws are the weighting value attached to the deletion, insertion and substitution is the maximum weight across attributes operation, respectively (wd ¼ wi , and ws ¼ wd þ wi ); bmax k in the set k, which takes in eﬀect the interdependency relationships represented by Section 2.3.3. Finally, the minimum alignment cost, C , is C ¼ min½C 1 ; . . . ; C t ; . . . ; C T :

ð5Þ

That is, the minimum cost across trajectory combinations is taken as a measure of multidimensional similarity.

394

C.-H. Joh et al. / Transportation Research Part B 36 (2002) 385–403

2.4. Illustration Consider the following example, where K ¼ 4, and length of the source and target patterns is m ¼ 4 and n ¼ 5, respectively. We assume that b1 ¼ 2, and b2 ¼ b3 ¼ b4 ¼ 1.

Source pattern A 1 a a

D 6 b d

B 2 c u

Target pattern C 3 f c

A 1 a a

B 2 b u

C 3 c c

D 4 d d

E 5 e e

! ! ! !

activity-type sequence activity-location sequence transport-mode sequence accompanying-person sequence

The optimum trajectories are shown in Fig. 4. Activity type, location and accompanying person sequences have one optimum trajectory, respectively, while the transport mode sequence has ﬁve. Hence, the number of trajectory combinations in this example is T ¼ 1 1 5 1 ¼ 5: The operation sets for the ﬁrst combination are: O1;1 ¼ fdð2; 1Þ; ið4; 1Þ; ið5; 1Þg;

O1;2 ¼ fdð2; 2Þ; ið4; 2Þ; ið5; 2Þg;

O1;3 ¼ fið4; 3Þ; ið5; 3Þ; dð4; 3Þg;

O1;4 ¼ fdð2; 4Þ; ið4; 4Þ; ið5; 4Þg;

the operation sets for the second combination: O2;1 ¼ fdð2; 1Þ; ið4; 1Þ; ið5; 1Þg; O2;3 ¼ fið4; 3Þ; sð4; 5; 3Þg;

O2;2 ¼ fdð2; 2Þ; ið4; 2Þ; ið5; 2Þg;

O2;4 ¼ fdð2; 4Þ; ið4; 4Þ; ið5; 4Þg;

Fig. 4. Optimum trajectories of each attribute sequence.

C.-H. Joh et al. / Transportation Research Part B 36 (2002) 385–403

395

the operation sets for the third combination: O3;1 ¼ fdð2; 1Þ; ið4; 1Þ; ið5; 1Þg;

O3;2 ¼ fdð2; 2Þ; ið4; 2Þ; ið5; 2Þg;

O3;3 ¼ fið4; 3Þ; ið4; 3Þ; dð5; 3Þg;

O3;4 ¼ fdð2; 4Þ; ið4; 4Þ; ið5; 4Þg;

the operation sets for the fourth combination: O4;1 ¼ fdð2; 1Þ; ið4; 1Þ; ið5; 1Þg; O4;3 ¼ fsð4; 4; 3Þ; ið5; 3Þg;

O4;2 ¼ fdð2; 2Þ; ið4; 2Þ; ið5; 2Þg;

O4;4 ¼ fdð2; 4Þ; ið4; 4Þ; ið5; 4Þg;

and the operation sets for the ﬁfth combination: O5;1 ¼ fdð2; 1Þ; ið4; 1Þ; ið5; 1Þg;

O5;2 ¼ fdð2; 2Þ; ið4; 2Þ; ið5; 2Þg;

O5;3 ¼ fið4; 3Þ; ið4; 3Þ; dð5; 3Þg;

O5;4 ¼ fdð2; 4Þ; ið4; 4Þ; ið5; 4Þg:

From Eq. (3), the unique operation set for each combination is: O1 ¼ fdð2; f1; 2; 4gÞ; dð4; f3gÞ; ið4; f1; 2; 3; 4gÞ; ið5; f1; 2; 3; 4gÞg; O2 ¼ fdð2; f1; 2; 4gÞ; ið4; f1; 2; 3; 4gÞ; ið5; f1; 2; 4gÞ; sð4; 5; f3gÞg; O3 ¼ fdð2; f1; 2; 4gÞ; dð4; f3gÞ; ið4; f1; 2; 3; 4gÞ; ið5; f1; 2; 3; 4gÞg; O4 ¼ fdð2; f1; 2; 4gÞ; ið4; f1; 2; 4gÞ; ið5; f1; 2; 3; 4gÞ; sð4; 4; f3gÞg; O5 ¼ fdð2; f1; 2; 4gÞ; dð4; f3gÞ; ið4; f1; 2; 3; 4gÞ; ið5; f1; 2; 3; 4gÞg: From Eq. (4), the alignment costs calculated from these operation sets are: C 1 ¼ 2 1 þ 1 1 þ 2 1 þ 2 1 ¼ 7; C 2 ¼ 2 1 þ 2 1 þ 2 1 þ 1 2 ¼ 8; C 3 ¼ 2 1 þ 1 1 þ 2 1 þ 2 1 ¼ 7; C 4 ¼ 2 1 þ 2 1 þ 2 1 þ 1 2 ¼ 8; C 5 ¼ 2 1 þ 1 1 þ 2 1 þ 2 1 ¼ 7: From Eq. (5), the optimum alignment costs, C , are C ¼ min½C 1 ; C 2 ; C 3 ; C 4 ; C 5 ¼ 7: This example illustrates OT-MDSAM’s fundamental property discussed in Section 2.2. The costs of the MDSAM in a perfectly independent case would be 6 þ 3 þ 3 þ 3 ¼ 15, in a perfectly interdependent case, 15/4. The reduction of 8-units cost according to the proposed method, compared to the perfectly independent case, is achieved by the integration of operations, dð2; f1; 2; 4gÞ (2-units reduction), ið4; f1; 2; 3; 4gÞ (3-units reduction) and ið5; f1; 2; 3; 4gÞ (3-units reduction).

3. Empirical analysis Having discussed the fundamental features of the proposed method and illustrated its application in a simple example, we will discuss two major issues in this section. First, we will investigate whether the proposed method is capable of capturing interdependency between attributes in

396

C.-H. Joh et al. / Transportation Research Part B 36 (2002) 385–403

empirical data, its impact on the comparison results and the implications of incorporating interdependency in activity analysis. To this end, the amount of cost reduction achieved by the proposed method will be compared to the sum of costs of the conventional Uni-Dimensional Sequence Alignment Method (UDSAM). In addition, the correlation between the two methods will be examined because the relative rather than absolute pairwise distances are relevant for classiﬁcation and test of goodness-of-ﬁt. A signiﬁcant cost reduction would demonstrate that the proposed method captures interactions between attribute dimensions, while an imperfect correlation would suggest that a variety of inter-relationships between attributes exist. Empirical proof along these lines would be indicative of the superiority of the proposed method. Secondly, we will assess the performance of proposed method in terms of the required computer time. Especially if the method is to be used as a goodness-of-ﬁt method in deriving an activitybased model, low computing times are a true concern. We will therefore examine whether the proposed heuristic method produces the exact optimum solution and compare the running times of the methods. The data used for these analyses are part of a recent Dutch activity diary, collected in 1997 in two cities, Hendrik-Ido-Ambacht and Zwijndrecht. The respondents were asked to report all their activities conducted during two consecutive days. Each day began at 3 a.m. and ended at 3 a.m. the next day. The diary used open time intervals. A total of 2974 activity-travel patterns were included in the data set. To reduce computing time, the empirical analysis of this section used only a sub-set of 71 activity-travel patterns that were randomly selected without replacement from the full data set and involved 2485 pairwise comparisons. Three dimensions of the activity-travel patterns were analyzed: activity type, location and transport mode. The patterns involve twentyﬁve out-of-home activities, thirty-two out-of-home locations and four transport modes including car, walk, bike and car passenger. An extra code, ‘unknown’, was included to represent the attribute elements that were not identiﬁed. The substitution operation was applied to any alignment with the unknown element. The average length of the activity patterns was 6.80 with a maximum of 17, a minimum of 1 and a standard deviation of 3.77 activities. Fig. 5 shows an example. The sum of UDSAM costs and the OT-MDSAM costs were calculated on 2485 pairs of activity-travel patterns. The deletion and insertion weights were given the value of 1, and the substitution weight was assumed to be the sum of deletion and insertion weights. All attribute weights were also given the value of 1. The results are shown in Table 1. The diﬀerence in cost reduction between the two methods is 14.63, which is fairly large considering the size of the sum of UDSAM costs (reduction 52.75% of 27.7336). This ﬁnding provides empirical evidence of the fact that the proposed method captures interdependency in multidimensional activity proﬁles. Moreover, this result suggests that interdependencies are signiﬁcant amount.

Fig. 5. An example of the encoding of activity pattern.

C.-H. Joh et al. / Transportation Research Part B 36 (2002) 385–403

397

Table 1 Pairwise comparison results

Sum of UDSAMs OT-MDSAM

Number of comparisons

Mean

Minimum

Maximum

Standard deviation

2485 2485

27.7336 13.1038

2 2

73 32

12.2389 5.1867

The constant amount of cost reduction across diﬀerent weights of the primary attribute should be evident from Eq. (4). It occurs because a change in the weight of the primary attribute does not aﬀect the total amount of cost reduction, which is caused by the integration operation that is applied only to the alignments of smaller-weight attribute elements. Consequently, as the primary attribute weight increases, the ratio of the OT-MDSAM cost to the sum of UDSAM costs becomes larger, and the cost reduction becomes less important. This ﬁnding implies that a ﬁnetuning of the size of the primary attribute weight is required, depending on the context of analysis. Given a proper size of the primary attribute weight, the OT-MDSAM should be expected to signiﬁcantly change the similarity between activity-travel patterns as compared to the sum of UDSAMs. The next step in the analysis involved the examination of the correlation between the two methods to investigate whether the signiﬁcant amount of cost reduction changes the relative distances between activity patterns. The Pearson correlation coeﬃcient and rank correlation coeﬃcient are 0.951 and 0.953, respectively. This ﬁnding indicates that although the two measures are highly correlated, the correlation is not perfect. Evidence of the decreasing eﬀect of cost reduction can be found in Fig. 6. The LHS of the ﬁgure shows that the correlation between sum of UDSAMs and OT-MDSAM becomes close to 1 as the weight of the primary attribute becomes 3 or higher. The eﬀect of the cost reduction caused by the integration of operations applied to the second and third attributes is virtually disappearing as illustrated by the RHS of the ﬁgure that shows the diminishing ratio of cost reduction to the sum of UDSAMs. To investigate the interdependency between attributes in more detail, we further examined which pairs of attribute dimensions demonstrate co-existences of attribute elements. The

Fig. 6. Correlation between sum of UDSAM costs and OT-MDSAM cost with varying weights of activity-type attribute.

398

C.-H. Joh et al. / Transportation Research Part B 36 (2002) 385–403

Table 2 Correlation between sum of UDSAMs and two-dimensional OT-MDSAM Attributes included in the activity patterns

Pearson’s r

Spearman’s r

Activity type, activity location Activity type, transport mode Activity location, transport mode

0.977 0.970 0.967

0.977 0.973 0.969

Table 3 Performance of OT-MDSAM versus ExMDSAM MD cost

ExMDSAM OT-MDSAM

Number of MD operation sets

0. Let dðs; gÞ be the alignment cost as defined in Eqs. (1)–(5). Then, Z1 ¼ Z2 :

ðA:1Þ

In words, all optimum trajectories have the same number of identity coordinates. Proof. Given O ¼ fp1 ; . . . ; pd ; . . . ; pD ; q1 ; . . . ; qi ; . . . ; qI ; h1 ; . . . ; hz ; . . . ; hZ g, then dðs; gÞ ¼ ðD þ IÞ1 wo þ Z1 we 6 ðD þ IÞwo þ Zwe 8ðD þ IÞ; Z; ðD þ IÞ1 2 fðD þ IÞg; Z1 2 fZg

ðA:2Þ

dðs; gÞ ¼ ðD þ IÞ1 wo 6 ðD þ IÞwo

ðA:3Þ

or since we ¼ 0

or ðD þ IÞ1 6 ðD þ IÞ:

ðA:4Þ

Hence, ðD þ IÞ1 ¼ minðD þ IÞ:

ðA:5Þ

C.-H. Joh et al. / Transportation Research Part B 36 (2002) 385–403

401

Similarly, dðs; gÞ ¼ ðD þ IÞ2 wo þ Z2 we 6 ðD þ IÞwo þ Zwe 8ðD þ IÞ; Z; ðD þ IÞ2 2 fðD þ IÞg; Z2 2 fZg

ðA:6Þ

dðs; gÞ ¼ ðD þ IÞ2 wo 6 ðD þ IÞwo

ðA:7Þ

or since we ¼ 0

or

ðD þ IÞ2 6 ðD þ IÞ:

ðA:8Þ

Hence, ðD þ IÞ2 ¼ minðD þ IÞ:

ðA:9Þ

From Eqs. (A.5) and (A.9), ðD þ IÞ1 ¼ ðD þ IÞ2 :

ðA:10Þ

Since, nðOÞ ¼ D þ I þ Z

given O ¼ fp1 ; . . . ; pD ; q1 ; . . . ; qI ; h1 ; . . . ; hZ g;

ðA:11Þ

we have ðD þ IÞ1 þ Z1 ¼ ðD þ IÞ2 þ Z2 :

ðA:12Þ

Therefore, from Eqs. (A.10) and (A.12), Z1 ¼ Z2 :

Theorem 2. The operation sets of any two possible paths between two adjacent identity operations in a comparison table are identical. To define this formally, let ðk1 ; l1 Þ and ðk2 ; l2 Þ be two identity coordinates in an optimum trajectory. Let Os ¼ ffdig; fijgg be the deletion–insertion operation subset that contains the deletion and insertion operations in-between ðk1 ; l1 Þ and ðk2 ; l2 Þ. Let Os0 ¼ ffdig0 ; fijg0 g be an arbitrary deletion–insertion operation subset. Then, the theorem states, Os0 ¼ Os : if k1 < i < k2

ðA:13Þ 8di 2 fdig0

and

l1 < j < l2

8ij 2 fijg0 :

ðA:14Þ

In words, there can be no other insertion–deletion operation sets between two adjacent identity coordinates. Proof. We establish the proof by assuming the negation of the above theorem, i.e., Os0 6¼ Os :

ðA:15Þ

Then, di 2 fdig0

9di 62 fdig

ðA:16Þ

402

C.-H. Joh et al. / Transportation Research Part B 36 (2002) 385–403

or ij 2 fijg0

9ij 62 fijg:

ðA:17Þ

k1 < i < k2

8di 2 fdig

ðA:18Þ

l1 < j < l2

8ij 2 fijg;

ðA:19Þ

Because and we have i 6 k1 or i P k2

9di 62 fdig

ðA:20Þ

j 6 l1 or j P l2

9ij 62 fijg:

ðA:21Þ

or Both Eqs. (A.20) and (A.21) contradict Eq. (A.14), which is the condition of the theorem, and hence, the negation expressed by Eq. (A.15) is false. Therefore, Os 0 ¼ Os :

References Arentze, T., Borgers, A., Timmermans, H.J.P., 1993. A model of multi-purpose shopping trip behavior. Papers in Regional Science 72, 239–256. Arentze, T.A., Hofman, F., Joh, C.H., Timmermans, H.J.P., 1999. The development of ALBATROSS: some key issues. In: Beckmann, K.J. (Ed.), Verker und Mobilit€at, Stadt Region, Land 66, ISB, Aachen, pp. 61–70. Arentze, T., Hofman, F., Van Mourik, H., Timmermans, H., 2000. ALBATROSS: a multi-agent rule-based model of activity pattern decisions. Paper Presented at the 79th Annual Meeting of the Transportation Research Board, Washington, DC. Cha, S., McCleary, K.W., Uysal, M., 1995. Travel motivations of Japanese overseas travelers: a factor-cluster segmentation approach. Journal of Travel Research 34 (1), 33–39. Golledge, R.G., Stimson, R.J., 1987. Analytical Behavioral Geography. Croom-Helm, London. Gower, J.C., 1971. A general coeﬃcient of similarity and some of its properties. Biometrics 27, 857–871. G€ arling, T., Gillholm, R., Romanus, J., Selart, M., 1997. Interdependent activity and travel choices: behavioral principles of integration of choice outcomes. In: Ettema, D.F., Timmermans, H.J.P. (Eds.), Activity-Based Approaches to Travel Analysis. Pergamon Press, Oxford, pp. 135–149. Hanson, S., 1982. The determinants of daily travel-activity patterns: relative location and sociodemographic factors. Urban Geography 3 (3), 179–202. Hanson, S., Burnett, P., 1981. Understanding complex travel behavior: measurement issues. In: Stopher, P.R., Meyburg, A.H., Br€ og, W. (Eds.), New Horizons in Behavioral Travel Research. Lexington Books, Lexington, pp. 207–230. Hanson, S., Hanson, P., 1993. The geography of everyday life. In: G€ arling, T., Golledge, R.G. (Eds.), Behavior and Environment: Psychological and Geographical Approaches. Elsevier, London, pp. 249–269. Hanson, S., Huﬀ, J., 1986. Classiﬁcation issues in the analysis of complex travel behavior. Transportation 13, 271–293. Hanson, S., Huﬀ, J., 1988. Repetition and day-to-day variability in individual travel patterns: implications for classiﬁcation. In: Golledge, R.G., Timmermans, H.J.P. (Eds.), Behavioral Modeling in Geography and Planning. Croom-Helm, London, pp. 368–398.

C.-H. Joh et al. / Transportation Research Part B 36 (2002) 385–403

403

Hatcher, S.G., Mahmassani, H.S., 1992. Daily variability of route and trip scheduling decisions for the evening commute. Transportation Research Record 1357, 72–81. H€ agerstrand, T., 1970. What about people in regional science? Papers and Proceedings of the Regional Science Association 24, 7–21. Kitamura, R., 1984. Incorporating trip chaining into analysis of destination choice. Transportation Research B 18, 67– 81. Kitamura, R., Pendyala, R.M., Pas, E.I., Reddy, D.V., 1995. Application of AMOS: an activity-based TCM evaluation tool applied to Washington D.C. Metropolitan Area. In: Proceedings of the 23rd Summer Annual Meeting PTRC, London, pp. 177–190. Kitamura, R., Pas, E.I., Lula, C.V., Lawton, T.K., Benson, P.E., 1996. The sequenced activity mobility simulator (SAMS): an integrated approach to modeling transportation, land use and air quality. Transportation 23, 267–291. Koppelman, F.S., Pas, E.I., 1985. Travel-activity behavior in time and space: methods for representation and analysis. In: Nijkamp, P., Leitner, H., Wrigley, N. (Eds.), Measuring the Unmeasurable. Martinus Nijhoﬀ, The Hague, pp. 587–623. Kruskal, J.B., 1983. An overview of sequence comparison. In: Sankoﬀ, D., Kruskal, J.B. (Eds.), Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley, London, pp. 1–44. Lenntorp, B., 1978. A time-geography simulation model of individual activity programmes. In: Carlstein, T., Parkes, D., Thrift, N. (Eds.), Timing Space and Spacing Time vol. 2: Human Activity and Time Geography. Arnold, London, pp. 162–180. McNally, M.G., 1997. An activity-based micro-simulation model for travel demand forecasting. In: Ettema, D.F., Timmermans, H.J.P. (Eds.), Activity-Based Approaches to Travel Analysis. Pergamon Press, Oxford, pp. 37–54. Pas, E.I., 1983. A ﬂexible and integrated methodology for analytical classiﬁcation of daily travel-activity behavior. Transportation Science 17 (4), 405–429. Pas, E.I., 1984. The eﬀect of selected sociodemographic characteristics on daily travel-activity behavior. Environment and Planning A 16, 571–581. Pas, E.I., Koppelman, F.S., 1986. An examination of the determinants of day-to-day variability in individuals’ urban travel behavior. Transportation 13, 183–200. Pas, E.I., 1988. Weekly travel-activity behavior. Transportation 15, 89–109. Pred, A., 1981a. Of paths and projects: individual behavior and its societal context. In: Cox, K.R., Golledge, R.G. (Eds.), Behavioural Problems in Geography Revisited. Methuen, London, pp. 231–256. Pred, A., 1981b. Social reproduction and the time-geography of everyday life. Geograﬁska Annaler B 63 (1), 5–22. Recker, W.W., McNally, M.G., Root, G.S., 1985. Travel/activity analysis: pattern recognition, classiﬁcation and interpretation. Transportation Research A 19, 279–296. States, D.J., Boguski, M.S., 1991. Similarity and homology. In: Gribskov, M., Devereux, J. (Eds.), Sequence Analysis Primer. Stockton Press, New York, pp. 89–157. Strathman, J.G., Dueker, K.J., Davis, J.S., 1994. Eﬀects of household structure and selected travel characteristics on trip chaining. Transportation 21, 23–45. Thrift, N.J., 1983. On the determination of social action in space and time. Environment and Planning D 1, 23–57. Timmermans, H.J.P., 1996. A stated choice model of sequential mode and destination choice behavior for shopping trips. Environment and Planning A 28, 173–184. Wilson, C., 1998. Activity pattern analysis by means of sequence-alignment methods. Environment and Planning A 30, 1017–1038.

Lihat lebih banyak...

Activity pattern similarity: a multidimensional sequence alignment method

Descripción

Comentarios