A Film Classifier Based on Low-level Visual Features


JOURNAL OF MULTIMEDIA, VOL. 3, NO. 3, JULY 2008

Hui-Yu Huang†, Weir-Sheng Shih∗, Wen-Hsing Hsu∗

†Department of Computer Science and Information Engineering, National Formosa University, Yunlin, 632 Taiwan. Email: [email protected]. ∗Department of Electrical Engineering, National Tsing Hua University, Hsinchu, 300 Taiwan.

Abstract— We propose an approach to classifying films into genres using low-level and visual features. Our current domain of study is the movie preview: a preview emphasizes the theme of a film and hence provides suitable information for the classification process. We categorize films into three broad categories: action, drama, and thriller. Four computable video features (average shot length, color variance, motion content, and lighting key) and two visual effect features (slow and fast moving effects) are combined in our approach to characterize the movie category. The experimental results show that visual features carry useful information for film classification. Our approach can also be extended to other potential applications, including browsing and retrieval of videos on the internet, video-on-demand, and video libraries.

Index Terms— Film classifier, movie genre, shot boundary detection, visual feature

I. INTRODUCTION

With recent advances in digital video coding and transmission, ever larger amounts of digital video are produced from various sources every day. The advent of digital television and the internet is yet another motivating factor for automated analysis of digital video. Although digital video can be labeled at the production stage, there is still a need for automatic classification of videos. Manual annotation or analysis of video is an expensive and arduous task that will not be able to keep up with the rapidly increasing volume of video data in the near future. Hence, technologies such as video databases, video browsing, indexing, and data mining are needed to manage videos and discover knowledge from them. Applying classification at the scene level would allow a departure from the prevalent system of movie ratings to a more flexible system of scene ratings. For instance, a child would be able to watch movies containing a few scenes with excessive violence if a pre-filter system can prune out the scenes rated as violent. To classify movies effectively, for example to protect children who download them from the internet, a film classification process needs to be developed.

Pickering and Ruger [1] investigated the application of a variety of content-based image retrieval methods to video retrieval. Low et al. [2] presented an overview of an integrated, content-based solution for computer-assisted video parsing, abstraction, retrieval, and browsing. The most important features of this approach are the use of low-level visual features as a representation of video content and an automatic abstraction process. In order to bridge low-level media features and high-level semantics, many algorithms have been proposed that connect low-level features to object-level search, and then connect the object level to the event level. Chua and Ruan [3] designed a system to support the entire process of video information management: segmenting, logging, retrieving, and sequencing of video data; they mainly developed a semiautomatic tool to divide video sequences into meaningful shots. Naphade et al. [4] proposed an approach which fuses multiple modalities and builds detectors on the individual modalities; a set of such multijects was developed, and a menu was provided from which the user can search for any object, event, or site over the whole video database. Qian et al. [5] developed a method which adopts a semantic framework for video indexing and event detection, and used it to detect hunt events in videos.

The rest of the paper is organized as follows. The related techniques, which include shot concepts and low-level visual features, are described in Section 2. In Section 3, we present the proposed method. Experimental results are given in Section 4. Finally, the paper is concluded in Section 5.

II. RELATED TECHNIQUES

In previous research on video categorization and indexing, a number of advantages have been found in using visual content as the basis of a search. It is often difficult to fully express a visual query in words, and yet a single image can completely describe what is being searched for. Moreover, many modern feature films rely far more on their visual effects than they do on any spoken material. The use of visual cues also allows the retrieval system to become language


Figure 1. Example of fade in.


Figure 2. Example of fade out.

independent; an advantage where the database is to be made available internationally, such as on the internet [6].

A. Shot boundary concept

For the sheer volume of video data to be usable, it must be easily accessible. An important step is to identify and annotate sections of interest. Historically, identification and annotation of video have been performed by human annotators. This is tedious, expensive, subjective, and often inconsistent. Automatic indexing methods have the potential to avoid these problems. Part of the analysis process is to identify and determine the boundaries of the basic semantic elements, the shots. The transition between adjacent shots can be abrupt (a cut) or gradual. The former describes a shot change where two consecutive frames belong to different shots. The latter involves a gradual changeover between two shots using video editing techniques such as dissolves, fades, and wipes.

In order to discuss shot detection, we define some terms used in this paper. A shot is an image sequence which presents continuous action that appears to come from a single operation of the camera; in other words, a shot is the sequence of images generated by the camera from the time it begins recording to the time it stops. A fade is a gradual transition in the image sequence: the picture gradually darkens to black in the case of a fade out and gradually brightens in the case of a fade in. A dissolve is a simultaneous application of a fade in and a fade out to the two shots being edited. The fade in (fade out) effect is achieved by gradually increasing (decreasing) the light intensity during the optical printing or video editing process [7]. Figures 1, 2, and 3 present examples of fade in, fade out, and dissolve.

Most approaches to shot boundary detection compute an inter-frame distance from the decompressed video; shot transitions can then be detected by monitoring this distance for significant changes. In direct image comparison, changes between adjacent frames are determined on a pixel-to-pixel basis. While this approach shows generally good results, it is computationally intensive and sensitive to camera motion, camera zoom, and noise. More common approaches use histograms of frame feature data. Approaches using a global histogram represent each frame as a single vector, while those using localized histograms generate separate histograms for subsections of each frame. Inter-frame distances are calculated using simple vector-distance measures to compare corresponding histograms. Localized histograms, used in conjunction with additional features such as edge detection, perform well when applied in the TRECVid environment [8].

Figure 3. Example of dissolve.
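As an illustrative sketch of the histogram comparison described above (not the authors' exact implementation), a global-histogram inter-frame distance could be computed as follows, assuming frames arrive as NumPy arrays:

```python
import numpy as np

def global_histogram(frame, bins=64):
    """Global gray-level histogram of a frame, normalized to sum to 1."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()

def interframe_distance(prev_frame, frame):
    """L1 distance between consecutive frame histograms; large values
    suggest a shot transition."""
    return np.abs(global_histogram(prev_frame) -
                  global_histogram(frame)).sum()

# Example with two synthetic frames: a dark frame and a bright frame.
dark = np.full((120, 160), 30, dtype=np.uint8)
bright = np.full((120, 160), 220, dtype=np.uint8)
print(interframe_distance(dark, bright))  # close to 2.0, the maximal L1 distance
```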

B. Video features

1) Average shot length: The first feature is the average shot length, which represents the tempo of a scene. The director can control the speed at which the audience's attention is directed by varying this tempo, and the average shot length provides an effective measure of it. The first step in its computation is the detection of shot boundaries, where a shot is defined as a sequence of frames taken by a single camera without any major change in the color content of consecutive images. After the shot boundaries have been detected, the average shot length is computed for each preview by dividing the total number of frames by the total number of shots (the statistical mean). Each detected shot is represented by a key frame used to analyze the shot's color attributes; we use the middle frame of each shot as the key frame [9].
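A minimal sketch of this computation, assuming shot boundaries are available as a list of starting frame indices (the helper names are illustrative only):

```python
# Sketch: average shot length and key-frame selection.
# `boundaries` holds the first frame index of each shot plus the total
# frame count as a final sentinel.

def average_shot_length(boundaries):
    """Mean number of frames per shot: total frames / number of shots."""
    num_frames = boundaries[-1]
    num_shots = len(boundaries) - 1
    return num_frames / num_shots

def key_frame_indices(boundaries):
    """Middle frame of each shot, used as the shot's key frame."""
    return [(start + end) // 2
            for start, end in zip(boundaries[:-1], boundaries[1:])]

boundaries = [0, 120, 300, 450]          # three shots, 450 frames in total
print(average_shot_length(boundaries))   # 150.0
print(key_frame_indices(boundaries))     # [60, 210, 375]
```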


2) Color variance: Intuitively, color variance has a strong correlation with genre; for instance, comedies tend to have a large variety of bright colors, whereas horror films often adopt only darker hues. To define a computable feature, two requirements have to be met. First, the feature has to be global in nature, and second, distances in the color space employed should be perceptually uniform. We use the CIE Luv space, which was designed to approximate a perceptually uniform color space. To represent the variety of color used in the video, we employ the generalized variance of the Luv color values over each preview as a whole. The covariance matrix of the multivariate (three-dimensional, in our case) vector is

$$\rho = \begin{pmatrix} \sigma_L^2 & \sigma_{Lu}^2 & \sigma_{Lv}^2 \\ \sigma_{Lu}^2 & \sigma_u^2 & \sigma_{uv}^2 \\ \sigma_{Lv}^2 & \sigma_{uv}^2 & \sigma_v^2 \end{pmatrix}. \qquad (1)$$

The generalized variance is obtained by taking the determinant of Eq. (1),

$$\Sigma = \det(\rho). \qquad (2)$$

This feature is used as the representation of the color variance. All key frames present in a preview are used to compute it.
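A minimal sketch of Eqs. (1)-(2), assuming the key frames have already been converted to Luv (e.g., with OpenCV's cv2.cvtColor) and are given as H×W×3 arrays:

```python
import numpy as np

def generalized_color_variance(luv_keyframes):
    """det of the 3x3 covariance of (L, u, v) over all key-frame pixels,
    i.e., the generalized variance of Eqs. (1)-(2)."""
    # Stack every pixel of every key frame into one (N, 3) array.
    pixels = np.concatenate([kf.reshape(-1, 3) for kf in luv_keyframes])
    cov = np.cov(pixels, rowvar=False)   # 3x3 covariance matrix, Eq. (1)
    return np.linalg.det(cov)            # generalized variance, Eq. (2)

# Example with two random stand-in "key frames" of Luv values.
rng = np.random.default_rng(0)
keyframes = [rng.uniform(0, 100, size=(64, 64, 3)) for _ in range(2)]
print(generalized_color_variance(keyframes))
```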

3) Lighting key: Lighting is an important dramatic agent. Generations of film makers have exploited luminance to evoke emotions, using techniques that are well studied and documented in cinematography circles. A deliberate relationship therefore exists between the lighting and the genre of a film. In practice, movie directors use multiple light sources to balance the amount and direction of light while shooting a scene. The purpose of using several light sources is to enable a specific portrayal of the scene; for example, how and where shadows appear on the screen is influenced by maintaining a suitable proportion of intensity and direction among the light sources. Lighting can also be used to direct the viewer's attention to certain areas of importance in the scene, and it can affect the viewer's feelings directly, regardless of the actual content of the scene. In other words, lighting is a matter not only of providing enough light for good exposure, but of using light and shade to create a dramatic effect consistent with the scene. Although there are many ways to illuminate a scene, we capture this characteristic with a scene lighting quality

$$\zeta_i(\mu, \sigma) = \mu_i \cdot \sigma_i, \qquad (3)$$

where $\mu_i$ and $\sigma_i$ denote the mean and standard deviation of the ith key frame [9], [10].
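A sketch of Eq. (3), under the assumption that each key frame is available as a gray-scale brightness array (the paper does not pin down the exact luminance channel):

```python
import numpy as np

def lighting_key(gray_keyframe):
    """Scene lighting quality of Eq. (3): mean brightness times its
    standard deviation for one key frame."""
    mu = gray_keyframe.mean()
    sigma = gray_keyframe.std()
    return mu * sigma

# A well-lit, high-contrast frame scores higher than a dark, flat one.
rng = np.random.default_rng(1)
bright_contrasty = rng.integers(0, 256, size=(120, 160)).astype(float)
dark_flat = rng.integers(20, 40, size=(120, 160)).astype(float)
print(lighting_key(bright_contrasty), lighting_key(dark_flat))
```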

4) Motion content: The motion content represents the amount of activity in a film. Obviously, action films have higher values for such a measure, while less visual disturbance is expected in dramatic or romantic movies. To find the visual disturbance, a method based on the structure tensor is used, as described in [11]. The frames contained in a video clip can be thought of as a volume obtained by stacking all the frames in time. This volume can be decomposed into two sets of 2-D temporal slices I(x, t) and I(y, t), defined by the planes (x, t) and (y, t) for horizontal and vertical slices, respectively. To find the disturbance in the scene, the structure tensor of each slice is evaluated,

$$\Gamma = \begin{pmatrix} J_{xx} & J_{xt} \\ J_{xt} & J_{tt} \end{pmatrix} = \begin{pmatrix} \sum_w H_x^2 & \sum_w H_x H_t \\ \sum_w H_x H_t & \sum_w H_t^2 \end{pmatrix}, \qquad (4)$$

where $H_x$ and $H_t$ are the partial derivatives of I(x, t) along the spatial and temporal dimensions, respectively, and w is the window of support. The direction of gray-level change in w follows from the eigendecomposition

$$R^T \begin{pmatrix} J_{xx} & J_{xt} \\ J_{xt} & J_{tt} \end{pmatrix} R = \begin{pmatrix} \lambda_x & 0 \\ 0 & \lambda_t \end{pmatrix}, \qquad (5)$$

where $\lambda_x$ and $\lambda_t$ are the eigenvalues and R is the rotation matrix. With the help of the above equations, we can solve for the orientation angle

$$\Theta = \frac{1}{2} \arctan \frac{2 J_{xt}}{J_{xx} - J_{tt}}. \qquad (6)$$

When there is no motion in a shot, Θ is constant for all pixels. With global motion (e.g., camera translation), the gray levels of all pixels in a row change in the same direction, which results in equal or similar values of Θ. In the case of local motion, however, pixels that move independently have different orientations. This can be used to label each pixel in a column of a slice as a moving or a nonmoving pixel. The overall motion content is then the ratio of moving pixels to the total number of pixels in a slice.
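A minimal sketch of the orientation computation on one horizontal slice (a 2-D array indexed by x and t). The window summation uses a simple box filter, which is one plausible reading of the window of support; the moving/nonmoving labeling rule here (deviation from the dominant orientation) is our assumption, since the paper only states that differing orientations mark moving pixels:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def orientation_map(slice_xt, window=5):
    """Per-pixel orientation angle of Eq. (6) on a temporal slice I(x, t)."""
    Hx, Ht = np.gradient(slice_xt)           # partial derivatives along x and t
    Jxx = uniform_filter(Hx * Hx, window)    # windowed sums of Eq. (4)
    Jtt = uniform_filter(Ht * Ht, window)
    Jxt = uniform_filter(Hx * Ht, window)
    # arctan2 also handles the Jxx == Jtt case gracefully.
    return 0.5 * np.arctan2(2.0 * Jxt, Jxx - Jtt)

def motion_content(slice_xt, tol=0.1):
    """Ratio of 'moving' pixels: orientations deviating from the slice's
    dominant (global-motion) orientation by more than tol radians."""
    theta = orientation_map(slice_xt)
    return float(np.mean(np.abs(theta - np.median(theta)) > tol))
```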

III. PROPOSED METHOD

In this section, we present the film classification method, which uses the movie preview within a feature-based paradigm. To cluster video categories, we collected all movies played in Taiwan from 2004 to 2006 from [12]; many genres can be found there, such as action, adventure, comedy, crime, drama, animation, thriller, and war. According to these data, the action, drama, and thriller genres together account for almost 88% of the movies in each year. Hence, if we can classify these categories, the greater part of movies can be covered. Based on this viewpoint, we identified three major genres to demonstrate our approach: action, drama (including comedy, drama, and romance), and thriller (or horror).

After analyzing the movie distribution, we turn to how to classify movies. The general procedure divides into three steps, shown in Fig. 4. The first step is shot boundary detection. Any video sequence is a combination of several shots, where a shot is defined as a sequence of frames taken by a single camera without any major change in the color content of consecutive images; in order to analyze successive video frames, we have to segment the video sequence into its shots. The second step is feature extraction. After segmenting the video into shots, we can analyze the color, motion, and brightness of every shot and represent the shot with these low-level features. The final step is classification. There are many classification methods, such as K-means, classification trees, mean shift, AdaBoost, and neural networks; generally, they can be divided into supervised and unsupervised approaches. In our approach, the choice of classification method is based on the feature distribution.

Figure 4. General video classification diagram.

A. Color space selection

YUV is used in television broadcasting and in MPEG compression standards [13], [14]; the HSV model is commonly used in computer graphics applications and fits human color perception well. The fundamental characteristics of these two color models are briefly described in the following.

YUV is the standard format for common video compression algorithms such as MPEG-2. Digital television and DVDs preserve their compressed video streams in the MPEG-2 format, which uses a full YUV color space. The professional CCIR 601 uncompressed digital video format also uses YUV, primarily for compatibility with previous analog video standards. This stream can easily be mixed into any output format needed.

HSV defines a color space in terms of three constituent components: hue, saturation, and value. It is a nonlinear transformation of the RGB color space and may be used in color progressions. Hue ranges from 0 to 360 degrees (but is normalized to 0-100% in some applications). Saturation ranges from 0 to 100 and is sometimes called "purity," by analogy to the colorimetric quantities excitation purity and colorimetric purity; the lower the saturation of a color, the more "grayness" is present and the more faded the color appears. Value is the brightness of the color and ranges from 0 to 100. Since the function of a movie is to entertain human viewers, we adopt the YUV and HSV color spaces to extract the color information.

B. Shot boundary detection

Shot boundary detection is applied in our video processing scheme to estimate the relationship between frames. In our approach, we extend the methods of [8], [9] to detect shot boundaries using color histogram intersection in the HSV color model. First, the video is decomposed into frames, which can be treated as still images. Each frame histogram consists of 16 bins: eight for hue, four for saturation, and four for value. Let S(i) represent the intersection of the histograms $H_i$ and $H_{i-1}$ of the ith and (i-1)th frames, respectively,

$$S(i) = \sum_{j \in \text{all bins}} \min\big(H_i(j),\, H_{i-1}(j)\big). \qquad (7)$$

The magnitude of S(i) is often used as a measure of shot boundaries in related work: the values of i where S(i) is less than a fixed threshold are assumed to be the shot boundaries. However, applying a fixed threshold to S(i) generates several outliers when a shot transition occurs through a dissolve, because consecutive frames keep differing from each other until the transition is completed. To improve the accuracy, an iterative smoothing of the 1-D function S is performed first. S is smoothed iteratively with a Gaussian-like kernel whose influence varies with the signal gradient:

$$S^{t+1}(i) = S^t(i) + \lambda\,\big[c_E \cdot \nabla_E S^t(i) + c_W \cdot \nabla_W S^t(i)\big], \qquad (8)$$

where t is the iteration number and $0 < \lambda < 1/4$, with

$$\nabla_E S(i) \equiv S(i+1) - S(i), \qquad (9)$$
$$\nabla_W S(i) \equiv S(i-1) - S(i). \qquad (10)$$

The conduction coefficients are a function of the gradients and are updated at every iteration,

$$c_E^t = g(|\nabla_E S^t(i)|), \qquad (11)$$
$$c_W^t = g(|\nabla_W S^t(i)|), \qquad (12)$$

where $g(\nabla_E S) = e^{-(|\nabla_E|/k)^2}$ and $g(\nabla_W S) = e^{-(|\nabla_W|/k)^2}$. The constants were set to λ = 0.1 and k = 0.1. Finally, the shot boundaries are detected by finding the local minima of the smoothed similarity function S; a shot boundary is detected where two consecutive frames have minimum color similarity.
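A compact sketch of this pipeline, assuming frames arrive as HSV arrays with H in [0, 360) and S, V in [0, 1]; quantization details beyond the 8/4/4 bin split are our assumption:

```python
import numpy as np

def hsv_histogram(hsv_frame):
    """16-bin histogram: 8 hue, 4 saturation, 4 value bins, per Sec. III-B."""
    h, s, v = hsv_frame[..., 0], hsv_frame[..., 1], hsv_frame[..., 2]
    hist = np.concatenate([
        np.histogram(h, bins=8, range=(0, 360))[0],
        np.histogram(s, bins=4, range=(0, 1))[0],
        np.histogram(v, bins=4, range=(0, 1))[0],
    ])
    return hist / hist.sum()

def intersection_curve(frames):
    """S(i) of Eq. (7) for consecutive frames."""
    hists = [hsv_histogram(f) for f in frames]
    return np.array([np.minimum(h0, h1).sum()
                     for h0, h1 in zip(hists[:-1], hists[1:])])

def smooth(S, lam=0.1, k=0.1, iters=50):
    """Gradient-dependent iterative smoothing, Eqs. (8)-(12)."""
    S = S.astype(float).copy()
    for _ in range(iters):
        gE = np.append(S[1:] - S[:-1], 0.0)    # eastward gradient, Eq. (9)
        gW = np.append(0.0, S[:-1] - S[1:])    # westward gradient, Eq. (10)
        cE = np.exp(-(np.abs(gE) / k) ** 2)    # conduction coeffs, Eqs. (11)-(12)
        cW = np.exp(-(np.abs(gW) / k) ** 2)
        S += lam * (cE * gE + cW * gW)         # update step, Eq. (8)
    return S

def shot_boundaries(S):
    """Indices of local minima of the smoothed similarity curve."""
    return [i for i in range(1, len(S) - 1)
            if S[i] < S[i - 1] and S[i] < S[i + 1]]
```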


Figure 5. Slow moving effect from the movie Silent Hill. (a) Fade in and fade out. (b) Dissolve.


C. Visual feature extraction

The low-level features above comprise average shot length, color variance, motion content, and lighting key. In addition, we found that human feeling can be affected by the movie's tempo: when a shot changes, the movie editor uses video editing techniques to enhance the visual effect. To analyze the rhythm in more detail, we propose two visual effect features, the slow moving effect and the fast moving effect, which a film editor can create by means of shot changes.

Slow moving effect: This visual effect is a gradual change from shot to shot. Fades and dissolves are classified as this effect, because they usually need a long time duration for the shot change. A fade and a dissolve are shown in Fig. 5.

Fast moving effect: This visual effect has two or more hard changes in a short time duration, and is heavily used in action and thriller films. The frames involved have high contrast with respect to their neighboring frames. It can also be seen as several abrupt cuts occurring in a short time; in our experiments, a fast moving effect is declared when at least two abrupt cuts occur within 0.2 seconds. Two fast moving effects are shown in Fig. 6.

Figure 6. Fast moving effect. (a) A flash occurring in a short time, from the movie Silent Hill. (b) A short shot, from the movie I, Robot.

To separate the change modes, we use the brightness behavior around the change frame.

Abrupt cut: The total brightness difference is almost equal to the brightness difference at the change frame, because there is only one rapid change within the duration.

Fade: The minimum film brightness can be found within the duration, because a fade must pass between the original shot and black frames. Moreover, the brightness change is gradually increasing or decreasing.

Dissolve: This technique is hard to characterize by brightness; the brightness change can be gradually increasing, decreasing, or unchanged. If a shot change is detected and cannot be recognized as an abrupt cut or a fade, we label it a dissolve.

Since we only want to find the fast and slow moving effects, where the fast moving effect is defined by abrupt cuts occurring within 0.2 seconds and the slow moving effect includes fades and dissolves, we design a rule to classify the two change modes (denoted CH), abrupt cut and gradual change (fade or dissolve). Around the change frame, let B denote the brightness of the neighboring frames and $B_d$ the brightness difference between two consecutive frames. The classification rule is defined as

$$CH \in \text{Abrupt Cut}, \ \text{if } 0.8 < T < 1.2; \qquad CH \in \text{Gradual Change}, \ \text{otherwise}, \qquad (13)$$

where $T = \big(\max(B) - \min(B)\big) / \max(|B_d|)$. With this classification rule and the visual effect definitions, we can extract the fast and slow moving effects. Using the frame numbers detected as fast/slow moving, we calculate the distance between neighboring detected frames to obtain a new vector; quantizing this vector, we compute its histogram normalized to 1. This yields two new features, the fast and slow moving effect distributions $f_v$ and $s_v$, respectively. In our experiments, we set the total length of a visual effect distribution to 100. We further estimate the values of these two features, denoted $F_{me}$ and $S_{me}$, defined as

$$F_{me} = \sum_{l=1}^{n} l \cdot f_v(l), \qquad S_{me} = \sum_{l=1}^{n} l \cdot s_v(l), \qquad (14)$$

where n is set to 100. If one kind of visual effect never occurs in a film, we set its visual effect value to 100. The visual effect value can serve as the expected value of how often this visual effect occurs in a film: the smaller the value, the more frequently the visual effect occurs.
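A sketch of the visual effect value of Eq. (14), assuming the change frames have already been classified by Eq. (13); the quantization of frame gaps into the 100 bins follows our reading of the text:

```python
import numpy as np

def effect_value(effect_frames, n=100):
    """Expected gap between occurrences of one visual effect, Eq. (14).
    Returns 100 (the maximum) when the effect never occurs."""
    if len(effect_frames) < 2:
        return float(n)
    gaps = np.diff(sorted(effect_frames))         # distances between detections
    hist, _ = np.histogram(np.clip(gaps, 1, n), bins=n, range=(1, n + 1))
    dist = hist / hist.sum()                      # f_v or s_v, normalized to 1
    return float(np.sum(np.arange(1, n + 1) * dist))

# Frequent fast moving effects give a small value; sparse ones a larger value.
print(effect_value([10, 12, 15, 18, 30]))   # small: clustered detections
print(effect_value([10, 90]))               # larger: one big gap
```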

D. Classification method

Thus far, we have discussed the relevance of various low-level features of video data based on feature-space analysis. We choose the visual effect values and the lighting key as features and build a classifier model, as shown in Fig. 7.


Figure 7. A diagram of film classification and the corresponding decision rule.

A decision tree is a flowchart-like tree structure, where each internal node denotes a test on the corresponding attribute, each branch represents a test outcome, and the leaf nodes represent classes or class distributions. The first decision rule is based on the visual effect values, namely the fast moving effect value Fme and the slow moving effect value Sme: a non-drama film has a smaller fast moving effect value than a drama film. We construct classification trees to calculate the thresholds Tfm and Tsm used to decide whether a film is a drama or a non-drama film. For a film F(i), the decision rule is defined as


$$F(i) \in \text{Non-drama}, \ \text{if } F_{me} < T_{fm} \text{ and } S_{me} < T_{sm}; \qquad F(i) \in \text{Drama}, \ \text{otherwise}. \qquad (15)$$

The second decision rule is based on the lighting information. Although this feature is less significant than the others for distinguishing drama or comedy films, it can still be used to distinguish action from thriller films. The lighting value is denoted L. A threshold $T_l$ is calculated by means of classification trees, and action and thriller films are distinguished by the decision rule

$$F(i) \in \text{Action}, \ \text{if } L > T_l; \qquad F(i) \in \text{Thriller}, \ \text{if } L \le T_l. \qquad (16)$$

Thus, we can express all the decision rules together as

$$F(i) \in \begin{cases} \text{Action}, & \text{if } F_{me} < T_{fm} \text{ and } S_{me} < T_{sm} \text{ and } L > T_l, \\ \text{Thriller}, & \text{if } F_{me} < T_{fm} \text{ and } S_{me} < T_{sm} \text{ and } L \le T_l, \\ \text{Drama}, & \text{otherwise}. \end{cases} \qquad (17)$$
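The combined rule of Eq. (17) maps directly onto a small function; a sketch, where the threshold values are placeholders rather than the trained ones:

```python
def classify_film(Fme, Sme, L, Tfm, Tsm, Tl):
    """Decision rules of Eq. (17): drama vs. non-drama first, then
    action vs. thriller by the lighting key."""
    if Fme < Tfm and Sme < Tsm:                     # non-drama branch, Eq. (15)
        return "Action" if L > Tl else "Thriller"   # lighting rule, Eq. (16)
    return "Drama"

# Hypothetical thresholds standing in for those learned by the trees.
print(classify_film(Fme=12, Sme=20, L=0.8, Tfm=30, Tsm=40, Tl=0.5))  # Action
print(classify_film(Fme=70, Sme=80, L=0.3, Tfm=30, Tsm=40, Tl=0.5))  # Drama
```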


IV. EXPERIMENTAL RESULTS

In our experiments, we chose 44 films to test and verify the proposed method: 9 thrillers, 10 action films, and 25 dramas (including comedies). To test the performance of the visual effect detection, we chose six film previews containing in total 484 abrupt cuts and 176 gradual changes (fades and dissolves).

Figure 8 shows the variability of the average shot length. The average shot length in action films is clearly shorter than in drama and comedy films, which means action films have a faster tempo, so they can be distinguished by this feature; drama and comedy films are almost the same on this feature.

Figure 8. Variability of average shot length.

The distributions of motion content and color variance are shown in Fig. 9, where the activity in action films can be seen to be higher than in drama films. Color variance has a strong correlation with genre; in our experiments, however, it could not crisply separate the film genres, i.e., the color information is not a critical condition.

Figure 9. Distribution of motion content and color variance.

Figure 10 shows the resulting visual effect values. Action and thriller films tend toward lower fast moving effect values, because flashes and short action shots occur frequently when the fast moving effect appears; that is, the frames carrying these effects are concentrated among neighboring frames.

Figure 10. Distribution of visual effect value.

Figure 11 shows the distribution of the lighting key feature; as it suggests, the lighting factor is a comparatively weak feature for these films.

Figure 11. Distribution of lighting key.

We use precision and recall [15] to evaluate the performance of the visual effect detection. Precision and recall denote the accuracy and the ability of detection, respectively, defined by

$$\text{Recall} = \frac{N_c}{N_a} \times 100\%, \qquad (18)$$
$$\text{Precision} = \frac{N_c}{N_d} \times 100\%, \qquad (19)$$

where $N_c$, $N_a$, and $N_d$ denote the number of correct detections, the number of actual occurrences, and the number of detections, respectively. The detection results for abrupt cuts and gradual changes (including fades and dissolves) are presented in Tables I and II, respectively.

TABLE I. ABRUPT CUT DETECTION RESULT.

  Nc    Na    Nd    Precision(%)   Recall(%)
  467   484   486   96.09          96.49

TABLE II. GRADUAL CHANGE DETECTION RESULT.

  Nc    Na    Nd    Precision(%)   Recall(%)
  173   176   232   74.57          98.01

To obtain the classification result, we randomly choose two-thirds of the films in each category to train the classification tree, which yields the decision thresholds $T_{fm}$, $T_{sm}$, and $T_l$ of Eqs. (15) and (16); the remaining films are used as test data. We repeat this step six times over all the films. The accuracy of the classifier is measured by the precision [16]

$$\text{precision} = t/w, \qquad (20)$$

where t is the number of samples that actually belong to a film genre and are correctly classified, and w is the number of all samples classified as that genre. Each time the thresholds have been estimated, the decision rules are evaluated on the remaining films of the test set. The experimental results show an average precision of 73.33%; the decision accuracies for each video category are close to the precision of the decision rules.

V. CONCLUSIONS

In this paper, we have used low-level and visual features to classify film genres. The experimental results show that combining visual cues with cinematic principles provides a powerful tool for genre categorization. In the future, we will combine audio or text cues to improve the classification result. Besides, we are also interested in developing the relationships between high-level semantics and low-level information to construct a content-based movie classification system.

ACKNOWLEDGMENTS

This work was supported in part by the National Science Council of the Republic of China under Grants No. NSC 94-2213-E-007-055 and NSC 95-2221-E-150-091.

REFERENCES

[1] M. J. Pickering and S. Ruger, "Evaluation of key frame-based retrieval techniques for video," Computer Vision and Image Understanding, vol. 92, no. 2/3, pp. 217–235, Nov.–Dec. 2003.
[2] C. Y. Low, H. J. Zhang, and S. W. Smoliar, "Video parsing, retrieval and browsing: an integrated and content-based solution," in Proc. of the ACM Int. Conf. on Multimedia, 1995, pp. 15–24.
[3] T. S. Chua and L. Q. Ruan, "A video retrieval and sequencing system," ACM Trans. on Inform. Syst., vol. 13, no. 4, pp. 373–407, 1995.
[4] M. R. Naphade, T. Kristjansson, B. Frey, and T. S. Huang, "Probabilistic multimedia objects (multijects): a novel approach to video indexing and retrieval in multimedia systems," in Proc. of the IEEE Int. Conf. on Image Processing, 1998.
[5] R. Qian, N. Haering, and I. Sezan, "A computational approach to semantic event detection," in Proc. of the IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, 1999.
[6] M. J. Pickering, D. Heesch, R. O'Callaghan, S. Ruger, and D. Bull, "Video retrieval using global features in keyframes," in Proc. of the 11th Text Retrieval Conf., 2003.


[7] A. Hampapur, R. Jain, and T. Weymouth, "Digital video segmentation," in Proc. of ACM Int. Conf. on Multimedia, 1994, pp. 357–364.
[8] A. F. Smeaton, P. Over, and W. Kraaij, "Evaluation campaigns and TRECVid," in Proc. of the 8th ACM Int. Workshop on Multimedia Inform. Retrieval, 2006.
[9] Z. Rasheed, Y. Sheikh, and M. Shah, "On the use of computable features for film classification," IEEE Trans. on Circuits and Systems for Video Technology, vol. 15, no. 1, pp. 52–63, Jan. 2005.
[10] Pacific Cinematheque, Lighting in Filmmaking. http://www.inpoint.org/.
[11] B. Jähne, Spatio-temporal Image Processing: Theory and Scientific Applications. New York: Springer-Verlag, 1991.
[12] http://www.truemovie.com.
[13] Y. Wang, J. Ostermann, and Y. Q. Zhang, Video Processing and Communications. Prentice Hall, 2001.
[14] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2nd ed. Prentice Hall, 2002.
[15] G. Lupatini, C. Saraceno, and R. Leonardi, "Scene break detection: a comparison," in Proc. of IEEE Int. Workshop on Research Issues in Data Engineering, Feb. 1998, pp. 34–41.
[16] Y. Yuan, Q. B. Song, and J. Y. Shen, "Automatic video classification using decision tree method," in Proc. of the First Int. Conf. on Machine Learning and Cybernetics, 2002, pp. 1153–1157.

Hui-Yu Huang received the BS degree in electronic engineering from Feng Chia University, Taiwan, in 1992, and the MS degree in electrical and computer engineering from Yuan-Ze Institute of Technology and the PhD degree in electrical engineering from National Tsing Hua University, Taiwan, in 1994 and 2002, respectively. Since 2005, she has been with the Department of Computer Science and Information Engineering at National Formosa University in Taiwan, where she is now an assistant professor. Her research interests include multimedia processing, neural networks, pattern recognition, and content-based image/video retrieval. Dr. Huang is a member of the IEEE and the Chinese Association of Image Processing and Pattern Recognition.

Weir-Sheng Shih received the BS degree in aeronautical engineering from National Formosa University, Taiwan, in 2003, and the MS degree in electrical engineering from National Tsing Hua University, Taiwan, in 2005. His research interests include image/video coding and multimedia systems.

Wen-Hsing Hsu received the BS degree in electrical engineering from National Cheng Kung University, Taiwan, in 1972, and the ME and PhD degrees in electrical engineering from Keio University, Japan, in 1978 and 1982, respectively. In 1982, he joined the Department of Electrical Engineering at National Tsing Hua University in Taiwan, where he is now a professor. His research interests include image processing, biometric identification, and network security. Dr. Hsu is a member of the Institute of Electronics, Information and Communication Engineers (IEICE), the Information Processing Society of Japan, and the Chinese Association of Image Processing and Pattern Recognition.


