Kernel Postprocessing of Multispectral Images


Marcin Michalak and Adam Świtoński

Abstract. Multispectral analysis is one of the possible ways of detecting skin diseases. This short paper describes a nonparametric way of multispectral image postprocessing that improves the quality of the obtained pictures. The method below may be described as a regressional approach because it uses a kernel regression function estimator as its essence. The algorithm called HASKE was developed as a time series predictor; its simplification may be used for the postprocessing of multispectral images.

Keywords: Multispectral image analysis, nonparametric regression, machine learning, HASKE.

1 Introduction

Every color may be described as a finite set of real values. Depending on the features we choose, we obtain models like RGB, HSV and many others. When the number of color features, called channels, increases, we speak of multispectral or hyperspectral color definitions. We assume that increasing the number of channels will give us more information about every pixel; in other words, the discernibility of different pixels will be higher. This article describes a kernel method of multispectral image postprocessing and points out the direction of its development towards a scalable postprocessing algorithm that would also give high-quality results.

Marcin Michalak
Central Mining Institute, Plac Gwarkow 1, 40-166 Katowice, Poland
e-mail: [email protected]

Marcin Michalak · Adam Świtoński
Silesian University of Technology, ul. Akademicka 16, 44-100 Gliwice, Poland
e-mail: {Marcin.Michalak,Adam.Switonski}@polsl.pl

Adam Świtoński
Polish-Japanese Institute of Information Technology, ul. Koszykowa 86, 02-008 Warszawa, Poland

R. Burduk et al. (Eds.): Computer Recognition Systems 4, AISC 95, pp. 395–401.
springerlink.com © Springer-Verlag Berlin Heidelberg 2011


It is common that measured samples (spectra) of the same color may differ due to the distance from the camera or differences in lighting density. If a reference point is given, for example the correct spectra of a finite number of colors, it should be possible to describe the dependence between the observed spectrum and the expected one.

The next chapter introduces the research context of the presented paper. Then some theoretical aspects are described, especially the definition of the HASKE algorithm, which became the essential part of the kernel postprocessing. Afterwards, the experiments and their results are described. The paper ends with some conclusions and possible perspectives of further work.

2 Previous Research

Multispectral analysis is becoming more popular as a tool for skin disease diagnosis [6][1]. In the paper [8] a similar problem is raised: multispectral images for skin tumour tissue detection. On the basis of acquired skin images and tumor regions pointed out by medical experts, some popular classifiers were built. Simultaneously, another point of interest was defined: how does the measuring device influence the obtained images? It turned out that for a well-defined color (a color with its known spectrum) a wide range of spectra were obtained from the device (the device is shown in Fig. 1).

Fig. 1 Acquiring device.

In the paper [4] two approaches to multispectral image postprocessing were proposed. The first of them (called classificational) gives quite good results but has the disadvantage of not being scalable to colors that did not participate in building the classifier. The second one (called regressional) is scalable, but does not give satisfying results. This paper shows the results of improving the regressional approach.


Fig. 2 Colors from the ColorChecker.

3 Regressional Postprocessing

For the further reading of the paper the following notions are defined: specimen - the real spectrum of a colour from the GretagMacbeth ColorChecker; sample - a measured spectrum from the ColorChecker; (color) profile - the aggregated information about all samples of the same colour from the ColorChecker.

The regressional approach to multispectral image postprocessing is based on the assumption that there exists some unknown regression function between the observed spectrum and the real one (close to the specimen). This dependence may be one of the following:

• "1 vs 1": describes how the c-th color of the specimen depends on the c-th channel of the original spectrum;
• "1 vs all": describes how the c-th color of the specimen depends on all image channels;
• "1 vs all ext.": an extension of the previous model with the usage of additional input variables like the spectrum integral.
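The three dependency models above differ only in how the input matrix for predicting a single specimen channel is built. A minimal sketch, assuming hypothetical spectra arrays and using a plain channel sum as the "spectrum integral" extra variable (the paper does not specify how the integral is computed):

```python
import numpy as np

# Hypothetical data: n measured spectra (rows) with q channels each,
# and the corresponding specimen spectra of the same shape.
rng = np.random.default_rng(0)
n, q = 100, 21
measured = rng.random((n, q))
specimen = rng.random((n, q))

def model_inputs(measured, c, mode):
    """Build the input matrix for predicting specimen channel c."""
    if mode == "1 vs 1":          # c-th output depends on the c-th input only
        return measured[:, [c]]
    if mode == "1 vs all":        # c-th output depends on all channels
        return measured
    if mode == "1 vs all ext.":   # all channels plus an extra variable,
        integral = measured.sum(axis=1)  # crude spectrum integral (assumption)
        return np.column_stack([measured, integral])
    raise ValueError(mode)

X = model_inputs(measured, c=5, mode="1 vs all ext.")
print(X.shape)  # (100, 22): 21 channels + 1 integral feature
```

The target vector for each model is simply `specimen[:, c]`; only the input representation changes.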

3.1 Nadaraya–Watson Estimator

The Nadaraya–Watson estimator [5][10] is defined in the following way:

\tilde{f}(x) = \frac{\sum_{i=1}^{n} y_i K\left(\frac{x - x_i}{h}\right)}{\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)}    (1)

where \tilde{f}(x) denotes the estimator of the f(x) value, n is the number of training pairs (x, y), K is a kernel function and h is the smoothing parameter. As described in [7] and [9], the selection of h is more important than the selection of the kernel function. It may occur that small values of h cause the estimator to fit the data too closely. On the other hand, big values of this parameter lead to an estimator that oversmooths the dependencies in the analysed set.
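Eq. (1) is a weighted average of the training responses, with weights decaying in the distance from the query point. A minimal one-dimensional sketch with a Gaussian kernel (a toy sine example, not data from the paper):

```python
import numpy as np

def gaussian_kernel(u):
    # Standard Gaussian kernel: (1/sqrt(2*pi)) * exp(-u^2 / 2)
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def nadaraya_watson(x, x_train, y_train, h, kernel=gaussian_kernel):
    """Nadaraya-Watson estimate at point x (Eq. 1)."""
    w = kernel((x - x_train) / h)        # kernel weight of each training point
    return np.sum(w * y_train) / np.sum(w)

# Toy example: recover a smooth trend from noisy samples.
rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 200)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.1, 200)

est = nadaraya_watson(0.25, x_train, y_train, h=0.05)
print(round(est, 2))  # close to sin(pi/2) = 1.0
```

Shrinking `h` makes the estimate chase the noise; growing it flattens the sine, which is exactly the over/undersmoothing trade-off discussed above.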


One of the most popular methods of evaluating the h parameter is the analysis of an approximation of the Mean Integrated Squared Error (MISE):

MISE(h) = \int E\{\tilde{f}_h(x) - f(x)\}^2 \, dx    (2)

The decomposition of the above formula, with the introduction of the Integrated Squared Bias (ISB) and the Integrated Variance (IV), leads to the Asymptotic Mean Integrated Squared Error (AMISE):

AMISE(h) = \frac{R(K)}{nh} + \frac{1}{4} \sigma_K^4 h^4 R(f'')    (3)

where R(L) = \int L^2(x)\,dx and \sigma_K^{\alpha} = \int_{-\infty}^{\infty} x^{\alpha} K(x)\,dx. Optimizing the AMISE with respect to h gives:

h_0 = \left(\frac{R(K)}{\sigma_K^4 R(f'')}\right)^{1/5} n^{-1/5}    (4)

The value of the expression R(K) depends on the chosen kernel function K, but the value of R(f'') is unknown, so it is replaced by some estimator. Further simplification leads to the following rule of thumb, where \tilde{\sigma} is the sample standard deviation and \tilde{R} the sample interquartile range:

h_0 = 1.06 \min(\tilde{\sigma}, \tilde{R}/1.34)\, n^{-1/5}    (5)
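Eq. (5) is straightforward to compute from a sample. A short sketch (the synthetic standard-normal sample is only for illustration):

```python
import numpy as np

def rule_of_thumb_h(x):
    """Bandwidth from Eq. (5): h0 = 1.06 * min(sigma, R/1.34) * n**(-1/5),
    with sigma the sample standard deviation and R the interquartile range."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sigma = x.std(ddof=1)
    iqr = np.subtract(*np.percentile(x, [75, 25]))  # p75 - p25
    return 1.06 * min(sigma, iqr / 1.34) * n ** (-1 / 5)

rng = np.random.default_rng(2)
sample = rng.normal(0, 1, 1000)
print(round(rule_of_thumb_h(sample), 3))
```

For a standard normal sample of this size the result lands near 1.06 · n^{-1/5} ≈ 0.27, since both σ and R/1.34 are close to 1.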

Details of the derivations can be found in [7]. More advanced methods of h evaluation can be found in [9].

One of the most popular kernels is the Epanechnikov kernel [2]. If V_q denotes the volume of the q-dimensional unit ball in R^q:

V_q = \begin{cases} \dfrac{(2\pi)^{q/2}}{2 \cdot 4 \cdot \ldots \cdot q} & \text{for } q \text{ even} \\[2mm] \dfrac{2\,(2\pi)^{(q-1)/2}}{1 \cdot 3 \cdot \ldots \cdot q} & \text{for } q \text{ odd} \end{cases}

then the Epanechnikov kernel is defined as:

K(x) = \frac{q+2}{2V_q}\left(1 - \|x\|^2\right), \quad \|x\| \le 1
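The parity formula for V_q can be checked against the closed form V_q = π^{q/2}/Γ(q/2 + 1). A small sketch (the squared-norm form of the kernel is assumed, for consistency with the one-dimensional 3/4·(1 − x²)):

```python
import math

def unit_ball_volume(q):
    """Volume V_q of the q-dimensional unit ball, parity formula from the text."""
    if q % 2 == 0:
        return (2 * math.pi) ** (q // 2) / math.prod(range(2, q + 1, 2))  # 2*4*...*q
    return 2 * (2 * math.pi) ** ((q - 1) // 2) / math.prod(range(1, q + 1, 2))  # 1*3*...*q

def epanechnikov(x_norm_sq, q):
    """Multivariate Epanechnikov kernel evaluated at a point with squared norm x_norm_sq."""
    if x_norm_sq > 1:
        return 0.0
    return (q + 2) / (2 * unit_ball_volume(q)) * (1 - x_norm_sq)

# Sanity check against the closed form V_q = pi^(q/2) / Gamma(q/2 + 1).
for q in range(1, 7):
    closed = math.pi ** (q / 2) / math.gamma(q / 2 + 1)
    assert abs(unit_ball_volume(q) - closed) < 1e-12

print(round(unit_ball_volume(3), 4))  # 4*pi/3 ≈ 4.1888
```

For q = 1 this reduces to the familiar 3/4·(1 − x²) on [−1, 1].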

Other popular kernel functions are listed in Table 1, where I(A) denotes the indicator function of the set A.

3.2 HASKE Algorithm

The Heuristic Adaptive Smoothing parameter Kernel Estimator (HASKE) [3] was developed as a kernel time series predictor. As the first step of the algorithm, a mapping between the time series domain (t, x_t) and the regression domain (x_t, x_{t+p})


Table 1 Popular kernel functions.

kernel        one-dimensional                              multidimensional
Epanechnikov  K(x) = (3/4)(1 - x^2) I(-1 < x < 1)          K(x) = ((q+2)/(2V_q))(1 - ||x||^2) I(||x|| < 1)
Uniform       K(x) = (1/2) I(-1 < x < 1)                   K(x) = (1/V_q) I(||x|| < 1)
Triangular    K(x) = (1 - |x|) I(-1 < x < 1)               K(x) = ((q+1)/V_q)(1 - ||x||) I(||x|| < 1)
Biweight      K(x) = (15/16)(1 - x^2)^2 I(-1 < x < 1)      K(x) = ((q+2)(q+4)/(8V_q))(1 - ||x||^2)^2 I(||x|| < 1)
Gaussian      K(x) = (2*pi)^(-1/2) exp(-x^2/2)             K(x) = (2*pi)^(-q/2) exp(-||x||^2/2)

is performed. The p parameter is usually the period of the time series. After the mapping, the final prediction proceeds as a regression function estimation task that is divided into two steps.

Before the first step, the training set is divided into a smaller training set and a tuning set. Then the value of the smoothing parameter h, calculated on the smaller training set with Eq. (5), is modified with the usage of a parameter µ: h' = µh. The value of the µ parameter that gives the smallest error on the tuning set is considered as the optimal one¹.

To avoid underestimation (or overestimation) of the result, a second HASKE parameter is introduced. The underestimation αᵢ is defined as the quotient of the estimated value ỹᵢ and the real value yᵢ. Then the final underestimation α is defined as the median of the tuning set underestimation values, calculated with the usage of the h' smoothing parameter and the Nadaraya–Watson estimator.

After the first step, when the two new parameters µ and α have been calculated, the final regression step is performed: the smoothing parameter is calculated on the basis of the (bigger) training set, then its value is multiplied by µ and the result of the Nadaraya–Watson estimator is divided by α.

For the purpose of using HASKE for simple regression there is no need to transform the data from one space to another; only the division of the training set into train and tuning sets must be performed. On the basis of µ and α the final formula of the regression takes the following form:

\tilde{f}(x_t) = \frac{1}{\alpha} \cdot \frac{\sum_{i=1}^{n} y_i K\left(\frac{x_i - x_t}{h\mu}\right)}{\sum_{i=1}^{n} K\left(\frac{x_i - x_t}{h\mu}\right)}    (6)
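The two-step procedure above can be sketched for the simple-regression case. This is a minimal illustration under several assumptions (Gaussian kernel, an 80/20 train/tune split, a hypothetical grid of µ candidates, toy linear data), not the authors' exact implementation:

```python
import numpy as np

def nw(x, xs, ys, h):
    # Nadaraya-Watson estimate with a Gaussian kernel (Eq. 1)
    w = np.exp(-0.5 * ((x - xs) / h) ** 2)
    return np.sum(w * ys) / np.sum(w)

def haske_fit(x_train, y_train, mus=np.linspace(0.2, 3.0, 15), seed=0):
    """Simplified HASKE: tune mu on a held-out set, then take the median
    underestimation alpha. Split ratio and mu grid are assumptions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x_train))
    cut = int(0.8 * len(idx))
    tr, tu = idx[:cut], idx[cut:]
    # Rule-of-thumb bandwidth (Eq. 5) on the smaller training set
    iqr = np.subtract(*np.percentile(x_train[tr], [75, 25]))
    h = 1.06 * min(x_train[tr].std(ddof=1), iqr / 1.34) * len(tr) ** (-0.2)
    best_mu, best_err, best_pred = None, np.inf, None
    for mu in mus:  # pick mu that minimises the tuning-set error
        pred = np.array([nw(x, x_train[tr], y_train[tr], h * mu)
                         for x in x_train[tu]])
        err = np.mean((pred - y_train[tu]) ** 2)
        if err < best_err:
            best_mu, best_err, best_pred = mu, err, pred
    alpha = np.median(best_pred / y_train[tu])  # median underestimation
    return h, best_mu, alpha

def haske_predict(x, x_train, y_train, h, mu, alpha):
    return nw(x, x_train, y_train, h * mu) / alpha  # Eq. (6)

rng = np.random.default_rng(3)
xs = np.linspace(0, 1, 300)
ys = 2 * xs + 1 + rng.normal(0, 0.05, 300)
h, mu, alpha = haske_fit(xs, ys)
print(round(haske_predict(0.5, xs, ys, h, mu, alpha), 1))  # close to 2.0
```

For clean linear data α stays near 1; its effect shows up when the estimator systematically under- or overshoots, as with the spectra in the experiments below.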

4 Experiments and Results

The data set contained 31,456 observed spectra of 24 colors. Four of them were represented by 2130 instances, one was represented by 1238 instances and the rest

¹ For time series it is a little more complicated and the notion of a time series phase is introduced; details are described in [3].


of them were represented by 1142 instances. The specimen spectra had 36 channels and were interpolated linearly into 21 channels (the domain of the acquisition device spectra). The data set was divided randomly into a train set (28,311 objects) and a test set (3145 objects). As the quality measure the root of the mean value of the squared absolute error (RMSAE) was used:

RMSAE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\tilde{c}_i - c_i}{c_i}\right)^2}    (7)

All three models of data dependence were examined: "1 vs 1", "1 vs all", "1 vs all ext.". For each of the models three regressors were used: Nadaraya–Watson (NW), HASKE without the underestimation step (µHASKE) and full HASKE. The comparison of the results is shown in Table 2.

Table 2 Postprocessing results.

model           NW       µHASKE   HASKE
1 vs 1          102.8%   93.2%    93.24%
1 vs all        100.0%   130.2%   236.4%
1 vs all ext.   100.0%   46.7%    42.3%
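Eq. (7) is a root mean squared relative error. A short sketch of the measure on toy values (not data from the experiments):

```python
import numpy as np

def rmsae(c_est, c_true):
    """Root of the mean squared relative error, Eq. (7)."""
    c_est = np.asarray(c_est, dtype=float)
    c_true = np.asarray(c_true, dtype=float)
    return np.sqrt(np.mean(((c_est - c_true) / c_true) ** 2))

# Toy check: a uniform 10% overestimation yields RMSAE = 0.1.
true = np.array([1.0, 2.0, 4.0])
est = 1.1 * true
print(round(rmsae(est, true), 3))  # 0.1
```

Because each error is divided by the true value c_i, channels with small true intensities weigh as much as bright ones, which suits spectra spanning very different magnitudes.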

We may see that the usage of HASKE gives different results for the different models of regressional dependency. The best results may be observed for the third one. In this case it is also visible that the successive steps of HASKE give better results than the typical regression with the Nadaraya–Watson estimator.

5 Conclusions and Further Works

Referring to the results of the previous research [4], in this paper the regressional approach to multispectral image postprocessing was developed. The usage of the time-series-dedicated HASKE algorithm gave better results than the non-modified Nadaraya–Watson kernel estimator. The adaptive modification of the smoothing parameter value, which is the essential part of HASKE, should be regarded as the main reason for the improvement of the results. This means that further works should focus on the problem of h parameter evaluation. It also turned out that adaptive methods of smoothing parameter evaluation should be applied for the third model of spectra dependency representation.

It may also be interesting to expand this model with several new variables like the typical statistics: mean, maximal and minimal value, standard deviation, etc. Another interesting question is which spectra channels provide most of the color information; in other words, which channels are the least informative and only make the models more complicated.


Acknowledgments. This work was financed from the Polish Ministry of Science and Higher Education resources in the years 2009–2012 as a research project.

References

[1] Blum, A., Zalaudek, I., Argenziano, G.: Digital Image Analysis for Diagnosis of Skin Tumors. Seminars in Cutaneous Medicine and Surgery 27(1), 11–15 (2008)
[2] Epanechnikov, V.A.: Nonparametric Estimation of a Multivariate Probability Density. Theory of Probab. and its Appl. 14, 153–158 (1969)
[3] Michalak, M.: Time series prediction using new adaptive kernel estimators. Adv. in Intell. and Soft Comput. 57, 229–236 (2009)
[4] Michalak, M., Świtoński, A.: Spectrum evaluation on multispectral images by machine learning techniques. In: Bolc, L., Tadeusiewicz, R., Chmielewski, L.J., Wojciechowski, K. (eds.) ICCVG 2010. LNCS, vol. 6375, pp. 126–133. Springer, Heidelberg (2010)
[5] Nadaraya, E.A.: On estimating regression. Theory of Probab. and its Appl. 9, 141–142 (1964)
[6] Prigent, S., Descombes, X., Zugaj, D., Martel, P., Zerubia, J.: Multi-spectral image analysis for skin pigmentation classification. In: Proc. of IEEE Int. Conf. on Image Process. (ICIP), pp. 3641–3644 (2010)
[7] Silverman, B.W.: Density Estimation for Statistics and Data Analysis. Chapman & Hall, Boca Raton (1986)
[8] Świtoński, A., Michalak, M., Josiński, H., Wojciechowski, K.: Detection of tumor tissue based on the multispectral imaging. In: Bolc, L., Tadeusiewicz, R., Chmielewski, L.J., Wojciechowski, K. (eds.) ICCVG 2010. LNCS, vol. 6375, pp. 325–333. Springer, Heidelberg (2010)
[9] Turlach, B.A.: Bandwidth Selection in Kernel Density Estimation: A Review. C.O.R.E. and Institut de Statistique, Université Catholique de Louvain (1993)
[10] Watson, G.S.: Smooth Regression Analysis. Sankhya - The Indian J. of Stat. 26, 359–372 (1964)
