Analysis of Kernel Approach in Fuzzy-Based Image Classifications

Ishuita Sengupta 1, Mragank Singhal 2

1 Department of Computer Applications, Teerthanker Mahaveer University, Moradabad
2 Department of Computer Applications, Teerthanker Mahaveer University, Moradabad

1 ishuitasengupta8@gmail.com
2 mraganksinghal@gmail.com

Abstract- This paper presents a framework for the kernel approach to fuzzy-based image classification in remote sensing. The goal of image classification is to separate images according to their visual content into two or more disjoint classes. Fuzzy logic is a relatively young theory. Its major advantage is that it allows problems to be described naturally, in linguistic terms, rather than in terms of relationships between precise numerical values. This paper describes how remote sensing data with uncertainty are handled by fuzzy-based classification using the kernel approach for generating land use/land cover maps. The introduction of fuzzification using the kernel approach provides a basis for developing more robust approaches to the remote sensing classification problem. A kernel explicitly defines a similarity measure between two samples and implicitly represents the mapping of the input space to the feature space.

Keywords – Support Vector Machine (SVM), Multispectral, Hyperspectral, Image Classification, Sub-Pixel Classification, Fuzzy Sets.

I.INTRODUCTION

Fuzzy logic is a relatively young theory. Its major advantage is that it allows problems to be described naturally, in linguistic terms, rather than in terms of relationships between precise numerical values. This ability to deal with complicated systems in a simple way is the main reason why fuzzy logic is widely applied in technical fields. It is also possible to classify a remotely sensed image (as well as any other digital imagery) in such a way that certain land cover classes are clearly represented in the resulting image. Modern remote sensing image archives contain large sets of heterogeneous data, e.g. multisensor, multispectral, and multitemporal.

Fuzzy set theory (Zadeh, 1965), which was triggered by these considerations, provides a conceptual framework for solving knowledge representation and classification problems in an ambiguous environment. The Fuzzy concept has been adopted in different fields such as fuzzy logic control (Wang et al., 2005; Lam and Leung, 2007), fuzzy neural networks (Change et al., 2005; Mail and Mitra, 2005), and fuzzy rule base (Bardossy and Samaniego, 2002; Pal et al., 2005). The fuzzy concept is also a valuable tool for dealing with classification problems. In remote sensing classification, fuzzy-based classifiers are becoming increasingly popular.

II.IMAGE CLASSIFICATION

Remote sensing is a wide branch of science with tremendous development in the last several decades. Over two dozen optical satellites are currently in orbit doing earth imaging.

Image classification is the process of converting multispectral (or hyperspectral) images into patterns of varying gray levels or assigned colors. In unsupervised classification, these patterns represent clusters of statistically different sets of multiband data, some of which can be correlated with separable classes, features, or materials. In supervised classification, they represent numerical discriminators composed of sets of data that have been grouped and specified by associating each with a particular class whose identity is known independently and which has representative areas (training sites) within the image.

Classification can be considered as a useful representation in most of decision problems, simplifying information by means of an informative scheme of the main issues to be taken into account.

Classification is commonly performed one image at a time: training areas are collected for each individual image, and a supervised hyperspectral classification of that image is then carried out.

The information contained in hyperspectral data allows the characterization, identification, and classification of land covers with improved accuracy and robustness. In the remote sensing literature, many supervised and unsupervised methods have been developed for multi- and hyperspectral image classification (e.g. maximum likelihood classifiers, neural networks, neuro-fuzzy models, etc.). However, an important problem in the context of hyperspectral data is the high number of spectral bands combined with a relatively low number of labeled training samples, which gives rise to the well-known Hughes phenomenon. This problem is usually reduced by introducing a feature selection/extraction step before training the hyperspectral classifier, with the basic objective of reducing the high input dimensionality. However, including such a step is time-consuming, scenario-dependent, and sometimes requires a priori knowledge.

III.FUZZY BASED IMAGE CLASSIFICATION

Fuzzy-rough set theory is a hybridization of rough sets and fuzzy sets, which is capable of dealing with imprecision and uncertainty in data. As a hybridization of fuzzy set theory and rough sets, fuzzy-rough sets not only inherit the domain independence of rough sets, but also address the inability of rough sets in handling real-valued data. That is, fuzzy-rough sets provide a means to deal with discrete or real-valued noisy data (or a mixture of both) without the need for user-supplied thresholding or domain information.

Generally the fuzzy logic is based upon supervised image classification. The concept of maximum likelihood along with fuzzy inference system is the key feature in classification. Fuzzy inference is the process of formulating the mapping of image input with that of the output using fuzzy logic.

Figure 1. Classification of image input on the basis of RGB color pixel values

IV.KERNEL METHODS

In recent years, kernel methods, such as support vector machines (SVMs) or kernel Fisher discriminant analysis, have demonstrated excellent performance in hyperspectral data classification in terms of accuracy and robustness. The properties of kernel methods make them well-suited to tackle the problem of hyperspectral image classification since they can handle large input spaces efficiently, work with a relatively low number of labeled training samples, and deal with noisy samples in a robust way.

The good classification performance demonstrated by kernel methods using the spectral signature as input features could be further increased by including contextual (or even textural) information in the classifier, something that has been successfully illustrated in other classification algorithms (EM, k-Nearest Neighbor classifiers, neural networks, etc.). However, to the authors' knowledge, kernel methods have so far taken into account only the spectral information to develop the classifier, and thus, the spatial variability of the spectral signature has not been considered.

Kernel methods have the ability to deal with nonlinear models by mapping a given (lower-dimensional) input space onto a new (higher-dimensional) space via a non-linear transformation, in which the structure of the classification task is then linearly separable.

V.KERNELS IN SVM

SVMs are designed to solve two-class problems. Two approaches can be used for an M-class problem. In the first, called one against all, M classifiers are applied, each separating one class from all the others. In the second, called one against one, M(M-1)/2 classifiers are applied, one for each pair of classes, and the most frequently assigned label is kept for each vector. When linear separation is impossible in the original space, the SVM uses a kernel function to map the training data into a higher-dimensional space, which generalizes the SVM to nonlinear decision surfaces. The method consists of projecting the data into a higher-dimensional space where they are assumed to become linearly separable. An SVM applied in this space determines nonlinear surfaces in the original space. In practice, the projection need not be computed explicitly; it can be simulated using a kernel function (Grégoire et al, 2003).
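The two multi-class strategies can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: `pairwise_decide` is a hypothetical stand-in for a trained two-class SVM, and the class names and centre values are made up.

```python
from collections import Counter
from itertools import combinations


def num_binary_classifiers(m, strategy):
    """Number of two-class SVMs needed for an m-class problem."""
    if strategy == "one_against_all":
        return m
    if strategy == "one_against_one":
        return m * (m - 1) // 2
    raise ValueError("unknown strategy: " + strategy)


def one_against_one_predict(x, classes, pairwise_decide):
    """Label x by majority vote over all pairs of classes.

    pairwise_decide(x, a, b) is a hypothetical stand-in for a trained
    two-class SVM; it must return either a or b.
    """
    votes = Counter(pairwise_decide(x, a, b) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]


# Made-up class centres on a single feature axis (illustration only).
centres = {"water": 0.1, "forest": 0.5, "urban": 0.9}


def nearest_centre(x, a, b):
    """Toy pairwise rule: pick whichever of the two classes has the nearer centre."""
    return a if abs(x - centres[a]) <= abs(x - centres[b]) else b


label = one_against_one_predict(0.55, list(centres), nearest_centre)
print(label, num_binary_classifiers(len(centres), "one_against_one"))
```

With three classes, one against one needs three pairwise classifiers, and the label that wins the most pairwise votes is kept.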

Every function K(⋅,⋅) that satisfies Mercer's conditions may be considered an eligible kernel. Mercer's conditions state that, for every function g(x) such that

$\int g(x)^2 \, dx$

is finite,

$\iint K(x, y)\, g(x)\, g(y)\, dx\, dy \ge 0.$
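A finite-sample analogue of this condition can be checked numerically: for sample points x_1, ..., x_n and arbitrary weights g_1, ..., g_n, the double sum of K(x_i, x_j) g_i g_j should never be negative. The sketch below does this for the radial basis kernel on one-dimensional inputs; the sample points and weights are random and purely illustrative, and passing such a check does not prove that a function is a Mercer kernel.

```python
import math
import random


def rbf(x, y):
    """Radial basis kernel exp(-(x - y)^2) for one-dimensional inputs."""
    return math.exp(-(x - y) ** 2)


def quadratic_form(kernel, points, g):
    """Discrete analogue of the double integral: sum_i sum_j K(x_i, x_j) g_i g_j."""
    return sum(
        kernel(xi, xj) * gi * gj
        for xi, gi in zip(points, g)
        for xj, gj in zip(points, g)
    )


random.seed(0)
points = [random.uniform(-2.0, 2.0) for _ in range(8)]

# Try many random weight vectors and keep the smallest value of the form.
values = [
    quadratic_form(rbf, points, [random.uniform(-1.0, 1.0) for _ in points])
    for _ in range(200)
]
min_value = min(values)
print(min_value >= -1e-9)
```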

A great number of kernels exist, and it is difficult to explain their individual characteristics. The kernels used in this work are known as local kernels, global kernels and spectral kernels, described as follows.

Local kernels: Only data that are close to each other have an influence on the kernel values. Basically, all kernels that are based on a distance function are local kernels. Examples of typical local kernels are:

Gaussian:

$K(x, x_i) = \exp\left(-0.5\,(x - x_i)\,A^{-1}\,(x - x_i)^T\right)$

where A takes one of the following three norms:

$A = I$ (Euclidean norm)

$A = D_j^{-1}$ (diagonal norm)

$A = C_j^{-1}$ (Mahalanobis norm)

Radial basis:

$K(x, x_i) = \exp\left(-\lVert x - x_i \rVert^2\right)$

KMOD:

$K(x, x_i) = \exp\left(\frac{1}{1 + \lVert x - x_i \rVert^2}\right) - 1$

Inverse multiquadric:

$K(x, x_i) = \frac{1}{\sqrt{\lVert x - x_i \rVert^2 + 1}}$
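The local behaviour of these kernels can be seen in a small Python sketch. It assumes the forms exp(-0.5 ||x - xi||^2) for the Gaussian with A = I, exp(-||x - xi||^2) for the radial basis kernel, exp(1 / (1 + ||x - xi||^2)) - 1 for KMOD and 1 / sqrt(||x - xi||^2 + 1) for the inverse multiquadric; the three-band sample vectors are made up for illustration.

```python
import math


def sq_dist(x, xi):
    """Squared Euclidean distance ||x - xi||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, xi))


def gaussian_euclidean(x, xi):
    """Gaussian kernel with A = I (Euclidean norm)."""
    return math.exp(-0.5 * sq_dist(x, xi))


def radial_basis(x, xi):
    return math.exp(-sq_dist(x, xi))


def kmod(x, xi):
    return math.exp(1.0 / (1.0 + sq_dist(x, xi))) - 1.0


def inverse_multiquadric(x, xi):
    return 1.0 / math.sqrt(sq_dist(x, xi) + 1.0)


# Made-up three-band samples: one close to x, one far from it.
x = (0.2, 0.4, 0.6)
near = (0.25, 0.4, 0.55)
far = (5.0, 5.0, 5.0)

local_kernels = (gaussian_euclidean, radial_basis, kmod, inverse_multiquadric)
for kernel in local_kernels:
    print(kernel.__name__, kernel(x, near), kernel(x, far))
```

For each kernel the value for the nearby sample is much larger than the value for the distant one, which is what makes these kernels local.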

Global kernels: Samples that are far away from each other still have an influence on the kernel value. All kernels based on the dot product are global. Examples of typical global kernels are:

Polynomial:

$K(x, x_i) = (x \cdot x_i + 1)^p$

Sigmoid:

$K(x, x_i) = \tanh(x \cdot x_i + 1)$
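As a minimal sketch, the polynomial kernel (x.xi + 1)^p and the sigmoid kernel tanh(x.xi + 1) can be written in Python as follows. The sample vectors are made up, and the degree p is left as a parameter because the text does not fix it.

```python
import math


def dot(x, xi):
    """Dot product x . xi."""
    return sum(a * b for a, b in zip(x, xi))


def polynomial(x, xi, p):
    """Polynomial kernel (x . xi + 1)^p of degree p."""
    return (dot(x, xi) + 1.0) ** p


def sigmoid(x, xi):
    """Sigmoid kernel tanh(x . xi + 1)."""
    return math.tanh(dot(x, xi) + 1.0)


x = (1.0, 2.0)
near = (1.1, 2.0)
far = (30.0, 40.0)

poly_value = polynomial(x, (3.0, 4.0), 2)   # (1*3 + 2*4 + 1)^2
poly_near = polynomial(x, near, 1)
poly_far = polynomial(x, far, 1)
sig_value = sigmoid(x, near)
print(poly_value, poly_near, poly_far, sig_value)
```

Unlike a distance-based kernel, the polynomial value for the distant sample does not decay toward zero, so distant samples keep influencing the kernel value.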

Spectral kernels: The local kernels are based on a quadratic distance evaluation between two samples. To suit the hyperspectral point of view, it is of interest to consider new criteria that take the concept of a spectral signature into account.

Spectral angle (SA) α(x, xi) is defined in order to measure the spectral difference between x and xi while being robust to differences of the overall energy (e.g. illumination, shadows etc.) (Grégoire et al, 2003).

Spectral angle (SA): $\alpha(x, x_i) = \arccos\left(\frac{x \cdot x_i}{\lVert x \rVert \, \lVert x_i \rVert}\right)$.
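The robustness of the spectral angle to overall energy can be checked with a short Python sketch, assuming the usual definition arccos(x.xi / (||x|| ||xi||)). The spectra are made-up three-band values; the "shadowed" spectrum is simply the "sunlit" one scaled by 0.5.

```python
import math


def spectral_angle(x, xi):
    """Angle (in radians) between two spectra, arccos(x . xi / (||x|| ||xi||))."""
    dot = sum(a * b for a, b in zip(x, xi))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_xi = math.sqrt(sum(b * b for b in xi))
    cosine = dot / (norm_x * norm_xi)
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, cosine))
    return math.acos(cosine)


sunlit = (0.2, 0.4, 0.6)
shadowed = tuple(0.5 * band for band in sunlit)   # same shape, half the energy
other = (0.6, 0.4, 0.2)                           # different spectral shape

angle_same_shape = spectral_angle(sunlit, shadowed)
angle_other_shape = spectral_angle(sunlit, other)
print(angle_same_shape, angle_other_shape)
```

Scaling a spectrum leaves the angle essentially at zero, while a change in spectral shape produces a clearly larger angle.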

In remote sensing, multi-spectral images are sharpened by fusing a multi-spectral image with a panchromatic image. In the same way, a mixture of kernel functions can be used (Grégoire et al, 2003). A linear mixture of kernels can fit dual characteristics: those of the dot product or Euclidean distance, and that of the spectral angle.

A mixture of kernels may be defined as:

$K(x, x_i) = \mu K_a(x, x_i) + (1 - \mu) K_b(x, x_i)$

where $K_a(x, x_i)$ and $K_b(x, x_i)$ are two kernels (e.g., local, global and spectral angle) and µ is the mixing weight, with 0 ≤ µ ≤ 1. Since $K_a(x, x_i)$ and $K_b(x, x_i)$ satisfy Mercer's conditions, such a non-negatively weighted combination is also an eligible kernel. In this work, $K_a(x, x_i)$ was taken to be any local or global kernel and $K_b(x, x_i)$ was taken to be the spectral kernel.
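A mixture of two kernels can be sketched in Python as a function that returns a new kernel. The text does not say how the spectral angle is turned into a kernel value, so the sketch assumes, purely for illustration, that the spectral kernel is the cosine of the spectral angle (the normalized dot product); the radial basis kernel plays the role of Ka, and the sample spectra are made up.

```python
import math


def radial_basis(x, xi):
    """Local kernel Ka: exp(-||x - xi||^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)))


def cosine_spectral(x, xi):
    """Assumed spectral kernel Kb: cosine of the spectral angle between x and xi."""
    dot = sum(a * b for a, b in zip(x, xi))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_xi = math.sqrt(sum(b * b for b in xi))
    return dot / (norm_x * norm_xi)


def mixture_kernel(ka, kb, mu):
    """Return K(x, xi) = mu * Ka(x, xi) + (1 - mu) * Kb(x, xi), with 0 <= mu <= 1."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")

    def kernel(x, xi):
        return mu * ka(x, xi) + (1.0 - mu) * kb(x, xi)

    return kernel


a = (0.2, 0.4, 0.6)
b = (0.3, 0.4, 0.4)

mixed = mixture_kernel(radial_basis, cosine_spectral, 0.3)
mixed_value = mixed(a, b)
print(mixed_value)
```

Setting µ to 1 recovers Ka alone and setting it to 0 recovers Kb alone, so µ controls how much weight the distance-based view receives relative to the spectral-shape view.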

VI.KERNELS BASED MULTI-SPECTRAL IMAGE CLASSIFICATION

Classification of multispectral data with high resolution from urban areas by combining hierarchical image segmentation and composite kernel Support Vector Machines (SVMs) is investigated.

The pixel-based classification was conducted by incorporating structural information from mathematical morphological profile. A mathematical morphological profile is constructed based on the repeated use of geodesic openings and closings with a structuring element of increasing size, starting with the original image. Since the profile includes a range of increasing opening and closing by reconstruction operation, the resulting profile can be high-dimensional.

In order to effectively add structural information for image classification, the composite kernel SVMs were selected as classifier, in which different kernel functions are used for spectral and structural information, respectively. After pixel-based classification, a technique that utilizes multichannel watershed transformation with dynamic of the contours was used to segment the image to facilitate further object-based classification. Traditional watershed segmentation defined for gray level image was extended to multispectral image segmentation by computing multispectral gradient image through a vector based approach, which uses extended dilation and erosion operations. The hierarchical multispectral image segmentation was then conducted by dynamic of the contours.

VII.KERNELS FOR HYPERSPECTRAL IMAGE CLASSIFICATION

A full family of composite kernels for the combination of spectral and contextual information is presented in this section. For this purpose, three steps are followed:

Pixel definition. A pixel entity $x_i$ is redefined simultaneously both in the spectral domain, using its spectral content $x_i^{\omega} \in \mathbb{R}^{N_\omega}$, and in the spatial domain, by applying some feature extraction to its surrounding area, $x_i^{s} \in \mathbb{R}^{N_s}$, which yields $N_s$ spatial (contextual) features, e.g. the mean or standard deviation per spectral band.

Kernel computation. Once the spatial and spectral feature vectors $x_i^{s}$ and $x_i^{\omega}$ are constructed, different kernel matrices can be easily computed using any suitable kernel function that fulfils Mercer's conditions.

Kernel combination. At this point, we take advantage of the direct sum of Hilbert Spaces by which two (or more) Hilbert spaces Hk can be combined into a larger Hilbert space. This well-known result from Functional Analysis Theory [21] allows us to sum spectral and textural dedicated kernel matrices (Kω and Ks, respectively), and introduce the cross-information between textural and spectral features (Kωs and Ksω) in the formulation.
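The three steps above can be sketched in Python for the simplest combination, a plain sum of a spatial and a spectral kernel. This is only an illustration under assumptions: the spatial features are taken to be the per-band mean over a 3 x 3 window, both kernels are radial basis kernels, the cross-information terms are omitted, and the small two-band image is made up.

```python
import math


def rbf(u, v):
    """Radial basis kernel exp(-||u - v||^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)))


def spectral_vector(image, pixel):
    """Spectral content of a pixel: its own band values."""
    r, c = pixel
    return list(image[r][c])


def spatial_vector(image, pixel, radius=1):
    """Per-band mean over the surrounding window, clipped at the image border."""
    r, c = pixel
    rows, cols = len(image), len(image[0])
    window = [
        image[i][j]
        for i in range(max(0, r - radius), min(rows, r + radius + 1))
        for j in range(max(0, c - radius), min(cols, c + radius + 1))
    ]
    n_bands = len(image[r][c])
    return [sum(p[b] for p in window) / len(window) for b in range(n_bands)]


def summation_kernel(image, p, q):
    """K(p, q) = Ks(p, q) + Kw(p, q): spatial kernel plus spectral kernel."""
    k_spatial = rbf(spatial_vector(image, p), spatial_vector(image, q))
    k_spectral = rbf(spectral_vector(image, p), spectral_vector(image, q))
    return k_spatial + k_spectral


# Made-up 3 x 3 image with two bands per pixel.
image = [
    [(0.1, 0.2), (0.1, 0.3), (0.8, 0.9)],
    [(0.2, 0.2), (0.2, 0.3), (0.7, 0.9)],
    [(0.1, 0.1), (0.3, 0.3), (0.9, 0.8)],
]
k_same = summation_kernel(image, (0, 0), (0, 0))
k_diff = summation_kernel(image, (0, 0), (0, 2))
print(k_same, k_diff)
```

Because each term is a kernel on its own feature vector, their sum compares two pixels by spectral content and by neighbourhood context at the same time.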

In the following, we present four different kernel approaches for the joint consideration of spectral and textural information in a unified framework for hyperspectral image classification.

VIII.VARIOUS KERNEL FUNCTIONS VERSUS OVERALL ACCURACY

The effect of different kernel functions on sub-pixel classification of a LISS-III image from the Resourcesat-1 (IRS-P6) satellite was studied, using a density estimation algorithm based on the Support Vector Machine approach. The learning parameters of the Support Vector Machine were kept constant for all kernel functions. The number of training and testing samples used for the supervised approach was greater than 10n, where n is the dimension of the data. Separate data were used at the training and testing stages. At the testing stage, 500 samples were taken for the overall accuracy assessment of the sub-pixel output. The effect of the different kernel functions on the sub-pixel classification output was assessed using the Fuzzy Error Matrix (Binaghi et al., 1999). The overall accuracies of sub-pixel classification obtained with different combinations of kernel functions in the Support Vector Machine approach are given in Table 1.
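How an overall accuracy can be derived from a fuzzy error matrix is sketched below in Python. The sketch assumes the min operator commonly associated with the fuzzy error matrix: each cell accumulates, over all test samples, the minimum of the classified membership in one class and the reference membership in another, and overall accuracy is the diagonal sum divided by the total reference membership. The membership values are made up and are not taken from the study.

```python
def fuzzy_error_matrix(classified, reference):
    """Cell [m][n] sums min(classified membership in m, reference membership in n)."""
    n_classes = len(reference[0])
    matrix = [[0.0] * n_classes for _ in range(n_classes)]
    for c_grades, r_grades in zip(classified, reference):
        for m in range(n_classes):
            for n in range(n_classes):
                matrix[m][n] += min(c_grades[m], r_grades[n])
    return matrix


def overall_accuracy(classified, reference):
    """Diagonal sum of the fuzzy error matrix over the total reference membership."""
    matrix = fuzzy_error_matrix(classified, reference)
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    total_reference = sum(sum(r_grades) for r_grades in reference)
    return diagonal / total_reference


# Made-up sub-pixel memberships for two samples and two classes.
reference = [(0.7, 0.3), (0.2, 0.8)]
classified = [(0.6, 0.4), (0.2, 0.8)]

accuracy = overall_accuracy(classified, reference)
print(round(100.0 * accuracy, 2))
```

When the classified memberships equal the reference memberships for every sample, the diagonal sum equals the total reference membership and the overall accuracy is 100%.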

Table 1: Overall accuracy (%) obtained with different mixed kernel functions.

S. No.  Ka                              Kb               µ=0.1   µ=0.2   µ=0.3   µ=0.5
1       Gaussian with Euclidean Norm    Spectral Kernel  94.56   91.22   93.41   91.58
2       Gaussian with Mahalanobis Norm  Spectral Kernel  90.85   93.70   91.25   93.79
3       Gaussian with Diagonal Norm     Spectral Kernel  88.23   93.36   93.19   90.85
6       Inverse Multiquadric            Spectral Kernel  92.26   93.52   94.12   92.82
7       Linear                          Spectral Kernel  91.89   93.30   93.61   93.06
8       Polynomial (1st order)          Spectral Kernel  92.56   91.06   94.44   91.54
9       Sigmoid                         Spectral Kernel  90.74   90.19   91.95   94.27

IX.CONCLUSIONS

We have presented an approach for the classification of remotely sensed imagery using kernel methods. All kernels that are based on a distance function are local kernels, in which only data that are close to each other have an influence on the kernel values. In global kernels, by contrast, samples that are far away from each other still have an influence on the kernel value; all kernels based on the dot product are global. In the spectral kernel, the spectral angle (SA) α(x, xi) is defined in order to measure the spectral difference between x and xi while being robust to differences in overall energy (e.g. illumination, shadows etc.). To fit the dual point of view, namely similarity according to the dot product or Euclidean distance and similarity according to the spectral shape (SA), mixtures of kernels have been used in this study.
