Contrast measures and stochastic distances in a model for speckled data


CONTRAST MEASURES AND STOCHASTIC DISTANCES IN A MODEL FOR SPECKLED DATA

ABRAÃO DAVID COSTA DO NASCIMENTO

Advisor: Prof. Dr. Alejandro C. Frery
Concentration area: Computational Statistics

Dissertation submitted to the Department of Statistics at the Federal University of Pernambuco for the degree of MSc in Statistics.

Recife, 2008.

Nascimento, Abraão David Costa do. Contrast measures and stochastic distances in a model for speckled data / Abraão David Costa do Nascimento. Recife: The Author, 2008. xii, 149 leaves: ill., fig., tab. Dissertation (MSc), Universidade Federal de Pernambuco, CCEN, Department of Statistics, 2008. Includes bibliography and appendices. 1. Image processing. 2. Information theory: divergence. 3. Contrast measures. I. Title. 621.367. CDD (22nd ed.). MEI2008-021.


Acknowledgements

Initially, I thank God for giving me the knowledge that I am here in this world because of Him. There are many people to whom I am grateful for their help throughout my dissertation work. I would like to thank my advisor, Prof. Dr. Alejandro C. Frery, for generously supporting this dissertation with his scientific expertise and for opening my eyes to a systemic perspective on lifespan development. I thank CAPES for supporting research and graduate studies through financial resources, without which this work could not have been done.

I am also grateful to my undergraduate and graduate friends. My studies would not have been the same without the social and academic challenges provided by all my colleagues from the Department of Statistics at the Federal University of Pernambuco. I am particularly thankful to my friends Esdras Adriano, Adriano Maccoy, Raul Elias, Vanessa Kelly, Juliana Katia, Edwin Giovanny, Fabio Veríssimo, and Raphael Araujo. I would also like to thank Valeria, secretary of the graduate program in Statistics at UFPE, for her academic and personal advice.

This dissertation would not have been possible without the emotional and social support of my friends, my girlfriend, and my family. Special thanks go to Francisco Junior ("Chico"), Adelazir Nogueira ("Delly"), my girlfriend Rosângela ("Minha Flor"), and my parents, João Francisco, Maria Luiza, Rebeca Abi, and especially my mother, Marluce Maria da Silva.

The lamp of the body is the eye. If therefore your eye is sound, your whole body will be full of light. But if your eye is evil, your whole body will be full of darkness. If therefore the light that is in you is darkness, how great is the darkness.

Jesus Christ

Abstract

Synthetic Aperture Radar (SAR) images are of great significance in image processing. The main difficulty in working with SAR images is the presence of speckle noise. Because of speckle, proposing parametric tests for assessing roughness in SAR images is problematic. In this context, many improved estimators of the roughness parameter of the $\mathcal{G}^0$ family have been considered (bias-reduced versions obtained by numerical and analytical approaches, and robust versions), but few works have also considered the scale parameter $\gamma$. This dissertation proposes a methodology different from classical asymptotic and bootstrap methods for dealing with this issue. Using eight test statistics based on stochastic distances derived from the generic $(h, \phi)$-divergence proposed by Salicrú et al. (1994), this work evaluates, by Monte Carlo experiments, the homogeneity hypothesis $H_0: [\alpha_1\ \gamma_1]^t = [\alpha_2\ \gamma_2]^t$ in SAR images with two classes, $X \sim \mathcal{G}_I^0(\alpha_1, \gamma_1, L)$ and $Y \sim \mathcal{G}_I^0(\alpha_2, \gamma_2, L)$.

Keywords and phrases: image processing, information theory, contrast measures.


Contents

1 Introduction
    1.1 Initial presentation
    1.2 Remote Sensing: radars, sensors and properties
    1.3 SAR image segmentation
    1.4 Computing platforms

2 The Multiplicative Model: Distribution $\mathcal{G}_I^0$
    2.1 The Multiplicative Model in SAR image
    2.2 Speckle noise
    2.3 Backscatter
    2.4 Return
    2.5 Parameter estimation for the $\mathcal{G}_I^0(\alpha, \gamma, L)$ law

3 Measures of Distance
    3.1 Initial presentation about stochastic distances
    3.2 Entropy
    3.3 Distance
    3.4 The $(h, \phi)$-divergence
        3.4.1 The Kullback-Leibler distance
        3.4.2 The Rényi distance
        3.4.3 The Hellinger distance
        3.4.4 The Bhattacharyya distance
        3.4.5 The Jensen-Shannon distance
        3.4.6 The Arithmetic-Geometric distance
        3.4.7 The Triangular distance
        3.4.8 The Harmonic-Mean distance
    3.5 Distances for the intensity return
        3.5.1 Kullback-Leibler distance
        3.5.2 Bhattacharyya distance
        3.5.3 Rényi distance
        3.5.4 Hellinger distance
    3.6 Illustration of distances: Examples

4 Homogeneity Tests Based on Measures of Distance
    4.1 Initial presentation about contrast measure
    4.2 Asymptotic Distributions
    4.3 Test Statistics

5 Numerical Results
    5.1 Initial presentation about the simulation
    5.2 Assessing the size of the tests
    5.3 Assessing the test power
        5.3.1 Results for $\alpha_1 = \alpha_2$ and $\mu_X \neq \mu_Y$
        5.3.2 Results for $\alpha_1 \neq \alpha_2$ and $\mu_X = \mu_Y$
        5.3.3 Results for $\alpha_1 > \alpha_2$ and $\mu_X < \mu_Y$
        5.3.4 Results for $\alpha_1 > \alpha_2$ and $\mu_X > \mu_Y$

6 Conclusions

7 Appendix
    7.1 Details about the Integrals
    7.2 Special Functions
        7.2.1 PolyGamma function
        7.2.2 Hypergeometric function
        7.2.3 Generalized hypergeometric function
        7.2.4 Harmonic number function
    7.3 Proof of the convergence of gamma deviates to a constant
    7.4 Additional figures
    7.5 Ox code

List of Figures

1.1 Types of sensors
1.2 Types of acquired images
2.1 Data and densities from the $\mathcal{G}_I^0$ law for some values of $\alpha$
2.2 Data and densities from the $\mathcal{G}_I^0$ law for some different scales
3.1 Curves of comparison of measures of distance between X and Y
3.2 The relationship between the Kullback-Leibler and Rényi distances
3.3 Plots of the behavior of the Kullback-Leibler distances under variation of the parameters $\gamma$ and $L$
3.4 Plots of the behavior of the Bhattacharyya distances under variation of the parameters $\gamma$ and $L$
3.5 Plots of the behavior of the Hellinger distances under variation of the parameters $\gamma$ and $L$
5.1 The Implementation Average Times in seconds (IAT) vs. value of $\alpha$ for the distances in the simulation study
5.2 Plots of the mean type I error by number of looks for the eight test statistics at nominal level $\alpha = 0.01$
5.3 Mean test power as a function of the Bhattacharyya and Triangular distances for $\alpha_1 = \alpha_2$
5.4 Mean test power as a function of the number of looks and of the Bhattacharyya and Triangular distances for $\mu_X = \mu_Y$ and $\alpha_1 \neq \alpha_2$
5.5 Mean test power as a function of the Bhattacharyya and Triangular distances for $\alpha_1 \neq \alpha_2$ and $\mu_X < \mu_Y$
5.6 Mean test power as a function of the Bhattacharyya and Triangular distances for $\alpha_1 \neq \alpha_2$ and $\mu_X > \mu_Y$
7.1 Plots of the mean type I error by number of looks for the eight test statistics at test size $\alpha = 0.05$
7.2 Mean test power as a function of the Hellinger and Rényi distances for $\alpha_1 = \alpha_2$
7.3 Mean test power as a function of the Jensen-Shannon and Kullback-Leibler distances for $\alpha_1 = \alpha_2$
7.4 Mean test power as a function of the Arithmetic-Geometric and Harmonic-Mean distances for $\alpha_1 = \alpha_2$
7.5 Mean test power as a function of the Hellinger and Rényi distances for $\alpha_1 \neq \alpha_2$ and $\mu_X < \mu_Y$
7.6 Mean test power as a function of the Jensen-Shannon and Kullback-Leibler distances for $\alpha_1 \neq \alpha_2$ and $\mu_X < \mu_Y$
7.7 Mean test power as a function of the Arithmetic-Geometric and Harmonic-Mean distances for $\alpha_1 \neq \alpha_2$ and $\mu_X < \mu_Y$
7.8 Mean test power as a function of the Hellinger and Rényi distances for $\alpha_1 \neq \alpha_2$ and $\mu_X > \mu_Y$
7.9 Mean test power as a function of the Jensen-Shannon and Kullback-Leibler distances for $\alpha_1 \neq \alpha_2$ and $\mu_X > \mu_Y$
7.10 Mean test power as a function of the Arithmetic-Geometric and Harmonic-Mean distances for $\alpha_1 \neq \alpha_2$ and $\mu_X > \mu_Y$

List of Tables

4.1 List of $(h, \phi)$-divergences and their functions $\phi$ and $h$
5.1 The Implementation Average Time in seconds (IAT) by measure of distance for the test size study
5.2 The Implementation Average Time in seconds (IAT) by measure of distance for the test power study
5.3 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_0$, considering a model with $\alpha^* = -1.5$
5.4 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_0$, considering a model with $\alpha^* = -3$
5.5 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_0$, considering a model with $\alpha^* = -5$
5.6 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_0$, considering a model with $\alpha^* = -8$
5.7 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_1$, considering two models such that $\mu_X = \mu_Y$, $\alpha_1 = -5$ and $\alpha_2 = -8$
5.8 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_1$, considering two models such that $\mu_X = \mu_Y$, $\alpha_1 = -3$ and $\alpha_2 = -8$
5.9 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_1$, considering two models such that $\mu_X = \mu_Y$, $\alpha_1 = -3$ and $\alpha_2 = -5$
5.10 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_1$, considering two models such that $\mu_X = \mu_Y$, $\alpha_1 = -1.5$ and $\alpha_2 = -8$
5.11 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_1$, considering two models such that $\mu_X = \mu_Y$, $\alpha_1 = -1.5$ and $\alpha_2 = -5$
5.12 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_1$, considering two models such that $\mu_X = \mu_Y$, $\alpha_1 = -1.5$ and $\alpha_2 = -3$
5.13 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_1$, considering two models such that $\mu_X < \mu_Y$, $\alpha_1 = -5$ and $\alpha_2 = -8$
5.14 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_1$, considering two models such that $\mu_X < \mu_Y$, $\alpha_1 = -3$ and $\alpha_2 = -8$
5.15 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_1$, considering two models such that $\mu_X < \mu_Y$, $\alpha_1 = -3$ and $\alpha_2 = -5$
5.16 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_1$, considering two models such that $\mu_X < \mu_Y$, $\alpha_1 = -1.5$ and $\alpha_2 = -8$
5.17 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_1$, considering two models such that $\mu_X < \mu_Y$, $\alpha_1 = -1.5$ and $\alpha_2 = -5$
5.18 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_1$, considering two models such that $\mu_X < \mu_Y$, $\alpha_1 = -1.5$ and $\alpha_2 = -3$
5.19 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_1$, considering two models such that $\mu_X > \mu_Y$, $\alpha_1 = -5$ and $\alpha_2 = -8$
5.20 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_1$, considering two models such that $\mu_X > \mu_Y$, $\alpha_1 = -3$ and $\alpha_2 = -8$
5.21 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_1$, considering two models such that $\mu_X > \mu_Y$, $\alpha_1 = -3$ and $\alpha_2 = -5$
5.22 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_1$, considering two models such that $\mu_X > \mu_Y$, $\alpha_1 = -1.5$ and $\alpha_2 = -8$
5.23 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_1$, considering two models such that $\mu_X > \mu_Y$, $\alpha_1 = -1.5$ and $\alpha_2 = -5$
5.24 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_1$, considering two models such that $\mu_X > \mu_Y$, $\alpha_1 = -1.5$ and $\alpha_2 = -3$
5.25 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_1$, considering two models such that $\alpha_1 = \alpha_2 = -8$
5.26 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_1$, considering two models such that $\alpha_1 = \alpha_2 = -5$
5.27 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_1$, considering two models such that $\alpha_1 = \alpha_2 = -3$
5.28 Rejection rates (in %) of the $(h, \phi)$-divergence tests under $H_1$, considering two models such that $\alpha_1 = \alpha_2 = -1.5$

Chapter 1

Introduction

Summary

The development of statistical inference applied to the analysis of synthetic aperture radar (SAR) imaging systems has contributed to the proposal of techniques that aid the extraction of relevant information from this kind of data. Such information can be turned into important political and strategic decisions, for example in properly understanding the impact of technological progress on the environment, which contributes to the question of sustainable development. The main problem in working with SAR images is the presence of a noise called speckle. Speckle is a granular noise that gives the image a confusing appearance with respect to the identifiability of its objects, which hampers the analysis of SAR images both visually and in the proposal of adequate tools for their processing. In this context, proposing a parametric test for assessing homogeneity in a SAR image is quite a reasonable question. This dissertation proposes and compares eight asymptotic test statistics, based on stochastic distances derived from the $(h, \phi)$-divergence, for assessing homogeneity in images organized in two classes modeled by a multiplicative law, namely the $\mathcal{G}_I^0$ distribution.

1.1 Initial presentation

The observation of natural phenomena and their modelling, either deterministic or random, is of extreme importance. The interest in knowledge extraction by modelling has contributed to a practical understanding of issues in various fields such as economics, sociology, and astronomy, to name a few. In this context, it is impossible not to talk about Statistics. The applied side of Statistics is a collection of techniques, based on the theory of Probability, whose goal is to describe the behavior of the phenomena under study and to provide tools for informed decision making. Among those techniques, parametric inference plays a central role.

The development of statistical inference applied to the processing of Synthetic Aperture Radar (SAR) imagery has contributed to techniques that aid the extraction of relevant information from this kind of data. Such information can be turned into important political and strategic decisions.

One of the main features to be extracted from SAR imagery is texture. Texture can be defined in many ways, but in this context it is related to the number of elementary objects present in each resolution cell, the size of the cell being of the order of the wavelength employed in the illumination. SAR uses microwave radiation, so typical wavelengths range from centimeters to meters. SAR sensors are capable of perceiving fine structures from space. Measures of such structures are often referred to as "roughness".

The main difficulty with SAR images is the presence of speckle noise. Visually, speckle is a granular noise that gives the image a richly textured look, which hampers the analysis of SAR images both visually and with image processing tools (see Rosiles et al. 2006). Because of speckle, proposing parametric tests for assessing roughness in SAR images is very difficult. This dissertation proposes and compares eight asymptotic parametric test statistics, based on distances derived from the $(h, \phi)$-divergence, to deal with roughness in SAR images described by a multiplicative model using the $\mathcal{G}_I^0$ distribution.

This chapter presents a description of radar, sensors and SAR imagery. Then, it discusses issues in SAR image analysis and the importance of effective tests for assessing relevant properties of such images. Finally, it describes the computational platforms used in this dissertation.

1.2 Remote Sensing: radars, sensors and properties

One of the most challenging and necessary lines of research at present is the understanding of the environment, with respect to the exploration and conservation of its resources. This knowledge is of utmost importance for policies of sustainable development, that is, policies that aim at improving quality of life without degrading the environment. Precise knowledge of the environment requires techniques that collect information about the phenomena of interest. Brazil, for example, having continental dimensions, lacks such detailed information on a large scale. In this context, remote sensing is a very important tool.

Remote sensing is the capture, processing, and analysis, at small or large scale, of information about a natural or physical element, or about a phenomenon, through devices that do not come into physical contact with the target. Such devices are usually carried by airplanes, satellites, buoys, ships, etc.

Sensors are basic elements of remote sensing. There are two kinds of sensors. Passive sensors (see Figure 1.1(a)) detect the natural energy (radiation) that is emitted or reflected by the object or by the surrounding area being observed. Reflected sunlight is the most common source of radiation measured by passive sensors. An important class of passive sensors is the optical sensor. Examples of passive remote sensors include film photography, infrared imaging, charge-coupled devices and radiometers. Active sensors (see Figure 1.1(b)), on the other hand, emit energy towards the target under study, and then detect and measure the radiation that is reflected or backscattered. Radar is an example of active remote sensing, in which the time delay between emission and return is measured, establishing the location, height, speed and direction of an object.

[Figure 1.1: Types of sensors. (a) Passive sensor. (b) Active sensor.]

SAR sensors are a kind of radar sensor. Radar sensors work by sending pulses towards a target; part of the pulse energy is reflected and returns as an echo. The intensity of the returned echo depends on the roughness, on the surface moisture and on the angle of incidence of the signal. The delay of the echo discloses the distance to the surface.

SAR images present multiplicative noise. This noise is inherent to the capture process and degrades the quality of the image in every coherent imaging system. It is caused by the variation in the phase delay of the echo due to multiple targets in a resolution cell. The energy these sensors send penetrates, to a great extent, clouds and other meteorological phenomena, providing images at night and under adverse weather conditions (see Oliver & Quegan 1998).

Optical sensors are characterized by their response to the optical spectrum, sometimes including infrared and ultraviolet components. Such imaging is greatly affected by weather conditions, and it is typically passive, requiring sunlight. Optical sensors typically present additive Gaussian noise.

Figure 1.2 illustrates the concepts of active and passive sensors with images acquired by a radar and by an optical sensor, respectively. Figure 1.2(a) is a SAR image obtained by the JERS-1 sensor (L band, HH polarization) on June 26, 1993 over Tapajós, Pará State, Brazil. Figure 1.2(b) shows an image obtained by the optical sensor TM/LANDSAT [1], using the color composition band 5 in the Red channel, band 4 in the Green channel and band 3 in the Blue channel, obtained on June 29, 1993 over the same Tapajós forest.

[Figure 1.2: Types of acquired images. (a) JERS-1 image, L band. (b) TM/LANDSAT color composition.]

Optical and SAR imagery convey complementary information, and they tend to be better sources of information in different situations. For example, SAR is better than optical sensors for ice studies, for instance in the Canadian Arctic. In optical imagery, snow and ice tend to appear in saturated white. A SAR image, however, can distinguish the surfaces clearly, in such a way that a trained observer should be able to deduce the age and the thickness of the ice (Langley et al. 2007).

[1] The LANDSAT program, developed by the National Aeronautics and Space Administration (NASA), was the first satellite remote sensing program with the purpose of mapping earth resources; TM is the Thematic Mapper installed on the satellite.

In the present text, the focus is on a probabilistic approach for finding structures in SAR images, a problem often referred to as segmentation. SAR image segmentation is the subject of the next section.

1.3 SAR image segmentation

In computer vision, the term segmentation refers to the process of partitioning an image into multiple regions (sets of pixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze than the original data set. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images (Shapiro & Stockman 2001).

The result of an image segmentation is a set of regions that collectively cover the entire image, or a set of contours extracted from the image; the procedures leading to contours are usually called "edge detection". Ideally, the resulting regions are heterogeneous among themselves; that is, adjacent regions are significantly different with respect to the characteristic or characteristics used as features. Each resulting region should, in principle, be internally homogeneous with respect to some characteristic or computed property, such as color, intensity or texture.

As seen in the previous section, speckle is a salient feature of SAR imagery, and the removal of speckle is an important step in many automatic interpretation procedures. Many techniques have been proposed for reducing the level of speckle noise. These include simple averaging and standard filters. However, these methods tend to blur edges and other small features, thus reducing the sharpness of the detail and, as a consequence, the usefulness of the processed image.

A different approach to dealing with SAR imagery is devising techniques that are capable of working well even in the presence of speckle noise or, even better, that are specifically built for extracting information from speckle. This is the approach adopted in this work.
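The trade-off between speckle reduction and edge blurring mentioned above can be illustrated with a minimal sketch. It is not part of the original Ox or R code: the one-dimensional profile, the function names `moving_average` and `coefficient_of_variation`, the window width and the unit-mean exponential speckle are all assumptions made only for this illustration.

```python
import random
import statistics


def moving_average(signal, width):
    """Boxcar filter with an odd window; the window is clipped at the borders."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def coefficient_of_variation(values):
    """Standard deviation divided by the mean: a scale-free measure of speckle strength."""
    return statistics.pstdev(values) / statistics.fmean(values)


random.seed(1)
n, width = 2000, 9

# Noise-free backscatter: a dark region followed by a bright one (step edge at n // 2).
truth = [1.0] * (n // 2) + [10.0] * (n // 2)

# Multiplicative speckle with unit mean (exponential law: an assumption of this sketch).
observed = [x * random.expovariate(1.0) for x in truth]
filtered = moving_average(observed, width)

# Speckle strength inside the flat dark region, before and after filtering.
flat = slice(100, 900)
cv_before = coefficient_of_variation(observed[flat])
cv_after = coefficient_of_variation(filtered[flat])

# Edge blur: filtering the noise-free step turns a one-sample jump into a ramp.
blurred_truth = moving_average(truth, width)
ramp = sum(1 for v in blurred_truth if 1.0 < v < 10.0)

print(f"coefficient of variation: {cv_before:.2f} -> {cv_after:.2f}")
print(f"samples in the edge transition: 0 -> {ramp}")
```

Averaging lowers the coefficient of variation in the flat region, but the sharp jump of the noise-free profile is spread over `width - 1` intermediate samples, which is the loss of detail referred to in the text.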

1.4 Computing platforms

In this dissertation, we worked with Ox, R, and LaTeX. Ox is a matrix programming language created by Jurgen A. Doornik in 1994. This language was developed based on the C programming language; it allows efficient numerical computation. Ox version 4.1 was used in our simulations, and it is available at http://www.doornik.com. For details, see Doornik (1999) and Cribari-Neto (1997). Appendix 7.5 presents a program written in Ox used for assessing measures of contrast. For graphical representation of results, we used the R platform. The latest version of this program is freely available at http://www.R-project.org. For details, see Cribari-Neto & Zarkos (1999) and Ihaka & Gentleman (1996). LaTeX was the chosen typesetting environment, with BibTeX as the reference management tool.

Chapter 2

The Multiplicative Model: Distribution $\mathcal{G}_I^0$

Summary

Many models and techniques have been proposed and assessed for describing, processing and analysing imagery with speckle noise. The multiplicative model is often regarded as the most suitable model for this kind of data. This model assumes that each observed value is a realization of a random variable $Z$, the return, which is the product of two component random variables $X$ and $Y$. The variable $X$ models the unobserved behavior of the terrain, while $Y$ models the speckle noise. The variable $Z = X \times Y$ has two main formats: intensity ($Z_I = X_I \times Y_I$) and amplitude ($Z_A = X_A \times Y_A$). In intensity format, the most general distribution for $X_I$ is the generalized inverse Gaussian, $X_I \sim N^{-1}(\alpha, \gamma, \lambda)$, while $Y_I$ follows a gamma law with parameter $L$, $Y_I \sim \Gamma(L, L)$. Under this modelling, $Z_I$ follows the $\mathcal{G}$ distribution, $Z_I \sim \mathcal{G}(\alpha, \gamma, \lambda, L)$. This dissertation considers a very important special case of the $\mathcal{G}(\alpha, \gamma, \lambda, L)$ model, which takes $X_I \sim \Gamma^{-1}(\alpha, \gamma)$ and $Y_I \sim \Gamma(L, L)$, resulting in $Z_I \sim \mathcal{G}_I^0$. The $\mathcal{G}_I^0$ law is used throughout the dissertation because of its ability to describe many practical situations arising in work with SAR images. This chapter presents the models for the component variables and for the return, interpretations of the parameters of the $\mathcal{G}_I^0$ distribution and, finally, the maximum likelihood estimators for the parameters $\alpha$ and $\gamma$ of the $\mathcal{G}_I^0(\alpha, \gamma, L)$ law.

2.1 The Multiplicative Model in SAR image

Many statistical models and techniques have been proposed and assessed for describing, processing, and analysing speckled imagery, among them Jakeman & Pusey (1973, 1976), Jao (1984), Lopes et al. (1990), Lee, Grunes & Kwok (1994), Lee, Hoppel, Mango & Miller (1994), Lee, Schuler, Lang & Ranson (1994), Frery et al. (1995a,b), Lee et al. (1995), Du & Lee (1996), Frery, Müller, Yanasse & Sant'Anna (1997), Frery, Yanasse & Sant'Anna (1997), Mejail (1999) and Mejail et al. (2000). The multiplicative model is often regarded as the best statistical framework for this kind of data, since it emerges from the physics of the image formation (Goodman 1985). This model assumes that each observed value is the outcome of a random variable $Z$ (called "return"), which is the product of two independent random variables, $X$ and $Y$. The random variable $X$ models the terrain backscatter, that is, it describes the type of area (with respect to its degree of homogeneity) to which each pixel belongs, and it is usually assumed positive. The random variable $Y$ models the speckle noise.

The characterization of $Z$ depends on the type of image being considered: monospectral (one band and one polarization) or polarimetric. We will only deal with monospectral data. To produce a monospectral image, SAR systems use two attributes of the electric field of the return signal: amplitude ($A$) and phase ($\phi$). The amplitude is associated with morphological features, while the phase is associated with the distance between the object and the radar sensor; this is why the process is called coherent. These systems measure the pair $(A \cos\phi, A \sin\phi)$. To process the information in the return signal, it is important to choose a criterion to differentiate resolution cells. This criterion is the choice of a physical magnitude (or image format): $A \cos\phi$, $A \sin\phi$, $\phi$, $A$, $I$ or $\ln I$, where $I = A^2$. This dissertation uses the intensity format ($I$), so the variables are indexed as $Y_I$, $X_I$ and $Z_I$. The following section presents a classical derivation of the distribution of speckle noise.

2.2 Speckle noise

In complex format, the speckle noise follows a bivariate normal distribution (Goodman 1985):

$$Y_C = \begin{pmatrix} Y_\Re \\ Y_\Im \end{pmatrix} \sim N\left( \mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix},\ \Sigma = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix} \right).$$

Its real and imaginary components, $Y_\Re$ and $Y_\Im$, are independent. The relationships of this format with the intensity and amplitude formats are given by $Y_I = |Y_C|^2$ and $Y_A = |Y_C| = \sqrt{Y_\Im^2 + Y_\Re^2}$, respectively.

In practice, a SAR image is generated from a number of different observations (looks) of the area under study. An image is called single-look when only one observation of the area is considered. When the average of the information from several observations is considered, the image is called multi-look. For single-look images, $Y_I$ follows an exponential distribution and $Y_A$ follows a Rayleigh law (Ulaby et al. 1986). In multi-look processing of an image in intensity format, where the average of $L$ independent samples is calculated, $Y_I^{(L)}$ has a gamma distribution with parameter $L$, $Y_I^{(L)} \sim \Gamma(L, L)$, with density given by

$$f_{Y_I^{(L)}}(y) = \frac{L^L}{\Gamma(L)}\, y^{L-1} \exp(-Ly),$$

where $L, y > 0$.
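The construction above can be checked numerically with a short simulation. This is a hedged sketch and not the author's Ox program: the function name `multilook_intensity`, the seed, the sample size and the choice $L = 4$ are hypothetical. It draws the real and imaginary parts as independent $N(0, 1/2)$ variables, forms the single-look intensity $|Y_C|^2$, and averages $L$ of them. A $\Gamma(L, L)$ law with the density above has mean 1 and variance $1/L$, so the sample moments should be close to those values.

```python
import math
import random
import statistics


def multilook_intensity(looks, rng):
    """One multilook intensity speckle value: the mean of `looks` single-look intensities."""
    sigma = math.sqrt(0.5)  # each complex component is N(0, 1/2)
    total = 0.0
    for _ in range(looks):
        real = rng.gauss(0.0, sigma)
        imag = rng.gauss(0.0, sigma)
        total += real**2 + imag**2  # single-look intensity |Y_C|^2
    return total / looks


rng = random.Random(2008)  # fixed seed so the sketch is reproducible
L = 4                      # number of looks (illustrative value)
sample = [multilook_intensity(L, rng) for _ in range(50_000)]

mean = statistics.fmean(sample)
variance = statistics.pvariance(sample)

print(f"sample mean     = {mean:.3f}  (theory: 1)")
print(f"sample variance = {variance:.3f}  (theory: 1/L = {1 / L:.3f})")
```

Increasing `L` shrinks the variance towards zero while the mean stays at 1, which is the sense in which multi-look processing reduces speckle.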

The amplitude format is obtained by taking the square root of the intensity, $Y_A^{(L)} = \sqrt{Y_I^{(L)}}$. Thus, $Y_A^{(L)} \sim \Gamma^{1/2}(L, L)$, whose density function is given by
\[
f_{Y_A^{(L)}}(y) = \frac{2L^L}{\Gamma(L)}\, y^{2L-1} \exp(-Ly^2),
\]
where $L, y > 0$.

2.3. Backscatter

The backscatter describes the homogeneity of the target, which depends, among other factors, on the sensor parameters (angle of incidence, polarization, frequency, etc.), on the moisture content of the target and on the relief. Through $X_I$ it is possible to identify different types of classes in the image. We follow the modeling proposals of Frery, Muller, Yanasse & Sant'Anna (1997), Mejail (1999) and Mejail et al. (2000). Here, a general model that contains the particular models for homogeneous, heterogeneous and extremely heterogeneous areas is considered. A rather general distribution for the backscatter, in this context, is the Generalized Inverse Gaussian law (see Frery, Muller, Yanasse & Sant'Anna 1997). This distribution is denoted by $X_I \sim \mathcal{N}^{-1}(\alpha, \gamma, \lambda)$ and its density function is given by
\[
f_{X_I}(x) = \frac{(\lambda/\gamma)^{\alpha/2}}{2K_\alpha\left(\sqrt{\lambda\gamma}\right)}\, x^{\alpha-1} \exp\left(-\frac{1}{2}\left(\lambda x + \frac{\gamma}{x}\right)\right), \tag{2.1}
\]
where $x > 0$, and the parameter space is
\[
(\alpha, \gamma, \lambda) \in \Theta = \begin{cases}
\gamma > 0 \text{ and } \lambda \ge 0, & \text{if } \alpha < 0, \\
\gamma > 0 \text{ and } \lambda > 0, & \text{if } \alpha = 0, \\
\gamma \ge 0 \text{ and } \lambda > 0, & \text{if } \alpha > 0,
\end{cases}
\]
and $K_\alpha$ is the modified Bessel function of the third kind and order $\alpha$, given by
\[
K_\alpha\left(\sqrt{ab}\right) = \frac{1}{2}\left(\frac{a}{b}\right)^{\alpha/2} \int_{\mathbb{R}_+} x^{\alpha-1} \exp\left(-\frac{1}{2}\left(ax + bx^{-1}\right)\right) dx, \tag{2.2}
\]

where $a, b \in \mathbb{R}_+$. If $X_I \sim \mathcal{N}^{-1}(\alpha, \gamma, \lambda)$, its moments are given by
\[
E[X_I^r] = \left(\frac{\gamma}{\lambda}\right)^{r/2} \frac{K_{\alpha+r}\left(\sqrt{\gamma\lambda}\right)}{K_\alpha\left(\sqrt{\gamma\lambda}\right)}.
\]
This model is sufficiently general for describing most practical situations; however, it has the disadvantage of not being very tractable (see Frery, Muller, Yanasse & Sant'Anna 1997). It contains three particular cases that are of practical relevance: a constant (for homogeneous backscatter), the gamma distribution (for homogeneous and heterogeneous situations), and the reciprocal of gamma distribution (for situations of extreme heterogeneity).

The modified Bessel function of the third kind, $K_\alpha(x)$, is also known as the modified Hankel function or the Macdonald function. This function can be defined as in (2.2), and some of its properties are presented below (see, for instance, Watson 1944, Abramowitz & Stegun 1965, Gradshteyn & Ryzhik 1980, for further details). It satisfies the differential equation
\[
\frac{d^2}{dx^2} K_\alpha(x) + \frac{1}{x}\frac{d}{dx} K_\alpha(x) - \left(1 + \frac{\alpha^2}{x^2}\right) K_\alpha(x) = 0,
\]
and the recursion relations
\[
K_{\alpha-1}(x) - K_{\alpha+1}(x) = -\frac{2\alpha}{x} K_\alpha(x),
\qquad
K_{\alpha-1}(x) + K_{\alpha+1}(x) = -2\frac{d}{dx} K_\alpha(x).
\]
This function has the following properties.

[P1] For all $x$ and $\alpha$,
\[
K_\alpha(x) = K_{-\alpha}(x). \tag{2.3}
\]

[P2] The asymptotic expansion of $K_\alpha(x)$ for small $x$ is
\[
K_\alpha(x) \sim \frac{1}{2}\Gamma(\alpha)\left(\frac{x}{2}\right)^{-\alpha}. \tag{2.4}
\]

[P3] Finally, the asymptotic expansion of $K_\alpha(x)$ for large $x$ is
\[
K_\alpha(x) \sim \sqrt{\frac{\pi}{2x}}\, \exp(-x) \left(1 + \frac{4\alpha^2 - 1}{8x} + \frac{(4\alpha^2 - 1)(4\alpha^2 - 9)}{128x^2} + \cdots \right).
\]

The particular cases of the density given in equation (2.1) are obtained as restrictions on the parameter space $\Theta$. For this, it is necessary to use properties (2.3) and (2.4) of the modified Bessel function of the third kind. In order to obtain the reciprocal of gamma distribution as a particular case of the Generalized Inverse Gaussian law, consider the sequence of random variables $\{X_n; n \ge 1\}$ such that $X_n \sim \mathcal{N}^{-1}(\alpha, \gamma, \lambda_n)$. By Scheffé's theorem (Scheffé 1947, James 2002), if $\lambda_n \to 0$ when $n \to \infty$, then $X_n$ converges in distribution to $X \sim \Gamma^{-1}(\alpha, \gamma)$, $X_n \xrightarrow{D} X$. Therefore, the density given in equation (2.1) becomes
\[
f_{X_I}(x) = \frac{2^\alpha}{\gamma^\alpha \Gamma(-\alpha)}\, x^{\alpha-1} \exp\left(-\frac{\gamma}{2x}\right),
\]
where $x > 0$. Its moments are given by
\[
E[X_I^r] = \left(\frac{\gamma}{2}\right)^r \frac{\Gamma(-\alpha - r)}{\Gamma(-\alpha)},
\]
where $\alpha < 0$ and $|\alpha| > r$; if $-r \le \alpha < 0$, the moment is not finite. This distribution for the backscatter was originally proposed by Frery, Muller, Yanasse & Sant'Anna (1997) as a model for extremely heterogeneous data and, in the multiplicative model, it leads to the $\mathcal{G}^0_I$ distribution for the return. Mejail (1999) and Mejail et al. (2000) verified that it can be used as a universal model, since it is expressive enough to accurately describe most types of targets. Proof 7.3 in the Appendix also shows that such a random variable converges in probability to a constant when a certain condition is satisfied. This qualifies this variable to also describe homogeneous situations.

2.4. Return

In the most general case, $X_I \sim \mathcal{N}^{-1}(\alpha, \gamma, \lambda)$ and $Y_I \sim \Gamma(L, L)$ are two independent random variables with joint density
\[
f_{X_I, Y_I}(x, y) = f_{X_I}(x)\, f_{Y_I}(y) = \frac{(\lambda/\gamma)^{\alpha/2}}{2K_\alpha\left(\sqrt{\lambda\gamma}\right)}\, x^{\alpha-1} \exp\left(-\frac{1}{2}\left(\lambda x + \frac{\gamma}{x}\right)\right) \cdot \frac{L^L}{\Gamma(L)}\, y^{L-1} \exp(-Ly),
\]
where $x, y, L > 0$ and $(\alpha, \gamma, \lambda) \in \Theta$. In order to derive the density of the random variable $Z_I = X_I \times Y_I$, we used the Jacobian method (James 2002) and the result provided by James (2002, Proposition 2.6(b)):
\[
f_{Z_I}(z) = \frac{L^L \lambda^{L/2}}{\gamma^{\alpha/2}\, \Gamma(L)\, K_\alpha\left(\sqrt{\lambda\gamma}\right)}\, z^{L-1} (\gamma + 2zL)^{(\alpha - L)/2}\, K_{L-\alpha}\left(\sqrt{\lambda(\gamma + 2zL)}\right). \tag{2.5}
\]
This situation is denoted $Z_I \sim \mathcal{G}_I(\alpha, \gamma, \lambda, L)$. The $r$th moment of $Z_I$ is given by
\[
E[Z_I^r] = \left(\frac{\gamma}{\lambda L^2}\right)^{r/2} \frac{K_{\alpha+r}\left(\sqrt{\gamma\lambda}\right)}{K_\alpha\left(\sqrt{\gamma\lambda}\right)}\, \frac{\Gamma(L + r)}{\Gamma(L)},
\]
where $(\alpha, \gamma, \lambda) \in \Theta$ and $L \ge 1$. This distribution was proposed by Frery, Muller, Yanasse & Sant'Anna (1997) as a generic model for SAR data. Among the particular cases of this distribution is the $\mathcal{G}^0_I$ law, the Universal Model for speckled data. Using the aforementioned relationships for the Bessel function, it is possible to verify that when $\lambda = 0$, $\gamma > 0$ and $\alpha < 0$ the density given in equation (2.5) reduces to
\[
f_{Z_I}(z) = \frac{L^L\, \Gamma(L - \alpha)}{(\gamma/2)^\alpha\, \Gamma(-\alpha)\, \Gamma(L)}\, z^{L-1} \left(\frac{\gamma}{2} + Lz\right)^{\alpha - L}. \tag{2.6}
\]

This case is denoted $Z_I \sim \mathcal{G}^0_I(\alpha, \gamma, L)$. The $r$th moment of $Z_I$ is then given by
\[
E[Z_I^r] = \left(\frac{\gamma}{2L}\right)^r \frac{\Gamma(-\alpha - r)}{\Gamma(-\alpha)}\, \frac{\Gamma(L + r)}{\Gamma(L)}, \tag{2.7}
\]
where $\alpha < -r$, $\gamma > 0$ and $L \ge 1$; the restriction on $\alpha$ is required to ensure that the argument of $\Gamma(-\alpha - r)$ is positive, so that $Z_I$ has a finite $r$th moment. This is the main model we will use in the remainder of this work.

From the inferential viewpoint, $\alpha$ and $\gamma$ are of utmost interest in many applications. The parameter $\gamma$ is related to the relative power between reflected and incident signals, and the parameter $\alpha$ is an indicator of land type (Frery, Muller, Yanasse & Sant'Anna 1997). Figure 2.1(a) shows an image with four parts of simulated data, with $\gamma = 12$, $L = 4$ and varying $\alpha \in \{-3, -5, -8, -11\}$; each window is of size $512 \times 512$. The variation of $\alpha$ controls the roughness of the images: when the value of $\alpha$ decreases, the image becomes more homogeneous. Figure 2.1(b) shows the behavior of the density presented in equation (2.6). It can be seen that an increase of $\alpha$ causes the flattening of the curve.

[Figure 2.1: Data and densities from the $\mathcal{G}^0_I$ law for some values of $\alpha$. (a) Speckle with varying roughness. (b) $\mathcal{G}^0_I(\alpha, 12, 4)$ densities $f_Z(z)$ against $z$, for $\alpha \in \{-11, -8, -5, -3\}$.]
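As a numerical companion to equations (2.6) and (2.7), the sketch below evaluates the $\mathcal{G}^0_I$ density and moments and draws samples through the multiplicative construction $Z_I = X_I \times Y_I$. It is an illustration under stated assumptions, not the simulation code used for the figures: the function names, the seed, the sample size and the integration grid are hypothetical, and only the Python standard library is used. The sampler relies on the fact that, for the reciprocal of gamma density given earlier, $1/X_I$ follows a gamma law with shape $-\alpha$ and rate $\gamma/2$.

```python
import math
import random


def g0i_density(z, alpha, gamma, looks):
    """Density (2.6) of the G0_I(alpha, gamma, L) law; z > 0, alpha < 0, gamma > 0."""
    log_f = (looks * math.log(looks) + math.lgamma(looks - alpha)
             - alpha * math.log(gamma / 2.0)
             - math.lgamma(-alpha) - math.lgamma(looks)
             + (looks - 1) * math.log(z)
             + (alpha - looks) * math.log(gamma / 2.0 + looks * z))
    return math.exp(log_f)


def g0i_moment(r, alpha, gamma, looks):
    """Moment (2.7); requires -alpha - r > 0."""
    log_ratio = (math.lgamma(-alpha - r) - math.lgamma(-alpha)
                 + math.lgamma(looks + r) - math.lgamma(looks))
    return (gamma / (2.0 * looks)) ** r * math.exp(log_ratio)


def g0i_sample(rng, alpha, gamma, looks):
    """Draw Z = X * Y, X reciprocal of gamma and Y ~ Gamma(L, L)."""
    # random.gammavariate takes (shape, scale); a rate of gamma/2 is a scale of 2/gamma.
    x = 1.0 / rng.gammavariate(-alpha, 2.0 / gamma)
    y = rng.gammavariate(looks, 1.0 / looks)
    return x * y


rng = random.Random(1997)      # arbitrary seed
alpha, gamma, L = -5.0, 12.0, 4
sample = [g0i_sample(rng, alpha, gamma, L) for _ in range(50000)]
sample_mean = sum(sample) / len(sample)

# Midpoint rule on (0, 100]; the tail beyond 100 is negligible for these values.
step = 0.002
total = sum(g0i_density((k + 0.5) * step, alpha, gamma, L) * step
            for k in range(50000))

print(f"integral of the density = {total:.4f}")
print(f"first moment from (2.7) = {g0i_moment(1, alpha, gamma, L):.4f}")
print(f"sample mean             = {sample_mean:.4f}")
```

For $\alpha = -5$, $\gamma = 12$ and $L = 4$, equation (2.7) with $r = 1$ gives $(12/8) \cdot \Gamma(4)/\Gamma(5) \cdot \Gamma(5)/\Gamma(4) = 1.5$, which the sample mean should approach.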

Figure 2.2(a) shows an image with four parts of simulated data, for $\alpha = -4$, $L = 8$, and $\gamma \in \{6, 12, 30, 60\}$; each window is of size $512 \times 512$. The variation of $\gamma$ controls the scale of the images: when the value of $\gamma$ decreases, the image becomes more homogeneous. Figure 2.2(b) shows the behavior of the density presented in equation (2.6). It can be seen that an increase of $\gamma$ causes the flattening of the curve.

[Figure 2.2: Data and densities from the $\mathcal{G}^0_I$ law for some different scales. (a) Speckle with varying scale. (b) $\mathcal{G}^0_I(-4, \gamma, 8)$ densities $f_Z(z)$ against $z$, for $\gamma \in \{6, 12, 30, 60\}$.]

An extremely important issue is the estimation of the parameters of this distribution. In this dissertation we use Maximum Likelihood Estimation (MLE) for the parameters $\alpha$ and $\gamma$, assuming $L$ known. Another method, with interesting properties, is the method of moments. We favour MLE over the latter because of its good asymptotic properties and because there are several works that propose improvements to it, both analytic and by resampling (see Frery et al. 2004, Vasconcellos et al. 2005), and that deal with its robustness (see Bustos et al. 2002). Details of the MLE method for the parameters of the $\mathcal{G}^0_I$ distribution are discussed in the next section.

2.5. Parameter estimation for the $\mathcal{G}^0_I(\alpha, \gamma, L)$ law

The following definitions are useful for stating the main properties of ML estimators.

Definition 1 (Convergence in Probability). A sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ converges in probability to a random variable $X$ ($X_n \xrightarrow{P} X$) if, for every $\epsilon > 0$,
\[
\lim_{n \to \infty} \Pr(|X_n - X| \ge \epsilon) = 0 \quad \text{or, equivalently,} \quad \lim_{n \to \infty} \Pr(|X_n - X| < \epsilon) = 1
\]
(see Casella & Berger 2002, p. 232).

Definition 2 (Convergence in Distribution). A sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ with cumulative distribution functions $(F_{X_i})_{i \ge 1}$ converges in distribution to the random variable $X$ ($X_n \xrightarrow{D} X$) if
\[
\lim_{n \to \infty} F_{X_n}(x) = F_X(x)
\]
at all points $x$ where $F_X(x)$ is continuous, where $F_X$ denotes the cumulative distribution function that characterizes the distribution of $X$ (see Casella & Berger 2002, p. 235).

Definition 3 (Consistent Estimator). Consider $X_1, X_2, \ldots, X_n$ a random sample of the variable $X$ that depends on the parameter $\theta$. The estimator $\hat{\theta}(X_1, X_2, \ldots, X_n)$ is consistent for the parameter $\theta$ if
\[
\hat{\theta}(X_1, X_2, \ldots, X_n) \xrightarrow{P} \theta, \quad \forall \theta \in \Theta
\]
(see Bolfarine & Sandoval 2000, p. 52).

Definition 4 (Asymptotic variance). Consider the estimator $\hat{\theta}(X_1, X_2, \ldots, X_n)$ of the parameter $\theta \in \mathbb{R}$, and assume that $k_n(\hat{\theta}(X_1, X_2, \ldots, X_n) - \theta) \xrightarrow{D} N(0, \sigma^2)$, where $k_n$ is a sequence of constants. The parameter $\sigma^2$ is called the asymptotic variance, or variance of the limit distribution, of $\hat{\theta}(X_1, X_2, \ldots, X_n)$.

Definition 5 (Asymptotic efficiency). Let $\tau(\theta)$ be a continuous function of $\theta$. A sequence of estimators $\hat{W}(X_1, X_2, \ldots, X_n)$ is asymptotically efficient for a parameter $\tau(\theta)$ if $\sqrt{n}\,[\hat{W}(X_1, X_2, \ldots, X_n) - \tau(\theta)] \xrightarrow{D} N(0, v(\theta))$ and
\[
v(\theta) = \frac{\left[\dfrac{\partial \tau(\theta)}{\partial \theta}\right]^2}{E\left[\left(\dfrac{\partial \log f(X_1 \mid \theta)}{\partial \theta}\right)^2\right]};
\]
that is, the asymptotic variance of $\hat{W}(X_1, X_2, \ldots, X_n)$ achieves the Cramér-Rao lower bound (see Casella & Berger 2002, p. 471).

MLEs are consistent and asymptotically efficient under mild conditions, which are satisfied, in particular, by the $\mathcal{G}^0_I$ distribution. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with distribution function belonging to the parametric family $\mathcal{F}_{\vec{\theta}} = \{F(x \mid \vec{\theta}),\ \vec{\theta} \in \Theta\}$, where $\vec{\theta}^t = (\theta_1\ \theta_2\ \cdots\ \theta_p)$ and $\Theta \subseteq \mathbb{R}^p$. Consider $f(\mathbf{x} \mid \vec{\theta}) = f(x_1, x_2, \ldots, x_n \mid \vec{\theta})$ the density of the vector $\mathbf{X} = (X_1, X_2, \ldots, X_n)$. Define the function $L(\cdot \mid \mathbf{x}): \Theta \to \mathbb{R}_+$ such that $L(\vec{\theta} \mid \mathbf{x}) = f(\mathbf{x} \mid \vec{\theta})$. The function $L(\vec{\theta} \mid \mathbf{x})$ is called the likelihood function. The idea of this method of estimation is to find an estimator, a function of the sample variables, that maximizes $L(\vec{\theta} \mid \mathbf{x})$. The process of obtaining these estimators is, in principle, simply maximizing $L(\vec{\theta} \mid \mathbf{x})$ but, in practice, it is usually easier to maximize $l(\vec{\theta} \mid \mathbf{x}) = \log(L(\vec{\theta} \mid \mathbf{x}))$. The function $l(\vec{\theta} \mid \mathbf{x})$ is called the log-likelihood function. Any strictly increasing function of $L(\vec{\theta} \mid \mathbf{x})$, such as $\log(x)$ for $x \in \mathbb{R}_+$, is maximized at the same point as $L(\vec{\theta} \mid \mathbf{x})$. Then,
\[
\hat{\vec{\theta}}_n = \arg\max_{\vec{\theta} \in \Theta} l(\vec{\theta} \mid \mathbf{x}) = \arg\max_{\vec{\theta} \in \Theta} L(\vec{\theta} \mid \mathbf{x}).
\]

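Therefore, if $f(\mathbf{x} \mid \vec{\theta})$ is differentiable in $\theta_i$, then possible candidates for the maximum likelihood estimate (MLE) are the values of $\theta_i$, $i = 1, \ldots, p$, that solve
\[
\frac{\partial L(\vec{\theta} \mid \mathbf{x})}{\partial \theta_i} = 0 \quad \text{or} \quad \frac{\partial l(\vec{\theta} \mid \mathbf{x})}{\partial \theta_i} = 0, \tag{2.8}
\]
for every $\mathbf{x}$ (see Casella & Berger 2002, p. 316). Note that the solutions to equation (2.8) are only possible candidates, since the first derivative being zero is only a necessary condition for a maximum, not a sufficient one. Points at which the first derivatives are zero may be local or global minima, local or global maxima, or inflection points, and the method requires global maxima.

The likelihood function of $n$ outcomes, $\mathbf{z} = (z_1, z_2, \ldots, z_n)$, associated with a population random variable $Z \sim \mathcal{G}^0_I(\alpha, \gamma, L)$ is given by
\[
L(\alpha, \gamma \mid \mathbf{z}) = \left(\frac{L^L\, \Gamma(L - \alpha)}{(\gamma/2)^\alpha\, \Gamma(-\alpha)\, \Gamma(L)}\right)^n \prod_{i=1}^n z_i^{L-1} \left(\frac{\gamma}{2} + Lz_i\right)^{\alpha - L},
\]
with log-likelihood given by
\[
l(\alpha, \gamma \mid \mathbf{z}) = nL\log(L) + (L - 1)\sum_{i=1}^n \log(z_i) - n\log(\Gamma(L)) + n\log(\Gamma(L - \alpha)) - n\log(\Gamma(-\alpha)) - \alpha n \log\left(\frac{\gamma}{2}\right) + (\alpha - L)\sum_{i=1}^n \log\left(\frac{\gamma}{2} + Lz_i\right).
\]
Note that, for the sake of computing the ML estimators of $(\alpha, \gamma)$, the terms that do not depend on these parameters can be disregarded, leading to a reduced log-likelihood given by
\[
l^*(\alpha, \gamma \mid \mathbf{z}) = \log\Gamma(L - \alpha) - \log\Gamma(-\alpha) - \alpha\log\left(\frac{\gamma}{2}\right) + \frac{\alpha - L}{n}\sum_{i=1}^n \log\left(\frac{\gamma}{2} + Lz_i\right).
\]

Definition 6 (Score vector). Let $X_1, X_2, \ldots, X_n$ be a random sample of independent identically distributed random variables with density $f(x \mid \vec{\theta})$, $\vec{\theta} \in \Theta \subseteq \mathbb{R}^p$, and let $l(\vec{\theta} \mid \mathbf{x})$ be its log-likelihood. The score vector is
\[
U(\vec{\theta}) = \nabla l(\vec{\theta} \mid \mathbf{x}) = \left(\frac{\partial l(\vec{\theta} \mid \mathbf{x})}{\partial \theta_1}\ \ \frac{\partial l(\vec{\theta} \mid \mathbf{x})}{\partial \theta_2}\ \ \cdots\ \ \frac{\partial l(\vec{\theta} \mid \mathbf{x})}{\partial \theta_p}\right)^t.
\]

The score function has two interesting properties. The first is that $E(U(\vec{\theta})) = \vec{0}_{p \times 1}$, and the second is presented below.

Definition 7 (Fisher Information Matrix). Let $X_1, X_2, \ldots, X_n$ be a collection of independent identically distributed random variables with common density $f(x \mid \vec{\theta})$, $\vec{\theta} \in \Theta \subseteq \mathbb{R}^p$, and let $l(\vec{\theta} \mid \mathbf{x})$ be its log-likelihood. The Fisher Information Matrix is defined as
\[
K(\vec{\theta}) = E\left(U(\vec{\theta})\, U(\vec{\theta})^t\right),
\]
and, under some regularity conditions presented in Cordeiro (1999), $K(\vec{\theta})$ is given by
\[
K(\vec{\theta}) = E\left(-\frac{\partial U(\vec{\theta})}{\partial \vec{\theta}^t}\right).
\]

Then, in our case, the score function of a sample $\mathbf{z} = (z_1, z_2, \ldots, z_n)$ from the $\mathcal{G}^0_I$ law is obtained by differentiating the log-likelihood term by term:
\[
U(\alpha, \gamma) = \nabla l(\alpha, \gamma \mid \mathbf{z}) = \begin{pmatrix} \dfrac{\partial l(\alpha, \gamma \mid \mathbf{z})}{\partial \alpha} \\[2ex] \dfrac{\partial l(\alpha, \gamma \mid \mathbf{z})}{\partial \gamma} \end{pmatrix} = \begin{pmatrix} n\left[\varphi^{(0)}(-\alpha) - \varphi^{(0)}(L - \alpha)\right] - n\log\left(\dfrac{\gamma}{2}\right) + \displaystyle\sum_{i=1}^n \log\left(\dfrac{\gamma}{2} + Lz_i\right) \\[2ex] -\dfrac{n\alpha}{\gamma} + \dfrac{\alpha - L}{2}\displaystyle\sum_{i=1}^n \left(\dfrac{\gamma}{2} + Lz_i\right)^{-1} \end{pmatrix}, \tag{2.9}
\]
where $\varphi^{(0)}$ is the digamma function. Thus the MLE for $(\alpha, \gamma)$, denoted $(\hat{\alpha}, \hat{\gamma})$, is obtained as a solution of the following system of non-linear equations:
\[
\begin{cases}
n\varphi^{(0)}(-\hat{\alpha}) - n\varphi^{(0)}(L - \hat{\alpha}) - n\log\left(\dfrac{\hat{\gamma}}{2}\right) + \displaystyle\sum_{i=1}^n \log\left(\dfrac{\hat{\gamma}}{2} + Lz_i\right) = 0, \\[2ex]
-\dfrac{n\hat{\alpha}}{\hat{\gamma}} + \dfrac{\hat{\alpha} - L}{2}\displaystyle\sum_{i=1}^n \left(\dfrac{\hat{\gamma}}{2} + Lz_i\right)^{-1} = 0.
\end{cases}
\]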
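A quick way to check a score vector is to compare it with finite differences of the log-likelihood it comes from. The sketch below does this for the $\mathcal{G}^0_I$ log-likelihood under stated assumptions: only the Python standard library is used, the `digamma` helper is a small hand-written approximation (recurrence plus asymptotic series, since the standard library has no digamma), and all names, the seed and the evaluation point are illustrative. In the $\alpha$ component the digamma terms enter as $\varphi^{(0)}(-\alpha) - \varphi^{(0)}(L - \alpha)$, and in the $\gamma$ component the sum carries a factor $1/2$, which is what term-by-term differentiation of the log-likelihood gives.

```python
import math
import random


def digamma(x):
    """Digamma function for x > 0: recurrence up to x >= 6, then asymptotic series."""
    result = 0.0
    while x < 6.0:                      # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0)))


def log_likelihood(alpha, gamma, z, looks):
    """Full G0_I log-likelihood l(alpha, gamma | z)."""
    n = len(z)
    return (n * looks * math.log(looks)
            + (looks - 1) * sum(math.log(zi) for zi in z)
            - n * math.lgamma(looks)
            + n * math.lgamma(looks - alpha) - n * math.lgamma(-alpha)
            - alpha * n * math.log(gamma / 2.0)
            + (alpha - looks) * sum(math.log(gamma / 2.0 + looks * zi) for zi in z))


def score(alpha, gamma, z, looks):
    """Partial derivatives of the log-likelihood with respect to alpha and gamma."""
    n = len(z)
    u_alpha = (n * (digamma(-alpha) - digamma(looks - alpha))
               - n * math.log(gamma / 2.0)
               + sum(math.log(gamma / 2.0 + looks * zi) for zi in z))
    u_gamma = (-n * alpha / gamma
               + 0.5 * (alpha - looks)
               * sum(1.0 / (gamma / 2.0 + looks * zi) for zi in z))
    return u_alpha, u_gamma


rng = random.Random(7)                  # arbitrary seed
looks = 4
# Simulated G0_I(-3, 2, 4) data: Z = Y / W, Y ~ Gamma(L, rate L), W ~ Gamma(3, rate 1).
z = [rng.gammavariate(looks, 1.0 / looks) / rng.gammavariate(3.0, 1.0)
     for _ in range(50)]

alpha0, gamma0, h = -2.5, 3.0, 1e-5     # evaluation point and difference step
fd_alpha = (log_likelihood(alpha0 + h, gamma0, z, looks)
            - log_likelihood(alpha0 - h, gamma0, z, looks)) / (2.0 * h)
fd_gamma = (log_likelihood(alpha0, gamma0 + h, z, looks)
            - log_likelihood(alpha0, gamma0 - h, z, looks)) / (2.0 * h)
u_alpha, u_gamma = score(alpha0, gamma0, z, looks)
print(f"d/d alpha: analytic {u_alpha:.6f}, finite difference {fd_alpha:.6f}")
print(f"d/d gamma: analytic {u_gamma:.6f}, finite difference {fd_gamma:.6f}")
```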

This system of equations does not, in general, yield closed-form expressions for the estimators $(\hat{\alpha}, \hat{\gamma})$, so iterative methods must be considered. The simulations presented in Chapter 5 employed the Quasi-Newton method (with analytical derivatives) or the BFGS (Broyden, Fletcher, Goldfarb and Shanno) method to obtain estimates $(\hat{\alpha}, \hat{\gamma})$ of the parameters that index the $\mathcal{G}^0_I$ distribution. These algorithms are presented in the following.

The Quasi-Newton method uses the second-order Taylor expansion of a twice differentiable function $f$:
\[
f(\theta) \approx f(\theta(k)) + (\theta - \theta(k))^t\, \nabla f(\theta(k)) + \frac{1}{2}(\theta - \theta(k))^t A(\theta(k)) (\theta - \theta(k)),
\]
where $\nabla f(\theta(k))$ is the gradient of $f$ evaluated at $\theta(k)$, $A(\theta(k))$ is the Hessian matrix evaluated at $\theta(k)$, and $\theta$ belongs to a neighborhood of the vector $\theta(k)$. With this,
\[
\nabla f(\theta) \approx \nabla f(\theta(k)) + A(\theta(k))(\theta - \theta(k)).
\]
Setting $\nabla f(\theta) = 0$, one obtains
\[
\theta = \theta(k) - A^{-1}(\theta(k))\, \nabla f(\theta(k)),
\]
where $\theta$ is an approximation of the point that maximizes the function $f$. So the following recurrence is obtained:
\[
\theta(k + 1) = \theta(k) - A^{-1}(\theta(k))\, \nabla f(\theta(k)). \tag{2.10}
\]
An extension of equation (2.10) is given by
\[
\theta(k + 1) = \theta(k) - s(k)\, A^{-1}(\theta(k))\, \nabla f(\theta(k)),
\]
where $s(k)$ is a scalar step length determined by a line search from $\theta(k)$ in the direction $-A^{-1}(\theta(k))\, \nabla f(\theta(k))$, chosen such that $f$ increases along this direction.
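The recurrence (2.10) with a step length $s(k)$ can be sketched on a toy problem. The example below is only an illustration: the concave test function $f(x, y) = x - e^x - (y - x)^2$, whose maximum is at $(0, 0)$, the step-halving line search and every name are assumptions made here, not taken from the dissertation. The Hessian is $2 \times 2$, so it is inverted in closed form.

```python
import math


def f(theta):
    x, y = theta
    return x - math.exp(x) - (y - x) ** 2


def gradient(theta):
    x, y = theta
    return (1.0 - math.exp(x) + 2.0 * (y - x), -2.0 * (y - x))


def hessian(theta):
    x, _ = theta
    return ((-math.exp(x) - 2.0, 2.0), (2.0, -2.0))


def newton_direction(theta):
    """Return -A^{-1} grad f, using the closed-form inverse of the 2x2 Hessian A."""
    (a, b), (c, d) = hessian(theta)
    g1, g2 = gradient(theta)
    det = a * d - b * c
    return (-(d * g1 - b * g2) / det, -(-c * g1 + a * g2) / det)


def newton_maximize(theta, tol=1e-10, max_iter=50):
    for _ in range(max_iter):
        direction = newton_direction(theta)
        step = 1.0
        # Line search: halve s(k) until f does not decrease along the direction.
        while step > 1e-12:
            candidate = (theta[0] + step * direction[0],
                         theta[1] + step * direction[1])
            if f(candidate) >= f(theta):
                break
            step *= 0.5
        theta = candidate
        if math.hypot(*gradient(theta)) < tol:
            break
    return theta


theta_hat = newton_maximize((2.0, -1.0))
print(f"maximizer: ({theta_hat[0]:.8f}, {theta_hat[1]:.8f})")
```

Starting the line search at $s(k) = 1$ keeps the pure Newton step whenever it already increases $f$, so the fast local convergence of (2.10) is preserved near the maximum.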

The idea of the Quasi-Newton method is to get a good approximation to $-A^{-1}$, that is, to build a sequence of symmetric positive definite matrices $Q(k)$ such that
\[
\lim_{k \to \infty} Q(k) = -A^{-1}.
\]
So, the point of maximum of $f$ is obtained by
\[
\theta(k + 1) = \theta(k) + s(k)\, Q(k)\, \nabla f(\theta(k)).
\]

This chapter presented an elementary introduction to SAR imagery, the basics of the Multiplicative Model for speckled imagery, the $\mathcal{G}^0_I$ distribution, its parameters and their interpretation, and estimators and algorithms for computing them. The next chapter will describe the stochastic distances that will be computed between $\mathcal{G}^0_I$ laws.
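To make the Quasi-Newton recurrence concrete, the sketch below maximizes the reduced log-likelihood $l^*(\alpha, \gamma \mid \mathbf{z})$ with a BFGS update of $Q(k)$. It is a hedged illustration, not the implementation used in Chapter 5, and it departs from the text in three labeled ways: the gradient is approximated by central differences instead of analytical derivatives, the search runs over working parameters $(a, g)$ with $\alpha = -e^a$ and $\gamma = e^g$ so that the constraints $\alpha < 0$ and $\gamma > 0$ hold automatically, and the starting point is arbitrary. All function names are hypothetical and only the Python standard library is used.

```python
import math
import random


def make_reduced_loglik(z, looks):
    """Reduced log-likelihood in working parameters p = (a, g),
    with alpha = -exp(a) and gamma = exp(g)."""
    n = len(z)

    def loglik(p):
        try:
            alpha, gamma = -math.exp(p[0]), math.exp(p[1])
            mean_log = sum(math.log(gamma / 2.0 + looks * zi) for zi in z) / n
            value = (math.lgamma(looks - alpha) - math.lgamma(-alpha)
                     - alpha * math.log(gamma / 2.0) + (alpha - looks) * mean_log)
        except (OverflowError, ValueError):
            return -math.inf            # numerically invalid point: always rejected
        return value if math.isfinite(value) else -math.inf
    return loglik


def num_gradient(f, p, h=1e-6):
    """Central-difference gradient (assumption: stands in for analytic derivatives)."""
    grad = []
    for i in range(len(p)):
        up, dn = list(p), list(p)
        up[i] += h
        dn[i] -= h
        grad.append((f(up) - f(dn)) / (2.0 * h))
    return grad


def bfgs_maximize(f, p, max_iter=200, tol=1e-6):
    q = [[1.0, 0.0], [0.0, 1.0]]        # Q(0): identity, symmetric positive definite
    grad = num_gradient(f, p)
    for _ in range(max_iter):
        if math.hypot(*grad) < tol:
            break
        d = [q[0][0] * grad[0] + q[0][1] * grad[1],
             q[1][0] * grad[0] + q[1][1] * grad[1]]      # direction Q(k) grad f
        step, f_old = 1.0, f(p)
        while step > 1e-10:             # backtracking line search for s(k)
            new_p = [p[0] + step * d[0], p[1] + step * d[1]]
            if f(new_p) > f_old:
                break
            step *= 0.5
        else:
            break                       # no improving step was found
        new_grad = num_gradient(f, new_p)
        s = [new_p[0] - p[0], new_p[1] - p[1]]
        y = [grad[0] - new_grad[0], grad[1] - new_grad[1]]   # gradient change of -f
        sy = s[0] * y[0] + s[1] * y[1]
        if sy > 1e-12:                  # curvature condition keeps Q positive definite
            rho = 1.0 / sy
            # BFGS: Q <- (I - rho s y^t) Q (I - rho y s^t) + rho s s^t
            m = [[(1.0 if i == j else 0.0) - rho * s[i] * y[j] for j in range(2)]
                 for i in range(2)]
            mq = [[sum(m[i][k] * q[k][j] for k in range(2)) for j in range(2)]
                  for i in range(2)]
            q = [[sum(mq[i][k] * m[j][k] for k in range(2)) + rho * s[i] * s[j]
                  for j in range(2)] for i in range(2)]
        p, grad = new_p, new_grad
    return p


rng = random.Random(2008)               # arbitrary seed
looks, alpha_true, gamma_true = 4, -3.0, 2.0
z = [rng.gammavariate(looks, 1.0 / looks)
     / rng.gammavariate(-alpha_true, 2.0 / gamma_true) for _ in range(1000)]

loglik = make_reduced_loglik(z, looks)
p0 = [math.log(2.0), 0.0]               # start at alpha = -2, gamma = 1 (arbitrary)
p_hat = bfgs_maximize(loglik, p0)
alpha_hat, gamma_hat = -math.exp(p_hat[0]), math.exp(p_hat[1])
print(f"alpha_hat = {alpha_hat:.3f}, gamma_hat = {gamma_hat:.3f}")
```

Because the line search only accepts points that strictly increase $l^*$, the value at the returned point is never below the value at the starting point; how close $(\hat{\alpha}, \hat{\gamma})$ lands to the simulated values depends on the sample.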

Chapter 3

Measures of Distance

Summary

The initial notion of distance between probability distributions coincides with the beginning of a new line of mathematical knowledge called information theory. In recent years, it has proved quite reasonable to adapt measures from information theory to analyse imaging systems for data with speckle noise. Information-theoretic measures such as divergence have been used, among other purposes, to quantify the performance of segmentation algorithms for SAR images. This work contributes in this direction with the derivation of stochastic distances according to Aviyente (2003), based on the generic $(h, \phi)$-divergence measure proposed by Salicrú et al. (1994), for images modeled by a $\mathcal{G}^0_I$ distribution. The $(h, \phi)$ class was used because it includes interesting divergences such as those of Kullback-Leibler, Rényi, Hellinger and Bhattacharyya, among others, and because it guarantees important properties of these measures, such as nonnegativity and convexity. These measures will be used to build test statistics in order to assess homogeneity in SAR images.

3.1. Initial presentation about stochastic distances

The notion of distance between probability distributions is closely related to information theory (IT). IT is a relatively new branch of mathematics that was made mathematically rigorous only in the 1940s. In a narrow sense, IT studies all theoretical problems connected to the transmission of information over communication channels. The first studies in this direction were undertaken by Nyquist (1924, 1928) and by Hartley (1928), who recognized the logarithmic nature of the measure of information. In 1948, Shannon published a remarkable paper on the properties of information sources and of the communication channels used to transmit the outputs of these sources. Key features of Shannon's information theory are the terms information and entropy, described in the following section. These concepts had implications in several fields, such as coding and compression of data, telecommunications, and cryptography.

3.2. Entropy

IT is closely related to probability, mainly because the term information has a connotation of "randomness" in a transmitted message, that is, measures of information try to describe the "surprise" or "innovation" the message contains. The analysis of the measure of information is based on the probability of occurrence of an event. Following Shannon's work, we will present the basic ideas using discrete random variables. Let $[X = x_i \mid \Pr(X = x_i \mid \theta) = p_i]_{i=1,2,\ldots,n}$, where $\theta \in \Theta \subseteq \mathbb{R}^p$, be the probability function of the discrete random variable $X$. Its information content can be measured as
\[
I(x_i) = \log_\alpha\left(\frac{1}{\Pr(X = x_i \mid \theta)}\right) = \log_\alpha\left(\frac{1}{p_i}\right) = -\log_\alpha(p_i)
\]
(see Shannon 1948). The unit of measurement of the information depends on the base $\alpha$. If $\alpha = 2$, the unit of information is the bit and, if $\alpha = e$, the unit of information is the nat. This measure possesses the following properties:

[I1] When there is no surprise in the occurrence of event $x_i$, its information is null, i.e., $I(x_i) = 0$ if $p_i = 1$.

[I2] The information is always nonnegative, i.e., $I(x_i) \ge 0$ for $0 \le p_i \le 1$.

[I3] The larger the probability of occurrence of an event, the smaller its information: if $p_i > p_j$ then $I(x_i) < I(x_j)$.

[I4] If two events are statistically independent, then their joint information is the sum of their individual informations, i.e., if $x_i$ and $x_j$ are independent, then $I(x_i, x_j) = I(x_i) + I(x_j)$.

Actually, these properties determine the choice of the logarithm for measuring information. It is useful to calculate the average (in the sense of mathematical expectation) information of an event. This average receives the name of entropy:
\[
H(X) = \sum_{i=1}^n p_i\, I(x_i) = -\sum_{i=1}^n p_i \log_\alpha(p_i) = -E\{\log_\alpha(\Pr(X \mid \theta))\},
\]
where $\Pr(X = x \mid \theta)$ is the probability function that characterizes the distribution of the discrete random variable $X$. Although Shannon defined entropy for the discrete case, it is possible to extend this measure to continuous variates. In this case, entropy is sometimes called differential entropy. Let $X$ be a continuous random variable with density $f_X(x \mid \theta)$, with $\theta \in \Theta \subseteq \mathbb{R}^p$, and support $S(x)$. Then, its differential entropy is given by
\[
H(X) = -\int_{x \in S(x)} f_X(x \mid \theta) \log_\alpha(f_X(x \mid \theta))\, dx. \tag{3.1}
\]
Yet another type of measure is called relative entropy or divergence. It measures the inefficiency of choosing one variate when another should have been chosen. Generally, this measure is regarded as a distance between two probability distributions.

3.3. Distance

In recent years, there has been increasing interest in adapting information-theoretic measures to the analysis of speckled imagery. The application of information-theoretic measures such as divergence has made it easier to quantify the performance of segmentation algorithms for coherent polarimetric image processing (Goudail & Réfrégier 2004). This work contributes to this line of research with the derivation of measures of contrast for images modeled by the $\mathcal{G}^0_I$ distribution. These measures may be used as test statistics for the verification of heterogeneity in this kind of image.

Distance measures between statistical models have been used in image processing applications. Measures of divergence between two probability distributions are used in clustering, classification, compression, and restoration of signals, images, and patterns, in many applications (see Mak & Barnard 1996, Puig & Garcia 2003, Solanki et al. 2006). There is a wide variety of dissimilarity measures between probability distributions based on the concepts of information theory. Lee & Kahn (1990) present a general framework tying together most of these measures. A widely used class of measures that encompasses many of the main divergences and distances is called the $\phi$-divergence. The $\phi$-divergence measures the difference between two probability density functions, in the continuous case, or probability functions, in the discrete case.

In order to be a distance measure on the set of probability distributions $\mathcal{D}$, a function $d: \mathcal{D} \times \mathcal{D} \to \mathbb{R}$ has to satisfy the three properties presented below. Let $D_X, D_Y, D_Z \in \mathcal{D}$ be the distributions of the random variables $X$, $Y$ and $Z$ defined on the same probability space. Then, $d(D_X, D_Y)$ is a distance between distributions $D_X$ and $D_Y$ if

[D1] The distance is positive between two different distributions, and is zero only if they are the same distribution:
\[
d(D_X, D_Y) \ge 0 \quad \text{and} \quad d(D_X, D_Y) = 0 \text{ if and only if } D_X = D_Y. \tag{3.2}
\]

[D2] The distance is symmetric, that is,
\[
d(D_X, D_Y) = d(D_Y, D_X). \tag{3.3}
\]

[D3] The distance satisfies the triangle inequality, that is,
\[
d(D_X, D_Y) + d(D_Y, D_Z) \ge d(D_X, D_Z). \tag{3.4}
\]

In the following, in order to make the presentation more compact, we will not distinguish between "distance between distributions" and "distance between random variables". Also, if the random variables belong to the same parametric family, we will refer to "distance between parameters". The hardest property to satisfy is the one given by equation (3.4). However, it is possible to see (Aviyente 2003) that satisfying the properties posed by equations (3.2) and (3.3) is enough in the context of distance between random variables.

3.4. The $(h, \phi)$-divergence

The class of $(h, \phi)$-divergences was proposed by Salicrú et al. (1994); it extends the $\phi$-divergence, which was introduced independently by Ali & Silvey (1996) and Csiszár (1967). We will present a few necessary concepts in order to define it.

Definition 8 (Convex functions). A function $f: \mathbb{R} \to \mathbb{R}$ is called convex if $f''(x) \ge 0$ at every point of its domain.

Let $X$ and $Y$ be two random variables with the same support $S(x)$ and probability density functions $f_X(x \mid \theta_1)$ and $f_Y(x \mid \theta_2)$, respectively, and let $E_Y$ be the expectation operator with respect to the distribution of $Y$. Let also $\phi: (0, \infty) \to \mathbb{R}_+$ be a convex function such that
\[
0 \times \phi\left(\frac{0}{0}\right) = 0 \quad \text{and} \quad 0 \times \phi\left(\frac{x}{0}\right) = \lim_{x \to \infty} \frac{\phi(x)}{x}.
\]

The $\phi$-divergence is given by (Csiszár 1967)
\[
F_\phi(X, Y) = E_Y\left[\phi\left(\frac{f_X(x \mid \theta_1)}{f_Y(x \mid \theta_2)}\right)\right] = \int_{x \in S(x)} \phi\left(\frac{f_X(x \mid \theta_1)}{f_Y(x \mid \theta_2)}\right) f_Y(x \mid \theta_2)\, dx. \tag{3.5}
\]
If $X$ and $Y$ are discrete random variables, it follows that
\[
F_\phi(X, Y) = \sum_{x \in S(x)} \phi\left(\frac{\Pr(X = x \mid \theta_1)}{\Pr(Y = x \mid \theta_2)}\right) \Pr(Y = x \mid \theta_2).
\]
For the properties of $\phi$-divergences, see Liese & Vajda (1987) and Vajda (1989). There are other important measures of divergence, for instance the Rényi divergence (Rényi 1961), that cannot be obtained as particular cases of the $\phi$-divergence. For this reason Salicrú et al. (1994) introduced an extended expression called the $(h, \phi)$-divergence, which is defined by
\[
F^h_\phi(X, Y) = h\left(E_Y\left[\phi\left(\frac{f_X(x \mid \theta_1)}{f_Y(x \mid \theta_2)}\right)\right]\right) = h\left(\int_{x \in S(x)} \phi\left(\frac{f_X(x \mid \theta_1)}{f_Y(x \mid \theta_2)}\right) f_Y(x \mid \theta_2)\, dx\right), \tag{3.6}
\]
and, for the discrete case,
\[
F^h_\phi(X, Y) = h\left(\sum_{x \in S(x)} \phi\left(\frac{\Pr(X = x \mid \theta_1)}{\Pr(Y = x \mid \theta_2)}\right) \Pr(Y = x \mid \theta_2)\right). \tag{3.7}
\]
In equations (3.6) and (3.7), $\phi(x)$ satisfies the conditions of the divergence definition imposed by Csiszár (1967), and $h$ is a differentiable increasing function mapping $[0, \infty)$ onto $[0, \infty)$, with $h(0) = 0$ and $h'(x) > 0$, $\forall x \in \mathbb{R}_+$. The theorem and corollary below show the conditions that ensure nonnegativity and convexity for the divergences included in this class.

Theorem 1 (Csiszár, 1967). Let the function $\phi: (0, \infty) \to \mathbb{R}$ be differentiable, convex and normalized, i.e., $\phi(1) = 0$; then $F_\phi(X, Y)$ is nonnegative and convex for any pair of distributions.

Corollary 1 (The $(h, \phi)$-divergence property). If the assumptions of Theorem 1 are satisfied and if $h$ is a differentiable increasing function with $h(0) = 0$ and $h'(x) > 0$, then $F^h_\phi(X, Y)$ is nonnegative and convex for any pair of distributions.

This extension of Csiszár's $\phi$-divergence includes, by choosing the functions $\phi$ and $h$, some well-known measures, such as the Hellinger distance (or discrimination), the relative Kullback-Leibler divergence, the relative Rényi divergence, the Bhattacharyya distance, the relative Jensen-Shannon divergence, the relative Arithmetic-Geometric divergence, the Triangular distance (or triangular discrimination), and the Harmonic-Mean distance (also known as harmonic mean divergence).

3.4.1. The Kullback-Leibler distance

The Kullback-Leibler distance is based on the divergence given by equations (3.9) and (3.10), as follows. Let $\Omega$ be any measurable space. Let $f_X(x \mid \theta_1)$ and $f_Y(x \mid \theta_2)$ be the densities of the random variables $X$ and $Y$ with common support $S(x)$. Also, let
\[
\phi(x) = x\log(x), \quad \forall x \in \mathbb{R}_+. \tag{3.8}
\]
Then
\[
\phi'(x) = \log(x) + 1 \quad \text{and} \quad \phi''(x) = \frac{1}{x} > 0 \text{ for all } x \in \mathbb{R}_+.
\]
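As a numerical illustration of this choice of $\phi$, the discrete form of the $\phi$-divergence can be evaluated directly and compared with the sum $\sum_x \Pr(X = x)\log(\Pr(X = x)/\Pr(Y = x))$ that it reduces to. The sketch below is a minimal example in Python; the two probability vectors and the function names are made up for illustration, and all probabilities are assumed strictly positive so that the conventions for zero terms are not needed.

```python
import math


def phi_divergence(p, q, phi):
    """Discrete phi-divergence: sum over x of phi(p(x) / q(x)) * q(x)."""
    return sum(phi(pi / qi) * qi for pi, qi in zip(p, q))


def phi_kl(x):
    """phi(x) = x log(x), convex with phi(1) = 0."""
    return x * math.log(x)


p = [0.5, 0.3, 0.2]     # Pr(X = x), illustrative values
q = [0.2, 0.3, 0.5]     # Pr(Y = x), illustrative values

via_phi = phi_divergence(p, q, phi_kl)
direct = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
print(f"phi-divergence with phi(x) = x log x: {via_phi:.6f}")
print(f"sum of p log(p / q):                  {direct:.6f}")
print(f"divergence of p from itself:          {phi_divergence(p, p, phi_kl):.6f}")
```

Swapping the arguments generally changes the value, which is why a symmetrized version is needed before the measure can be called a distance.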

Applying the definition of $F_\phi(X, Y)$ given by equation (3.5),
\[
y = F_\phi(X, Y) = \int_{x \in S(x)} f_Y(x \mid \theta_2)\, \phi\left(\frac{f_X(x \mid \theta_1)}{f_Y(x \mid \theta_2)}\right) dx = \int_{x \in S(x)} f_X(x \mid \theta_1) \log\left(\frac{f_X(x \mid \theta_1)}{f_Y(x \mid \theta_2)}\right) dx.
\]
Using Theorem 1, as $\phi(x)$ is a convex function and $\phi(1) = 0$, then $y \ge 0$. Consider also the function $h$ as in Corollary 1:
\[
h(y) = y, \quad \forall y \in \mathbb{R}_+, \quad \text{so that } h'(y) = 1 > 0 \text{ and } h(0) = 0.
\]
Moreover, applying $h$ and $\phi$ in equation (3.6), the relative Kullback-Leibler divergence is defined by
\[
KL(X, Y) = \int_{x \in S(x)} f_X(x \mid \theta_1) \log\left(\frac{f_X(x \mid \theta_1)}{f_Y(x \mid \theta_2)}\right) dx. \tag{3.9}
\]
For $\Omega$ a countable space, applying $h$ and $\phi$ in equation (3.7),
\[
KL(X, Y) = \sum_{x \in S(x)} \Pr(X = x \mid \theta_1) \log\left(\frac{\Pr(X = x \mid \theta_1)}{\Pr(Y = x \mid \theta_2)}\right). \tag{3.10}
\]
This measure is also known as relative information or relative entropy. Relative entropy is not a distance, since it is not symmetric. However, it has many useful properties, including additivity over marginals of product measures: if $X = (X_1, X_2)$ and $Y = (Y_1, Y_2)$ are defined on the product space $\Omega_1 \times \Omega_2$, each with independent components, then $KL(X, Y) = KL(X_1, Y_1) + KL(X_2, Y_2)$; see Cover & Thomas (1991) and Reiss (1989).

Another interesting use of this divergence is in the study of the Akaike information criterion (AIC). AIC is derived as an asymptotically unbiased estimator of a function used for ranking candidate models, which is a variant of the Kullback-Leibler divergence between the true model and the approximating candidate model. Note that it is possible to rewrite equation (3.9) as
\[
KL(X, Y) = \int_{x \in S(x)} f_X(x \mid \theta_1) \log(f_X(x \mid \theta_1))\, dx - \int_{x \in S(x)} f_X(x \mid \theta_1) \log(f_Y(x \mid \theta_2))\, dx = E_X(\log(f_X(x \mid \theta_1))) - E_X(\log(f_Y(x \mid \theta_2))). \tag{3.11}
\]
Considering the question of assessing an approximation $Y$ for a particular model $X$, the knowledge of the quantity $E_X(\log(f_Y(x \mid \theta_2)))$ is very important. Akaike found that an asymptotically unbiased estimator of this quantity is given by
\[
AIC = -2\log L(\hat{\theta} \mid \text{data}) + 2K \tag{3.12}
\]
(see Akaike 1973), where $K$ is the number of estimable parameters in the model $f_Y(x \mid \theta_2)$. Akaike's finding of a relation between the relative K-L distance and the maximized log-likelihood has allowed major practical and theoretical advances in model selection and the analysis of complex data sets (see Stone 1982).

Relative entropy was first defined by Kullback & Leibler as a generalization of Shannon's entropy, equation (3.1). A standard reference on its properties is Cover & Thomas (1991). Despite the non-symmetry, recent work has shown that the Kullback-Leibler divergence is geometrically important; see Dabak (1993) and Chentsov (1982). In addition to its geometric importance, the Kullback-Leibler divergence is especially attractive because it can be computed in many practical situations. Johnson & Orsak (1993) provide a table of Kullback-Leibler divergences between distributions differing in mean.

Assuming that the relative Kullback-Leibler divergence exists, the Kullback-Leibler distance is given by (Jeffreys 1946, Kullback & Leibler 1951):
\[
\begin{aligned}
d_{KL}(\theta_1, \theta_2) &= KL(X, Y) + KL(Y, X) \\
&= \int_{x \in S(x)} f_X(x \mid \theta_1) \log\left(\frac{f_X(x \mid \theta_1)}{f_Y(x \mid \theta_2)}\right) dx + \int_{x \in S(x)} f_Y(x \mid \theta_2) \log\left(\frac{f_Y(x \mid \theta_2)}{f_X(x \mid \theta_1)}\right) dx \\
&= \int_{x \in S(x)} \Big\{ [f_X(x \mid \theta_1) - f_Y(x \mid \theta_2)] \log(f_X(x \mid \theta_1)) + [f_Y(x \mid \theta_2) - f_X(x \mid \theta_1)] \log(f_Y(x \mid \theta_2)) \Big\}\, dx \\
&= \int_{x \in S(x)} (f_X(x \mid \theta_1) - f_Y(x \mid \theta_2)) \log\left(\frac{f_X(x \mid \theta_1)}{f_Y(x \mid \theta_2)}\right) dx.
\end{aligned} \tag{3.13}
\]
The same result is valid for the discrete case. From formula (3.13), it is possible to see that

(1) $d_{KL}(\theta_1, \theta_2) = d_{KL}(\theta_2, \theta_1)$;

(2) $d_{KL}(\theta_1, \theta_2) \ge 0$ (being 0 if and only if $\theta_1 = \theta_2$).

The definition of the KL distance implies (1). For (2), if $\theta_1 = \theta_2$ then
\[
\log\left(\frac{f_X(x \mid \theta_1)}{f_Y(x \mid \theta_2)}\right) = 0,
\]
and, hence, $d_{KL}(\theta_1, \theta_2) = 0$. It is also possible to see for (2) that $\phi(1) = h(0) = 0$, $h'(y) > 0$, and the function (3.8) is convex; then, by Corollary 1, $KL(X, Y)$ and $KL(Y, X)$ are convex and nonnegative (i.e., $d_{KL}(\theta_1, \theta_2) \ge 0$). So, if $d_{KL}(\theta_1, \theta_2) < \infty$, then (3.13) is a measure of distance.

3.4.2. The Rényi distance

This distance measure is based on Rényi's generalized entropy given by equation (3.15) below. Let $f_X(x \mid \theta_1)$ and $f_Y(x \mid \theta_2)$ be the densities of the random variables $X$ and $Y$ with common support $S(x)$. Also, let
\[
\phi(x) = \frac{x^\alpha - \alpha(x - 1) - 1}{\alpha - 1}, \quad \forall x \in \mathbb{R}_+ \text{ and } \alpha \in (0, 1). \tag{3.14}
\]
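The role of the function (3.14) can be illustrated numerically in the discrete case. The sketch below is an illustration under assumptions: it pairs $\phi$ from (3.14) with the function $h(y) = \log((\alpha - 1)y + 1)/(\alpha - 1)$ that this section uses for the Rényi construction, and checks that the resulting $(h, \phi)$-divergence coincides with $\frac{1}{\alpha - 1}\log\sum_x \Pr(X = x)^\alpha \Pr(Y = x)^{1 - \alpha}$. The probability vectors, the order $\alpha = 0.5$ and the function names are made up, and all probabilities are assumed strictly positive.

```python
import math


def phi_renyi(x, order):
    """phi(x) = (x^order - order (x - 1) - 1) / (order - 1), for order in (0, 1)."""
    return (x ** order - order * (x - 1.0) - 1.0) / (order - 1.0)


def h_renyi(y, order):
    """h(y) = log((order - 1) y + 1) / (order - 1), with h(0) = 0."""
    return math.log((order - 1.0) * y + 1.0) / (order - 1.0)


def renyi_divergence(p, q, order):
    """Discrete (h, phi)-divergence built from phi_renyi and h_renyi."""
    y = sum(phi_renyi(pi / qi, order) * qi for pi, qi in zip(p, q))
    return h_renyi(y, order)


order = 0.5
p = [0.5, 0.3, 0.2]     # Pr(X = x), illustrative values
q = [0.2, 0.3, 0.5]     # Pr(Y = x), illustrative values

via_h_phi = renyi_divergence(p, q, order)
direct = math.log(sum(pi ** order * qi ** (1.0 - order)
                      for pi, qi in zip(p, q))) / (order - 1.0)
print(f"(h, phi)-divergence: {via_h_phi:.6f}")
print(f"closed form:         {direct:.6f}")
```

The linear term $\alpha(x - 1)$ in $\phi$ contributes $\alpha(\sum_x \Pr(X = x) - \sum_x \Pr(Y = x)) = 0$ to the sum, which is why only the power term survives in the closed form.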

Then,
\[
\phi'(x) = \frac{\alpha x^{\alpha - 1} - \alpha}{\alpha - 1} \quad \text{and} \quad \phi''(x) = \alpha x^{\alpha - 2} > 0 \text{ for all } x \in \mathbb{R}_+.
\]
Applying the definition of $F_\phi(X, Y)$ given by equation (3.5),
\[
y = F_\phi(X, Y) = \int_{x \in S(x)} f_Y(x \mid \theta_2)\, \phi\left(\frac{f_X(x \mid \theta_1)}{f_Y(x \mid \theta_2)}\right) dx = \frac{1}{\alpha - 1}\left(\int_{x \in S(x)} f_Y(x \mid \theta_2)^{1 - \alpha} f_X(x \mid \theta_1)^\alpha\, dx - 1\right).
\]
Using Theorem 1, as $\phi(x)$ is a convex function and $\phi(1) = 0$, then $y \ge 0$. Since
\[
\int_{x \in S(x)} f_Y(x \mid \theta_2)^{1 - \alpha} f_X(x \mid \theta_1)^\alpha\, dx \in (0, 1],
\]
it holds that $y \in \left[0, \frac{1}{1 - \alpha}\right)$. Consider also the function $h$ as in Corollary 1:
\[
h(y) = \frac{1}{\alpha - 1}\log((\alpha - 1)y + 1), \quad \forall y \in \left[0, \frac{1}{1 - \alpha}\right), \quad \text{so that } h'(y) = \frac{1}{(\alpha - 1)y + 1} > 0 \text{ and } h(0) = 0.
\]
Moreover, applying $h$ and $\phi$ in equation (3.6), the relative Rényi divergence is given by
\[
R^\alpha(X, Y) = \frac{1}{\alpha - 1}\log\left(\int_{x \in S(x)} f_X(x \mid \theta_1)^\alpha f_Y(x \mid \theta_2)^{1 - \alpha}\, dx\right). \tag{3.15}
\]
For a countable state space $\Omega$, applying $h$ and $\phi$ in (3.7),
\[
R^\alpha(X, Y) = \frac{1}{\alpha - 1}\log\left(\sum_{x \in S(x)} (\Pr(X = x \mid \theta_1))^\alpha (\Pr(Y = x \mid \theta_2))^{1 - \alpha}\right).
\]
This relative entropy was first defined by Rényi (1961). Similarly to (3.9), this measure is not symmetric, but it can be modified to yield a symmetric divergence measure (Aviyente 2003):
\[
d^\alpha_R(\theta_1, \theta_2) = R^\alpha(X, Y) + R^\alpha(Y, X). \tag{3.16}
\]
From equation (3.16), it is possible to see that

(1) $d^\alpha_R(\theta_1, \theta_2) = d^\alpha_R(\theta_2, \theta_1)$;

(2) $d^\alpha_R(\theta_1, \theta_2) \ge 0$ (being 0 if and only if $\theta_1 = \theta_2$).

The definition of the Rényi distance implies (1). For (2), if $\theta_1 = \theta_2$ then $f_Y^{1 - \alpha}(x \mid \theta_2)\, f_X^\alpha(x \mid \theta_1) = f_X(x \mid \theta_1) = f_Y(x \mid \theta_2)$, so the integral in (3.15) equals 1 and, hence, $d^\alpha_R(\theta_1, \theta_2) = 0$. It is also possible to see that $\phi(1) = h(0) = 0$, $h'(y) > 0$, and the function (3.14) is convex. Therefore, by Corollary 1, $R^\alpha(X, Y)$ and $R^\alpha(Y, X)$ are convex and nonnegative (i.e., $d^\alpha_R(\theta_1, \theta_2) \ge 0$). So, if $d^\alpha_R(\theta_1, \theta_2) < \infty$, then (3.16) is a measure of distance.

3.4.3. The Hellinger distance

This distance measure was first defined by Diaconis & Zabel (1982). Let $\Omega$ be any measurable space. Let $f_X(x \mid \theta_1)$ and $f_Y(x \mid \theta_2)$ be the densities of the random variables $X$ and $Y$ with common support $S(x)$. Also, consider the function
\[
\phi(x) = \left(\sqrt{x} - 1\right)^2, \quad \forall x \in \mathbb{R}_+. \tag{3.17}
\]
Then
\[
\phi'(x) = \frac{\sqrt{x} - 1}{\sqrt{x}} \quad \text{and} \quad \phi''(x) = \frac{1}{2x\sqrt{x}} > 0 \text{ for all } x \in \mathbb{R}_+.
\]

(127) Applying the de