Geographical Data Mining
Thales Sehn Korting
tkorting@dpi.inpe.br
http://www.dpi.inpe.br/~tkorting/
Motivation
• Large datasets
– Few data manipulation techniques
– Few information extraction tools
• [Silva 2005]: prototype system for mining patterns of deforestation in the Brazilian Amazon
Amount of data
• Simple crop
– 256² x 3 = 196608 values!
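The count on this slide can be checked in a few lines. The sketch below assumes the crop is 256 x 256 pixels with 3 bands, which is the reading that yields 196608; the variable names are illustrative only.

```python
# Number of input values in a small image crop: width x height x bands.
width, height, bands = 256, 256, 3  # assumed crop size and band count

n_values = width * height * bands
print(n_values)  # 196608
```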
Amount of data
• 196608 input values to answer questions like:
– What kind of image?
– What objects are in the image?
– How many houses?
– Where are the streets?
How to reduce input data?
• Segmentation → Regions
• Data → Information
– Area, perimeter, rectangularity, …
– Pixels’ mean, pixels’ STD, texture, …
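As a rough illustration of how regions turn pixels into a handful of attributes, the sketch below computes area, perimeter, rectangularity, mean and standard deviation per region from a label grid. It is not GeoDMA's code: the function name `region_attributes`, the toy data, and the definitions used (perimeter as the count of exposed pixel edges, rectangularity as area divided by bounding-box area) are assumptions made for this example.

```python
from math import sqrt

def region_attributes(labels, image):
    """Return {label: attributes} for a 2D label grid and a same-sized gray image."""
    rows, cols = len(labels), len(labels[0])
    stats = {}
    for r in range(rows):
        for c in range(cols):
            lab = labels[r][c]
            s = stats.setdefault(lab, {"pixels": [], "perimeter": 0,
                                       "rmin": r, "rmax": r, "cmin": c, "cmax": c})
            s["pixels"].append(image[r][c])
            s["rmin"], s["rmax"] = min(s["rmin"], r), max(s["rmax"], r)
            s["cmin"], s["cmax"] = min(s["cmin"], c), max(s["cmax"], c)
            # Count pixel edges facing another region or the image border.
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < rows and 0 <= cc < cols) or labels[rr][cc] != lab:
                    s["perimeter"] += 1
    out = {}
    for lab, s in stats.items():
        area = len(s["pixels"])
        mean = sum(s["pixels"]) / area
        std = sqrt(sum((p - mean) ** 2 for p in s["pixels"]) / area)
        bbox_area = (s["rmax"] - s["rmin"] + 1) * (s["cmax"] - s["cmin"] + 1)
        out[lab] = {"area": area, "perimeter": s["perimeter"],
                    "rectangularity": area / bbox_area, "mean": mean, "std": std}
    return out

# Toy 3 x 3 example: three regions, nine pixel values reduced to a few attributes each.
labels = [[1, 1, 2],
          [1, 1, 2],
          [3, 3, 2]]
image = [[10, 10, 50],
         [10, 10, 50],
         [90, 70, 50]]
attrs = region_attributes(labels, image)
```

Each region is then described by a short attribute vector rather than by its raw pixels, which is the reduction this slide points at.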
In Practice
• Segment image = software A
• Visualize segmentation = software B
• Extract attributes = software C
• Normalize attributes = software D
• Visualize attribute space = software D
• Select Samples = software E
• Classify regions = software F
• Visualize results = software B
In Practice
• More than 5 different software packages!
– Processing time
– File-conversion time
– etc.
• GeoDMA – Geographical Data Mining Analyst
– All tools on the same system
GeoDMA
• Input
– Raster
– Polygons
• Processing
– Attribute Extraction
– Normalization
– Supervised training
• Output
– Thematic classification
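The normalization step can be pictured with a small sketch. The slides do not say which scheme GeoDMA applies, so min-max rescaling to [0, 1] is assumed here; the function name `min_max_normalize` and the attribute table are hypothetical.

```python
def min_max_normalize(rows):
    """Rescale each attribute column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

# Hypothetical attribute table: one row per region, columns = (area, pixel mean).
table = [[4, 10.0], [3, 50.0], [2, 80.0]]
normalized = min_max_normalize(table)
```

Bringing attributes with different units onto a common scale keeps a large-valued attribute, such as area, from dominating distance-based steps later in the pipeline.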
GeoDMA
Dataflow
GeoDMA and TerraLib
• Image processing functions
– Segmentation
• Region Growing
– Attribute Extraction
• Data Mining algorithms
– C4.5 Decision Tree
– Self-Organizing Maps
– ...
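For readers unfamiliar with region growing, a minimal sketch of the general idea follows. It is not TerraLib's implementation: the 4-connectivity, the comparison against the seed value, the `threshold` parameter, and the function name `region_growing` are all assumptions chosen to keep the example short.

```python
from collections import deque

def region_growing(image, threshold):
    """Label a 2D gray image by growing 4-connected regions from unlabeled seeds.

    A neighbor joins a region when its value differs from the seed value
    by at most `threshold`.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]  # 0 means "not labeled yet"
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0]:
                continue
            # Start a new region at the first unlabeled pixel in scan order.
            next_label += 1
            seed = image[r0][c0]
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and not labels[rr][cc]
                            and abs(image[rr][cc] - seed) <= threshold):
                        labels[rr][cc] = next_label
                        queue.append((rr, cc))
    return labels

image = [[10, 12, 50],
         [11, 10, 52],
         [90, 88, 51]]
segments = region_growing(image, threshold=5)
print(segments)  # [[1, 1, 2], [1, 1, 2], [3, 3, 2]]
```

The resulting label grid is the kind of input from which per-region attributes can be extracted and passed to a classifier.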
Current Applications
• Land Change in Brazilian Amazon
• Urban classification
Future Work
• Allow multi-temporal data mining
– Snapshots
– Try to explain changes
• More classification algorithms
• More precise segmentation
Geographical Data Mining
Try GeoDMA!
http://www.dpi.inpe.br/geodma/