I’ve found the data; it’s free and open access. Now what?
Gilberto Câmara
National Institute for Space Research (INPE), Brazil
Geospatial data catalogue
Source: [Bai and Di, 2011]
The hard-wired map metaphor
Cantino planisphere (1502)
Map metaphors live in GIS
Geospatial Database
Desktop GIS Web service
Birds do it… bees do it… even educated fleas do it… Let’s do it…
Distribution Model Algorithm Distribution map
Temperature
Precipitation
Environmental data
Ecological niche modelling
Species info
Environmental data: Precipitation, Soil, Temperature
openModeller
Specimens
Modelling algorithms: Bioclim, Neural Networks, GARP
Natural disasters
Risk Analyses
Analysis
On-line data feed
DCP: rain total; fixed time and irregular (alert); point data; one file per DCP
Satellite/Radar: 4 km grid; total rain 1h, total rain 24h, current (mm/h); binary file
Models: ETA 40, 20, 5 km; ensemble 40 km; total rain 72h; 72 files; ASCII grid file
Natural Disasters Monitoring and Alert System
Up to 10%
10 – 20%
20 – 30%
30 – 40%
40 – 50%
50 – 60%
60 – 70%
70 – 80%
80 – 90%
90 – 100%
Amazonia (4,000,000 km² = size of Europe)
Deforestation in Amazonia
Daily warnings of newly deforested large areas
Real-time Deforestation Monitoring
166-112 116-113
116-112
30 TB of data
500,000 lines of code
150 man-years of software development
200 man-years of interpreters
How much does it take to survey Amazonia?
Data Access Hitting a Wall
Current science practice based on data download
How do you download a petabyte?
You don’t! Move the software to the archive
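A back-of-the-envelope calculation makes the point. The sketch below assumes decimal units (1 PB = 10^15 bytes) and a few sustained link speeds; the speeds are illustrative assumptions, not figures from the talk.

```python
SECONDS_PER_DAY = 86_400
PETABYTE = 10**15  # decimal petabyte, in bytes


def transfer_days(size_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to move size_bytes over a link at a sustained bit rate."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / SECONDS_PER_DAY


# Assumed sustained rates; real-world throughput is usually lower.
for label, rate in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]:
    print(f"{label}: {transfer_days(PETABYTE, rate):.1f} days")
```

At an assumed 1 Gbit/s, one petabyte takes roughly three months of uninterrupted transfer, which is why moving the software to the archive is the more practical option.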
Virtual Observatory
“If data is online, the internet is the world’s best telescope” (Jim Gray)
How many clouds do we need?
What happened here in the last 10 years?
source: INPE
< Corn > sugarcane ->
Are biofuels replacing food production in Brazil?
3 TB of data behind this!
How much processing should be in the cloud?
Standard API? WPS?
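One candidate for such a standard API is the OGC Web Processing Service (WPS). As a minimal sketch of what "processing in the cloud" could look like from the client side, the code below builds a WPS 1.0.0 key-value Execute request without sending it. The endpoint URL, the process identifier and the input names are all hypothetical.

```python
from urllib.parse import urlencode


def wps_execute_url(endpoint: str, process_id: str, inputs: dict) -> str:
    """Build a WPS 1.0.0 key-value-pair Execute request URL (no network call)."""
    # WPS 1.0.0 KVP encodes inputs as "name=value" pairs separated by ';'.
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": process_id,
        "datainputs": data_inputs,  # percent-encoded by urlencode
    })
    return f"{endpoint}?{query}"


url = wps_execute_url(
    "https://archive.example.org/wps",                       # hypothetical endpoint
    "crop_change",                                           # hypothetical process
    {"region": "Cerrado", "start": "2000", "end": "2010"},   # hypothetical inputs
)
print(url)
```

Only the request travels over the network; the terabytes stay in the archive and only the result comes back.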
Could this analysis be done in the cloud?
source: INPE
< Corn > sugarcane ->
Data chain in Earth System Science
source: NASA
source: USGS
Getting to the Data
Requires solving the spatial semantics problem
Tentative solutions: catalogues, metadata, SDIs, ontologies, web services, semantic reference systems, linked open data, ...
Communicating location is easy
Deforestation hotspots in Amazonia
Weather
source: WMO
11,000 land stations (3,000 automated)
900 radiosondes, 3,000 aircraft, 6,000 ships, 1,300 buoys
5 polar, 6 geostationary satellites
Communicating about data is feasible
Communicating concepts is hard
Image source: WMO
vulnerability? climate change? poverty?
degradation
We’re bad at representing meaning
deforestation? degradation? disturbance?
Communicating concepts is hard
When did the Aral Sea reach the tipping point?
Communicating change is very hard
Objects exist, events occur (Mount Etna 2002 eruption)
Observations allow us to get the measure of external reality
WMO’s global observing system
WMO GRIB: simple and clean
Code  Parameter                 Units
...
052   Relative humidity         %
053   Humidity mixing ratio     kg/kg
054   Precipitable water        kg/m2
055   Vapour pressure           Pa
056   Saturation deficit        Pa
057   Evaporation               kg/m2
058   Cloud Ice                 kg/m2
059   Precipitation rate        kg/m2/s
060   Thunderstorm probability  %
061   Total precipitation       kg/m2
076   Cloud water               kg/m2
...
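Because each observation type is just a numeric code with a name and a unit, decoding is a plain lookup. The sketch below holds only the entries listed on this slide; the dictionary and the helper function are hypothetical names used for illustration.

```python
# Subset of the WMO GRIB parameter table listed above: code -> (parameter, units).
GRIB_PARAMETERS = {
    52: ("Relative humidity", "%"),
    53: ("Humidity mixing ratio", "kg/kg"),
    54: ("Precipitable water", "kg/m2"),
    55: ("Vapour pressure", "Pa"),
    56: ("Saturation deficit", "Pa"),
    57: ("Evaporation", "kg/m2"),
    58: ("Cloud Ice", "kg/m2"),
    59: ("Precipitation rate", "kg/m2/s"),
    60: ("Thunderstorm probability", "%"),
    61: ("Total precipitation", "kg/m2"),
    76: ("Cloud water", "kg/m2"),
}


def describe(code: int) -> str:
    """Return a readable label for a GRIB parameter code in the subset above."""
    name, units = GRIB_PARAMETERS[code]
    return f"{code:03d} {name} [{units}]"


print(describe(61))  # 061 Total precipitation [kg/m2]
```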
When did the large flood occur in Angra?
When did the large flood occur in Angra?
When precipitation was > 10 mm/hour for 5 hours
Coverage set (hourly precipitation grid)
Cover change set (precipitation > 10 mm/hour)
When did the large flood occur in Angra?
CoverageSet p1("Precipitation")
CoverChangeSet s1 = extract(p1 > 10, time1, time2)
TimeSeries t1 = intersect(s1, geom("Angra"))
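The pseudocode above describes the event in terms of observations. As a rough Python sketch of the same idea for a single location, the function below scans an hourly precipitation series for the first run of at least five consecutive hours above 10 mm/hour. The function name and the sample values are invented for illustration; they are not part of any existing library.

```python
def first_sustained_exceedance(series, threshold, min_hours):
    """Return the index of the first hour that starts a run of at least
    min_hours consecutive values above threshold, or None if there is none."""
    run_start = None
    for hour, value in enumerate(series):
        if value > threshold:
            if run_start is None:
                run_start = hour
            if hour - run_start + 1 >= min_hours:
                return run_start
        else:
            run_start = None  # the run is broken, start over
    return None


# Made-up hourly precipitation (mm/hour) for one grid cell.
rain = [0, 2, 12, 15, 3, 11, 14, 18, 22, 13, 4, 0]
start = first_sustained_exceedance(rain, threshold=10, min_hours=5)
print(start)  # 5
```

The short wet spell at hours 2 and 3 is ignored because it lasts only two hours; the event is dated to hour 5, where the five-hour run begins.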
How many walruses reached Baffin Island?
How many walruses reached Baffin Island?
Those whose trajectories touched Baffin Island
Moving objects
Trajectories
How many walruses reached Baffin island?
MovingObjectSet m1("walruses")
Trajectories t1 = extract(m1, time1, time2)
Trajectories t2 = reach(t1, geom("Baffin"))
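A simplified Python sketch of the `reach` step, under loud assumptions: each trajectory is a list of (longitude, latitude) fixes, the island is approximated by a bounding box rather than its real outline, and only observed fixes are tested (not the path between them). All names and coordinates below are made up for illustration.

```python
def reaches(trajectory, bbox):
    """True if any (x, y) fix of the trajectory falls inside bbox."""
    min_x, min_y, max_x, max_y = bbox
    return any(min_x <= x <= max_x and min_y <= y <= max_y for x, y in trajectory)


def count_reaching(trajectories, bbox):
    """Count the moving objects whose trajectory touches bbox at least once."""
    return sum(1 for points in trajectories.values() if reaches(points, bbox))


# Made-up tracks: object id -> list of (longitude, latitude) fixes.
tracks = {
    "w1": [(-60.0, 62.0), (-65.0, 66.0), (-70.0, 69.0)],
    "w2": [(-55.0, 60.0), (-56.0, 61.0)],
    "w3": [(-80.0, 72.0), (-75.0, 70.0)],
}
island_bbox = (-90.0, 65.0, -61.0, 74.0)  # illustrative box, not a real outline

count = count_reaching(tracks, island_bbox)
print(count)  # 2
```

A real implementation would intersect trajectory segments with the island polygon; the bounding-box test only shows how the question reduces to an operation on moving objects.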
When was this area converted from food to biofuel production?
Coverage set (remote sensing images)
Time Series (vegetation index)
When was this area converted from food to biofuel production?
When the vegetation index peaked once a year.
When was this area converted from food to biofuel production?
CoverageSet c1("Cerrado")
TimeSeries ts1 = extract(c1, "VegIndex")
for year = y1, yn do
    time1 = year*52 + 1
    time2 = time1 + 52
    TimeSeries t2 = onepeak(ts1, time1, time2)
Time t1 = first(t2)
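A minimal Python sketch of this query, assuming 52 vegetation-index samples per year as in the pseudocode. The slides do not define `onepeak`, so a plain local-maximum count above a threshold stands in for it here; the 0.6 threshold and the triangular synthetic curves are assumptions made only so the example runs.

```python
def count_peaks(values, min_height):
    """Count local maxima above min_height (strictly above both neighbours)."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i] > min_height and values[i - 1] < values[i] > values[i + 1]
    )


def first_single_peak_year(series, first_year, samples_per_year=52, min_height=0.6):
    """Return the first year whose window shows exactly one peak, else None."""
    n_years = len(series) // samples_per_year
    for k in range(n_years):
        window = series[k * samples_per_year:(k + 1) * samples_per_year]
        if count_peaks(window, min_height) == 1:
            return first_year + k
    return None


def season(n, height):
    """Synthetic triangular growth curve of n samples rising to height."""
    half = n // 2
    return [0.2 + (height - 0.2) * (half - abs(i - half)) / half for i in range(n)]


# Synthetic data: three years with two growing seasons, then two with one.
two_season_year = season(26, 0.8) + season(26, 0.8)
one_season_year = season(52, 0.8)
series = two_season_year * 3 + one_season_year * 2

print(first_single_peak_year(series, first_year=2000))  # 2003
```

Real vegetation-index series are noisy, so a practical version would smooth the series before counting peaks.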
A new kind of geospatial analysis engine?
TerraLib: spatio-temporal database as a basis for innovation
Visualization (TerraView)
Spatio-temporal Database (TerraLib)
Modelling (TerraME)
Data Mining (GeoDMA)
Statistics (aRT)
Conclusions
We need a new generation of GI appliances
Connect data brokering, sources, analysis
We need many clouds with remote processing
Describe observations, not events
Allow users to process the data