• Nenhum resultado encontrado

GeoCLEF 2007: the CLEF 2007 Cross-Language Geographic Information Retrieval Track Overview

N/A
N/A
Protected

Academic year: 2021

Share "GeoCLEF 2007: the CLEF 2007 Cross-Language Geographic Information Retrieval Track Overview"

Copied!
10
0
0

Texto

(1)

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

2

2

GeoCLEF

GeoCLEF

Administration

Administration





Joint effort of

Joint effort of





Fredric

Fredric Gey

Gey

(U. California at Berkeley)

(U. California at Berkeley)





Diana Santos (

Diana Santos (Linguateca

Linguateca, SINTEF ICT, Norway)

, SINTEF ICT, Norway)





Mark Sanderson (U. Sheffield)

Mark Sanderson (U. Sheffield)





Nicola Ferro, Giorgio

Nicola Ferro, Giorgio Di

Di

Nunzio

Nunzio

(U. Padua)

(U. Padua)





Xing

Xing Xie

Xie

(Microsoft Research,

(Microsoft Research, Asia)

Asia)





Thomas Mandl, Christa

Thomas Mandl, Christa Womser

Womser-

-Hacker

Hacker

(U. Hildesheim)

(U. Hildesheim)





and many others ...

and many others ...

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

3

3

Overview

Overview

More on the geographic search task

More on the geographic search task





Thomas Mandl: Overview

Thomas Mandl: Overview





Query Classification Subtask

Query Classification Subtask





Mark Sanderson:

Mark Sanderson:

Topic Creation and Relevance Assessment

Topic Creation and Relevance Assessment





Giorgio

Giorgio

Di

Di

Nunzio

Nunzio

:

:

Results

Results





Diana Santos:

Diana Santos:

Approaches and Interpretation

Approaches and Interpretation

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

4

4

Geographic Information Systems

Geographic Information Systems

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

5

5

Initial Aim of

Initial Aim of

GeoCLEF

GeoCLEF





Aim: to evaluate retrieval of multilingual documents with an

Aim: to evaluate retrieval of multilingual documents with an

emphasis on geographic search (GIR)

emphasis on geographic search (GIR)





“find me news stories about riots near Dublin

find me news stories about riots near Dublin”

(Fred

(Fred Gey

Gey

@ CLEF Workshop 2005)

@ CLEF Workshop 2005)

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

6

6

Participation

Participation

108

108

149

149

117

117

Nr. of

Nr. of

submitted

submitted

Experiments

Experiments

13

13

17

17

11

11

Nr. of

Nr. of

Participants

Participants

2007

2007

2006

2006

2005

2005

CLEF

CLEF

Year

Year

New: Query

New: Query

Classification

Classification

Task

Task

6

(2)

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

7

7

Search Task 2007

Search Task 2007





Three languages

Three languages





600,000 + docs

600,000 + docs





25 topics (75 in three years now)

25 topics (75 in three years now)





Intention behind topics

Intention behind topics





geographically

geographically

challenging

challenging

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

8

8

Reliability?

Reliability?





25 topics are sufficient under most circumstances to reliably

25 topics are sufficient under most circumstances to reliably

order systems

order systems

(

(

Sanderson &

Sanderson &

Zobel

Zobel

2005

2005

)

)

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

9

9

Partial Swap Rate Analysis

Partial Swap Rate Analysis

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

1

2

3

4

5

6

7

8

9

10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

Average Correlation between system rankings of full and partial topic set

for German Mono-lingual task 2006

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

10

10

Search Task

Search Task





How much and which geo knowledge and reasoning is

How much and which geo knowledge and reasoning is

necessary?

necessary?





Each year, keyword based

Each year, keyword based

systems do well on the task

systems do well on the task

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

11

11

Query Classification Task

Query Classification Task





Goal: find geo queries in a log of real queries

Goal: find geo queries in a log of real queries





New in 2007

New in 2007





Organized by Xing

Organized by Xing

Xie

Xie

(Microsoft Research Asia, Beijing, China)

(Microsoft Research Asia, Beijing, China)

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

12

12

Data

Data





Query log from the MSN search engine

Query log from the MSN search engine





in English

in English





800.000 queries (collected August 2006)

800.000 queries (collected August 2006)





500 queries were

500 queries were labelled

labelled

and used for evaluation

and used for evaluation





100 queries for training

100 queries for training



(3)

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

13

13

Task

Task





Find queries with a geographic scope

Find queries with a geographic scope





Extract where component

Extract where component





Extract geo-

Extract geo

-relation

relation-

-type

type





Extract what component

Extract what component





Classify what type {information, yellow page, map}

Classify what type

{information, yellow page, map}

Example:

< what>lottery

<what-type>information

<where>Florida, US

< geo-relation>in

<local>YES

Lottery

in Florida

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

14

14

Geo

Geo

-

-

relation

relation

-

-

types

types





27 classes

27 classes





Examples:

Examples:





In

In





On

On





Near

Near





Along

Along





Distance

Distance





North_of

North_of





North_west_of

North_west_of





North_to

North_to





CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

15

15

Evaluation Set

Evaluation Set

1.

1.

Choose 800 queries randomly from the query set.

Choose 800 queries randomly from the query set.

2.

2.

Remove the typos and the ambiguous queries from the 800 ones

Remove the typos and the ambiguous queries from the 800 ones

manually.

manually.

3.

3.

Select the queries with special geo

Select the queries with special geo-

-relations from the remainder

relations from the remainder

queries in the query set manually and add them to the evaluation

queries in the query set manually and add them to the evaluation

set.

set.

4.

4.

Select 500 queries for the final evaluation set.

Select 500 queries for the final evaluation set.

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

16

16

Query Types in Evaluation Set

Query Types in Evaluation Set

map

16%

information

19%

Yellow page

29%

non-local

36%

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

17

17

Evaluation Metrics

Evaluation Metrics





Three assessors

Three assessors





individually assessed all system answers

individually assessed all system answers





reached an agreement

reached an agreement





Fully Correct classified query instances

Fully Correct classified query instances





Recall, precision and combined F1

Recall, precision and combined F1

-

-

Score

Score

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

18

18

Results

Results

0.088

0.088

0.08

0.08

0.096

0.096

Xldb

Xldb

0.235

0.235

0.249

0.249

0.222

0.222

Talp

Talp

0.488

0.488

0.566

0.566

0.428

0.428

Miracle

Miracle

0.057

0.057

0.038

0.038

0.112

0.112

Linguit

Linguit

0.199

0.199

0.197

0.197

0.201

0.201

Csusm

Csusm

0.365

0.365

0.258

0.258

0.625

0.625

Ask

Ask

F1

F1

Recall

Recall

Precision

Precision

Team

Team

(4)

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

19

19

Approaches

Approaches





Gazeteers

Gazeteers

for location identification

for location identification





Large base of geo names

Large base of geo names





Pre

Pre

-

-

defined Rules

defined Rules





Issues

Issues





Low Perfomance

Low

Perfomance





Few training classes for many geo-

Few training classes for many geo

-types

types

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

20

20

geoCLEF

geoCLEF

topic creation

topic creation

Mark Sanderson

Mark Sanderson

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

29/09/2007

© The University of Sheffield / Department of Marketing and Communications

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

21

21

Topics

Topics





25

25

adhoc

adhoc

topics

topics





Developed

Developed



One third in English

One third in English



One third in German

One third in German



One third in Portuguese

One third in Portuguese

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

22

22

Classic CLEF topic creation

Classic CLEF topic creation





Developed topics in local language

Developed topics in local language





Other

Other

geoCLEF

geoCLEF

partners translated topics and

partners translated topics and

checked for relevance in their collection.

checked for relevance in their collection.

29/09/2007

© The University of Sheffield / Department of Marketing and Communications

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

23

23

Motivation behind topic design

Motivation behind topic design





Text only often wins

Text only often wins





How to tackle that

How to tackle that





Imprecise regions

Imprecise regions





Regions surrounding, but not including a point

Regions surrounding, but not including a point





Regions where local knowledge is important

Regions where local knowledge is important

29/09/2007

© The University of Sheffield / Department of Marketing and Communications

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

24

24

Imprecise regions

Imprecise regions





Documents describing the damage caused by acid rain in the

Documents describing the damage caused by acid rain in the

countries of northern Europe

countries of northern Europe

(5)

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

25

25

Surrounding

Surrounding



“

Find information about social problems

Find information about social problems

afflicting places in greater Lisbon.

afflicting places in greater Lisbon.





Find documents mentioning airplane

Find documents mentioning airplane

crashes close to Russian cities

crashes close to Russian cities

29/09/2007

© The University of Sheffield / Department of Marketing and Communications

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

26

26

Local knowledge

Local knowledge





To be relevant, a document must describe a

To be relevant, a document must describe a

whisky made, or a whisky distillery located, on

whisky made, or a whisky distillery located, on

a Scottish island.

a Scottish island.

29/09/2007

© The University of Sheffield / Department of Marketing and Communications

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

27

27

Problems

Problems





Always hard to find topics that work

Always hard to find topics that work

well across languages

well across languages

29/09/2007

© The University of Sheffield / Department of Marketing and Communications

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

28

28

Assessment

Assessment

-

-

DIRECT

DIRECT





Mostly volunteers

Mostly volunteers



Hildesheim

Hildesheim



SINTEF

SINTEF



Fred in Berkeley

Fred in Berkeley





Some funding

Some funding



Sheffield

Sheffield –

Tripod project

Tripod project

29/09/2007

© The University of Sheffield / Department of Marketing and Communications

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

29

29

Participation (1/2)

Participation (1/2)

Monolingual Tasks

Bilingual Tasks

TOTAL

Participant

DE

EN

PT

X2DE X2EN X2PT

catalunya

5

5

cheshire

1

1

1

3

3

3

12

csusm

6

6

5

4

4

25

depok*

6

6

groningen

5

5

hagen

5

5

10

hildesheim

4

4

8

icl

4

4

linguit*

4

4

moscow*

2

2

msasia

5

5

valencia

12

12

xldb

5

5

10

TOTAL

16

53

11

8

13

7

108

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

30

30

Participation (2/2)

Participation (2/2)

Source Language

TOTAL

Track

DE EN ES ID PT

Bilingual X2DE

6

1

1

8

Bilingual X2EN

1

5

6

1

13

Bilingual X2PT

1

1

5

7

TOTAL

2

7

11

6

2

28

(6)

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

31

31

Monolingual

Monolingual

Tasks

Tasks





English

English





German

German





Portuguese

Portuguese

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

32

32

Monolingual

Monolingual

English

English

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Recall

Precision

GeoCLEF Monolingual English Task Top 5 Participants − Standard Recall Levels vs Mean Interpolated Precision

catalunya [Experiment TALPGEOIRTD2; MAP 28.50%; Not Pooled]

cheshire [Experiment BERKMOENBASE; MAP 26.42%; Pooled]

valencia [Experiment RFIAUPV06; MAP 26.36%; Not Pooled]

groningen [Experiment CLCGGEOEETD00; MAP 25.15%; Not Pooled]

csusm [Experiment GEOMOEN5; MAP 21.32%; Not Pooled]

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

33

33

Monolingual

Monolingual

English

English

Track

Rank

Part.

Experiment DOI

MAP

1

st

catalunya

10.2415/GC-MONO-EN-CLEF2007.CATALUNYA.TALPGEOIRTD2

28.50%

2

nd

cheshire

10.2415/GC-MONO-EN-CLEF2007.CHESHIRE.BERKMOENBASE

26.42%

3

rd

valencia

10.2415/GC-MONO-EN-CLEF2007.VALENCIA.RFIAUPV06

26.36%

4

th

groningen

10.2415/GC-MONO-EN-CLEF2007.GRONINGEN.CLCGGEOEETD00

25.15%

5

th

csusm

10.2415/GC-MONO-EN-CLEF2007.CSUSM.GEOMOEN5

21.32%

Monolingual

English

Diff.

33,68%

POS, lemmas, named entities

Geographical thesaurus

tfxidf (Terrier)

World Gazetteer

LR

WordNet

LingPipe

tfxidf (Lucene)

World Gazetteer, GeoNET, Wikipedia,

WordNet

Ling Pipe

mixed tfxidf (Lucene)

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

34

34

Monolingual

Monolingual

German

German

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Recall

Precision

GeoCLEF Monolingual German Task Top 5 Participants − Standard Recall Levels vs Mean Interpolated Precision

hagen [Experiment FUHTDN5DE; MAP 25.76%; Pooled]

csusm [Experiment GEOMODE5; MAP 21.41%; Pooled]

hildesheim [Experiment HIMODENE2NA; MAP 20.67%; Pooled]

cheshire [Experiment BERKMODEBASE; MAP 13.92%; Pooled]

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

35

35

Monolingual

Monolingual

German

German

Track

Rank

Part.

Experiment DOI

MAP

1

st

hagen

10.2415/GC-MONO-DE-CLEF2007.HAGEN.FUHTDN5DE

25.76%

2

nd

csusm

10.2415/GC-MONO-DE-CLEF2007.CSUSM.GEOMODE4

21.41%

3

rd

hildesheim

10.2415/GC-MONO-DE-CLEF2007.HILDESHEIM.HIMODENE2NA

20.67%

4

th

cheshire

10.2415/GC-MONO-DE-CLEF2007.CHESHIRE.BERKMODEBASE

13.92%

5

th

Monolingual

German

Diff.

85.06%

Semantic analysis for GIR

tfxidf (Zebra DBMS)

GeoNet name server

BM25, LM, InL2 (Terrier)

LingPipe

tfxidf (Lucene)

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

36

36

Monolingual

Monolingual

Portuguese

Portuguese

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Recall

Precision

GeoCLEF Monolingual Portuguese Task Top 5 Participants − Standard Recall Levels vs Mean Interpolated Precision

csusm [Experiment GEOMOPT3; MAP 17.83%; Pooled]

cheshire [Experiment BERKMOPTBASE; MAP 17.39%; Pooled]

xldb [Experiment XLDBPT_1; MAP 3.29%; Pooled]

(7)

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

37

37

Monolingual

Monolingual

Portuguese

Portuguese

Track

Rank

Part.

Experiment DOI

MAP

1

st

csusm

10.2415/GC-MONO-PT-CLEF2007.CSUSM.GEOMOPT3

17.83%

2

nd

cheshire

10.2415/GC-MONO-PT-CLEF2007.CHESHIRE.BERKMOPTBASE

17.39%

3

rd

xldb

10.2415/GC-MONO-PT-CLEF2007.XLDB.XLDBPT_1

3.29%

4

th

5

th

Monolingual

Portuguese

Diff.

441.95%

System with different modules

Geographic Ontology + Text Mining

MG4J + mixed Okapi BM25

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

38

38

Bilingual

Bilingual

Tasks

Tasks





X

X

-

-

>

>

English

English





X

X

-

-

>

>

German

German





X

X

-

-

>

>

Portuguese

Portuguese

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

39

39

Bilingual

Bilingual

X

X

-

-

>

>

English

English

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Recall

Precision

GeoCLEF Bilingual English Task Top 5 Participants − Standard Recall Levels vs Mean Interpolated Precision

cheshire [Experiment BERKBIDEENBASE; MAP 22.08%; Pooled]

depok [Experiment UIBITDGP; MAP 20.96%; Pooled]

csusm [Experiment GEOBIESEN2; MAP 19.62%; Pooled]

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

40

40

Bilingual

Bilingual

X

X

-

-

>EN

>EN

Track

Rank

Part.

Experiment DOI

MAP

1

st

cheshire

10.2415/GC-BILI-X2EN-CLEF2007.CHESHIRE.BERKBIDEENBASE

22.08%

2

nd

depok*

10.2415/GC-BILI-X2EN-CLEF2007.DEPOK.UIBITDGP

20.96%

3

rd

csusm

10.2415/GC-BILI-X2EN-CLEF2007.CSUSM.GEOBIESEN2

19.62%

4

th

5

th

Bilingual

English

Diff.

12.54%

LEC Power Translator

World Gazetteer

LR

Free MT tool

Geonames + Wikipedia

LM (Lemur)

Spanish Toponomy from Euro Parl.

GeoNet name server

BM25, LM, InL2 (Terrier)

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

41

41

Bilingual

Bilingual

X

X

-

-

>

>

German

German

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Recall

Precision

GeoCLEF Bilingual German Task Top 5 Participants − Standard Recall Levels vs Mean Interpolated Precision

hagen [Experiment FUHTDN4EN; MAP 20.92%; Pooled]

cheshire [Experiment BERKBIPTDEBASE; MAP 11.09%; Pooled]

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

42

42

Bilingual

Bilingual

X

X

-

-

>

>

German

German

Track

Rank

Part.

Experiment DOI

MAP

1

st

hagen

10.2415/GC-BILI-X2DE-CLEF2007.HAGEN.FUHTDN4EN

20.92%

2

nd

cheshire

10.2415/GC-BILI-X2DE-CLEF2007.CHESHIRE.BERKBIPTDEBASE

11.09%

3

rd

4

th

5

th

Bilingual

German

Diff.

88,64%

Promt Web service MT

Semantic analysis for GIR

(8)

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

43

43

Bilingual

Bilingual

X

X

-

-

>

>

Portuguese

Portuguese

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Recall

Precision

GeoCLEF Bilingual Portuguese Task Top 5 Participants − Standard Recall Levels vs Mean Interpolated Precision

cheshire [Experiment BERKBIENPTBASE; MAP 20.12%; Pooled]

csusm [Experiment GEOBIESPT4; MAP 5.33%; Pooled]

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

44

44

Bilingual

Bilingual

X

X

-

-

>

>

Portuguese

Portuguese

Track

Rank

Part.

Experiment DOI

MAP

1

st

cheshire

10.2415/GC-BILI-X2PT-CLEF2007.CHESHIRE.BERKBIENPTBASE

20.12%

2

nd

csusm

10.2415/GC-BILI-X2PT-CLEF2007.CSUSM.GEOBIESPT4

5.33%

3

rd

4

th

5

th

Bilingual

Portuguese

Diff.

277.49%

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

45

45

More

More

Analyses

Analyses





Monolingual

Monolingual

vs

vs

Bilingual Analyses

Bilingual Analyses





Mean of average precision for each topic of a task

Mean of average precision for each topic of a task





Median of average precision for each topic of a task

Median of average precision for each topic of a task

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

46

46

Mean

Mean

vs

vs

Median

Median

(

(

mono

mono

, bili

, bili

German

German

)

)

52−GC64−GC66−GC55−GC70−GC72−GC53−GC57−GC51−GC56−GC71−GC62−GC65−GC73−GC74−GC60−GC61−GC58−GC59−GC54−GC69−GC63−GC68−GC75−GC67−GC 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 52−GC64−GC66−GC55−GC70−GC72−GC53−GC57−GC51−GC56−GC71−GC62−GC65−GC73−GC74−GC60−GC61−GC58−GC59−GC54−GC69−GC63−GC68−GC75−GC67−GC 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

mean > median

mean < median

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

47

47

Monolingual

Monolingual

Mean

Mean

VS

VS

Bilingual

Bilingual

Mean

Mean

German

German

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 51−GC 52−GC 53−GC 54−GC 55−GC 56−GC 57−GC 58−GC 59−GC 60−GC 61−GC 62−GC 63−GC 64−GC 65−GC 66−GC 67−GC 68−GC 69−GC 70−GC 71−GC 72−GC 73−GC 74−GC 75−GC

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

48

48

GC-MONO-CLEF2007 Track Overview Results and Graphs GC-MONO-DE-CLEF2007

Statistics for Average Precsion TopicMin1st Q2nd Q3rd QMaxMeanStdIQRRangeC.VarGeom.Harm.MADLOTUOTMNOSNO 51-GC0.00140.00720.04560.16810.26010.08660.09680.16090.25881.11760.02800.00610.08410.00140.26010.08660.0968 52-GC0.00000.00000.00000.00000.00000.00000.00000.00000.0000N/D0.00000.00000.00000.00000.00000.00000.0000 53-GC0.00040.00680.07920.12430.19850.07780.07090.11740.19810.91150.02450.00240.06340.00040.19850.07780.0709 54-GC0.01860.15540.30250.51140.78030.31640.22750.35600.76170.71900.19740.07990.17390.01860.78030.31640.2275 55-GC0.00000.00100.02130.07290.14670.04260.05010.07190.14671.17550.00860.00010.04120.00000.14670.04260.0501 56-GC0.00250.03990.06960.23230.25120.12180.10160.19240.24870.83400.06500.01750.09390.00250.25120.12180.1016 57-GC0.00000.00150.01560.11140.50000.07920.14010.10990.50001.76870.00440.00000.10410.00000.25000.05120.0868 58-GC0.00290.16230.23550.27790.35090.20580.09220.11560.34810.44770.15530.03690.07280.00290.35090.20580.0922 59-GC0.00000.02240.32320.44520.54570.26260.21360.42280.54570.81330.03050.00010.19340.00000.54570.26260.2136 60-GC0.00000.02450.12140.39180.54360.18820.19440.36730.54361.03290.03710.00010.16490.00000.54360.18820.1944 61-GC0.00000.03240.17420.22710.54000.18830.18490.19480.54000.98200.06820.00020.14220.00000.49200.13880.1353 62-GC0.00640.01800.07710.27670.36360.14020.14450.25870.35721.03070.05900.02340.12550.00640.36360.14020.1445 63-GC0.14710.30130.34400.39150.45130.34060.07800.09020.30420.22910.33030.31720.06060.26490.45130.35350.0606 64-GC0.00030.00120.00400.01490.02380.00850.00880.01370.02361.02820.00370.00150.00780.00030.02380.00850.0088 65-GC0.02380.04830.10800.29590.34790.15780.12540.24770.32410.79480.10670.06910.11120.02380.34790.15780.1254 66-GC0.00110.00500.01040.02130.08100.01980.02390.01630.07991.20870.01120.00630.01650.00110.03110.01160.0088 67-GC0.73390.76900.79220.81260.81530.78540.02710.04360.08150.03450.78490.78450.02270.73390.81530.78540.0271 68-GC0.08600.32190.40250.48590.57820.38430.14190.16410.49220.36910.34350.27880.10280.08600.57820.38430.1419 69-GC0.11080.12370.24230.49740.64400.32150.20940.37360.53320.65140.25700.20700.18740.11080.64400.32150.2094 70-GC0.00290.00660.08980.10840.12430.06390.05160.10180.12140.80720.03150.01170.04880.00290.12430.06390.0516 71-GC0.00000.02260.07190.22920.34920.12510.13750.20660.34921.09890.02840.00010.11200.00000.34920.12510.1375 72-GC0.00140.00420.05600.11150.20810.06720.06690.10720.20670.99520.02440.00630.05920.00140.20810.06720.0669 73-GC0.00240.02950.05790.34520.41200.16160.16730.31570.40961.03560.06990.01990.14550.00240.41200.16160.1673 74-GC0.00000.02190.07330.30790.48100.16430.16600.28600.48101.01080.03360.00010.14760.00000.48100.16430.1660 75-GC0.04760.39630.44990.58770.84810.47730.23200.19140.80040.48600.38350.22830.16270.39630.84810.53870.1722 23 GC-MONO-CLEF2007 Track Overview Results and Graphs GC-MONO-EN-CLEF2007

Statistics for Average Precsion TopicMin1st Q2nd Q3rd QMaxMeanStdIQRRangeC.VarGeom.Harm.MADLOTUOTMNOSNO 51-GC0.00000.39360.52580.68220.79920.47940.23960.28860.79910.49990.16900.00020.18120.00000.79920.47940.2396 52-GC0.00000.00110.00450.01990.09630.01350.02110.01880.09631.56400.00340.00020.01440.00000.03270.00790.0096 53-GC0.00060.05660.10890.16050.25440.11400.06590.10390.25390.57810.08090.01660.05420.00060.25440.11400.0659 54-GC0.00000.00320.05080.10740.14320.05970.04950.10420.14320.82970.01270.00010.04510.00000.14320.05970.0495 55-GC0.00000.00870.11270.20370.36430.12630.10740.19500.36430.85020.03980.00020.09280.00000.36430.12630.1074 56-GC0.00000.00850.14950.20980.47770.14160.11880.20140.47770.83900.02710.00010.08770.00000.47770.14160.1188 57-GC0.00000.06710.16980.23090.45020.17360.13030.16380.45020.75070.05130.00010.09880.00000.45020.17360.1303 58-GC0.00000.01110.03080.09910.23580.05690.06310.08810.23581.10940.02220.00030.05150.00000.19650.05350.0585 59-GC0.00000.00000.00000.00000.00000.00000.00000.00000.0000N/D0.00000.00000.00000.00000.00000.00000.0000 60-GC0.00080.12810.70290.74340.78180.52400.31180.61530.78100.59510.23010.01230.28020.00080.78180.52400.3118 61-GC0.00000.03440.10120.14080.48780.12490.12040.10640.48780.96470.06770.00050.08580.00000.26350.08980.0673 62-GC0.00030.02280.06350.27380.35090.13750.13030.25100.35060.94800.05950.00730.12180.00030.35090.13750.1303 63-GC0.00000.00000.00000.00000.00000.00000.00000.00000.0000N/D0.00000.00000.00000.00000.00000.00000.0000 64-GC0.00000.00290.01620.02910.20100.02800.04340.02620.20101.54960.00580.00010.02640.00000.06620.01510.0148 65-GC0.00080.07980.14070.35470.43300.21340.14670.27490.43220.68730.13820.02620.13820.00080.43300.21340.1467 66-GC0.00000.00890.13670.33720.37250.16630.14600.32830.37250.87790.03420.00010.13340.00000.37250.16630.1460 67-GC0.00000.17770.34020.42460.53530.28630.16560.24690.53530.57830.07150.00010.14170.00000.53530.28630.1656 68-GC0.00000.32340.53270.64100.75790.47130.22410.31760.75790.47540.32890.00050.19380.00000.75790.47130.2241 69-GC0.00000.01180.11070.16450.33420.11660.10470.15280.33420.89800.01670.00010.08440.00000.33420.11660.1047 70-GC0.00040.00810.02330.06250.31840.05870.08870.05440.31801.51160.02010.00500.06170.00040.13380.02650.0280 71-GC0.00000.00000.00000.00000.00000.00000.00000.00000.0000N/D0.00000.00000.00000.00000.00000.00000.0000 72-GC0.00000.20160.32260.63790.70710.37280.25060.43630.70710.67210.12350.00010.22460.00000.70710.37280.2506 73-GC0.00000.00200.02160.03800.30050.04620.07780.03600.30051.68360.00550.00010.04970.00000.08250.01910.0197 74-GC0.00000.00420.17320.45190.72220.26110.24640.44760.72220.94380.03430.00010.22290.00000.72220.26110.2464 75-GC0.00000.08270.30220.44110.63350.28010.21050.35850.63350.75150.09030.00020.18260.00000.63350.28010.2105 55

GC-MONO-CLEF2007 Track Overview Results and Graphs GC-MONO-PT-CLEF2007

Statistics for Average Precsion TopicMin1st Q2nd Q3rd QMaxMeanStdIQRRangeC.VarGeom.Harm.MADLOTUOTMNOSNO 51-GC0.00000.00000.00110.19350.19350.07100.09710.19350.19351.36750.00130.00000.08910.00000.19350.07100.0971 52-GC0.00000.01050.02710.02710.21140.04340.06380.01660.21141.46980.01290.00010.04370.00000.02710.01670.0104 53-GC0.00000.00000.00040.00490.04300.00570.01260.00490.04302.19730.00030.00000.00680.00000.00490.00200.0025 54-GC0.00000.00000.00290.00290.05230.00790.01610.00290.05232.04300.00050.00000.01070.00000.00290.00130.0015 55-GC0.00020.00710.00710.02610.03430.01530.01240.01910.03400.80650.00820.00180.01130.00020.03430.01530.0124 56-GC0.00000.00000.00140.36840.57220.18610.22110.36840.57221.18810.00270.00000.20270.00000.57220.18610.2211 57-GC0.00280.01020.11670.12330.37500.09350.10920.11310.37221.16820.03880.01380.07710.00280.12330.06540.0598 58-GC0.00020.00020.00930.13300.14780.06180.06890.13290.14761.11320.00690.00040.06430.00020.14780.06180.0689 59-GC0.00000.00000.00620.00620.00930.00370.00360.00620.00930.98770.00030.00000.00340.00000.00930.00370.0036 60-GC0.00000.00000.42290.53120.70530.28520.28150.53120.70530.98700.00690.00000.25770.00000.70530.28520.2815 61-GC0.00000.13530.13530.16130.27830.13900.06810.02600.27830.49000.06060.00010.04160.13530.16130.14830.0139 62-GC0.00000.00080.00960.05140.12150.03090.03860.05060.12151.25150.00470.00010.03140.00000.12150.03090.0386 63-GC0.00350.01280.01280.03040.18890.04650.07090.01760.18551.52540.02010.01170.05180.00350.03040.01480.0095 64-GC0.00000.00110.00740.01420.02630.00820.00860.01320.02631.04800.00270.00010.00700.00000.02630.00820.0086 65-GC0.00010.00410.00410.36580.59410.18550.24620.36170.59401.32700.01690.00060.21420.00010.59410.18550.2462 66-GC0.00000.00000.00070.66720.66720.24820.33270.66720.66721.34010.00190.00000.30470.00000.66720.24820.3327 67-GC0.10620.10620.27990.29350.47280.22700.12080.18730.36660.53220.19780.17230.10270.10620.47280.22700.1208 68-GC0.00000.04130.26060.27940.30850.17030.13200.23810.30850.77490.04840.00010.12280.00000.30850.17030.1320 69-GC0.00000.00000.01060.45990.55800.21890.25110.45990.55801.14680.00250.00000.23690.00000.55800.21890.2511 70-GC0.00000.00000.00650.00650.02700.00680.00920.00650.02701.35820.00060.00000.00630.00000.00650.00290.0034 71-GC0.00440.00440.03350.27680.27680.13240.13790.27240.27241.04210.04060.01100.13070.00440.27680.13240.1379 72-GC0.00200.05310.05310.10610.10610.06330.03870.05300.10410.61130.04100.01330.03110.00200.10610.06330.0387 73-GC0.00000.00000.00030.00030.30880.02830.09300.00030.30883.28400.00010.00000.05100.00000.00030.00010.0001 74-GC0.00220.01900.09760.18760.19670.10210.08700.16850.19450.85270.05160.01570.07940.00220.19670.10210.0870 75-GC0.00000.06620.06620.10220.30470.09370.07640.03610.30470.81490.03920.00010.04450.05270.10220.08070.0209 87

22.70

22.70

78.54

78.54

28.36

28.36

Portuguese

Portuguese

(75 rel

(75

rel

docs

docs)

)

German

German

(61

(61 rel

rel

docs

docs)

)

English

English

(17

(17 rel

rel

docs

docs)

)

Topic

(9)

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

49

49

-<top lang="en">

<num>10.2452/67-GC</num>

<title>F1 circuits where Ayrton Senna competed in 1994</title>

<desc>Find documents that mention circuits where the Brazilian driver Ayrton Senna participated in 1994. The name and location of the circuit is required</desc>

<narr>Documents should indicate that Ayrton Senna participated in a race in a particular stadion, and the location of the race track.</narr>

</top>

-- <top lang="de">

<num>10.2452/67-GC</num>

<title>Formel-1-Rennstrecken, auf denen Ayrton Senna 1994 gefahren ist</title>

<desc>Dokumente, die Rennstrecken erwähnen, auf denen der Brasilianische Fahrer Ayrton Senna 1994 gefahren ist. Name und Ort müssen erwähnt sein.</desc>

<narr>Dokumente sollten angeben, dass Ayrton Senna an einem Rennen auf einer bestimmten Strecke teilgenommen hat. Der Ort der Strecke sollte genannt

sein.</narr>

</top>

-- <top lang="pt">

<num>10.2452/67-GC</num>

<title>Pistas em que Ayrton Senna correu em 1994</title>

<desc>Em que circuitos de fórmula 1 Ayrton Senna competiuem 1994?</desc>

<narr>Documentos que indiquem que Ayrton Senna participou em corridas num determinado autódromo, em que a localização deste é também mencionada.</narr>

</top>

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

50

50

Overview of GeoCLEF 2007

Overview of GeoCLEF 2007





IR techniques

IR techniques





IE/NLP techniques

IE/NLP techniques





GIR techniques

GIR techniques





Systems

Systems





Resources

Resources





Experiments

Experiments





Translation

Translation





General comments

General comments

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

51

51

IR techniques and systems

IR techniques and systems





automatic and manual query expansion

automatic and manual query expansion





blind relevance feedback

blind relevance feedback





INL2 term weighting model

INL2 term weighting model





divergence from randomness framework

divergence from randomness framework





latent Dirichlet allocation model

latent Dirichlet allocation model





logistic regression

logistic regression





query decomposition

query decomposition





vector space model

vector space model





stemming

stemming





systems:

systems:

Lemur (2), Lucene (4), MG4J, Terrier (2), Zebra DBMS

Lemur (2), Lucene (4), MG4J, Terrier (2), Zebra DBMS

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

52

52

NLP techniques and systems

NLP techniques and systems





named entity recognition

named entity recognition





WordNet

WordNet

-

-

based expansion

based expansion





semantic analysis

semantic analysis





part

part

-

-

of

of

-

-

speech tagging

speech tagging





use of a QA system for subqueries

use of a QA system for subqueries





decompounding (for German)

decompounding (for German)





WordNet lemmatizer

WordNet lemmatizer





systems

systems

: LingPipe, Annie (GATE)

: LingPipe, Annie (GATE)

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

53

53

GIR/GIE techniques

GIR/GIE techniques





location disambiguation

location disambiguation





geographic unique strings

geographic unique strings





location normalization

location normalization





query expansion based on geographical terms

query expansion based on geographical terms





query expansion based on a geographic ontology

query expansion based on a geographic ontology





heuristic geographically informed filtering

heuristic geographically informed filtering





removing candidates

removing candidates





using shape files for close or near geographical relations

using shape files for close or near geographical relations





separate geographical indexes

separate geographical indexes





geographic cooccurrence model (based on Wikipedia)

geographic cooccurrence model (based on Wikipedia)





geographic relation finder

geographic relation finder

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

54

54

GIR/GIE resources

GIR/GIE resources





geonames gazetteer

geonames gazetteer





World Gazetteer

World Gazetteer





Getty TGN

Getty TGN





GNIS

GNIS





GeoWorldMap World gazetteer

GeoWorldMap World gazetteer





ADL Feature Type thesaurus

ADL Feature Type thesaurus





GKB 2.0

GKB 2.0





Spanish toponymy list

Spanish toponymy list



(10)

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

55

55

Translation (only queries)

Translation (only queries)





Systran (PT

Systran (PT

-

-

EN)

EN)





Prompt (ES

Prompt (ES

-

-

EN)

EN)





Promt (EN

Promt (EN

-

-

DE)

DE)





LEC Power Translator (all possible pairs)

LEC Power Translator (all possible pairs)





transfer MT system (ES

transfer MT system (ES

-

-

EN, ES

EN, ES

-

-

PT)

PT)





Toggletext (ID

Toggletext (ID

-

-

EN)

EN)

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

56

56

Experiments

Experiments





In addition to the particular experiments of each group

In addition to the particular experiments of each group

concerning their original approaches

concerning their original approaches





Separate indexes for geographic and non

Separate indexes for geographic and non

-

-

geographic

geographic

information

information





Apply different techniques to different parts of the query

Apply different techniques to different parts of the query





Use T, TD or TDN

Use T, TD or TDN

CLEF 2007, Budapest, September 19

CLEF 2007, Budapest, September 19

-

-

21, 2007

21, 2007

T. Mandl et al.

T. Mandl et al.

57

57

Comments

Comments





Very few papers mention distinguished treatment for different

Very few papers mention distinguished treatment for different

kinds of topics

kinds of topics





although some discuss differences between topics

although some discuss differences between topics





only one provides some per-

only one provides some per

-topic analysis

topic analysis





Most participants have best results with text only!

Most participants have best results with text only!





Most participants use NER,

Most participants use NER,





but no NER evaluation in itself has been reported: does it really help? or

but no NER evaluation in itself has been reported: does it reall

y help? or

does it also diminish performance?

does it also diminish performance?





they don’

they don

’t use the output of NER the same way: wide difference in

t use the output of NER the same way: wide difference in

behaviour after NER has been invoked

Referências

Documentos relacionados

Ousasse apontar algumas hipóteses para a solução desse problema público a partir do exposto dos autores usados como base para fundamentação teórica, da análise dos dados

Na cultura da cana-de-açúcar o manejo químico é o mais utilizado, no entanto é importante conhecer todas as formas de controle, e visar integra-las de modo a

O paciente evoluiu no pós-operatório imediato com progressão de doença em topografia do tumor primário (figura1), sendo encaminhado ao Oncobeda em janeiro de 2017

A história das relações entre política e cultura está cheia de enganos espalhados por toda parte. De um lado, uma concepção espiritualista da cultura, que vê na política

In ragione di queste evidenze, il BIM - come richiesto dalla disciplina comunitaria - consente di rispondere in maniera assolutamente adeguata alla necessità di innovazione dell’intera

The results showed that the higher effi ciency in light energy uptake, paired with better photochemical performance and better CO 2 fi xation in plants under direct sunlight

Recentemente, Barber-Westin and Noyes 2 , numa revisão sistemática de 264 artigos, reportaram que os critérios mais utilizados para autorizar a retoma ao desporto

We present a fast, real-time dosimetry detection system based on flexible oxide thin-film transistors that show a quantitative shift in threshold voltage of up to 3.4 V/gray

Paulo: [...] os termos científicos se a gente for pensar em Biologia para aluno de ensino médio eles não são tão corriqueiros tão fáceis de entender. Então se eu consigo fazer