
6.2.3 Manual evaluation results

The experiment described here manually evaluates the performance of the selected extractive summarization systems according to the informativeness and text-quality criteria defined in the evaluation questionnaire itself. To this end, an intrinsic, questionnaire-based evaluation was adopted. An example questionnaire, with the instructions in the header, the link to the original news article, and the candidate summaries to be evaluated, is shown below.

Evaluation Of Automatic Text Summarization

Instructions: This survey consists in assessing summaries automatically produced; each summary summarizes the news.

Read the news and summaries below and select the best summaries, taking into account the following quality criteria:

- The summary in which it considers easier to read and understand
- The summary in which it considers better organized
- The summary that has the highest quality (cohesion and coherence)
- The summary that best represents the original text
- The summary that contains more relevant information
- The summary that contains the least amount of repeated information
- The summary that contains the fewest pronouns without being connected to an entity (e.g., noun)

And you should take into account the five aspects of linguistic quality described below:

1. Grammaticality: The summary should have no datelines, system-internal formatting, capitalization errors or obviously ungrammatical sentences (e.g., fragments, missing components) that make the text difficult to read.

2. Non-redundancy: There should be no unnecessary repetition in summary. Unnecessary repetition might take the form of whole sentences that are repeated, or repeated facts, or the repeated use of a noun or noun phrase (e.g., “Bill Clinton”) when a pronoun (“he”) would suffice.

3. Referential clarity: It should be easy to identify who or what the pronouns and noun phrases, in summary, are referring to. If a person or other entity is mentioned, it should be clear what their role in the story is. So, a reference would be unclear if an entity is referenced but its identity or relation to the story remains unclear.

4. Focus: The summary should have a focus; sentences should only contain information that is related to the rest of the summary.

5. Structure and Coherence: The summary should be well-structured and well-organized. The summary should not just be a heap of related information but should build from sentence to sentence to a coherent body of information about a topic.

News:

<http://edition.cnn.com/2013/07/04/travel/brazil-10-things/index.html>

Summaries:

1. Brazil might be the biggest country most of the world doesnt know a whole heckuva a lot about. Now that Brazil will be hosting the World Cup next year and the Olympics in 2016 , its time for a crash course in all things Brazilian. In world rankings for the gap between rich and poor, Brazil has the 11th biggest gulf, coming in after a group of impoverished African countries. The seventh largest city in Brazil sits halfway up the Amazon River, where the Rio Negro intersects the great river. In Rio de Janeiro, the city puts on free music fests, with top bands performing on stages across the city, while in Recife, at the citys outdoor concert stage across the street from the beach, free concerts range from rock to forro to an event featuring every drummer with a drum kit in the city, more than 100 of them playing in unison.

2. Brazil might be the biggest country most of the world doesn’t know a whole heckuva a lot about. Now that Brazil will be hosting the World Cup next year and the Olympics in 2016 (massive public expenditures that many here deeply oppose), it’s time for a crash course in all things Brazilian. Not many Brazilians speak English, particularly outside Rio de Janeiro or Sao Paulo. In world rankings for the gap between rich and poor, Brazil has the 11th biggest gulf, coming in after a group of impoverished African countries. The seventh largest city in Brazil sits halfway up the Amazon River, where the Rio Negro intersects the great river. Today, the city of Manaus has gleaming office buildings and modern factories and is growing faster than many other parts of the country. The Brazilian knack for improvising on any musical theme makes this country unrivaled in the quantity and quality of different musical rhythms and styles. But Brazil dances to the beat of many different drummers, reflecting its diverse population. Piranha live in the major river basins of South America, and the Amazon tributaries of Brazil are well-stocked with them.

3. Brazil might be the biggest country most of the world doesn’t know a whole heckuva a lot about. This is beginning to change as financial fortunes for some have soared in recent years, helping lift some 50% of the country into the middle class. But success hasn’t come without problems. Still, if there’s one thing Brazil is known for worldwide it’s a gregarious and energetic population. Now that Brazil will be hosting the World Cup next year and the Olympics in 2016 (massive public expenditures that many here deeply oppose), it’s time for a crash course in all things Brazilian. Brazilians like to express themselves and don’t have many restraints on the volume with which they do it. That goes not just for conversation, but car horns, TV pitchmen and sound trucks blasting advertisements through the neighborhood from 16 speakers. They’ve seen all the American movies and know all the Adele songs, but the premium has traditionally been on getting a good education in Portuguese, still difficult in a land where education is far down the list of priorities. For scale, it’s hard to match the immensity of the Amazon rainforest, the largest in the world, or the power of Iguazu Falls, which is the second widest and in volume in the world.

4. Brazil might be the biggest country most of the world doesn’t know a whole heckuva a lot about. Now that Brazil will be hosting the World Cup next year and the Olympics in 2016 (massive public expenditures that many here deeply oppose), it’s time for a crash course in all things Brazilian. As Portuguese speakers on a Spanish continent far from the English-speaking world, Brazilians have been a linguistic universe unto themselves. Not many Brazilians speak English, particularly outside Rio de Janeiro or Sao Paulo. They’ve seen all the American movies and know all the Adele songs, but the premium has traditionally been on getting a good education in Portuguese, still difficult in a land where education is far down the list of priorities. Maybe because they’re surrounded by so many varieties of exotic fruits the rest of us have never heard of – caju, camu-camu, pitanga – Brazilians are experts in the creation of especially tasty fruit drinks, or sucos. The Brazilian banana is the tastiest in the world, far superior to the bland Central American version (say Brazilians), and it makes for super savory drinks. Stay in a safe area, don’t carry more money on you than you can afford to lose, keep your valuables in the hotel safe, use taxis vetted by your hotel, don’t take van taxis and make sure you know what part of town you’re in at night. In Rio de Janeiro, the city puts on free music fests, with top bands performing on stages across the city, while in Recife, at the city’s outdoor concert stage across the street from the beach, free concerts range from rock to forro to an event featuring every drummer with a drum kit in the city, more than 100 of them playing in unison.

5. ... .10

In total, 205 questionnaires were administered. Each consisted of the initial instructions, the link to the original document, and 10 summaries presented in random order without identifying the summarizers that produced them, as in the example above. Each questionnaire was assessed by three evaluators, following the methodology described in Section 6.2.1.

To ensure the quality of the evaluation process, evaluations were accepted only from nine selected countries whose official language is English. Evaluators from four of these countries took part. Figure 16 shows the number of valid (high-confidence) evaluators per country. The United States ranked first with 63 evaluators, followed by India with 31, Great Britain with 13, and Canada with 10.

[Bar chart: number of evaluators per country. CAN: 10, GBR: 13, IND: 31, USA: 63.]

Figure 16 – Geographic distribution of the evaluators.

During the evaluation, the evaluators' task is to indicate the best summary or summaries according to the criteria defined in the header of the evaluation questionnaire. When the evaluation is complete, the Figure Eight tool aggregates the results of the three evaluators with a confidence score (see footnote 6). The confidence score describes the level of agreement among the evaluators, weighted by each evaluator's own confidence score, and indicates how much confidence can be placed in the validity of the aggregated answer for each questionnaire. The aggregated result is the answer with the highest confidence.
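The weighted aggregation described above can be illustrated with a minimal Python sketch. It assumes that the confidence of an answer is the sum of the scores of the evaluators who chose it divided by the sum of the scores of all evaluators who judged the questionnaire; the text does not give the exact formula used by Figure Eight, so this is an illustration only. The function name and the input layout are hypothetical.

```python
from collections import defaultdict


def aggregate_questionnaire(judgments):
    """Return (best_summary_id, confidence) for one questionnaire.

    judgments: list of (evaluator_score, selected_ids) pairs, where
    selected_ids is the set of summaries that evaluator marked as best.
    Assumption: confidence = score mass behind an answer / total score mass.
    """
    total = sum(score for score, _ in judgments)
    if total <= 0:
        raise ValueError("evaluator scores must sum to a positive value")

    weight = defaultdict(float)
    for score, selected_ids in judgments:
        for summary_id in set(selected_ids):
            weight[summary_id] += score

    confidence = {sid: w / total for sid, w in weight.items()}
    # The aggregated answer is the one with the highest confidence.
    best = max(confidence, key=confidence.get)
    return best, confidence[best]


# Three evaluators with hypothetical scores; two of them select summary 3.
best, conf = aggregate_questionnaire([(0.9, {3}), (0.8, {3, 7}), (0.7, {7})])
print(best, round(conf, 3))
```

In this example summary 3 wins because the combined score of the evaluators who chose it (0.9 + 0.8) exceeds that of summary 7 (0.8 + 0.7).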

Table 46 presents the overall performance of the evaluated systems according to the results aggregated by the Figure Eight tool. AutoS produced the best-rated summary for 29 questionnaires, followed by C4J and ALMUS with 25 each. No summarizer performed well on every summarized document, and the results were fairly evenly distributed among the summarizers.

Table 46 – Results of the manual evaluation aggregated by the Figure Eight tool. The systems with the largest number of best-performing summaries are highlighted in bold.

System      #Summaries
AutoS       29
C4J         25
ALMUS       25
MEAD        22
PLI + GE    19
SUMMA       18
HP-FS       18
TextRank    18
RL + PLI    16
Aylien      15

Table 47 presents the overall performance of the evaluated systems without the Figure Eight aggregation. In this setting, a summary was counted as selected if at least two evaluators chose it. As in the aggregated evaluation, AutoS achieved the best result with 52 summaries, followed by C4J with 41 and PLI + GE with 35.

Table 47 – Results of the manual evaluation without aggregation. The systems with the largest number of best-performing summaries are highlighted in bold.

System      #Summaries
AutoS       52
C4J         41
PLI + GE    35
Aylien      33
ALMUS       31
HP-FS       30
SUMMA       29
TextRank    28
MEAD        27
RL + PLI    27
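The selection rule behind the non-aggregated counts (a summary is counted when at least two of the three evaluators chose it) can be sketched in Python as follows. The function name, the parameter min_votes, and the input layout are hypothetical and serve only to illustrate the rule.

```python
from collections import Counter


def selected_without_aggregation(selections, min_votes=2):
    """Return the summary ids chosen by at least `min_votes` evaluators.

    selections: one collection of chosen summary ids per evaluator
    (an evaluator may mark more than one summary as best).
    """
    votes = Counter()
    for chosen in selections:
        # set() ensures each evaluator counts at most once per summary.
        votes.update(set(chosen))
    return sorted(sid for sid, n in votes.items() if n >= min_votes)


# Three evaluators for one questionnaire: summaries 1 and 4 each get two votes.
print(selected_without_aggregation([{1, 4}, {4}, {1, 9}]))
```

Because several summaries of the same questionnaire can reach the two-vote threshold, the per-system totals under this rule can add up to more than the number of questionnaires.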

6.2.4 Comparative evaluation between the automatic measures and the manual evaluation