Access and Representation of Unstructured Information Based on Graphs
MAP-i thesis proposal, 2021/2022
January 31, 2022
Scientific Area
Language technology; Information Extraction
Brief Description and Research Questions
Accessing unstructured information (texts, images) effectively remains a relevant research topic despite the major advances of the last decade. Its relevance stems from the need to organize large amounts of data dispersed across many documents, many of them written in natural language, so that knowledge can be generated faster. For documents written in Portuguese, the variety of tools available is even smaller.
The work to be developed in this doctoral project is to create solutions and tools that systematize the information contained in natural-language documents, particularly in Portuguese, and that provide appropriate and easily accessible interfaces to the generated knowledge base. These documents are not limited to text; they can include images and/or graphs.
Specifically, the first objective is the development of techniques for curating knowledge bases by identifying the most salient and relevant information to be shown or communicated, and which information is merely residual or not useful to the end-user. This goal is expected to be achieved through the research, development and testing of characterization algorithms and graph search methods that detect higher-density zones. Zone segmentation thresholds and criteria for the reliability of information will also be defined, and research will be carried out on identifying incoherent information.
The second objective is the creation of methods for developing easy-access interfaces (e.g. chatbots, interactive graphs) that will act as the bridge between the resulting system and the end-user. The goal is to interpret questions or queries posed by a user and to query the knowledge base that was created by analyzing the relevant documents.
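As an illustration of what detecting higher-density zones in a knowledge graph could look like, the sketch below uses k-core decomposition. This is only one possible technique, chosen here for illustration; the proposal does not commit to a specific algorithm. The function names, the toy graph and the threshold value are all hypothetical.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return the k-core number of every node in an undirected graph.

    A node has core number k if it belongs to a subgraph in which every
    node has at least k neighbours, and to no such subgraph for k + 1.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(neigh) for node, neigh in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel off the node with the smallest remaining degree.
        node = min(remaining, key=lambda n: (degree[n], str(n)))
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neigh in adj[node]:
            if neigh in remaining:
                degree[neigh] -= 1
    return core


def dense_zone(edges, threshold):
    """Nodes whose core number reaches the (hypothetical) threshold."""
    return {n for n, c in core_numbers(edges).items() if c >= threshold}


# Toy graph: A, B, C, D are fully interconnected; E and F hang off A.
EDGES = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "C"), ("B", "D"), ("C", "D"),
    ("A", "E"), ("E", "F"),
]

print(sorted(dense_zone(EDGES, threshold=3)))  # ['A', 'B', 'C', 'D']
```

In this toy example the threshold plays the role of a zone segmentation threshold: nodes below it (E and F) would be treated as residual information, while the tightly connected group would be kept as a candidate for presentation to the end-user.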
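The last step, querying the knowledge base once a question has been interpreted, can be sketched in a very simplified form. The example assumes, purely for illustration, that the knowledge base is stored as (subject, relation, object) triples and that the interpretation step has already turned the user's question into a triple pattern; the triples, relation names and the `match` helper are hypothetical placeholders, not part of the proposal.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy knowledge base extracted from documents.
KNOWLEDGE_BASE: List[Triple] = [
    ("doc_1", "mentions", "entity_a"),
    ("doc_1", "mentions", "entity_b"),
    ("doc_2", "mentions", "entity_a"),
    ("entity_a", "related_to", "entity_b"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    relation: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, r, o)
        for s, r, o in triples
        if (subject is None or s == subject)
        and (relation is None or r == relation)
        and (obj is None or o == obj)
    ]


# "Which documents mention entity_a?" expressed as a triple pattern.
answers = match(KNOWLEDGE_BASE, relation="mentions", obj="entity_a")
print([s for s, _, _ in answers])  # ['doc_1', 'doc_2']
```

A chatbot or interactive graph interface would sit in front of such a function, translating natural-language questions into patterns and rendering the matching triples for the user.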
Supervisor’s name and affiliation
Mário Jorge Ferreira Rodrigues, ESTGA - Universidade de Aveiro, Instituto de Engenharia Eletrónica e Informática de Aveiro (IEETA)
Co-supervisors’ names and affiliations
António Joaquim da Silva Teixeira, DETI - Universidade de Aveiro, Instituto de Engenharia Eletrónica e Informática de Aveiro (IEETA)
Marlene Paula Castro Amorim, DEGEIT - Universidade de Aveiro, Unidade de Investigação em Governança, Competitividade e Políticas Públicas (GOVCOPP)
Research unit
Instituto de Engenharia Eletrónica e Informática de Aveiro (IEETA)
External advisor
A senior researcher from the Language Technologies Institute at Carnegie Mellon University will be invited.