Academic year: 2022

(1)

Keyword Search over COVID-19 Data

Yenier T. Izquierdo¹,², Grettel M. García², Melissa Lemos¹,², Alexandre Novello¹, Bruno Novelli², Cleber Damasceno², Luiz André P.P. Leme³, Marco A. Casanova¹,²

(2)


Agenda

• Motivation

• DANKE

• Experiments with CovidKeys

• Conclusions

(3)

Motivation

• Examples of COVID-19 data

• FAPESP / USP / Instituto Fleury, Hospital Sírio-Libanês and Hospital Israelita Albert Einstein

• The COVID-19 Data Sharing / BR initiative - open data with demographic, clinical and laboratory data about patients tested for COVID-19 in the State of São Paulo

• Ministry of Health/Brazil

• NSG (Notificações de Síndrome Gripal, i.e., flu syndrome notifications) - notifications of suspected COVID-19 cases

• SRAG 2020 - Severe Acute Respiratory Syndrome (SARS) and COVID-19 data

• The Center for Systems Science and Engineering (CSSE) at Johns Hopkins University

• COVID-19 Data Repository

(4)

Motivation

• COVID-19 data

• Data is typically available

• as downloadable CSV files

• as datasets that can be queried through a user interface (e.g., Elasticsearch)

• as datasets available through a "sandbox" (e.g., Google Cloud services)

• Files can be quite large, with hundreds of columns and millions of rows

• Data can be at different levels of granularity


(5)

Motivation

• COVID-19 data

• Cumbersome to process with the usual desktop spreadsheet tools

• A perhaps more robust approach:

• Download and store the data in a standard DBMS

• Define a query that retrieves the data the user is interested in

• Export the query results to a data analysis tool
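The three steps of the DBMS-based approach can be sketched as follows. This is only a minimal illustration, assuming an in-memory SQLite database stands in for the "standard DBMS"; the table name `casos`, the sample rows, and the column names are hypothetical and do not come from any of the datasets mentioned in the talk.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for a downloaded CSV file; the column
# names and rows are illustrative only.
RAW_CSV = """municipio,idade,sintoma
Saquarema,34,febre
Saquarema,52,tosse
Niteroi,29,febre
"""

def load_csv(conn, table, text):
    """Step 1: store the CSV rows in a DBMS table (here, in-memory SQLite)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)

def export_csv(cursor):
    """Step 3: serialize a query result as CSV for a data analysis tool."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(col[0] for col in cursor.description)
    writer.writerows(cursor)
    return out.getvalue()

conn = sqlite3.connect(":memory:")
load_csv(conn, "casos", RAW_CSV)

# Step 2: a query that retrieves only the data the user is interested in.
cursor = conn.execute(
    "SELECT municipio, idade FROM casos "
    "WHERE sintoma = ? AND CAST(idade AS INTEGER) < ? "
    "ORDER BY municipio",
    ("febre", 40),
)
exported = export_csv(cursor)
print(exported)
```

Even in this small form, step 2 requires the user to know the schema and to write SQL, which is the burden that keyword search aims to remove.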

(6)


Keyword search

(7)

Motivation

• Keyword search over databases

• The user specifies a few terms, called keywords

• A keyword query is simply a list of keywords

• The system must retrieve the data that best match the list of keywords

• Keyword search has been extended to relational databases (and RDF datasets)

• Keyword queries are typically compiled into SQL (or SPARQL)

(8)


Agenda

• Motivation

• DANKE

• Experiments with CovidKeys

• Conclusions

(9)

DANKE

• What is DANKE?

• a platform for data and knowledge retrieval

• Main Components

• Data Ingestion and Indexing Services

• Keyword Query Processing Service

• Compiles a keyword query into an SQL (or SPARQL) query

• Uses the database schema to guide the process

(10)

[Figure: example RDF graph. Nodes: A1, A2, A3, M1, P1, P2, P3. Edge labels: name, title, descr, act, win, rel, award. Literals: “Richard Burton”, “Liz Taylor”, “John Hurt”, "Who's Afraid of Virginia Woolf", “Oscar Best Actress”, "Oscar Best Cinematography", "Valladolid Best Actor (1984)"]

Query: Taylor Burton Hurt

Triple patterns:

?v1 name “Taylor”

?v2 name “Burton”

?v3 name “Hurt”

?v1 rel ?v2

?v2 win ?v4

?v3 win ?v4

(11)

[Figure: keyword query “Taylor Burton Hurt”]

(12)


Agenda

• Motivation

• DANKE

• Experiments with CovidKeys

• Conclusions

(13)

Experiments with CovidKeys

• CovidKeyS: a Web application that uses the DANKE services to query COVID-19 data

(14)

Experiments with CovidKeys

• Scenario 1 - The NSG (Notificações de Síndrome Gripal) dataset

• stores notifications of suspected COVID-19 cases

• data allow relating demographic, clinical, and laboratory data

• data is organized in DANKE as a relational database with three tables:

• Paciente – patient data

• Teste – COVID-19 test data

• Desfecho – how the case evolved


(15)

Experiments with CovidKeys

• Scenario 1 - The NSG (Notificações de Síndrome Gripal) dataset

• NL Query: “Quais foram os casos em Saquarema de gestantes, com sintoma de febre, e idade menor do que 40 anos?” (“Which were the cases in Saquarema involving pregnant women, with fever symptoms, and age less than 40 years?”)

• Keyword query: Saquarema gestante febre idade < 40

• NL Query: “Quais enfermeiros, e em que municípios, fizeram teste rápido e tiveram resultado positivo?” (“Which nurses, and in which municipalities, took a rapid test and had a positive result?”)
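The keyword queries in this scenario mix plain keywords, quoted phrases that count as a single term, and a comparison filter (idade < 40). A first processing step could therefore be a tokenizer along the following lines. This is a hedged sketch and not DANKE's actual parser: the `TOKEN` pattern, the `tokenize` function, and the (attribute, operator, value) shape of a filter are all assumptions made for illustration.

```python
import re

# One alternative per token kind: a phrase in straight or curly double
# quotes, a comparison operator, a number, or a plain word.
TOKEN = re.compile(
    r'"(?P<straight>[^"]*)"'
    r'|“(?P<curly>[^”]*)”'
    r'|(?P<op><=|>=|<>|=|<|>)'
    r'|(?P<num>\d+(?:\.\d+)?)'
    r'|(?P<word>[^\s"“”<>=]+)'
)

def tokenize(query):
    """Split a keyword query into keywords and (attribute, op, value) filters."""
    raw = [(m.lastgroup, m.group(m.lastgroup)) for m in TOKEN.finditer(query)]
    keywords, filters = [], []
    i = 0
    while i < len(raw):
        kind, text = raw[i]
        if kind == "op":
            if keywords and i + 1 < len(raw):
                # "idade < 40": the keyword before the operator names what is
                # compared; the token after it is the comparison value.
                filters.append((keywords.pop(), text, raw[i + 1][1]))
                i += 1  # also consume the comparison value
        else:
            keywords.append(text)
        i += 1
    return keywords, filters

print(tokenize("Saquarema gestante febre idade < 40"))
print(tokenize("enfermeiro município “teste rápido” resultado positivo"))
```

The quoted phrase is kept as one keyword, so it can later be matched as a whole against attribute values.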

(16)

Experiments with CovidKeys

• Scenario 1 - The NSG (Notificações de Síndrome Gripal) dataset

• Keyword query: enfermeiro município “teste rápido” resultado positivo

• enfermeiro

• partially matches values of attribute Paciente.profissao

• generates the restriction “Paciente.profissao = '*enfermeiro*'”

• the keyword “profissão” is not required, since attribute profissao is inferred from the match

• “teste rápido” (a single term due to the use of double quotes)

• partially matches values of attribute Teste.tipo

• generates the restriction “Teste.tipo = '*teste rápido*'”

• município, resultado and positivo

• processed likewise

• requires joining two tables, Paciente and Teste

• discovered from the matches and the schema
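The matching steps on this slide can be illustrated with a small sketch: each keyword is matched against attribute values, each match yields a restriction in the slide's '*...*' notation, and the set of matched tables determines which joins are needed. The sample rows, the foreign key from Teste.id_paciente to Paciente.id_paciente, and the function names are assumptions; the talk does not describe how DANKE implements this internally.

```python
# Hypothetical miniature of the NSG tables; the rows and the foreign key
# Teste.id_paciente -> Paciente.id_paciente are assumptions.
TABLES = {
    "Paciente": [
        {"id_paciente": 1, "profissao": "Enfermeiro", "municipio": "Saquarema"},
        {"id_paciente": 2, "profissao": "Médico", "municipio": "Niterói"},
    ],
    "Teste": [
        {"id_paciente": 1, "tipo": "TESTE RÁPIDO - ANTICORPO", "resultado": "Positivo"},
        {"id_paciente": 2, "tipo": "RT-PCR", "resultado": "Negativo"},
    ],
}
FOREIGN_KEYS = [("Teste", "id_paciente", "Paciente", "id_paciente")]

def value_matches(keyword):
    """Return the (table, attribute) pairs whose values contain the keyword."""
    needle = keyword.casefold()
    hits = set()
    for table, rows in TABLES.items():
        for row in rows:
            for attr, value in row.items():
                if isinstance(value, str) and needle in value.casefold():
                    hits.add((table, attr))
    return sorted(hits)

def compile_restrictions(keywords):
    """Derive restrictions from value matches and joins from the schema."""
    restrictions, tables = [], set()
    for kw in keywords:
        for table, attr in value_matches(kw):
            # The attribute is inferred from the match, so the user does
            # not have to name it in the query.
            restrictions.append(f"{table}.{attr} = '*{kw}*'")
            tables.add(table)
    joins = [
        f"{t1}.{a1} = {t2}.{a2}"
        for t1, a1, t2, a2 in FOREIGN_KEYS
        if t1 in tables and t2 in tables
    ]
    return restrictions, joins

restrictions, joins = compile_restrictions(["enfermeiro", "teste rápido"])
print(restrictions)
print(joins)
```

A real system would consult an index built at ingestion time instead of scanning the rows, and would rank alternative matches when a keyword hits several attributes.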


(17)

Experiments with CovidKeys

• Scenario 1 - The NSG (Notificações de Síndrome Gripal) dataset

• Keyword query: enfermeiro município “teste rápido” resultado positivo

• Compiled SQL query:

select Paciente.id_paciente, Paciente.profissao, Paciente.municipio, Teste.tipo, Teste.resultado

from Paciente, Teste

where Paciente.profissao = '*enfermeiro*'

and Teste.tipo = '*teste rápido*'

and Teste.resultado = '*positivo*'
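The query above uses '*' as a wildcard notation and shows no join predicate, so it would not run unchanged on a standard DBMS. One possible executable rendering, assuming SQLite, the SQL LIKE operator for the partial matches, and a join on id_paciente (the join key and the sample rows are assumptions, not taken from the talk):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Paciente (
    id_paciente INTEGER PRIMARY KEY, profissao TEXT, municipio TEXT);
CREATE TABLE Teste (
    id_paciente INTEGER REFERENCES Paciente(id_paciente),
    tipo TEXT, resultado TEXT);
-- Hypothetical sample rows, for illustration only.
INSERT INTO Paciente VALUES
    (1, 'enfermeiro', 'Saquarema'),
    (2, 'técnico de enfermagem', 'Niterói'),
    (3, 'enfermeiro', 'Maricá');
INSERT INTO Teste VALUES
    (1, 'teste rápido - anticorpo', 'positivo'),
    (2, 'teste rápido - antígeno', 'positivo'),
    (3, 'RT-PCR', 'positivo');
""")

SQL = """
SELECT Paciente.id_paciente, Paciente.profissao, Paciente.municipio,
       Teste.tipo, Teste.resultado
FROM Paciente
JOIN Teste ON Teste.id_paciente = Paciente.id_paciente  -- assumed join key
WHERE Paciente.profissao LIKE '%enfermeiro%'
  AND Teste.tipo LIKE '%teste rápido%'
  AND Teste.resultado LIKE '%positivo%'
ORDER BY Paciente.id_paciente
"""

rows = conn.execute(SQL).fetchall()
for row in rows:
    print(row)
```

Without the join predicate, the query would pair every matching patient with every matching test, which is why the join discovered from the schema matters for the result.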

(18)
(19)

Experiments with CovidKeys

• Scenario 2 - Global Data (from the Google COVID-19 Public Datasets)

• Johns Hopkins University (JHU)

• global confirmed COVID-19 cases, deaths, and recoveries

• Oxford Policy Tracker

• COVID-19 policies at the country level

• Google Mobility Report

• movement trends over time by geography across different categories of places

• (some) World Bank global data

(20)

Experiments with CovidKeys

• Scenario 2 - Global Data (from the Google COVID-19 Public Datasets)

• NL Query: “List the total number of deaths and the recreation mobility index on the same date for Belarus.”

• Keyword query: Belarus deaths recreation date

(21)

Experiments with CovidKeys

• Scenario 2 - Global Data (from the Google COVID-19 Public Datasets)

• Keyword query: Belarus deaths recreation date

• Belarus

• matches a value of attribute “Country Name” of table Country (from the World Bank global data about population)

• deaths and date

• deaths partially matches the name of attribute “Total Deaths” and date matches the name of attribute Date of table Contamination (from the global synthesis of COVID-19 cases by country)

• recreation
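In this scenario some keywords match attribute names rather than attribute values. A minimal sketch of such a metadata match, using only the table and attribute names quoted on this slide; the `SCHEMA` dictionary, the `name_matches` function, and the exact/partial labels are assumptions made for illustration:

```python
# Table and attribute names quoted on the slide; representing the schema
# as a dictionary is an assumption.
SCHEMA = {
    "Country": ["Country Name"],
    "Contamination": ["Date", "Total Deaths"],
}

def name_matches(keyword):
    """Return (table, attribute, kind) for attribute names matching the keyword."""
    needle = keyword.casefold()
    hits = []
    for table, attributes in SCHEMA.items():
        for attr in attributes:
            name = attr.casefold()
            if needle == name:
                hits.append((table, attr, "exact"))
            elif needle in name.split():
                # The keyword is one of the words of a longer attribute name.
                hits.append((table, attr, "partial"))
    return hits

print(name_matches("deaths"))
print(name_matches("date"))
print(name_matches("Belarus"))
```

Here Belarus yields no metadata match, consistent with the slide, where it matches a value of “Country Name” instead.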

(22)
(23)

Agenda

• Motivation

• DANKE

• Architecture

• Keyword Query Processing

• Experiments with CovidKeys

• Conclusions

(24)

Conclusions

• Summary

• DANKE - a platform for data and knowledge retrieval

• concentrated on the keyword search component

• CovidKeyS

• uses DANKE to implement keyword queries over three COVID-19 data scenarios

• two scenarios described in the paper

• The NSG (Notificações de Síndrome Gripal) dataset

• Global Data (from the Google COVID-19 Public Datasets)

• third scenario:

• The COVID-19 Data Sharing / BR initiative

(25)

Conclusions

• Future work

• extend DANKE to operate over federated databases

• extend keyword query processing to...

• support aggregation operations in natural language-like syntax

• support specific join clauses (e.g., same date)

• ... beyond keyword query

• (see Tutorial 1 – "Palavras, apenas*: Métodos e Técnicas para Interfaces de Linguagem Natural em Bancos de Dados" [Words, only: Methods and Techniques for Natural Language Interfaces to Databases])

(26)


Thank You

(27)

References for this presentation...

casanova puc-rio
