
Data Mining, Integration and Analysis


Academic year: 2022






Data Mining, Integration and Analysis

Karin Becker


+ Data Mining, Integration and Analysis

Knowledge Discovery

Web and Text Mining

Data Science

Recommendation Systems

Scalability and Performance


Ana Lucia Cetertich Bazzan

Joao Luiz Dihl Comba

Karin Becker

Leandro Krug Wives

Lucas Mello Schnorr

Mara Abel

Renata De Matos Galante

Viviane Pereira Moreira

Research Areas and Faculty


+ Knowledge Discovery

What do we do?


+ Knowledge Discovery

• Data Collection

• Data Integration

• Data Preprocessing

• Data Mining

• Data Analysis
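The five stages above can be read as a chain of functions. The sketch below is only an illustration on toy data: the two sources, the record layout and the document-frequency "mining" step are all assumptions, not something the slides specify.

```python
import re
from collections import Counter

# Hypothetical toy sources standing in for collected data.
SOURCE_A = [{"id": 1, "text": "Bus delayed again!"},
            {"id": 2, "text": "Great bus service today"}]
SOURCE_B = [{"id": 2, "text": "Great bus service today"},
            {"id": 3, "text": "Bus was late, again"}]

def collect():
    """Data collection: gather raw records from every source."""
    return [SOURCE_A, SOURCE_B]

def integrate(sources):
    """Data integration: merge sources, keeping one record per id."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record["id"], record)
    return list(merged.values())

def preprocess(records):
    """Data preprocessing: lowercase and split each text into words."""
    return [re.findall(r"[a-z]+", r["text"].lower()) for r in records]

def mine(token_lists):
    """Data mining (toy): count in how many documents each term occurs."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(set(tokens))
    return counts

def analyze(counts, top_n=2):
    """Data analysis: report the most widespread terms."""
    return counts.most_common(top_n)

result = analyze(mine(preprocess(integrate(collect()))))
print(result)
```

Each stage consumes only the output of the previous one, so any step can be swapped for a more realistic one without touching the rest.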



Karin Becker


+ Extract Knowledge from Social Media

Semantic enrichment framework for event-related tweet identification (Simone Romero)

No assumptions about event properties

Contextual knowledge from semantic web and external documents

Improved mainly recall

Simone Romero, Karin Becker. A framework for event classification in tweets based on hybrid semantic enrichment . Expert Systems with Applications 118: 522-538 (2019)


+ Extract Knowledge from Social Media

Identification of stance in tweets (Marcelo Dias)

No threads of argumentations

Unsupervised and weakly supervised* frameworks (runner-up)

Target and stance expression depends on the domain

Marcelo Dias, Karin Becker. An Heuristics-based, Weakly-Supervised Approach for Classification of Stance in Tweets. Proc. of Web Intelligence, 2016.


+ Extract Knowledge from Social Media

Identification of stance in tweets

Unsupervised framework

Excellent performance on straightforward targets (Hillary, Clinton)

Marcelo Dias, Karin Becker. An Heuristics-based, Weakly-Supervised Approach for Classification of Stance in Tweets. Proc. of Web Intelligence, 2016.


+ Extracting Knowledge from Social Media

Analyze the emotions people express about terrorism events on Twitter, using demographics (Jonathas Harb)

Automatic emotion classification (4 terrorism events)

Tested deep learning with different seeding strategies

Demographic analysis (Face++, Profile Location)

Jonathas Harb, Karin Becker. Emotion Analysis of Reaction to Terrorism on Twitter. Proc. of Workshop on Big Social Data and Urban Computing, 2018.

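[Figure: CNN architecture. Word-level input with GloVe embeddings feeds three parallel branches, Conv (5), Conv (4) and Conv (3), each followed by global pooling; the branch outputs are concatenated and passed through dropout.]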
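A minimal forward-pass sketch of the architecture in the figure, in plain Python. It assumes the numbers 5, 4 and 3 are convolution kernel widths and that pooling is a global max; the random weights, the toy embedding size and the filter count are placeholders, not the trained model. Dropout acts only during training, so it is left out of this inference-only sketch.

```python
import random

random.seed(0)
EMBED_DIM = 4             # toy size; real GloVe vectors are much larger
NUM_FILTERS = 2           # filters per branch (assumed)
KERNEL_WIDTHS = (5, 4, 3)  # assumed meaning of Conv (5), Conv (4), Conv (3)

def random_vector(size):
    return [random.uniform(-1.0, 1.0) for _ in range(size)]

def conv_global_max(sequence, filters, width):
    """Slide each filter over windows of `width` tokens, apply ReLU, keep the max."""
    pooled = []
    for flt in filters:  # each filter holds width * EMBED_DIM weights
        best = 0.0       # ReLU floor: pooled activations never drop below zero
        for start in range(len(sequence) - width + 1):
            window = [x for vec in sequence[start:start + width] for x in vec]
            activation = sum(w * x for w, x in zip(flt, window))
            best = max(best, activation)
        pooled.append(best)
    return pooled

def encode(tokens, embeddings, branches):
    """Word-level input -> embeddings -> parallel conv branches -> concatenation."""
    sequence = [embeddings[t] for t in tokens]
    features = []
    for width, filters in branches.items():
        features.extend(conv_global_max(sequence, filters, width))
    return features

tokens = "people react with fear and anger".split()
embeddings = {t: random_vector(EMBED_DIM) for t in tokens}  # stand-in for GloVe
branches = {w: [random_vector(w * EMBED_DIM) for _ in range(NUM_FILTERS)]
            for w in KERNEL_WIDTHS}
features = encode(tokens, embeddings, branches)
print(len(features))
```

The concatenated feature vector would then feed a classification layer that outputs the emotion label.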



Q2: Do different terrorism events raise the same emotional reaction?

Gender? Age? Location?

Our hypothesis: it depends on how people relate to the event


+ Extracting Knowledge from Social Media

Compare engagement of Twitter users in Pink October and Blue November campaigns (Roberto Walter)

5 different countries

Demographic analysis (Face++, Profile Location)

Tweet topic categorization

Roberto Walter, Karin Becker. Caracterização e Comparação das Campanhas do Outubro Rosa e Novembro Azul no Twitter. SBBD 2018: 133-144


+ Extracting Knowledge from Social Media

Topic discovery and drift analysis


+ Extracting Knowledge from Social Interaction

Relating conversational topics and toxic behavior effects in a MOBA game (Joaquim Mesquita)

MOBA Games (LoL)

Effects of toxic behavior on other players

Behavioral patterns based on on-line chats

Joaquim A. M. Neto, Karin Becker: Relating conversational topics and toxic behavior effects in a MOBA game. Entertainment Computing 26: 10-29 (2018)


+ Extracting Knowledge from Medical Data

Machine translation for biomedical texts, parallel corpus (Felipe Soares)

Hierarchical classifier for non-invasive colorectal cancer screening

Plasma fluorescence data

Cancer, No findings, Further investigation

Felipe Soares, Karin Becker, Michel J. Anzanello: A hierarchical classifier based on human blood plasma fluorescence for non-invasive colorectal cancer screening. Artificial Intelligence in Medicine 82: 1-10 (2017)
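One way to picture a hierarchical classifier over the three outcomes is a two-stage decision. The sketch below is an assumption-laden illustration: the split (first "No findings" versus the rest, then "Cancer" versus "Further investigation"), the nearest-centroid models and the two-value toy features are all hypothetical and are not taken from the paper.

```python
def centroid(rows):
    """Mean vector of a list of equally sized feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class NearestCentroid:
    """Tiny stand-in for whatever base classifier each stage would use."""
    def fit(self, samples, labels):
        groups = {}
        for sample, label in zip(samples, labels):
            groups.setdefault(label, []).append(sample)
        self.centroids = {label: centroid(rows) for label, rows in groups.items()}
        return self

    def predict(self, sample):
        return min(self.centroids, key=lambda l: sq_dist(sample, self.centroids[l]))

class HierarchicalClassifier:
    def fit(self, samples, labels):
        # Stage 1 (assumed split): "No findings" versus everything else.
        coarse = [l if l == "No findings" else "suspicious" for l in labels]
        self.stage1 = NearestCentroid().fit(samples, coarse)
        # Stage 2: separate the remaining two classes.
        rest = [(s, l) for s, l in zip(samples, labels) if l != "No findings"]
        self.stage2 = NearestCentroid().fit([s for s, _ in rest], [l for _, l in rest])
        return self

    def predict(self, sample):
        if self.stage1.predict(sample) == "No findings":
            return "No findings"
        return self.stage2.predict(sample)

# Hypothetical two-value features standing in for plasma fluorescence readings.
X = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8], [0.5, 0.9], [0.6, 1.0]]
y = ["No findings", "No findings", "Cancer", "Cancer",
     "Further investigation", "Further investigation"]
model = HierarchicalClassifier().fit(X, y)
print(model.predict([0.9, 0.9]))
```

Splitting the decision lets each stage be tuned for its own error cost, for example favouring sensitivity in the first stage of a screening setting.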


+ Extracting Knowledge from Medical Data

Relating mental states using social media (Vanessa Borba)

Characterization of mental states (verbal cues, emotions and sentiments, behavioral and social patterns)

Analysis of temporal evolution of mental states (e.g. anxiety – depression – suicide)

Detecting Anomalies in Health Provision Records (Cristiano Sulzbach)

Lack of parameters of “normality”

Discovery of groups of data

Analysis of closeness
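When there is no reference for "normal", one common approach is to group the records first and then score each record by how close it lies to its group. The sketch below assumes k-means for the grouping and Euclidean distance to the nearest centroid for closeness; the slides do not name the actual techniques, and the records are made-up two-value vectors.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means with a deterministic start (the first k points)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(c) / len(members) for c in zip(*members)]
    return centroids

def closeness_scores(points, centroids):
    """Distance from each record to its nearest group centre (higher = more unusual)."""
    return [min(math.dist(p, c) for c in centroids) for p in points]

# Hypothetical records, e.g. (number of procedures, cost) after scaling.
records = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.9), (0.9, 1.1),
           (8.1, 7.9), (7.9, 8.2), (4.5, 15.0)]
centroids = kmeans(records, k=2)
scores = closeness_scores(records, centroids)
most_unusual = records[scores.index(max(scores))]
print(most_unusual)
```

Records with the largest scores are candidates for manual review rather than confirmed anomalies.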


+ A final word on Software Engineering

Strong background on software engineering

Industry experience

Agile Methods

Sentiment analysis on software artifacts

Satisfaction of IT users (Sentiment analysis on IT Tickets, Blaz, 2016)

Analysis of the assertiveness of user stories and of development productivity and quality metrics (Guilherme Dias, 2018)

Using gamification in Scrum for self-improvement (Camilla Schmidt, ongoing)


Renata Galante


Data Integration

Data Analysis


Raul Barth (master)

Passenger density and flow analysis and city zones and bus stops classification for public bus service




DMBSM – Data Mining Framework for Bus Service Management

• Input: GPS, bus stop and smart card data

• Extracting passengers’ density and flow information

• Bus stops segmentation based on travel purposes

• Finding the real bus service demand

• Enabling decision-making.

• Based on Lambda Architecture, using Big Data for parallel processing
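The Lambda Architecture combines a batch layer that recomputes views over all historical data, a speed layer that handles records arriving since the last batch run, and a serving layer that merges both. The sketch below illustrates the idea for boardings per bus stop; the record fields (`card`, `stop`) and the class and function names are hypothetical, and a real deployment would use distributed Big Data tooling rather than in-memory counters.

```python
from collections import Counter

def batch_view(historical_taps):
    """Batch layer: recompute boardings per stop from the full master dataset."""
    return Counter(tap["stop"] for tap in historical_taps)

class SpeedLayer:
    """Speed layer: incrementally count taps that arrived after the last batch run."""
    def __init__(self):
        self.realtime = Counter()

    def ingest(self, tap):
        self.realtime[tap["stop"]] += 1

def serve(batch, speed, stop):
    """Serving layer: merge the batch view with the real-time increment."""
    return batch[stop] + speed.realtime[stop]

# Hypothetical smart card taps already joined with the nearest bus stop.
historical = [{"card": "c1", "stop": "A"},
              {"card": "c2", "stop": "A"},
              {"card": "c3", "stop": "B"}]
batch = batch_view(historical)

speed = SpeedLayer()
speed.ingest({"card": "c4", "stop": "A"})  # arrives after the batch run

print(serve(batch, speed, "A"))
```

When the next batch run absorbs the new taps, the speed layer's counters for that period are discarded, so the two views never double count.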


Framework – Architecture and Results


Drunk Text Identification

Marcos Grzeça, Karin Becker, Renata Galante (UFRGS)


Drunk Text Identification

Detection of texts written by people under the influence of alcohol. Marcos Grzeça, Karin Becker, Renata Galante (UFRGS)

Romero & Becker (2019)

