Eating and drinking recognition for triggering smart reminders

Academic year: 2021
Texto

FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO

Eating and drinking recognition for triggering smart reminders

Diana Sousa Gomes

Mestrado Integrado em Bioengenharia

Supervisor at FEUP: João Mendes Moreira, PhD
Supervisor at Fraunhofer: Inês Sousa, PhD


Resumo

O envelhecimento progressivo da sociedade nos países desenvolvidos de hoje tem chamado a atenção para a sua faixa sénior. A sua necessidade particular de atenção preocupa os prestadores de cuidados e as instituições de saúde estão sobrecarregadas, tentando prestar assistência e ter um impacto positivo na sua saúde. O isolamento dos idosos é também alarmante, potenciando comportamentos de risco e pouco saudáveis, como o negligenciar de refeições, a ingestão de poucos líquidos e a gestão imprópria de tratamentos farmacológicos.

A investigação tecnológica encontra-se, portanto, em busca de soluções eficazes e de baixo custo, e a vulgarização dos smartphones e dos dispositivos wearable tem contribuído largamente para este processo. Esta dissertação teve como objetivo combinar tudo isto no desenvolvimento de uma ferramenta que reconhece as atividades de comer e beber. Com base na informação devolvida por tal ferramenta, o lançamento de lembretes inteligentes relacionados com a toma de medicamentos, a hidratação e outros hábitos saudáveis, é possível em tempo e condições reais.

Foi também promovido um olhar mais atento para o problema em mãos e as características da população-alvo, de uma perspetiva antropológica, biológica e fisiológica. O estabelecimento da população-alvo reforçou a necessidade de sensores discretos e levantou desafios mais particulares no que toca à persuasão dos utilizadores. Foram encontrados poucos trabalhos que distinguissem as atividades de comer e beber em simultâneo. Os investigadores têm investido sobretudo na extração de features simples a partir dos dados adquiridos e no desenvolvimento de modelos de classificação robustos que melhor se ajustam aos dados.

Esta dissertação culminou na proposta de um algoritmo de reconhecimento das atividades de comer e beber, de forma computacionalmente eficiente e independente do utilizador, em condições e tempo reais. A obstrutividade do sistema foi minimizada, sendo apenas requeridos dados inerciais do pulso correspondente à mão dominante do utilizador. Uma taxa de reconhecimento geral de 88% foi obtida com dados adquiridos continuamente ao longo de 16h de um utilizador independente.

No processo de optimização do algoritmo, foram discutidos desafios atuais associados aos trabalhos de reconhecimento de atividades humanas, enquanto foram propostas algumas soluções para os enfrentar, baseadas no enquadramento de tais desafios no contexto do objetivo final do trabalho. Entre as principais contribuições científicas deste trabalho é também possível destacar a proposta de uma nova abordagem de segmentação de data streams adequada para a identificação da atividade de beber, baseada no trabalho de [Kozina et al., 2011], e a verificação da maior eficácia da utilização simultânea de dois modelos de classificação binária (um por cada atividade de interesse) em vez de um único modelo de classificação multi-classe.

Assim, este trabalho multidisciplinar propôs-se a melhorar a qualidade de vida dos seniores e ultrapassar alguns dos desafios que têm estado a dificultar o reconhecimento de atividades humanas pelo foco em perceber a população-alvo e melhorar os métodos de classificação atuais.


Abstract

Nowadays, the increasingly ageing society of developed countries has raised attention to seniors. Their particular need for care worries care-providers, and health institutions are overwhelmed, trying to assist them and improve their health. The isolation of seniors is also alarming, potentiating risky and unhealthy behaviors, like neglecting meals, ingesting a very low amount of fluids and improperly ensuring pharmacological treatment.

Technological research is, therefore, looking for effective and low-cost solutions, and the vulgarization of smartphones and wearables has been of great assistance. This dissertation aimed to combine all of this in order to deliver a tool that recognizes meaningful eating and drinking moments, based on which the issue of smart reminders concerning the intake of medicines, hydration and other healthy habits is possible in real-time and real-world conditions.

A closer look at the problem at hand and the characteristics of the target population, from an anthropological, biological and physiological point of view, was also promoted. The establishment of the target population reinforced the need for unobtrusive sensors and raised particular challenges in user persuasion. Very few works were found to distinguish the eating and drinking activities simultaneously. Researchers have invested mostly in extracting simple features from the acquired data and in developing robust classification models that better fit the data.

This dissertation culminated in the proposal of a computationally efficient and user-independent eating and drinking recognition algorithm with application in free-living conditions with online response. The obstructiveness of the system was minimized, since only inertial data from the dominant wrist of the user is required. An overall recognition accuracy of 88% was achieved for a 16h continuous acquisition of an independent user.

In the process of algorithm optimization, current challenges associated with Human Activity Recognition works were also addressed, and some solutions to overcome them were proposed, by framing them in the context of the final goal of this work. Among the main scientific contributions of this work, it is also possible to highlight the proposal of a novel data stream segmentation approach suitable for drinking activity identification, based on the work of [Kozina et al., 2011], and the verification of the higher efficacy of simultaneously employing two binary classification models (one for each activity of interest) rather than a single multi-class classification model.

Thus, this multidisciplinary work sought to improve the quality of life of seniors and to overcome some of the challenges that have been in the way of Human Activity Recognition research, by focusing on understanding the target population and improving current classification methods.


Agradecimentos

Esta dissertação contou com muitas e importantes contribuições e apoios, sem os quais não teria sido possível. Por todos eles, estou eternamente grata.

Ao Prof. Dr. João Mendes Moreira e à Dra. Inês Sousa, pela orientação, as opiniões, o esclarecimento de dúvidas, a disponibilidade e o incentivo, pois sem eles como suporte este trabalho não seria possível.

A todos os docentes que contribuíram para a minha formação académica e me deram as ferramentas necessárias para saber colocar questões e procurar respostas para as mesmas incessantemente. À FEUP por ter sido a casa que me acolheu durante os passados 5 anos e com quem cresci pessoal e academicamente. À Fraunhofer Portugal por ter proposto esta dissertação, acreditado nas minhas capacidades e me presentear com uma equipa disponível e sempre aberta à inovação científica.

Aos colegas que partilharam comigo os sucessos e as dificuldades deste curso, nomeadamente os amigos que, à distância de um clique, me sabem manter com a cabeça no sítio e focada nos meus objetivos, e aos Matrecos da Fraunhofer por deixarem a vossa marca tão positiva na realização desta dissertação.

Às meninas Aqualusus, por me terem aturado em tantos momentos, pelas opiniões sinceras e pela amizade que ficou para a vida.

Àquele grupo de amigos (que dispensa descrição), por me apoiar sempre, mesmo quando não apetece, pelas pequenas ações cheias de afeto, pelas conversas sinceras que mudam a nossa vida, e pelos sacrifícios que fazem para me fazer feliz.

Por último, à minha família, com um agradecimento muito especial aos meus pais e à minha irmã por me terem permitido chegar até aqui, pela paciência, pela amizade e por me terem ajudado a superar todos os obstáculos desta caminhada. Este trabalho é deles.

Estas linhas de agradecimento marcam o fim de uma jornada que me transformou completa e profundamente. E estendo este obrigada a todos os que fizeram parte dela, mesmo que não estejam nomeados, por a terem marcado.

Diana Gomes


“Valeu a pena? Tudo vale a pena
Se a alma não é pequena.”

Fernando Pessoa


Contents

1 Introduction 1

1.1 Context . . . 1

1.2 Problem description and motivation . . . 2

1.3 Objectives and requirements . . . 2

1.4 Main scientific contributions . . . 3

1.5 Document structure . . . 3

2 The Human Activity Recognition world 5

2.1 State-of-the-art . . . 5

2.1.1 Learning methods: supervision and response time . . . 6

2.1.2 Features and classifiers . . . 9

2.2 Smartphone processed HAR . . . 10

2.2.1 Challenges . . . 11

2.3 The eating and drinking activities . . . 13

2.4 Summary and conclusions . . . 15

3 Eating and drinking recognition 17

3.1 Eating and drinking for older adults . . . 17

3.1.1 Sociocultural and geographical differences . . . 18

3.1.2 Senior eating disorders and dehydration . . . 18

3.1.3 Food, water and medication . . . 19

3.2 State-of-the-art . . . 20

3.2.1 Algorithms and approaches . . . 20

3.2.2 Sensors and acquisition . . . 23

3.2.3 Features and classifiers . . . 24

3.3 Datasets . . . 25

3.4 Challenges . . . 25

3.5 Summary and conclusions . . . 26

4 Experimental setting and practical concerns 29

4.1 Fraunhofer AICOS previous works . . . 29

4.2 Sensors . . . 30

4.2.1 Pandlets . . . 30

4.2.2 Smartwatches . . . 31

4.2.3 Pandlets vs. Smartwatches . . . 32

4.3 Acquisition conditions . . . 33

4.4 Experiment with ADL public dataset . . . 34

4.5 Summary and conclusions . . . 35


5 Acquisition and datasets 37

5.1 Acquisition . . . 37

5.1.1 Material . . . 38

5.2 Datasets . . . 38

5.2.1 Dataset 1: perfectly segmented activities . . . 39

5.2.2 Dataset 2: continuous sequence of activities . . . 39

5.2.3 Dataset 3: eating and drinking with seniors . . . 41

5.2.4 Discussion of the datasets . . . 43

5.3 Summary and conclusions . . . 44

6 Methodology 45

6.1 Methods . . . 45

6.1.1 Raw signal and preprocessing methods . . . 45

6.1.2 Segmentation and feature extraction . . . 46

6.1.3 Classifiers . . . 51

6.1.4 Performance metrics . . . 53

6.2 Initial decisions . . . 54

6.2.1 Sensors . . . 55

6.2.2 Feature selection and importance . . . 56

6.3 Summary and conclusions . . . 57

7 Model conception, selection and validation 59

7.1 Binary vs. multi-class classification . . . 59

7.1.1 Experiments description . . . 60

7.1.2 Results and discussion . . . 60

7.1.3 Conclusions . . . 62

7.2 Eating recognition model . . . 62

7.2.1 Classifier selection . . . 63

7.2.2 Meaningful eating moments recognition . . . 63

7.3 Drinking recognition model . . . 64

7.3.1 Drinking classification improvement . . . 66

7.3.2 Peak detection for recognition improvement . . . 69

7.4 Model assembly and final algorithm . . . 70

7.5 Validation . . . 71

7.5.1 Results and discussion . . . 74

7.6 Summary and conclusions . . . 75

8 Conclusion 77

8.1 Future work . . . 78


List of Figures

2.1 Steps for activity recognition with a smartphone. . . 11

4.1 Pandlets CORE components and surface area. IMU - Inertial Measurement Unit; EMU - Environmental Measurement Unit. . . 31

4.2 Living lab at Fraunhofer AICOS [Fraunhofer AICOS, 2014]. . . 33

4.3 Confusion matrix of the experiment. . . 36

5.1 Representation of the Pandlets disposition in a right wrist, with the configuration of the accelerometer and gyroscope. . . 38

6.1 Raw and preprocessed accelerometer signals from a drinking and eating activity. . 46

6.2 Feature importance in the recognition of eating and drinking. . . 57

7.1 Comparison between overall accuracy of binary vs. multi-class classification models generated in the sequence of different segmentation methods. . . 61

7.2 Meal recognition algorithm. . . 65

7.3 Examples of rotational magnitude peaks detected in perfectly segmented drinking activities. . . 68

7.4 Segmentation approaches employed in the final recognition algorithm. . . 71

7.5 Final recognition algorithm. . . 72


List of Tables

2.1 Overview of HAR works that rely on inertial data from wearable devices and/or smartphones, with respect to learning supervision and response time. . . 7

2.2 Common feature extraction methods (from accelerometer sensor data) and state-of-the-art classification algorithms surveyed in [Lara and Labrador, 2013] for HAR. . . 9

2.3 Classifiers employed in the surveyed HAR works. . . 10

2.4 Supervised HAR works that recognize the eating activity among other ADL. . . . 14

3.1 Summary of eating recognition works. . . 21

4.1 AAL projects from Fraunhofer AICOS that support healthy dietary habits of the elder [Fraunhofer Portugal, 2016]. . . 30

4.2 List of smartwatches available at Fraunhofer and their specifications. . . 32

4.3 Description of the dataset used in the experiment. . . 35

5.1 Gender and age distribution of the volunteers of datasets 1 and 2. . . 39

5.2 Description of Dataset 1. . . 40

5.3 Description of Dataset 2. . . 42

5.4 Description of Dataset 3. . . 43

5.5 Summary of class distribution. . . 43

6.1 Implemented features. . . 48

6.2 Accuracy of the classification using accelerometer and/or gyroscope data from the dominant wrist vs. both wrists. . . 55

6.3 Highly correlated features. . . 56

7.1 Comparison of eating recognition test results. . . 61

7.2 Average AUC for each classification model. . . 63

7.3 Comparison between class balancing approaches in drinking recognition. . . 67

7.4 Comparison of the proposed segmentation approaches. . . 69

7.5 Demonstration of the peak detection algorithm for discarding false positive classifications. . . 70

7.6 Description of the activities that constituted the 16h validation set. . . 73

7.7 Performance measurements for the validation set. . . 74


Abbreviations and Symbols

AAL Ambient Assisted Living

ADL Activity of Daily Living

ALR Additive Logistic Regression

AR Autoregressive Model

CNN Convolutional Neural Networks

DBSCAN Density-Based Spatial Clustering of Applications with Noise

DCT Discrete Cosine Transform

DTW Dynamic Time Warping

EMG Electromyography

EMU Environmental Measurement Unit

ENN Edited Nearest Neighbor

FBF Fuzzy Basis Function

FDA Food and Drug Administration

FT Fourier Transform

HAR Human Activity Recognition

HMM Hidden Markov Models

IMU Inertial Measurement Unit

kNN k-Nearest Neighbors

LDA Linear Discriminant Analysis

MAD Mean Absolute Deviation

MLP Multi-Layer Perceptron

MLR Multinomial Logistic Regression

OECD Organization for Economic Co-operation and Development

PCA Principal Component Analysis

QDA Quadratic Discriminant Analysis

RMS Root Mean Square

ROC Receiver Operating Characteristic

SMOTE Synthetic Minority Over-Sampling Technique

SVM Support Vector Machines

UK United Kingdom


Chapter 1

Introduction

1.1 Context

Technology is now, more than ever, trying to offer solutions for monitoring human daily activities, as a way to prevent unhealthy and dangerous behaviors, especially in the most fragile groups of society. Older adults constitute one of these groups and have been on the receiving end of many studies. This increasingly aging society requires highly adapted technological solutions, worthy of the trust of both older adults and their care providers.

The aging of society brings many problems. The growth of average life expectancy is due to the evolution of health systems, pharmacological advances and the availability of biocompatible devices. However, healthcare costs remain on the table, with ethical and political considerations always in the loop. Effective and low-cost health solutions are, therefore, in high demand. The elderly scenario is also worrisome due to isolation-related issues. More often than not, seniors do not want to abandon their homes or lose their independence, and hiring assistance from a professional can be very expensive. For that reason, the elderly frequently neglect essential daily activities, like eating properly, hydrating or taking care of their personal hygiene. These unhealthy habits may be associated with unfortunate consequences (e.g. illnesses, conditions, diseases).

The vulgarization of smartphones has opened new windows and cleared room for improvements even in the health sector. Society has evolved to a time when people do not leave home without their mobile phone, a well-equipped device with several embedded sensors, wireless and Bluetooth connection and many other appealing services. Wearable devices, like smartwatches, are also, more than ever, recognized for their high potential and utility to monitor activities (e.g. physical exercise).

Continuous research in this field is motivated by the goal of bringing the best of technology to the most worrisome situations. It is, however, important to focus our attention on each problem to allow the development of effective solutions.


1.2 Problem description and motivation

Dietary habits and hydration of the elderly have a major impact on their overall health. Notwithstanding, it is very common for them to become careless with respect to eating and ingesting fluids when living autonomously. Moreover, older adults are frequently highly medicated. Misusing prescribed medication is another serious problem associated with incapacitated yet independent seniors, exacerbating conditions and diseases.

The food-water-medication relation is, therefore, very worrisome in the elderly. There are, in fact, several medicines whose intake depends on the current status of the digestive system (they must be taken on an empty stomach, or during or after a meal). By forgetting or deliberately skipping a meal, one will probably forget medication or improperly follow the treatment. Being able to issue a smart reminder when a meal takes place, or when none does for a long time, is very useful, reminding the subject to take the medication or that it is time to have a meal, respectively. Additionally, keeping track of whether seniors are ingesting fluids regularly shall be useful. This will enable reminding users to drink water frequently, which will positively impact their homeostasis. All of this relies on the ability to correctly identify eating and drinking moments in real-time and real-world conditions.
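The reminder logic described above can be expressed as a few simple rules over the last detected intake events. The sketch below illustrates one possible formulation; the function name, thresholds and the 30-minute "meal just happened" window are illustrative assumptions, not values taken from this dissertation.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds -- illustrative only, not values from the dissertation.
MAX_MEAL_GAP = timedelta(hours=6)    # remind to eat after this long without a meal
DRINK_INTERVAL = timedelta(hours=2)  # remind to hydrate at this cadence

def reminders(now, last_meal, last_drink, meds_due_with_food):
    """Return the reminders to issue, given the last detected meal/drink events."""
    msgs = []
    if last_meal is not None and now - last_meal < timedelta(minutes=30):
        # A meal was just recognized: medication tied to food can be prompted now.
        if meds_due_with_food:
            msgs.append("take medication with this meal")
    elif last_meal is None or now - last_meal > MAX_MEAL_GAP:
        msgs.append("time to have a meal")
    if last_drink is None or now - last_drink > DRINK_INTERVAL:
        msgs.append("remember to drink water")
    return msgs

now = datetime(2017, 5, 2, 13, 0)
print(reminders(now, last_meal=now - timedelta(minutes=10),
                last_drink=now - timedelta(hours=3),
                meds_due_with_food=True))
```

In practice these rules would be driven by the recognition algorithm's event stream; the point is that the classifier only needs to deliver reliable meal and drink timestamps for the reminder layer to be trivial.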

In the context of human daily activity recognition, eating and drinking involve unique challenges, from the use of different utensils to food intake rhythm variations, even in the course of the same meal. Eating habits and patterns also vary around the world. It is also important to keep in mind that establishing seniors as the target population of a technological solution means that simplicity and unobtrusiveness of the system are important prerequisites. All of these challenges must be discussed and addressed in order to completely understand all the dimensions of the eating and drinking activities, their impact on elders' lives and how these may influence a technological response.

In summary, there is a clear need for a reliable application, using only off-the-shelf commercially available wearable sensors and a smartphone, to continuously detect eating and drinking activities and issue smart reminders to elderly people. This dissertation responds to that concrete need and seeks to improve quality of life and society itself, while developing a study that is academically challenging and very important to the human daily activity recognition research community.

1.3 Objectives and requirements

The present dissertation had the final objective of developing a tool for the simultaneous recognition and distinction of eating and drinking activities in free-living conditions. This tool shall:

• Be computationally efficient, allowing proper smartphone implementation;

• Be user-independent;



• Suit the expectations of the target population (seniors);

• Minimize obstructiveness;

• Return meaningful predictions, from which adequate conclusions on whether to issue a reminder (and which reminder) can be drawn.

In this process, it is also important to propose and test hypotheses that shall advance the Human Activity Recognition (HAR) scientific field.

1.4 Main scientific contributions

Among the scientific contributions of this work, it is possible to highlight the following:

• Proposal of a novel data stream segmentation approach suitable for drinking activity identification, based on the work of [Kozina et al., 2011];

• Verification of the higher efficacy of simultaneously employing two binary classification models (one for each activity of interest) rather than a single multi-class classification model;

• Conception of a new, computationally efficient and user-independent algorithm for simultaneous eating and drinking recognition in free-living conditions.
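To illustrate the second contribution, the sketch below shows how the outputs of two binary classifiers (one per activity) might be fused into a single decision. The score names, threshold and tie-breaking rule (pick the higher score) are assumptions made for illustration, not the dissertation's exact procedure.

```python
def fuse_binary_predictions(p_eat, p_drink, threshold=0.5):
    """Combine the scores of two per-activity binary classifiers into one label.
    The tie-breaking rule below is an illustrative assumption."""
    eat, drink = p_eat >= threshold, p_drink >= threshold
    if eat and drink:
        # Both models fire: keep the more confident activity.
        return "eating" if p_eat >= p_drink else "drinking"
    if eat:
        return "eating"
    if drink:
        return "drinking"
    return "other"

print(fuse_binary_predictions(0.8, 0.3))  # eating
print(fuse_binary_predictions(0.4, 0.9))  # drinking
print(fuse_binary_predictions(0.2, 0.1))  # other
```

A single multi-class model would produce the label directly; the two-model design pays a small fusion step in exchange for tuning each activity's classifier independently.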

Moreover, many common HAR difficulties were discussed and hypotheses to overcome them were proposed, which shall contribute to the advance of the field.

1.5 Document structure

This dissertation consists of a total of 8 chapters. Chapters 2 and 3 survey state-of-the-art approaches in the fields of HAR and of eating and drinking recognition, respectively. Then, Chapter 4 describes the experimental setting within which this dissertation was proposed, along with a discussion of experimental concerns.

After that, the process of data collection is described in Chapter 5, followed by a description of the implemented methods in Chapter 6. The latter also describes the first experiments, which led to the initial decisions that propelled the advances towards the final solution.

Then, Chapter 7 details all steps and experiments that led to the conception of the final recognition model. It also illustrates the final algorithm and presents the test results of the proposed method, comparing them to those found in state-of-the-art approaches.

This document ends with a conclusions chapter, which discusses how successful the proposed method was with respect to the aforementioned objectives, along with the main scientific contributions of the dissertation and future work.


Chapter 2

The Human Activity Recognition world

Activities of daily living (ADLs) consist of a set of everyday tasks that allow a subject to live independently [Pendleton and Schultz-Krohn, 2013]. In recent years, due to an increasing interest in the recognition of human activities, extensive work on Human Activity Recognition (HAR) has been developed, aiming to recognize common human activities in real-life settings. Some of these works focus on distinguishing different ADLs, like walking, eating, drinking, lying down, brushing teeth and combing hair. Therefore, it is interesting to understand what has been done so far in the field of HAR and what the role of eating/drinking recognition is in such a wide area.

In this chapter, an overview of HAR methods is brought forward. Subsequently, particular attention is given to smartphone-processed activity recognition and its inherent challenges. The eating activity is also framed in the HAR world by highlighting works that recognize eating among other daily activities. Finally, the summary and conclusions of the chapter are presented.

2.1 State-of-the-art

The need for HAR algorithms is vast and several approaches have been proposed with respect to different types of sensor data and machine learning models [Cook et al., 2013]. So far, it is possible to distinguish two big groups of sensors: external, when they are placed in surrounding objects with which the subject frequently interacts, and wearable, if they are actually placed on the subject.

External sensing has been employed in intelligent homes, recognizing people's activities within the house. However, its installation and maintenance can be very costly. Moreover, it is not possible to recognize activities out of the reach of the sensors. Cameras are another example of external sensing with high potential, but privacy concerns limit their utilization. It is also very difficult to attach cameras that allow full visualization of a subject's body, which conditions their use for identifying everyday activities [Lara and Labrador, 2013].

These are some of the reasons that have been motivating the use of wearable sensors, along with the dissemination of portable devices constantly carried by their users and the improvement of acquisition equipment.


Even when constrained to approaches based on wearable inertial sensors, a considerable number of works has been developed. In order to properly compare them, it is important to perform a good categorization. A very interesting way of doing so is to group approaches by learning supervision and response time.

2.1.1 Learning methods: supervision and response time

HAR systems are frequently based on supervised learning, i.e. trained with a static and labeled dataset, which is previously acquired and processed and then used for classification. Another possibility consists in semi-supervised learning, which uses both labeled and unlabeled data for training purposes [Cardoso and Moreira, 2016]. Unsupervised learning methods do not use labeled data and, therefore, are not suitable for the HAR classification task itself, since returning a label is required. However, [Kwon et al., 2014] proposes an unsupervised method to automatically decide on the number of activities that maximizes accuracy for activity recognition, making generation of training datasets by hand obsolete. With a similar objective, [Ling and Wang, 2015] conceived an approach for unsupervised segmentation of activities with applications in healthcare. These works prove that there is room for unsupervised techniques in activity recognition in roles other than the learning supervision itself.

It is also possible to frame these approaches with respect to their response time. Online approaches provide immediate information about the performed activity. Offline methods, in turn, only provide such information later; high computational expense and the lack of need to know the current activity instantly justify their use. The same nomenclature is also applied to the training process, according to whether training takes place in real-time (online) or not (offline).

It is important, in this context, to overview a few HAR works that rely on wearable inertial sensor data, namely from accelerometers, and aim to recognize ADLs, framing them with respect to response time and supervision of the learning method. Table 2.1 promotes a comparison of some important HAR works, grouping them by the aforementioned categorization. Then, a discussion of the methods behind high-impact works is presented in order to deepen the understanding of what has been done so far.

2.1.1.1 Offline approaches

There are situations when data can be analyzed on a weekly or daily basis. It may be the case of dietary habits and even physical activity analysis, for example. Several experiments have been conducted with approaches of this category and some relevant works have emerged.

Among offline supervised learning works, the work of Bao and Intille [Bao and Intille, 2004] contributed significantly to the field of activity recognition, since most previous works had focused exclusively on ambulatory activities. It recognizes 20 daily activities, e.g. eating or drinking, watching TV, reading, running. Even though they started by placing five bi-axial accelerometers on the user, they came to the conclusion that with just two of them (thigh and wrist) performance just slightly


Table 2.1: Overview of HAR works, that rely on inertial data from wearable devices and/or smartphones, with respect to learning supervision and response time. A selection of high-impact and/or recent works that recognize complex ADL is presented (✓ = activity recognized; ✗ = not recognized).

| Learning | Response | Work | Performance assessment | No. devices | Sensor type | No. activities | Eating | Drinking |
|---|---|---|---|---|---|---|---|---|
| Supervised | Offline | [Bao and Intille, 2004] | Accuracy: 84% | 5 | Inertial | 20 | ✓ | ✓ |
| Supervised | Offline | [McGlynn and Madden, 2011] | Accuracy: 84% | 3 | Inertial | 5 | ✗ | ✗ |
| Supervised | Offline | [Chernbumroong et al., 2013] | Accuracy: 90% | 1 | Inertial; Temperature; Altitude | 9 | ✓ | ✗ |
| Supervised | Offline | [Ronao and Cho, 2016] | Accuracy: 95%-96% | 1 | Inertial | 6 | ✗ | ✗ |
| Supervised | Offline | [Shoaib et al., 2016] | F-measure (only presented by activity) | 2 | Inertial | 13 | ✓ | ✓ |
| Supervised | Online | [Maurer et al., 2006] | Accuracy: up to 93% | 6 | Inertial; Light | 6 | ✗ | ✗ |
| Supervised | Online | [Kao et al., 2009] | Accuracy: 95% | 1 | Inertial | 7 | ✗ | ✗ |
| Supervised | Online | [Berchtold et al., 2010] | Accuracy: up to 97% | 1 | Inertial | 10 | ✗ | ✗ |
| Supervised | Online | [Riboni and Bettini, 2011] | Accuracy: up to 93% | 3 | Inertial; GPS | 10 | ✗ | ✗ |
| Supervised | Online | [Zhu, 2011] | Accuracy: 85% | 4 | Inertial; Location | 10 | ✓ | ✗ |
| Supervised | Online | [Lara and Labrador, 2012] | Accuracy: 96% | 1 | Inertial; Vital signs | 5 | ✗ | ✗ |
| Supervised | Online | [Siirtola and Röning, 2012] | Accuracy: 94%-96% | 1 | Inertial | 5 | ✗ | ✗ |
| Supervised | Online | [Varkey et al., 2012] | Accuracy: 91% | 2 | Inertial | 6 | ✗ | ✗ |
| Semi-supervised | Offline | [Huynh and Schiele, 2006] | Accuracy: up to 88% | 12 | Inertial | 8 | ✗ | ✗ |
| Semi-supervised | Offline | [Stikic et al., 2011] | Accuracy: up to 78% | 2 | Inertial | 20 | ✓ | ✓ |
| Semi-supervised | Offline | [Nazábal et al., 2016] | Accuracy: 92% | 4 | Inertial | 12 | ✗ | ✗ |
| Semi-supervised | Online | [Longstaff et al., 2010] | Accuracy: up to 91% | 1 | Inertial; GPS | 3 | ✗ | ✗ |


dropped. Data was labeled by users in a naturalistic environment, and a C4.5 decision tree classifier with time and frequency domain features achieved an overall accuracy of 84%. In another work, [McGlynn and Madden, 2011] proposed a new ensemble classifier based on Dynamic Time Warping (DTW), aiming to recognize 6 ADLs. Three accelerometers placed on the hip, thigh and wrist were used. When combining data from the three sensors, an overall accuracy of 84% was obtained. This value dropped significantly when using data from the sensors separately.
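The time and frequency domain features mentioned above can be sketched for a single accelerometer-axis window as follows. The feature set (mean, standard deviation, RMS, spectral energy) is illustrative, as the exact features vary across the surveyed works, and a real implementation would use an FFT rather than the naive DFT shown here.

```python
import math

def window_features(signal):
    """Illustrative time- and frequency-domain features for one
    accelerometer-axis window (the surveyed feature sets vary)."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    # Spectral energy via a naive DFT over the positive frequency bins
    # (an FFT would be used in practice).
    energy = 0.0
    for k in range(1, n // 2):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        energy += (re * re + im * im) / n
    return {"mean": mean, "std": math.sqrt(var), "rms": rms, "spectral_energy": energy}

# A 2 Hz tone sampled at 50 Hz for 1 s: energy concentrates in one frequency bin.
window = [math.sin(2 * math.pi * 2.0 * i / 50.0) for i in range(50)]
feats = window_features(window)
print({k: round(v, 3) for k, v in feats.items()})
```

Vectors of such features, computed per window and per axis, are what feed classifiers like the C4.5 decision tree used by Bao and Intille.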

But it may be impossible to label all data in order to attain completely supervised algorithms. With the purpose of not missing any data, it is possible to resort to semi-supervised learning approaches. [Stikic et al., 2011, Stikic et al., 2009] propose a method for label propagation through a graph with both labeled and unlabeled data, suggesting that it might offer the best accuracy/time compromise in the labeling task. Another semi-supervised learning algorithm applied to activity recognition was proposed in [Ali et al., 2008], combining Multiple Eigenspaces based on PCA with HMM. Their experiment monitored finger movement by placing two accelerometers on the index finger. However, little information is given about the experiments, making them difficult to analyze. It is worth mentioning that, before this work, [Huynh and Schiele, 2006] proposed a combination of Multiple Eigenspaces with SVM, where ambulation and daily activities were recognized. However, it relied on the use of eleven accelerometers and, therefore, suffered from high obstructiveness.

According to [Lara and Labrador, 2013], semi-supervised learning HAR presents several issues and requires further investigation to overcome them. In fact, most of these HAR approaches assign labels to unlabeled data in the training set and then apply typical supervised methods; some of them are not adapted to real scenarios and conditions, and some others are very computationally expensive. Works of this nature are, however, raising discussion and proposing alternatives to optimize the trade-off between labeling effort and recognition performance, which justifies the interest of the scientific community in overcoming the discussed difficulties.

2.1.1.2 Online approaches

Online HAR systems are the answer for applications that perform continuous health monitoring or real-time physical exercise performance assessment, for example. These systems receive continuously streaming data. For that reason, windowing techniques are often employed for segmentation purposes. This constitutes an additional challenge when compared to offline approaches, which may lead to misclassification due to the occurrence of several activities within the same time-frame.
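The windowing step described above can be sketched as a simple fixed-length segmentation with fractional overlap; the function name and parameter values are illustrative, not those of any surveyed work.

```python
def sliding_windows(stream, window_size, overlap=0.5):
    """Split a sample stream into fixed-length windows with fractional
    overlap, as commonly done for online HAR segmentation."""
    step = max(1, int(window_size * (1 - overlap)))
    return [stream[i:i + window_size]
            for i in range(0, len(stream) - window_size + 1, step)]

samples = list(range(10))
# Windows of 4 samples advancing 2 samples at a time (50% overlap).
print(sliding_windows(samples, window_size=4, overlap=0.5))
```

Note how any window that straddles an activity boundary will contain samples from two activities, which is exactly the misclassification risk pointed out above.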

Works like eWatch [Maurer et al., 2006], Vigilante [Lara and Labrador, 2012] or ActiServ [Berchtold et al., 2010] are successful examples of online systems that recognize ambulatory activities using accelerometers. In [Kao et al., 2009], 7 ADLs were recognized. The authors used a triaxial accelerometer placed on the dominant wrist of the user that transmitted information wirelessly to an embedded system. That system collected the data and performed preprocessing, feature extraction and classification. Streaming data was split into windows of 320 ms with 50% overlap. Classification was performed with a Fuzzy-Basis-Function (FBF) based classifier with Linear Discriminant Analysis (LDA). Another example of an online ADL recognition system is


COSAR [Riboni and Bettini, 2011]. Its authors collect data from 2 accelerometers, one on the phone and another on the wrist of the user, and take advantage of GPS location to filter activities. Moreover, they implement a statistical classification of activities that takes into account historical variants, proposing an alternative to totally supervised learning methods. Overall, the system performs well, with roughly 93% reported accuracy.

According to [Incel et al., 2013], by the time of its publication, the only work that proposed the use of semi-supervised learning on mobile phones for activity recognition was [Longstaff et al., 2010]. Longstaff et al. implemented a self-learning technique, which labels unlabeled data based on the most confident predictions of a single classifier, thus augmenting their existing classifier, and a co-learning technique, which uses multiple classifiers that learn from each other. If the initial classifiers performed poorly, democratic co-learning proved to be a good option for most applications. Their work used GPS and accelerometer based features with a decision tree C4.5 (co-learning and self-learning), naive Bayes (co-learning) and SVM classifiers (co-learning). Later on, the effect of online semi-supervised learning was explored in [Mendes-Moreira and Cardoso, 2016]. For perfect segmentation of activities, their work verified that semi-supervised learning performed better than totally supervised learning.
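The self-learning scheme described above can be sketched with scikit-learn's SelfTrainingClassifier wrapping a decision tree (a stand-in for C4.5); the synthetic data and confidence threshold are illustrative only:

```python
import numpy as np
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Two synthetic activity clusters; only 10 of 100 windows carry a label
X = np.vstack([rng.normal(0.0, 0.3, (50, 3)), rng.normal(2.0, 0.3, (50, 3))])
y = np.full(100, -1)
y[:5] = 0
y[50:55] = 1

# The base learner labels unlabeled windows from its most confident
# predictions and retrains, mirroring the self-learning scheme above
self_trained = SelfTrainingClassifier(DecisionTreeClassifier(), threshold=0.9)
self_trained.fit(X, y)
preds = self_trained.predict([[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]])
print(preds)
```

Co-learning would replace the single wrapped estimator with several classifiers that exchange their confident pseudo-labels, which scikit-learn does not provide out of the box.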

2.1.2 Features and classifiers

There is a great diversity of features, and each of the works mentioned so far has taken advantage of this fact. In [Lara and Labrador, 2013], state-of-the-art HAR approaches using wearable sensors were surveyed; Table 2.2 highlights methods of feature extraction for acceleration signals and state-of-the-art classification algorithms based on their work.

Table 2.2: Common feature extraction methods (from accelerometer sensor data) and state-of-the-art classification algorithms surveyed in [Lara and Labrador, 2013] for HAR.

Feature extraction:
  Time domain: mean, standard deviation, variance, interquartile range (IQR), mean absolute deviation (MAD), correlation between axes, entropy, kurtosis
  Frequency domain: Fourier Transform (FT), Discrete Cosine Transform (DCT)
  Others: Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Autoregressive Model (AR), HAAR filters

Classification:
  Decision tree: C4.5, ID3
  Bayesian: Naive Bayes, Bayesian Networks
  Instance based: k-nearest neighbors (kNN)
  Neural networks: Multilayer Perceptron
  Domain transform: Support Vector Machines (SVM)
  Fuzzy logic: Fuzzy Basis Function, Fuzzy Inference System
  Regression methods: Multinomial Logistic Regression (MLR), Additive Logistic Regression (ALR)
  Markov models: Hidden Markov Models (HMM), Conditional Random Fields
  Classifier ensembles: Boosting, Bagging


The best choice of features always depends on the final purpose of the work, e.g. the response time of the algorithm or the type of activities of interest. Nevertheless, the features presented in Table 2.2 seem to be some of the most well accepted and consensual, since most of the surveyed works implemented at least one of them.
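Several of the time-domain features in Table 2.2 are straightforward to compute per window; a sketch with NumPy/SciPy (the feature names and window shape are this sketch's own conventions):

```python
import numpy as np
from scipy import stats

def time_domain_features(window):
    """Compute per-window time-domain features from Table 2.2.

    `window` is an (n_samples, 3) accelerometer segment (x, y, z axes).
    """
    feats = {}
    for i, axis in enumerate("xyz"):
        v = window[:, i]
        feats[f"mean_{axis}"] = v.mean()
        feats[f"std_{axis}"] = v.std()
        q75, q25 = np.percentile(v, [75, 25])
        feats[f"iqr_{axis}"] = q75 - q25
        feats[f"mad_{axis}"] = np.mean(np.abs(v - v.mean()))
        feats[f"kurtosis_{axis}"] = stats.kurtosis(v)
    # Pairwise correlation between axes, also listed in Table 2.2
    for a, b in [(0, 1), (0, 2), (1, 2)]:
        name = f"corr_{'xyz'[a]}{'xyz'[b]}"
        feats[name] = np.corrcoef(window[:, a], window[:, b])[0, 1]
    return feats

window = np.random.default_rng(1).normal(size=(128, 3))
feats = time_domain_features(window)
print(len(feats))  # 18 features: 5 per axis plus 3 correlations
```

Frequency-domain features (FT, DCT) would be computed analogously on each window, e.g. with `np.fft.rfft`.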

Based on the extracted features, activities are recognized by classification algorithms. Table 2.3 groups the discussed works by the classifier employed. Further discussion on classification methods and concerns is promoted in the next section.

Table 2.3: Classifiers employed in the surveyed HAR works.

ALR: [Lara and Labrador, 2012]
CNN: [Ronao and Cho, 2016]
DBN: [Zhu, 2011]
Decision tree: [Bao and Intille, 2004], [Maurer et al., 2006], [Longstaff et al., 2010], [Shoaib et al., 2016]
DTW-based: [McGlynn and Madden, 2011]
Ensemble method: [Lara and Labrador, 2012], [Cardoso and Moreira, 2016]
FBF-based: [Kao et al., 2009], [Berchtold et al., 2010]
HMM: [Ali et al., 2008], [Nazábal et al., 2016]
kNN: [Siirtola and Röning, 2012], [Shoaib et al., 2016]
Naive Bayes: [Longstaff et al., 2010], [Shoaib et al., 2016]
QDA: [Siirtola and Röning, 2012]
SVM: [Huynh and Schiele, 2006], [Longstaff et al., 2010], [Stikic et al., 2011], [Varkey et al., 2012], [Chernbumroong et al., 2013]

2.2 Smartphone processed HAR

The field of activity recognition with smartphones has been growing. As smartphones became an indispensable tool of daily assistance, it is possible to notice an increasing interest in using all of their features. Typical smartphones incorporate multiple sensors - accelerometer, ambient temperature sensor, gravity sensor, gyroscope, light sensor, linear acceleration sensor, magnetometer, barometer, proximity sensor, humidity sensor - and have thus become convenient for recognition of both simple and complex activities [Su et al., 2014]. Even when their embedded sensors are not used, smartphones remain interesting for acquisition from external sensors and for processing continuously streaming data (online recognition). Figure 2.1 illustrates the flow of a typical recognition task with smartphone processing.

A lot of effort has been put into identifying ambulatory activities, like walking, running, biking and ascending/descending stairs, particularly with smartphones. The work of [Kwapisz et al., 2011] is an example that uses the accelerometer of a smartphone for this purpose, with a reported accuracy of up to 92%. Fewer works attempt to recognize further ADLs. In spite of that, some of


Figure 2.1: Steps for activity recognition with a smartphone.

these works which exploit smartphone sensors perform well, like [Yan et al., 2012] and [Rai et al., 2012], with reported overall accuracies of roughly 77% and 86%, respectively. Section 2.3 presents further ADL recognition works, framing them according to their eating recognition performance.

2.2.1 Challenges

There are some general challenges in the field of activity recognition, fairly common to most HAR works. In the survey presented in [Avci et al., 2010], the authors highlighted the following unresolved challenges, especially concerning inertial activity recognition systems:

• Human behavior: it is very common for people to perform multiple activities simultaneously and to evidence cultural/individual differences in the way they perform said activities, making it difficult for recognition systems to make a decision.

• Sensor inaccuracy: the reliability of sensor data is crucial to ensure good performance of the recognition process.


• Sensor placement: inefficient placement/orientation of sensors, or changes of sensor position during the movement of the subject, constitutes an additional problem.

• Resource constraints: includes power consumption, memory space and computational efficiency related concerns.

• Usability: the efficiency of the recognition system depends on the definition of a target group, so that an adequate sensor arrangement and distribution is used.

• Privacy: private data of the users must be properly addressed and encrypted for transmission and storage, which requires computational power.

Later on, [Incel et al., 2013] thoroughly reviewed specific challenges of activity recognition with mobile phones and identified difficulties related to continuous sensing, running classifiers in a smartphone environment, phone context and the burden of training.

Limited battery life constitutes a challenge for activity recognition applications that sample sensor data continuously. Continuous sensing should not influence user experience or limit smartphone utilization. Different techniques have been proposed to work around this issue. Some works only sample specific sensors in specific states [Wang et al., 2009], while some others adjust the sampling rate depending on how interesting an event seems, improving trade-offs between energy, accuracy and latency [Chu et al., 2011]. In [Lu et al., 2010], user activities are classified online using the accelerometer. Based on this information, GPS is either switched on and sampled (if the user is moving) or turned off (if they are not). This approach is very interesting for contextual data gathering (localization).
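The accelerometer-driven GPS duty-cycling idea can be sketched as a simple policy; the class and the activity labels below are hypothetical, not the cited system's API:

```python
class GpsDutyCycler:
    """Toggle GPS sampling based on accelerometer-inferred activity.

    A minimal sketch of the duty-cycling policy described above;
    a real system would also debounce rapid activity changes.
    """

    MOVING = {"walking", "running", "biking"}

    def __init__(self):
        self.gps_on = False

    def update(self, predicted_activity):
        # Sample GPS only while the user is moving; otherwise save power
        self.gps_on = predicted_activity in self.MOVING
        return self.gps_on

cycler = GpsDutyCycler()
states = [cycler.update(a) for a in ["sitting", "walking", "walking", "sitting"]]
print(states)  # [False, True, True, False]
```

The design choice is to let the cheap, always-on sensor (accelerometer) gate the expensive one (GPS), which is the common pattern across the cited energy-saving works.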

The processing and power limitations of smartphones may also condition their ability to run classifiers, i.e. a machine learning technique with remarkable performance in the literature may perform poorly on a mobile phone platform due to its limited resources. In fact, it is very common to find works in the literature that collect data and use it with offline training algorithms, usually achieving better recognition results than would be possible with online algorithms. As discussed before, there are applications in which offline processing may be enough, while some others require an online approach to obtain real-time information of user context. Therefore, classifiers that do not require extensive resources are preferable (e.g. decision trees, kNN).

The phone context problem is especially worrisome when the inertial sensors of the smartphone are used, since its position is very important. The device may be thrown into a bag, put in a pocket, carried in the hand or be located anywhere else. In fact, phones can even be left at home or in the car. These consequences can be attenuated using wearable sensors other than the ones on the phone. Even though phone context has been receiving a lot of attention, its identification in real time still remains an unresolved issue [Incel et al., 2013].

Running the training phase of the classifier on the smartphone is also still a challenge. The training phase is frequently conducted offline and static models are used for later online classification. For a good training model to be conceived, large training sets are often required, along with appropriate labeling, which is also a burden for the users. Therefore, there are still many problems


to overcome and research on online training systems continues. It is important that the application quickly reaches a ready-to-use state, while maintaining its user independence. Methods to achieve user-independent training were proposed in [Köse et al., 2012] and [Miluzzo et al., 2010]. However, a work that might deserve particular mention is that of [Lane et al., 2011], which proposes the "community similarity networks" (CSN) approach. Physical, lifestyle and sensor-data similarity networks are employed in CSN, which provides a unique classifier for each user according to their own characteristics. The results enable the conclusion that, with fewer labeled data, CSN performed considerably better than the remaining methods that address population diversity.

[Incel et al., 2013] also emphasize that persuading users is an additional challenge in applications that imply behavioral or lifestyle changes, as the application will only succeed if the user commits. Understanding which methods are most effective might be a job for multidisciplinary teams, with a psychology component, that address the target group of an application.

2.3 The eating and drinking activities

There are HAR systems that specifically focus on ADL recognition. While these works are usually confined to the study of ambulatory activities (the case of most works presented in Table 2.1), it is possible to find a few that recognize eating. Table 2.4 summarizes some of these works. Eating recognition performance can also be consulted, encouraging comparison not only between the complexity of approaches, but also concerning their eating spotting capabilities.

The works surveyed in Table 2.4 bring yet additional contributions. By aiming to recognize complex activities (such as the activities of interest), which not all HAR works focus on, these works faced particular challenges and drew interesting conclusions.

[Shoaib et al., 2016] studied the influence of increasingly larger windows (2 to 30 s) and concluded that a bigger window size is associated with better performance in recognizing complex activities. As it is not possible to increase the window size indefinitely, the authors recommend the implementation of a hierarchical approach (on top of the classification output) for activities that inherently include pauses among gestures or other unpredictable movements, such as eating. It is also the only work that simultaneously recognizes both eating and drinking ([Bao and Intille, 2004] also recognizes both activities, but does not distinguish one from the other).
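One simple way to realize such a hierarchical layer is to smooth per-window predictions with a majority vote over a longer horizon, so that short pauses inside a meal do not break the eating segment; this sketch is only one possible interpretation of that recommendation:

```python
from collections import Counter

def smooth_predictions(labels, horizon=5):
    """Majority-vote each window label over a sliding horizon.

    A hierarchical layer on top of the classifier output that bridges
    short pauses inside long activities such as eating.
    """
    smoothed = []
    for i in range(len(labels)):
        lo = max(0, i - horizon // 2)
        hi = min(len(labels), i + horizon // 2 + 1)
        smoothed.append(Counter(labels[lo:hi]).most_common(1)[0][0])
    return smoothed

raw = ["eat", "eat", "idle", "eat", "eat", "walk", "walk"]
smoothed = smooth_predictions(raw)
print(smoothed)  # the isolated "idle" window is reassigned to "eat"
```

The horizon plays the role of the "larger window" without forcing the underlying classifier to operate on longer segments.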

In fact, very few works seem to distinguish the drinking activity from other ADLs. Notwithstanding, [Shoaib et al., 2016] reports up to 91% F-measure in recognizing drinking. In [Varkey et al., 2012], the authors also distinguish eating and drinking from smoking, writing, typing and brushing the teeth, in an experiment to assess their algorithm's performance in distinguishing similar activities. Even though it was a small experiment, with only one test subject, an overall accuracy of roughly 98% was attained.

It is also interesting to acknowledge the work of [Chernbumroong et al., 2013] for focusing on the same target population as the present work, the elderly.


Table 2.4: Supervised HAR works that recognize the eating activity among other ADL.

[Bao and Intille, 2004]
Recognized activities: Walking, walking carrying items, sitting & relaxing, working on computer, standing still, eating or drinking, watching TV, reading, running, bicycling, stretching, strength-training, scrubbing, vacuuming, folding laundry, lying down & relaxing, brushing teeth, climbing stairs, riding elevator and riding escalator
Sensors: Accelerometers
Features: Mean, energy, frequency-domain entropy, and correlation
Classifiers: Decision tree C4.5
Eating recognition metrics: Accuracy: 89%

[Zhu, 2011]
Recognized activities: Sitting, sit-to-stand, stand-to-sit, standing, walking, typing on keyboard, using the mouse, flipping a page, cooking, eating
Sensors: Accelerometers and location tracker
Features: Mean and variance of the 3D acceleration
Classifiers: Dynamic Bayesian Network
Eating recognition metrics: Accuracy: 80%

[Chernbumroong et al., 2013]
Recognized activities: Brushing teeth, dressing/undressing, eating, sweeping, sleeping, ironing, walking, washing dishes, watching TV
Sensors: Accelerometer, thermometer and altimeter
Features: Mean, minimum, maximum, standard deviation, variance, range, root-mean-square, correlation, difference, main axis, spectral energy, spectral entropy, key coefficient
Classifiers: Support Vector Machines
Eating recognition metrics: Accuracy: 93%

[Shoaib et al., 2016]
Recognized activities: Standing, jogging, sitting, biking, writing, walking, walking upstairs, walking downstairs, drinking coffee, talking, smoking, eating
Sensors: Accelerometer, gyroscope and linear acceleration sensor
Features: Mean, standard deviation, minimum, maximum, semi-quantile, median, sum of the first ten FFT coefficients
Classifiers: Naive Bayes, k-Nearest Neighbors, Decision Tree
Eating recognition metrics: F-measure: up to 87%

Several works of assisted living with environmental sensing, e.g. smart homes, that recognize the eating activity have also been published. However, these were not considered, since they are out of the scope of this literature review.


So far, only works that distinguish eating and drinking from other ADLs were discussed. In the next chapter, however, it will be possible to discuss some other HAR works specifically focused on recognizing only these activities.

2.4 Summary and conclusions

This chapter introduced the world of HAR, summarizing the most popular recognition methods. Wearable sensing was particularly addressed and HAR works were framed according to their learning supervision and response time, while advantages and disadvantages of each technique were revealed and compared. An overview of the features and classifiers employed in the surveyed works was also presented.

Then, the activity recognition process with a smartphone was illustrated and its main challenges were described. Besides general difficulties, five challenges particularly related to activity recognition on a smartphone were detailed: continuous sensing, running classifiers and their training phase, the phone context problem and user persuasion.

Finally, four HAR works that addressed up to 20 ADLs were explored due to their ability to identify the eating activity with supervised learning methods. These works reported eating recognition accuracy ranging from 80% to 93%. A total of five and three works were found to distinguish eating and drinking, respectively, from other ADLs. However, some of them are not able to distinguish one from the other.


Chapter 3

Eating and drinking recognition

Among ADL, eating and drinking belong to the group of self-care tasks, along with bathing, bowel and bladder control, functional mobility, sexual activity and hygiene [Pendleton and Schultz-Krohn, 2013].

The importance of eating in overall human health is unquestionable and the scientific community has performed extensive research concerning eating behavior, patterns and disorders. Fluid intake is also essential for life, as it enables cellular homeostasis. In fact, 55% to 75% of human body weight is due to water (corresponding to the elderly and children, respectively) [Popkin et al., 2010] and its extreme absence can even lead to death within a couple of days.

In this chapter, an understanding of the role of eating and drinking in seniors' lives is firstly promoted. Then, the state-of-the-art of eating and drinking recognition is presented. Issues related to sensors, acquisition, features and classifiers are also discussed. Publicly available datasets of HAR, and particularly of eating and drinking recognition, are then presented. Ultimately, challenges associated with the recognition of these activities are described. A summary and conclusions section closes the chapter.

3.1 Eating and drinking for older adults

In order for technology to properly address eating-related issues, it is essential to segment the population, as eating habits vary greatly with age, gender, disabilities/conditions and even culture. Furthermore, it is possible to identify risk groups that require particular care to maintain healthy habits, namely proper eating habits.

These groups' necessities must be in the loop of technological advances, so that solutions that closely meet their inherent requirements are developed, which justifies the need to establish the target population first. Elderly people constitute one of society's most sensitive groups, with very particular needs that require continuous observation.

This section discusses some of these topics, raising awareness to the complexity of eating and drinking activities, especially for seniors.


3.1.1 Sociocultural and geographical differences

The eating activity is much more than a biological necessity or a physiological or biochemical process. It is also a social activity, highly affected by social relations. These relations often explain eating patterns, suggesting that the sociocultural context must be understood in order for eating behavior to be analyzed [Delormier et al., 2009].

From an anthropological perspective, eating is a social urge: food is shared with family and friends, and distributing food is an expression of one's altruism. One of the first connections that a child makes is also through food, in the breastfeeding process. Thanks to all this, food has become a symbol of love and security [Fox, 2003].

There are several aspects that evidence the complexity of eating. For example, while it is a necessity for all animals, human beings are the only species that cooks [Fox, 2003]. People also use different utensils to eat around the world. In Western culture, it is common to resort to cutlery to assist the food-to-mouth movement; the use of knife and fork is typical in a Western-style meal. Asian cultures, however, often rely on the use of chopsticks. It is also common, in some cultures, such as those of Ethiopia and India, to simply use hands or bread in place of utensils. Humans have also established periods of the day when people are supposed to eat, i.e. meals. The number and periodicity of meals vary around the world, but there is some consensus in considering at least three eating periods in a day: breakfast, lunch and dinner. Eating even varies with respect to religious beliefs, i.e. some religions do not allow the consumption of certain foods, and with many other dimensions of human sociological complexity.

Such diversity has made it difficult for researchers to understand meal patterns and their implications in nutrition and diet quality. In a thorough and recent review, [Leech et al., 2015] studied 48 works that confronted meal patterns with nutrient intake and diet quality, concluding that meal definitions and guidance are necessary in order to propel meal pattern characterization. The most consistent finding the authors verified was the negative association between skipping breakfast and diet quality.

Even when focusing on the elderly, it is important to keep these sociocultural differences in mind and maintain their individuality, respecting the complexity of the eating activity and its impact on one's life as much as possible.

3.1.2 Senior eating disorders and dehydration

Seniors are one of the risk groups to which more and more attention has been given lately. The aging process implies physical, biological and behavioral changes and increases the need for special care.

According to [Donini et al., 2003], the decline in food intake and the lack of motivation to eat are some of the most concerning problems associated with the eating habits of the elderly. Usually, society assumes that eating disorders, like anorexia nervosa and bulimia nervosa, occur almost exclusively at younger ages. [Lapid et al., 2010], however, in a review of 48 published cases, concluded that these problems do occur in the elderly. Anorexia nervosa is the most common eating


disorder and depression the most common comorbid psychiatric condition. The diagnosis of eating disorders in seniors may be difficult but, if correctly performed, behavioral and pharmacological interventions may yield good results [Lapid et al., 2010].

Dehydration can affect physical and cognitive performance, provoke headaches, potentiate delirium (and delirium presenting as dementia, in the elderly) and impair the gastrointestinal and kidney functions, as well as heart function and the hemodynamic response. Good hydration, on the other hand, has proven to have a positive impact on several chronic diseases [Popkin et al., 2010]. In fact, the effects of aging are also evident in the relation between thirst and fluid intake, due to defects in osmoreceptors and baroreceptors and some regulatory mechanisms. Therefore, according to [Popkin et al., 2010], it is recommended for older adults to ingest water regularly, even when not thirsty, and to be careful with salt intake after sweating. These principles help prevent hypotension, strokes and abnormal fatigue. The same authors also alert to the impaired thermoregulation of the elderly, due to dysfunctional renal fluid conservation mechanisms that increase the risk of hypohydration and dehydration.

While taking care of the aforementioned problems in this segment of society, it is important to maintain their independence and self-esteem, as these qualities are a crucial part of the physical and mental health of human beings. Thus, monitoring the eating activity of "free-living" elders, e.g. with recourse to mobile health technologies, is an interesting alternative for care providers to understand their eating habits while empowering self-care and awareness.

3.1.3 Food, water and medication

Medication ingestion is often related to eating periods and fluid intake. Physicians frequently recommend whether a certain pharmaceutical must or must not be taken before or after a meal. That information is also available in the package insert of each box of medication. However, its reading and interpretation is not straightforward, since current conditions/illnesses of the patients or other external factors may affect the common procedure. The Food and Drug Administration (FDA) advises that information on how to use the medication must be taken seriously, whether it concerns ingestion with or without food or water, times of the day to take the medication or any other indication [Food and Drug Administration et al., 2006].

Prescription medications are very frequent among the elderly and, since there is no generic guideline for every pharmaceutical, each individual or their care providers must be alert. However, it is common, especially with elderly people living alone, to forget or misuse medication. Medication-taking behavior is known to be affected by an elderly person's age, gender, marital and living status, health beliefs and education level [Huang, 1996]. Ensuring the education of care providers and the elderly themselves is of extreme importance to guarantee safety in taking medication. In this case, technology may also be an interesting assistance-providing option.

Many of the prescribed pharmaceuticals may also cause malabsorption of nutrients, gastrointestinal symptoms and loss of appetite [Donini et al., 2003]. This reinforces the need for the maintenance of appropriate dietary habits and plentiful fluid intake for seniors.


3.2 State-of-the-art

The importance of appropriate eating and drinking behaviors has driven the scientific community to look for technological solutions. To that end, eating and drinking recognition algorithms have been proposed by many authors, namely in very recent works, which reveals high current interest in this particular topic.

3.2.1 Algorithms and approaches

It appears that eating has received more attention from the scientific community than drinking. Moreover, apart from works that distinguish a large number of activities, such as those presented in the previous chapter, no works were found that exclusively distinguish eating from drinking, and both of these from other activities.

A description and discussion of the surveyed works is subsequently presented.

3.2.1.1 Eating recognition

Table 3.1 summarizes some eating recognition works. Most of these correspond to supervised learning approaches with online response, aiming especially at dietary monitoring rather than issuing reminders, which appears to be a novel intention.

The surveyed works confirm the assumption that inertial data would be an appropriate approach for recognizing the eating activity. Nevertheless, [Thomaz, 2016] has proposed several ways of detecting eating moments, taking advantage of different types of sensor data to do so. One of the approaches uses first-person point-of-view photographs and computer vision techniques allied with Convolutional Neural Network (CNN) classification. Even though the main focus of his work was eating detection, it recognized 19 ADLs from one individual over a period of 6 months. It was possible to achieve a total accuracy for all 19 activities, and the eating activity itself, of roughly 83%. Thomaz also conducted an experiment with audio signals, which resulted in eating activities being identified with 90% precision and 76% recall.

Several works focused on gesture detection to predict eating periods [Amft et al., 2005, Amft and Tröster, 2008, Junker et al., 2008, Ramos-Garcia et al., 2015, Thomaz et al., 2015], proposing methods for segmenting said gestures in the continuous data stream and classifying them. However, it is important to acknowledge the heavy and time-consuming labeling job, especially for large datasets, as the aforementioned works remarkably present. Moreover, the subjectivity of this process is very worrisome, as different people might not agree on the beginning and ending moments of an activity and/or gesture.

Gesture spotting approaches often resort to classifiers that capture sequential dependency, since gestures are frequently performed sequentially. This is why HMMs are one of the most employed classification methods, namely when dealing with continuous data streams from motion sensors. In [Amft et al., 2005], the authors discriminated eating and drinking gestures from other movements with an accuracy of 95% on isolated gesture segments and 87% with online streaming data. Later on, in [Amft and Tröster, 2008], a study of the recognition of swallowing, chewing and arm movement (e.g. hand-to-mouth) food intake events was performed in a multi-modal approach using 7 sensors (inertial, EMG and audio).

Table 3.1: Summary of eating recognition works.

[Amft et al., 2005]
Sensor type: Inertial; No. of sensors: 4
Features: Pitch angles, rate of turn, rotation, change of angle to the horizontal plane
Classifier: HMMs
Performance metrics: Accuracy: 87-95%

[Dong et al., 2014]
Sensor type: Inertial; No. of sensors: 1
Features: Manipulation, linear acceleration, amount of wrist roll motion, regularity of wrist roll motion
Classifier: Naive Bayes
Performance metrics: Accuracy: 81%

[Thomaz, 2016, Thomaz et al., 2015]
Sensor type: Inertial; No. of sensors: 1
Features: Mean, variance, skewness, kurtosis, RMS
Classifier: Random Forest
Performance metrics: F-scores: 71-76%

[Thomaz, 2016]
Sensor type: Image; No. of sensors: 1
Features: CNN soft-max probabilities, day of the week, time of the day, histogram of color
Classifier: CNN and RDF ensemble
Performance metrics: Accuracy: 83%

[Thomaz, 2016]
Sensor type: Audio; No. of sensors: 1
Features: Zero-crossing rate, loudness, energy, envelope shape statistics, LPC, LSF, spectral flatness, spectral flux, spectral rolloff, spectral shape statistics, spectral variation
Classifier: Random Forest
Performance metrics: F-score: 80%

[Merck et al., 2016]
Sensor type: Inertial, Audio; No. of sensors: 4
Features: Mean (axis, magnitude of gyroscope and derivative of acceleration vectors), covariance of acceleration components, coefficients of 4th order polynomial fits to each acceleration component, zero-crossing rate, standard deviation of the zero-crossing intervals, energy, spectral flux, MFCC coefficients
Classifier: Random Forest
Performance metrics: Precision: 88%; Recall: 87%

[Shen et al., 2016]
Sensor type: Inertial; No. of sensors: 1
Features: Mean, standard deviation, slope
Classifier: HMMs
Performance metrics: Accuracy:

Some works propose explicit segmentation as a preprocessing step to improve the spotting task [Lukowicz et al., 2004, Lee and Xu, 1996], instead of relying on the intrinsic segmentation capabilities of HMMs. Believing in the importance of that step, [Junker et al., 2008] proposed a two-stage spotting approach, first identifying potential candidate sections in the data stream and only then proceeding to classification with HMMs in order to remove false positives (non-relevant gestures). In this work, a case study is also presented that aimed to distinguish between eating with cutlery, drinking, eating with a spoon and eating with hands, with an overall recall and precision of 79% and 73%, respectively.

[Ramos-Garcia et al., 2015] also tracked wrist motion with inertial sensors. Following the work of [Dong et al., 2012, Dong et al., 2014], where eating detection was performed continuously throughout an entire day with simple yet effective algorithms, hierarchical HMMs that take advantage of the gesture-to-gesture sequential dependency were conceived. Their method improved the recognition of eating gestures, finally achieving an overall accuracy of 97%.
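The per-class HMM scoring idea underlying these gesture spotters can be illustrated with a hand-written forward algorithm over quantized motion symbols; the two toy models and all their parameters are invented for illustration, not taken from the cited works:

```python
import numpy as np

def log_forward(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm."""
    alpha = np.log(start) + np.log(emit[:, obs[0]])
    for o in obs[1:]:
        alpha = (np.logaddexp.reduce(alpha[:, None] + np.log(trans), axis=0)
                 + np.log(emit[:, o]))
    return np.logaddexp.reduce(alpha)

# Two toy 2-state HMMs over quantized wrist-motion symbols
# (0 = still, 1 = raise, 2 = tilt); one model per gesture class
start = np.array([0.9, 0.1])
trans = np.array([[0.7, 0.3], [0.3, 0.7]])
emit_drink = np.array([[0.1, 0.6, 0.3], [0.2, 0.2, 0.6]])  # raise/tilt heavy
emit_other = np.array([[0.8, 0.1, 0.1], [0.6, 0.2, 0.2]])  # mostly still

seq = [1, 1, 2, 2, 1]  # a raise-tilt gesture pattern
scores = {"drink": log_forward(seq, start, trans, emit_drink),
          "other": log_forward(seq, start, trans, emit_other)}
print(max(scores, key=scores.get))  # drink
```

A segment is assigned to the class whose model gives it the highest likelihood; hierarchical variants such as [Ramos-Garcia et al., 2015] additionally model transitions between gesture-level models.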

When developing a monitoring tool, the continuous recognition of the eating activity is the main goal. However, most of the aforementioned works process data acquired in controlled environments rather than free-living conditions. These approaches, while maintaining real-time response, lack free-living validation. In contrast, the works of [Thomaz et al., 2015] and [Dong et al., 2014] validate their algorithms with datasets acquired in free-living conditions.

[Thomaz et al., 2015] introduced an eating moment estimation approach based on detecting food intake gestures, but HMMs were not employed. It was possible to recognize eating moments with F-scores of 76%, for data acquired by 7 participants during 1 day, and 71% for 1 participant over 31 days. The best classification results for eating gestures were achieved with the Random Forest classifier, and eating moments were estimated using the DBSCAN clustering algorithm. The same algorithm was used in [Thomaz, 2016] with inertial data acquired from Google Glass, and an F-score of 72% was obtained. In a final experiment, the author used inertial data from both wrists. The results of this experiment were worse than the previous ones, despite using the same algorithm.
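The clustering step can be sketched with scikit-learn's DBSCAN applied to the timestamps of detected eating gestures: dense bursts of gestures become eating moments, while isolated detections are discarded as noise. The timestamps, `eps` and `min_samples` below are illustrative assumptions, not the parameters used by [Thomaz et al., 2015]:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical timestamps (seconds) at which individual eating
# gestures were detected by a frame-level classifier.
gesture_times = np.array(
    [100, 104, 109, 115, 118,   # a dense burst -> one meal
     3605, 3610, 3612, 3620,    # another burst -> second meal
     9000]                      # an isolated false positive
).reshape(-1, 1)

# Gestures closer than 60 s are linked; a cluster needs >= 3 gestures.
labels = DBSCAN(eps=60, min_samples=3).fit_predict(gesture_times)

# Each non-negative label is one estimated eating moment; -1 is noise.
moments = {}
for t, lab in zip(gesture_times.ravel(), labels):
    if lab >= 0:
        moments.setdefault(lab, []).append(int(t))

print(moments)       # two clusters of gesture timestamps
print(list(labels))  # the isolated gesture is labelled -1
```

The appeal of DBSCAN here is that the number of meals per day does not have to be fixed in advance, unlike with k-means.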

[Dong et al., 2014] highlights that, in the course of a typical day, the ratio between eating and performing other activities is 1:20. Therefore, the authors present the accuracy of their method by means of a different calculation, made explicit in Equation 3.1, where TP, TN, FP and FN correspond to true positives, true negatives, false positives and false negatives, respectively. In the same work, the group verifies that an eating moment is preceded and succeeded by a peak in the energy of the signal, since eating corresponds to a period of essentially small movements, which present themselves as a "valley" in the energy signal.

accuracy = (20 × TP + TN) / (20 × (TP + FN) + (TN + FP))     (3.1)
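Equation 3.1 translates directly into a small helper function; the confusion-matrix counts in the example are invented for illustration:

```python
def weighted_accuracy(tp, tn, fp, fn, ratio=20):
    """Accuracy weighted by the eating/non-eating class imbalance,
    following Equation 3.1: true positives count `ratio` times."""
    return (ratio * tp + tn) / ((tp + fn) * ratio + (tn + fp))

# With a 1:20 class ratio, plain accuracy rewards classifiers that
# ignore the rare eating class; the weighted form does not.
print(weighted_accuracy(tp=90, tn=1800, fp=200, fn=10))   # 0.9
print(weighted_accuracy(tp=0, tn=2000, fp=0, fn=100))     # 0.5
```

Note that a classifier that always predicts "not eating" scores only 0.5 under this metric, whereas plain accuracy would reward it with roughly 95% on the same data.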


In addition, [Shen et al., 2016] studied the contribution of three types of context data (gender, age and utensils) to overall eating recognition, verifying improvements of 0.9%, 4.3% and 6.2%, respectively, compared to a baseline HMM classifier under the same conditions. As future work, the group intends to combine gesture-to-gesture sequencing with context data, aiming to achieve the best of both worlds.

3.2.1.2 Drinking recognition

Drinking recognition is also an interesting task. In fact, in [Amft et al., 2010], the authors go even further and aim to determine fluid intake. They achieved 84% recall and 94% precision in recognizing drinking motion from a continuous data stream, with only one inertial wrist sensor. Then, container types and fluid level were estimated. Even though all experiments took place in a controlled environment, promising recognition rates of 75% and 72% were achieved for container type and fluid level, respectively, with a C4.5 Decision Tree classifier.

Amft had already tried to distinguish drinking from eating movements in [Amft et al., 2005]. However, this work focused solely on distinguishing gestures, i.e. it did not aim to recognize drinking (and eating) among other activities performed in the course of a day.

3.2.2 Sensors and acquisition

From the surveyed approaches, it is possible to understand that different types of sensors have been employed for eating detection. Moreover, these sensors have been placed in different configurations. However, it is also interesting to notice that good results have been obtained resorting to only one sensor.

This sensor has to be strategically located. It is therefore common to place it on the wrist, since, for an independent individual, hand/wrist movement is necessary during an eating activity. A watch-like sensor device also reduces obtrusiveness, as people are used to the feeling of watches and bracelets around their wrists and these do not disturb their daily activities.

Most works acquire data from inertial sensors, namely accelerometers and gyroscopes, to quantify movement. Gyroscopes, however, are usually associated with higher power consumption. This constitutes one of the reasons why accelerometers are more common for eating detection purposes, as accelerometers alone are able to deliver good results, positively balancing the performance-consumption trade-off.

3.2.2.1 Sampling rate

Sampling rate is also an important subject when planning acquisition. In the eating activity recognition works with inertial data previously surveyed, sampling rates ranged from 15Hz [Ramos-Garcia et al., 2015, Dong et al., 2014, Shen et al., 2016, Merck et al., 2016] to 100Hz [Amft and Tröster, 2008, Junker et al., 2008].

The trade-off between data acquisition and battery life of the involved devices is, however, crucial. In [Dong et al., 2014], the authors addressed this trade-off. They started by recording data


at 60Hz; in this case, the battery lasted approximately 8.5h. Later on, they came to the conclusion that a 15Hz sampling rate would suffice, extending battery life to 12h, i.e. 3.5h longer.
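The effect of such a reduction can be sketched by naive decimation of a synthetic 60Hz signal to 15Hz; the signal and frequencies below are illustrative, and for real data an anti-aliasing low-pass filter (e.g. as applied by scipy.signal.decimate) should precede this step:

```python
import numpy as np

# One minute of a hypothetical 60 Hz accelerometer axis containing a
# slow (1.5 Hz) wrist-motion component.
fs_in, fs_out = 60, 15
t = np.arange(60 * fs_in) / fs_in
signal = np.sin(2 * np.pi * 1.5 * t)

# Naive decimation: keep every 4th sample (60 / 15 = 4). A quarter of
# the samples means a quarter of the sensor reads, storage and I/O.
decimated = signal[:: fs_in // fs_out]

print(len(signal), len(decimated))  # 3600 900
```

A 1.5Hz component is still well below the 7.5Hz Nyquist limit of the 15Hz stream, which is consistent with the observation that slow eating gestures survive this reduction.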

3.2.2.2 Context data

The work of [Shen et al., 2016] also raises awareness of the role of context data in eating recognition algorithms. While gender was proven not to have a major influence on the classification results, information about age and the utensils used in the eating activity improved the final results by up to 6.2%. In this work, the impact of the context data was studied individually. Combining this information is an interesting next step, with the potential to achieve increasingly better results.

It may also be useful to include GPS data in the classification model, inspired by much-cited works like [Ashbrook and Starner, 2003, Zheng et al., 2009]. This has also been studied and implemented in some HAR works, with good activity recognition results, as is the case of the COSAR system [Riboni and Bettini, 2011]. Even though GPS data is usually used in ambulatory activity recognition, its potential to assist eating detection is yet to be demonstrated, since location information may be useful (e.g. if the user is in a restaurant, it is very likely that he will be having a meal; if the user is in the middle of a street, it is not so likely). Among the setbacks of GPS data acquisition are battery life concerns, associated with its high power consumption. Performing acquisition only when the inertial sensors detect ambiguous motion might be a strategy to overcome this issue [Lu et al., 2010].

3.2.3 Features and classifiers

As discussed in subsection 2.2.1, an activity recognition system with smartphone processing has its challenges. Algorithms with high computational demand are not adequate in these cases, which might make some good approaches unfeasible. Therefore, it is particularly important to develop models that perform well with features and classifiers of modest computational demand.

In fact, as it is possible to understand from Table 3.1, there are already some approaches of moderate complexity. Simple time domain statistical features are frequent, e.g. mean and standard deviation. Some authors also propose the identification of potential candidate sections of the data stream [Junker et al., 2008] and feature selection techniques, reinforcing the high importance of feature reduction in the overall classification process.
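A minimal sketch of such low-cost time-domain features, computed over non-overlapping windows of a synthetic triaxial accelerometer stream, is shown below; the window length and sampling rate are illustrative assumptions:

```python
import numpy as np

def window_features(acc, fs=15, win_s=2.0):
    """Mean and standard deviation per axis over non-overlapping
    windows: the kind of cheap time-domain features suited to
    on-phone processing. Window length is illustrative."""
    n = int(fs * win_s)
    feats = []
    for start in range(0, len(acc) - n + 1, n):
        w = acc[start:start + n]
        feats.append(np.concatenate([w.mean(axis=0), w.std(axis=0)]))
    return np.array(feats)

# 10 s of fake triaxial accelerometer data at 15 Hz.
rng = np.random.default_rng(0)
acc = rng.normal(size=(150, 3))
X = window_features(acc)
print(X.shape)  # (5, 6): 5 windows x (3 means + 3 std devs)
```

Each row of `X` would then be fed to the classifier, so the per-window cost is a handful of additions and multiplications, well within a smartphone's budget.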

However, there are still many frequently used classifiers that require caution when it comes to computational expense. This is the case of HMMs, for example, a classifier extensively used due to its ability to take sequential dependency into account, which is very useful for activity recognition purposes. [Lee and Cho, 2011] proposed an online activity recognition system implemented in a smartphone with the Android platform. The authors used hierarchical HMMs, modelling actions and activities hierarchically in order to overcome memory storage and computational power limitations, proving that HMM implementation in a smartphone environment is a feasible option.
