
Universidade de Aveiro
Departamento de Electrónica, Telecomunicações e Informática
2016

Doctoral Programme in Informatics (MAP-i) of the Universities of Aveiro, Minho and Porto

Carlos Eduardo da Silva Pereira

Avaliação Dinâmica para Cenários Reactivos

Dynamic Evaluation for Reactive Scenarios

Thesis submitted to the Universities of Aveiro, Minho and Porto in fulfilment of the requirements for the degree of Doctor in Informatics under the MAP-i Doctoral Programme, carried out under the scientific supervision of Doctor António Joaquim da Silva Teixeira, Associate Professor at the Departamento de Electrónica, Telecomunicações e Informática of the Universidade de Aveiro, and of Doctor Miguel Augusto Mendes Oliveira e Silva, Assistant Professor at the Departamento de Electrónica, Telecomunicações e Informática of the Universidade de Aveiro.


o júri / the jury

presidente / president Vitor Brás de Sequeira Amaral

Full Professor, Universidade de Aveiro

vogais / examiners committee João Manuel Pereira Barroso

Associate Professor with Habilitation, Universidade de Trás-os-Montes e Alto Douro

Luis Manuel Dias Coelho Soares Barbosa

Associate Professor, Escola de Engenharia, Universidade do Minho

José Miguel de Oliveira Monteiro Sales Dias

Invited Associate Professor, Instituto Superior de Ciências do Trabalho e da Empresa, Instituto Universitário de Lisboa; Director of the Microsoft Language Development Center, Microsoft Portugal

António Joaquim da Silva Teixeira (Supervisor)

Associate Professor, Universidade de Aveiro

Joaquim Manuel Henriques de Sousa Pinto

Assistant Professor, Universidade de Aveiro

Carlos Jorge da Conceição Teixeira

Assistant Professor, Universidade de Trás-os-Montes e Alto Douro

Hugo Alexandre Paredes Guedes da Silva

Assistant Professor, Escola de Ciências e Tecnologia, Universidade de Trás-os-Montes e Alto Douro


acknowledgments In carrying out this work, many were those who encouraged and helped me, and without whom its completion would most likely have been nothing but a mirage.

First of all, I must thank my parents, an inexhaustible source of support, for having done everything within their reach to provide me with opportunities they themselves never had access to, and for standing by me through the good and bad moments that this whole academic period comprised.

I thank Professor António Teixeira and Professor Miguel Oliveira e Silva for the opportunity, patience and trust throughout these years. Without them, this work would not have been possible. The relaxed manner in which it was possible to discuss ideas and concepts is something I treasure and will recall with appreciation. Better supervisors would, in my opinion, have been impossible.

I thank Artur, for the support and for having opened the academic path for my generation. From this period, I will always keep the heated discussions from the university to the station and the beach tennis sessions at the end of the afternoon.

To Nuno Luz, a thank you not only for the collaboration carried out in the context of this work, but also for a friendship that I am sure will endure for many long years.

A word of thanks to IEETA and its collaborators, especially Ana Isabel, Nuno Almeida and Samuel Silva, for the support in the activities arising from this work and for their permanent good mood.

Finally, a thank you to the long list of people who helped, encouraged and supported me during this long period and whom it would be impossible to name individually. Many thanks to all.


Palavras-Chave Avaliação, Cenários Reactivos, Dinamismo, Adaptação ao contexto, AAL, Arquitectura Orientada a Serviços, Eventos

Resumo A natureza dinâmica de cenários como Ambient Assisted Living e ambientes pervasivos e ubíquos cria contextos de avaliação exigentes que não são completamente considerados pelos métodos existentes. Esta tese defende que são possíveis avaliações que tenham em consideração a natureza dinâmica e heterogénea de ambientes reactivos, integrando aspectos como percepção e dependência de contexto, adaptabilidade ao utilizador, gestão de eventos complexos e diversidade de ambientes.

O principal objectivo deste trabalho foi desenvolver uma solução que forneça aos avaliadores a possibilidade de definir e aplicar avaliações a utilizadores, suportadas por um modelo de avaliação flexível, permitindo a criação e reutilização de instrumentos e especificações de avaliação sem modificar a infraestrutura geral.

Para atingir este objectivo foi seguida uma abordagem de engenharia envolvendo: a) definição de requisitos; b) conceptualização de uma solução geral contendo um paradigma, uma metodologia, um modelo e uma arquitectura; c) implementação dos componentes nucleares; d) desenvolvimento e teste de provas de conceito.

Como resultado principal obteve-se uma solução de avaliação dinâmica para ambientes reactivos integrando três partes essenciais: um paradigma, uma metodologia e uma arquitectura de suporte. No seu conjunto, esta solução permite a criação de sistemas de avaliação escaláveis, flexíveis e modulares para a concepção e aplicação de avaliações em ambientes reactivos.


Keywords Evaluation, Reactive Scenarios, Dynamic, Context-awareness, AAL, Service Oriented Architecture, Event-awareness

Abstract The dynamic nature of scenarios such as Ambient Assisted Living and ubiquitous and pervasive environments turns them into challenging evaluation contexts that are not properly addressed by existing methods. We argue that it is possible to have evaluations that take into consideration the dynamic and heterogeneous nature of reactive environments by integrating aspects such as context-awareness, user adaptability, complex event handling, and environment diversity.

In this context, the main objective of this work was to develop a solution providing evaluators with the ability to define and apply evaluation tests to end-users, supported by a flexible evaluation model allowing them to create or reuse evaluation instruments and specifications without changing the infrastructure or incurring other logistical necessities. To pursue this goal, we adopted an engineering approach encompassing: a) requirements definition; b) conceptualization of a general solution comprising a paradigm, a methodology, a model, and an architecture; c) implementation of its core components; and d) development and deployment of a proof of concept.

The result was a dynamic evaluation solution for reactive environments based on three major parts: a paradigm, a methodology and its model, and a support architecture. Altogether, they enable the creation of scalable, flexible and modular evaluation systems for evaluation design and application in reactive environments.

Overall, we consider that the proposed approach, due to its flexibility and scope, widely surpasses the goals set at the onset of this work. With its broad range of features, it establishes itself as a general-purpose evaluation solution, potentially applicable to a wider range of scenarios, and fostering the creation of ubiquitous and continuous evaluation systems.


Contents

List of Figures
1 Introduction
  1.1 Motivation
  1.2 Thesis Statement
  1.3 Objectives
  1.4 Main Contributions
  1.5 Structure
  1.6 Published Results
2 Background/Related Work
  2.1 Common Evaluation Methodologies
    2.1.1 Test Methods
    2.1.2 Enquiry Methods
  2.2 Experience Sampling Methodology
  2.3 Support Technologies
    2.3.1 Service Oriented Architecture
    2.3.2 Context Modeling
    2.3.3 User Modeling
    2.3.4 Ontologies and the Semantic Web
  2.4 Summary
3 A New Evaluation Paradigm
  3.1 Requirements for a dynamic evaluation paradigm
    3.1.1 Introducing context into an evaluation
    3.1.2 Supporting environment heterogeneity
    3.1.3 Providing adaptability and redundancy in interaction
    3.1.4 Allowing reusability by introducing semantic value
    3.1.5 Simplifying the distribution and execution of evaluations
  3.2 A proposal for Dynamic Evaluation
    3.2.2 Considering the environment
    3.2.3 Enhancing evaluation definitions
    3.2.4 Automating the execution
  3.3 The Conceptual Architecture
    3.3.1 Evaluation Domains
    3.3.2 Evaluation Nodes
    3.3.3 Support Infrastructure
  3.4 Summary
4 A Methodology and a Model for Evaluation Definition
  4.1 Requirements
  4.2 A Methodology Proposal for Dynamic Evaluation
  4.3 Generic Domain Language
    4.3.1 Enquiries
    4.3.2 Events
    4.3.3 Event Processing Rules
    4.3.4 Evaluation Flow Control
  4.4 Domain Language
    4.4.1 Creating a domain - Schema extension
    4.4.2 Creating a domain - Association Process
  4.5 Evaluation Specification
    4.5.1 Creating an evaluation - Evaluation Ontologies
    4.5.2 Creating an evaluation - Creating instances of the Enquiry and EPR Specifications
    4.5.3 Creating an evaluation - Defining the Evaluation Assessments
  4.6 Execution Specification
    4.6.1 Applying an evaluation - Instantiating an evaluation
    4.6.2 Applying an evaluation - Creating execution specifications
  4.7 Summary
5 Dynamic Evaluation Architecture
  5.1 Requirements
  5.2 Architectural Proposal
  5.3 Virtual Domain
    5.3.1 Domain Manager
    5.3.2 Evaluation Module
    5.3.3 Data Persistence Unit
    5.3.4 Domain Interfaces and Producers
    5.3.5 Extending a Virtual Domain
  5.4 Node
    5.4.1 Node Manager
    5.4.2 Evaluation Engine
    5.4.4 EPR Engine and Event Logger+Dispatcher
    5.4.5 User and Context Models
  5.5 Support Unit
    5.5.1 Node Registry
    5.5.2 Domain Registry
    5.5.3 Attribute Service
    5.5.4 Interface and Event Producer Repository
    5.5.5 Association Service
    5.5.6 Evaluation Mediation Service
  5.6 Evaluation Hub
  5.7 Summary
6 Proof of Concept
  6.1 A First Instantiation of the Architecture: Dynamic Evaluation as a Service Platform
    6.1.1 Node Framework
    6.1.2 Virtual Domain Framework
    6.1.3 Support Unit and the Evaluation Hub
  6.2 Creating an Evaluation for a concrete scenario: The TeleRehabilitation Evaluation Test
    6.2.1 The evaluation scenario
    6.2.2 Defining the domain language
    6.2.3 Implementing the Virtual Domain
    6.2.4 Creating the TeleRehabilitation evaluation test using the Virtual Domain
  6.3 Applying the TeleRehabilitation Evaluation Test
    6.3.1 Starting and applying the test
    6.3.2 Performing a second iteration of the evaluation
    6.3.3 Evaluation results
  6.4 Summary
7 Conclusions
  7.1 Developed Work
  7.2 Main Results
  7.3 Future Work
  7.4 Epilogue
Acronyms
Bibliography
Annexes
A General Evaluation Language
  A.1 Enquiry Ontology
  A.2 Event Ontology
  A.3 Event Processing Rules Ontology
  A.4 Evaluation Ontology
  A.5 Evaluation Assessment Ontology
  A.6 Evaluation Control Flow Ontology


List of Figures

2.1 Screenshots of a multiple choice question screen and a tutorial screen for an audio note sample using CAES
2.2 CAESSA Visual Editor Screenshot
2.3 MyExperience ESM Tool Screenshot
2.4 MovisensXS Tool Sampling Screenshot
2.5 Maestro Tool Sampling Screenshot
2.6 CSCP Profile Example
2.7 COBRA-Ont List of Classes and properties [Chen et al., 2003]
3.1 User's Circle of Information
3.2 Evaluation Assessment Example
3.3 Evaluation Flow
3.4 User Characterization
3.5 User Selection
3.6 Domain User Selection Phase
3.7 Node Conceptual Listing
3.8 Conceptual Architecture
4.1 Dynamic Evaluation Lifecycle
4.2 Instantiating an Evaluation within a Domain
4.3 Applying the methodology to the conceptual architecture
4.4 Enquiry Conceptual Specification
4.5 Event Specification
4.6 EPR Conceptual Specification
4.7 Task Specification
4.8 Evaluation Flow Control Ontology
4.9 Domain Language Creation Process
4.10 Domain Language Creation - Phase One
4.11 Enquiry Specification: Question extension example
4.12 Enquiry Specification: Answer extension example
4.13 Event Specification: Extension example
4.14 Domain Language Creation - Phase Two
4.16 Domain Language Creation: Interface Association
4.17 Domain Language Creation: EPR Ontology
4.18 Evaluation Creation Process Flow
4.19 Evaluation Specification
4.20 Evaluation Assessment Specification
4.21 Enquiry Instantiation Example
4.22 EPR Instantiation Example
4.23 Evaluation Assessment Example
4.24 Execution Specification Transformation Process
4.25 Evaluation Instance Example
4.26 Execution Specification - Step One
4.27 Execution Specification - Step Two
4.28 Execution Specification - Step Three
5.1 Dynamic Evaluation Architecture Overview
5.2 Virtual Domain Component Overview
5.3 Virtual Domain Internal Overview
5.4 Task Interface Processing Example
5.5 Event Interface Example
5.6 Event Interface Example
5.7 Evaluation Module Input
5.8 Evaluation Module Output Example
5.9 Interface Manager Overview
5.10 EPR Engine Integration Overview
5.11 EPR Engine Execution Demonstration
5.12 EPR Engine Result Example
5.13 Support Unit Overview
5.14 Criteria Handling in the Node
5.15 Sequence Diagram regarding the initialization of an evaluation
5.16 Evaluation Hub UI Overview
6.1 DynEaaS Platform: EPR Creation User Interface
6.2 DynEaaS Platform: Evaluation Assessment Results in a Timeline
6.3 DynEaaS Platform: SPARQL UI
6.4 DynEaaS Platform: Node Registration UI
6.5 The TeleRehabilitation Application User Interfaces
6.6 TeleRehabilitation Domain - Creating the Enquiry Extended Specification
6.7 TeleRehabilitation Domain - Associating the Enquiry Extended Specification with control flow elements
6.8 TeleRehabilitation Domain - Enquiry final domain language
6.9 TeleRehabilitation Domain - Event final domain language
6.10 TeleRehabilitation Domain - Enquiry Creation UI
6.12 Defining an Evaluation Assessment using DynEaaS
6.13 Distribution of the software components for the execution of the TeleRehabilitation evaluation test
6.14 Conducting the evaluation test during a TeleRehabilitation session between a user and a therapist
6.15 Example of the DynEaaS UI regarding a question in one of the evaluation test's assessments


Chapter 1

Introduction

1.1 Motivation

Technology can be found in everyday objects like phones, cars and clothes, or in our own homes. A big challenge, however, remains in how to create technology that can integrate, or even cooperate, with already existing technology. Technology should be as transparent as possible to the user. Systems and services should not be required to compete for the user and, if they are, the user should not be aware of it. In order to guarantee their success, it is necessary to test these systems heavily in real-life situations with their users and ensure that they operate according to expectations. It is necessary to involve users in development, particularly in tests and evaluations. Otherwise, due to the competitiveness of the market, these systems and applications risk not being accepted, not being used, and soon being forgotten by the users.

Evaluation processes are fundamental to assess the merits of a given research effort or product. They are applied in many forms, from written enquiries [Bevan and Bruval, 2003] to observation practice [Hanington and Martin, 2012], and provide evaluators with data from which it is possible to extract results and conclusions and thus determine the success of the research or its utter failure. Applying an evaluation is a hard process. Independently of the research area, evaluations require planning of both content and application. From creating enquiries, preparing locations and instructing the users to analyzing the results, evaluations cost time. It is often harder for evaluators to handle logistical aspects, such as the location, than to design the actual evaluation test. These necessities vary from area to area: some areas have hard restrictions with which evaluators must comply, while others have methodologies that do not fit their necessities.

Due to our work in projects such as the Living Usability Lab [LUL Consortium, 2012] and AAL4ALL [AAL4ALL Consortium, 2015], we came upon examples of both scenarios. In one test, we had the objective of applying an evaluation to a set of users who shared a common disability. The objective was to conduct a two-month test consisting of a set of pre-validated enquiries applied to the user through a browser [Martins et al., 2014]. The browser would apply the test exactly as its paper version and collect data that could only be analyzed when the full test ended. In another test, we intended to analyze the usability of an Ambient Assisted Living (AAL) application. There, the objective was to test the application in real-life conditions and analyze the users' reactions and feedback regarding the application. We intended to check which options the user preferred and why, to analyze results as the test was performed and conduct new tests on the fly if necessary, and to verify whether the environment itself compromised the application and, if it did, how and why.

In these cases, we found very different realities. In the first, a typical methodology based on enquiries proved sufficient. It applied the enquiries through a browser to the user, who would select answers from a predefined set, thus submitting them to a database. At the end of the trial the database was closed and the results became accessible to the evaluators. In the second, we found a gap, as common methodologies were not capable of addressing the entire set of constraints and requirements that environments such as AAL or pervasive environments comprehend. Applying a set of enquiries would disregard the reactive nature of the environment itself and leave questions open. Conducting an observational test can inhibit the user from performing his normal routines. Creating a focus group after the test could result in cases where the users are either too nice or do not remember what occurred. In addition, none of these methodologies provided an answer as to how to assess the effect the environment had on the user's performance, how to detect extraordinary events and the data related to them, how to apply consecutive tests without disrupting the user's routines, or how to suit the test to the user's characteristics, preferences or interests. In all, it was clear that common methodologies were not suited to reactive environments such as AAL.

In an AAL environment, users are constantly interacting. Whether directed at an application or at a sensor, users interact actively or passively. In some situations, these interactions may interfere with one another, as they may clash and cause drastic changes in usability. A sudden change caused by an application can directly influence the conditions of the environment, thus making it highly reactive. Applications that operate simultaneously or use the same medium (such as sound) may see their conditions deteriorate, which can cause sharp drops in user experience. To really test an application in such a scenario, it is necessary to envision these situations and, if possible, extract data from them in order to verify their importance. In such environments, another important factor is context. For instance, an obtained result can be directly dependent on previous actions. Having knowledge of these actions can lead to a better comprehension of the user's answers. Disregarding them can lead evaluators to compare situations where the users were not in similar conditions, thus leading to incorrect conclusions.

In all, the differences between reactive environments, such as AAL, and other areas hinted that these environments have different necessities, requirements and objectives. In some cases, the evaluator may want to pose a question directly to the user, while in others he might find it better to analyze the user's behavior. In some cases, the test must be performed exclusively at the user's home, but in others it may be necessary for the test to be available at all times. A methodology for reactive environments should contemplate alternative solutions from which evaluators may choose. They must be allowed to simply build and apply their evaluations according to their realities and without constraints from the methodology, something that common solutions are not suited for.

In the literature, a methodology called Experience Sampling Methodology (ESM) [Larson and Csikszentmihalyi, 1983] with context-awareness has been used by some researchers as a way of evaluating the effect that other sources of information may have on the user during an evaluation. This methodology mainly consists of allowing evaluators to link events with specific enquiries, which are triggered only when the event occurs, and of obtaining information on site. Evaluators are able to detect events that a device produces and apply the enquiries via the device. While these solutions point in the right direction, most are directed at mobile application testing [Consolvo and Walker, 2003] and only contemplate a single device (like a specific phone) and not the whole environment [Froehlich, 2009, Intille et al., 2003]. Others are standalone solutions and are not prepared for multiple-user evaluation or mass-scale instantiations [Barrett and Barrett, 2001]. They are restrictive in the types of answers that can be posed to the user and, by maintaining a fixed methodological process for applying the test, do not offer the flexibility to change the procedure. Finally, they disregard aspects such as multimodality and are often too complex for evaluators, as they require programming knowledge in order to set up an evaluation.

Driven by the necessities of AAL and the limitations of ESM solutions, it was clear that an evaluation solution fitting the dynamic nature of reactive environments was necessary. The solution should focus on allowing evaluators to create, apply and analyze evaluations in scenarios where the environment can directly influence the outcomes of a test. It should allow evaluators to gather data that can assist them in interpreting unexpected situations and the effect they have on the user's objectives, and in establishing more accurate conclusions. It also requires a strong focus on dynamicity. Since environments differ from one another, evaluations may not perform identically. The reactive nature of an environment can lead one evaluation to go in a completely different way from another due to a sudden change in context. Rather than considering that user an outlier, extracting data from that particular user may yield valuable information. Following these thoughts, a dynamic environment requires a dynamic evaluation methodology in which it is possible to evaluate the user under certain conditions and see the user as an individual rather than a simple element of a set.

1.2 Thesis Statement

Evaluation in reactive and dynamic environments requires an alternative approach that goes beyond contextual data. Aspects such as user adaptability, complex event handling or environment diversity must be considered and integrated into an evaluation solution that answers the challenges and difficulties of performing evaluations in reactive environments. We consider that it is possible to develop a solution that provides evaluators with the ability to create and apply evaluation scenarios supported by a flexible evaluation model that allows them to create or reuse evaluation instruments and specifications without changing the infrastructure or incurring other logistical necessities.


1.3 Objectives

The main objective of this thesis is to provide evaluators with the necessary tools to create and apply evaluations within reactive environments. This objective focuses on gathering additional data that allows evaluators to analyze the implications of the environment on the evaluation subject and extract better conclusions for the evaluation.

To realize this objective, this work focuses on the development of a solution that provides evaluators with the ability to define new evaluation practices or methodologies without requiring a strong logistical effort in preparing the subjects for these changes. Simultaneously, it intends to prove that it is possible to create a scalable evaluation solution that is not based on a single methodology and that provides evaluators with the freedom to change specifications, introduce new evaluation elements, use contextual data and apply remote evaluations without implementing new systems for each new iteration.

Altogether, our approach is divided into three parts: an evaluation paradigm, an evaluation methodology and a supporting architecture. These three elements are combined into an evaluation solution that offers a set of tools to evaluators with the purpose of facilitating the creation and application of evaluations. In addition to its focus on context, the solution tackles aspects that often plague evaluators, like user recruitment, user adaptation to context, deployment and result gathering, and at the same time promotes concepts like evaluation reusability, flexibility and multi-environment support. Together, they provide evaluators with the ability to design evaluations in whatever manner they require and without the limitations of a common methodology.

1.4 Main Contributions

The main contributions of this thesis comprise an evaluation solution that features:

• A new paradigm based on the concept of dynamic evaluation for applying evaluations in reactive environments. The paradigm introduces the notion of domains and nodes to represent evaluation networks comprised of evaluation instruments such as enquiries or events, while integrating concepts such as context-awareness, user adaptability, and remote initialization and control of evaluation scenarios.

• An evaluation methodology and a model featuring an incremental approach to evaluation specification. The methodology provides evaluators with the ability to design their evaluations from the bottom up and to specify both the content of the evaluation and the procedure that delivers the test to the user. The model features an ontological approach to promote reusability and allow different evaluations (and their methodologies) to be applied at a distributed level.

• A support architecture inspired by Service Oriented Architecture (SOA) principles that facilitates recruitment, enables the creation of evaluation networks focused on user characteristics and preferences, and allows the remote creation, deployment and application of evaluation tests based on the evaluation model.


• A proof of concept that implements the evaluation solution and demonstrates its applicability.

1.5 Structure

This thesis is structured as follows. In Chapter 2, we present related work regarding common evaluation methodologies, with a special focus on ESM solutions, as well as supporting technologies that influenced our solution. In Chapter 3, we present the main issues and ideas associated with the necessity of a dynamic evaluation paradigm, as well as a conceptual architecture for this purpose. In Chapter 4, we present an evaluation methodology and a model to allow the specification of dynamic evaluation tests. In Chapter 5, we propose a service architecture that implements our evaluation methodology and allows the creation, design and application of evaluations on a large scale. Chapter 6 demonstrates the feasibility of the solution by featuring a proof of concept that implements it. Chapter 7 enumerates the main conclusions of the thesis and possible future work topics. Additional aspects considered important are included in the Annexes.

1.6 Published Results

Throughout the development of this thesis, and directly associated with its subject, we highlight the following already published contributions:

• The dynamic evaluation methodology and its model ([Pereira et al., 2014] and [Luz et al., 2014]);

• The architecture and its proof of concept scenario ([Pereira et al., 2015]).

Several contributions were also made in areas that, although not directly associated with the thesis, were fundamental to its successful development, as they contributed to decisions, requirements and conclusions taken during this research. From this work we highlight the following publications: in multimodal interfaces ([Teixeira et al., 2011a] [Silva et al., 2015]); in AAL architecture proposals ([Teixeira et al., 2015] [Teixeira et al., 2011b] [Pereira et al., 2013b]); in application development ([Teixeira et al., 2013] [Teixeira et al., 2012]); and in generic evaluations ([Pereira et al., 2013a] [Martins et al., 2014]).

Finally, it is important to note that this thesis was developed within the context of the European projects Living Usability Lab [LUL Consortium, 2012] and AAL4ALL [AAL4ALL Consortium, 2015], both of which were completed with high levels of success.


Chapter 2

Background/Related Work

The evaluation of users within reactive environments is normally carried out through general-purpose evaluation methods such as enquiries, observation or interviews. For this reason, we start this chapter by listing the most used evaluation methodologies and briefly describing their characteristics and advantages. Next, we introduce ESM as an alternative and describe existing frameworks and tools that support it. Finally, to better contextualize this thesis, we present a brief description of key technologies used in the scope of this work, namely SOA, user and context models, and ontologies.

2.1 Common Evaluation Methodologies

The roots of technology evaluation lie in the USA at the end of the 1960s, when large-scale applications of technology began to dramatically affect the lives of citizens [Bakouros, 2000]. A thorough evaluation assesses the value of a technology and its devices from technical, market and consumer perspectives and reconciles the results within a valid methodology [Bakouros, 2000]. A wide number of methodologies have been proposed to conduct evaluations of applications and systems. However, concerning the evaluation of the user's experience, it is possible to divide the existing methodologies into two groups: test methods and enquiry methods.

2.1.1 Test Methods

Test methods involve observing users while they perform predefined tasks [Nielsen, 1993]. They are able to measure the user interaction and consist of collecting mostly quantitative data from users [Afonso et al., 2013]. Testing usually involves systematic observations to determine how well participants can use the system or service [Mitchell, 2007]. They focus on people and their tasks, and seek empirical evidence about how to improve the user interaction [Hanington and Martin, 2012]. Test method techniques include:


• Rapid prototyping is a technique that uses a low-fidelity prototype (not implemented), called a mock-up, to collect preliminary data about user interaction [Bernsen and Dybkjær, 2009]. A mock-up can be quickly created and changed. Despite being gathered at a preliminary stage of development, the collected information is valid and reliable [Bernsen and Dybkjær, 2009].

• Performance evaluation is centered on the users and the tasks they perform, and it involves the collection of quantitative data. The participants' performance is evaluated by recording elements related to the execution of particular tasks (e.g. execution time, success/failure, or number of errors) [Nielsen, 1993]. Log file analysis is used to collect information about users' performance. The logs recorded by the system are important to supplement data collected by observers, as they enable triangulations.

• Observation is a research method that consists of the attentive viewing and systematic recording of a particular phenomenon, including people, artifacts, environments, behaviors and interactions [Hanington and Martin, 2012]. Observation can be direct, when the researcher is present during the task execution, or indirect, when the task is observed through other means, such as video recording [Bevan and Bruval, 2003].

• Remote testing is a method oriented to usability evaluation in which evaluators are separated in space and/or time from users. In a traditional usability evaluation, users are directly observed by evaluators; in a remote usability test, however, communication networks act as a bridge between evaluators and users, making it possible to review the user's interaction in their natural conditions and environments [Castillo, 1997]. This approach also facilitates the quick collection of feedback from users who are in remote areas, with reduced overall costs.

2.1.2 Enquiry Methods

Enquiry methods involve collecting qualitative data from users. They can provide valuable information on what the users feel and desire [Bevan and Bruval, 2003]. Qualitative data, although subjective, may help to understand what users actually want, and for that reason survey methods are often used for evaluating user experience and usability, particularly interviews, questionnaires or focus groups [Shneiderman, 1997]:

• The focus group methodology consists of involving a small number of people in an informal discussion group focused on a specific subject [Wilkinson, 2003]. A moderator introduces the topic and guides the discussion. The goal is to extract the participants' perceptions, feelings, attitudes and ideas about a given subject [Bevan and Bruval, 2003].

• Interviewing is a method used in direct contact with the participants to gather opinions, attitudes, perceptions and experiences [Hanington and Martin, 2012]. The interviews are usually conducted by an interviewer who leads a dialog with the participant. Because interviews have a one-to-one nature, errors and misunderstandings can be quickly identified and clarified [Bevan and Bruval, 2003].

• The questionnaire is a tool to collect self-reported information such as characteristics, thoughts, feelings, perceptions, behaviors or attitudes, usually in written form [Hanington and Martin, 2012]. A questionnaire has the advantages of being cheap and not requiring test equipment, and the results reflect the users' opinions, namely about the strengths and weaknesses of the user interaction.

• The diary study is a non-intrusive field method in which the users are in a different location from the evaluators and can manage their own time and means of gathering information [Brandt et al., 2007]. The data are recorded at the moment an event occurs, which reduces the risk of false information [Tomitsch et al., 2010]. Participants record specific events throughout the day. The data resulting from this collection can then be used to guide follow-up clarification interviews.

2.2 Experience Sampling Methodology

Experience Sampling Methodology [Larson and Csikszentmihalyi, 1983], or Event Sampling Methodology (ESM), is a successful method from social psychology [Csikszentmihalyi and Larson, 2014] that has been adapted for evaluation in diverse fields such as quality of life, the experience of work, the examination of cross-cultural differences, and clinical research questions [Hektner et al., 2007]. ESM allows evaluators to study everyday experiences "in situ". It can involve detailed descriptions of a person's life, such as asking participants to relate feelings, thoughts or activities, or it can be used to describe specific events whenever they occur [Reis and Gable, 2000]. Generally, the methodology can be characterized as a self-observation method [Reis and Gable, 2000].

Several tools have been created to support the application of ESM. [Consolvo and Walker, 2003] characterizes ESM tools in three major parts: alerting, delivering and capturing. Alerting concerns how to alert the participant to capture his attention; the authors divide it into type of alert (random, scheduled, event-based), scheduling requirements (daily time period, number of alerts per day, number of alerts overall) and delivery mechanism (audible or tactile). Delivering relates to how the question is posed to the user, which the authors divide into delivery type and question design. Finally, capturing regards how the answer is provided by the user, which the authors divide into record type (written or spoken) and timing of responses (timestamp and timeout).
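As a rough sketch, this three-part characterization maps naturally onto a small configuration structure. The Python below is illustrative only; the class and field names are ours, not taken from [Consolvo and Walker, 2003]:

# Rough sketch of the alerting / delivering / capturing decomposition of an
# ESM study configuration. All field names are illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Alerting:
    alert_type: str            # "random", "scheduled" or "event-based"
    alerts_per_day: int        # scheduling requirement
    daily_time_period: str     # e.g. "09:00-21:00"
    delivery_mechanism: str    # "audible" or "tactile"

@dataclass
class Delivering:
    delivery_type: str         # how the question reaches the user
    question_design: str       # e.g. "multiple choice"

@dataclass
class Capturing:
    record_type: str           # "written" or "spoken"
    timestamp_responses: bool  # timing of responses
    timeout_seconds: Optional[int] = None

@dataclass
class EsmStudyConfig:
    alerting: Alerting
    delivering: Delivering
    capturing: Capturing

config = EsmStudyConfig(
    Alerting("event-based", 5, "09:00-21:00", "audible"),
    Delivering("on-device prompt", "multiple choice"),
    Capturing("written", timestamp_responses=True, timeout_seconds=120),
)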

From the early approaches with programmed stopwatches and handwritten notes, several tools have been developed to allow for electronic data collection:

ESP The Experience-Sampling Program version 2.0 (ESP) is a software package reported from 1999 [Barrett and Barrett, 2001, Barrett, 1999] containing a native application to trigger and run ESM enquiries on a Palm Pilot PDA, and a desktop application for structuring the questionnaires. As an advanced version of ESP, [Consolvo and Walker, 2003] introduces iESP as a tool focused on ubicomp application evaluation.

CAES and CAESSA Intille et al. [Intille et al., 2003] introduced the concept of Context-Aware Experience Sampling (CAES) by designing a tool capable of sampling the user directly via questioning, as well as sampling from sensors on the user or nearby. Their tool, developed for PocketPC, features the ability to pose multiple choice questions only through the phone's interface (showcased in Figure 2.1). Additionally, the tool permits some level of question-answer chaining based upon particular question responses.

Figure 2.1: Screenshots of a multiple choice question screen and a tutorial screen for an audio note sample using CAES [Intille et al., 2003]

In [Fetter et al., 2011], the authors proposed an extension of CAES by building CAESSA, a toolkit enabling researchers to set up CAES studies through a visual editor. The toolkit was developed for PDAs and supports a fixed set of question types, as well as answers via text input, microphone and camera. CAESSA is composed of three main parts: a daemon for collecting sensor data, an editor for handling event streams, and a question actuator daemon for presenting the questions to the user. CAESSA features a plug-in mechanism for sensor inclusion, and its visual editor allows the creation of flows between sensors, engines and actuators. Questions are restricted to free text, numerical text, rating scale and yes/no questions, and are defined in an XML file. Figure 2.2 illustrates the CAESSA Visual Editor. The same authors proposed another version of their work with PRIMIExperience [Fetter and Gross, 2011], using Instant Messaging (IM) as a cost-effective means for carrying out ESM studies.

Figure 2.2: Screenshots of the CAESSA Visual Editor [Fetter et al., 2011]

MyExperience MyExperience is an open-source software package that runs on Windows Mobile devices (including PDAs and mobile phones) [Consolvo et al., 2007]. It is based on a three-tier architecture of sensors, triggers and actions, in which triggers use sensor event data to start certain actions. Its interfaces are specified via XML and a lightweight scripting language similar to the HTML/JavaScript paradigm on the web [Froehlich et al., 2007]. The latest release includes a set of built-in sensors, with support for GPS, GSM-based motion sensors (based on cellular signals), and device usage information (e.g., button presses, battery life information, etc.). The events can be used to trigger actions such as initiating wireless database synchronization, sending SMS messages to the research team, and/or starting "in situ" self-report surveys. Additional sensors can be added via its plug-in architecture.

The tool provides evaluators with the ability to pose questions in a set of fixed formats, including closed-form and open-form data. In total, the latest version of MyExperience provides fourteen separate survey response widgets (a selection of which is shown in Figure 2.3), from radio button lists and text fields to widgets that allow the subject to take pictures or video, or even to record their responses audibly. Regarding the usage of the tool, MyExperience allows evaluators to define a test using an XML structure. Data is stored on the phone using SQL Compact Edition and thus can only be consulted after the experiment ends.

Figure 2.3: Screenshots of the MyExperience tool regarding its possible response methods [Froehlich, 2009]

Momento Momento is an ESM tool that provides integrated support for situated evaluation of ubicomp applications [Carter et al., 2007]. Momento can gather log data, experience samples and user diaries using a client-server architecture. It features a desktop application designed for experiment management and uses the SMS and MMS capabilities of mobile devices to share information between the end users and the evaluators. Through it, the tool permits evaluators to answer end-user requests, ask participants to capture or record data, or automatically gather data from the mobile device. To support multiple evaluations, the tool includes a fixed server to store gathered data.


Communication is handled via text or multimedia messages through HTTP or SMS/MMS. It is also possible to integrate the client with other applications using the Context Toolkit [Dey et al., 2001] through the CTK event system and the CTK services system. Test specification is made through configuration files that are read by the mobile client. A specification includes the participants, the locations (via Bluetooth IDs) and a set of rules. The rules can be used to automate certain actions, and follow an if [conditional and/or conditional] then send [content] to [recipients] structure.
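A minimal sketch of how such a rule could be represented and evaluated follows; the names (Rule, dispatch) and the example condition are ours and hypothetical, not Momento's actual API:

# Illustrative sketch of a Momento-style trigger rule:
# if [conditional and/or conditional] then send [content] to [recipients].
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, object]    # latest sensor/event readings

@dataclass
class Rule:
    conditions: List[Callable[[Context], bool]]
    combine_with_and: bool     # True -> AND, False -> OR
    content: str               # question or request to deliver
    recipients: List[str]

    def fires(self, ctx: Context) -> bool:
        results = [cond(ctx) for cond in self.conditions]
        return all(results) if self.combine_with_and else any(results)

def dispatch(rule: Rule, ctx: Context, send: Callable[[str, str], None]) -> None:
    # 'send' would wrap the SMS/MMS or HTTP channel used by the tool.
    if rule.fires(ctx):
        for recipient in rule.recipients:
            send(recipient, rule.content)

# Example: ask a question when the participant is near a known location at night.
rule = Rule(
    conditions=[lambda c: c.get("bluetooth_id") == "kitchen-beacon",
                lambda c: int(c.get("hour", 0)) >= 18],
    combine_with_and=True,
    content="How comfortable are you in this room right now (1-5)?",
    recipients=["participant-01"],
)
dispatch(rule, {"bluetooth_id": "kitchen-beacon", "hour": 19},
         lambda to, msg: print(f"SMS to {to}: {msg}"))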

movisensXS MovisensXS is a commercial ESM software package supporting self-reports, behavior records, or physiological measurements [movisens GmbH, 2015]. The software operates on Android devices and is coordinated through the web. The software claims to support multiple questionnaire items such as Likert scales, open input items, geomapping and multimedia (through video, pictures and audio). The software includes a workflow methodology for setting up the ESM test (see Figure 2.4), as well as a form editor to design the enquiries. Answer types are restricted to dates, decimals, geopoints, numbers, text, radio selection and visual analog scales. At the time of writing, the software only operates through time triggering and does not include other sensors or event selection options.

MetricWire MetricWire [MetricWire, 2015] is a commercial ESM software package similar to movisensXS. It allows evaluators to design studies using its website and then deploy them to a set of users on mobile devices. Study creation includes several types of question formats, allows the specification of dates when questions are to be triggered by the smartphone, and offers the ability to record GPS coordinates.


Figure 2.4: MovisensXS tool screenshot for specifying sampling schemas [movisens GmbH, 2015]

Maestro Maestro is an ESM tool that proposes to enhance ESM solutions by exploiting long-term user behaviour and usage patterns for ESM triggering [Meschtscherjakov et al., 2010]. According to the authors, user-behavior- and context-triggered ESM enables the evaluator to trigger ESM questions based on past user behaviour and to dynamically adapt the ESM questions whenever the user changes his/her behaviour. At the same time, it also allows evaluators to comprehensively and flexibly log and monitor real-time user behavior with meaningful context information [Meschtscherjakov et al., 2010].

The Maestro tool follows a client-server architecture. Events are stored at the central server, thus requiring a constant connection between the client and the server. A rule-based mechanism allows evaluators to specify event triggers for ESM questions. Communication between the server and the client is made through XML (regarding configuration files) and HTTP/XML for handling enquiry data. Configuration can be performed remotely through the server's web application. Events are captured at the client, transferred to the server through HTTP via GPRS/EDGE, and evaluated there. If the evaluation of the rule is positive, the server contacts the client with the corresponding question(s). The client was developed for BlackBerry and uses its internal web browser. Figure 2.5 showcases some ESM questions displayed on the BlackBerry.

Figure 2.5: Example questions on the Blackberry device created by the Maestro ESM Tool [Meschtscherjakov et al., 2010]

In [Fischer, 2009], Fischer makes a critical review of some ESM tools and indicates several principles that should be considered when designing an ESM tool. The author claims that: (1) ESM tools should follow a client-server logic to make use of the growing connectivity of today's mobile devices and the internet; (2) evaluation creation interfaces should be accessible to evaluators and not require in-depth programming knowledge; (3) users have their own devices, and these should be preferred for ESM evaluations; (4) ESM tools should include different configuration options for the ESM study and not be tied to a certain area; (5) logging and enquiries should be set up separately on devices; (6) implementations should be aware of the limitations of client-side devices and their interfaces; and (7) ESM evaluations should be accessible from the server side to allow evaluators to monitor the progress of the study.

2.3 Support Technologies

2.3.1 Service Oriented Architecture

Due to the heterogeneity of hardware, communication interfaces, operating systems and, mainly, vendors, the main challenge for today's architectures rests on interoperability. This concern is particularly addressed by SOA [Huhns and Singh, 2005, Papazoglou and van den Heuvel, 2007].

Given the popularity of web services, a logical step came in the concept of creating a distributed, service-only architecture, which was later called "Service Oriented Architecture". In the last few years, SOA evolved from being a single concept to a widely accepted architectural style. In summary, SOA can be described as an architectural model where applications are "encapsulated" as services, and where communication is provided via a self-contained communication system inherent to the architecture.

As an architecture, SOA pursues several objectives. One of them is related to its inherent interoperability, which allows any component within the architecture to be remotely invoked by any potential client. Given that every component involved in the architecture must provide a standard interface by which it can be invoked (using a protocol known by all clients), this capability is assured. With it, new components can easily be discovered and included in the architecture, by themselves or for the construction of new software systems. These can later also be published and made available as new services, entering an "infinite" cycle limited only by hardware capabilities.

Another main objective is the aggregation and abstraction of complex business logic under standardized service interfaces, to allow simpler integration of complex services in novel applications. This way, business processes are hidden and become irrelevant to the development of newer applications.

To be discussed at a conceptual level, a typical SOA architecture is composed of four basic layers [Thies and Vossen, 2008]:

• Application Layer - which may include legacy systems, Customer Relationship Management (CRM) software, Enterprise Resource Planning (ERP) systems or additional databases.

• Service Layer - where services are provided on top of the application layer. An important note is that services are normally described using the Web Service Definition Language (WSDL).

• Processing Layer - where services are orchestrated into processes using, for instance, the Business Process Execution Language (BPEL).

• Presentation Layer - where functionalities are made available to users via desktop or web applications.

The implementation of SOA generally implies a set of good-practice principles [Bieberstein et al., 2005]:

• Reusability, granularity, modularity, composition - services reusable across other scenarios; individual and autonomous modules for processing a certain instruction; creation of complex services by composition or orchestration of other services.

• Standards compliance - to assure the interoperability of services.

• Identification, categorization, provisioning and monitoring of services - through which searching for services and detecting their anomalies become easier.


According to the same author, other more concrete principles should also be considered, such as the separation of business logic from base technology, reusing business logic whenever necessary, lifecycle management, and the efficient usage of system resources.
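To make the identification and discovery principles concrete, the following is a minimal, illustrative sketch of service registration, discovery and composition in Python; the names (ServiceRegistry, register, discover) are ours and do not correspond to any SOA standard such as WSDL or BPEL:

# Minimal sketch of SOA-style service registration, discovery and composition.
from typing import Callable, Dict

class ServiceRegistry:
    def __init__(self) -> None:
        self._services: Dict[str, Callable] = {}

    def register(self, name: str, service: Callable) -> None:
        # In a real SOA, 'service' would be described by a WSDL contract.
        self._services[name] = service

    def discover(self, name: str) -> Callable:
        return self._services[name]

registry = ServiceRegistry()
registry.register("temperature", lambda room: 21.5)            # service layer
registry.register("comfort", lambda room:                      # composed service
                  "ok" if registry.discover("temperature")(room) < 24 else "hot")

print(registry.discover("comfort")("living-room"))  # presentation-layer call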

2.3.2 Context Modeling

Context-awareness has become an important concept within computer science since Weiser first presented the term "pervasive" in 1991 as "the seamless integration of devices in our everyday life" [Weiser, 1991]. Similarly to user modeling systems, context-aware systems focus on adaptability. They base their operation on the current surroundings and can change their modus operandi without explicit user intervention. Their objective is to increase usability and system effectiveness [Strang and Linnhoff-Popien, 2004]. A number of definitions for context exist in the literature. Some point out specific terms such as location, nearby people, objects, identity or temperature. Others point to date and time, emotional state or focus of attention. Schilit et al. [Schilit et al., 1994] claimed that the most important aspects of context are: where you are, who you are with, and what resources are nearby. A perhaps more general and accurate definition is provided by Abowd [Abowd et al., 1999]:

“Context is any information that can be used to characterize the situation of entities (i.e., whether a person, place or object) that are considered relevant to the interaction between a user and an application, including the user and the application themselves.”

Later in the same article, Abowd would consider a context-aware system to be a system that "uses context to provide relevant information and/or services to the user, where relevancy depends on the user's task" [Abowd et al., 1999]. In this case, the definition of Fickas et al. [Fickas et al., 1997] is perhaps more practical, defining context-aware systems as "applications that monitor changes in the environment and adapt their operations according to predefined or user-defined guidelines".

The first contextual application dates back to 1992, when [Want et al., 1992] created the Active Badge Location system. The system was based on infrared technology to determine the user's location and had the objective of forwarding telephone calls to a phone near the user. Since then, a wide number of contextual applications have been researched and developed. However, most can be catalogued under a small number of architectural definitions.

2.3.2.1 Architectural Principles

While context-aware systems can be implemented in a great number of ways, they differ according to the location of sensors, the number of users, or even the future planning for the system. The method of data acquisition, however, normally falls into one of three types [Chen, 2004]:


• Direct Sensor Access - client software gathers the information directly from sensors. This means that sensors become "embedded" in the application and can only be accessed by it. Concurrency is very hard to achieve due to this fact, which makes it an improper choice for distributed systems.

• Middleware Infrastructure - a middleware is used on top of the sensors, separating business logic from data-acquisition mechanisms. This facilitates modifications to sensor acquisition properties without changing clients, and vice versa, while also allowing concurrent access.

• Context Server - the use of a server implies a client-server paradigm. Contextual data is sent to the server and consulted on demand by the clients (by polling or subscription), as sketched below. Additionally, this method allows for high levels of concurrency and makes modifications easier.
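The following minimal sketch illustrates the context-server style, supporting both polling and subscription; all names (ContextServer, publish, poll, subscribe) are hypothetical:

# Illustrative sketch of the "context server" acquisition style: clients can
# either poll current values or subscribe for notifications.
from collections import defaultdict
from typing import Callable, Dict, List

class ContextServer:
    def __init__(self) -> None:
        self._values: Dict[str, object] = {}
        self._subscribers: Dict[str, List[Callable[[object], None]]] = defaultdict(list)

    def publish(self, key: str, value: object) -> None:
        # Called by sensor-side producers; notifies subscribed clients.
        self._values[key] = value
        for callback in self._subscribers[key]:
            callback(value)

    def poll(self, key: str) -> object:
        return self._values.get(key)        # client-initiated consultation

    def subscribe(self, key: str, callback: Callable[[object], None]) -> None:
        self._subscribers[key].append(callback)

server = ContextServer()
server.subscribe("room.temperature", lambda v: print("update:", v))
server.publish("room.temperature", 22.3)   # -> update: 22.3
print(server.poll("room.temperature"))     # -> 22.3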

A common trait within distributed contextual architectures is the implementation of layered systems, separating components by stages [Baldauf et al., 2007]. An abstract view of these layered systems is given by Ailisto [Ailisto et al., 2002], who classified contextual systems as normally being depicted by five layers.

Layer one represents the Sensing Layer, that is, the obtainers of data. Note that sensors can be more than hardware devices, possibly being other applications or services that provide information on demand. Examples of sensors include devices such as cameras, microphones, accelerometers, motion detectors, thermometers and biosensors, among others.

The second layer, the Data Retrieval Layer, represents the first software component of the framework. Its objective centers on obtaining raw data from the sensors. If the sensors are applications or services, then this layer establishes itself as a client for those.

The third layer, Pre-Processing, depends mainly on the granularity of the information. Data from sensors may sometimes come with irrelevant information, which may need some "parsing" before being made available. Also within the responsibility of this layer are cases where data is minimal and needs to be compiled or joined for application purposes. Finally, the pre-processing layer may also be used to attach extra information (e.g. tags) to the data, which may be necessary to avoid future conflicts with other sources.

The fourth layer, Storage and Management, prepares the data for consumption by making it available to the client via an interface. Two main methods are commonly implemented to allow access to information: polling, where clients regularly check for updates, and subscription, where clients subscribe to a certain resource and are notified when updates occur.

Finally, the fifth layer represents the Application Layer, where the context data is actually consumed by applications or services.
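As a toy end-to-end illustration of these five layers, consider the following sketch; every function name is ours and purely hypothetical:

# Toy pipeline mirroring Ailisto's five-layer view: a raw reading flows from
# sensing to application.
def sense() -> str:
    return "temp=22.7C;noise"          # layer 1: raw sensor output

def retrieve(raw: str) -> str:
    return raw                          # layer 2: client fetching raw data

def preprocess(raw: str) -> dict:
    # layer 3: parse, drop irrelevant parts, tag the source
    value = float(raw.split(";")[0].split("=")[1].rstrip("C"))
    return {"temperature": value, "source": "thermometer-01"}

STORE: dict = {}

def store(record: dict) -> None:
    STORE.update(record)                # layer 4: storage and management

def application() -> None:
    if STORE.get("temperature", 0) > 22:    # layer 5: consumption
        print("Too warm: suggest opening a window")

store(preprocess(retrieve(sense())))
application()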

2.3.2.2 Representation Models

A representation model for context is used to describe and store context data in a processable form. The development of flexible and usable context ontologies is a challenging process [Baldauf et al., 2007]. However, it is possible to summarize the most relevant context modeling approaches based on data structures and exchange of information [Strang and Popien, 2004]:

• Key-Value Models. These models represent the simplest data structure for modeling contextual information. They are often used to describe the capabilities of services within distributed frameworks, with matching algorithms used by service discovery methods to find key-value pairs (e.g. CAPEUS [Samulowitz et al., 2001]). Their advantages are mainly associated with their simple representation and easy maintenance (see the sketch after Figure 2.7).

• Markup scheme models. All markup scheme modeling approaches use a hierarchical data structure comprised of markup tags with attributes and content. Profiles represent typical markup-scheme models. Some are defined as extensions to the Composite Capabilities / Preferences Profile (CC/PP) [W3C, 2007] standard, possessing Resource Description Framework (RDF) encoding and eXtensible Markup Language (XML) serialization. One such example is the Comprehensive Structured Context Profile (CSCP) [Indulska et al., 2003], shown in Figure 2.6. Other examples can be found in [Strang and Popien, 2004].

Figure 2.6: CSCP Profile Example [Indulska et al., 2003]

• Graphical Models. The Unified Modeling Language (UML) is a type of graphical representation suitable for representing context due to its generic structure. Several approaches exist where UML is used to model contextual aspects, such as [Sheng and Benatallah, 2005]. Another graphical modeling example is based on an extension of the Object-Role Modeling (ORM) format.

• Object oriented models. The use of object-oriented techniques to model context instantly provides a number of powerful features such as encapsulation, reusability or inheritance. Objects are used to represent context types (such as location or noise), and the details of context processing are encapsulated at the object level and hence hidden from other components. One such example is Hydrogen [Hofer et al., 2003], specialized for mobile devices and inspired by a decentralized approach where context is divided into local (context values relative to the client itself) and remote (context values from other devices) types.

• Logic based models. Logic-based models rely upon facts, expressions and rules to define the context model. Normally, a logic system is used to manage those terms, adding, removing or changing the existing facts. Existing systems are usually built upon Prolog (to facilitate reasoning over the facts), but more “theoretical” examples also exist, based for instance on first-order logic.

• Ontology based models. Ontologies are a common method for specifying concepts and the relationships between them. Due to their formal inclination, they are a good method for modeling contextual information. Several context-aware frameworks and systems use ontologies as their representation model, such as COBRA [Chen et al., 2003], which uses an OWL-based ontology approach, named COBRA-Ont, exemplified in Figure 2.7.

Figure 2.7: COBRA-Ont List of Classes and properties [Chen et al., 2003]
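
As referenced in the key-value item above, the sketch below illustrates service discovery through key-value matching. The services and attribute names are invented for this illustration and are not taken from CAPEUS or any other cited system.

# Minimal key-value matching in the spirit of service discovery.
services = {
    "printer-hall": {"type": "printer", "color": "yes", "location": "hall"},
    "printer-lab":  {"type": "printer", "color": "no",  "location": "lab"},
}


def discover(requirements: dict) -> list[str]:
    """Return the services whose key-value pairs satisfy all requirements."""
    return [
        name for name, attrs in services.items()
        if all(attrs.get(k) == v for k, v in requirements.items())
    ]


print(discover({"type": "printer", "color": "yes"}))  # ['printer-hall']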

2.3.3 User Modeling

User modeling techniques are mainly used to provide user adaptation in applications [Kay, 2000]. Despite all research, there is no uniform definition of what a user model really is. Nonetheless, Kobsa's [Kobsa, 2007] definition of a user model is probably the most widely accepted:

“A user model is a knowledge source in a natural-language dialog system which contains explicit assumptions on all aspects of the user that may be relevant to the dialog behavior of the system. These assumptions must be separable by the system from the rest of the system’s knowledge.”

There is today a large number of systems which use some level of adaptation in order to increase usability, normally provided by some kind of user analysis and reasoning. Kay [Kay, 2000] enumerates a broad number of systems ranging from advisors or consultants to fully fledged systems that adapt interaction to the user’s preferences, goals, tasks, needs and knowledge.

In order to compose a user model, several techniques exist and are applied at different stages of development. In her work, Kay [Kay, 2000] enumerates four main techniques for this purpose.

Elicitation of User Modeling Information This is the most direct way of obtaining user information: whenever the system wants information about the user, it simply asks him. For this purpose, several methods can be used, such as direct question and answer, forms or a simple wizard tool. In a sense, this method is better suited for applications which contain an exclusive user model, where the questionnaires are aimed at that specific application.
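
A minimal sketch of elicitation, assuming a hypothetical two-question wizard; the question texts and model fields are invented for this example.

# Hypothetical elicitation step: the application simply asks the user.
questions = {
    "bad_sight": "Do you have difficulty reading small text? (y/n) ",
    "has_speakers": "Does your device have speakers? (y/n) ",
}

# Build an initial user model directly from the answers.
user_model = {
    key: input(prompt).strip().lower() == "y"
    for key, prompt in questions.items()
}
print(user_model)  # e.g. {'bad_sight': True, 'has_speakers': False}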

Modeling Based Upon Observing Another method is to compose the model through user observation. The main advantage of this approach is the ability to collect large quantities of data without disturbing the user. One can argue that capturing data while the user is interacting with the application leads to a more genuine evaluation, in the sense that the user will not stop the interaction to answer a questionnaire. On the other hand, the evaluation may also be biased by the person analyzing the interaction, later leading to errors. This can however be countered by using several observers and later merging their impressions into a more solid and trustworthy report.

Stereotypes Increasingly individualized adaptation of interaction is expected to require detailed and sophisticated user modeling [Kay, 2000]. Creating models from scratch is an expensive investment, and stereotyping is a valuable solution applied by known systems such as GUMS [Finin and Drager, 1986], BGP-MS [Kobsa and Pohl, 1994] or the um toolkit. Due to their relevance, some of these will be described later. In truth, the concept of stereotyping goes back to Rich’s work [Rich, 1979], which used people’s descriptions of themselves to deduce the characteristics of books they would probably like. Its execution is based on the principle that the application changes depending on the applied stereotypes.


A simple example may be explained by conceptualizing the stereotype “bad sight”. If a user has difficulties in seeing, then he is “labeled” with this stereotype. Now, imagine that an application possesses a routine that increments the font size if the user belongs to that stereotype. When interacting with the application, the user will experience a larger font compared to other users who do not share this stereotype, leading to “user-adaptation”.

Stereotyping is perhaps the most used technique when developing a user model for applications. This technique involves establishing specific groups into which users are inserted depending on certain characteristics. These groups are based on categories (or events) established by the developer, to which applications will react differently.

In her work, Kay [Kay, 2000] divides the essential elements of stereotyping into three parts:

• triggers - activate the stereotype; if the user has “bad sight”, then that stereotype is triggered;

• inferences - the consequences of a stereotype; in the case of “bad sight”, this leads to a bigger font or a smaller resolution;

• retraction - in case a stereotype is no longer valid, this mechanism is capable of disabling the applied inferences for that user. Returning to the “bad sight” stereotype, simply imagine a scenario where the user puts on glasses. In this case, the application should return to the normal font size and/or resolution.

Normally, when stereotyping is applied to provide user-adaptation in an application, the starting information is minimal and based on default assumptions. As the user interacts with the application, starting assumptions may become invalid or simply be complemented by new assumptions, which implies a dynamic process. In addition, the process of identifying stereotypes is itself dynamic, using collected data together with machine learning techniques or statistical tools to identify new ones [Kay, 2000].
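
The three elements above can be sketched as follows; the Stereotype class, its fields and the “bad sight” example values are illustrative and not taken from any of the cited systems.

# Sketch of the three elements of stereotyping: trigger, inferences, retraction.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stereotype:
    name: str
    trigger: Callable[[dict], bool]   # activates the stereotype
    inferences: dict                  # consequences while the stereotype holds
    active: bool = False

    def evaluate(self, user: dict, settings: dict, defaults: dict) -> None:
        if self.trigger(user) and not self.active:
            settings.update(self.inferences)      # apply inferences
            self.active = True
        elif not self.trigger(user) and self.active:
            for key in self.inferences:           # retraction: undo inferences
                settings[key] = defaults[key]
            self.active = False


defaults = {"font_size": 12}
settings = dict(defaults)
bad_sight = Stereotype(
    name="bad sight",
    trigger=lambda user: user.get("sight") == "bad",
    inferences={"font_size": 20},
)

user = {"sight": "bad"}
bad_sight.evaluate(user, settings, defaults)
print(settings)           # {'font_size': 20} -> inference applied

user["sight"] = "good"    # the user puts on glasses
bad_sight.evaluate(user, settings, defaults)
print(settings)           # {'font_size': 12} -> retraction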

Knowledge-Based Reasoning This technique is often applied in conjunction with stereotyping. Based on data collected about the user, reasoning algorithms can be applied in order to extrapolate new information. For instance, if a user indicates that he does not have speakers, then it is possible to infer that producing sound is ineffective, since having speakers is a prerequisite for producing any sound.
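
The speakers example can be expressed as a minimal forward-chaining sketch; the fact and rule names below are invented for illustration.

# Simple forward chaining: apply rules until no new facts are derived.
facts = {"has_speakers": False}

rules = [
    # (premise, conclusion): if the premise holds, assert the conclusion.
    (lambda f: f.get("has_speakers") is False,
     ("sound_is_effective", False)),
    (lambda f: f.get("sound_is_effective") is False,
     ("use_visual_alerts", True)),
]

changed = True
while changed:                       # iterate until a fixpoint is reached
    changed = False
    for premise, (key, value) in rules:
        if premise(facts) and facts.get(key) != value:
            facts[key] = value
            changed = True

print(facts)
# {'has_speakers': False, 'sound_is_effective': False, 'use_visual_alerts': True}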

Some of these techniques are often used together when establishing a new user model system. Initially, the user may be asked some questions to establish a default “ground” for the system. Based on this questionnaire, reasoning algorithms may be applied in order to extract new inferences about the user. Finally, stereotyping is used to “group” users according to specific characteristics or application changes.

2.3.3.1 Related Systems

According to the work of [Kobsa, 2007], systems that include user data modeling are generally divided into two types, shell systems and server systems:


Shell User Model Examples For Kobsa [Kobsa, 2007], “user model shell systems” are systems that are integrated into applications, dependent on those applications and, in most cases, embedded into the applications’ own logic, so that a clear distinction between the two does not exist.

Examples of “shell” modeling systems can be found in the works of [Huang et al., 1991], [Kono et al., 1994], [Brajnik and Tasso, 1994] or [Kay, 1994], the last two being examples of user modeling based on stereotyping - matching users to previously defined profiles [Kay, 2000]. Brajnik’s UMT [Brajnik and Tasso, 1994] gives the user model developer the possibility of defining hierarchically ordered stereotypes as well as rules for user model inference and contradiction detection. When new information about the user is received, it is classified by UMT based on a set of premises or assumptions. Based on these, stereotypes may be triggered and their contents added to the user model. Thanks to these stereotypes, the application is able to adapt itself to this particular type of user. PROTUM [Vergara, 1994] is a similar approach to UMT, including more sophisticated stereotypes.

While not being a “shell” user model system in a strict sense, um [Kay, 1994] is a toolkit for user modeling that represents assumptions about the user as attribute-value pairs that may describe the user’s preferences, beliefs or other information considered relevant. In order to evaluate assumptions, um tags each pair with a list of evidence for its truth and falsehood. At runtime, interpreters are used to evaluate the evidence and compose conclusions.
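
The idea of evidence-tagged attribute-value pairs resolved by interpreters can be sketched as follows; this only illustrates the concept and is not the um toolkit’s actual interface.

# Sketch of um-style modeling: an attribute-value pair carries lists of
# evidence for and against it, and an interpreter resolves a conclusion.
component = {
    "attribute": "prefers_large_font",
    "evidence_for": ["asked to zoom twice", "selected large-print mode"],
    "evidence_against": ["disabled accessibility options"],
}


def simple_interpreter(component: dict) -> bool:
    """Conclude true when supporting evidence outweighs contradicting evidence."""
    return len(component["evidence_for"]) > len(component["evidence_against"])


print(simple_interpreter(component))  # True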

Server User Model Examples The work by [Finin and Drager, 1986] is considered to be the starting point for application-independent user modeling systems. GUMS is a Prolog-based system aiming to compose long-term models of individual users. Its objective is to provide a well defined set of services for an application system interacting with various users. As the application system interacts with the user, it simultaneously acquires information about the user and maps it into a user model maintenance system for incorporation. The application provides new facts about the user, which are then verified against existing assumptions, thus generating new assumptions about the user which may later be queried by the application.

Server models are nowadays much more common than their alternative. In fact, given their success, numerous commercial user model examples are currently available to developers. This comes as no surprise, given the numerous advantages this model provides [Kobsa, 2007]:

• User information is at the disposal of more than one application at a time;

• Applications may use information acquired by other applications, leading to a subsequently better user perception;


• Other information dispersed across the enterprise (past purchases, demographic data) can be more easily inserted into the user model repository.

On the other hand, a major disadvantage comes from the fact that this model is based on a central server, accessible only through the network, which may lead to availability or performance issues. In order to mitigate these problems, redundancy is often used.

The server system “Doppelganger” [Orwant, 1994] obtains all its information by means of sensors (either software or hardware). In order to analyze this information, developers are able to use several techniques such as beta distributions, linear prediction or Markov models. Individual user models are collected into so-called “communities”, a concept similar to stereotyping. It differs from traditional stereotyping by being dynamic in a sense, as it is based on probabilities [Kobsa, 2007].

Personis [Kay et al., 2002] is an “extended server” version of um, using the same base components residing in an object data layer over a database. User models are hierarchically ordered contexts structured on the object database. The authors distinguish two basic operations upon this representation: accretion - collecting evidence about the users - and resolution - interpreting the collected evidence.

A more concrete example is KnowledgeTree [Brusilovsky, 2004], a student-adaptive education system which includes user model functionality, collecting evidence from students through their interaction with multiple servers. The activities performed by students are stored and inferred upon by agents which process the flow of events and update the model. Each agent is responsible for a specific property, such as motivation level or level of knowledge for a specific course [Kobsa, 2007].

BGP-MS [Kobsa and Pohl, 1994] is another user model server system, maintaining assumptions about the user obtained through stereotyping techniques based on first-order predicate logic. Different assumption types, such as beliefs and goals, are represented in different partitions which are hierarchically ordered to exploit inheritance [Kobsa, 2007].

2.3.4 Ontologies and the Semantic Web

According to [Studer et al., 1998], an ontology can be defined as “a formal, explicit specification of a shared conceptualization”, composed of a set of entities (e.g., objects, concepts, relations) that are assumed to exist in a particular domain. It is formal because it is supported by unambiguous formal logics, explicit because it makes domain assumptions explicit for reasoning and understanding, and shared due to its ability to capture consensus.

An ontology can be characterized according to several dimensions, two of which are particularly relevant: formality and granularity. Regarding formality, ontologies can be informal, structurally informal, semi-formal or formal [Silva, 2004]. Informal knowledge representation mechanisms, such as natural language, are normally associated with human readability; they might, however, become highly ambiguous. At the other end, with a more formal approach, knowledge representation mechanisms become less ambiguous and more machine readable, but less human readable. The most common languages used in semi-formal and formal ontologies are based on description logics and first-order logics.
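
The difference formality makes can be hinted at with a toy triple representation and a single inference rule (transitivity of subclass); the concept names below are invented, and this is of course far simpler than a real description-logic ontology.

# Toy illustration: knowledge as explicit triples plus one inference rule,
# instead of ambiguous natural-language statements.
triples = {
    ("Microphone", "subClassOf", "AudioSensor"),
    ("AudioSensor", "subClassOf", "Sensor"),
    ("mic-1", "instanceOf", "Microphone"),
}


def infer_subclass_closure(triples: set) -> set:
    """Repeatedly apply: A subClassOf B and B subClassOf C => A subClassOf C."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, p1, b) in list(closure):
            for (b2, p2, c) in list(closure):
                if p1 == p2 == "subClassOf" and b == b2:
                    if (a, "subClassOf", c) not in closure:
                        closure.add((a, "subClassOf", c))
                        changed = True
    return closure


closure = infer_subclass_closure(triples)
print(("Microphone", "subClassOf", "Sensor") in closure)  # True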
