2.4 Evaluation Methodologies
2.4.3 Evaluation Criteria
2.4.3.2 Behavioral Complexity
Relative amount of inferences in relation to the maximum number of possible inferences.
\[ A_{\text{inference rate}} = \frac{\text{total inferences}}{I_{\max}} \tag{2.9} \]
Weibelzahl et al. (2002b) also validated the proposed structural information measures empirically, comparing questionnaire data (subjective ratings of navigation, orientation, adaptation, annotation, page suggestions, and impression) with the computed structural measures. Some statistically significant correlations were found, but with small effect sizes; that is, the correlation between the subjective measures and the structural information measures was of low magnitude. Three possible interpretations of these results were given (Weibelzahl et al., 2002b). First, the proposed measures might be useless for authoring purposes, since there seems to be no relation between the course structure and users’ subjective impression. Nevertheless, the adaptivity degree might still be useful to authors as a kind of summary of their presentation. Second, the subjective ratings might have been unsuitable for indicating what the structural measures should detect. This is an inherent problem of adaptation evaluation, since perfect adaptation is not even noticed by the user and thus cannot be reported. Third, it is possible that adaptation is never independent of content. Rather than more concept relations and a higher adaptivity degree resulting in a better course, each content might have its own ideal structure.
A system can be described as a state-transition-network. The system changes its current state when the user initiates an action. For example, mouse clicks, commands, or menu selections initiate such a transition, and the system enters a new state or returns to a previously visited state. The analysis of collected data yields an individual transition network for every user. Users who are familiar with the system are able to find the shortest path through the network to reach the final state (Borgman, 1999). Users with incomplete or incorrect knowledge have to supplement the task-solving process with many heuristics or trial-and-error strategies. They return to the previous state when they realize that the chosen transition did not have the effect they wanted.
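As an illustration, the per-user transition network described above could be derived from a logged sequence of visited states. The following is a minimal sketch, not the procedure used in the cited work: the function names `build_transition_network` and `count_revisits`, the log format (an ordered list of state labels), and the example log are all assumptions made for this example.

```python
from collections import Counter


def build_transition_network(visited_states):
    """Return (states, transitions) for one user's ordered state sequence.

    `transitions` maps each directed pair (source, target) to the number
    of times the user followed it. The log format is an assumption.
    """
    states = set(visited_states)
    transitions = Counter(zip(visited_states, visited_states[1:]))
    return states, transitions


def count_revisits(visited_states):
    """Count the steps at which the user entered an already visited state."""
    seen = set()
    revisits = 0
    for state in visited_states:
        if state in seen:
            revisits += 1
        seen.add(state)
    return revisits


# Hypothetical log: the user tries "search", goes back to "menu",
# and then reaches the final state via "settings".
log = ["start", "menu", "search", "menu", "settings", "done"]
states, transitions = build_transition_network(log)
revisits = count_revisits(log)
```

For this hypothetical log the network has five states and five distinct transitions, and the single return to "menu" is the kind of backtracking that a user with incomplete knowledge would be expected to produce more often.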
One potential problem of this approach concerns the modeling of the states. Although this is trivial for simple systems that support straightforward interaction, the modeling of complex interaction raises the question of how to define a state. For example, consider an adaptive learning system where each concept is characterized by a read attribute, which changes value when the user reads a page with the concept explanation. In such a simple system, a state can be defined as the combination of all concepts’ read values. As the user navigates through the system, new states are entered. Now consider an adaptive system without discrete pages to navigate, where attributes are modeled using probabilities or even continuous values. Defining states for such a system is a non-trivial problem, and one possible approach suggests using methods from cognitive task analysis to identify states and transitions (Weibelzahl & Weber, 2000).
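The simple case, where a state is the combination of all concepts’ read values, can be sketched as follows. This is only an illustration under stated assumptions: the read attribute is taken to be Boolean and never reset, and the concept names and the helper `state_of` are hypothetical.

```python
def state_of(concepts_read, concepts):
    """State = tuple of read flags, one per concept, in a fixed order."""
    return tuple(concept in concepts_read for concept in concepts)


# Hypothetical course with three concepts.
concepts = ("variables", "loops", "functions")

concepts_read = set()
trajectory = [state_of(concepts_read, concepts)]

# The user reads three pages; the last one was already read before.
for page in ["variables", "loops", "variables"]:
    concepts_read.add(page)
    trajectory.append(state_of(concepts_read, concepts))

# With Boolean attributes the state space is finite: 2 ** n states.
possible_states = 2 ** len(concepts)
```

Under these assumptions, rereading a page leaves the state unchanged, and the number of possible states grows exponentially with the number of concepts. With probabilistic or continuous attributes no such finite enumeration exists, which is why the definition of states becomes non-trivial.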
In summary, by modeling the interaction between user and system, it should be possible to observe the impact of adaptation and the expected decrease in interaction complexity. To assess the complexity of the state-transition-networks, four complexity measures derived from graph theory were introduced by Rauterberg (1992), based on previous work by Stevens et al. (1974), McCabe (1976) and Kornwachs (1987).
The simplest measure is $C_{\text{state}}$, where the complexity equals the number of states $S$ found in a network.
\[ C_{\text{state}} = S \tag{2.10} \]
The complexity must also consider the relations between states. Otherwise, a system that groups all functions into a single page, with the page modeled as one state, would be considered less complex, although, intuitively, at least clusters of functions that belong together should be separated to improve usability. Thus $C_{\text{fan}}$ relates the number of transitions $T$ to the number of states $S$.
\[ C_{\text{fan}} = \frac{T}{S} \tag{2.11} \]
The third measure extracts the number of cycles in the network. Thus, it indicates how often a user returned to a previous state.
\[ C_{\text{cycle}} = T - S + P \tag{2.12} \]
P is a constant for correction purposes only.
The fourth measure, $C_{\text{density}}$, shows the network’s density in relation to the maximal possible density.
\[ C_{\text{density}} = \frac{T}{S \times (S - 1)} \tag{2.13} \]
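The four measures in Eqs. (2.10) to (2.13) can be computed directly from a transition network. The sketch below is illustrative only and rests on assumptions not stated in the text: $T$ is counted as the number of distinct directed transitions, the correction constant $P$ defaults to 1, and the function name `complexity_measures` and the example network are hypothetical.

```python
def complexity_measures(states, transitions, p=1):
    """Compute the four complexity measures of Eqs. (2.10)-(2.13).

    states      -- set of state labels (S = number of states)
    transitions -- set of directed (source, target) pairs
                   (T = number of distinct transitions; an assumption)
    p           -- correction constant P; the default of 1 is an assumption
    """
    s = len(states)
    t = len(transitions)
    if s < 2:
        raise ValueError("at least two states are needed for the density")
    return {
        "state": s,                    # Eq. (2.10)
        "fan": t / s,                  # Eq. (2.11)
        "cycle": t - s + p,            # Eq. (2.12)
        "density": t / (s * (s - 1)),  # Eq. (2.13)
    }


# Hypothetical network: the user moves A -> B -> C, returns to B,
# and finishes in D.
states = {"A", "B", "C", "D"}
transitions = {("A", "B"), ("B", "C"), ("C", "B"), ("B", "D")}
measures = complexity_measures(states, transitions)
```

With $S = 4$ and $T = 4$, this hypothetical network gives $C_{\text{state}} = 4$, $C_{\text{fan}} = 1$, $C_{\text{cycle}} = 1$ (the single return from C to B), and $C_{\text{density}} = 4/12 \approx 0.33$.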
Applying the measures to software evaluation, Rauterberg (1992) showed that all of them were able to distinguish between novice and expert users. However, $C_{\text{state}}$ and $C_{\text{density}}$ varied with different tasks, i.e., they are only useful for experimental settings with constant tasks. For the evaluation of adaptive systems this is not a serious limitation, because adaptivity aims at simplifying a constant task.
Weibelzahl & Weber (2000) compared the four measures of complexity for the interaction with an adaptive product recommendation system. Participants in the experimental group were supported by a user modeling component, while participants in the control group were not. Analysis of the empirical results revealed that the experimental group required less time and was more satisfied with the interaction. However, the results were not statistically significant. The same data were then analyzed in terms of behavioral complexity. Participants who had been supported by the adaptive system produced behavior of reduced complexity compared to participants who completed the same task with a non-adaptive version, as could be discerned from the differences in $C_{\text{state}}$, $C_{\text{cycle}}$ and $C_{\text{density}}$. While traditional criteria, such as duration of interaction and interaction satisfaction, indicated only a vague difference between the adaptive and non-adaptive versions, three of the complexity measures were able to discern the groups. $C_{\text{fan}}$ did not show the expected effect with this procedure. On the other hand, $C_{\text{cycle}}$ and $C_{\text{density}}$ appear to be very interesting measures, as they correlate with experience. $C_{\text{density}}$ is especially encouraging for evaluation purposes, because it is strongly related to subjective satisfaction but circumvents the problems of asking the user directly.