Exploring the subjective and objective responses to different newspaper layout designs


D1.2.2.2


Editor(s): J. Matias Kivikangas, Simo Järvelä, Alessio Falco, Niklas Ravaja
Author(s): J. Matias Kivikangas, Simo Järvelä, Alessio Falco, Niklas Ravaja
Confidentiality: Public

Date and status:

This work was supported by TEKES as part of the Next Media program of DIGILE (Finnish Strategic Centre for Science, Technology and Innovation in digital business)


WP1 EREADING, EXPLORING THE SUBJECTIVE AND OBJECTIVE RESPONSES TO DIFFERENT NEWSPAPER LAYOUT DESIGNS D1.2.2.2


Version history:

Version | Date | State (draft/update/final) | Author(s) or Editor/Contributors | Remarks
1.0 | | | J. Matias Kivikangas, Simo Järvelä, Alessio Falco, Niklas Ravaja |


Participants Name Organisation

Next Media


www.nextmedia.fi www.dig


Running head: EXPLORING DIFFERENT NEWSPAPER LAYOUTS

Exploring the subjective and objective responses to different newspaper layout designs

J. Matias Kivikangas, Simo Järvelä, Alessio Falco, Niklas Ravaja

Aalto University

PO Box 21255, 00076 Aalto, Finland [email protected], [email protected]

Author Note

J. Matias Kivikangas, Department of Information and Service Economy, School of Economics, Aalto University Helsinki School of Economics, Finland.

Correspondence concerning this article should be addressed to J. Matias Kivikangas: [email protected].


Abstract

We investigated the effects of different digital newspaper layout designs on media experience (including attention, emotions, and subjective assessments), focusing in particular on the roles of hierarchy and abundance. We found that hierarchy and abundance are not fully recognized by readers. The layout most resembling the modern hierarchical front page was favored as the front page of a digital newspaper. The other layouts also differed in ways suggesting that they should not be abandoned completely. Reading styles and visualizer-verbalizer cognitive styles were also examined, but the results were conflicting and unclear.


Exploring the subjective and objective responses to different newspaper layout designs

The digitalization of newspapers has brought new challenges to traditional print media. One specific issue largely ignored by research is the design of newspaper layouts. Layout design calls for new solutions, because established ones cannot easily be fitted to digital platforms such as tablets and mobile devices.

The practice of layout design has been evolving since the birth of the modern newspaper about a century ago (Pulkkinen, 2008). Over that century the design has changed significantly, and for the better, but its evolution has not been knowledge-driven. Newspaper designers share an understanding of the practice and can justify their decisions with functional explanations. Even so, the design process is arguably still more art than science, relying on personal experience, tacit knowledge, and rule-of-thumb heuristics. The effectiveness of different decisions, practices, and explanations is difficult or impossible to assess: when newspapers do systematic work to develop their own designs, the methods and results are typically not published.

Academic research mirrors this in its lack of a theory of layout design (Mario Garcia, paraphrased in Pulkkinen, 2008, p. 30) and in the relative lack of empirical research on how newspaper layout affects the reading experience. With the advent of smaller and less unwieldy eye-tracking devices, some studies on the effects of layout on attention have been conducted (especially by Holmqvist and colleagues, e.g., 2003, 2005), although they are mostly case studies rather than controlled experiments. However, newspaper designers should want to know not only how people read their layouts but also how they experience them (Järvelä, Kivikangas, Saari, & Ravaja, 2014; see also the Kansei engineering approach, e.g., Nagamachi et al., 2006). This paper reports a controlled experiment investigating one aspect of how newspaper layout design affects different dimensions of media experience (MX).


Previous Studies on Layout Design

As mentioned, studies of layout effects have mainly examined how people read layouts (i.e., how the reader’s gaze moves over certain layout elements). The fundamental work by Garcia and Stark (1991) revealed that most of a newspaper is not read but merely scanned over; only a fraction of the content is actually read. They also established the idea of entry points: layout elements such as large pictures, headlines, and text boxes that stop the reader’s scanning and lead them to actually read a story. Reading and scanning behaviors can be distinguished, but they mostly do not occur as separate phases in which the reader first scans and then reads. Instead, they continuously overlap (Holsanova, Rahm, & Holmqvist, 2005).

Later studies have supported the general idea of entry points, but the specifics have been debated. Holmqvist and Wartenberg (2005) confirm that areas with pictures, larger size, and an upper-left position are looked at significantly earlier. However, they found no effect for color, which Garcia and Stark (1991) had suggested. They also point to Josephson (1996), whose color effect was mainly due to positioning. The same study reports a much lower percentage of content that readers skip altogether: 15 %, compared to Garcia and Stark’s (1991) 75 % (a figure apparently also reported by Hansen, 1994). The authors explain this by differences in reading times, possibly due to the instructions given to participants. In general, Holmqvist and Wartenberg (2005) report that readers typically focus most (in terms of gaze dwell time) on the upper parts of pages and spreads, and that articles on the left-hand side are read longer than those on the right-hand side. However, this might be due to the common practice of placing ads on the right-hand page; an experimental study should investigate the issue further.

In another study, Holmqvist’s laboratory investigated designers’ predictions of how certain layout elements affect readers’ visual behavior (Wartenberg & Holmqvist, 2005). The predictions were reasonably good overall, but the designers overestimated the effect of colors, pictures, and picture size. They underestimated the effect of horizontal positioning, and both over- and underestimated the various effects of information graphics.

Turning to digital newspapers, Holmqvist, Holsanova, Barthelson, and Lundqvist (2003) cite Lewenstein and colleagues (2000; Poynter proprietary report; see also Outing & Ruel, 2004). Lewenstein and colleagues found that, unlike on paper, news on a website was viewed text first, and that people read more than they scanned. Holmqvist et al.’s own results, on the other hand, suggest that readers scan much more and read less when browsing a web newspaper. These readers carefully select two or three stories on specific topics to read, whereas paper readers read less selectively. Apparently, readers scanning a paper newspaper are readily exposed to the content, so they sometimes catch something interesting and start reading the story. If readers are provided with only headlines and links, most stories are never opened. Essentially the same message is repeated in another proprietary study (Miratech, 2011): attention (and retention) is better when reading on paper than when reading on an iPad. An experimental study in our laboratory suggested that participants are more approach-motivated towards reading a newspaper on paper than on a tablet (Marghescu, Salminen, & Ravaja, 2014), which may reflect the same phenomenon. The differences in electroencephalographic activity indicate that a digital newspaper attracts visual attention (scanning) but elicits less cognitive engagement (reading; cf. Smith & Gevins, 2004).

The Present Study

The above studies do not help much with the basic question: how should layouts be designed for digital newspapers? With a completely new size and format, professionals cannot necessarily rely on their long experience and heuristics. This may partly explain why they are generally reluctant to take risks (cf. also the slow change from broadsheet to tabloid; e.g., Pulkkinen, 2008).


This paper attempts to shed some light on this question. We limit our focus to the layout of the digital front page, which is the first thing a reader sees in a typical digital newspaper. Like the front pages of most modern newspapers, digital front pages serve as an index to the content, presenting a selection of headlines and pictures from different stories. Sensationalist tabloids differ here: their front pages are mostly designed to draw the attention of potential customers at the newsstand (Pulkkinen, 2008). However, unlike a paper front page, the digital front page also acts as an interface from which the reader selects a story and goes to it directly. On paper, the front page offers only a glimpse of the content, and readers typically go on to browse the stories page by page. That functionality is much harder to implement on the small screens of typical digital news reading devices. The front page is therefore not merely a display window; it should also provide enough information for making selections.

Besides restricting ourselves to the front page, we limit our focus to the middle level of layout. This level lies above basic graphic elements (such as fonts, colors, and lines) but below the higher level at which the pages of a newspaper should form a coherent continuum flowing from one section to another. This limitation is due to the practical constraints of our experimental approach: the number of variables investigated must be small enough to be studied viably.

According to newspaper design professionals we consulted, two factors are especially relevant when designing a layout: hierarchy and abundance. Hierarchy is the extent to which the layout emphasizes the stories the journalists have considered the most important.

Typically, the hierarchy forms levels, with higher levels giving stories more prominence than lower levels. There are three ways to achieve this (cf. focus in Moen, 1990):

- positioning: stories are placed according to their importance, from top-left to bottom-right;
- size: the text and pictures of more important pieces take up more space on the page than those of less important ones;
- number of elements per story: a deck and a picture may be used in addition to the headline and body.

Abundance is the amount of (informative) elements provided on one page, including headlines, article bodies, decks, and pictures. It purportedly creates the feeling that a wealth of information is laid out on a tray, ready for picking. We chose these two as our main variables of interest. In practice, our expert newspaper consultants created layout templates implementing high and low levels of hierarchy and abundance. The result was a 2 x 2 set of layouts, which were presented to the participants in a controlled experiment.

Our research methods include self-reports of media experience and emotions, usage metrics, and psychophysiological measurements. It has been shown that the emotions felt during the reading experience itself are not necessarily the same as those reported after the fact (Ravaja, 2004). Self-report measures are subject to various response biases (Paulhus & Reid, 1991; Robinson & Clore, 2002) and can capture only one response per reading period. Psychophysiological methods (Cacioppo, Tassinary, & Berntson, 2000), in contrast, assess emotional reactions more objectively and with high temporal resolution, and can be restricted to the responses to using the front page. On the other hand, the final, processed feelings may sometimes matter more for the experience and for whether it affects subsequent behavior (future readership; Järvelä et al., 2014). Compared with the potentially limitless scope of self-reports, the nonconscious basic emotional responses are also somewhat limited. They are typically interpreted in a two-dimensional space of valence (the positive-negative dimension) and arousal (the active-inactive dimension; Ravaja, 2004).

In addition to investigating the relationship between hierarchy/abundance and the different measures of media experience, we collected answers to trait questionnaires potentially relevant to newspaper media experience. Eye-tracking studies have shown that, although the commonly cited patterns are typical, there are always people who read differently (e.g., Holsanova, Rahm, & Holmqvist, 2006; Wartenberg & Holmqvist, 2005). Because these studies usually have few participants, it is not known how large a proportion of readers have different reading styles. To take these differences into account, we asked the participants about their usual reading styles.

It has also been found that different information-processing styles, often called cognitive styles, affect the experience of taking in text and pictures (e.g., Mendelson & Thorson, 2004). High verbalizers actually remember less when photos are present, whereas low verbalizers (regardless of their visualizer score) are helped by photos, although this was found in a very text-heavy context. Consequently, a questionnaire measuring verbalizing and visualizing tendencies was administered.

Methods

Participants

The participants were 38 volunteers (24 female, 14 male) with a mean age of 33.7 years (SD = 8.89). They were recruited from various sources: a list of participants in previous experiments, a list of digital content subscribers from our partner newspaper, and student and graduate student email lists. Participants were recruited on the condition that they read newspapers and were familiar with using a tablet (to remove potential confounds due to inability to use the device).

Materials

Five layouts were used as templates for a Baker framework HTML5 script that created the digital newspapers automatically. The content was downloaded each day from a collaborating newspaper’s server. The script randomized the content so that we had 15 newspapers with unique news content, three for each of the five layouts. The content consisted of general news stories from the national news agency and stories from a small newspaper distributed on the other side of the country. This ensured that the content was not exactly what the participants would have read in their regular news reading.
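The content assignment described above can be sketched as follows. This is a hypothetical illustration, not the project’s Baker/HTML5 script, whose code is not given here. It assumes a pool of downloaded story IDs and draws 14 stories (the per-layout total stated under Materials) without replacement for each of the 15 newspapers, so that no story repeats across newspapers. The function name `build_newspapers` and the data layout are invented for the example.

```python
import random

LAYOUTS = ["hH-hA", "hH-lA", "lH-hA", "lH-lA", "bl"]


def build_newspapers(story_pool, layouts=LAYOUTS, copies=3,
                     stories_per_paper=14, seed=None):
    """Hypothetical sketch: give every newspaper its own, non-overlapping stories."""
    needed = len(layouts) * copies * stories_per_paper
    pool = list(story_pool)
    if len(pool) < needed:
        raise ValueError(f"need at least {needed} stories, got {len(pool)}")
    rng = random.Random(seed)
    drawn = rng.sample(pool, needed)  # sampling without replacement: no repeats
    slots = [(layout, rep) for layout in layouts for rep in range(1, copies + 1)]
    papers = []
    for i, (layout, rep) in enumerate(slots):
        stories = drawn[i * stories_per_paper:(i + 1) * stories_per_paper]
        papers.append({"layout": layout, "repetition": rep, "stories": stories})
    return papers


papers = build_newspapers(range(300), seed=1)
```

With a pool of at least 5 x 3 x 14 = 210 stories, the call returns 15 newspapers, three per layout, and no story appears in more than one newspaper.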

Four of the five layout templates were designed by newspaper design professionals to implement a factorial design with low and high levels of hierarchy and abundance (see the 2 x 2 grid in Figure 1). For brevity, the layouts are abbreviated as follows: hH-hA (high hierarchy, high abundance), hH-lA (high hierarchy, low abundance), lH-hA (low hierarchy, high abundance), and lH-lA (low hierarchy, low abundance). The implementations of hierarchy and abundance followed the most general definitions. Hierarchy was the number of levels: 4 for high and 2 for low hierarchy. Abundance was reduced to the number of news articles visible on the page at once multiplied by the number of elements used in them: 11 x 2 = 22 for high and 4 x 3 = 12 for low abundance. Because the layouts still needed to be plausible newspaper layouts (as assessed by the newspaper design professionals), arbitrarily low or high levels of hierarchy or abundance were not used.
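The operationalization of the two factors reduces to simple arithmetic. The sketch below encodes the four experimental layouts using only the values given above: the number of hierarchy levels, and articles visible at once times elements per article. The names and dictionary structure are illustrative only. The baseline is omitted because the number of its articles visible at once is not stated.

```python
# Values from the text: hierarchy = number of levels,
# abundance = articles visible at once * elements per article.
HIGH_H, LOW_H = 4, 2
HIGH_A = (11, 2)  # (articles visible at once, elements per article)
LOW_A = (4, 3)


def abundance(articles_visible, elements_per_article):
    return articles_visible * elements_per_article


design = {
    "hH-hA": (HIGH_H, abundance(*HIGH_A)),
    "hH-lA": (HIGH_H, abundance(*LOW_A)),
    "lH-hA": (LOW_H, abundance(*HIGH_A)),
    "lH-lA": (LOW_H, abundance(*LOW_A)),
}

for name, (levels, score) in design.items():
    print(f"{name}: {levels} hierarchy levels, abundance {score}")
```

The low-abundance layouts show fewer articles but more elements per article (3 rather than 2), so abundance as defined here is driven mainly by the number of visible articles.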

In addition to the four experimental layouts, we used a fifth layout, created in the same style as the others but consisting of a simple list of headlines ordered by time (Figure 1, outside the 2 x 2 grid). This layout served as a baseline (bl for short). It was intended to implement “as little layout (in terms of hierarchy and abundance) as possible”: no hierarchy emphasizing differences in importance, and only one element (the headline) per news article for abundance.

The total number of stories available in each layout was kept constant (14) across layouts, regardless of how many stories were visible on the page at once (the participant could swipe down to find more). The layout of the stories themselves was not under investigation and was kept the same across layouts. Each layout was presented under a generic “Newspaper” logo.

Procedure


On arriving at the laboratory, the participants were briefed and filled in an informed consent form. The experiment started after the electrodes had been prepared and a brief rest period. The participants read news on a tablet device (iPad 3) using a layout randomly selected from the five alternatives (four experimental layouts and the baseline). Each participant was instructed to imagine being in an everyday situation with five minutes to spare for reading the newspaper. The participants first saw one of the layout alternatives as the front page and chose a story to read. They then returned to the front page and continued reading in this way until the computer automatically notified them that five minutes had passed. Although the software functionality constrained readers to this behavior, a study by Holmqvist et al. (2003) suggests that it is close to actual reading behavior.

After each reading phase, the participant answered the questionnaires. This procedure was repeated three times for each layout alternative, totaling 15 (5 layouts x 3 repetitions) five-minute reading periods. The only difference was that on the second and third rounds, the number of questionnaire items was reduced to shorten the total duration of the experiment. At the end, the participants were shown the first repetition of each layout and asked to rank the layouts on certain variables (see Self-Reports).
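The resulting session structure of 15 five-minute reading periods can be sketched as a schedule generator. The source states only that layouts were selected randomly and that each layout was read three times. The sketch therefore rests on a hypothetical assumption: each of the three rounds presents all five layouts in a freshly shuffled order, and the full questionnaire is used only in round 1. `make_schedule` is an illustrative name.

```python
import random

LAYOUTS = ["hH-hA", "hH-lA", "lH-hA", "lH-lA", "bl"]


def make_schedule(seed=None, rounds=3):
    """Hypothetical schedule: every round shows each layout once, in random order."""
    rng = random.Random(seed)
    schedule = []
    for rnd in range(1, rounds + 1):
        order = LAYOUTS[:]
        rng.shuffle(order)  # assumption: order re-randomized in every round
        for layout in order:
            schedule.append({
                "round": rnd,
                "layout": layout,
                "minutes": 5,
                # full questionnaire in round 1, reduced item set afterwards
                "questionnaire": "full" if rnd == 1 else "reduced",
            })
    return schedule


schedule = make_schedule(seed=7)
```

Under this assumption, the reading periods alone account for 15 x 5 = 75 minutes of the roughly three-hour session.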

In total, the experiment took about 3 hours, and the participants were compensated with 40 euros.

Physiological Data Collection

Facial electromyography (fEMG) activity was recorded at three muscle sites, as suggested by Fridlund and Cacioppo (1986): zygomaticus major (ZM), corrugator supercilii (CS), and orbicularis oculi (OO). For fEMG, the low cut-off filter was 30 Hz and the high cut-off filter 430 Hz. Electrocardiograms (ECG) were measured using the modified lead III electrode placement. The R-peaks were detected to derive heart rate, which was analyzed as the interbeat interval (IBI). Skin conductance level (abbreviated EDA, for electrodermal activity) was recorded from the medial phalanges of the ring and little fingers of the participant’s left hand.
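Heart rate is derived from the detected R-peaks via the interbeat interval (IBI), the cardiac variable analyzed in the Results. The sketch below assumes that R-peak times are already available in seconds. Peak detection itself, done in BrainVision Analyzer in this study, is not shown, and the function names are illustrative.

```python
def interbeat_intervals_ms(r_peak_times_s):
    """Successive differences between R-peak times, in milliseconds."""
    return [(b - a) * 1000.0 for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]


def heart_rate_bpm(ibis_ms):
    """Instantaneous heart rate for each IBI: 60 000 ms per minute divided by the IBI."""
    return [60000.0 / ibi for ibi in ibis_ms]


peaks = [0.0, 0.8, 1.6, 2.5]  # example R-peak times in seconds
ibis = interbeat_intervals_ms(peaks)
hr = heart_rate_bpm(ibis)
```

Because IBI and heart rate are reciprocal, a higher IBI (as reported for hH-lA in the Results) corresponds to a slower heart rate.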

The analysis of the raw data was performed using the BrainVision Analyzer v. 2.0.1.

The data were filtered with a 50 Hz notch filter to remove mains hum. For each reading phase, the physiological data were averaged over the whole five-minute reading time.
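The sketch below illustrates the 50 Hz notch step only. BrainVision Analyzer’s actual filter design is not described in the source. As a stand-in, the sketch uses the standard second-order IIR notch from Robert Bristow-Johnson’s Audio EQ Cookbook, applied in a single causal pass. The sampling rate (1000 Hz) and quality factor (Q = 30) are assumptions, not values from the study, and the fEMG 30 to 430 Hz band-pass is not shown.

```python
import math


def notch_filter(x, fs, f0=50.0, q=30.0):
    """Second-order IIR notch (Audio EQ Cookbook form), applied causally.

    fs: sampling rate in Hz (assumed; not stated in the study)
    f0: notch frequency, here the 50 Hz mains hum
    q:  quality factor (assumed; a higher q gives a narrower notch)
    """
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b0, b1, b2 = 1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0   # zeros exactly at f0
    a1, a2 = -2.0 * cos_w0 / a0, (1.0 - alpha) / a0        # poles just inside them
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y


fs = 1000  # assumed sampling rate (Hz)
t = [n / fs for n in range(3 * fs)]
signal = [math.sin(2 * math.pi * 10 * ti) + math.sin(2 * math.pi * 50 * ti) for ti in t]
clean = notch_filter(signal, fs)
```

Once the start-up transient has decayed, the 50 Hz component is removed and the 10 Hz component passes almost unchanged. If phase distortion matters, running the filter forward and then backward over the data gives a zero-phase result.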

Eye-tracking data were also gathered. However, technical problems caused the data of almost half of the participants to be lost, so the eye-tracking data were discarded entirely.

Self-Reports

Before arriving at the laboratory, the participants were asked to fill in background and trait questionnaires. There were two questions about reading styles. The first was “When I open a newspaper spread, I pick the first interesting story and read that” (focused reader; Holsanova et al., 2006). The second was “I scan over the spread and pick those that seem the most interesting” (entry point overviewer). In addition, we administered Mendelson and Thorson’s (2006) version of the Verbal and Visual Learning Styles Questionnaire (Kirby, Moore, & Schofield, 1988). Instead of the dichotomous response format, however, we used a five-point scale from strongly agree to strongly disagree, allowing wider variation in responses.

Self-reports during the experiment were based on the MX Questionnaire developed within the Next Media project (Helle, Ravaja, & Heikkilä, 2011). The questionnaire was shortened to reduce the time the participants had to spend answering after each reading phase. The subscales used were:

- Abundance (2 items)
- Hierarchy (2)
- Beauty (1, “The layout looked good”)
- Navigation (1, “The layout was confusing and complex,” reverse scored)
- Aesthetics (8)
- Usability (3)
- Entertainingness (1)
- Trustworthiness (1)
- Interestingness (3)
- Attention Allocation (2)
- Overall Experience (1)

All MX items were rated on a Likert scale from 1 to 5. In addition, Self-Assessment Manikins (SAMs) for Valence, Arousal, and Dominance were used (on a scale from 1 to 9), bringing the total number of items in the first round to 28. On the second and third rounds, the multi-item MX subscales were reduced to a single item. MX Navigation, Aesthetics, Usability, and Trustworthiness were dropped altogether to keep the experiment from becoming too lengthy.
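As a quick consistency check, the round-1 item counts listed above add up to the stated 28 items: 25 MX items plus the 3 SAM scales. The dictionary below simply restates those counts.

```python
# Round-1 item counts per subscale, as listed in the text
mx_items = {
    "Abundance": 2, "Hierarchy": 2, "Beauty": 1, "Navigation": 1,
    "Aesthetics": 8, "Usability": 3, "Entertainingness": 1,
    "Trustworthiness": 1, "Interestingness": 3,
    "Attention Allocation": 2, "Overall Experience": 1,
}
sam_items = {"Valence": 1, "Arousal": 1, "Dominance": 1}

total = sum(mx_items.values()) + sum(sam_items.values())
print(total)  # 28
```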

At the end, the participants ranked the five layouts from 1 to 5 on Interestingness, Ease of Use, Attractiveness, and Preference.

Event Scoring and Usage Data

In addition to the physiological measurements and questionnaires, we recorded usage data. During the experiment, the Baker script ran on a separate Mac connected to the iPad, logging everything the participant did with the newspaper. The logs were later parsed to find the events at which the front page was opened or closed. These events were used to calculate three usage metrics:

- Time on Front Page: how many seconds of the five-minute reading time the participant spent on the front page;
- Articles Read: how many articles the participant opened and closed during the reading phase;
- Reported Articles Read: how many articles the participant reported reading during the reading phase, regardless of how many they actually read.

The timestamps of the events were also used to pinpoint, in the physiological data, the exact start and end times of each “visit” during which the participant was looking at the front page layout. The physiological variables thus contained data from the front page only and not responses to the article pages.
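The derivation of the usage metrics from the logs can be sketched as follows. The Baker log format is not described in the source. This sketch therefore assumes a hypothetical, already-parsed list of `(timestamp_s, event)` tuples with the invented event names `front_open`, `front_close`, `article_open`, and `article_close`. It returns the front-page visit windows (used to cut the physiological data), Time on Front Page, and Articles Read (articles both opened and closed).

```python
def usage_metrics(events, session_end_s=300.0):
    """Hypothetical log format: list of (timestamp_s, event_name), sorted by time."""
    visits = []            # (start, end) of each front-page visit
    visit_start = None
    article_open = False
    articles_read = 0
    for t, name in events:
        if name == "front_open":
            visit_start = t
        elif name == "front_close" and visit_start is not None:
            visits.append((visit_start, t))
            visit_start = None
        elif name == "article_open":
            article_open = True
        elif name == "article_close" and article_open:
            articles_read += 1   # counted only once the article is closed
            article_open = False
    if visit_start is not None:  # still on the front page when time ran out
        visits.append((visit_start, session_end_s))
    time_on_front = sum(end - start for start, end in visits)
    return {"visits": visits, "time_on_front_s": time_on_front,
            "articles_read": articles_read}


log = [
    (0.0, "front_open"), (12.5, "front_close"),
    (12.5, "article_open"), (80.0, "article_close"),
    (80.0, "front_open"), (95.0, "front_close"),
    (95.0, "article_open"), (290.0, "article_close"),
    (290.0, "front_open"),  # five minutes end while on the front page
]
metrics = usage_metrics(log)
```

The returned visit windows would then be used to select the matching physiological samples, so that only front-page responses enter the analysis.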

Data Reduction and Analysis

R version 3.0.1 was used to process the data. Both SPSS 21 and R with the “nlme” package were used to carry out the analyses.

The (multi-item) subscales of MX were tested for internal consistency. The alphas were acceptable (> .6), except for Usability (α = .57) and Abundance (α = .46). After inspection of the items, item 1 was removed from both subscales. This reduced the Abundance subscale to a single item (“The newspaper was rich and had a lot of choices”) and raised the alpha for Usability to an acceptable level.
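For reference, Cronbach’s alpha for a k-item subscale is α = k/(k − 1) · (1 − Σ s²ᵢ / s²ₜ). Here s²ᵢ are the item variances and s²ₜ is the variance of the summed scale. The source does not say which software computed this step, so the minimal pure-Python sketch below is illustrative only (the function name is invented). It is checked on two toy cases: identical items (α = 1) and uncorrelated items (α = 0).

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of k lists, each holding one item's scores across respondents."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # summed scale per respondent
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


identical = [[1, 2, 3, 4], [1, 2, 3, 4]]
uncorrelated = [[1, 2, 3, 4], [1, -1, -1, 1]]
```

Alpha is undefined for a single item, so the reduced one-item Abundance subscale has no reliability estimate.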

Mean values of the physiological signals over each visit during each reading phase were calculated for each participant. The physiological variables were transformed using natural logarithms to normalize their distributions. For the SAMs, the self-report given after the rest period was subtracted from the values of the experimental rounds, to remove the general mood level from the emotion self-reports. Then, for all variables, the average of all baseline repetitions was subtracted from the variable values. This average covered three repetitions for variables measured thrice and one for questionnaires measured only once. The baseline layout thus served as the zero level against which the model intercepts were tested.
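The preprocessing of the physiological variables has two steps: a natural-log transform, then subtraction of the baseline-layout mean. The sketch below assumes, without confirmation in the source, that the subtraction was done per participant and per variable. The input structure (a dict from layout to per-repetition values) and the function name are hypothetical. The SAM rest-period correction is not shown.

```python
import math
from statistics import mean


def preprocess(values_by_layout, baseline="bl", log_transform=True):
    """values_by_layout: {layout: [value per repetition]} for one participant
    and one variable (hypothetical structure)."""
    if log_transform:  # physiological variables only; values must be positive
        values_by_layout = {k: [math.log(v) for v in vs]
                            for k, vs in values_by_layout.items()}
    ref = mean(values_by_layout[baseline])  # average over all baseline repetitions
    return {k: [v - ref for v in vs] for k, vs in values_by_layout.items()}


example = {"bl": [1.0, math.e ** 2], "hH-hA": [math.e, math.e ** 3]}
centered = preprocess(example)
```

After this step a layout’s value is expressed relative to the baseline. An estimated mean that differs from zero therefore indicates a difference from the baseline layout, which is how the Results interpret the intercept tests.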

The data were analyzed using the Linear Mixed Models (LMM) procedure with restricted maximum likelihood estimation (in R). The physiological, self-report, and usage variables were used as dependent variables. The predictor was the categorical condition variable described below (see Manipulation Check and Predictor Variables). For all models, we defined a random intercept with participant as the subject variable to control for individual differences. In addition (in SPSS), the physiological dependent variables were modeled with a repeated effect for visit. This effect used participant*layout*repetition as the subject variable and a first-order autoregressive covariance structure for the residuals, because successive physiological responses are known to be autocorrelated. The t-tests were carried out within this linear mixed model.
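Under the first-order autoregressive (AR(1)) residual structure used for the physiological models, the correlation between two visits within the same participant*layout*repetition decays geometrically with their distance. Formally, Cov(eᵢ, eⱼ) = σ² ρ^|i − j|. The minimal sketch below builds this matrix for illustration only; in the study the structure was estimated by the mixed-model software, and the function name is invented.

```python
def ar1_covariance(n_visits, sigma2, rho):
    """AR(1) residual covariance for n_visits consecutive front-page visits."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return [[sigma2 * rho ** abs(i - j) for j in range(n_visits)]
            for i in range(n_visits)]


for rho in (0.23, 0.85):  # lowest and highest repeated-effect rho in the Results
    first_row = ar1_covariance(4, 1.0, rho)[0]
    print(rho, [round(c, 3) for c in first_row])
```

With ρ = .85, as reported for EDA, visits two steps apart still correlate at about .72. This illustrates how highly autocorrelated signals leave little independent variation from which layout effects can be estimated.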

For the covariate models, each covariate was tested separately by including it in LMMs that were otherwise as described above. Only the interactions between layout and the covariate were of interest and are reported. The covariates were seven traits:

- the Visual and Verbal scales;
- the background variables age, gender, and experience of using tablets for news reading;
- the Focused reader and Entry point overviewer reading styles.

The dependent variables were the six MX subscales measured in more than one repetition (Hierarchy, Abundance, Attention Allocation, Interestingness, Beauty, Overall), the three SAMs, the three usage metrics, and the five physiological variables.

This resulted in 119 LMM runs, which raises concerns about false-positive inflation. Nevertheless, 50 of the analyses had significant omnibus tests for the interactions at the .05 level or better, whereas only 6 significant findings would have been expected by chance alone. Even more notably, 40 of the analyses involved physiological dependent variables. As mentioned in the Results, these variables had very high autocorrelations and consequently very small differences between layouts, and they yielded only one significant result (p = .01) in the covariate analyses. The 79 other runs therefore yielded 49 significant covariate-layout interactions (a 62 % rate). To save space, only the results considered interesting are reported.
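The false-positive reasoning above rests on simple counts, reproduced here from the numbers stated in the text. There are 7 covariates, and the dependent variables are 6 MX subscales, 3 SAMs, 3 usage metrics, and 5 physiological variables. The counts give 119 runs, an expected 119 x .05 ≈ 6 significant results if every null hypothesis held, and a rate of 49 of 79 non-physiological runs, about 62 %.

```python
n_covariates = 7            # Visual, Verbal, age, gender, tablet experience, two reading styles
n_dvs = 6 + 3 + 3 + 5       # MX subscales, SAMs, usage metrics, physiological variables
n_runs = n_covariates * n_dvs
expected_by_chance = n_runs * 0.05   # expected false positives at alpha = .05
non_physiological_rate = 49 / 79     # significant / non-physiological runs, from the text

print(n_runs, round(expected_by_chance, 2), round(non_physiological_rate * 100))
```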

Results

The descriptive statistics of all self-reports and usage metrics are presented in Table 1.

Manipulation Check and Predictor Variables

The manipulation check was carried out with LMMs in which the a priori high and low levels of hierarchy and abundance predicted (in separate analyses) the reported hierarchy and abundance. The responses revealed that the participants did not recognize hierarchy as assumed (Figure 2). Reported hierarchy was highest in hH-hA and lH-lA (Ms = 3.26 and 3.15) and lower in hH-lA and lH-hA (Ms = 3.02 and 2.95). The difference between hH-hA and hH-lA was 0.24, t(410) = -2.26, p = .024. However, all experimental layouts were reported higher in hierarchy than the baseline (lowest M = 2.95, compared to M = 1.83 for bl, t(410) = 6.71, p < .001). The assertions for hierarchy were (translated from Finnish): “The contents of the newspaper were well structured” and “The most important articles stood out”.


Abundance was reported to be higher in the a priori high-abundance layouts hH-hA and lH-hA (Ms = 3.03 and 2.73) than in the a priori low-abundance layouts hH-lA and lH-lA (Ms = 2.23 and 2.13). The smallest difference was 0.5, t(410) = 3.68, p < .001. However, the baseline was reported higher in abundance than the a priori low-abundance layouts (M = 2.66, t(410) = 3.18, p = .00158). It was not significantly lower than lH-hA (difference = 0.067, t(410) = 0.497, p = .619), but it was lower than hH-hA (difference = 0.37, t(410) = 2.724, p = .0067). The assertion for abundance was “The newspaper was rich and had a lot of choices”.

In sum, the participants did not report hierarchy and abundance in the way intended in the design of the experimental layouts. We therefore created a single categorical variable capturing all four experimental layouts and used it as the predictor in the rest of the analyses, in place of the separate hierarchy and abundance dimensions of the factorial design. For the reader’s convenience, the layouts are still referred to below by their intended a priori hierarchy-abundance levels. Bear in mind, however, that these dimensions are not necessarily the differences between the layouts that are most relevant in eliciting the responses.

MX Responses to Layouts

The means and F-tests of MX responses to layouts from the LMMs are presented in Table 2.

MX Aesthetics, Usability, Trustworthiness, and Navigation, and SAM Valence and Arousal did not differ between layouts. These four MX subscales happened to be the only ones administered only in round one. Their estimated means and standard errors are also of roughly the same size as those of the other variables. The lack of power due to the smaller sample therefore seems to contribute strongly to these results. On the other hand, Usability, Trustworthiness, and Navigation also all referred more to the technical implementation of the digital newspaper. Examples are “It was easy to handle the newspaper while reading” for Usability, “The layout was confusing and complex” for Navigation (reverse scored), and “The paper was done by professionals” for Trustworthiness. This may reflect the fact that all the layouts were created by professionals to serve as plausible layouts for a digital newspaper.

This interpretation is supported by the fact that for both Trustworthiness (p = .00269) and Usability (p = .00658), all experimental layouts were scored higher than the baseline (indicated by EMMs significantly higher than zero; the p-value is reported for the intercept). The baseline was obviously the least complex and confusing layout, and its differences from the experimental layouts were not significant; interestingly, however, it was still assessed lower.

In MX Attention Allocation (e.g., “I concentrated on the paper”), Interestingness (“There were interesting pieces in the newspaper”), and Beauty (“The layout looked good”), the layout hH-hA was reported higher than other layouts (ps = .001, .025, and .001), while in Overall (“The reading experience was overall good”) it was hH-hA and lH-hA that had higher scores (p = .003) than the two others.

For MX Aesthetics, Beauty, and Overall, all experimental layouts were also assessed higher than the baseline (indicated by EMMs significantly higher than zero; all ps < .001 for the model intercept). On Attention Allocation (p < .001) and Interestingness (p = .0158), only the hH-hA layout was significantly higher than the baseline (indicated by the intercepts).

When the participants ranked the layouts side by side (Table 3), hH-hA was ranked the most Attractive and was preferred the most. It was second in Usability after lH-lA, bearing in mind that MX Usability did not show any differences between the layouts.

Although lH-lA was ranked third in Interest, it was second in Attractiveness and Preference, much better than its average scores on the MX subscales would have led one to expect.

Interestingly, lH-hA was tied with hH-hA for first place in Interest, although its MX Interestingness was clearly lower than that of hH-hA. A similar discrepancy appeared in Attractiveness, where lH-hA was ranked fourth despite its good MX Overall assessment. The baseline layout was ranked last on all items, and hH-lA performed only moderately, as it was not ranked above third on any item.

Emotional Responses to Layouts

In the self-reported emotions (Table 2), SAM Valence and Arousal did not differ between layouts. However, Valence (but not Arousal) was higher than the baseline in hH-hA (t(387) = 3.1464, p = .00178, for the intercept). Dominance repeated the difference between hH-hA and the baseline, but here the other layouts were also significantly lower than hH-hA (F(3, 383.97) = 5.269, p = .001).

Psychophysiological measurements followed the same pattern with regard to valence (Table 4). Facial EMG measured from the zygomaticus major (ZM) and orbicularis oculi (OO) muscle sites showed higher responses to hH-hA than to the other layouts and baseline, signifying higher positive affect (ps = .009 and .004). Corrugator supercilii (CS) fEMG and EDA, indexing negative affect and arousal, respectively, did not differ between layouts. IBI, indexing attention, was significantly higher for hH-lA than for hH-hA and lH-lA (pairwise comparisons not shown in Table 4; ps = .015 and .003). However, it should be noted that while the repeated-effect rho was .30 for fEMG ZM, .23 for OO, and .53 for IBI, it was .77 for fEMG CS and .85 for EDA, indicating that the physiological measures that did not differ between layouts had very high autocorrelation, which may have masked a possible effect.
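To illustrate why high autocorrelation can mask effects, suppose (our assumption; the covariance type is not stated here) that the repeated-effect rho is the parameter of a first-order autoregressive, AR(1), residual structure. Under AR(1), the correlation between samples k steps apart is rho^k, and the standard large-sample approximation for the variance of a mean gives an effective sample size of about n(1 - rho)/(1 + rho). The number of samples below is hypothetical; only the rho values come from the text.

```python
def ar1_correlation(rho: float, lag: int) -> float:
    """Correlation between two samples `lag` steps apart under AR(1)."""
    return rho ** lag


def effective_n(n: int, rho: float) -> float:
    """Approximate number of independent observations carried by n
    AR(1)-correlated samples when estimating a mean."""
    return n * (1.0 - rho) / (1.0 + rho)


# Repeated-effect rho values reported in the text
rhos = {"fEMG OO": 0.23, "fEMG ZM": 0.30, "IBI": 0.53, "fEMG CS": 0.77, "EDA": 0.85}
N_HYPOTHETICAL = 100  # hypothetical number of consecutive samples per period

for name, rho in rhos.items():
    print(
        f"{name}: lag-3 r = {ar1_correlation(rho, 3):.2f}, "
        f"effective n = {effective_n(N_HYPOTHETICAL, rho):.1f} of {N_HYPOTHETICAL}"
    )
```

Under these assumptions, 100 EDA samples carry roughly as much information about a mean as 8 independent ones, against about 54 for fEMG ZM, so a layout difference of the same size is much harder to detect in the highly autocorrelated signals.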

Usage Metrics

As can be seen from Table 1, the mean time on the front page was about a minute (out of a five-minute reading period), although it varied greatly (from 9 seconds to just over three minutes). There were significant differences between layouts in Total Time on Front Page (Table 5): participants spent much more time on the hH-lA front page (9.5 s more than on baseline) than on the others (-4.1, -0.4, and -3.8 s for hH-hA, lH-hA, and lH-lA, respectively, relative to baseline), p < .001. Because the reading time was fixed at five minutes, this also means that participants spent less time in total on the news articles when using hH-lA. They did not, however, read fewer articles with hH-lA: Articles Read did not differ between the layouts, all of which yielded about 0.5 to 0.8 fewer articles than baseline. In other words, participants read about 10-20% more articles, and read them faster, when choosing them from the baseline layout (a simple headline list) than when choosing from the experimental layouts. The participants did, however, report differences in articles read: they believed they read more articles (that is, hardly fewer than at baseline) when choosing them from the lH-lA layout than from the others, p = .034.
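The "about 10-20%" figure can be checked roughly from Tables 1 and 5. The sketch below assumes (our assumption, not stated in the text) that all five conditions contributed equal numbers of reading periods, so that the grand mean of Articles Read in Table 1 equals the average of the five condition means; the baseline mean is then recovered from the layout estimates relative to baseline.

```python
# Estimated marginal means for Articles Read relative to baseline (Table 5)
emm_vs_baseline = {"hH-hA": -0.828, "hH-lA": -0.637, "lH-hA": -0.659, "lH-lA": -0.541}
GRAND_MEAN = 5.42  # Table 1: mean Articles Read over all reading periods

# Assumption: equal numbers of reading periods in the five conditions, so
# GRAND_MEAN = (baseline + sum of the four layout means) / 5
#            = baseline + sum(emm_vs_baseline) / 5.
baseline = GRAND_MEAN - sum(emm_vs_baseline.values()) / 5


def percent_more_at_baseline(diff: float) -> float:
    """How many percent more articles were read at baseline than with a layout
    whose mean is `diff` articles relative to baseline."""
    layout_mean = baseline + diff
    return 100.0 * (baseline - layout_mean) / layout_mean


for layout, diff in emm_vs_baseline.items():
    print(f"{layout}: {percent_more_at_baseline(diff):.1f}% more articles at baseline")
```

Under this assumption the range is roughly 10% (lH-lA) to 16% (hH-hA), within the stated 10-20%.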

Covariate Models

All interactions reported here are estimated to correspond to a difference of at least 0.3 points in the MX questionnaire scores for each 1-point increase in the covariate, that is, a 30% effect, as both measures are scaled from 1 to 5. Effects on usage metrics are reported separately. lH-lA was used as the reference category in the analyses, but when a layout is compared with "others" in this section, the differences among these other layouts were small.
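One way to read the "30% effect" is as a range-normalized slope: the fraction of the outcome's range that is traversed when the covariate moves across its full range. The helper below is a hypothetical illustration of that reading, not the authors' procedure. When both measures span 1 to 5, the normalized slope equals the raw slope of 0.3; a covariate on a different scale (a hypothetical 1 to 7 scale is shown) would change the percentage even for the same raw slope.

```python
def range_normalized_slope(slope, covariate_range=(1, 5), outcome_range=(1, 5)):
    """Express a slope (outcome points per covariate point) as the fraction of
    the outcome's range covered when the covariate spans its whole range."""
    cov_span = covariate_range[1] - covariate_range[0]
    out_span = outcome_range[1] - outcome_range[0]
    return slope * cov_span / out_span


REPORTING_THRESHOLD = 0.3  # minimum MX difference per covariate point (from the text)

print(range_normalized_slope(REPORTING_THRESHOLD))           # both scales 1-5
print(range_normalized_slope(REPORTING_THRESHOLD, (1, 7)))   # hypothetical 1-7 covariate
```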

The more visual the participants reported being, the more Abundance they reported and the fewer articles (-0.44) they read during hH-lA and lH-hA, compared with the other layouts (ps < .003). The more verbal the participants were, the more hierarchical they rated lH-hA, and the less Abundance and more Attention Allocation they reported after hH-hA and hH-lA; however, they also rated hH-lA as less beautiful. In addition, more verbal participants read 0.50 more articles (and reported reading 0.74 more) during hH-hA, and 0.78 fewer (with no difference in reported articles read) during lH-hA, compared with the other layouts. A higher verbalizer score was also associated with less time spent on the front page (11.6 and 8.9 s) for hH-lA and lH-hA (all ps < .001, except for Attention Allocation, p = .02).

Age and prior experience of reading news on tablets showed multiple significant interactions with layout, but the effects were mostly very small.

Interestingly, older participants spent less time reading hH-hA than younger ones, by 4.6 s per 10 years of age (p = .007). More experienced tablet news readers read more articles (0.36 per point) during hH-lA than during the other layouts.

Differences in reading styles also led to differences in how the layouts were experienced and read. A higher Focused reader score was associated with lower Attention Allocation and Overall scores in general (-0.47 and -0.56 per point), with lower reported interest and fewer articles read for lH-lA (-0.6, and -0.5 for reported articles read, per point), and with a higher Overall experience for hH-hA and hH-lA, while participants with this style still spent less time (5.9 s) on hH-hA (and the most time on hH-lA; all ps < .001). Higher scores on the Entry point overviewer style were associated with less interest during hH-hA, and with more Articles Read for lH-lA (0.5, and 0.3 for reported articles read), compared with the other layouts.

Discussion

This study is the first experimental investigation of the effects of layout on digital newspaper media experience. We specifically studied the effects of hierarchy and abundance, two high-level layout concepts concerning how individual stories are placed and emphasized on a page or spread, using a 2 x 2 design. Although the old adage asserts that form cannot be discussed in isolation from content, we attempted just that by randomizing the newspaper content: we created a unique set of articles for each layout and reading period, effectively counterbalancing any random differences in content, for the first time in the newspaper layout research literature.
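The content randomization described above can be sketched as dealing disjoint article sets from a shuffled pool, one set per reading period. Everything concrete here is hypothetical: the pool size, the set size, the function name, and the additional shuffling of layout order are our assumptions for illustration, not details reported in the text.

```python
import random

LAYOUTS = ["hH-hA", "hH-lA", "lH-hA", "lH-lA", "baseline"]


def assign_article_sets(article_pool, n_per_set, seed=None):
    """Shuffle the pool and deal one disjoint article set per layout, so that no
    article appears in more than one reading period for the same participant.
    Layout order is also shuffled (an assumption made for this sketch)."""
    rng = random.Random(seed)
    if n_per_set * len(LAYOUTS) > len(article_pool):
        raise ValueError("article pool too small for disjoint sets")
    articles = list(article_pool)
    rng.shuffle(articles)
    order = LAYOUTS[:]
    rng.shuffle(order)
    return [
        (layout, articles[i * n_per_set:(i + 1) * n_per_set])
        for i, layout in enumerate(order)
    ]


pool = [f"article_{k:03d}" for k in range(200)]  # hypothetical article pool
for period, (layout, arts) in enumerate(assign_article_sets(pool, 20, seed=1), start=1):
    print(period, layout, arts[:3])
```

Drawing a fresh assignment per participant spreads any idiosyncratically interesting or dull articles evenly across layouts in expectation.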

The Failure to Recognize Hierarchy and Abundance

We found that the participants did not perceive hierarchy and abundance in the way the layouts had been designed a priori. The responses to questions about the dimensions revealed that hierarchy was not perceived to vary between the experimental layouts (although it was perceived to differ from the baseline, a simple headline list in chronological order). Abundance was perceived to vary between the high- and low-abundance layouts, but there were also differences between layouts at the same a priori abundance level, which should have differed only with respect to hierarchy, and the baseline was assessed as more abundant than the low-abundance layouts.

There are three possible explanations for this. The first is that hierarchy and abundance are abstract concepts that are more relevant to designers than to readers, so that even when the concepts are explained in lay terms, readers cannot recognize them. The second is that the questions themselves had low validity; the fact that we had to use only one of the two abundance questions because of their low intercorrelation could support this interpretation. The third is that there really was not enough difference in hierarchy and abundance between the layouts, meaning that our practical implementations of the dimensions failed. Our expert consultants admitted that the two dimensions are so intertwined that creating the 2 x 2 design (and selecting a suitable baseline) was nearly impossible: the layouts had to have truly equal levels of hierarchy or abundance where the a priori levels were the same and clearly different levels where they differed, while remaining plausible as real newspaper layouts. The consultants therefore had to settle for the best implementations they could devise.
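The decision to keep only one of the two abundance questions rests on their intercorrelation. A minimal sketch of such a check is shown below; the cut-off of 0.5, the rule of keeping the first item, and the example responses are hypothetical, since the text reports neither the threshold nor the observed correlation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences of item scores.
    Raises ZeroDivisionError if either item has zero variance."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def combine_or_pick(item_a, item_b, min_r=0.5):
    """Average the two items if they correlate at least `min_r`; otherwise keep
    only the first item (hypothetical decision rule and threshold)."""
    r = pearson_r(item_a, item_b)
    if r >= min_r:
        return r, [(a + b) / 2 for a, b in zip(item_a, item_b)]
    return r, list(item_a)


# Hypothetical 1-5 responses to two abundance questions
q1 = [1, 2, 3, 4, 5]
q2 = [5, 1, 4, 2, 3]
r, scores = combine_or_pick(q1, q2)
print(f"r = {r:.2f}; scores used: {scores}")
```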

The compromise can be seen, for instance, in the two low-hierarchy layouts. Both had only two levels, but one implemented the levels with a sidebar to avoid shrinking individual stories too much, while the other (to avoid high abundance) simply moved the lower-hierarchy articles below the part of the page visible at a glance, so that the reader had to swipe to see them; in the other layouts, no swiping was required for both levels to be visible. Nevertheless, hierarchy was perceived to differ from baseline, and although the baseline was assessed as more abundant than the low-abundance layouts, the difference between the low and high abundance levels was recognized, so the implementations did not lack validity altogether.


Ultimately, we decided to analyze the four layouts as separate conditions rather than along the two dimensions. We advise caution in interpreting any of the presented differences as resulting directly from hierarchy or abundance.

Layout Differences

In sum, the layout created a priori as high hierarchy and high abundance stood out as the most different from all the others: it was assessed more favorably in self-reports (more interesting, better looking, more attention-grabbing, and the best overall assessment, in addition to the best rankings in side-by-side comparisons), and it elicited the most positive emotions as measured by psychophysiological methods (facial EMG; but not by self-reported emotion, except for the highest dominance). This layout most closely resembles a common modern newspaper front page, so one possibility is that the participants were overwhelmingly reacting to familiarity rather than to inherent differences. Unfortunately, we did not measure layout familiarity to control for this.

The layout implemented a priori as low hierarchy and high abundance (lower right in Figure 1), despite losing out to the high-high layout on all the other measures, was still assessed as the best overall reading experience (tied with high-high). This layout was also ranked as interesting as the high-high layout in the side-by-side comparison, although all its other rankings, including preference despite the good overall assessments mentioned, were much worse. The a priori low-low layout, on the other hand, was ranked much better, but did not stand out on any of the individual assessments.

What is interesting about the low-hierarchy layouts is that, by presenting the stories on an equal level, they do not provide any entry points (photos, typographical elements that stand out) to guide the reading experience. The hierarchical layout design evolved from the more egalitarian layouts of early newspapers (Barnhurst & Nerone, 1991; Pulkkinen, 2008), arguably owing to some superiority that readers have rewarded and designers have recognized. These results suggest that in a digital newspaper this superiority might be questioned in some respects, especially if we can assume that the high-high layout, by virtue of being the most familiar one, was also the closest to the “optimal”, while the low-hierarchy layouts could still have room for improvement.

Perhaps relevant to this, the reading style we named Focused reader was associated with less interest in the low-low layout and a higher overall experience for the high-hierarchy layouts, possibly reflecting the lack of prominent entry points in the low-low layout (the other low-hierarchy layout still had the distinction between the sidebar and the central content, which might explain why that layout showed no such association). Participants scoring high on Focused reader also had lower Articles Read scores, which conversely means that they focused more on individual stories, consistent with what Holsanova and others (2006) reported. However, this reading style also co-occurred with lower Attention Allocation scores, contrary to what we would have expected, and with odd differences in total reading times.

On the other hand, the reading style we named Entry point overviewer was associated with lower interest in the high-high layout, despite the fact that high-hierarchy layouts should provide excellent entry points. This style was also associated with more Articles Read for the low-low layout, suggesting more scanning than reading with that layout. It appears that the relationship between behavior and reported interest and attention is not as straightforward as one might expect.

The layout implemented a priori as high hierarchy and low abundance (top left in Figure 1) was neither assessed favorably nor experienced especially positively (or negatively), but it stood out in that IBI, a cardiac index commonly associated with attention, was higher. This seems natural: this layout presented the most text, and the higher attention, supported by the longest time spent on the front page, indicates that the text was not simply scanned but also read (see also Holmqvist et al., 2003). This layout was also expected to be associated with the visualizer-verbalizer cognitive styles (Mendelson & Thorson, 2004) because it combined a large amount of text with a large picture. However, the results were conflicting, and we could not find meaningful interpretations for them.

In conclusion, we found clear differences between the layouts that should help in developing layouts for digital newspapers. Although most measures indicate that the layout most resembling a modern newspaper front page should be preferred, other interesting details suggest that other layouts could be used for particular purposes. In the future, the role of familiarity in the apparent superiority of this layout should be investigated, as should the role of individual differences in newspaper media experience.


Acknowledgements

This study was funded by the TEKES project Next Media.

References

Barnhurst, K. G., & Nerone, J. (1991). Design trends in US front pages, 1885–1985. Journalism Quarterly, 68(4).

Fridlund, A. J., & Cacioppo, J. T. (1986). Guidelines for human electromyographic research. Psychophysiology, 23(5), 567–589.

Garcia, M. R., & Stark, P. (1991). Eyes on the news. St. Petersburg, FL: The Poynter Institute.

Hansen, J. P. (1994). Analyse af læsernes informationsprioritering [Analysis of readers' information prioritization]. Unpublished report, Kognitiv Systemgruppen, Forskningscenter Risø, Roskilde. Referenced in Holmqvist & Wartenberg, 2005.

Helle, M., Ravaja, N., & Heikkilä, H. (2011). A theoretical model of media experience and research methods for studying it. Project report for Next Media – a TIVIT Programme. Helsinki.

Holmberg, N. (2004). Eye movement patterns and newspaper design factors: An experimental approach. Lund: Lund University Cognitive Science.

Holmqvist, K., Holsanova, J., Barthelson, M., & Lundqvist, D. (2003). Reading or scanning? A study of newspaper and net paper reading. In J. Hyönä, R. Radach, & H. Deubel (Eds.), The mind's eye: Cognitive and applied aspects of eye movement research. Elsevier. http://www.lucs.lu.se/jana.holsanova/PDF/Holmqvist%20et%20al.pdf

Holmqvist, K., & Wartenberg, C. (2005). The role of local design factors for newspaper reading behaviour: An eye-tracking perspective. Lund University Cognitive Studies, 127. http://www.lucs.lu.se/LUCS/127/LUCS.127.pdf

Holsanova, J., Rahm, H., & Holmqvist, K. (2006). Entry points and reading paths on newspaper spreads: Comparing a semiotic analysis with eye-tracking measurements. Visual Communication, 5(1), 65–93. doi:10.1177/1470357206061005

Josephson, S. (1996). Questioning the power of color. Visual Communication Quarterly, 4–7, 12.

Järvelä, S., Kivikangas, J. M., Saari, T., & Ravaja, N. (2014). Media experience as a predictor of future news reading. Manuscript accepted for publication in Journal of Print and Media Technology Research.

Kirby, J. R., Moore, P. J., & Schofield, N. J. (1988). Verbal and visual learning styles. Contemporary Educational Psychology, 13, 169–184.

Lewenstein, M., Edwards, G., Tatar, D., & DeVigal, A. (2000). The Stanford-Poynter Project: Poynter's first online eyetracking study [Abstract]. http://www.poynter.org/extra/Eyetrack/previous.html


Marghescu, D., Salminen, M., & Ravaja, N. (2013). Media experience elicited by print and tablet news: A psychophysiological investigation of emotion and attention during natural reading. Manuscript submitted for publication in Journal of Media Psychology.

Mendelson, A. L., & Thorson, E. (2004). How verbalizers and visualizers process the newspaper environment. Journal of Communication, 54(3), 474–491. doi:10.1111/j.1460-2466.2004.tb02640.x

Miratech. (2011). Readers are more likely to skim over articles on an iPad than in a newspaper [White paper]. http://miratech.com/blog/eye-tracking-etude-iPad-vs-journal2.html

Moen, D. (1989). Newspaper layout and design. Ames, IA: Iowa State University Press.

Nagamachi, M. (2008). Perspectives and the new trend of Kansei/affective engineering. The TQM Journal, 20(4), 290–298. doi:10.1108/17542730810881285

Outing, S., & Ruel, L. (2004). Poynter Eyetrack Study III [Abstract]. http://www.poynter.org/extra/Eyetrack/previous.html

Pulkkinen, H. (2008). Uutisten arkkitehtuuri: Sanomalehden ulkoasun rakenteiden järjestys ja jousto [The architecture of news: The order and flexibility in newspaper layout structures]. Doctoral thesis, University of Jyväskylä, Finland.

Smith, M. E., & Gevins, A. (2004). Attention and brain activity while watching television: Components of viewer engagement. Media Psychology, 6(3), 285–305.

Wartenberg, C., & Holmqvist, K. (2005). Daily newspaper layout: Designers' predictions of readers' visual behaviour. A case study. Lund University Cognitive Studies, 126. http://www.lucs.lu.se/LUCS/126/LUCS.126.pdf


Figure Captions

Figure 1. The 2 x 2 grid of high and low hierarchy and abundance layouts. The layout outside the grid is the baseline, a simple headline list in chronological order.

Figure 2. Reported hierarchy and abundance for each layout. Error bars represent standard errors.


Table 1.

Descriptive Statistics of Self-Report and Usage Metrics Variables

Variable Minimum Maximum Mean SD

SAM Valence -7 3 -.62 1.44

SAM Arousal -4 5 .80 1.59

SAM Dominance -6 4 .11 1.36

MX Hierarchy 1.0 5.0 3.13 1.03

MX Abundance 1 5 2.48 1.29

MX Aesthetics 1.1 5.0 3.10 0.66

MX Attention Alloc. 1.0 5.0 3.51 1.02

MX Interestingness 1.0 5.0 3.17 1.17

MX Trustworthiness 1.0 5.0 3.59 0.92

MX Navigation 1.0 5.0 4.40 0.89

MX Beauty 1 5 3.44 1.06

MX Overall 1 5 3.33 1.07

Total Time on Front Page (s) 9.040 181.092 60.519 23.131

Articles Read 1.00 12.00 5.42 1.85

Reported Articles Read 1.00 9.00 4.45 1.50

Note. Minima and maxima are reported as integers for single-item measures and as decimals for measures averaged over several items. SAM values are differences between the self-report given after the rest period and the self-reports given after each layout.


Table 2.

Summary of Linear Mixed Model Analyses for Self-Report Dependent Variables

Estimated Marginal Means (SE)

Dependent Variables hH-hA hH-lA lH-hA lH-lA df F p

MX Aesthetics 0.709 (0.158) 0.513 (0.152) 0.482 (0.161) 0.717 (0.151) 3, 96.04 1.695 .173

MX Attention Alloc. 0.546 (0.125) 0.150 (0.125) 0.220 (0.125) 0.101 (0.125) 3, 408.67 5.854 .001

MX Interestingness 0.326 (0.134) -0.083 (0.134) 0.064 (0.134) 0.001 (0.134) 3, 409.3 3.140 .025

MX Usability 0.520 (0.187) 0.360 (0.183) 0.476 (0.190) 0.503 (0.183) 3, 94.51 0.739 .532

MX Trustworthiness 0.595 (0.193) 0.468 (0.184) 0.406 (0.199) 0.460 (0.182) 3, 97.85 0.318 .812

MX Navigation 0.385 (0.227) 0.200 (0.217) 0.066 (0.236) 0.396 (0.215) 3, 96.34 1.000 .396

MX Beauty 1.686 (0.175) 1.258 (0.175) 1.390 (0.175) 1.470 (0.175) 3, 409.09 5.499 .001

MX Overall 0.813 (0.152) 0.383 (0.151) 0.710 (0.151) 0.480 (0.151) 3, 408.68 4.824 .003

SAM Valence 0.464 (0.147) 0.197 (0.147) 0.253 (0.147) 0.178 (0.147) 3, 384.72 1.808 .145

SAM Arousal 0.315 (0.182) 0.058 (0.182) -0.007 (0.182) -0.035 (0.182) 3, 387.19 2.387 .069

SAM Dominance 0.364 (0.146) -0.066 (0.145) 0.028 (0.145) -0.011 (0.145) 3, 383.97 5.269 .001

Note. All estimated marginal means are relative to baseline. The highest estimated marginal means are bolded when their difference from the next highest is significant.


Table 3.

Rankings of Layouts by Item

Layout Interest Usability Attractiveness Preference

hH-hA 1st/2nd (tie) 2nd 1st 1st

hH-lA 4th 4th 3rd 3rd

lH-hA 1st/2nd (tie) 3rd 4th 4th

lH-lA 3rd 1st 2nd 2nd

Baseline 5th 5th 5th 5th

Table 4.

Summary of Linear Mixed Model Analyses for Physiological Dependent Variables

Estimated Marginal Means (SE)

Dependent Variables hH-hA hH-lA lH-hA lH-lA df F p

ZM Mean 0.318 (0.199) -0.09 (0.199) -0.159 (0.198) -0.053 (0.198) 3, 488.33 3.939 .009

CS Mean 0.092 (0.747) 0.081 (0.748) 0.725 (0.744) -0.011 (0.746) 3, 340.05 1.201 .310

OO Mean 0.218 (0.102) -0.018 (0.102) 0.008 (0.102) 0.087 (0.102) 3, 498.34 4.448 .004

EDA Mean 0.04 (0.443) 0.022 (0.443) -0.05 (0.441) 0.16 (0.443) 3, 316.93 0.241 .868

IBI Mean -27.599 (24.93) 9.022 (25.01) -17.706 (24.90) -35.983 (24.98) 3, 359.78 3.465 .016

Note. All estimated marginal means are relative to baseline. The highest estimated marginal means are bolded when their difference from the next highest is significant.


Table 5.

Summary of Linear Mixed Model Analyses for Usage Metrics Dependent Variables

Estimated Marginal Means (SE)

Dependent Variables hH-hA hH-lA lH-hA lH-lA df F p

Total Time on Front Page (s) -4.073 (2.442) 9.456 (2.414) -0.388 (2.450) -3.805 (2.427) 3, 370.34 10.871 < .001

Articles Read -0.828 (0.245) -0.637 (0.243) -0.659 (0.245) -0.541 (0.244) 3, 372.27 0.793 .499

Reported Articles Read -0.410 (0.183) -0.201 (0.182) -0.378 (0.182) -0.033 (0.182) 3, 409.31 2.927 .034

Note. All estimated marginal means are relative to baseline. The highest estimated marginal means are bolded when their difference from the next highest is significant.

[Figure: rating scale 1–5; layouts: High hierarchy - high abundance; High hierarchy - low abundance; Low hierarchy - high abundance; Low hierarchy - low abundance; Baseline]