The choice between a five-point and a ten-point scale in the framework of customer satisfaction measurement

(1)

The choice between a five-point and a

ten-point scale in the framework of

customer satisfaction measurement

Pedro S. Coelho Susana P. Esteves New University of Lisbon

In marketing research, and particularly in the context of customer satisfaction measurement, we often try to measure attitudes and human perceptions. This raises a number of questions regarding appropriate scales to use, such as the number of response alternatives. Obviously, there is a trade-off between the desired response discrimination level and the effort that is demanded of the respondent to situate his or her answer in one of the scale categories. If this effort is too high it can reduce the quality of responses and increase the non-response rate. In the context of customer satisfaction measurement we compare a five-point and a ten-five-point numerical scale. The analysis includes the evaluation of non-response rates, response distribution, the ability to model customer satisfaction, as well as convergent, discriminant and nomological validity of constructs used in the ECSI (European Customer Satisfaction Index) model. Globally, results tend to favour the choice of the ten-point scale, which contradicts some conventional wisdom. Moreover, we conclude that in this context there are no effects of socio-demographic characteristics (namely educational level) on the ability of respondents to use each scale.

Introduction

(2)

In fact, in recent years some discussion has taken place among academics and practitioners regarding the appropriate number of response alternatives to use. It is usually accepted that a small number of points does not allow a good discrimination of responses (limiting the ability to find significant differences between segments) and may limit the data analysis methods that can be used. More points improve the data metric, enrich the possible data analyses and facilitate the calculation of covariances between variables, which are used in most multivariate data analysis methods. Obviously, there is a trade-off between the desired response discrimination level and the effort that is demanded of the respondent to locate his response in one of the scale categories. If this effort is too high it can reduce the quality of responses and increase the non-response rate. Some of the traditional guidelines suggest the use of a number of categories between three and nine (Stem & Noazin 1985; Malhotra & Birks 2003) and clearly the most used number of response alternatives is seven (Cox 1980). Motivational theorists have been arguing against questions with a large number of response alternatives, based on the fact that respondents may not be sufficiently motivated to make meaningful discriminations (Tourangeau 1984; Krosnick & Alwin 1989; Alwin 1991). Obviously the choice regarding the number of response alternatives depends on several factors, such as the nature of the phenomena being measured, the involvement of respondents in the phenomena, the socio-demographic characteristics of respondents and even the nature of the data collection methods.

(3)

In spite of these well-known ideas very little work has been done in trying to validate them. Particularly in the context of customer satisfaction measurement many authors have proposed the ten-point numerical scale (anchored in the extremes). This is the usual approach in the European

Customer Satisfaction Index (ECSI) (ECSI 1998; Ball et al. 2004) and in

the framework of the American Customer Satisfaction Index (ACSI)

(Fornell et al. 1996, 1998; Johnson et al. 2001). Nevertheless, some

practitioners have criticised this scale, arguing that respondents have difficulty discriminating answers using more than five points (particularly in a telephone survey) and that an odd number of points may be preferable in order to account for neutrality. According to these arguments a standard five-point scale would be a preferable choice over a ten-point scale.

This paper aims to compare a five-point and a ten-point numerical scale in the context of customer satisfaction measurement. The analysis includes the evaluation of non-response rates, response distribution, the ability to discriminate between attributes being measured, the ability to model customer satisfaction, as well as convergent, discriminant and nomological validity of constructs used in the ECSI model. Moreover, we investigate the effects of social-demographic characteristics on the ability of respondents to use each scale. The social-demographic analysis includes both the effects on response profile and the probability of non-response.

The structure of the paper is organised as follows. The Introduction presents the problem and the goals of the paper. In the next section we introduce the use of scales in attitude measurement, and approach the problem of choosing the number of response alternatives, referring to previous work in this context. The organisation of the empirical study is shown in the subsequent section. This presentation includes the ECSI model as well as the study design. The fourth section presents and analyses the main results obtained in the study. The final section discusses the main findings.

Number of scale points in attitude measurement

(4)

For instance, customer satisfaction and customer loyalty are considered antecedents of behaviours like customer attrition, customer acquisition and customer value. Attitude variables that can be found in typical surveys include things such as beliefs, preferences and intentions.

In surveys, attitude variables are often measured using rating scales. Among the major decisions when constructing a rating scale are the number of response alternatives and the use of an odd or even number of categories. The choice between attitude scales and particularly the choice regarding the number of response alternatives is not new in marketing theory, but there is clearly a lack of empirical work on this issue. Moreover, in published research there is little agreement about the optimal number of response alternatives (Neumann & Neumann 1981; Alwin 1997), or the use of an odd or even number of categories (Malhotra & Birks 2003). While several authors have concluded that there are no significant gains in using more than five response alternatives (Jenkins & Taber 1977; Stem & Noazin 1985; Converse & Presser 1986), others have favoured scales with more alternatives (up to 25) (Green & Rao 1970; Cox 1980; Alwin 1997). Nevertheless, most of the results tend to favour a solution with a number of alternatives between five and nine (Andrews & Withey 1976; Cox 1980; Neumann & Neumann 1981; Givon &

Shapira 1984; Cicchetti et al. 1985; Alwin & Krosnick 1991; Colman

et al. 1997).

Green and Rao (1970), comparing scales with two, three, six and eighteen response categories, recommend using at least six points and at least eight scales per variable. Nevertheless, they conclude that little information appears to be gained by increasing the number of response categories beyond six. Ramsay (1973), studying the effects of the number of response categories on precision of scale values, concluded that using seven categories provides almost as much precision as a scale requiring a continuous judgment. Jenkins and Taber (1977), using a Monte Carlo study to analyse composite scale reliability, conclude that reliability levels off after five response categories. Neumann and Neumann (1981) compared six rating scales of lengths between two and ten points. Although their conclusions favour the longer scales, they could not find significant differences between a seven-point and a ten-point scale in terms

of correlation and etacoefficients. Moreover they found that deviations of

(5)

they have not found any increase in reliability from using a number of categories above five, and detected a significant decrease in reliability

when using more than seven points. Cicchettiet al. (1985), using a Monte

Carlo simulation to access the extent to which the interrater reliability is affected by the number of scale points, concluded that, although reliability increases up to seven scale points, no substantial increases occur beyond that point. Nevertheless, conclusions are necessarily limited to the simulation parameters. Haley and Case (1979) have tested 13 attitude scales for agreement, discrimination among brands and response pattern, but their study did not offer much insight into the choice of the number of response alternatives within a same scale type. Givon and Shapira (1984) use a stochastic model to investigate the conjoint effect of the number of items and number of response alternatives on the sampling error of a composite scale estimator. They conclude that sampling error may be reduced, increasing the number of response alternatives up to five, seven or nine, depending on the number of items used. A review of the work done up until the beginning of the 1980s, regarding the optimal number of response alternatives, can be found in Cox (1980). The author concludes that there is no single number of response alternatives for a scale that is appropriate under all circumstances. Also, he concludes that no formula can be given to indicate what this number should be, even in a particular set of circumstances. Nevertheless, he establishes that scales with two or three response alternatives are generally inadequate and that the marginal return from using more than nine response alternatives is minimal. Consequently, he proposes that the optimal number of response alternatives is normally situated between five and nine. One exception is Alwin’s (1997) work, which compares seven- and eleven-category rating scales. His conclusions favour the eleven-point scale in terms of measurement precision, and reject the idea that the eleven-point scale is more vulnerable to measurement errors.

Description of the study

The ECSI model

The European Customer Satisfaction Index (ECSI) appeared in 1999 and is adapted from the Swedish Customer Satisfaction Index (Fornell 1992)

and the ACSI (American Customer Satisfaction Index) (Fornell et al.

(6)

(ECSI 1998; Cassel & Eklof 2001; Vilares & Coelho 2004). It has been validated across a number of European countries and many industries, such as insurance, mobile phones, fixed phones, carbonated soft drinks, public transportation, retail banking, cable TV, supermarkets, postal services, food products and public service.

The ECSI model is composed of two sub-models: the structural model and the measurement model. The structural model defines the relationships between the latent variables and is represented in Figure 1. Customer satisfaction is the central variable of this model, having as antecedents the image of the company, customer expectations, perceived quality of products and services, and perceived value (where the relation between quality and price is measured). As consequences of customer satisfaction there are two variables: complaints and loyalty.

The measurement model includes the relations between the latent or non-observable variables and the observed indicators that correspond to survey questions (Table 1). Within this model we assume that the relationships between the latent variables and the observed indicators are all of the reflective nature (i.e. the indicators are assumed to be reflex of the latent variables).

Figure 1 ECSI structural model

Complaints Loyalty Image

Expectations

Perceived quality

Perceived value

(7)

PLS (Partial Least Squares) was used to estimate this model using two data sets obtained as explained in the next section. The methodology PLS applied to ECSI is presented in detail by several authors (e.g. ECSI 1998;

Casselet al. 2000).

Data

Data came from a survey corresponding to the 2004 wave of ECSI-Portugal (the Portuguese Customer Satisfaction Index). The selection of respondents follows the criteria defined in ECSI (1998). Data collection took place in November and December 2004, through telephone interviews supported by a CATI system. The same questionnaire was administered to both samples, but for one sample we used a five-point

Table 1 Indicators of each latent variable

Latent variable Indicators

Image Q4A: It is a reliable operator Q4B: It is well established

Q4C: It gives a positive contribution to society Q4D: It is concerned about its customers Q4E: It is innovative and forward looking

Expectations Q5A: Expectations concerning overall quality

Q5B: Expectations concerning the fulfilment of personal needs Q5C: Expectations concerning reliability

Perceived quality Q6: Perceived overall quality Q7A: Technical quality of the network Q7B: Personal attention

Q7C: Quality of services provided Q7D: Diversity of products and services Q7E: Product reliability

Q7F: Quality of information provided Q7G: Coverage of the network

Perceived value Q10: Evaluation of price given quality Q11: Evaluation of quality given price

Satisfaction Q3: Overall satisfaction Q9: Fulfilment of expectations Q18: Distance to the ideal company

Complaints Q15: Complaint handling

Q16: Expectations of complaint handling

(8)

scale and for the other a ten-point scale (anchored in the extremes).1 _The

two scales are both numerical, with the same labels on the extreme points. Therefore, the only difference between them is the number of response alternatives. The questionnaire is the standard questionnaire used in ECSI-Portugal for the mobile telecommunications industry. The questionnaire includes a set of questions regarding the seven constructs of a structural satisfaction model (image, expectations, perceived quality, perceived value, satisfaction, complaints, and loyalty), plus a set of socio-demographic questions. The sample size was 252 for the five-point scale and 253 for the ten-point scale. Both data sets were collected among customers of the same mobile telecommunications operator. The sampling design includes a random selection of households using random-digit dialling. In each household one resident is selected randomly and qualified as a member of the target population.

Results

Descriptive analysis

Table 2 shows the frequency of non-response and the frequency of response on the middle points of each scale. Results are shown by indicator and organised in seven groups corresponding to the seven latent variables in the satisfaction model. From the results presented in Table 2 it can be seen that, in general, the five-point scale has a higher proportion of non-responses when compared to the ten-point scale. Among the 25 indicators considered, only four show higher non-response rates for the ten-point scale.

When we formally test the difference between the proportions of non-response measured with the two scales, using the hypotheses

H₀: p_i,5–_p= p_i_,10–_p H₁: p_i,5–_p= p_i_,10–_p

where p_i,5–_p is the proportion of non-response for variable i, when using

the five-point scale and p_i,10–_p has the same meaning when using the

ten-point scale, we never reject the null hypothesis, with the exception of variables Q4a and Q4e, at a 5% significance level. Therefore, although the

(9)

proportion of non-response is numerically higher in the sample using the five-point scale, we can not conclude that generally these proportions are different in the population.

Table 2 Non-response rates and proportion of responses in the middle points of the scale

% of middle points

NR (%) % of 3 % of 5 and 6

Latent variable 1 to 5 1 to 10 Difference 1 to 5 1 to 10 Difference

Image

Q4A 4.0 0.8 3.2* 20.2 21.1 –0.9

Q4B 2.8 1.2 1.6 7.8 10.8 –3.0

Q4C 4.8 2.4 2.4 27.1 21.9 5.2

Q4D 4.4 2.4 2.0 29.0 28.3 0.7

Q4E 4.0 1.2 2.8* 16.9 15.6 1.3

Expectations

Q5A 6.0 4.3 1.7 27.0 24.4 2.6

Q5B 6.7 7.1 –0.4 35.7 25.5 10.2*

Q5C 6.3 6.7 –0.4 40.3 25.8 14.5*

Perceived quality

Q6 0.4 0.0 0.4 19.9 18.2 1.7

Q7A 1.2 0.0 1.2 25.7 16.6 9.1*

Q7B 8.3 4.7 3.6 20.3 13.3 7.0*

Q7C 13.1 10.3 2.8 20.5 17.2 3.3

Q7D 12.7 7.5 5.2 28.6 17.9 10.7*

Q7E 7.1 4.0 3.1 25.2 21.4 3.8

Q7F 6.0 3.2 2.8 26.6 17.1 9.5*

Q7G 0.8 0.4 0.4 32.8 21.0 11.8*

Perceived value

Q10 2.0 1.6 0.4 42.1 44.2 –2.1

Q11 2.8 1.2 1.6 44.1 41.6 2.5

Satisfaction

Q3 0.4 0.8 –0.4 25.5 25.1 0.4

Q9 2.4 2.4 0.0 20.2 27.5 –7.3

Q18 4.4 4.0 0.4 7.8 21.0 –13.2

Complaints

Q15 0.0 0.0 0.0 34.6 12.5 22.1*

Q16 11.1 11.4 –0.3 31.8 27.6 4.2

Loyalty

Q12 3.2 2.0 1.2 10.7 18.1 –7.4

Q17 3.2 2.8 0.4 17.2 15.0 2.2

(10)

If the response effort were too high using the ten-point scale we would expect to find a higher frequency of non-response in the group using this scale. Results do not confirm this hypothesis, and we may conclude that the use of a five-point or ten-point scale does not tend to affect significantly the non-response rate.

The concentration of response in the middle points of the scale can also be seen in Table 2. This table exhibits the proportion of response in category 3 for the five-point scale and the proportion of response in categories 5 and 6 for the ten-point scale. It can be observed that in general the concentration of response in the middle points is higher for the five-point scale, when compared to the ten-five-point one. In fact, only six indicators (among 25) show a higher concentration of response in middle points for the ten-point scale.

When we formally test the difference between the proportions of responses in middle points with the two scales, using the hypotheses

H₀: p_i_,5–_p= p_i_,10–_p H₁: p_i_,5–_p≠p

i,10–p

where p_i_,5–_p is the proportion of responses on rate 3 for variable i, when

using the five-point scale and p_i_,10–_pis the proportion of responses on rates 5 and 6 when using the ten-point scale, we reject for eight variables the null hypothesis that the proportions are equal in the population (cf. Table 2) at 5% significance level. Therefore, we can conclude that concentration of response in middle points tends to be higher for the five-point scale.

This is also an interesting result that tends to confirm our hypothesis that within an odd scale the middle point is often used by the respondents that prefer to reduce the response effort, resulting in an overestimation of the true frequency associated with this middle point. This result, along with the equivalence of non-response rates using both scales, tends to validate the idea that for most questions measuring attitudes within customer satisfaction surveys the use of a scale with an even number of response alternatives may be a preferable choice.

The mean of each indicator, both for the five-point and the ten-point scale, is shown is Table 3. We also present the transformed mean, after

conversion of both scales to the interval [0;1], using the formula y_i* =

(y_i– 1)/R_iwhere y_iis the original rating for respondent i, R_iis the range of

the scale used by respondent i and y_i* is the transformed rating for

(11)

It can be seen that for the transformed variables the global average is identical both for the five-point and ten-point scales: 0.70. Also, when analysing individual indicators, both means tend to show similar values (the differences are always smaller than 0.08).

Table 3 Means by indicator using the original and the transformed scale

1 to 5 1 to 10

Indicator Mean Transformed mean Mean Transformed mean Difference

Image

Q4A 4.0 0.76 7.7 0.75 0.01

Q4B 4.4 0.84 8.3 0.82 0.02

Q4C 3.9 0.72 7.6 0.73 –0.01

Q4D 3.7 0.68 7.2 0.69 –0.01

Q4E 4.1 0.77 7.9 0.77 0.00

Expectations

Q5A 3.8 0.70 7.4 0.71 –0.01

Q5B 3.7 0.68 7.3 0.70 –0.02

Q5C 3.6 0.66 6.9 0.65 0.01

Perceived quality

Q6 4.0 0.75 7.8 0.76 –0.01

Q7A 3.9 0.72 7.6 0.74 –0.02

Q7B 4.1 0.77 7.8 0.76 0.01

Q7C 4.0 0.75 7.7 0.74 0.01

Q7D 3.8 0.70 7.4 0.71 –0.01

Q7E 3.9 0.72 7.4 0.71 0.01

Q7F 3.8 0.71 7.5 0.72 –0.01

Q7G 3.7 0.66 7.2 0.69 –0.03

Perceived value

Q10 2.9 0.47 5.4 0.49 –0.02

Q11 3.4 0.60 6.4 0.60 0.00

Satisfaction

Q3 3.9 0.71 7.5 0.72 –0.01

Q9 3.8 0.69 7.3 0.70 –0.01

Q18 3.6 0.65 7.2 0.69 –0.04

Complaints

Q15 2.7 0.43 5.6 0.51 –0.08

Q16 3.7 0.67 7.0 0.67 0.00

Loyalty

Q12 4.2 0.79 7.7 0.75 0.04

Q17 4.1 0.77 7.7 0.75 0.02

(12)

When we formally test the difference between the transformed means with the two scales, using the hypotheses

H₀: m_i_,5–_p = m_i_,10–_p H₁: m_i_,5–_p ≠m

i,10–p

where m_i_,5–_pis the mean for variable iwhen using the five-point scale and

m_i_,10–_p has the same meaning when using the ten-point scale, we never

reject the null hypothesis. Therefore we can not conclude that these means are different in the population. Once again this result tends to confirm that both scales produce equivalent mean scores and validate the acceptability of the ten-point scale. In fact, if the efforts demanded of respondents were too high using the ten-point scale we would expect to find different mean scores for the two scales.

The effect of socio-demographics on ratings

One specific fear regarding the use of ten-point scales concerns some socio-demographic groups, specifically those with a lower education level. The question to be answered is: ‘Do people with different education levels use the five- and ten-point scales differently?’

To answer this question we have implemented two analyses:

1. an analysis of variance having as dependent variable the ratings to question Q3 (overall satisfaction)

2. a logistic regression having as its dependent variable a binary variable representing the non-response to question Q7c (quality of services provided).

(13)

Results for the analysis of variance are given in Tables 5 and 6. Table 5

shows the F tests for the significance of the regression and for each

individual variable (using Type III sum of squares). The null hypothesis that all model coefficients are zero is rejected, showing the relevance of the model. From the tested factors, only sex and education level are significant at any reasonable significance level. These are important results that tend to show that the scale used (ten-point or five-point) does not influence the mean rating for the dependent variable. Also, although the education level

Table 4Independent variables and categories

Independent variable Categories

Sex Male

Female

Age group <30

30–39 40–49 50 or more

Education level Basic

Secondary (high school) University degree

Scale used 10 points scale

5 points scale

Scale used × education level cross-effects 10 points scale/Basic 10 points scale/Secondary 10 points scale/University

5 points scale/Basic 5 points scale/Secondary 5 points scale/University

Table 5 Ftests

Source DF Sum of squares Mean square Fvalue Pr>F

Model 10 1.21637952 0.12163795 3.42 0.0002

Error 491 17.46895476 0.03557832

Corrected total 501 18.68533428

Variables

Sex 1 0.27802108 0.27802108 7.81 0.0054

Used scale 1 0.00268551 0.00268551 0.08 0.7836

Age group 4 0.07279392 0.01819848 0.51 0.7273

(14)

is a significant factor, there is no interaction between the scale used and the education level. The conclusion is that people with different educations do not tend to use the two scales differently.

Table 6 shows coefficient estimates for the significant parameters. Please remember that only sex and education level were considered significant. Results confirm the well-known notion in customer satisfaction studies that females tend to show higher ratings than males, and also that mean scores tend to decrease with the increase of education level.

With a logistic regression we intend to offer a complementary insight to the question regarding the hypothetical influence of socio-demographic variables on the rating profile. With the analysis of variance we have been concerned with the response pattern to question Q3 (overall satisfaction). Nevertheless, it is also interesting to understand whether the scale used and socio-demographic characteristics influence the probability of non-response. This is particularly important since, despite concluding that the scale does not influence the response pattern (for respondents), it may influence the probability of response and therefore the quality of collected data. The dependent variable Q7c (quality of services provided) was chosen for being one with the higher non-response rate among variables in the questionnaire. Results for the logistic regression are given in Tables 7 and 8.

Table 7 shows a chi-square test for the model significance and Wald statistics2_{for individual independent variables. The null hypothesis that all}

model coefficients are zero is rejected, showing the relevance of the model. For the independent variables only age group and education level are significant at any reasonable significance level. Once again, these results

Table 6 Parameter estimates

Parameter Estimate Standard error tvalue Pr> |t|

Intercept 0.7424151497 0.07227142 10.27 <0.0001

Female 0.0490012601 0.01752916 2.80 0.0054

Male 0.0000000000

Basic education 0.0849742986 0.03029263 2.81 0.0052

Secondary education 0.0215570425 0.02873063 0.75 0.4534 University education 0.0000000000

(15)

tend to show that the scale used (ten-point or five-point) does not influence the probability of non-response for the dependent variable. Also, although the education level is a significant factor, there is no interaction between the scale used and the education level. The conclusion is that within an education level, people using different scales do not tend to show different response rates.

Coefficient estimates for the significant parameters are shown in Table 8. Note that only age group and education level were considered significant. Also note that the probability modelled in our analysis is with regard to the response event. Results show that the response probability tends to decrease with the increase of age. Also it can be seen that the response probability tends to be higher for people with higher education levels.

Table 7 Chi-square tests

Testing global null hypothesis: BETA = 0

Test DF Chi-square Pr> chi sq

Likelihood ratio 10 40.6550 <.0001

Score 10 44.7595 <.0001

Wald 10 36.7804 <.0001

Variables

Sex 1 0.8306 0.3621

Scale used 1 0.6921 0.4054

Age group 4 16.8279 0.0021

Education level 2 12.8545 0.0016

Scale used*education level 2 0.0698 0.9657

Table 8 Parameter estimates

Analysis of maximum likelihood estimates

Standard Wald

Parameter DF Estimate error chi-square Pr> chi sq

Intercept 1 1.7622 0.2092 70.9829 <0.0001

Age group <30 1 0.9717 0.3625 7.1847 0.0074

Age group 30–39 1 0.4654 0.3489 1.7787 0.1823

Age group 40–49 1 0.7258 0.3529 4.2303 0.0397

Age group 50 or + 1 –0.1955 0.2795 0.4893 0.4842

(16)

Globally, we can conclude that:

• some socio-demographic variables, such as sex, age group and

education level, influence the probability of non-response or the response profiles for respondents

• the scale used (ten-point or five-point) influences neither the

probability of non-response nor the response profiles for respondents

• in neither analysis is there an interaction between the scale used and

the education level; therefore, we can conclude that, within any education level, people using different scales do not tend to show different response patterns.

These conclusions tend to confirm the empirical analysis made in the previous section where we have observed that both scales produced similar non-response rates as well as similar mean scores (after rescaling). Therefore, the small and usually non-significant differences in response rates between both samples, rather than being a consequence of the scale used, may be totally explained by small differences in the socio-demographic profile of the samples. In fact, the sample using the ten-point scale presents a slightly higher proportion of both young and well-educated people that contributes to a higher response rate.

Validity assessment

Figures 2 and 3 show the estimated path coefficients and tvalues (between

parentheses) for the structural models estimated with five-point and ten-point scales, respectively. Generally, the hypothesised links tend to be significant at 5% significance level. For the model estimated with the five-point scale the only exceptions are the Image–Loyalty, Expectations– Satisfaction and the Complaints–Loyalty paths that are not significant at 10% significance level. In particular, the estimate for the hypothesised path between Complaints and Loyalty is negative (contradicting the

theory) and shows an extremely low tratio. When using the ten-point scale

(17)

Figure 2 Model parameter estimate and t values for the five-point scale Complaints Loyalty 0.10 (1.31**) 0.11 (1.46**) 0.25 (3.87) 0.46 (5.73) 0.25 (4.44) 0.31 (3.99) 0.58 (13.42) 0.66 (16.78) 0.65 (17.22) 0.30 (3.93)

* Non-significant at 5% significance level ** Non-significant at 10% significance level 0.29 (3.21) –0.03 (0.34)** Image Expectations Perceived quality Perceived value Satisfaction (ECSI)

Figure 3 Model parameter estimate and t_{values for the ten-point scale}

Complaints Loyalty 0.14 (1.62**) 0.14 (1.79*) 0.34 (4.83) 0.48 (4.89) 0.34 (4.56) 0.19 (2.22) 0.55 (9.77) 0.68 (22.42) 0.64 (15.43) 0.17 (2.31)

(18)

We also access the model’s explanatory power (through the

determination coefficient,3 _R2_{) for the equations explaining satisfaction}

and loyalty, both in the five-point and ten-point scales. From the results presented in Table 9 it can be seen that the ten-point scale shows a higher explanatory power for customer satisfaction and loyalty, when compared to the five-point scale. The increase in explanatory power attributed to the ten-point scale is particularly impressive for the loyalty construct.

The R2_{value for satisfaction is higher than 0.50 for both scales (0.59 for}

the five-point scale and 0.66 for the ten-point scale). On the other hand, although the R2_{value for loyalty on the ten-point scale is quite high (0.54),}

the value is very low for the five-point scale (0.27), showing a weaker explanatory power for loyalty.

These results clearly favour the ten-point scale and can be seen as a confirmation of the higher nomological validity of this scale, since, globally, constructs in the model estimated with the ten-point scale tend to show higher correlations, confirming theoretical predictions.

Table 10 presents the average communalities for the seven latent variables of the ECSI model, both for five-point and ten-point scales. Communality for a manifest variable may be interpreted as the proportion of its variance, which is reproduced by the directly connected latent variable. This measure can be used as an indicator of the convergent validity of the measurement model. In almost all cases latent variable communalities are higher than 0.50, indicating than the variance captured by each latent variable is significantly larger than variance due to measurement error, and thus demonstrating a high convergent validity of the construct. There are two exceptions: quality and loyalty, which have communality slightly below 0.50 for the five-point scale (0.479 and 0.468, respectively). In general the communality is higher for the ten-point scale. The only exception occurs in expectation (0.717 for the five-point scale

Table 9 Determination coefficient (R2_{) of satisfaction and loyalty}

Latent variable 1 to 5 1 to 10

Satisfaction 0.59 0.66

Loyalty 0.27 0.54

(19)

and 0.674 for the ten-point scale). Once again these results tend to favour the convergent validity of the ten-point scale.

Some authors (e.g. Givon & Shapira 1984) have contended that the advantages of using a higher number of response alternatives tend to be more pronounced for constructs with fewer number of indicators. The results showed in Table 10 do not confirm this statement. In fact, the latent variables with a higher difference between communality for the ten-point scale and five-point scale are image (0.093), loyalty (0.089) and perceived quality (0.041). Nevertheless, image and quality are the constructs in our model with a higher number of indicators (five and eight, respectively). So, globally, we could not find a relation between the communality improvements resulting from using a higher number of response alternatives and the number of indicators in the construct.

One way to assess discriminant validity is to determine whether each latent variable shares more variance with its own measurement variables than with other constructs. For that we start to compare measurement variables communalities with the squared correlations between their own construct and other constructs in the model. A low percentage of latent variable squared correlations exceeding measurement variables communalities tends to confirm discriminant validity (Chin 1998).

Communalities for the indicators of each latent variable and the percentage of latent variable squared correlations exceeding measurement variables communalities are shown in Table 11. Regarding the indicators communalities it can be seen that values tend to be higher for the ten-point scale, when compared to the five-point one. Among the 24 indicators considered, only five show higher communalities in the five-point scale.

Regarding the latent variable squared correlations exceeding measurement variables communalities, in general there are few violations.

Table 10 Communality and number of indicators by latent variable

Communality Difference

Latent variable 1 to 5 1 to 10 10 to 5 points Number of indicators

Image 0.51 0.60 0.09 5

Expectations 0.72 0.67 –0.04 3

Perceived quality 0.48 0.52 0.04 8

Perceived value 0.74 0.74 0.00 2

Satisfaction 0.68 0.70 0.02 3

Complaints – – – 1

(20)

Table 11 Indicators communalities and percentage of latent variable squared correlations exceeding measurement variables communalities

1 to 5 1 to 10

Latent variable Latent variable square correlations square correlations exceeding measurement exceeding measurement

variables communalities variables communalities

Commu- Number of % of Commu- Number of % of Indicator nality comparisons violations nality comparisons violations

Image

Q4A 0.49 6 16.7 0.59 6 0.0

Q4B 0.21 6 83.3 0.43 6 33.3

Q4C 0.57 6 0.0 0.62 6 0.0

Q4D 0.71 6 0.0 0.70 6 0.0

Q4E 0.54 6 16.7 0.64 6 0.0

Expectations

Q5A 0.71 6 0.0 0.65 6 0.0

Q5B 0.78 6 0.0 0.70 6 0.0

Q5C 0.65 6 0.0 0.67 6 0.0

Perceived quality

Q6 0.51 6 16.7 0.59 6 0.0

Q7A 0.45 6 33.3 0.51 6 0.0

Q7B 0.39 6 50.0 0.44 6 50.0

Q7C 0.50 6 16.7 0.63 6 0.0

Q7D 0.50 6 16.7 0.53 6 0.0

Q7E 0.57 6 0.0 0.60 6 0.0

Q7F 0.57 6 0.0 0.53 6 0.0

Q7G 0.33 6 50.0 0.34 6 66.7

Perceived value

Q10 0.67 6 0.0 0.68 6 0.0

Q11 0.80 6 0.0 0.80 6 0.0

Satisfaction

Q3 0.63 6 0.0 0.59 6 0.0

Q9 0.66 6 0.0 0.73 6 0.0

Q18 0.75 6 0.0 0.78 6 0.0

Complaints

Q15–16 1.00 6 0.0 1.00 6 0.0

Loyalty

Q12 0.30 6 0.0 0.72 6 0.0

(21)

In expectations, perceived value, satisfaction, complaints and loyalty we observed no violations. In the other latent variables, the violations tend to be more significant for the five-point scale. Note that the gains in discriminant validity for the ten-point scale are concentrated in the Image and Perceived Quality constructs, which is to be expected since these are the latent variables measured with a higher number of indicators. The exception occurs in indicator Q7g of perceived quality, where there is one more violation for the ten-point scale (four, in a total of six comparisons). Nevertheless, one should note that this is an indicator with low discriminant validity in both scales. Generally, these results confirm a higher discriminant validity of the constructs when using the ten-point scale.

A complementary assessment of discriminant validity may be obtained using the variance extracted test (Fornell & Larcker 1981). We compare the estimates of average variance extracted (AVE) for each pair of constructs in the model with the correlation between the constructs. Discriminant validity is demonstrated if both square roots of variance extracted are greater than this correlation. Table 12 presents the results for both scales. Elements in the main diagonal represent the square roots of AVE and the other elements correlations between constructs. Although for both scales discriminant validity is generally achieved for most constructs,

Table 12Square roots of average variance extracted and correlations between constructs

Perceived

Image Expectations value Quality Satisfaction Complaints Loyalty

Five-point scale

Image 0.74

Expectations 0.65 0.85

Perceived value 0.46 0.50 0.85

Quality 0.75 0.66 0.50 0.69

Satisfaction 0.67 0.60 0.58 0.69 0.83

Complaints 0.50 0.42 0.44 0.51 0.58 1.00

Loyalty 0.40 0.34 0.37 0.45 0.51 0.29 0.82

Ten-point scale

Image 0.78

Expectations 0.64 0.82

Perceived value 0.52 0.45 0.86

Quality 0.70 0.68 0.51 0.71

Satisfaction 0.72 0.62 0.66 0.67 0.84

Complaints 0.44 0.49 0.40 0.49 0.55 1.00

(22)

results tend to favour the ten-point scale. In fact, with the five-point scale a lack of discriminant validity is detected between image and quality constructs (note that these are the constructs where we have found some construct squared correlations exceeding measurement variables communalities). Also, the square root of AVE for quality is equal to the correlation between this construct and satisfaction, while with the ten-point scale all constructs show square roots of AVEs higher than all the corresponding correlations.

Discussion and conclusions

This paper aimed to compare a five-point and a ten-point numerical scale in customer satisfaction measurement in the framework of the Portuguese Customer Satisfaction Index (ECSI-Portugal). The analysis includes the evaluation of non-response rates, response distribution, as well as convergent, discriminant and nomological validity of constructs used in the ECSI model. Moreover we have investigated the effects of socio-demographic characteristics on the ability of respondents to use each scale. Globally, it is apparent that the ten-point scale shows better properties than the five-point scale, validating the choice made in the context of ECSI-Portugal. In fact, it has been seen that the ten-point scale generally shows higher validity than the five-point scale. This is true both for convergent and discriminant validity. Also the ten-point scale showed a higher explanatory power for the main variables in our model (satisfaction and loyalty) thus confirming a higher nomological validity.

(23)

may be a preferable choice. We can also conclude that respondents can deal with scales with an even number of points and therefore the use of scales with a neutral category should not be mandatory.

Through an analysis of variance aiming to explain response scores and a logistic regression aiming to explain non-response, we confirmed the well-known idea in customer satisfaction studies that some socio-demographic variables, such as sex, age group and education level, influence the probability of non-response and the response profiles of respondents. We have also concluded that the scale used (ten-point or five-point) influences neither the probability of non-response, nor the response profiles of respondents. Finally, we found that there was not, in either analysis, any interaction between the scale used and education level. Therefore we can conclude that within any education level, people using different scales do not tend to exhibit different response probabilities or response profiles. This is a particularly important result since some criticism regarding the use of scales with a high number of response alternatives is that it would decrease the quality of response, particularly for people with lower educational levels.

If we use Cox’s (1980) definition regarding the optimal number of response alternatives for a scale – ‘a scale with the optimal number of response alternatives is refined enough to be capable of transmitting most of the information available from respondents without being so refined that it simply encourages response error’ – we can clearly state that in the context of our study ten points is a better choice for the number of response alternatives than five points.

(24)

that they are contributing to the improvement of the product/service offered by their supplier, but this seems to be an insufficient cause for the specificity of the results. Third, the analysis performed differs from most previous studies defending a lower number of response categories. In fact, we have compared the two scales using multi-item constructs estimated in a structural equation modelling (SEM) framework. Some of the previous work concerned with single-item measures and the ones using multi-item constructs were almost exclusively concerned with reliability. In fact, validity, which was one of the major concerns in our study, has not been used as a criterion for most studies. Fourth, it is possible that the conventional wisdom of 20 or 30 years ago no longer applies. One possible explanation is that consumers have become more sophisticated at taking tests and rating their attitudes. Globally, the familiarity with scales has definitely been growing. Decades ago, the use of standardised tests and attitude scales was much less common than today. Phrases such as ‘On a scale of 1 to 10, how would you rate …’ have crept into popular speech. Taken together, these facts may point to a shift in the ability of average consumers to use more discriminating scales with greater ease.

If it is true that consumers can use scales with more scale points more easily than they could in the past, there is every reason for researchers to use such scales. Our results suggest that more scale points, routinely used, will result in greater ability to identify important relationships, higher validity for constructs, and better hypothesis tests in theory and practice.

(25)

out that the analysis was limited to the fact that the two questionnaires (using different scales) were administered to independent samples. Some additional analysis would be beneficial if the two questionnaires were administered to the same sample.

Acknowledgements

The authors would like to thank Dr Dwayne Ball (University of Nebraska-Lincoln) and the anonymous referees for their helpful comments.

References

Alwin, D.F. (1991) Research on survey quality. Sociological Methods & Research,

20_{, pp. 3–29.}

Alwin, D.F. (1997) Feeling thermometers versus 7-point scales – which are better? Sociological Methods and Research, 25_{, 3, pp. 318–340.}

Alwin, D.F. & Krosnick, J.A. (1991) The reliability of attitudinal survey measures: the role of question and respondent attributes. Sociological Methods & Research,

20_{, pp. 139–181.}

Andrews, F.M. & Withey, S.B. (1976) Social Indicators of Well-Being: Americans’

Perceptions of Life Quality. New York: Plenum.

Ball, A.D., Coelho, P.S. & Machás, A. (2004) The role of communication and trust in explaining customer loyalty: an extension to the ECSI model. European Journal of Marketing, 38_{, available from the authors.}

Cassel, C. & Eklof, J.A. (2001) Modeling customer satisfaction and loyalty on aggregate levels: experience from the ECSI pilot study. Total Quality Management,

12_{, 7–8, pp. 834–841.}

Cassel, C., Hackl, P. & Westlund, A. (2000) On measurement of intangibles assets: a study of robustness of partial least squares. Total Quality Management, 7_,

pp. 897–907.

Chin, W.W. (1998) The partial least squares approach to structural equation modeling. In: G.A. Marcoulides (ed.) Modern Methods for Business Research. Mahwah, NJ: Lawrence Erlbaum Associates.

Cicchetti, D.V., Showalter, D. & Tyrer, P.J. (1985) The effect of number of rating scale categories on levels of interrater reliability: a Monte Carlo investigation. Applied Psychological Measurement, 9_{, 1, pp. 31–36.}

Colman, A.M., Norris, C.E. & Preston, C.C. (1997) Comparing rating scales of different lengths: equivalence of scores from 5-point and 7-point scales. Psychological Reports, 80_{, pp. 355–362.}

Converse, J.M. & Presser, S. (1986) Survey Questions: Handcrafting the

Standardized Questionnaire. Newbury Park, CA: Sage.

(26)

ECSI (1998) European Customer Satisfaction Index. Report prepared for the ECSI Steering Committee.

Fornell, C. (1992) A national customer satisfaction barometer: the Swedish experience. Journal of Marketing, 56_{, 1, pp. 6–21.}

Fornell, C. & Larcker, D.F. (1981) Evaluating structural equation models with unobservable variables and measurement error. Journal of Marketing Research,

18_{, pp. 39–50.}

Fornell, C., Johnson, M.D., Anderson, E.W., Cha, J. & Everitt Bryant, B. (1996) The American Customer Satisfaction Index: nature, purpose and findings. Journal of Marketing, 60_{, 4, pp. 7–18.}

Fornell, C., Johnson, M.D., Anderson, E.W., Cha, J. & Everitt Bryant, B. (1998)

The American Customer Satisfaction Index: Methodology Report. Ann Arbor, MI:

University of Michigan.

Givon, M.M. & Shapira, Z. (1984) Response to rating scales: a theoretical model and its application to the number of categories problem. Journal of Marketing Research, 21_{, pp. 410–419.}

Green, P.E. & Rao, V.R. (1970) Rating scales and information recovery – how many scales and response categories to use. Journal of Marketing, 34_{, pp. 33–39.}

Haley, R.I. & Case, P.B. (1979) Testing thirteen attitude scales for agreement and brand discrimination. Journal of Marketing, 43_{, pp. 20–32.}

Jenkins, G.D. & Taber, T.D. (1977) A Monte Carlo study of factors affecting three indices of composite scale reliability. Journal of Applied Psychology, 62_,

pp. 392–398.

Johnson, M., Gustafsson, A., Andreason, T.W., Lervik, L. & Cha, G. (2001) The evolution and future of national customer satisfaction index models. Journal of

Economic Psychology, 22_{, pp. 217–245.}

Krosnick, J.A. & Alwin, D.F. (1989) Response strategies for coping with the cognitive demands of survey questions. Unpublished manuscript. Ann Arbor, MI: University of Michigan, Institute for Social Research.

Malhotra, N. & Birks, D. (2003) Marketing Research: An Applied Approach, 2nd European edn. Prentice Hall.

Neumann, L. & Neumann, Y. (1981) Comparison of six lengths of rating scales: students’ attitudes toward instruction. Psychological Reports, 48_{, pp. 399–404.}

Ramsay, J.O. (1973) The effect of number of categories in rating scales on precision of estimation of scale values. Psychometrika, 37_{, pp. 513–532.}

Reynolds, F.D. & Neter, J. (1982) How many categories for respondent classification. Journal of the Market Research Society, 24_{, 4, pp. 345–346.}

Stem, D.E. & Noazin, S. (1985) The effects of number of objects and scale positions on graphic position scale reliability. In: R.E. Luschet al. (eds) AMA Educators’

Proceedings. Chicago: Marketing Association, pp. 370–372.

Tourangeau, R. (1984) Cognitive sciences and survey methods. In: T.B. Jabine, M.L. Straf, J.M. Tanur & R. Tourangeau (eds) Cognitive Aspects of Survey

Methodology: Building a Bridge between Disciplines. Washington, DC: National

Academy Press, pp. 73–100.

Vilares, M. & Coelho, P. (2004) The employee–customer satisfaction chain in the ECSI model. European Journal of Marketing, 37_{, pp. 1703–1722.}

(27)

About the authors

Pedro Simões Coelho is Associate Professor at Instituto Superior de Estatística e Gestão de Informação of the Universidade Nova de Lisboa (ISEGI-UNL). He is also a researcher in the Statistics and Information Management Center (CEGI) at ISEGI-UNL, Vice-President of the Portuguese Association for Classification and Data Analysis (CLAD) and Vice-President of Qmetrics, SA. Additionally, he is co-coordinator of the Portuguese committee of the ECSI-Portugal (European Customer Satisfaction Index) project. At ISEGI-UNL he is Director of the Master degrees and is lecturing courses in survey methodology, marketing research, data collection methodologies and quantitative methods for marketing. Pedro Simões Coelho has been a consultant for several organisations, including the Portuguese Statistical Office. His main research interests are in survey methodology, structural equation modelling, customer satisfaction measurement, and the explanation of customer loyalty.

Susana Pereira Esteves (susanaesteves@isegi.unl.pt) is presently Assistant Professor at Instituto Superior de Estatística e Gestão de Informação of the Universidade Nova de Lisboa (ISEGI-UNL). She is a researcher in the Statistics and Information Management Research Center (CEGI) of the same university and is also a member of the workgroup of ECSI - Portugal (European Customer Satisfaction Index) project. Her current work focuses on marketing research, in particular customer satisfaction and loyalty measurement.

Address correspondence to: Professor Pedro Simões Coelho, ISEGI-UNL, Campus de Campolide, 1070-312 Lisboa, Portugal.

(28)